MongoDB - resizing the oplog of replica sets

03.07.2016

MongoDB replica sets provide redundancy for MongoDB. To replicate data, MongoDB uses the so-called oplog, a separate collection within the local database. The oplog is capped, so it cannot grow beyond its configured size, which limits the number of entries it can hold. Once the collection is full, the oldest entries are removed. If an application fills the oplog within 35 minutes, a node that is unavailable for longer than that can no longer catch up from the oplog, and the replica set is out of sync. If this happens, all one can do is resync the node by hand (e.g., by copying the data directory, or by triggering a full sync later when the system load has dropped to an acceptable level).
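The behaviour of a capped oplog can be illustrated with a small toy model in plain JavaScript. This is only a sketch: the class `ToyOplog` and its methods are hypothetical names, and it caps the number of entries, whereas the real oplog is capped by size in bytes.

```javascript
// Toy model of a capped oplog (NOT MongoDB code; all names are hypothetical).
// ASSUMPTION: capacity is counted in entries; the real oplog is capped by size.
class ToyOplog {
  constructor(capacity) {
    this.capacity = capacity;
    this.entries = []; // each entry: { ts: <number>, op: <string> }
  }

  // Append an operation; once the cap is exceeded, the oldest entry is dropped.
  append(ts, op) {
    this.entries.push({ ts, op });
    if (this.entries.length > this.capacity) {
      this.entries.shift();
    }
  }

  // Timestamp of the oldest entry still available, or null if empty.
  oldestTs() {
    return this.entries.length > 0 ? this.entries[0].ts : null;
  }

  // A node can resume only if the last operation it applied has not
  // already been pushed out of the oplog.
  canResumeFrom(lastAppliedTs) {
    const oldest = this.oldestTs();
    return oldest !== null && lastAppliedTs >= oldest;
  }
}

const oplog = new ToyOplog(3);
for (let ts = 1; ts <= 5; ts++) {
  oplog.append(ts, "insert");
}

console.log(oplog.oldestTs());       // 3 (entries 1 and 2 were dropped)
console.log(oplog.canResumeFrom(4)); // true: entry 4 is still in the oplog
console.log(oplog.canResumeFrom(1)); // false: this node needs a manual resync
```

A node that stopped at timestamp 1 has fallen off the end of the oplog, which corresponds to the out-of-sync situation described above.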

To avoid this situation, check the current size of the oplog with the command:

db.getReplicationInfo()

The output of this command shows the size of the oplog and the time difference between its newest and oldest entry.

db:SECONDARY> db.getReplicationInfo()
{
    "logSizeMB" : 12288,
    "usedMB" : 12268.66,
    "timeDiff" : 49881,
    "timeDiffHours" : 13.86,
    "tFirst" : "Mon May 30 2016 20:50:05 GMT+0200 (CEST)",
    "tLast" : "Tue May 31 2016 10:41:26 GMT+0200 (CEST)",
    "now" : "Tue May 31 2016 10:41:26 GMT+0200 (CEST)"
}

The sample shows that the oplog of the replica set above covers nearly 14 hours of operations. If a node goes down at 10 p.m., it can still catch up on its own as long as it is back before about noon the next day.
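The arithmetic behind such a statement can be sketched in plain JavaScript. The helper `catchUpDeadline` is a hypothetical name, and the calculation assumes that the write rate stays as it was while the sample was taken. The value 49881 is the `timeDiff` (in seconds) from the output above, and UTC is used only to keep the example independent of the local time zone.

```javascript
// Latest moment at which a failed node can still catch up from the oplog.
// ASSUMPTION: the write rate stays the same as in the observed period.
//   failedAt      - Date at which the node stopped replicating
//   windowSeconds - the "timeDiff" value from db.getReplicationInfo()
function catchUpDeadline(failedAt, windowSeconds) {
  return new Date(failedAt.getTime() + windowSeconds * 1000);
}

// Node goes down at 10 p.m. (UTC here, purely for illustration).
const failedAt = new Date("2016-05-31T22:00:00Z");
const deadline = catchUpDeadline(failedAt, 49881);

console.log(deadline.toISOString()); // 2016-06-01T11:51:21.000Z
```

With a window of 13 hours, 51 minutes and 21 seconds, the deadline falls shortly before noon on the following day.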

You’re done here if your application has a stable load and 14 hours of ‘recover from failure’ time are sufficient for your use case. If, however, your application writes different amounts of data at different times of the day, the time span covered by the oplog changes accordingly. The example above covers only 3 hours under maximum load. The recommendation is therefore to monitor the time window covered by the oplog over the course of the day. If that window gets too small (depending on your requirements), you’ll need to increase the oplog size. This is a manual process described in the official documentation. Alternatively, you can use the following script (which is based on the tutorial), applying it to one server after another:

The script combines the steps from the tutorial and, before each step, requires you to enter ‘y’ followed by the return key, so that you verify the step and consider its possible consequences. Any other input stops the script. Be aware that it includes neither recovery nor resume functionality, so cleanup, recovery, and restart have to be done by hand.
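Before resizing, it helps to have a rough idea of the target size. The following plain JavaScript sketch is one way to estimate it. It is not part of the script or the tutorial; the function names `requiredOplogSizeMB` and `windowTooSmall` and the 8-hour threshold are hypothetical, and the estimate assumes that the covered time window grows linearly with the oplog size.

```javascript
// Estimate the oplog size needed to cover a target time window.
// ASSUMPTION: the window scales linearly with the oplog size, i.e. the
// write rate during the observed period is representative.
function requiredOplogSizeMB(logSizeMB, observedHours, targetHours) {
  if (!(logSizeMB > 0) || !(observedHours > 0) || !(targetHours > 0)) {
    throw new RangeError("all arguments must be positive numbers");
  }
  return Math.ceil(logSizeMB * targetHours / observedHours);
}

// Hypothetical monitoring check: is the current window below the minimum?
// `info` has the shape of the db.getReplicationInfo() output.
function windowTooSmall(info, minHours) {
  return info.timeDiffHours < minHours;
}

// Values from the example in this article.
const info = { logSizeMB: 12288, usedMB: 12268.66, timeDiffHours: 13.86 };
const underPeakLoad = { logSizeMB: 12288, timeDiffHours: 3 };

console.log(windowTooSmall(info, 8));           // false
console.log(windowTooSmall(underPeakLoad, 8));  // true
console.log(requiredOplogSizeMB(12288, 3, 14)); // 57344
```

In this example, keeping a 14-hour window even under maximum load would require an oplog of roughly 56 GB instead of 12 GB. The window reported by `db.getReplicationInfo()` is only a meaningful input once the oplog has filled up, as in the sample output, where `usedMB` is almost equal to `logSizeMB`.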

This article is a translation from German of my original article posted here.