Incremental execution makes MongoDB exports or migrations from MongoDB to SQL more manageable.
Let’s start with the big picture: state. Incremental execution records the value of a particular field as documents pass through the Export or Migration process. That field has to be both unique and sortable. As each document passes through, the state is updated, and this continues until the Export or Migration has run out of records to process. The next time the task is run, Studio 3T picks up that recorded state and transparently uses it to reshape the Export/Migration’s underlying query, so that the process restarts from the document after the one recorded.
Practical Incremental Execution
If that’s a little abstract, let’s look at an application for Incremental Execution: batched exporting. This is where we want to export data regularly in batches of, say, a thousand. To do this, we start with the export query; we only want to present a thousand documents to the Export, so we’ll create a query with a limit of a thousand.
In this example, we are exporting to a file and using Studio 3T placeholders to generate a new file each time we export so we can see the results.
Below the query, you’ll see the Incremental execution options:
Incremental execution is disabled because we have to configure it. Clicking on Configure incremental execution will bring up the configuration dialog. Here, we want to enable incremental execution and set the Resume Point options to start with a fresh state. We’ll use the default, _id, as the field to track.
Why _id? As a generated ObjectId, it is pretty much guaranteed to be unique and sortable. It also has an index by default. Click OK and we’ll be back at the export unit configuration. Now, something has changed:
This tells us we have configured incremental execution. But because this Export unit has not been run, there is no state for it to refer to.
Running An Export With Incremental Execution
If we run this unit, it will export the 1000 documents we specified in the query’s limit, and if we look over to the output directory, there will be one file, with 1000 documents in it.
If we look back at the Export unit’s configuration, we’ll notice another change:
This shows us the Export unit has retained state from the export run. When you run the Export unit next, it will start from the next _id after the recorded one. If we click Run again, a new export file of 1000 documents will appear and the state will update. As there are only 5000 documents in the collection, another three runs will exhaust the collection. With no more data available, the next Export run will write a blank file, as there are no new records. If you copy and paste some records into the collection, then run the export again, it’ll pick up on these new records and export just those.
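The sequence of runs described above can be mimicked with a small in-memory simulation. This is only a sketch of the idea, not Studio 3T’s implementation: the function name run_export is hypothetical, and plain integers stand in for ObjectId values.

```python
from typing import Any, Optional

def run_export(collection: list, state: Optional[int], limit: int = 1000):
    """Simulate one export run; return (batch, new_state)."""
    # Sort by the tracked field so documents are visited in a known order.
    ordered = sorted(collection, key=lambda doc: doc["_id"])
    # Skip everything at or before the saved state.
    if state is not None:
        ordered = [doc for doc in ordered if doc["_id"] > state]
    batch = ordered[:limit]
    # Keep the old state when nothing was exported.
    new_state: Any = batch[-1]["_id"] if batch else state
    return batch, new_state

# Assumption: 5000 documents with integer _id values standing in for ObjectIds.
collection = [{"_id": i} for i in range(1, 5001)]

state = None
batch_sizes = []
for _ in range(6):
    batch, state = run_export(collection, state)
    batch_sizes.append(len(batch))

# New documents arrive after the collection has been exhausted.
collection.extend({"_id": i} for i in range(5001, 5004))
new_batch, state = run_export(collection, state)
```

In this simulation the first five runs each export 1000 documents, the sixth exports nothing, and the run after the new documents arrive exports only those three.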
Underneath The Options
What’s happening is a transparent modification of the query. Incremental execution starts an export or migration by sorting the collection by the specified field. This sorting allows incremental execution to locate the “first” document and then work through the collection in a known order.
The first time the unit is run, it starts at the “first” document of the sorted collection. The next time it is run, there is a saved state, and it’s here that incremental execution transparently adds an and condition with a { $gt: value } clause to the query, where the value is the saved state. This allows the query to resume, skipping the previously processed documents and starting on the next batch.
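As a rough illustration of that query rewriting, here is a sketch that builds a MongoDB-style filter and sort specification as plain Python data. The exact query Studio 3T generates is not shown in this article, so the $and wrapping, the function name build_incremental_query and the sample filter on a status field are all assumptions.

```python
from typing import Any, Optional

def build_incremental_query(base_filter: dict, field: str, saved_state: Optional[Any]):
    """Return (filter, sort) for one incremental run."""
    # Always sort ascending on the tracked field so the order is known.
    sort = [(field, 1)]
    if saved_state is None:
        # First run: no state yet, so the user's query is used unchanged.
        return base_filter, sort
    # Later runs: keep the user's query and add a "greater than" clause.
    resume_clause = {field: {"$gt": saved_state}}
    return {"$and": [base_filter, resume_clause]}, sort

# Hypothetical user query; integers stand in for ObjectId values.
first_filter, first_sort = build_incremental_query({"status": "active"}, "_id", None)
next_filter, next_sort = build_incremental_query({"status": "active"}, "_id", 1000)
```

With a driver such as PyMongo, a filter and sort of this shape could be passed to find(), sort() and limit(1000) to fetch the next batch.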
Resetting The State
There are times when it may be necessary to wind the recorded state back to that of a previous run. The incremental execution options have you covered. If you bring the options up after a few runs of the export, you’ll notice that the Select a resume point option is available. The drop-down menu lists the last five runs, when they happened and what state they were in. Select one and, on the next run, the Export or Migration will start from that state. Or, if you want to start all over again, select Start from a fresh state. This will clear the state, and all the documents returned by the export unit’s query will be eligible for export.
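One way to picture the resume point list is as a bounded history of the most recent runs. The sketch below assumes that model; the class and method names are hypothetical, and how Studio 3T actually stores this history is not described here.

```python
from collections import deque
from dataclasses import dataclass
from datetime import datetime
from typing import Any, Optional

@dataclass(frozen=True)
class ResumePoint:
    finished_at: datetime  # when the run happened
    state: Any             # last value of the tracked field in that run

class ResumeHistory:
    def __init__(self, max_points: int = 5):
        # A deque with maxlen drops the oldest entry automatically.
        self._points = deque(maxlen=max_points)
        self.current: Optional[Any] = None

    def record(self, finished_at: datetime, state: Any) -> None:
        self._points.append(ResumePoint(finished_at, state))
        self.current = state

    def points(self) -> list:
        # Newest first, as a drop-down menu would likely list them.
        return list(reversed(self._points))

    def resume_from(self, point: ResumePoint) -> None:
        self.current = point.state

    def fresh_start(self) -> None:
        # Clearing the state makes every matching document eligible again.
        self.current = None

history = ResumeHistory()
for run in range(1, 8):  # seven runs, each advancing the state by 1000
    history.record(datetime(2024, 1, run), run * 1000)

kept_states = [p.state for p in history.points()]
history.resume_from(history.points()[2])
rewound_state = history.current
history.fresh_start()
```

After seven runs only the five most recent resume points remain, and choosing the third newest rewinds the state to the end of run five.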
Incremental Execution With Databases
While the example above uses files, to make it easy to see the results, Export and Migration tasks can also target databases and collections. Incremental execution works in exactly the same way with these targets. Note, however, that it does not take into account updated records in the source collection. Incremental execution’s design focuses on tracking the new records added to the database since the last time a task was run. Although a database may have updates to existing documents, incremental execution will not act as a source of updates.
Using Other Fields
In the examples, we’ve used the _id field because it is sortable, unique and used in every MongoDB collection. Incremental execution can use other fields, as long as they are sortable and unique and, ideally, immutable. If the field is mutable, that is, it can be changed, then the order of documents can change over time and documents previously exported could turn up again as eligible for export.
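The risk with a mutable field can be shown with a tiny simulation. This is an illustrative sketch only: the version field, the eligible helper and the sample documents are made up for the example.

```python
from typing import Any, Optional

def eligible(collection: list, field: str, state: Optional[Any]) -> list:
    """Documents a run would see: sorted by field, after the saved state."""
    ordered = sorted(collection, key=lambda doc: doc[field])
    return [doc for doc in ordered if state is None or doc[field] > state]

# Hypothetical documents tracked on a mutable "version" field.
docs = [
    {"_id": 1, "version": 10},
    {"_id": 2, "version": 20},
    {"_id": 3, "version": 30},
]

# First run exports everything and saves the highest value seen.
first_run = eligible(docs, "version", None)
first_ids = [doc["_id"] for doc in first_run]
state = first_run[-1]["version"]

# An already exported document has its tracked field changed.
docs[0]["version"] = 40

# The second run sees that document as new and exports it again.
second_run = eligible(docs, "version", state)
reexported_ids = [doc["_id"] for doc in second_run]
```

Here the document with _id 1 is exported in both runs, which is the duplicate behaviour an immutable tracking field avoids.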