In this post, we’ll show you the simplest way of writing, debugging, and running MongoDB map-reduce jobs using Studio 3T’s Map-Reduce screen.
MongoDB Map-Reduce vs Aggregation Pipeline
MongoDB’s Map-Reduce is the flexible cousin of the Aggregation Pipeline.
In general, it works by taking the data through two stages:
- a map stage that processes each document and emits zero or more key/value pairs for it
- a reduce stage that combines all the values emitted for the same key into a single result
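To make the two stages concrete, here is a minimal in-memory sketch of map-reduce semantics in plain JavaScript (runnable with Node.js). It is not MongoDB's implementation, and the runMapReduce helper and the sample data are hypothetical. It follows the mongo shell conventions that map() reads the current document through this and calls a global emit(key, value), and that reduce(key, values) receives the values emitted for one key. Like MongoDB, it skips reduce() for a key that has only one emitted value.

```javascript
// Hypothetical helper: runs map() and reduce() over an in-memory array of documents.
function runMapReduce(docs, map, reduce) {
  const groups = new Map(); // key -> array of emitted values

  // MongoDB exposes emit() as a global inside map(); this sketch does the same temporarily
  const previousEmit = globalThis.emit;
  globalThis.emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  try {
    // Map stage: each document is bound to `this`
    for (const doc of docs) map.call(doc);
  } finally {
    if (previousEmit === undefined) delete globalThis.emit;
    else globalThis.emit = previousEmit;
  }

  // Reduce stage: one result per key; single-value keys are passed through unreduced
  const results = [];
  for (const [key, values] of groups) {
    const value = values.length === 1 ? values[0] : reduce(key, values);
    results.push({ _id: key, value });
  }
  return results;
}

// Usage: count documents per color (hypothetical data)
const counts = runMapReduce(
  [{ color: "red" }, { color: "blue" }, { color: "red" }],
  function () { emit(this.color, 1); },
  function (key, values) { return values.reduce((a, b) => a + b, 0); }
);
console.log(JSON.stringify(counts));
// [{"_id":"red","value":2},{"_id":"blue","value":1}]
```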
The main advantage over the Aggregation Pipeline is that each Map-Reduce stage can run arbitrary JavaScript, which enables operations that are hard or impossible to express with pipeline stages. The trade-off is lower performance, which can mean noticeably longer execution times. You can read more about it in MongoDB’s reference documentation.
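A map-reduce example
In this example, our objective is to group images by tag, excluding any image that has the “work” tag. The images are stored in the images collection of the exam database, and each document looks like this:
{ "_id" : 592341, "tags" : [ "cats", "kittens", "travel" ] }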
To achieve this, we will write a Map-Reduce job that will:
- Exclude all images that include the “work” tag.
- Have the map() function emit, for each of an image’s tags, the tag as the key and the image id as the value.
- Have the reduce() function combine the image ids for each tag.
Let us start by opening Studio 3T’s new Map-Reduce screen, using the Open Map-Reduce option in the context menu.
Filtering the input data
Clicking the “Input data” tab and then the “Preview Input” toolbar button shows a preview of the collection data. This is where we shape the data fed into the Map-Reduce job, omitting any image tagged “work”. We do this with the following query:
{ "tags": { $ne: "work" } }
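To reason about which documents survive this filter, the predicate below is a rough plain-JavaScript approximation of { "tags": { $ne: "work" } }. It is a sketch, not MongoDB’s query matcher: it assumes tags is missing or holds strings and ignores other BSON comparison rules. On an array field, $ne matches only when no element equals the value, and a document without a tags field also matches. The passesTagFilter name and the second sample document are hypothetical.

```javascript
// Rough approximation of { "tags": { $ne: "work" } } for string tags (not MongoDB's matcher)
function passesTagFilter(doc, excluded = "work") {
  const tags = doc.tags;
  if (tags === undefined) return true; // missing field: $ne matches
  if (Array.isArray(tags)) return !tags.includes(excluded); // array: no element may equal the value
  return tags !== excluded; // scalar: plain inequality
}

const images = [
  { _id: 592341, tags: ["cats", "kittens", "travel"] },
  { _id: 600001, tags: ["work", "travel"] }, // hypothetical image that should be excluded
];
console.log(JSON.stringify(images.filter((doc) => passesTagFilter(doc)).map((doc) => doc._id)));
// [592341]
```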
We can inspect the data that will be fed into the map function by clicking the “Preview Output” toolbar button.
map() function
For the second step, we move to the “map()” tab.
In this tab we specify the function responsible for emitting one or more key/value pairs for each document. The following function gets the job done:
function () {
    // emit one (tag, image _id) pair for every entry in this image's tags array
    for (var index in this.tags) {
        emit(this.tags[index], this._id);
    }
}
We can sample the map() function’s output by clicking the preview button, to verify that the function behaves as expected. The preview feature is extremely useful, especially before submitting jobs that could take hours to run. The “map() sample output” tab gives a detailed breakdown of how our map() function operates, showing the emitted key/value pairs together with the _id of the document that produced them.
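Outside Studio 3T, the same check can be sketched in plain JavaScript: call the map() function with the sample document bound to this and record what it emits. The sampleMap helper and the local emit stand-in are hypothetical; in a real job, MongoDB provides emit() itself.

```javascript
// Hypothetical helper: runs this post's map() on one document and collects its emits
function sampleMap(doc) {
  const emitted = [];
  // Local stand-in for MongoDB's emit(); the map function below closes over it
  const emit = (key, value) => emitted.push({ key, value });

  // The map() function from this post, unchanged
  const map = function () {
    for (var index in this.tags) {
      emit(this.tags[index], this._id);
    }
  };

  map.call(doc); // MongoDB binds the current document to `this`
  return emitted;
}

const pairs = sampleMap({ _id: 592341, tags: ["cats", "kittens", "travel"] });
console.log(JSON.stringify(pairs));
// [{"key":"cats","value":592341},{"key":"kittens","value":592341},{"key":"travel","value":592341}]
```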
reduce() function
Studio 3T’s default implementation of the reduce() function takes care of the rest:
function (key, values) {
    // "" + values turns the array of image ids into a comma-separated string
    var reducedValue = "" + values;
    return reducedValue;
}
Again, the Preview Output toolbar button lets us verify that the function behaves as expected. If we were writing a more complex reduce() function, or trying to debug what was being fed into it, we could sample its input by clicking the Preview Input button. This shows a few of the key/value pairs that are emitted and then reduced.
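MongoDB may call reduce() more than once for the same key, feeding an earlier reduced result back in alongside new values. The sketch below, using hypothetical image ids, checks that the default reduce() produces the same comma-separated list whether it runs in one pass or in two. MongoDB’s documentation also asks reduce() to return the same type as the emitted values; this function returns a string for numeric ids, which suits the readable output here but is worth keeping in mind for more complex jobs.

```javascript
// The default reduce() from this post: joins the values into a comma-separated string
const reduce = function (key, values) {
  var reducedValue = "" + values;
  return reducedValue;
};

// Hypothetical image ids emitted for the key "cats"
const ids = [592341, 600001, 600002];

// Single pass over all values
const onePass = reduce("cats", ids);

// Two passes: reduce a partial batch, then reduce that result together with the rest
const partial = reduce("cats", ids.slice(0, 2));
const twoPass = reduce("cats", [partial, ids[2]]);

console.log(onePass); // 592341,600001,600002
console.log(onePass === twoPass); // true
```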
finalize() function
MongoDB allows an optional final stage in a Map-Reduce job, the finalize() function, which does some last processing on each reduced value. Let’s use it just to make the output easier to read:
function (key, reducedValue) {
    var finalValue = "tag '" + key + "' was found in images: " + reducedValue;
    return finalValue;
}
After a quick look at the finalize() sample output, we are ready to submit a job that processes all of the data.
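Because MongoDB skips reduce() for a key that was emitted only once, finalize() can receive either the string built by reduce() or a single raw image id. A quick plain-JavaScript check with hypothetical ids (not a MongoDB run) shows that the string concatenation in this finalize() handles both cases:

```javascript
// The finalize() function from this post
const finalize = function (key, reducedValue) {
  var finalValue = "tag '" + key + "' was found in images: " + reducedValue;
  return finalValue;
};

// Tag found in several images: reducedValue is the string built by reduce()
console.log(finalize("travel", "592341,600001"));
// tag 'travel' was found in images: 592341,600001

// Tag found in one image only: reduce() is skipped, so the raw numeric id arrives
console.log(finalize("kittens", 592341));
// tag 'kittens' was found in images: 592341
```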
Running the map-reduce job
Now that we have set all the parameters of the job and checked that all our functions run as intended, we can submit the Map-Reduce job against the whole collection by clicking the “Execute” button on the toolbar.
This opens a new tab that shows the results of the job once it finishes.
Clicking Show details brings up a dialog showing execution statistics as well as a configuration summary for this job.
Epilogue
Now that the Map-Reduce job is finished, we can save all this work as a script. The script is plain JavaScript, so the saved file can also be run in IntelliShell or in the basic mongo shell, producing identical results.
// *** 3T Software Labs, MongoChef: MapReduce Job ****

// Variable for db
var __3t_mongochef_db = "exam";

// Variable for map
var __3t_mongochef_map = function () {
    for (var index in this.tags) {
        emit ( this.tags[index] , this._id );
    }
};

// Variable for reduce
var __3t_mongochef_reduce = function (key, values) {
    var reducedValue = "" + values;
    return reducedValue;
};

// Variable for finalize
var __3t_mongochef_finalize = function (key, reducedValue) {
    var finalValue = "tag '" + key + "' was found in images: " + reducedValue;
    return finalValue;
};

db.runCommand({
    mapReduce: "images",
    map: __3t_mongochef_map,
    reduce: __3t_mongochef_reduce,
    finalize: __3t_mongochef_finalize,
    out: { "inline" : 1},
    query: { "tags": { $ne: "work" } },
    sort: { },
    inputDB: "exam",
});
Do you have an existing script that you’ve already been working with? No problem: Studio 3T can load it into the Map-Reduce screen. Just click the “Open Map-Reduce File” toolbar button, select the file, and there you have it!
Once you’re done running MongoDB map-reduce jobs, why not keep the momentum going by learning how to build MongoDB aggregation queries and discovering IntelliShell, our MongoDB shell integration?