Q: I don’t get unwind in aggregation – why do I need it?
Welcome to the latest Ask Manatees Anything – #Studio3T_AMA. Want to know how to use Studio 3T to make your MongoDB life better? Just tweet and use #Studio3T_AMA or email ama@studio3t.com.
A: Aggregation is all about documents moving from stage to stage on a virtual production pipeline which whittles the input down from a mass of documents to a completely transformed set of data. The Studio 3T Aggregation Editor makes this easier by making the pipeline more manageable.
But it gets all a bit counter-intuitive when you meet the unwind
stage. This stage takes a document and turns it into multiple copies of almost the same document. Why, though?
Unwind Basics
Well, unwind
is all about arrays. A document in the pipeline with an array per field is pretty much un-processable; you can’t tell the pipeline to stop and have a dig around in the array, which could be any size. Let’s imagine a document:
{
"range" : "Cheese",
"supplier" : "Cheese Shop",
"types" : [
{
"name" : "Red Leicester",
"stock" : 5.0
},
{
"name" : "Tilsit",
"stock" : 2.0
},
{
"name" : "Caerphilly",
"stock" : 10.0
},
{
"name" : "Bel Paese",
"stock" : 1.0
},
{
"name" : "Red Windsor",
"stock" : 5.0
},
{
"name" : "Stilton",
"stock" : 10.0
}
],
"_id" : ObjectId("61715908eda5d43b15939c09")
}
We have a product range, supplier, and an array of types with stock levels. Aggregation doesn’t know how to stop and dig through an array like that, so we want to break it down into smaller aggregation friendly documents. And that’s what unwind
does. You point unwind at an array in a document and it generates a document for each entry in the array. The resulting documents are simply the original document with the array replaced by each entry. Let’s unwind our cheesy example with a unwind
aggregation stage that is just this:
{
path: "$types"
}
So, unwind
‘s path
parameter is the array we want to work with. If we feed in our document above, we get these documents out:
{
"_id" : ObjectId("61715908eda5d43b15939c09"),
"range" : "Cheese",
"supplier" : "Cheese Shop",
"types" : {
"name" : "Red Leicester",
"stock" : 5.0
}
}
{
"_id" : ObjectId("61715908eda5d43b15939c09"),
"range" : "Cheese",
"supplier" : "Cheese Shop",
"types" : {
"name" : "Tilsit",
"stock" : 2.0
}
}
{
"_id" : ObjectId("61715908eda5d43b15939c09"),
"range" : "Cheese",
"supplier" : "Cheese Shop",
"types" : {
"name" : "Caerphilly",
"stock" : 10.0
}
}
{
"_id" : ObjectId("61715908eda5d43b15939c09"),
"range" : "Cheese",
"supplier" : "Cheese Shop",
"types" : {
"name" : "Bel Paese",
"stock" : 1.0
}
}
{
"_id" : ObjectId("61715908eda5d43b15939c09"),
"range" : "Cheese",
"supplier" : "Cheese Shop",
"types" : {
"name" : "Red Windsor",
"stock" : 5.0
}
}
{
"_id" : ObjectId("61715908eda5d43b15939c09"),
"range" : "Cheese",
"supplier" : "Cheese Shop",
"types" : {
"name" : "Stilton",
"stock" : 10.0
}
}
Each type of cheese now has its own document. And now we can bring aggregations strengths to the problems. Say we want the total stock of the “Cheese” range (and any other range). We can have another aggregation stage which grouped up the documents by range and then summed up the stock numbers. That’ll be a group
stage with these settings:
{
_id: "$range",
totalStock: { $sum: "$types.stock" }
}
Which produces:
{
"_id" : "Cheese",
"totalStock": 33.0
}
So, by unwinding an array to create multiple documents, it makes it much easier for aggregation to do its thing, aggregating values. This all takes place in the aggregation pipeline, and (usually) not on disk so creating lots of small documents isn’t an issue.
Think of unwind
like an iron, smoothing out lumpy arrays into easier-to-process documents.
Preserving Array Indexes Through Unwind
That’s the essential part of unwind
. There is, of course, more to it. Arrays have an order and unwinding erases that order from the date. To preserve that information, the unwind
stage has a parameter includeArrayIndex
. This lets you specify the name of a field which will be added with the array index number, to the generated document. Practically, if our unwind
stage looks like this:
{
path: "$types",
includeArrayIndex: "typeIndex"
}
Then the resulting documents will include a typeIndex
field:
{
"_id" : ObjectId("61715908eda5d43b15939c09"),
"range" : "Cheese",
"supplier" : "Cheese Shop",
"types" : {
"name" : "Red Leicester",
"stock" : 5.0
},
"typeIndex" : NumberLong(0)
}
{
"_id" : ObjectId("61715908eda5d43b15939c09"),
"range" : "Cheese",
"supplier" : "Cheese Shop",
"types" : {
"name" : "Tilsit",
"stock" : 2.0
},
"typeIndex" : NumberLong(1)
}
{
"_id" : ObjectId("61715908eda5d43b15939c09"),
"range" : "Cheese",
"supplier" : "Cheese Shop",
"types" : {
"name" : "Caerphilly",
"stock" : 10.0
},
"typeIndex" : NumberLong(2)
}
…
What’s NULL got to do with it?
There’s one thing we haven’t mentioned: what happens if there’s no field that matches the path or the path exists, but has a null value? Or if there’s an empty array present? The default answer is that no document is generated at all, because there’s nothing to work with.
And that’s why there’s preserveNullAndEmptyArrays
as a third optional parameter for unwind
. Set this to true and unwind
will generate a document, with a null value where the array value would be, and a null where the index would be.
This avoids losing documents completely but you have to make sure the rest of your aggregation allows for these null values.
Be Kind, unwind
As you can see, unwind
is a great way to pull arrays out from within your documents so aggregation can work with the data inside them. Paired with group
, it offers the ability to radically reshape your data so you can extract new insight from it.