What is $setWindowFields
? Well, $setWindowFields
is a new Aggregation stage in MongoDB 5.0. In the most recent release of Studio 3T (2022.2), we added support for it in the Aggregation Editor. Why is it so neat? What it does is completely change what you can do with aggregations, literally opening a window on your documents for aggregated calculations.
The problem with aggregation
Let’s start with classic Aggregation. That takes each document matched and feeds it into a pipeline for processing. Each stage of the pipeline takes each document and looks at it on its own, and then produces a new document or documents. It’s pretty much contextless.
That’s why the stage $group
is so important: It’s one of the few occasions where documents are gathered together and evaluated as a set of related things. But even with $group, you have to think in terms of accumulators which perform an operation per document in the group. If you want to compare a document with the previous one in the group, you’re out of luck.
A window on your documents
Enter $setWindowFields
. It’s a kind of super group except it does not boil the data down into accumulated values and leave only that data in the pipeline’s documents. Instead, it adds the ongoing calculations to the documents passing through the pipeline. That’s why it’s a “set” command. How is this useful?
Consider, say, a set of temperature sensors where each room’s sensor is read hourly:
{
"_id" : "621ddee2892dfb1bfd714502",
"room" : "Bedroom1",
"when" : "2022-01-01T01:06:00.000+0000",
"temp" : 38
}
{
"_id" : "621ddee2892dfb1bfd714503",
"room" : "Bedroom1",
"when" : "2022-01-01T02:06:00.000+0000",
"temp" : 36
}
{
"_id" : "621ddee2892dfb1bfd714504",
"room" : "Bedroom1",
"when" : "2022-01-01T03:06:00.000+0000",
"temp" : 35
}
{
"_id" : "621ddee2892dfb1bfd714505",
"room" : "Bedroom1",
"when" : "2022-01-01T04:06:00.000+0000",
"temp" : 31
}
Building a $setWindowFields stage
Now we want to see, for any temperature whether it was above or below that day’s average. The question is then, what was the day’s average. Let’s set up a $setWindowFields stage for that with these settings:
{
"partitionBy" : { room:"$room", day: { $dayOfYear: "$when" } },
"output" : {
"averaged" : {
"$avg" : "$temp",
"window" : {
documents: [ "unbounded", "unbounded" ]
}
}
}
}
There’s a lot to parse here, so let’s take it line by line:
"partitionBy":{ room:"$room", day:{ $dayOfYear:"$when"}},
This says using the composite object of the room and the day of the year, split up the data. So each set of documents being handled is for a specific room on a specific day.
"output":{
In the output section, we describe the fields we’re going to add and how we are going to calculate those fields.
"averaged":{
The first field (and in this case only field) we create is called averaged.
"$avg" : "$temp",
And that will be calculated by taking the average of the temperature field. Here’s where things get interesting, because you may be wondering, which document’s temperature fields. Well, earlier we partitioned the documents by room and day, so we want to take all the documents in the partition. We tell $setWindowFields that the window we want to work with is all the documents in the partition.
"window" : {
documents: [ "unbounded", "unbounded" ]
}
And that’s it. If we run this stage we will see that the documents in the pipeline now have an averaged field with that rooms average temperature that day:
{
"_id" : "621ddee2892dfb1bfd714502",
"room" : "Bedroom1",
"when" : "2022-01-01T01:06:00.000+0000",
"temp" : 38,
"averaged" : 24.26086956521739
}
{
"_id" : "621ddee2892dfb1bfd714503",
"room" : "Bedroom1",
"when" : "2022-01-01T02:06:00.000+0000",
"temp" : 36,
"averaged" : 24.26086956521739
}
{
"_id" : "621ddee2892dfb1bfd714504",
"room" : "Bedroom1",
"when" : "2022-01-01T03:06:00.000+0000",
"temp" : 35,
"averaged" : 24.26086956521739
}
{
"_id" : "621ddee2892dfb1bfd714505",
"room" : "Bedroom1",
"when" : "2022-01-01T04:06:00.000+0000",
"temp" : 31,
"averaged" : 24.26086956521739
}
Follow that up with a $match
stage that compares the two fields, temp and averaged:
{
"$expr" : {
"$gte" : [
"$temp",
"$averaged"
]
}
}
And the result is a list of all the times any room’s temperature was over that room’s average for that day.
Sliding the window open
But say we don’t want to use the average for the day and we want to have a sliding window of the last 5 hours and next 5 hours as the average. Here’s another $setWindowFields
stage that does just that:
{
partitionBy : "$room",
sortBy : {
when : 1,
},
output : {
averaged : {
$avg : "$temp",
window : {
range : [
-5,
5
],
unit : "hour"
}
}
}
}
This time we create the partitions by room, we don’t need to partition by day. We sort that data by when the reading was taken. That sortBy
section is new and is important as it provides the raw data for the window
. The output section is basically the same, apart from the window
.
window : {
range : [
-5,
5
],
unit : "hour"
}
This time we want to take a relative range of hours; negative is hours before, positive is hours after. How do we know this refers to hours? The unit
field is mandatory where we are working with a time in the sortBy
section. So, setting the unit
to "hours"
means that now the range is -5 hours to +5 hours. Documents with a when
value within this range will be used to calculate the rolling average temperature.
You can use this technique, with sorted times and a range of time units to move a flexible time-based window over your documents.
Documents by Relativity
In our first example, we used the window documents setting, but we set it to unbounded to get whole days of data into our calculation. But you can be more precise, and relative, with that setting. Take this example:
{
partitionBy : "$room",
sortBy : {
when : 1,
},
output : {
coldtemp : {
$min : "$temp",
window : {
documents : [
-1,
0
]
}
}
}
}
Here we have a window
documents
set to [ -1, 0 ]. This is a rolling window which includes the preceding document and the current document. In this case we are looking for the lowest temperature between the two, using $min
, and saving that as the coldtemp
field. Another $match stage can now let us work out which hours saw the temperature rise:
$match {
$expr: { $gt: [ "$temp", "$coldtemp" ] }
}
And we’ve just compared values between two documents, in order and it all runs on the database server.
Going Further with $setWindowFields
We’ve only scratched the surface of $setWindowFields
here. There’s a world of operators that can be applied to the windowed document fields, from ranking them to calculating exponential moving averages, derivatives, integrals, covariance and standard deviation. You can learn more about that in the MongoDB $setWindowFields
documentation.
With $setWindowFields, your windows can span relative documents or relative times, pulling together sequences of data you previously had to write code to extract and process. It’s a great addition to MongoDB’s assembly of stages, and it’s now available in Studio 3T’s aggregation editor – if you are running MongoDB 5.0 or later as it’s only displayed as an option where your database is capable.