In this exercise, you’ll add the second stage to the pipeline. This stage is based on the $group aggregate operator, which lets you group the documents in the pipeline based on a specific field.
To group the documents in the aggregation pipeline
- On the IntelliShell tab, ensure that the aggregate statement you created in the previous exercise is still entered at the command prompt.
- After the closing curly brace for the first stage, type a comma and insert a new line.
- On the new line after the first stage, add the following stage to the pipeline:
{ "$group": { "_id": "$address.state", "total": { "$sum": "$transactions" } } }
This stage uses the $group operator to group the documents by the address.state field and calculate the total number of transactions for each state. The process is similar to the example you saw in the introduction, except that the data is being grouped by state rather than by city.
You’re not limited to the $sum accumulator operator for your calculations when working with the $group operator. For example, you might use the $avg operator to find the average number of transactions per state or the $max operator to return the highest number of transactions per state.
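As a rough illustration of those alternatives, the sketch below defines a $group stage that uses $avg and $max in place of $sum. The output field names avgTransactions and maxTransactions are assumptions made up for this example and are not part of the course.

```javascript
// Sketch of an alternative second stage. The output field names
// avgTransactions and maxTransactions are illustrative only.
const groupStage = {
  "$group": {
    "_id": "$address.state",                         // group key: one output document per state
    "avgTransactions": { "$avg": "$transactions" },  // average transactions per state
    "maxTransactions": { "$max": "$transactions" }   // highest transactions value per state
  }
};

// In the shell, this stage would take the place of the $sum-based stage:
// db.customers.aggregate( [ { "$match": { "dob": { "$lt": ISODate("1970-01-01T00:00:00.000Z") } } }, groupStage ] );
```

A single $group stage can hold several accumulators at once, so the two calculations do not need separate stages.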
For now, however, we’ll stick with the $sum operator. With the second stage added, the aggregate statement should look like the following code:
db.customers.aggregate(
  [
    { "$match": { "dob": { "$lt": ISODate("1970-01-01T00:00:00.000Z") } } },
    { "$group": { "_id": "$address.state", "total": { "$sum": "$transactions" } } }
  ]
);
At this point, it’s a good idea to step back and look at what’s happening so far in your aggregation pipeline to ensure that you understand the workflow and that it’s doing what you expect:
- The first stage in the pipeline starts with all the documents in the customers collection and keeps only those whose dob value is earlier than January 1, 1970.
- The second stage starts with the filtered data set and processes it further, grouping the documents by state and providing the total number of transactions for each state.
The important point to take from all this is that each stage you add works only on the results of the previous stage. The aggregation pipeline defines a linear process that moves in one direction.
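To make the stage-by-stage flow concrete, here is a minimal plain-JavaScript sketch that imitates the two stages on a small in-memory array. It is not how MongoDB executes a pipeline, and the sample documents are hypothetical; it only shows that the grouping step sees nothing except the output of the filtering step.

```javascript
// Hypothetical sample documents; the real customers collection has more fields.
const customers = [
  { dob: new Date("1965-03-12T00:00:00Z"), address: { state: "Ohio" },  transactions: 10 },
  { dob: new Date("1982-07-01T00:00:00Z"), address: { state: "Ohio" },  transactions: 99 },
  { dob: new Date("1958-11-30T00:00:00Z"), address: { state: "Ohio" },  transactions: 5 },
  { dob: new Date("1969-12-31T00:00:00Z"), address: { state: "Texas" }, transactions: 7 }
];

// Stage 1 (imitates $match): keep only documents with dob before 1970.
const cutoff = new Date("1970-01-01T00:00:00.000Z");
const matched = customers.filter((doc) => doc.dob < cutoff);

// Stage 2 (imitates $group with $sum): total transactions per state,
// computed from the output of stage 1 only.
const totals = new Map();
for (const doc of matched) {
  const state = doc.address.state;
  totals.set(state, (totals.get(state) || 0) + doc.transactions);
}
const grouped = Array.from(totals, ([state, total]) => ({ _id: state, total }));

console.log(grouped);
```

The Ohio customer born in 1982 never reaches the grouping step, so the 99 transactions on that document do not count toward the Ohio total.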
- On the IntelliShell toolbar, click the Execute button. Studio 3T runs the aggregate statement and displays the results in the lower pane, as shown in the following figure.
The statement should now return only 45 rows and include only two fields: _id and total. The _id field holds the value that the documents were grouped on, in this case the state; $group assigns this name by default. In a later section in this course, you’ll learn how to change the field’s name to something more intuitive. The order in which the states appear can vary because no sorting has been applied to the aggregated data. To fix that, we move on to the next exercise.