When building an aggregate
statement, you might need to work with one or more array fields and their individual elements. To help with this process, MongoDB offers a variety of expression operators that you can use as you refine your statement’s aggregation pipeline. Some of these operators are specific to arrays, and some can be used with multiple field types, although they can still be helpful when working with arrays.
In this section, you’ll learn how to use several of these operators to build an aggregate statement based on the customers
collection in the sales
database. The statement’s aggregation pipeline will include four stages. The first stage will filter the collection’s documents based on the values (elements) in the interests
array field. The second stage will deconstruct the array’s values so you can work with them individually. The third stage will group the array values and count the number of documents in each group. And the fourth stage will save the pipeline’s output to a new collection in a separate database.
By the end of this section, you will learn how to
- Use expression operators to filter input documents
- Unwind an array to create individual documents
- Group array values and generate a document count for each group
- Write pipeline results to a new collection
What you will need
- Access to a MongoDB Atlas cluster
- Access to the
customers
collection in thesales
database
Applying operators to an array
MongoDB provides several aggregate operators that can be useful when working with arrays in your aggregate
statements. In some cases, the operators are specific to array fields. In other cases, they can be used with multiple field types, although they can still benefit arrays, whether directly or indirectly. Three of these aggregate operators are $unwind
, $sortByCount
, and $merge
.
$unwind
The $unwind
operator lets you deconstruct the values in an array field. The operator outputs a document for each array element, replacing the array field with the individual array value. By default, the operator does not output a document if the array field value is null, missing, or contains an empty array, but you can override this behavior.
The following syntax shows the arguments (fields) available to the $unwind operator:
$unwind: { path: field, includeArrayIndex: string, preserveNullAndEmptyArrays: Boolean }
The operator takes three arguments, which you pass into the operator as a document with three fields:
- The
path
argument (mandatory) specifies the field path for the target array in the input documents. - The
includeArrayIndex
argument (optional) is the name of a new field for the element’s array index. - The
preserveNullAndEmptyArrays
argument (optional) determines whether a document is included in the output when the target array is null, missing, or contains an empty array.
At its simplest, the $unwind
operator requires only the path
field argument and its value, as shown in the following example:
db.customers.aggregate([ { $unwind: { path: "$interests" } } ])
In this statement, the $unwind
operator deconstructs the values in the interests array field. The field name must be enclosed in quotes and preceded with a dollar sign. A dollar sign must precede the field name whenever it can be interpreted as an expression.
When you run the $unwind
operator, MongoDB outputs a document for each element in the array, repeating this process for each input document. For example, if an input document includes the elements Technology
and Software
in the interests array, the $unwind
operator will generate two output documents to replace the source document. The new documents will each include a string field named interests
. The field’s value in the first document will be Technology
, and the field’s value in the second document will be Software
.
$sortByCount
After you deconstruct the array, you can use the $sortByCount
operator to group the documents by the array’s elements and to calculate the number of documents in each group. The operator also sorts the output data in descending order, based on the document counts. You can think of the $sortByCount
operator as a shortcut to creating a $group
stage and a $sort
stage.The syntax for the $sortByCount
operator is fairly straightforward:
$sortByCount: "expression"
The operator’s expression can be any expression except a document literal. In fact, the expression can be as simple as a single document field, as shown in the following example:
db.customers.aggregate([ { $unwind: { path: "$interests" } }, { $sortByCount: "$interests" } ])
In this case, the $sortByCount
operator will group the documents by the interests
field values, after they’re deconstructed by the $unwind
operator. The output will include two fields, _id
and count
, as shown in the following figure. Notice that the data is sorted in descending order, based on the count
values.
$merge
Once you’ve set up your pipeline to return the data you need, you can add a final stage based on the $merge
operator, which writes the pipeline’s results to a specified collection, either in the current database or a different database. The $merge
stage must be the last one in the pipeline.
The following syntax shows the components that make up the $merge operator:
$merge: { into: collection|{ db: database, coll: collection }, on: identifier field [, ...], whenMatched: replace|keepExisting|merge|fail|pipeline, let: variable: expression [, ...], whenNotMatched: insert|discard|fail }
The $merge
operator takes five arguments, which you pass into the operator as a document with five fields:
- The
into
argument (mandatory) specifies the target collection for the pipeline’s output. If it’s in the same database as the source collection, you need only specify the collection name. If the collection is in a different database, you must provide both the database and collection names, using the field names db and coll. MongoDB will create the collection if it doesn’t exist. - The
on
argument (optional) defines one or more fields to serve as a unique identifier for determining when a pipeline document matches an existing document in the target collection. - The
whenMatched
argument (optional) determines what action to take if a pipeline document matches an existing document in the target collection. - The
let
argument (optional) defines one or more variables that can be used by thewhenMatched
argument. - The
whenNotMatched
argument (optional) determines what action to take if a pipeline document does not match an existing document in the target collection.
To continue with the previous example, you can use the $merge
operator to save the output from the $sortByCount
stage to a collection in a different database, as shown in the following example:
db.customers.aggregate([ { $unwind: { path: "$interests" } }, { $sortByCount: "$interests" }, { $merge: { into: { db: "marketing", coll: "clients" }, whenMatched: "keepExisting" } } ])
The $merge
operator in this example saves the pipeline output to the clients
collection in the marketing database. The operator also includes the whenMatched
option, along with the keepExisting
value. If a document already exists in the target collection, MongoDB will retain that document rather than replace it with one from the pipeline output.
In addition to aggregate operators, you can use other types of operators when working with documents in the pipeline. For example, you can use the $or
and the $and
Boolean expression operators to set up specific conditions based on array values, as shown in the following example:
db.customers.aggregate([ { $match: { $and: [ { interests: { $in: [ "Technology" ] } }, { $or: [ { package: "Standard" }, { prior_support: true } ] } ] } } ])
The statement uses the $match
aggregate operator to filter the documents in the customers collection. For a document to be returned, the interests array must include the term Technology
and it must meet one of two other conditions: the package
field value must be Standard or the prior_support
field value must be true
.
To implement this logic, the operator’s expression starts with the $and
operator, which defines two conditions. Both conditions must evaluate to true
for a document to be included in the stage’s output. The first condition uses the $in
operator to verify whether the interests
array includes the term Technology
. If the term exists, the condition will evaluate to true. The $in
operator is used specifically to determine whether an array contains a certain value.
The second $and
condition starts with the $or
operator, which defines its own two conditions. The first of these checks whether the package
field equals Standard
, and the second one checks whether the prior_support
field equals true
. If either of these conditions evaluate to true, the second $and
condition evaluates to true
. If both the $and
conditions evaluate to true
, the document will be included in the stage’s output.