Skip to content
Studio 3T - The professional GUI, IDE and client for MongoDB
  • Tools
    • Aggregation Editor
    • IntelliShell
    • Visual Query Builder
    • Export Wizard
    • Import Wizard
    • Query Code
    • SQL Query
    • Connect
    • Schema Explorer
    • Compare
    • SQL ⇔ MongoDB Migration
    • Data Masking
    • Task Scheduler
    • Reschema
    • More Tools and Features
  • Solutions
  • Resources
    • Knowledge Base
    • MongoDB Tutorials & Courses
    • Tool/Feature Documentation
    • Blog
    • Community
    • Testimonials
    • Whitepapers
    • Reports
  • Contact us
    • Contact
    • Sales Support
    • Feedback and Support
    • Careers
    • About Us
  • Store
    • Buy Now
    • Preferred Resellers
    • Team Pricing
  • Download
  • My 3T
search

Studio 3T Knowledge Base

  • Documentation
  • Tutorials
  • Workshops
Take the fastest route to learning MongoDB. Cover the basics in two hours with MongoDB 101, no registration required.
Start the free course

Extending MongoDB Aggregation with the Bucket and Facet Stages

Posted on: 03/09/2018 (last updated: 04/08/2021) by Phil Factor

With any type of database, you will sometimes face a problem when you are reporting aggregations.

Maybe you are reporting on trading performance and want to break it down a number of different ways to spot trends or anomalies. You have to do some preparation work on the data and then run several reports, or aggregations on that intermediate collection or table that you’ve prepared.

With SQL, you have to prepare a temporary table and then do your repeated aggregation GROUP BY queries on that table. Sometimes you can do it with a virtual table in a Common Table Expression. Occasionally you just shrug and process the preparation work every time, using maybe a table-valued function or view.

In a previous article, I’ve shown how to make a temporary collection for this purpose in MongoDB, but there is a much neater and faster way of doing it. It is the $facet stage of an aggregation.

The $facet stage

To understand how it works, and how it can be so useful, it is far easier to try things out on some very simple real data.

In a previous article, How to Clean Up MongoDB Data Using Aggregation and Regex, I created such a practice database. I attached the database to that article so you can play along. The database just holds the rainfall data for Southeast England from 1931. Here is a sample of the individual documents. I showed some of the basic aggregations that one could do.

The MongoDB stage $facet lets you run several independent pipelines within the stage of a pipeline, all using the same data. This means that you can run several aggregations with the same preliminary stages, and successive stages.

It is called ‘facet’ because it is also used in commerce to guide the analysis of sales by a range of factors or ‘facets’.

The commonest time you need to do something like this with a database is when you need to have column and row totals and grand totals. They are really separate calculations, but you need them all in one collection, and you’d expect to achieve it in one calculation.

The advantage of this stage is that input documents are passed to the $facet stage only once.

Here we not only do the calculations, but do it for the three different pivot tables, for rainfall totals, average rainfall and max rainfall.

db.getCollection("DailyRainfall").aggregate(

    // Pipeline
    [
        // Stage 1
        {
            $addFields: {
                "Month" : {
                    "$month" : "$Date"
                }, 
                "Year" : {
                    "$year" : "$Date"
                }, 
                "MonthYear" : {
                    "$dateToString" : {
                        "format" : "%m:%Y", 
                        "date" : "$Date"
                    }
                }
            }
        },

        // Stage 2
        {
            $facet: {// grand totals for the entire series
                 "Totals": [
                 {
                        "$group" : {
                            "_id" : null, 
                            "averageRain" : {
                                "$avg" : "$Rainfall"
                            }, 
                            "maxRain" : {
                                "$max" : "$Rainfall"
                            }, 
                            "totalRain" : {
                                "$sum" : "$Rainfall"
                            }
                        }
                 }
                 ],
            // column totals getting aggregation fo each month   
                 "Monthly": [ 
                 {
                        "$group" : {
                            "_id" : "$Month", 
                            "averageRain" : {
                                "$avg" : "$Rainfall"
                            },"Rainfall" : {
                                "$sum" : "$Rainfall"
                            },  
                            "maxRain" : {
                                "$max" : "$Rainfall"
                            }
                        }
                 },{"$sort" : {"_id":1}}
                 ],
            // get the aggreegation for Rain by year (row total)    
                 "Yearly": [ 
                 {
                        "$group" : {
                            "_id" : "$Year", 
                            "averageRain" : {
                                "$avg" : "$Rainfall"
                            },"Rainfall" : {
                                "$sum" : "$Rainfall"
                            },  
                            "maxRain" : {
                                "$max" : "$Rainfall"
                            }
                        }
                 },{"$sort" : {"_id":1}}
                 ],
            // Cell level data    
                 "MonthYear": [ 
                 {
                        "$group" : {
                            "_id" : "$MonthYear", 
                            "averageRain" : {
                                "$avg" : "$Rainfall"
                            },"Rainfall" : {
                                "$sum" : "$Rainfall"
                            },  
                            "maxRain" : {
                                "$max" : "$Rainfall"
                            }
                        }
                 },{"$sort" : {"_id":1}}
                 ]
                  
            }
        },

    ]

This will seem odd at first glance. We seem to have four separate group stages within a single stage. Actually, they are pipelines, as we’ve shown by adding a $sort substage.

The first stage did the preparatory work to allow us to do simple groupings and we then forked into four different pipelines that processed the same input set of documents.

In the single output document, each sub-pipeline has its own field where its results are stored as an array of documents. We could have ended up with a single pipeline that tidied it all up and presented it in a more human-oriented way. This $facet stage is powerful.

The $bucket stage

The $bucket aggregation can provide some features that are hard to do any other way. The term ‘bucket’ is from histogram theory, and the $bucket is ideal for doing frequency distributions.

Here is a typical bucket histogram used on our rainfall data:

db.getCollection("DailyRainfall").aggregate(

    // Pipeline
    [
        // Stage 1
        {
            $bucket: {
                groupBy: "$Rainfall", // group by daily rainfall
                boundaries: [ 0,5,10,15,20,25,30,35,40,45 ],
                default: "Above 45", // just in case!
                output: { count: { $sum: 1 } } // the number in each category
            }
        },

    ]

);

You can see that there were 27481 days with under 5mm of rain, 2879 days where it rained lightly between 5 and 10 mm, 943 days with between 10 and 15 mm of rain, and so on to 4 days with over 40 mm of rain.

There is a variant of the $bucket called $bucketAuto where MongoDB attempts to change the size of the buckets to fit the distribution of the values.

This is easiest demonstrated:

This tells you that it rains roughly two days out of three in south-eastern Britain, but for two days out of three you might miss it unless you were staring out of the window a lot. In fact it is one day in three that it rains in southeast Britain as we can quickly prove:

We can use the MongoDB facet and bucket stages together. In fact you can use almost all the pipeline stages with the $facet stage within a sub-pipeline. Here we do three analyses, including our precious example, that are tricky to do any other way.

db.getCollection("DailyRainfall").aggregate(

    // Pipeline
    [
        // Stage 1
        {
            $addFields: {
               Month: { $month: "$Date" },Year: { $year: "$Date" }
            }
        },

        // Stage 2
        {
            $facet: {// average rainfall and max rainfall of the series
                 // the weather by the quarter      
                 "ByQuarter": [
                     {
                      $bucket: {
                        groupBy: "$Month",
                        boundaries: [ 0,4,7,10,13],
                        default: "Other",
                        output: {
                          "TotalRainfall": { $sum: "$Rainfall" }
                        }
                      }
                    }
                  ],
             //The weather for each decade
                       "ByDecade": [
                     {
                      $bucket: {
                        groupBy: "$Year",
                        boundaries: [1930, 1940, 1950, 1960, 1970, 1980, 1990, 2000, 2010, 2020],
                        default: "later",
                        output: {
                          "TotalRainfall": {$sum: "$Rainfall" },
                          "MaxRainfall": {$max: "$Rainfall" },
                          "AvgRainfall": {$avg: "$Rainfall" }
                        }
                      }
                    }
                  ],
             //The histogram for weather
                       "Rainfall Frequency": [
                     {
                      $bucket: {
                            groupBy: "$Rainfall", // group by daily rainfall
                            boundaries: [0, 5, 10, 15, 20, 25, 30, 35, 40, 45],
                            default: "Above 45", // just in case!
                            output: { count: { $sum: 1 } } // the number in each category
                        }
                     }
                  ]
                  
            }
        },

    ]


);

The beauty of the $bucket becomes apparent when you have a lot of independent variables, and you wish to explore the relationships between them, and investigate their distribution. There is a lot of useful information that can be explored, and it makes a lot of sense to do the pipelines in parallel.

When developing a $facet stage, I find it much easier to create each pipeline independently, with a copy of the preceding stages. This means that each sub-pipeline can be written and tested in the Aggregation Editor of Studio 3T, and then spliced into the $facet.

Conclusions

The MongoDB facet stage is best considered a general way of making aggregations more efficient by allowing the same intermediate set of documents to be processed by a number of pipelines before the results are then fed back into the original pipeline. It means that the initial stage is done only once. Where the initial stage takes a lot of effort to construct, this makes sense.


How helpful was this article?
This article was hideous
This article was bad
This article was ok
This article was good
This article was great
Thank you for your feedback!

About The Author

Phil Factor

Phil Factor (real name withheld to protect the guilty), aka Database Mole, has 30 years of experience with database-intensive applications. Despite having once been shouted at by a furious Bill Gates at an exhibition in the early 1980s, he has remained resolutely anonymous throughout his career.

Article navigation

Related articles

  • MongoDB Aggregation Made Easy with 3T’s Aggregation Editor
  • What’s New in Studio 3T 2021.9 | Popup Visual Query Builder and Aggregation Index Hints
  • Lesson 3, Exercise 3: Adding and removing fields in the aggregation pipeline
  • The Beginner’s Guide to MongoDB Aggregation (With Exercise)
  • 3 Best MongoDB Aggregation Pipeline Builders

Studio 3T

MongoDB Enterprise Certified Technology PartnerSince 2014, 3T has been helping thousands of MongoDB developers and administrators with their everyday jobs by providing the finest MongoDB tools on the market. We guarantee the best compatibility with current and legacy releases of MongoDB, continue to deliver new features with every new software release, and provide high quality support.

Find us on FacebookFind us on TwitterFind us on YouTubeFind us on LinkedIn

Education

  • Free MongoDB Tutorials
  • Connect to MongoDB
  • Connect to MongoDB Atlas
  • Import Data to MongoDB
  • Export MongoDB Data
  • Build Aggregation Queries
  • Query MongoDB with SQL
  • Migrate from SQL to MongoDB

Resources

  • Feedback and Support
  • Sales Support
  • Knowledge Base
  • FAQ
  • Reports
  • White Papers
  • Testimonials
  • Discounts

Company

  • About Us
  • Blog
  • Careers
  • Legal
  • Press
  • Privacy Policy
  • EULA

© 2023 3T Software Labs Ltd. All rights reserved.

  • Privacy Policy
  • Cookie settings
  • Impressum

We value your privacy

With your consent, we and third-party providers use cookies and similar technologies on our website to analyse your use of our site for market research or advertising purposes ("analytics and marketing") and to provide you with additional functions (“functional”). This may result in the creation of pseudonymous usage profiles and the transfer of personal data to third countries, including the USA, which may have no adequate level of protection for the processing of personal data.

By clicking “Accept all”, you consent to the storage of cookies and the processing of personal data for these purposes, including any transfers to third countries. By clicking on “Decline all”, you do not give your consent and we will only store cookies that are necessary for our website. You can customize the cookies we store on your device or change your selection at any time - thus also revoking your consent with effect for the future - under “Manage Cookies”, or “Cookie Settings” at the bottom of the page. You can find further information in our Privacy Policy.
Accept all
Decline all
Manage cookies
✕

Privacy Preference Center

With your consent, we and third-party providers use cookies and similar technologies on our website to analyse your use of our site for market research or advertising purposes ("analytics and marketing") and to provide you with additional functions (“functional”). This may result in the creation of pseudonymous usage profiles and the transfer of personal data to third countries, including the USA, which may have no adequate level of protection for the processing of personal data. Please choose for which purposes you wish to give us your consent and store your preferences by clicking on “Accept selected”. You can find further information in our Privacy Policy.

Accept all cookies

Manage consent preferences

Essential cookies are strictly necessary to provide an online service such as our website or a service on our website which you have requested. The website or service will not work without them.

Performance cookies allow us to collect information such as number of visits and sources of traffic. This information is used in aggregate form to help us understand how our websites are being used, allowing us to improve both our website’s performance and your experience.

Google Analytics

Google Ads

Bing Ads

Facebook

LinkedIn

Quora

Hotjar

Functional cookies collect information about your preferences and choices and make using the website a lot easier and more relevant. Without these cookies, some of the site functionality may not work as intended.

HubSpot

Social media cookies are cookies used to share user behaviour information with a third-party social media platform. They may consequently effect how social media sites present you with information in the future.

Accept selected