Skip to content
Studio 3T - The professional GUI, IDE and client for MongoDB
  • Tools
    • Aggregation Editor
    • IntelliShell
    • Visual Query Builder
    • Export Wizard
    • Import Wizard
    • Query Code
    • SQL Query
    • Connect
    • Schema Explorer
    • Compare
    • SQL ⇔ MongoDB Migration
    • Data Masking
    • Task Scheduler
    • Reschema
    • More Tools and Features
  • Solutions
  • Resources
    • Knowledge Base
    • MongoDB Tutorials & Courses
    • Tool/Feature Documentation
    • Blog
    • Community
    • Testimonials
    • Whitepapers
    • Reports
  • Contact us
    • Contact
    • Sales Support
    • Feedback and Support
    • Careers
    • About Us
  • Store
    • Buy Now
    • Preferred Resellers
    • Team Pricing
  • Download
  • My 3T
search

Studio 3T® Knowledge Base

  • Documentation
  • Tutorials
  • Workshops
Take the fastest route to learning MongoDB. Cover the basics in two hours with MongoDB 101, no registration required.
Start the free course

Detecting COVID-19 Trends Using MongoDB Atlas Data

Posted on: 30/04/2020 (last updated: 04/08/2021) by Phil Factor

Johns Hopkins Center for Systems Science and Engineering (CSSE) has provided a curated collection of all the US and global details of confirmed cases, deaths, and recovery from the COVID-19 pandemic. This COVID-19 data is available on GitHub here, and it is easy to download the entire set of CSV files.

The Dev Rel team at MongoDB has made this data even easier to access by creating a public, read-only data set on MongoDB Atlas.

The connection URI for the read-only data set is:

mongodb+srv://readonly:[email protected]/covid19

The username and password are both readonly.

You will find two different databases: covid19jhu contains the raw, original Johns Hopkins data, and covid19 is the curated MongoDB database with indexes.

Accessing the covid19 database

I used Studio 3T to access the covid19 database for several reasons.

Firstly, it has a lot of ways of making life simpler for users. You can cut and paste from it to Excel when you’re doing one-off exploratory work, and there is a task scheduler (e.g. for setting up daily exports) for keeping everything up-to-date.

I also have a particular weak spot for the Aggregation Editor, and a slightly fractious love-hate relationship with the SQL Query tool.

Connecting to MongoDB Atlas was easy, done by pasting the URI.

A moment later, and we are in.

Exploring the statistics collection

Hmm. Nice. Unlike the raw covid19jhu database, MongoDB’s covid19 database has six suitable indexes, and the database is properly curated to make most queries easy. We ought to just check that we have the data there properly.

A quick inspection of the data in the statistics collection tells me that the individual documents relate to a province/region or an entire country.

Each document represents a day’s figure for the region or country, but shows accumulated total figures to that date for confirmed (see screenshot), deaths, and recovered – not actual daily figures.

To get things rolling, I turn to the SQL Query window, spit on my hands, and tap in the SQL query (here’s the snippet) that tells me the current incrementing total figures for confirmed cases and deaths by finding the MAX() value.

As I write this, we are expecting 96 records per area. The deaths are an incrementing total, so we just need to find the MAX() value for the current deaths (I over-simplify: it is possible for an error to be corrected downwards, but this would only affect the figures if it happened near the peak.)

Finding trends in COVID-19 data

Fine so far. I can check the population figures (https://worldpopulationreview.com/countries/) and the current deaths for each country/region/province, but what I really want is to aggregate deaths by country, and more importantly, I want to know the death rate as a percentage of the population.

It is a statistic that many foolish commentators have ignored while making pronouncements about the pandemic. One must just sigh, and wish that they’d stayed awake in elementary school maths.

Unfortunately, the SQL Query feature of Studio 3T is beginning to run short of breath at this level of complexity. We can take the query and edit it in the Aggregation Editor without too much stress.

Click on Query Code and choose mongo shell (here’s the full MongoDB query). Then, click the org chart-looking icon.

mongo shell translation of the SQL query

This opens the SQL query as a MongoDB aggregation query, broken down into stages, in a separate tab.

Now we can add a new $match stage to display regions with a population greater than zero, and a $group stage to combine the regions into countries (here’s the aggregation query with the additional stages).

By doing some maths on the projections, we can calculate the metric we want, which is the death rate as a percentage of the population.

OK. This looks more useful. We can see the worst-affected countries at the top.

A word of caution on total confirmed cases and death rates

The “Confirmed” figures are pretty useless where only hospitalised patients can be confirmed and tests are simply not available. Additionally, many people have the disease, but are symptomless.

The death rate is calculated in very different ways in different countries. Some, like Belgium, assume that any death is likely to be COVID-19 unless there is an obvious alternative explanation, whereas other countries refuse to include deaths outside hospitals. A few countries are reluctant to report any death as COVID-related, even though there are dead bodies in the streets.

We have to sigh, and say that while death rate is a metric that gives immediate measurement of the progress of a pandemic, it relies on human objectivity.

When comparing different places, we should be measuring the ‘excessive mortality’ over a time period and assuming that extra deaths over the average for the time of year must be assumed to be due to COVID-19. The problem with this is that we also need immediate feedback to measure the effect of drastic measures, such as lockdown.

COVID-19 death rates in Western Europe as a percentage of population

Now that we have the query results that show us the worst-affected countries, we can now import this data into Excel.

To do this, we simply export the query results onto the clipboard as a tab-delimited version of a CSV file. Tab-delimited CSV (TSV) is usually the best clipboard format for transferring tabular information between Windows applications on the clipboard.

Click on Execute, which runs the export in a couple of seconds.

With this, we can map out the relative death rate for Western European countries, where the deaths as a percentage of the population were particularly severe. Actually, it may just be an effect of a more honest attempt to get a true figure of the deaths from the virus.

Here is the summary as of the time of writing, which shows that Belgium, Spain, and Italy have seen the highest death rate percentages.

COVID-19 death rates as a percentage of population

Plotting bell curves of infections and deaths

Each country has a wildly different curve, depending on when the infection reached it via a “superspreader” (someone who was symptomless, but highly infectious).

Here we examine the frequency of confirmed cases of infection in the provinces of China as part of a complete data set. We are using sparklines as we’re not interested in the detail at this stage. We want to see the pattern of rise and decline, the bell curve.

This isn’t easy to do in MongoDB. The raw data, you’ll remember, is the total figure to date for confirmed, deaths, and recovered – not daily figures.

Here is a sample.

In Excel, we can easily paste into a sheet a running tally like the above and convert the figures into deaths per day.

What we do is to use Studio 3T’s export mechanism to paste a tab-delimited version of the table above with the accumulated data into a ‘data’ sheet. We then sort it by country and province.

Then, we create another final ‘display’ sheet that computes the difference for each day’s figures of that cell and the previous day’s cell. That will be the number of deaths on that day.

We add the sparklines to the display sheet and do all our graphing and analytics from the display sheet.

We used this display sheet to show the sparklines for the provinces of China that we showed above. It highlights the way that the infection initially spread.

From this same ‘display’ sheet, we were then able to do a range of analytics for which Excel is ideally suited. As an example, we could look graphically at the death rates over the pandemic for six European countries – Spain, France, Germany, Italy, Sweden, and the UK – along with a moving average to show the progress along the typical pandemic bell curve.

Daily COVID-19 deaths in Spain, France, Italy, Sweden, Germany, and the UK

It isn’t so easy in MongoDB to transform the running totals that we find in the published data into daily figures. This sort of thing is best done as a post-processing on the running cursor of results, but even then, this can only be done one region/province/country (a document) at a time, or for the full global figures.

Even in a relational database, it is tricky to do. It can be done but awkwardly in a SQL grouping but there is no equivalent technique in MongoDB.

A workaround is to run an aggregation query using $group, $project, and $sort (here’s the code), and then post-process the deaths to convert them to a daily tally. This query was developed below using Studio 3T’s IntelliShell.

The results can be pasted into a new collection in a separate read-write MongoDB database.

From there, one can export it straight into Excel where you can generate a graph of the global deaths from the pandemic.

I’ve produced such a graph below, where I’ve added a seven-day rolling average in red dots to show the classic bell jar curve of the progress of the pandemic.

The global pandemic bell curve based on the current COVID-19 data

Conclusion

It is a painful process for any data scientist to engage in research that serves to look at human suffering drawn as a graph, especially with relatives working as medics in intensive care. In this case, the consolation was to see, almost everywhere, the decline in the deadly consequences of the COVID-19 virus (thank goodness).

To have the data available is important and we can give thanks to Johns Hopkins University for so promptly making their data available, and for MongoDB for publishing the database on MongoDB Atlas.

Query snippets

SQL query to find the MAX() value

select 
  country, 
  combined_name,
  population,
  max(confirmed), 
  max(deaths), 
  Count(*)
from covid19.statistics 
group by country,population,combined_name
order by max(deaths) desc

The same SQL query as a MongoDB aggregation query

// Requires official MongoShell 3.6+
use covid19;
db.getCollection("statistics").aggregate(
    [
        { 
            "$group" : { 
                "_id" : { 
                    "combined_name" : "$combined_name", 
                    "country" : "$country", 
                    "population" : "$population"
                }, 
                "MAX(confirmed)" : { 
                    "$max" : "$confirmed"
                }, 
                "MAX(deaths)" : { 
                    "$max" : "$deaths"
                }, 
                "COUNT(*)" : { 
                    "$sum" : NumberInt(1)
                }
            }
        }, 
        { 
            "$project" : { 
                "country" : "$_id.country", 
                "combined_name" : "$_id.combined_name", 
                "population" : "$_id.population", 
                "MAX(confirmed)" : "$MAX(confirmed)", 
                "MAX(deaths)" : "$MAX(deaths)", 
                "COUNT(*)" : "$COUNT(*)", 
                "_id" : NumberInt(0)
            }
        }, 
        { 
            "$sort" : { 
                "MAX(deaths)" : NumberInt(-1)
            }
        }
    ], 
    { 
        "allowDiskUse" : true
    }
);

Aggregation query with additional match and group stages

// Requires official MongoShell 3.6+
use covid19;
db.getCollection("statistics").aggregate(
    [
        { 
            "$match" : { 
                "population" : { 
                    "$gt" : 0.0
                }
            }
        }, 
        { 
            "$group" : { 
                "_id" : { 
                    "country" : "$country", 
                    "population" : "$population"
                }, 
                "confirmed" : { 
                    "$max" : "$confirmed"
                }, 
                "deaths" : { 
                    "$max" : "$deaths"
                }, 
                "recovered" : { 
                    "$max" : "$recovered"
                }, 
                "latest" : { 
                    "$max" : "$date"
                }, 
                "first" : { 
                    "$min" : "$date"
                }, 
                "count" : { 
                    "$sum" : NumberInt(1)
                }
            }
        }, 
        { 
            "$group" : { 
                "_id" : { 
                    "country" : "$_id.country"
                }, 
                "population" : { 
                    "$sum" : "$_id.population"
                }, 
                "confirmed" : { 
                    "$sum" : "$confirmed"
                }, 
                "deaths" : { 
                    "$sum" : "$deaths"
                }, 
                "recovered" : { 
                    "$sum" : "$recovered"
                }
            }
        }, 
        { 
            "$project" : { 
                "country" : "$_id.country", 
                "population" : "$population", 
                "total_confirmed" : "$confirmed", 
                "total_deaths" : "$deaths", 
                "total_recovered" : "$recovered", 
                "confirmed_percent" : { 
                    "$multiply" : [
                        { 
                            "$divide" : [
                                "$confirmed", 
                                "$population"
                            ]
                        }, 
                        100.0
                    ]
                }, 
                "deaths_percent" : { 
                    "$multiply" : [
                        { 
                            "$divide" : [
                                "$deaths", 
                                "$population"
                            ]
                        }, 
                        100.0
                    ]
                }, 
                "DeathRate" : { 
                    "$multiply" : [
                        { 
                            "$divide" : [
                                "$deaths", 
                                "$confirmed"
                            ]
                        }, 
                        100.0
                    ]
                }, 
                "_id" : NumberInt(0)
            }
        }, 
        { 
            "$sort" : { 
                "deaths_percent" : -1.0
            }
        }
    ], 
    { 
        "allowDiskUse" : false
    }
);

Aggregation query to convert deaths to a daily tally

// Requires official MongoShell 3.6+
use covid19;
accumulator=0;
db.getCollection("statistics").aggregate(
    [
        { 
            "$group" : { 
                "_id" : { 
                    "date" : "$date"
                }, 
                "SUM(deaths)" : { 
                    "$sum" : "$deaths"
                }
            }
        }, 
        { 
            "$project" : { 
                "deaths" : "$SUM(deaths)", 
                "date" : "$_id.date", 
                "_id" : NumberInt(0)
            }
        }, 
        { 
            "$sort" : { 
                "date" : NumberInt(1)
            }
        }
    ], 
    { 
        "allowDiskUse" : true
    }
).map(doc => {
  incrementDeaths = doc.deaths-accumulator;
  accumulator = doc.deaths;
  return Object.assign(doc, { deaths:incrementDeaths });
})

How helpful was this article?
This article was hideous
This article was bad
This article was ok
This article was good
This article was great
Thank you for your feedback!

About The Author

Phil Factor

Phil Factor (real name withheld to protect the guilty), aka Database Mole, has 30 years of experience with database-intensive applications. Despite having once been shouted at by a furious Bill Gates at an exhibition in the early 1980s, he has remained resolutely anonymous throughout his career.

Article navigation

Related articles

  • Working with MongoDB Atlas from A-Z using Studio 3T
  • Simplifying Data Migrations From Legacy SQL To MongoDB Atlas With Studio 3T And Hackolade
  • Lesson 6, Exercise 2: Using the SQL Query tool to aggregate collection data
  • Working with MongoDB Data? Use These Data Masking Techniques
  • How to Connect to MongoDB Atlas

Studio 3T

MongoDB Enterprise Certified Technology PartnerSince 2014, 3T has been helping thousands of MongoDB developers and administrators with their everyday jobs by providing the finest MongoDB tools on the market. We guarantee the best compatibility with current and legacy releases of MongoDB, continue to deliver new features with every new software release, and provide high quality support.

Find us on FacebookFind us on TwitterFind us on YouTubeFind us on LinkedIn

Education

  • Free MongoDB Tutorials
  • Connect to MongoDB
  • Connect to MongoDB Atlas
  • Import Data to MongoDB
  • Export MongoDB Data
  • Build Aggregation Queries
  • Query MongoDB with SQL
  • Migrate from SQL to MongoDB

Resources

  • Feedback and Support
  • Sales Support
  • Knowledge Base
  • FAQ
  • Reports
  • White Papers
  • Testimonials
  • Discounts

Company

  • About Us
  • Blog
  • Careers
  • Legal
  • Press
  • Privacy Policy
  • EULA

© 2023 3T Software Labs Ltd. All rights reserved.

  • Privacy Policy
  • Cookie settings
  • Impressum

We value your privacy

With your consent, we and third-party providers use cookies and similar technologies on our website to analyse your use of our site for market research or advertising purposes ("analytics and marketing") and to provide you with additional functions (“functional”). This may result in the creation of pseudonymous usage profiles and the transfer of personal data to third countries, including the USA, which may have no adequate level of protection for the processing of personal data.

By clicking “Accept all”, you consent to the storage of cookies and the processing of personal data for these purposes, including any transfers to third countries. By clicking on “Decline all”, you do not give your consent and we will only store cookies that are necessary for our website. You can customize the cookies we store on your device or change your selection at any time - thus also revoking your consent with effect for the future - under “Manage Cookies”, or “Cookie Settings” at the bottom of the page. You can find further information in our Privacy Policy.
Accept all
Decline all
Manage cookies
✕

Privacy Preference Center

With your consent, we and third-party providers use cookies and similar technologies on our website to analyse your use of our site for market research or advertising purposes ("analytics and marketing") and to provide you with additional functions (“functional”). This may result in the creation of pseudonymous usage profiles and the transfer of personal data to third countries, including the USA, which may have no adequate level of protection for the processing of personal data. Please choose for which purposes you wish to give us your consent and store your preferences by clicking on “Accept selected”. You can find further information in our Privacy Policy.

Accept all cookies

Manage consent preferences

Essential cookies are strictly necessary to provide an online service such as our website or a service on our website which you have requested. The website or service will not work without them.

Performance cookies allow us to collect information such as number of visits and sources of traffic. This information is used in aggregate form to help us understand how our websites are being used, allowing us to improve both our website’s performance and your experience.

Google Analytics

Google Ads

Bing Ads

Facebook

LinkedIn

Quora

Hotjar

Reddit

Functional cookies collect information about your preferences and choices and make using the website a lot easier and more relevant. Without these cookies, some of the site functionality may not work as intended.

HubSpot

Social media cookies are cookies used to share user behaviour information with a third-party social media platform. They may consequently effect how social media sites present you with information in the future.

Accept selected