Q: I’ve just set up MongoDB and I want some sample datasets to work with as I learn. Where can I find some data that’s easy to load up?
A: If you’ve set up your MongoDB on MongoDB Atlas, then you’re in luck, as the fine folks there let you install a sample dataset from the console. Just head over to the Atlas console for your cluster and select “Load Sample Dataset”.
This imports eight different sample datasets, which include geo-data about wrecks and weather, Airbnb listings, mock financial data, fake movie data… and more! All of them come with the appropriate indexes set up too.
Importing locally
If, on the other hand, you have your own MongoDB server or are running your own local instance, there’s no such option. As I mentioned in our Windows 11 install guide, if you’d like to tap into this sample dataset, you need to download the mongodump archive that MongoDB have published at:
https://atlas-education.s3.amazonaws.com/sampledata.archive
Paste that link into your browser and it should download and save the file. If you prefer to do that sort of thing on the command line then use curl or wget like so:
curl -O https://atlas-education.s3.amazonaws.com/sampledata.archive
Or
wget https://atlas-education.s3.amazonaws.com/sampledata.archive
Once that’s downloaded, head over to Studio 3T to import the data into MongoDB. Connect to your MongoDB database and then select the Import tool from the toolbar. You’ll then be presented with a dialog where you can select which import tool you want to use.
For this import, we want to use the BSON – mongodump archive option. Select that and click Configure. You’ll be taken to a BSON – Archive Import tab:
Click on Select File and in the file chooser dialog that appears, set the filter to Archive and select the sampledata.archive file you downloaded earlier. If you didn’t select a database before clicking Import, you’ll need to click Change Target and select a server as a destination for the imported data. And that’s all you need to change before clicking Run.
Watch the Operations Pane to see the progress of the BSON Archive import. When it’s complete, you may need to refresh the connection bar. Click on any item in the connection bar, then bring up the right-click menu and select Refresh All, or press Control+R (Command+R on a Mac).
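If you’d rather stay on the command line for the restore step too, the same archive can be loaded with MongoDB’s mongorestore tool and its --archive option. The snippet below is only a sketch of that route: it assumes mongorestore is installed and on your PATH, that your server is listening on mongodb://localhost:27017, and that sampledata.archive sits in the current directory. The helper names build_restore_command and restore_archive are made up for this illustration.

```python
import shutil
import subprocess
from pathlib import Path

# Assumption: a local, unauthenticated server on the default port.
DEFAULT_URI = "mongodb://localhost:27017"


def build_restore_command(archive_path, uri=DEFAULT_URI):
    """Return the mongorestore argument list for a mongodump archive file."""
    return ["mongorestore", f"--uri={uri}", f"--archive={archive_path}"]


def restore_archive(archive_path, uri=DEFAULT_URI):
    """Run mongorestore on the archive; fail early if the file or tool is missing."""
    if not Path(archive_path).is_file():
        raise FileNotFoundError(archive_path)
    if shutil.which("mongorestore") is None:
        raise RuntimeError("mongorestore was not found on PATH")
    subprocess.run(build_restore_command(archive_path, uri), check=True)


if __name__ == "__main__":
    # Print the command rather than running it, so you can check it first.
    print(" ".join(build_restore_command("sampledata.archive")))
```

Once the printed command looks right for your setup, calling restore_archive("sampledata.archive") runs it. As with the Studio 3T import, refresh your connection afterwards to see the new databases.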
Other Sources for Sample Datasets
So far, we’ve talked about loading the well-documented Atlas sample dataset into Atlas and into your own server. But you may want to bring in data from somewhere more relevant or interesting than this.
Extra3T – The GitHub repository Extra3T/ExampleDatasets holds a collection of sample datasets used in Studio 3T’s own tutorials and exercises. These datasets can be imported using the Studio 3T JSON Import tool.
Airbnb datasets – The InsideAirbnb project accumulates data from Airbnb lettings around the world and creates datasets for cities to help inform researchers and policy makers. On their Get The Data page, you’ll find these datasets for download in CSV format. Download and import using the Studio 3T CSV Import tool.
Kaggle – Kaggle runs competitions for data scientists, and data scientists need datasets. On Kaggle’s main page you’ll find user-contributed datasets of various shapes and sizes.
On a much lighter note – the FiveThirtyEight GitHub repository contains data on DC and Marvel comic characters in CSV format as reference data for a story they did. It’s not alone; if you look in the full repository you’ll find lots of datasets in CSV format that provide a reference for their articles – and a link to the article in the readme file.
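Several of the sources above publish CSV files, so it can help to picture what a CSV import does: each row becomes one document, with the header row supplying the field names. Here is a rough sketch of that mapping in Python. It is not how Studio 3T’s CSV Import tool works internally; the type guessing is deliberately naive, and the column names in the sample are invented for the example.

```python
import csv
import io


def coerce(value):
    """Turn a CSV cell into int, float, None (empty cell), or leave it a string."""
    if value is None or value == "":
        return None
    for cast in (int, float):
        try:
            return cast(value)
        except (TypeError, ValueError):
            pass
    return value


def csv_to_documents(csv_text):
    """Map each CSV row to a dict keyed by the header row, dropping empty cells."""
    reader = csv.DictReader(io.StringIO(csv_text))
    documents = []
    for row in reader:
        doc = {key: coerce(cell) for key, cell in row.items()}
        documents.append({k: v for k, v in doc.items() if v is not None})
    return documents


# Hypothetical two-row sample; real files will have many more columns.
sample = "id,name,price\n1,Loft,85.5\n2,Studio,\n"
print(csv_to_documents(sample))
```

Note that the second row ends up without a price field at all, which is a common way to represent a missing value in MongoDB. If you have the pymongo driver installed, a list like this could then be handed to a collection’s insert_many method.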