How do I clean up a dataset and schema with Studio 3T? – #Studio3T_AMA

Posted on: 12/08/2021 (last updated: 11/03/2022) by Dj Walker-Morgan

Q: I’ve been given an old dataset to tidy up for new applications, but where do I start?

Welcome to the latest Studio 3T Ask Manatees Anything – #Studio3T_AMA. Want to know how to use Studio 3T to make your MongoDB life better? Just tweet and use #Studio3T_AMA or email [email protected].

A: Well, you’re in the right place; Studio 3T is built to help you modernize your data. As we don’t have your dataset, we’ll dig up a small one of our own. You can download it here and import it to follow along. It contains documents of various shapes, each with a different set of fields.

Exploring the Schema

Now, the obvious starting point is to understand what is and isn’t there. For that, we can use Studio 3T’s Schema Explorer to investigate. Select the collection and click on the Schema toolbar button. Studio 3T will display the Schema analysis page where you get to pick how deeply the tool examines the database.

For most work, the Random sample should give you a good idea, but if you are looking for outliers and rare differences, set it to All so you don’t miss a thing. There are a few more settings we can adjust, but let’s move on and click Run Analysis.
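To see why sampling can miss rare outliers, here is a rough sketch of the arithmetic. It assumes a simple random sample drawn without replacement; the article does not say how Studio 3T samples, and the function name and numbers are hypothetical.

```javascript
// Chance that a simple random sample of `sampleSize` documents, drawn without
// replacement from `total` documents, contains at least one of `rare` outliers.
// This models sampling in general, not Studio 3T's actual sampling method.
function chanceOfSeeingOutlier(total, sampleSize, rare = 1) {
  if (sampleSize >= total) return 1; // analysing everything always finds it
  let missAll = 1;
  for (let i = 0; i < sampleSize; i++) {
    // Probability that the next sampled document is not an outlier.
    missAll *= Math.max(0, total - rare - i) / (total - i);
  }
  return 1 - missAll;
}

// Hypothetical numbers: one odd document among 1000, sample of 100.
console.log(chanceOfSeeingOutlier(1000, 100).toFixed(2)); // "0.10"
```

Under these assumptions, a single stray document would usually go unnoticed in a small sample, which is the reason for choosing All when hunting outliers.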

And here’s the schema of our own “scruffy” collection, named customers. On the left are all the fields that were found and their types. On the right is a pie chart of the distribution of those types.

The initial view of the Schema analysis as a pie chart of types.

Now, if we look on the left, we can see the global probabilities of particular fields and types existing in our collection. Where that probability is 100%, we can be sure that the field exists in all documents, and when we’re writing an application, we can rely on it being there. So we already know our schema looks like this:

Customer {
    _id: ObjectId,
    device: String,
    interests: Array,
}
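The idea behind those probabilities can be sketched in a few lines of JavaScript. This is only an illustration over in-memory objects, not how Schema Explorer is implemented; the sample documents and the function names `fieldPresence` and `guaranteedFields` are made up for the example.

```javascript
// Hypothetical sample documents; field names echo the article, values are invented.
const sampleDocs = [
  { _id: 1, device: "mobile", interests: ["db"], address: { city: "Berlin" } },
  { _id: 2, device: "desktop", interests: [], address: { city: "Leeds" } },
  { _id: 3, device: "tablet", interests: ["sql"] }, // no address field
];

// For each top-level field, return the percentage of documents that contain it.
function fieldPresence(docs) {
  const counts = new Map();
  for (const doc of docs) {
    for (const field of Object.keys(doc)) {
      counts.set(field, (counts.get(field) ?? 0) + 1);
    }
  }
  const presence = {};
  for (const [field, count] of counts) {
    presence[field] = (count / docs.length) * 100;
  }
  return presence;
}

// Fields present in every document are the ones an application can rely on.
function guaranteedFields(docs) {
  const presence = fieldPresence(docs);
  return Object.keys(presence).filter((field) => presence[field] === 100);
}

console.log(guaranteedFields(sampleDocs)); // [ '_id', 'device', 'interests' ]
```

Here `address` appears in two of three documents, so it drops out of the guaranteed list, just as the fields below 100% do in the analysis.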

Finding Bad Documents

There are, though, a lot of fields at 99.9%. Hmmm… this sounds like some bad data that’s distorting the schema. Let’s bring up the right-click menu on the first of them: address. We’re interested in documents which don’t have the address field, so select Explore documents not containing selected field.

Explore documents not containing selected field opens a collection view with an $exists: false query.

And a new tab will open with an { "address": { $exists: false } } query. This reveals the awful truth:

This bad document seems to have come from somewhere else.
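An `$exists: false` filter matches documents in which the field is absent altogether; a field that is present but set to null still counts as existing. The sketch below mimics that behaviour on plain JavaScript objects so it can run without a database. The documents and the helper name `withoutField` are hypothetical.

```javascript
// Hypothetical documents; values are invented for illustration.
const customers = [
  { _id: 1, address: { city: "Berlin" }, device: "mobile" },
  { _id: 2, address: null, device: "desktop" },        // field present, value null
  { _id: 3, device: "tablet", number_employees: 40 },  // field absent
];

// In-memory analogue of { field: { $exists: false } }:
// keep only the documents that do not contain the field at all.
function withoutField(docs, field) {
  return docs.filter(
    (doc) => !Object.prototype.hasOwnProperty.call(doc, field)
  );
}

const missingAddress = withoutField(customers, "address");
console.log(missingAddress.map((doc) => doc._id)); // [ 3 ]
```

Only the document with no address key is returned; the one holding a null address is not.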

It’s all down to a single, very malformed document, almost as if it was meant for another collection. It is also responsible for most of the fields showing up with those 0.1% probabilities: nationalities, number_employees, preferred_database and country. It’s obviously an error, so let’s delete it (right-click menu, Document -> Remove Document). Now we go back to the Schema tab and click Run Analysis again:

This analysis of the schema shows much more consistency.

That’s much better. More 100% probabilities all over.

Customer {
    _id: ObjectId,
    address: Object,
    device: String,
    dob: Date,
    interests: Array,
    package: String,
    prio_support: Bool,
    registered_on: Date,
    transactions: Int32
}
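The effect of removing one stray document and re-running the analysis can be sketched as follows. This is a toy model with invented data and a hypothetical helper, `partialFields`, which lists the fields that are missing from at least one document.

```javascript
// Hypothetical data: three well-formed customers and one stray document.
const before = [
  { _id: 1, address: {}, device: "mobile", package: "basic" },
  { _id: 2, address: {}, device: "desktop", package: "pro" },
  { _id: 3, address: {}, device: "tablet", package: "basic" },
  { _id: 4, device: "n/a", preferred_database: "MongoDB" }, // the outlier
];

// List the fields that appear in some, but not all, documents.
function partialFields(docs) {
  const counts = {};
  for (const doc of docs) {
    for (const field of Object.keys(doc)) {
      counts[field] = (counts[field] || 0) + 1;
    }
  }
  return Object.keys(counts)
    .filter((field) => counts[field] < docs.length)
    .sort();
}

// Drop the outlier, then analyse again.
const after = before.filter((doc) => doc._id !== 4);

console.log(partialFields(before)); // [ 'address', 'package', 'preferred_database' ]
console.log(partialFields(after));  // []
```

One document was enough to pull two otherwise universal fields below 100% and to introduce a field of its own; once it is gone, every remaining field is present everywhere.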

There’s still plenty more to do cleaning up this schema. In the next AMA, we’ll show you tips and tricks for giving your data a deep clean.



About The Author

Dj Walker-Morgan

Dj has been around since Commodore had Pets and Apples grew everywhere. With a background in Unix and development, he's been around the technology business writing code or writing content ever since.
