Q: I’ve heard that covered queries in MongoDB perform much faster. But what are they and how can I use Studio 3T to find out if my query is covered?
A: Let’s start with what a covered query is. When you create an index in MongoDB, you can have multiple different fields contributing keys to an index. When you perform a query, the query engine will work out the best way to run the query. The system looks to see if all the data required to produce results is available in the index. If it is, then that query is covered (by the index). When it’s covered, the system only retrieves data from the index which should already be in memory. That’s the theory at least. Let’s break out the Studio 3T and try it ourselves.
Practically Covering Queries
So, let’s imagine you have a collection with department
, username
and logincount
… along with lots more user data. For this example, let’s say you regularly search on department and username, and maybe regularly sort by descending login_count. Let’s index that by department and username and then do a query on the department. It comes back fast enough. And if we look at the Explain tab in Studio 3T we can see where that index plays a part:
But for every result found by the index scan, at the start each document then has to be retrieved. Here, that’s 45,000 documents. They are then filtered down in the projection stage. 87.3MiB of data retrieved according to the Explain stats.
All the data is in our index however, so you’d think this would be a covered query. There’s one more condition to making a covered query. And that’s the slightly counter-intuitive one. You need to exclude the _id field too because that is included by default. If it is included by default, the query engine pulls the documents from the database. That lets it make a complete document for the rest of the query to process.
So, let’s exclude the _id
field from the results (adding { "id":0 }
to the projection). Now when we re-run our query and look at the Explain view:
Now we can see the PROJECTION_COVERED stage that sees us go from scanning our index straight to the results. That’s how Studio 3T makes it easier to see if your query is covered.
Covering more with a Covered Query
Covered queries are fragile things though. If we were to start retrieving and sorting on the logincount
fields, things would change. Let’s add it to the projection and set it in the sort options. Now, looking at the Explain tab shows us that we are back to retrieving the whole documents:
We can add the logincount
field to our index. Studio 3T makes that simple by letting you add fields to the index definition. When you update the index, Studio 3T then recreates the index (as MongoDB doesn’t allow for modifying indexes).
If we go back and rerun the query then check the Explain view we can see this works:
There’s the PROJECTION_COVERED stage showing we covered the query by expanding the index.
Caveat Indexer
This can make a huge difference to performance, because, in most circumstances, you’ll have the index in memory. If your indexes can’t fit in memory, you’ll usually have performance issues. But that’s outside the scope of this particular AMA (Ask Manatees Anything).
Always remember to make your indexes fit the queries you are doing. Unused indexes cost performance at write time and are a RAM usage overhead. Index memory will fight with your working set memory as it tries to make best use of your RAM.
As long as your index is in memory and your query is covered, the results will be coming straight from RAM. And that’s a great way to speed up things.