Are my MongoDB indexes being used? #Studio3T_AMA

Q: People tell me I should only have the indexes I need on my MongoDB collections. How do I find out which indexes are actually in use for a collection?

Welcome to the latest Studio 3T Ask Manatees Anything – #Studio3T_AMA. Want to know how to use Studio 3T to make your MongoDB life better? Just tweet and use #Studio3T_AMA or email [email protected].

A: It’s good advice that you make sure collections only have the indexes you need. It can be tempting to add an index that matches every query that you make on a collection but that has a cost. Indexes are ideally always in memory, but when memory the system may move them to disk. If they get taken out of memory you end up losing a lot of their performance boosting ability.

The Problem With Indexes

Also, the more indexes you have, the more work MongoDB has to do when you insert a document, updating each and every index. Indexes work their magic in the background though. It is hard to tell if they are in use though. You need to look at the “explain” output from MongoDB and interpret that. Or you can use Studio 3T’s Visual Explain which interprets it for you. It will show you when a collection’s index is in use for a particular query.

Here we see that the scanning of the name field uses the name index. It gives us the fifty six thousand (56K) results which then pass through a filter. That leaves us matching thirty one (31) documents which MongoDB directly retrieves from the collection (via the _id index) and we get the results. It’s a lot easier to understand than a JSON document (which is what the explain command produces).

Introducing MongoDB’s $indexStats

Visual Explain tells you how an index is helps. But it doesn’t tell you how often that index comes into play. For that, we have to go look at MongoDB aggregation. Which sounds a little odd, but in aggregation is a pipeline stage $indexStats and if you look it up in the MongoDB documentation, you’ll find it’s been around since MongoDB 3.2 and it returns lots of information about a collection’s indexes.

Here’s the pipeline in the Studio 3T Aggregation Editor. The $indexStats stage has to be the first stage in the pipeline and it takes no document parameters.

Creating and running a $indexStats pipeline

The results for each index of the collection forms a document in the results. Each one has the name of the index and a key object. The key object lists each field used in the object and the type of index. Notice the address.location_2dsphere index has a 2dsphere index specifically for geo searches. For now, let’s close up the key field and zoom in on these results:

The next object in the $indexStats results is an accesses object. This contains two values, the number of times this a query has accessed the index and the time when that counting started. That clock will be reset when there is a server restart or when modifications are made to the index. Finally, there’s a host value recording the server where this value was originates from. If you were working with a sharded cluster, you would see information about which shard this data was came from.

Reading The Results

But circling around, what really interests us is how many accesses the index has had. We can see the _id_ index has 7 access ops against it, the same as the composite property_type_1_room_type_2_beds_1 index which can cover a number of queries. The name_1 index only has one op against it. And finally the address.location_2dsphere index has no operations at all. Obviously this is just an example and your ops numbers in production should be higher, but this illustrates what we are looking for, an index that’s not being accessed.

And here it appears that no one is doing any geo searches against this collection, despite there being a 2dsphere index. Wait, wait, before you go and delete it, do make sure it is never being put to use. Queries using that index only exist in an occasionally run task, That’s a case for finding out if the cost and value of having the index makes sense. Also, before you drop the index, you’ll need to consider if the index is useful in stopping that query placing too high a load on the server.

The decision on whether to drop an index is rarely a simple “has it been used recently” question, but the $indexStats results will help inform you which indexes could be candidates for removal.

Bonus: $indexStats with the shell

If you don’t have Studio 3T’s Aggregation Editor to hand, you can still run $indexStats by hand. The command would be something like:

db.getCollection("listingsAndReviews").aggregate([ { "$indexStats": { } } ] );

Where you’d replace “listingsAndReviews” with the name of your collection. You’ll get results in JSON format similar to this:

{
   "name": "_id_",
   "key": {
       "_id": 1
   },
   "accesses": {
       "ops": NumberLong(7),
       "since": ISODate("2022-01-10T20:02:24.356Z")
   },
   "host": "exemplar:27017"
}
{
   "name": "property_type_1_room_type_1_beds_1",
   "key": {
       "property_type": 1,
       "room_type": 1,
       "beds": 1
   },
   "accesses": {
       "ops": NumberLong(7),
       "since": ISODate("2022-01-10T20:02:24.356Z")
   },
   "host": "exemplar:27017"
}
...

The fields are the same as discussed above.

Finally

MongoDB Indexes are a great way to speed queries, but only when used with an eye on the trade-off between memory and performance. With $indexStats, you’ll have a bit more insight into how useful your indexes are.