Q: How do I export the last seven days’ new documents from my collections? My documents don’t have a timestamp field but I want to archive them separately.
Welcome to the latest Studio 3T Ask Manatees Anything – #Studio3T_AMA. Want to know how to use Studio 3T to make your MongoDB life better? Just tweet and use #Studio3T_AMA or email [email protected].
A: There are two parts to the answer. First you need a timestamp of some sort and you need to know when seven days ago actually was. When you know both those things, it’s a simple “greater-than” comparison to identify new documents. That opens the way to you using Studio 3T Tasks to regularly run an export or report and using placeholder names for destination files to create unique archives.
Let’s start with the timestamp. If your schema doesn’t have a timestamp field for when a document was created (or modified), you can’t really know the history of any document. Many schemas won’t have timestamp fields because people don’t expect them to be useful.
So, let us assume you just have your collection documents with their data and the server-generated _id
field for indexing. Can we work out when a document was created?
The Hidden Timestamp
If you’ve let MongoDB generate the _id’s value, there is. The purpose of an ObjectId is to be as unique as possible. To that end it includes a 5 byte random value and a 3 byte counter, and a 4 byte timestamp. You can read about that in the MongoDB ObjectId documentation. All that uniqueness is created when you insert a document for the very first time.
It’s the timestamp in the _id
field that we can use to find the last seven days of new documents. The question is, how do we access that timestamp information. ObjectId has a .getTimestamp()
method in JavaScript and in Mongo shell, but that’s not something we can use when constructing a MongoDB query.
To get that timestamp we have to look at the aggregation function $toDate
. While it would most likely come to mind for converting strings and numbers to Date values, it has a trick up its sleeve. When given an ObjectId, it returns the timestamp from that ObjectId as a Date. That looks like this in practice:
{ $toDate: "$_id" }
Now, the timestamp’s date value is rounded to seconds, but when we use it, it will return a value in milliseconds, like other dates in MongoDB.
How Soon Is $$NOW?
The next bit of information we need is when was seven days ago from right now. MongoDB’s aggregation system variables give us the “right now” through the $$NOW
variable that returns the current date value.
To make $$NOW
into seven days ago, we need to work out the number of milliseconds that equals seven days and subtract that from $$NOW
. Well, one day has 86,400,000 milliseconds (1000*60*60*24) so we’ll use that as our starting point and just multiply it by 7 to get seven days:
{ $multiply: [ 86400000, 7 ] }
You can bookmark this simple expression for future use. Just change the second parameter to set the number of days you need and avoid messing about with the calculator. You can also immediately see the number of days without having to divide by 86400000. We have to subtract that from $$NOW
:
{ $subtract: [ "$$NOW", { $multiply: [ 86400000, 7 ] } ] }
We now have a way to calculate, within an aggregation, seven days ago.
Putting the Timestamp and $$NOW Together
We want to wrap both these parts, the _id
timestamp and the seven days ago calculation in a greater-than expression. If the timestamp is greater than seven days ago, it was created in the last seven days.
$gt: [ { $toDate: "$_id" }, { $subtract: [ "$$NOW", { $multiply: [ 86400000, 7 ] } ] } ]
We’re almost there. To use this in an aggregation’s $match
stage we’ll need to use $expr
to have it evaluated.
Create an aggregation $match
stage and drop that in. The pipeline will now only contain records newly created in the last seven days.
{
$expr: {
$gt: [
{
$toDate: "$_id"
},
{
$subtract: [ "$$NOW" , { $multiply: [86400000, 7 ] }]
}
]
}
}
Once you’ve fine tuned your Aggregation query, click on Studio 3T’s export tool to turn the aggregation into a source for an Export. Select “Current Query Result” as the export source when asked
Save the Export as a Task to be able to repeatedly run it and use Studio 3T’s Task scheduler to regularly run the export. If you are exporting to files, don’t forget to use placeholder names for destination files to create unique filenames.
Express It As A Find Query
We’re not done yet. Using $expr
to wrap it also means we can use that same query in a find. The query would look like this in Studio 3T:
It’s a superpower worth knowing about that $expr
lets you tap into the rich range of aggregation operators from inside traditional MongoDB queries.
Caveat Queryor
Remember that the ObjectId is only time stamped once on creation where it didn’t exist before. If you are copying collections and copy the _id
field over to the new collection, you’ll find everything retains the time it was initially created with. If you copy over without the _id
field, a new _id
will be created but the collection will have a new index too.
Using the _id
timestamp is a useful method of identifying new documents, but if you want to also capture modified documents you’ll need your own time-stamping. We’ll talk about that in a future AMA on this subject.