In this short series, we’re looking at the new capabilities of IntelliShell, starting with how you can use Mongosh to read files.
Not all data is structured as JSON or CSV. A lot of valuable data is trapped in plain text files on your disk. To get real value and insight from those files, you’ll want to give the data some structure and turn it into a collection that you can query and aggregate in MongoDB.
For example, if you had a plain text file of legacy data you wanted to use in MongoDB, previously you would have had to write an app to parse and import that data. Now, you can do it from within the Mongo shell, using Studio 3T’s IntelliShell.
A Change Of Shell
With the switch to Mongosh as the default shell in Studio 3T 2022.9, a whole world of new capabilities has been added to the Mongo shell.
The legacy mongo shell was a JavaScript interpreter attached to a database driver and a REPL (Read/Eval/Print/Loop) command line. There wasn’t much else to it. It was also hard to extend its JavaScript engine to write files or connect to things that weren’t MongoDB, which meant you were often better off writing an actual program in your preferred language to do what you wanted.
With Mongosh, that’s all changed. The new shell is built on top of Node.js, the JavaScript runtime which is at the heart of many applications, both desktop and web. Because of this, we can harness the whole spectrum of modern JavaScript capabilities in our IntelliShell scripts.
In this series, we’ll be exploring what you can do and why you should consider adding it to your toolbox. Today, we’ll talk about using Mongosh to read files.
Getting Mongosh to Read Files
Why would you want to read files in the Mongo shell? Well, it’s a good way to provide input data, whether that’s commands or raw text which needs to be massaged into a JSON document so it can be inserted into a collection. The older mongo shell had a command, cat(<filename>), that read a designated file into a variable as a single string.
Mongosh officially replaces that command with:
fs.readFileSync( <filename>, 'utf8' )
This command reads an entire file in one go and returns the contents as a string. The 'utf8' argument sets the character encoding. The fs. prefix means it is part of Node’s File System (fs) module, and if you look in the Node documentation for the fs module you’ll find it among a whole throng of versatile file-reading and manipulation functions.
We want to keep things as simple as possible, so we’re only going to focus on a couple of those functions, specifically their Sync() versions. These are functions which run synchronously and don’t need JavaScript callbacks or promises to work. If you know what callbacks and promises are, you’ll know they can be a bit fiddly to work with. If you don’t, rest assured we’re sticking with the simplest, most script-friendly versions of the available functions: you call them, they do their work, they return a result.
For a lot of uses, fs.readFileSync() will work just fine, as long as the whole file fits comfortably in memory.
Practical Mongosh File Reading
Let’s look at a practical example. We have a quotes file downloaded from GitHub, where each line contains a quote, a tilde, and then the name of the person to whom the quote is attributed. Sounds pretty simple, but it’s not in a format you can import directly. Let’s fix that and use it to create a collection.
Open IntelliShell in Studio 3T on a database or collection. We begin by reading our file:
const r=fs.readFileSync("/Users/dj/quotes.txt");
You’ll need to change the path to point at your copy of the quotes file. This function reads the whole file in one go. But, unlike the old shell’s cat(), it doesn’t return a string. Instead, it returns a data structure called a Buffer. So, the next thing to do is turn that Buffer into a string:
const tmp=r.toString();
And once we’ve done that, we can use JavaScript’s string.split() function to break it up into an array of lines.
const sp=tmp.split("\n");
Excellent.
Writing to MongoDB
Of course getting Mongosh to read files is only half the story. Now we have to parse and store that data in MongoDB. We start that process by stepping through the array, line by line.
for (const s of sp) {
for...of is JavaScript’s built-in way of stepping through an array. Each line will appear as the constant s inside the loop:
if(s=="") {
continue;
}
If s is a blank line, we skip the rest of the loop. Otherwise, we move on to parsing the data. In this case, it’s simple: we split the line on the tilde (~) character and save the left and right parts as quote and author:
const p=s.split("~");
const quote=p[0];
const author=p[1];
Now all we need to do is create a document and insert it into a collection.
const document={ "author":author,"quote":quote };
db.getCollection("quotes").insertOne(document);
}
And yes, we are keeping it simple here. You could gather all the quote documents into one big JavaScript array and insert them at the same time with .insertMany(), but we’ll leave that as an exercise for you, the reader.
Here’s all the code in one chunk so you can easily copy and paste it:
const r=fs.readFileSync("/Users/dj/quotes.txt");
const tmp=r.toString();
const sp=tmp.split("\n");
for (const s of sp) {
if(s=="") {
continue;
}
const p=s.split("~");
const quote=p[0];
const author=p[1];
const document={ "author":author,"quote":quote };
db.getCollection("quotes").insertOne(document);
}
Doing More With Mongosh
And there we have it. We’ve used IntelliShell and Mongosh to convert a plain text file with no formal structure into a MongoDB collection. What you should take away from this is that your scripts now have access to anything a Node.js application can use. In future posts, we’ll look at even more capabilities, such as writing files and launching other processes, all from within Mongosh.