SQL to MongoDB Migration lets you import multiple SQL tables or custom datasets to a single MongoDB collection, and map table relationships (one-to-one, one-to-many) to JSON as needed.
SQL to MongoDB Migration currently supports MySQL, PostgreSQL, Oracle, Microsoft SQL Server, Sybase, and IBM DB2.
Download Studio 3T to try it today.
Download the World Database (Zip, TGZ) to follow along with the example in the video.
Introduction
SQL to MongoDB Migration is about migrating data from a relational database server to a MongoDB server. You can automate SQL to MongoDB Migration using Studio 3T’s Tasks feature.
Studio 3T already has a SQL Import to MongoDB feature, which allows you to import records from a single table into a single collection.
SQL to MongoDB Migration extends the ability to import SQL data from a single table to a single collection with the following possibilities:
- Multiple imports, each from a main table to a target collection. What were previously separate imports can now run as independent pieces, or import units, within a larger migration. This is simple but very useful, especially if you plan to import a full database at once.
- In addition to importing the records of a main table, you can also import data from other tables into the same collection, as long as you define, within each import unit (which, as stated before, has a main source table and a target collection), how each record of the main table is related to the records in the other tables.
For example, say you have a table in your relational database called `post` and another called `comment`, which are related because each post has multiple comments:
Table `post`
post_id | title |
---|---|
1 | Getting Started with MongoDB |
2 | Users and Roles Explained |
Table `comment`
comment_id | post_id | author | content |
---|---|---|---|
1 | 1 | Alice | This is great! |
2 | 1 | Bob | This is awesome! |
3 | 2 | Carol | Best post ever! |
…you can now import the table `post` into a `posts` collection in MongoDB, and each document can contain an array of comments. In short, you can embed one-to-one and one-to-many relationships into the document.
{ "_id" : NumberLong(1), "title" : "Getting Started with MongoDB", "comments" : [ { "author" : "Alice", "content" : "This is great!" }, { "author" : "Bob", "content" : "This is awesome!" } ] } { "_id" : NumberLong(2), "title" : "Users and Roles Explained", "comments" : [ { "author" : "Carol", "content" : "Best post ever!" } ] }
Open SQL to MongoDB Migration
To open SQL to MongoDB Migration in Studio 3T:
- Click on the SQL Migration button in the global toolbar, or right-click into a server, database or collection in the Connection Tree and select the SQL Migration option.
- Select SQL → MongoDB Migration. This opens a new tab where you can configure and run the import.
Studio 3T automatically detects the target connection (the connection into which records will be imported), based on what you have selected in the Connection Tree and whether you currently have a collection tab open.
It also detects the target database and collection where possible; otherwise they are left blank. In any case, they can always be modified later.
Connect to a SQL server
To connect to a relational database server:
- Click on Click here to connect to a server.
- Select/configure a SQL connection via the SQL Connection Manager.
Add SQL tables as import units
1. Click on the Add from source table button.
2. In the Select tables to import dialog, select the required tables on your SQL server.
A new import unit is created for every table.
Use custom SQL queries as import units
In addition to importing all rows from SQL tables, you can also specify a custom SQL query to control the dataset to import.
1. Click on the Add from custom SQL query button.
2. In the Add custom SQL query dialog, write your SQL query in the box, and click Add.
For example, type `select * from customer where active = true` to import results only where `active` is equal to `true`.
By using a custom SQL query as an import unit, you can filter the data to include only the subset you need (instead of importing all rows from SQL tables), or extend your data (using `UNION` or `JOIN`, for example – see the sketch after these steps).
3. On the Overview tab, select the import unit and click on the Mappings button to further modify your SQL query.
The import unit opens in a separate tab, where you can continue to edit your query in the SQL Query panel.
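For instance, a custom query can join two tables so that a single import unit produces a wider dataset than any one table. Here is a minimal sketch, assuming Sakila-style `customer` and `address` tables (table and column names are illustrative):

```
-- Pull active customers together with their street address, so the
-- target collection gets one flat document per customer.
SELECT c.customer_id,
       c.first_name,
       c.last_name,
       a.address,
       a.district
FROM   customer c
JOIN   address a ON a.address_id = c.address_id
WHERE  c.active = true;
```

Each row returned by the query becomes one document in the target collection, just as if it were a row of a source table.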
Making changes to the SQL query
You can edit your SQL query and then refresh it, or create a new default mapping. It is important to remember the difference to avoid duplicate work.
When editing only the filter conditions of your query, it is sufficient to refresh it. In the example below, we’re still retrieving all columns and have changed only the filter condition to import the rows where `active` is now `false` instead of `true`.
Clicking on the refresh button updates the results accordingly in the JSON output preview.
However, after making any changes related to columns, you must create a new default mapping for your updated query by clicking the Create default mapping for query button.
Creating a new default mapping updates the schema – and the results – in the JSON output preview. Now only `first_name`, `last_name`, and `active` are displayed.
Refreshing the query in this case would result in an error in the JSON output preview. Studio 3T wouldn’t be able to map the new results based on the old schema from the previous SQL query.
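To make the distinction concrete, here is a hypothetical pair of query edits (illustrative, not taken from the screenshots):

```
-- Filter-only change: same columns as before, different WHERE clause.
-- Refreshing the query is enough; the existing mapping still applies.
SELECT * FROM customer WHERE active = false;

-- Column change: the result shape no longer matches the old mapping,
-- so click the Create default mapping for query button instead.
SELECT first_name, last_name, active FROM customer WHERE active = false;
```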
Run the import
At this point, you can go ahead and run the import job. You can, of course, always tweak the configuration of each unit before running the import, as we explain in the sections that follow.
The import units are run in parallel – within a reasonable limit of simultaneous processes – for faster completion.
At the end of the import, a new collection is created for each table if it didn’t exist before.
Configure import units
Directly from the list of import units on the Overview tab, you can:
- Change the target database and collection of an import unit
- Change the insertion mode
- Include/exclude an import unit from the job
- Run one or more of the import units
- Delete and add import units… and so on
Additionally, you can customize how the data from the main table – and possibly from other tables – should be transformed into documents for each import unit. That’s the biggest part of the configuration, which has its own dedicated tab.
The import unit configuration tab
To access the configuration tab for an import unit, on the Overview tab: right click and select Edit mappings, or double-click on the Source table / query column, or select the import unit and click the Mappings button. If you select multiple import units, Studio 3T opens a separate tab for each import unit.
The configuration tab for an import unit contains the following sections:
- The Schema view shows the specification of the documents you want to import. It displays the fields and data types (which can include nested objects and arrays), the source dataset, and source column for each leaf field. In the case of arrays, you can also see which one-to-many relationship they represent.
- In the Datasets view, you can see which datasets are part of the import unit and the relationships between them. By default, each import unit has only one dataset (the root table), but you can add more, as we’ll explain later.
- The JSON output preview shows a sample of the documents that Studio 3T will import, in their final state.
If your import unit is a custom query, the SQL Query panel is displayed for filtering the data and editing columns.
Customize fields
You can customize all the fields in your schema, for example by changing their names, the SQL columns they come from, and their target MongoDB types.
To edit a field:
- Right-click – Right-click on the field and select Edit / Rename
- Button – Select the field and click on the “pencil” button
You can also rename a field directly by double-clicking on its name.
Manipulate the schema
You can manipulate the “shape” of your target schema as you wish.
Using the JSON output preview, let’s look at the example below, which is the first document produced post-import with its initial schema. It has nine fields: `customer_id`, `store_id`, `first_name`, `last_name`, `email`, `address_id`, `active`, `create_date`, and `last_update`.
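As a concrete illustration (a hypothetical sample in the spirit of the Sakila `customer` table; your values will differ), such a document might look like this:

```
{
  "customer_id" : NumberLong(1),
  "store_id" : NumberLong(1),
  "first_name" : "MARY",
  "last_name" : "SMITH",
  "email" : "MARY.SMITH@sakilacustomer.org",
  "address_id" : NumberLong(5),
  "active" : true,
  "create_date" : ISODate("2006-02-14T22:04:36Z"),
  "last_update" : ISODate("2006-02-15T04:57:20Z")
}
```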
Now let’s run through the changes we can make to the schema.
Remove fields
We can remove fields from the schema (e.g. delete `store_id`, `active`, `create_date`, and `last_update`):
Rename fields
We can rename the fields (e.g. change `first_name` to `first`, `last_name` to `last`):
Reorder fields
We can reorder the fields, which can be done via the arrow buttons in the toolbar, or via drag and drop:
Create new field with object values
We can also create a new field whose value is an object.
To do this:
- Right-click on the parent object (in this case, the root node).
- Select Add object field. Alternatively, select the parent object and click on the Add object button.
Move fields from one object to another
We can move fields from one object to another via drag and drop.
In this case we’ll move the `first` and `last` fields out of the root object and into the new object that we just created as the value of the `name` field.
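Taken together, the removals, renames, and the new `name` object would reshape the sample document from above into something like this (a sketch; the exact field order depends on how you reordered them):

```
{
  "customer_id" : NumberLong(1),
  "name" : {
    "first" : "MARY",
    "last" : "SMITH"
  },
  "email" : "MARY.SMITH@sakilacustomer.org",
  "address_id" : NumberLong(5)
}
```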
Set up array fields
We can also set up array fields, but this requires defining a one-to-many relationship with another dataset, which is discussed later.
Relationships between datasets
The migration feature makes it possible to import multiple datasets into a single collection.
Importantly, there must only be one root dataset. The root dataset is the one that forms the basis of the final collection. Each record in the table of the root dataset results in one document in the output. Every other (non-root) dataset must depend on a previously defined dataset via a one-to-one or one-to-many relationship.
One-to-one relationships
The meaning of “one-to-one” in SQL to MongoDB Migration differs slightly from what the term usually means.
If we have a dataset, A, and then define a one-to-one relationship with another dataset, B, this means that for each row in A, there is (optionally) one corresponding row in dataset B.
For example:
- Employee – Social Security Number. Because an employee’s SSN may not have been entered yet, this is strictly a one-to-zero-or-one (1 – 0..1) relationship in the modeling world. However, as far as SQL to MongoDB Migration is concerned, it’s just a one-to-one. SQL to MongoDB Migration tolerates the absence of a record on the right side.
- City – Country. On the SQL side this is clearly a many-to-one relationship, because there can be many cities in any particular country. But for the purposes of SQL to MongoDB Migration, if we define the city dataset first, we can then define a one-to-one relationship to country, because for one city there is one country.
Set up one-to-one relationships
Let’s take a look at an example.
Say we’ve set up the root dataset of our import to be `city` and we left the schema as-is by default:
We then want to add a one-to-one relationship to `country` (which, as we said before, is really a many-to-one, but it’s handled the same as a one-to-one in this feature). We can then include the details of the country in the city document itself.
We will right-click on the root node and select Add one-to-one relationship, or alternatively, select the root node and click on the Add one-to-one button:
In the Add one-to-one relationship dialog, we’ll specify which table we’re adding (the “child table”), and how it’s related to the pre-existing table (the “parent table”).
Note that “parent table” and “child table” refer to the fact that the child table depends on the parent table in our import configuration.
In this case, the parent table is “city” and the child table is “country” because we’re defining the country as dependent on the city.
There are two ways of specifying how two tables are related:
- We can choose an existing foreign key, which Studio 3T detects automatically, or
- We can manually specify the parent table, child table, and the column pairs that make up the join conditions.
In this case, there is a foreign key that links city with country, so we’ll use just that.
Next, we can choose if (and how) we want new fields to be created automatically for the columns in this new table.
Note that these new fields are not part of the relationship per se; the one-to-one relationship can exist independently of whether any fields reference the new table, and of where those fields are located. This is simply a convenience feature, so that we don’t have to add all the fields we need one by one afterwards.
The options are:
Add fields in a nested object
This creates a new field whose value is an object, and that object has a field for each of the columns in the new dataset.
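With the `city` and `country` example, this option might produce documents along these lines (a hypothetical sample; the nested field name and the included columns depend on your configuration):

```
{
  "city_id" : NumberLong(1),
  "city" : "Madrid",
  "country" : {
    "country_id" : NumberLong(87),
    "country" : "Spain"
  }
}
```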
Add fields on the same level
This adds the new fields in the top-most level possible. In this example it’s the root level, but it could also be inside an array if the parent dataset is the “many” side of a one-to-many relationship.
Skip adding fields
The option “Do not add any fields yet” simply adds the new dataset without changing the schema. That means that the new dataset is not used at this point, but we can reference it later by manually adding fields into the schema, or adding another dataset that references the one we just added.
Reusing and removing datasets
Your datasets, both used and unused, are listed in the Datasets view pane. Unused datasets are grayed out. When adding one-to-one relationships, you also have the option to reuse a dataset, which makes it easier to add back all the fields of a previously used dataset. The Reuse dataset option is automatically displayed as a tab; select the unused dataset you’d like to add from the list to reuse it.
You can also remove unused datasets by clicking the broom button (Clean up unused datasets). Once removed, a dataset can no longer be reused.
One-to-many relationships
We can add new datasets that are related to existing datasets via a one-to-many relationship. This allows us to embed a collection of entities that are related to the main entity as an array in the same document. Let’s look at an example.
Add a one-to-many relationship
Assume that we want to import the table `sakila.customer` into a `customers` collection. We also want to include, in each resulting document, a list of all of the rentals by this customer.
First we set `customer` as our root dataset (we’ve omitted some fields for clarity):
Then we add an array.
We do so by right-clicking on the root node and choosing Add array field or selecting the root node and clicking the Add array button:
We then define the one-to-many relationship that is represented by this array.
As seen in the screenshot above, we chose to define our relationship using the existing foreign key `fk_rental_customer`, and we chose `rentals` as the name for the field that holds our new array.
Note that the Array content option is set to Object with a field per column. This means that in the final document there is an array with one object per rental, and each object contains fields that map directly to the columns in the `sakila.rental` table. We’ll have a look at the other options later.
After clicking OK, we see the following schema along with the JSON output preview, which shows the new array:
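The resulting documents would look something like this (a hypothetical sample with some rental columns omitted for brevity):

```
{
  "_id" : NumberLong(1),
  "first_name" : "MARY",
  "last_name" : "SMITH",
  "rentals" : [
    { "rental_id" : NumberLong(76), "rental_date" : ISODate("2005-05-25T11:30:37Z") },
    { "rental_id" : NumberLong(573), "rental_date" : ISODate("2005-05-28T10:35:23Z") }
  ]
}
```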
On context
Adding a one-to-many relationship adds a new “context” containing (initially) just the dataset that was added.
From this new context, you can reference all columns of the new dataset as well as columns of the parent dataset.
You can, for example, include the customer name within each rental object:
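Each element of the `rentals` array would then carry the parent dataset’s column alongside its own (the field name `customer_first_name` is illustrative):

```
{
  "rental_id" : NumberLong(76),
  "rental_date" : ISODate("2005-05-25T11:30:37Z"),
  "customer_first_name" : "MARY"
}
```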
You cannot, however, add a field in the top-level object that references the `rental` dataset, because that dataset is not accessible (not in context) from the top-level object.
That wouldn’t make sense: the top-level fields are there only once per customer, but there are many rentals per customer, so how could you add `rental_date` as a top-level field? There’s no way to know which of the rentals we would be referring to.
Array of final values
In the examples above, the array always contained objects. However, we can also define an array which contains final values (i.e. values which are neither objects nor arrays). The content of the array is simply a reference to a column of a dataset.
For example: we want to import the `country` dataset into a `countries` collection, and for each country we want a list with all the cities that the country contains.
But we don’t want an object with multiple fields per city; we just want the name. So the array should be an array of names of cities.
In order to do that, we add an array that is linked to the `city` dataset, related to `country` via a one-to-many relationship.
In the array definition dialog, we’ll set the content as Final value.
Next we’ll be asked to specify which column we want to reference. In the example we reference the column `city` in the dataset `city`. That column contains the name of the city.
The result is as we wanted: a `cities` array which contains only the names of the cities.
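A sample output document might look like this (illustrative values):

```
{
  "_id" : NumberLong(87),
  "country" : "Spain",
  "cities" : [ "Barcelona", "Madrid", "Valencia" ]
}
```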
Import many-to-many relationships
When migrating data from a relational server to MongoDB, in many cases you can just directly map each table to a collection, and be done with it.
When there is a one-to-many or many-to-one relationship in the relational database, you end up with a collection that has a field which references another collection – just as foreign keys work in the SQL world, except that in MongoDB these “foreign keys” are implicit and unenforced.
For example, you can import customers and rentals such that the `rentals` collection has a field `customer_id`. The value of that field corresponds to the field `customer._id` for the relevant customer.
Alternatively, you can embed the list of rentals in each customer’s document, as seen in the example for one-to-many relationships. The decision of which approach to follow (implicit reference vs nested list) can depend on various factors, such as the access patterns of the application and how many rentals each customer has. This topic is excellently covered in MongoDB’s post 6 Rules of Thumb for MongoDB Schema Design.
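For comparison, the implicit-reference approach would produce documents along these lines (illustrative values):

```
// A document in the rentals collection:
{ "_id" : NumberLong(76), "rental_date" : ISODate("2005-05-25T11:30:37Z"), "customer_id" : NumberLong(1) }

// The referenced document in the customers collection:
{ "_id" : NumberLong(1), "first_name" : "MARY", "last_name" : "SMITH" }
```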
Many-to-many relationships, however, are a bit different. In relational databases, those are usually modeled by means of an associative table (or join table).
The Sakila database provides a classic example of this: there are films and there are actors. Because every film can have several actors, and every actor may have starred in several films, we cannot have the films directly reference the actors, or the actors directly reference the films. Instead, a `film_actor` join table is used, which contains a record for every instance of a particular actor having starred in a particular film. It looks like this:
The `actor` table
actor_id | first_name | last_name | … |
---|---|---|---|
1 | PENELOPE | GUINESS | … |
2 | NICK | WAHLBERG | … |
3 | ED | CHASE | … |
… | … | … | … |
The `film` table
film_id | title | … |
---|---|---|
1 | ACADEMY DINOSAUR | … |
2 | ACE GOLDFINGER | … |
3 | ADAPTATION HOLES | … |
… | … | … |
The `film_actor` table
actor_id | film_id |
---|---|
1 | 1 |
1 | 23 |
1 | 25 |
… | … |
2 | 3 |
2 | 31 |
… | … |
If we use Studio 3T’s SQL to MongoDB Migration to import these three tables directly into MongoDB, with no further configuration, we’ll end up with a `film_actor` collection. Not the end of the world, but certainly not the MongoDB way of doing things. A more common approach would be to store an array of `actor_id`s within each film. This way, given a film, we can find all of its actors without the extra step of consulting the intermediate collection. The complementary query – finding all the films a particular actor has starred in – is also possible with the help of an index which covers the array.
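In the mongo shell, that covering index is a standard multikey index on the array field (assuming the `cast` field we set up below):

```
// Multikey index over the array of actor IDs in each film document
db.films.createIndex({ cast: 1 })

// Find all films in which actor 1 starred
db.films.find({ cast: NumberLong(1) })
```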
Now let’s look at how we can set up the migration to achieve that result. This is really just a particular case of the Array of final values example above. We’ll import the `actor` and `film` tables to separate collections.
We’ll tweak the configuration for the `films` collection so as to include a `cast` field, which is simply an array containing the IDs of the relevant actors, taken from the `film_actor` table. Note that while the `film_actor` table is used as part of importing the `films` collection, it doesn’t get imported into its own collection.
As an aside, note how we renamed the `actor.actor_id` column to `actor._id`. If we don’t do that, each document is assigned a new ObjectId as its `_id`. This is fine, but then we’d need to remember to create an index on the `actor_id` field to allow for efficient querying. Renaming `actor_id` to `_id` prevents the unnecessary creation of a new ID and keeps the result more consistent with how it was in the relational database.
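Put together, the migrated collections would contain documents along these lines (a hypothetical sample based on the tables above):

```
// films collection: cast holds actor IDs taken from film_actor
{ "_id" : NumberLong(1), "title" : "ACADEMY DINOSAUR", "cast" : [ NumberLong(1), NumberLong(10), NumberLong(20) ] }

// actor collection: actor_id renamed to _id
{ "_id" : NumberLong(1), "first_name" : "PENELOPE", "last_name" : "GUINESS" }
```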
Import a Hackolade model
SQL to MongoDB Migration supports the migration of MongoDB document schemas designed in Hackolade, an application for polyglot data modeling. To learn about importing a Hackolade model, see our Knowledge Base article Using Hackolade to configure Studio 3T SQL Migrations.
Automate migrations in both directions between SQL and MongoDB
If you’re interested in automating migrations in either direction between SQL and MongoDB, learn more in our Knowledge Base article Scheduling Automated Tasks in Studio 3T. Migration tasks can be saved to run later, or even scheduled to run on a periodic basis.