Nowadays, any organization is likely to be running their affairs on an estate of databases of mixed parentage. In such a context, what factors need to be considered when adding MongoDB to the mix?
A survey of over 18,000 DevOps professionals working with MongoDB (2020, Studio 3T) revealed that 79% of respondents are working with at least one variant of SQL too.
Among those respondents, 45.4% use MySQL and 25.4% use Microsoft SQL Server alongside their installation of MongoDB.
When you start an initiative in any organization that involves data, you have to take into account the constraints under which you can operate.
The type or brand of the database is irrelevant. Even the most esoteric of databases are subject to the same bewildering variety of rules, laws, and constraints once they hold data. However cool the startup, there inevitably comes a cold reality when it comes to handling and processing data.
The involvement of an ever wider range of job titles all the way along the DevOps toolchain requires every business to have a clear data playbook that everyone shares – a checklist like those used to such powerful preventative effect in both aviation and surgery.
We need a modus operandi for handling customers’ data, as simple and rigorous as the ones we apply to customers’ lives.
The seven topics in the checklist are experience-based notes. Together they form a playbook that can save data and secure it, but also make it available for analysis, even while our data management systems and processes become faster and more complex by the hour.
Here’s the list:
- Security
- Privacy
- Effectiveness
- Auditability
- Retention
- Disaster Recovery
- Business Strategy
We’ll cover Effectiveness, Auditability, and Disaster Recovery below.
Effectiveness
There was a time when databases couldn’t be used for fast-moving, multi-threaded trading.
If you sold a unique product such as a work of art to two people simultaneously, it could be embarrassing.
If the contents of an account could be withdrawn several times over before the system was able to update the contents of the account, then it couldn’t be used for banking.
The collapse of several Bitcoin exchanges was due to precisely this failure, which was then exploited by hackers.
The features that prevent this happening are summed up by the acronym ‘ACID’. A software system that works effectively for social media without any ‘transactionality’ doesn’t necessarily work for commerce. Likewise, there is an optimistic idea that caching data represents a ‘free lunch’. Not so: when done incorrectly it subverts ACID compliance.
The good news is that as of version 4.0, released in mid-2018, MongoDB supports multi-document ACID transactions: it not only wipes its nose like a heavyweight database but can be seen, and documented, to do so.
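The double-withdrawal problem described above comes down to whether the balance check and the debit happen as one indivisible step. The following is a minimal in-memory sketch of that idea in Python, not MongoDB code: the `Account` class and `run_concurrent_withdrawals` helper are hypothetical names invented for illustration, and a thread lock stands in for the isolation a real database would provide.

```python
import threading


class Account:
    """In-memory stand-in for a stored account balance (hypothetical)."""

    def __init__(self, balance):
        self.balance = balance
        self._lock = threading.Lock()

    def withdraw(self, amount):
        # The check and the debit run as one indivisible step, so no other
        # thread can observe the balance in between them.
        with self._lock:
            if self.balance < amount:
                return False
            self.balance -= amount
            return True


def run_concurrent_withdrawals(account, amount, attempts):
    """Fire `attempts` simultaneous withdrawals; return how many succeeded."""
    results = []
    results_lock = threading.Lock()

    def worker():
        ok = account.withdraw(amount)
        with results_lock:
            results.append(ok)

    threads = [threading.Thread(target=worker) for _ in range(attempts)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(results)
```

With a balance of 100 and twenty simultaneous attempts to withdraw 100, only one attempt can succeed. Without the lock, two threads could both pass the balance check before either debit is applied, which is the failure the section describes.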
Auditability
Some legislation touches on the problem of using data as ‘evidence’.
SOX (the Sarbanes-Oxley Act of 2002) is an example, where the concern is to prevent the management of a company from misleading its owners or shareholders.
It must be possible to prove that data hasn’t been altered since it was first added. In the early databases, this wasn’t a problem because data wasn’t ever deleted but merely superseded, like the handwritten ledger, clay tablet, or palimpsest.
Nowadays, important data such as financial reporting, invoices, and receipts has to be tracked to ensure that any tampering is detected and corrected. This requires an auditing system that is independent of the database.
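One common way to make tampering detectable is a hash chain, in which each audit entry's hash covers the previous entry's hash as well as its own record. The sketch below is an illustration under assumptions, not a description of MongoDB's auditing feature: the function names, the `GENESIS` constant, and the record layout are all hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(prev_hash, record):
    # Canonical JSON (sorted keys) so the same record always hashes the same.
    payload = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append_entry(log, record):
    """Append a record, chaining its hash to the previous entry's hash."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    log.append({"record": record, "hash": _digest(prev_hash, record)})


def first_tampered_index(log):
    """Return the index of the first entry whose hash no longer matches, or None."""
    prev_hash = GENESIS
    for i, entry in enumerate(log):
        if entry["hash"] != _digest(prev_hash, entry["record"]):
            return i
        prev_hash = entry["hash"]
    return None
```

Editing any stored record changes its digest, so verification fails at that entry. Someone with write access could still recompute every later hash, which is one reason to keep the most recent hash somewhere outside the database being audited.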
Most commercial databases currently provide auditing components to enable compliance. Data backups can provide a good belt-and-braces method of auditing, providing robust evidence if they are retained for long enough. Backups cannot be altered without detection, because most commercial-strength database systems keep a checksum for every page.
MongoDB offers a thorough guide on Auditing.
Disaster Recovery
It may seem obvious, but a database system has to be regularly backed up as a precaution against disaster.
There are, of course, other reasons to back up data, but Disaster Recovery is one of the first duties of anyone responsible for the operational side of an organization’s handling of data.
Any database has to be judged by the speed and effectiveness of its recovery process. Can it, for example, restore to a completely consistent state?
The effectiveness of recovery has to be measured against the agreed maximum downtime that the business can tolerate without loss.
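As a small worked example of that measurement, the sketch below compares one timed recovery against an agreed limit. The function name and the report fields are hypothetical, and the timestamps would come from a real restore drill or incident log.

```python
from datetime import datetime, timedelta


def recovery_report(failure_at, restored_at, max_downtime):
    """Compare one measured recovery against the agreed maximum downtime."""
    downtime = restored_at - failure_at
    return {
        "downtime": downtime,
        "within_target": downtime <= max_downtime,
        # A negative margin means the target was missed by that amount.
        "margin": max_downtime - downtime,
    }
```

For instance, a failure at 09:00 with service restored at 09:45, measured against a one-hour limit, is within target with a 15-minute margin.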
How reliant is the recovery on having a particular hardware configuration? Every organization will have its own requirements, so there will be a wide range of disaster recovery solutions.
The system should provide incremental backups between full backups so as to provide up-to-the-minute recovery. Few organizations can afford to lose trading information because a database cannot be recovered right up to the point of failure.
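The way full and incremental backups combine at restore time can be sketched as follows. This is a simplified model rather than any particular product's backup tooling: the `restore_chain` function and the `(taken_at, kind)` tuple layout are assumptions made for illustration.

```python
from datetime import datetime


def restore_chain(backups, target_time):
    """Pick the backups needed to restore as close to target_time as possible.

    `backups` is a list of (taken_at, kind) tuples, where kind is "full" or
    "incremental". The result is the latest full backup taken at or before
    target_time, followed by every later incremental up to target_time.
    """
    usable = sorted(b for b in backups if b[0] <= target_time)
    full_positions = [i for i, (_, kind) in enumerate(usable) if kind == "full"]
    if not full_positions:
        raise ValueError("no full backup exists before the target time")
    # Everything after the most recent full backup is incremental by definition.
    return usable[full_positions[-1]:]
```

The gap between the last incremental in the chain and the moment of failure is the data that would be lost, so the more frequent the incrementals, the closer the restore gets to the point of failure.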
MongoDB has a guide on Backup and its Role in Disaster Recovery.