TL;DR: Studio 3T saves weeks of time by providing the tools needed to effectively edit, update and regularly export health data for multiple life-saving studies.
A look at the EAS FHSC study’s use of MongoDB and Studio 3T
September 24th is International FH Awareness Day. You may not know what FH is, but at Studio 3T we were pleased to find that our tools were helping in the effort to learn more about this genetic condition, which can silently affect millions of families.
One of the biggest challenges for public health research is gathering data. Take the EAS Familial Hypercholesterolaemia (FH) Studies Collaboration, based at Imperial College London, which sets out to gather, collate and analyze data on patients diagnosed with FH from around the world. Studio 3T is at the heart of this data flow, and we wanted to find out more about its role.
What is Familial Hypercholesterolaemia?
Familial hypercholesterolaemia (FH) is a very common genetic condition that leads to high levels of low-density lipoprotein cholesterol (LDL-C). It affects the heart and circulatory system prematurely, yet FH itself is asymptomatic. If it isn’t identified early and treated appropriately, individuals with FH can have heart attacks in the third decade of life.
Currently, there are more than 25 million individuals with FH worldwide, but fewer than 10% have been identified and treated. There are many challenges to improving the identification and treatment of FH.
With that in mind, the European Atherosclerosis Society (EAS) FH Studies Collaboration (FHSC) set out to build a harmonized pool of multinational data. This data enables researchers to characterize the affected adult and child populations and to describe how the condition is managed. The key challenge with FH is raising awareness of its genetic nature and getting people to understand that it is possible to inherit a cholesterol problem.
Preparing the incoming data
With 77 countries signed up, building a good-quality, consistent dataset from the incoming FH patient data is a challenge in itself. The team at Imperial turns the incoming raw information into conforming records and then moves them into a MongoDB database. Using Studio 3T, the team then turns the data into exported result sets, which internal EAS FHSC researchers import into their preferred statistical analysis tools.
The process begins with the EAS FH web portal, where patient data can be entered one patient at a time or bulk uploaded as CSV files. An importing application built with Talend has been custom configured to handle the variations in data found at this stage. Each participating organization has different sources, layouts and specifications for this information. Although there is a common specification for presenting the data, there are still cultural and measurement variations.
Once these variations are resolved, the data is pushed into its own collection on a MongoDB staging server for further validation. Here the documents can be worked on before they are merged into the data warehouse dataset.
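The Talend configuration itself isn’t described here, but the kind of normalization involved can be sketched. The minimal Python example below assumes a hypothetical CSV layout with `patient_id`, `ldl_c` and `ldl_unit` columns. It takes the field delimiter as a parameter, since files that use decimal commas often separate fields with semicolons, reads decimal commas, and converts LDL-C from mg/dL to mmol/L using the standard factor of about 38.67 mg/dL per mmol/L. It illustrates the idea only and is not the project’s importer.

```python
import csv
import io

# Approximate conversion factor for LDL cholesterol: 1 mmol/L is about 38.67 mg/dL.
MG_DL_PER_MMOL_L = 38.67


def parse_number(text: str) -> float:
    """Parse a number written with a decimal point or a decimal comma.

    Assumes values carry no thousands separators.
    """
    return float(text.strip().replace(",", "."))


def normalize_row(row: dict) -> dict:
    """Return a record with LDL-C expressed in mmol/L.

    The column names 'patient_id', 'ldl_c' and 'ldl_unit' are hypothetical.
    """
    value = parse_number(row["ldl_c"])
    unit = row["ldl_unit"].strip().lower().replace(" ", "")
    if unit == "mg/dl":
        value /= MG_DL_PER_MMOL_L
    elif unit != "mmol/l":
        # Unknown units are rejected rather than guessed, so a person can resolve them.
        raise ValueError(f"unknown LDL-C unit: {row['ldl_unit']!r}")
    return {"patient_id": row["patient_id"].strip(), "ldl_c_mmol_l": round(value, 2)}


def normalize_csv(text: str, delimiter: str = ",") -> list[dict]:
    """Normalize every row of an uploaded CSV file."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    return [normalize_row(row) for row in reader]
```

For example, `normalize_csv("patient_id;ldl_c;ldl_unit\nP1;4,9;mmol/L\nP2;193;mg/dL\n", delimiter=";")` yields LDL-C values of 4.9 and 4.99 mmol/L.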
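Validation on the staging data can take many forms. As a hedged sketch only, the Python function below checks one already-normalized document for missing required fields and for values outside a plausibility range. The field names and bounds are hypothetical placeholders, not clinical thresholds or the EAS FHSC specification. In a real setup the documents would be read from the staging collection with a MongoDB driver, which is omitted to keep the example self-contained.

```python
from typing import Any

# Hypothetical required fields and plausibility ranges, for illustration only;
# they are not clinical thresholds or the project's actual specification.
REQUIRED_FIELDS = ("patient_id", "country", "ldl_c_mmol_l")
PLAUSIBLE_RANGES = {
    "ldl_c_mmol_l": (0.5, 30.0),
    "age_at_diagnosis": (0, 120),
}


def validate(doc: dict[str, Any]) -> list[str]:
    """Return the problems found in one staging document (empty if none)."""
    problems = []
    for field in REQUIRED_FIELDS:
        if doc.get(field) in (None, ""):
            problems.append(f"missing {field}")
    for field, (low, high) in PLAUSIBLE_RANGES.items():
        value = doc.get(field)
        if value is None:
            continue  # optional field not supplied
        if not isinstance(value, (int, float)):
            problems.append(f"{field} is not numeric")
        elif not low <= value <= high:
            problems.append(f"{field}={value} outside [{low}, {high}]")
    return problems
```

Documents that return a non-empty list are candidates for a researcher to review and correct by hand.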
Studio 3T in use
This is where Studio 3T enters the equation. It provides an efficient frontend for querying the new data and applying corrections in place. Although the importer can handle a common range of variations in the incoming dataset, it takes a skilled researcher with the right tools to spot any other anomalies and apply appropriate corrections.
Christophe Stevens is the software engineer and data manager for the project. He explained that he “initially really struggled to find a good IDE for MongoDB and even considered switching to another database management system altogether for that reason”. Then he discovered Studio 3T, whose “advanced features, and especially IntelliShell” allowed the project to stay with MongoDB.
Collating the data in the data warehouse
The staging server is, of course, only one stage in the data’s journey. Once each record has been brought into conformity there, a custom-built Java application moves the records over to the data warehouse server. This application makes sure the data is merged correctly and safely with the existing corpus of patient data.
Once the data is in the data warehouse, Studio 3T comes into play again, this time allowing later corrections and normalization through its inline editing capability.
At the time of writing, the database holds extensive records for 68,765 patients from 66 countries, and more data comes online regularly.
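The Java merge application isn’t published, so the following is only a sketch, written in Python rather than Java, of one plausible merge rule. Records are keyed on a hypothetical `patient_id` field, new patients are added, and for existing patients a blank incoming value never overwrites a value already in the warehouse. Plain dictionaries stand in for the two MongoDB servers; against a real database, the equivalent step would typically be an upsert keyed on the patient identifier.

```python
def merge_records(warehouse: dict[str, dict], staged: list[dict]) -> dict[str, dict]:
    """Merge staged patient records into the warehouse, keyed by 'patient_id'.

    Existing patients are updated field by field; a blank staged value (None or
    an empty string) never overwrites a value already in the warehouse.
    New patients are added. Returns a new dict; the input is left unchanged.
    """
    # Copy each warehouse record so the caller's data is not mutated.
    merged = {pid: dict(record) for pid, record in warehouse.items()}
    for record in staged:
        pid = record["patient_id"]
        target = merged.setdefault(pid, {})
        for field, value in record.items():
            if value not in (None, "") or field not in target:
                target[field] = value
    return merged
```

Keeping existing values when the incoming field is blank is one conservative choice; a real merge would also need rules for genuine conflicts, which this sketch does not attempt.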
Packaging the data for research
The final step is turning the data warehouse dataset into selected datasets for researchers. The actual analysis happens outside MongoDB, typically in the researchers’ preferred tools, such as R, which are built for in-depth statistical analysis. The EAS FHSC project’s role is to give researchers the data they need for that analysis, which means writing queries whose results can be exported.
Studio 3T provides the tools to create those queries and export their results in the researchers’ preferred formats. Its Tasks feature can save and schedule operations like these, and the team uses it to rerun exports and keep the datasets up to date.
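In the project itself, these queries and exports are built in Studio 3T and rerun as saved Tasks. Purely to illustrate the shape of such an export, the Python sketch below filters records, keeps a chosen list of fields and writes them as CSV, a format R can read directly with `read.csv`. The function name, field names and filter are hypothetical.

```python
import csv
import io
from typing import Callable


def export_csv(
    records: list[dict],
    fields: list[str],
    keep: Callable[[dict], bool] = lambda record: True,
) -> str:
    """Return the chosen fields of every record that passes `keep`, as CSV text.

    Fields missing from a record are written as empty cells; fields not
    listed in `fields` are dropped.
    """
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=fields, restval="", extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    for record in records:
        if keep(record):
            writer.writerow(record)
    return buffer.getvalue()
```

For example, passing `keep=lambda r: r.get("country") == "NL"` restricts the export to one country. Empty cells are read by `read.csv` as NA in numeric columns.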
Stevens says that Studio 3T’s “IntelliShell, saved tasks, export and import features help a lot and save probably around 3 weeks of development and database maintenance work every year.”
From Datasets to Research to Results
These datasets are then sent to researchers, who carry out their analysis and eventually publish papers that expand what we know about familial hypercholesterolaemia. One of the first papers to use the dataset was published in The Lancet: “Global perspective of familial hypercholesterolaemia: a cross-sectional study from the EAS Familial Hypercholesterolaemia Studies Collaboration (FHSC)”. The paper highlighted that, although FH is hereditary, it was typically detected quite late, often only when symptoms were being investigated, and that early diagnosis and combination therapies were the most effective approach to addressing it.
By being able to marshal together data about FH from many different countries around the world, the EAS FH project is able to provide a fuller picture of a widespread, poorly detected, life-changing condition. Studio 3T is proud to provide tools that save the team time as they collate that data.