Designing a Clinical Data Mart (Part 1) - Unified Dimensional Model (UDM)
In this tutorial we will attempt the design of a sample research Clinical Data Mart (CDM). We will discuss the basic approach to defining the Clinical Unified Dimensional Model (UDM) design, possibilities of extending the Clinical DM to include claims data, implementing an Analysis Services Multidimensional Cube, and finally performing analytics against the data, with tools such as Power BI and Excel. This tutorial for is broken into three parts:
- Clinical Data Mart (CDM) - UDM design
- Clinical Multidimensional Cube - Analysis Services, Online Analytical Processing (OLAP)
- Data Visualizations and Dashboards - Developing with Power BI and Excel
We assume you have experience with Kimball approach to dimensional modeling, OLAP and general business intelligence concepts & practices. This tutorial is not intended as a full clinical research data mart solution, rather the tutorial builds a basic foundation for future extensibility. We do not cover data integration (ETL), or in-depth detail design for future state development. However, we do provide design diagrams, sample code and reports for demo purposes only; in addition, we encourage you to reference the third-party links we include throughout the tutorial for further guidance.
Note: This tutorial assumes you have the following tools installed:
- A Relational Database Management System (RDMS) such as SQL Server, Oracle, MySQL or other. For the purpose of this tutorial we are using SQL Server.
- SQL Server Analysis Services (SSAS)
- Excel and/or Power BI Desktop
- Visio or another Entity Relationship Diagram (ERD) tool.
Tip: Some links in the article below are used for description purposes only. Place your pointer over the link to read a description, where applicable.
Let get started!
The purpose of this high-level Data Mart design example is to define the necessary parameters required to structure the unified dimensional model. This will ultimately lead to the physical design of the clinical research database. Though there are many opportunities to develop this model in support of multiple subject areas, dimension and fact tables, we will only provide the necessary tables to provide an entry-level example of the clinical data mart. You may always contact us directly, should you require consultancy support for your specific design requirements.
Step 1 - Define Dimensional Model Objectives
Describe what the purpose of our dimensional model.
- Create a dimensional model which supports temporal analysis of patient encounter, diagnosis, procedures, medications and related vitals.
- Define a Star Schema structure to support the design objectives of item 1.
- Optimize the dimensional model design for steps 1. & 2. in support of Analysis Services cube design.
Step 2 - Defining Data Sources
Consolidate a list of source systems where known data resides.
This step requires quite a bit of time identifying, prioritizing and planning the data flow into your staging and data mart environments. For the purpose of this tutorial, we will not cover the extraction, transformation or loading of data. We have provided the following Model Template to help guide you. You may also visit here for additional Kimball information and downloads.
Step 3 - List Fact Tables and Granularity
List fact tables you want to include for the data mart and what measures your want to track for each fact.
We will begin by listing all business processes (fact table) we want to include in our UDM. For this tutorial we will only be using two fact tables. The approach we are taking is to begin with preparing our bus matrix with the fact and date dimension. Our bus matrix will evolve as we continue forward with the tutorial. As you can see below, the Encounter and Prescription fact table has an association with the Date dimension table.
We're now going to define the level of granularity for the fact tables.
- Encounter - The date the patient visited the provider facility, reason code, diagnosis, procedure and general vitals information.
- Prescription - Provider prescribed medications for patients by date, number of refills, dosage and fill date.
Our fact tables look similar to the image 1.
For the purpose of our analysis requirements, the fact tables above cover the level of granularity we're looking for.
Step 4 - List Dimension Tables
We will list the dimensions required for our UDM design.
Our business rule states that all the dimensions presented below are derived from a patient encounter, with the exception of Prescriptions. The Prescription fact table is patient centric, so doesn't require the patient encounter instance. The fact tables "FACT_ENCOUNTER" and "FACT_PRESCRIPTION" tell us we'll need the following dimensions to support our design:
- Date - Provides date information from a start period to end; the date values may be added at any point to extend the range.
- Patient - Patient demographics table.
- Provider - Provider identification details, specialty and location.
- Diagnosis - Diagnosis code and description.
- Procedure - Procedure code and description.
- Prescription (RX) - Prescription code and description
Note: Optionally, the Prescription fact table may also be modified to including the RX fulfillment details and/or be required to derive from an Encounter. If this is the case, we can simply add a dimension to hold RX Fulfillment details; additionally, we could create an Encounter dimension so that encounter id's may be referenced from the FACT_PRESCRIPTION table. We will not be performing this task for our design.
We will now update our Bus Matrix to reflect the changes to the UDM design, to include the additional dimensions.
We've predefined each of the dimensions with the bare-minimum attributes. Our UDM is taking shape and looks similar to below.
Step 5 - Optimizing the UDM Design
For this step, we want to think about reducing the potential size of our fact table(s) and increasing the flexibility to filter. Our approach to implement a many-to-many bridge between our fact and commonly inserted dimension keys.
Many-to-Many (Bridge Group)
If we continue with the design above, for each diagnosis or procedure we would need to insert a distinct record into the Encounter fact table, which corresponds to the code (e.g., diagnosis code) for a specific patient encounter and date, or we would have created predefined list of columns (e.g., dx_1, dx_2, dx_3, and so on...). This approach grows the fact table exponentially and/or limits our filtering to the predefined number of columns during our analysis - we want to see all patient diagnosis, procedures and prescriptions for our analysis.
What if we could insert one diagnosis key into the fact table, which in return groups one or many diagnosis or procedures for the patient encounter? We can accomplish this feat by creating a Bridge Group table which joins between the fact and the related dimension table. For example, a patient may receive three or more diagnosis during the encounter. With the many-to-many approach, we can create a new diagnosis group key into the GROUP table, insert that key into the BRIDGE_GROUP table to assign related diagnosis codes for the patient encounter. Finally, we insert the single Group Key to the Encounter fact table. This makes it easy to query against the bridge table and get all related diagnosis from a single group key. To demonstrate this concept, please see the image below.
What's going on here?
- The GROUP_DX table is used to generate our diagnosis group key to be used within the BRIDGE_GROUP_DX table and FACT_ENCOUNTER table.
- The BRIDGE_GROUP_DX will hold one or many diagnosis codes for the patient encounter.
- The DIM_DIAGNOSIS is where related diagnosis system codes reside.
- The FACT_ENCOUNTER holds all patient encounter details and diagnosis or procedure codes.
- We will eventually use this model for our Analysis Services cube design, so the BRIDGE_GROUP_DX table will act as an Intermediate Measure Group from within the Analysis Services Dimension Usage, Many-to-Many type.
Our final UDM design will look similar to below. We've listed the ETL sequence below for help with understanding data load.
- This model may be extended to include additional fact tables. For example, you may want to add an Observation, Claims, or Lab Orders, and so on.
- You may add additional dimensions to support new research analysis requirements. For example, Network, Status (junk dims), Type of Bill, Member Eligibility, Place of Services, and so on.
- The UDM may be used for querying the model directly, from an OLAP cube, or from advanced statistical analysis scripting tools like R and Python, or from Azure Machine Learning.
This tutorial provides a basic approach to defining a clinical research data mart. We taken a basic star schema and modified it to support a more flexible many-to-many relationship design, for our system code requirements. If you would like to learn more about the data warehouse modeling and concepts, I would recommend you visit the Kimball Group website, or https://www.microsoft.com/en-us/sql-server/data-warehousing. Optionally, you may contact our team for project support. You can reach us at +1 (877) 488-0566 or visit us online at www.infotoad.com.
Download the full ClinicalDM script below. Note: The script was created using SQL Server. You will need to create a database named "ClinicalDM" before running the script.
ClinicalDM.sql (24.72 kb)
Our next article/tutorial will cover implementing a Clinical Multidimensional Cube, using Analysis Services Online Analytical Processing (OLAP) tool.