About the Core Clinical Data Model
CoreCDM.Rmd
Core CDM is a set of curated tables in a Cloud Data Warehouse (CDW) to address most clinical data science applications and use-cases.
Core CDM is designed to reduce the friction of acquiring and analyzing raw data from the CDW while leveraging the strengths of the Snowflake data platform.
Core CDM is patterned after other popular clinical data models (ie MIMIC-III and OMOP ) that have a handful of de-normalized tables that are more intuitive and require less effort to combine and prepare for analysis than typical database models such as ‘star schemas’.
- Snowflake optimization: Snowflake has two primary cost drivers - data storage and data computation. Data storage is relatively inexpensive and computation is more expensive, particularly with complex queries with numerous table joins. A best practice in Snowflake is to denormalize complex queries or views to static tables.
Core CDM addresses a few issues with the existing CDW architecture:
Different primary keys: The existing main databases all use different primary keys, which makes it more difficult to join Unity data with Lucidity data for example. Core CDM uses ‘PAT_ENC_SID’ as the primary key for nearly all tables.
Data delays: While SagaLucidity is updated daily for the previous 24hrs, other CDW databases have a delayed refresh cadence that generally requires the account coding to be complete, which can be 3-7 days after discharge.
Core CDM is updated every morning at 7am with the data from the previous 24 hours, including current admissions and discharges BEFORE coding is complete. This means that the same data model can be used for retrospective projects based on coding as well as more timely projects based on Saga documentation and clinical data.
-
Non-standard naming conventions: Tables in
SagaLucidity, Unity, Marimba, Canoodle, CPS datamarts and other
databases use different naming conventions for the same data fields and
data types. In addition, the column names frequently are named
differently than the data types (ie. columns with
_TIME
actually have dates, columns with_DATE
have datetime values, foreign keys are character columns in one table and numeric in another etc.)
Core CCDM uses a controlled vocabulary where:
- date fields end in
_DT
in ISO format - datetime fields end in
_DTS
in ISO format - Boolean fields end in
_FLAG
and are coded to 0/1
- Numeric keys end in
_ID
- Text keys end in
_CD