The Late-Binding™ Data Warehouse: A Detailed Technical Overview
to study advanced decision support in nuclear warfare operations—a project called the Strategic Execution Decision Aid (SEDA). He turned to the healthcare industry for what he expected to be role-model examples of computer-aided analytics to drive better decisions in time-critical, life-critical situations but instead found almost no examples, with the notable exception of a scattered few at Intermountain Healthcare in Salt Lake City. Intermountain clearly possessed the culture and willingness to fully leverage data for improving care, but the industry at large was many years behind. Anticipating the eventual demand for analytics in the industry, he made a career transition from the military, national intelligence, and manufacturing sectors into healthcare.
Sanders’s late-binding data engineering concept is now fundamental to Health Catalyst’s data warehouse platform. The Late-Binding™ Data Warehouse enables time-to-value that is measured in days and weeks, not months and years, and has proven many times more scalable and adaptable to new analytic use cases and data content than the methodologies that utilize early binding, tightly coupled data models and vocabulary management.
Data Binding Points in an Enterprise Data Warehouse
There are six points in a data warehouse at which data can be bound to rules and vocabularies. As the data flows from left to right in the diagram below, points 1 and 2 are appropriate for binding to rules and vocabularies that exhibit low volatility; that is, those that change infrequently, such as patient identifiers and provider identifiers. Late binding, at points 5 and 6, is appropriate for rules and vocabulary that are likely to change on a regular basis, or for which no standard rule or vocabulary exists. For example, binding in the visualization layer is appropriate for what-if scenario analysis associated with modeling different reimbursement models or defining disease states. Once that exploratory what-if phase is complete, the new models and definitions can be locked down and bound in points 3, 4, or 5.
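As a rough illustration of binding late, in the analysis or visualization layer, the following Python sketch keeps source rows unbound and applies a candidate disease-state definition only when a cohort is requested. Everything here is an assumption made for illustration: the row layout, the `lab_value` field, the threshold numbers, and the `cohort` helper are hypothetical and are not taken from the Health Catalyst platform or from any clinical guideline.

```python
from typing import Callable, Dict, List

# Hypothetical source rows, stored exactly as delivered (no rule bound yet).
RAW_RESULTS: List[dict] = [
    {"patient_id": "P1", "lab_value": 6.3},
    {"patient_id": "P2", "lab_value": 6.7},
    {"patient_id": "P3", "lab_value": 7.4},
]

# Candidate definitions under what-if evaluation. The thresholds are
# arbitrary placeholders, not clinical criteria.
DEFINITIONS: Dict[str, Callable[[dict], bool]] = {
    "candidate_a": lambda row: row["lab_value"] >= 6.5,
    "candidate_b": lambda row: row["lab_value"] >= 7.0,
}


def cohort(rows: List[dict], definition_name: str) -> List[str]:
    """Bind the named definition to the raw rows at query time."""
    rule = DEFINITIONS[definition_name]
    return [row["patient_id"] for row in rows if rule(row)]


if __name__ == "__main__":
    # The same unbound data answers both what-if questions.
    for name in DEFINITIONS:
        print(name, cohort(RAW_RESULTS, name))
```

Because the rule is applied when the data is read rather than when it is loaded, trying a different definition means adding an entry to `DEFINITIONS`; the stored rows are not reloaded or remodeled. Once a definition stabilizes, the same rule could be moved to an earlier binding point.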
It is a best practice to retain a record of the bindings in the data warehouse. This record will allow analysts to quickly run models on rules and vocabulary (e.g., ICD-9 to ICD-10) that change over time, which is helpful for forecasting and predictive analytics. Health Catalyst recommends embedding the history of vocabulary and business rule binding into the data structures of the data warehouse so that they become a self-contained configuration control library that can easily be used to retrace analytic history when necessary.
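One way to picture a retained binding record is a table of bindings with effective dates, so that an analyst can ask which vocabulary binding was in force on a given day. The Python sketch below is a minimal illustration under stated assumptions: the `Binding` fields, the placeholder codes (`SRC-001`, `OLD-A`, `NEW-A`), the vocabulary names, and the dates are all hypothetical, and this is not presented as the structure Health Catalyst uses.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class Binding:
    """One row of a hypothetical binding-history table."""
    source_code: str              # code as captured in the source system
    bound_code: str               # code it was bound to in the warehouse
    vocabulary: str               # vocabulary the bound code belongs to
    effective_from: date          # first day the binding applied
    effective_to: Optional[date]  # last day it applied; None = still in force


# Placeholder history: the same source code bound to two vocabularies over time.
BINDING_HISTORY: List[Binding] = [
    Binding("SRC-001", "OLD-A", "vocab_v1", date(2010, 1, 1), date(2015, 12, 31)),
    Binding("SRC-001", "NEW-A", "vocab_v2", date(2016, 1, 1), None),
]


def binding_as_of(history: List[Binding], source_code: str,
                  as_of: date) -> Optional[Binding]:
    """Return the binding in force for source_code on as_of, if any."""
    for b in history:
        if b.source_code != source_code:
            continue
        started = b.effective_from <= as_of
        not_ended = b.effective_to is None or as_of <= b.effective_to
        if started and not_ended:
            return b
    return None


if __name__ == "__main__":
    print(binding_as_of(BINDING_HISTORY, "SRC-001", date(2012, 6, 1)))
    print(binding_as_of(BINDING_HISTORY, "SRC-001", date(2020, 1, 1)))
```

Keeping superseded rows instead of overwriting them is what makes it possible to rerun an older analysis under the binding that applied at the time, or to compare results across a vocabulary change.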
Knowing what to bind, and when, in the flow of data through a data warehouse requires more than technical skills. Data engineers and architects who work in a Late-Binding Data Warehouse environment must possess a strategic understanding of the short- and long-term evolution of the entire industry. They must appreciate the historic volatility of vocabulary and business rules and be able to predict the velocity and specifics of that volatility in the future. Healthcare is undergoing changes to business rules and vocabulary at an unprecedented rate. Data warehouses must be designed to keep pace with the market, and the Late-Binding architecture has a proven track record of agility and adaptability to new rules, vocabularies, and data content that other designs have not matched.
Data Modeling and Data Binding
Relational data models are inherently binding: they bind data to business rules and relationships. When developing transaction-based applications for capturing data, a data model is an important aspect of the application design and data integrity strategy. In a data warehouse, however, data models can inhibit adaptability to new analytic use cases. Below are the current options for modeling data in a data warehouse, listed in order of progression, from early to late binding.
The Inmon, Kimball, and I2B2 approaches to data modeling are inherently early binding. They require all source system data to be mapped into predefined data models, a process called conformance and normalization. The terms imply exactly what is required—data that was modeled and captured in disparate source transaction systems must conform to a new data model in the data warehouse. While at first this approach might appear reasonable, in practice, it leads to major problems when applied to the healthcare industry.
In analytic environments where data content, use cases, data rules, and vocabulary change infrequently, such as the retail industry, where the data model is largely reflected in the simplicity of a transaction receipt, the Kimball and Inmon approaches are adequate. In the healthcare industry, where the data environment is much more complicated than a sales receipt and the analytic use cases are constantly changing, these early-binding data models can be disastrous for agility and initial time-to-value. The process of mapping and conforming data to these early-binding models in a healthcare delivery data warehouse typically takes 18–24