The Late-Binding™ Data Warehouse: A Detailed Technical Overview
The Concept of Data Binding
Data can be “bound” to business rules that are implemented as algorithms, calculations, and inferences acting upon that data. Examples of binding data to business rules in healthcare include:
- Calculating length of stay (LOS)
- Attributing a primary care provider to a particular patient with a chronic disease
- Calculating revenue (or expense) allocation and projections to a department or physician
- Defining general disease states for patient registries
- Defining patient exclusion criteria for disease/population management
- Defining patient admission, discharge, and transfer rules
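As a rough illustration of what binding data to a business rule can look like, the sketch below keeps a raw encounter record separate from two alternative length-of-stay rules. The record layout, field names, and both rule definitions are assumptions made for this example, not definitions taken from any particular warehouse or standard.

```python
from datetime import datetime

# Hypothetical raw encounter record; field names are illustrative only.
encounter = {
    "patient_id": "P001",
    "admit_time": datetime(2024, 3, 1, 22, 30),
    "discharge_time": datetime(2024, 3, 4, 9, 15),
}

def los_midnights(enc):
    """One possible LOS rule: number of midnights crossed during the stay."""
    return (enc["discharge_time"].date() - enc["admit_time"].date()).days

def los_elapsed_days(enc):
    """An alternative LOS rule: elapsed time in fractional days."""
    delta = enc["discharge_time"] - enc["admit_time"]
    return delta.total_seconds() / 86400

# The same raw data yields different answers depending on the rule bound to it.
print(los_midnights(encounter))               # 3
print(round(los_elapsed_days(encounter), 2))  # 2.45
```

Because the raw record is stored unchanged, either rule (or a future third one) can be applied without reloading or restructuring the data.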
Data can also be bound to vocabulary terms, for both local and industry standards. Examples of vocabulary binding include:
- Patient identifier
- Provider identifier
- Location of service
- Diagnosis code
- Procedure code
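Vocabulary binding can be pictured as a lookup from a source system's local term to a standard term. The following minimal sketch assumes a hypothetical source system that records diagnoses with local codes ("DM2", "HTN") and maps them to ICD-10 codes; the local codes, field names, and the `bind_diagnosis` helper are invented for illustration.

```python
# Hypothetical mapping from local diagnosis codes to ICD-10 codes.
LOCAL_TO_ICD10 = {
    "DM2": "E11.9",  # type 2 diabetes mellitus without complications
    "HTN": "I10",    # essential (primary) hypertension
}

def bind_diagnosis(record, mapping):
    """Return a copy of the record with a standard code attached.

    The original local code is preserved; unmapped codes yield None
    rather than being dropped or guessed.
    """
    standard = mapping.get(record["local_dx_code"])
    return {**record, "icd10_code": standard}

raw = {"patient_id": "P001", "local_dx_code": "DM2"}
bound = bind_diagnosis(raw, LOCAL_TO_ICD10)
print(bound["icd10_code"])  # E11.9
```

Keeping the mapping in its own table means a vocabulary change requires updating the mapping, not the stored source data.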
Knowing when and how tightly to bind data to rules and vocabularies is critical to the agility, and ultimately the success or failure, of a data warehouse. In healthcare, the risks of binding data too tightly to rules or vocabularies are particularly high because the industry changes so rapidly. Business rules and vocabulary standards in healthcare are among the most complex in any industry, and they undergo almost constant change.
Lessons From Software Engineering
The idea of late binding in data warehousing borrows from lessons learned in the early years of software engineering. In those years, software development was characterized by very large programs: it was common to write hundreds of thousands of lines of code in a single module that supported numerous and widely different business functions. The code for these varied business functions was tightly bound (also known as coupled) together all at once, at compile time. Writing and troubleshooting these large programs was time-consuming. If one piece of the program failed at compile time, the entire program failed. It was all-or-nothing programming. Moreover, if new business rules or requirements called for changes after compile time, the entire program had to be modified, recompiled, and placed back in service, often with significant downtime. Agility suffered enormously.
Object-Oriented Programming and Late Binding™
In the 1980s, software engineering practices changed significantly, moving away from large, tightly coupled, early-binding programs. Alan Kay, of the Universities of Colorado and Utah and Xerox PARC, introduced the concepts of late binding and object-oriented programming. This new approach was based upon two radically new ideas: (1) writing code in smaller modules, or objects, modeled after the real-world processes and services that the software was designed to support, and (2) binding these software objects at run time, not compile time, and only when those objects were needed to support the services they reflect.
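The second idea, binding at run time, can be sketched in a few lines of Python, a language in which method calls are resolved when they execute. The class and function names below are hypothetical and chosen only to echo the healthcare examples in this paper; this is an illustration of the general technique, not a reconstruction of Kay's own code.

```python
class Service:
    """Base type for small objects modeled after real-world services."""
    def handle(self, patient_id):
        raise NotImplementedError

class AdmissionService(Service):
    def handle(self, patient_id):
        return f"admitted {patient_id}"

class DischargeService(Service):
    def handle(self, patient_id):
        return f"discharged {patient_id}"

def process(service, patient_id):
    # Which handle() runs is decided here, at run time, from the
    # class of the object that was actually passed in.
    return service.handle(patient_id)

print(process(AdmissionService(), "P001"))  # admitted P001
print(process(DischargeService(), "P001"))  # discharged P001
```

A new kind of service can be added later as another small class without modifying or redeploying `process`, which is the agility benefit described above.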
Alan Kay’s new concepts for software engineering sat underutilized and largely unknown, confined to PARC and academic circles, until Steve Jobs founded NeXT. Jobs was not a programmer, but he instinctively understood the elegance of Kay’s concepts. Object-oriented, late-binding software engineering became standard practice at NeXT, which opened the door to commercial, large-scale adoption of Kay’s philosophies. Steve Jobs receives due credit for his innovation and leadership at Apple, but by making object-oriented, late-binding software a new commercial norm at NeXT, he also helped lay the groundwork for the entire software revolution in Silicon Valley. The agility, scalability, and performance of platforms such as Amazon, Google, Facebook, and Salesforce were enabled by this new approach to software engineering.
Data Engineering and Late Binding™
After witnessing and reflecting upon the failure of several multimillion-dollar data warehousing projects in the US military, Dale Sanders, Senior Vice President for Strategy at Health Catalyst, saw the same patterns in data engineering as those in software engineering prior to object-oriented programming. Early and tight binding of data to rules, models, and vocabulary created unnecessary complexity that delayed time-to-value and produced a fragile, inflexible data warehouse infrastructure that could not adapt to rapidly changing analytic use cases or new data content.
In the late 1990s, while Sanders was employed by TRW Inc., he was sponsored by the Pentagon