Early- vs. Late-Binding Speed: Which Is the Faster Data Warehouse?
When it comes to early- vs. late-binding models of data warehousing, which is faster for healthcare? Which will allow end users to skip the long wait times, the guessing, and the back-and-forth, and get the information they need? How fast is late binding? I’ll skip the drama and suspense and tell you straightaway that the Late-Binding™ approach to building a healthcare enterprise data warehouse (EDW) is the faster and more flexible option. Read on if you want to know why.
In the context of data warehousing, late binding has to do with how data is modeled within the EDW. A data warehouse pulls data from many different source systems into one place. To make this data meaningful, it is “bound” to other elements for analysis. Binding is the point at which the data in your EDW is tied to a healthcare vocabulary (like SNOMED) or business rule and bound to a specific format. An example of a business rule is the definition of length of stay (LOS). Do you define LOS as the time from admit to discharge? From registration to discharge? From presentation in the ED to discharge?
The Disadvantages of Binding Early
Early-binding data warehousing approaches require you to—you guessed it—bind your data to business rules early in the process of building the EDW. This means that from the moment you start building your data warehouse, you have to decide what your definition of LOS, for example, will be. In fact, you have to decide upfront what the definition of every business rule will be. Once your definition is set in the data model, it is very difficult and time-consuming to change it afterward. This means that you had better be sure in advance that you want your definition of LOS to be the time from admit to discharge.
One pitfall of the early-binding approach is that it takes longer to get your EDW up and running, because you have to model your entire EDW and define every business rule from the outset. In fact, an early-binding EDW can take as long as one to two years to deliver. But the most notable pitfall of these approaches is, as I already mentioned, that you risk painting yourself into a corner. When you bind early, you get stuck with a rigid data model that doesn’t adapt well. Early binding can work very well in industries like retail or banking, where business rules are stable. But in healthcare, business rules and definitions are changing all the time, especially in today’s evolving industry. You need more flexibility.
The Advantages of Binding Late
The late-binding approach is all about speed and flexibility.
Here’s why the late-binding approach is faster: You don’t have to hammer out your entire data model or hash out all of your business rule definitions upfront. If you’re not going to do any analysis of diabetes outcomes in the first year of your EDW initiative, then you don’t have to worry about defining business rules dealing with diabetes. When you’re ready to tackle diabetes the next year, you probably already have most of that raw data in the EDW and you can just define the business rules. In other words, with the late-binding approach, you can start with the use cases you’re ready for and then add others as you’re prepared to address them.
Here’s why the late-binding approach is so flexible: It enables you to bind data to business rules as late in the data-modeling process as possible. In fact, the late-binding approach advocates binding data at the last moment that it makes sense to do so—and only when a specific use case calls for it.
Let’s go back to the example of defining LOS. LOS can be defined in many ways not only by different hospitals but also by different departments within hospitals. This means that agreeing on a definition of LOS upfront that will serve everyone’s needs can be quite a headache—and will end up satisfying only one (and maybe none!) of those needs. (It’s worth noting here that a good data governance policy can alleviate a lot of these issues around differing definitions and help everyone achieve their goals.) With the late-binding approach, this is a non-issue. You simply pull all the relevant data—admit date, registration date, ED presentation date, etc.—into the EDW, and then when you’re ready to tackle a specific use case, you can decide on the definition of LOS for that use case alone.
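To make the idea concrete, here is a minimal Python sketch of binding LOS late. It is only an illustration, not how any particular EDW product implements it: the field names, the sample timestamps, and the `length_of_stay_hours` helper are all hypothetical. The point is that every raw timestamp is kept, and a definition of LOS is applied only when a use case asks for one.

```python
from datetime import datetime

# Raw encounter record as it might land in the EDW: every timestamp is kept,
# and no LOS definition is applied at load time. Field names are hypothetical.
encounter = {
    "ed_presentation": datetime(2024, 3, 1, 22, 0),
    "registration": datetime(2024, 3, 1, 23, 0),
    "admit": datetime(2024, 3, 2, 4, 0),
    "discharge": datetime(2024, 3, 4, 10, 0),
}

# Each use case names the start event it wants; the binding happens here, late.
LOS_START_EVENT = {
    "admit_to_discharge": "admit",
    "registration_to_discharge": "registration",
    "ed_to_discharge": "ed_presentation",
}


def length_of_stay_hours(record, definition):
    """Return LOS in hours under the named (hypothetical) definition."""
    start = record[LOS_START_EVENT[definition]]
    return (record["discharge"] - start).total_seconds() / 3600


for name in LOS_START_EVENT:
    print(name, length_of_stay_hours(encounter, name))
```

Because the raw dates are all present, two departments can use two different definitions against the same record without either one having to change the underlying data.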
A straightforward example of this flexibility occurred recently with a client I’m working with. This client had decided to define LOS as the time between admit and discharge. However, as we began analyzing LOS, we discovered that admit date wasn’t populated in their system nearly as well as registration date, which left them with far less data than they needed to run their analytics. They determined that registration date was close enough to admit date for the purposes of their analysis. Because of the flexibility of the late-binding data model, they were able to change their definition easily to the time between registration and discharge, and they were able to perform a much richer analysis. If the client had been using an early-binding EDW model, they would have been stuck.
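A rough Python sketch of that situation might look like the following. The rows are made-up sample data, not the client’s, and the `los_days` helper and field names are assumptions for illustration only. It shows why switching the start field matters when admit date is sparsely populated.

```python
from datetime import datetime

# Hypothetical raw rows: admit is often missing, registration is well populated.
rows = [
    {"registration": datetime(2024, 5, 1, 8), "admit": None,
     "discharge": datetime(2024, 5, 3, 8)},
    {"registration": datetime(2024, 5, 2, 9), "admit": datetime(2024, 5, 2, 12),
     "discharge": datetime(2024, 5, 4, 9)},
    {"registration": datetime(2024, 5, 5, 7), "admit": None,
     "discharge": datetime(2024, 5, 6, 19)},
]


def los_days(rows, start_field):
    """LOS in days for every row where the chosen start field is populated."""
    return [
        (r["discharge"] - r[start_field]).total_seconds() / 86400
        for r in rows
        if r[start_field] is not None
    ]


by_admit = los_days(rows, "admit")                # only rows with an admit date
by_registration = los_days(rows, "registration")  # every row is usable

print(len(by_admit), "usable rows with admit-based LOS")
print(len(by_registration), "usable rows with registration-based LOS")
```

In this sketch, changing the definition is a one-argument change because the raw registration dates were already loaded. Nothing in the stored data has to be remodeled.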
These types of situations happen all the time in healthcare. But with the late-binding approach, it’s pretty tough to paint yourself into a corner.