What is the Best Healthcare Data Warehouse Model for Your Organization? (Webinar)

Comparing Healthcare Data Warehouse Approaches:

A Deep-dive Evaluation of the Three Major Methodologies

February 2014

[Steve Barlow]

Thank you, Tyler.  Good day everyone.  It’s great to be with you.  I’m looking forward to spending some more time talking about these approaches in healthcare data warehousing.

A Personal Experience with Healthcare

I want to start with some anchoring context.  And first of all, let me share some personal experiences I have had with healthcare that I think might lay some applicable groundwork.

First experience was with my mother, who has been a type 2 diabetic for about 15 years.  When I was working for Intermountain Healthcare and we were building our first data warehouse, the first registry our clinical teams asked us to assemble was a diabetes registry.  Two weeks after we built the registry and it went live, I was visiting Mom, and there on the countertop at her house was a report generated out of my data mart.  It reminded her, as a diabetic patient, to go in and see her endocrinologist to get her hemoglobin A1c tested, and that she should plan on doing that annually.  That was a defining moment in my life and in my professional career: recognizing the power data has to change and affect the quality of life of the patients served in the US healthcare system and abroad.  That’s one.

The second one was a trip to the doctor.  I had an office visit and was given two prescriptions by the physician.  I took those scripts to the pharmacy and the pharmacist said, “That will be $490”.  They were two topical ointments: one a cream and one a foam.  And I said, “You have got to be kidding me.  Is there any less expensive alternative to these scripts?”  And he said, “Oh, certainly.  Would you like me to call the physician and arrange for an alternate script?”  And I said, “Absolutely.”  So the pharmacist called the doctor and changed the scripts, and I left the pharmacy paying $13 for the prescriptions.  Again, a defining moment in my professional career, realizing there are serious challenges in the healthcare system and that providers typically don’t have a good handle on the cost of the care delivered.

So that’s context one.

Healthcare Analytics Goal

Why have an EDW?

The other thing I’d like to just level set a little bit on is why we have a data warehouse to begin with.  I think too often, we, especially in the IT ranks and the data ranks, immediately go to debating what data model should be the right data model for our application.  And we forget why we are building a data warehouse and why it’s needed in the first place.  So we are always asking ourselves and asking others in the industry and our clients, why do you need to build the data warehouse, how will it be used, what is the business and/or clinical imperative.

A data warehouse is a means to a greater end; it is not the end.  It should exist for the following three high-level reasons.  First, it should be used to support improving the effectiveness of care delivery, including safety.  Second, it should be built and used to support improving the efficiency of care delivery, e.g., workflow.  Let’s make sure that we move our patients through the emergency department as optimally as we can.

Let’s define one way to do conscious sedation, so that you have one standard, best-practice way for conscious sedation and medication administration across the healthcare enterprise.  Third, and ultimately, we want to reduce the time it takes to achieve #1 and #2.  We want to reduce the mean time to improvement: the time from when the data are available, through defining an intervention, to when care is actually improved, made safer, or made more efficient.  We always want to reduce that mean time to improvement as much as possible.  So again, level setting: those are our goals, the pie-in-the-sky stuff.
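To make “mean time to improvement” concrete, here is a minimal Python sketch. It assumes each improvement cycle is recorded as a pair of dates, when the data became available and when care was measurably improved; the function name and the dates are hypothetical, and a real program would first have to agree on what counts as each event.

```python
from datetime import date
from statistics import mean


def mean_time_to_improvement(cycles):
    """Average days from data availability to measurably improved care.

    cycles: iterable of (data_available, care_improved) date pairs.
    """
    return mean((improved - available).days for available, improved in cycles)


# Hypothetical improvement cycles.
cycles = [
    (date(2013, 1, 1), date(2013, 3, 2)),  # 60 days
    (date(2013, 2, 1), date(2013, 3, 3)),  # 30 days
]
print(mean_time_to_improvement(cycles))  # 45
```

Shortening any single cycle lowers the mean, which is the lever the speaker describes.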

Three Systems of Care Delivery

We also don’t see a data warehouse as the only part of the solution.  As healthcare organizations, both payers and providers, embark on this journey to use analytics to improve care, we see three major subsystems that are very important in this journey.

Analytic System

The first, and the one we will certainly spend most of our time on today, is the analytic system.  There needs to be a measurement system to help us, one, identify where we have variation in the care delivered across our system, and then prioritize our greatest opportunities.  Next, that system also needs to help us define a population of patients, events, or processes, measure what the standard of care should be, and then track compliance with that standard of care over time.  So that’s the analytic system.
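A minimal Python sketch of the measurement loop just described: given one defined population, compute each site's compliance with a standard of care and rank the sites that fall short, largest gap first. The population, the site names, the annual-A1c standard, and the 90% target are hypothetical illustrations, not figures from the webinar.

```python
from collections import defaultdict


def compliance_by_site(patients):
    """patients: iterable of (site, met_standard) tuples for one defined population."""
    totals = defaultdict(int)
    compliant = defaultdict(int)
    for site, met_standard in patients:
        totals[site] += 1
        compliant[site] += int(met_standard)
    return {site: compliant[site] / totals[site] for site in totals}


def prioritize(rates, target):
    """Rank sites below the target rate by the size of their gap, largest first."""
    gaps = {site: target - rate for site, rate in rates.items() if rate < target}
    return sorted(gaps, key=gaps.get, reverse=True)


# Hypothetical diabetic population: (site, had an A1c test in the last year)
population = [
    ("North", True), ("North", True), ("North", False), ("North", True),
    ("South", True), ("South", False), ("South", False), ("South", False),
]
rates = compliance_by_site(population)  # North 0.75, South 0.25
print(prioritize(rates, target=0.9))     # ['South', 'North']
```

Re-running the same calculation each period is one simple way to track compliance over time.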

Content System

The content system is the evidence, both internal and external to an enterprise, for the best practice in managing a given process or population.  So what does the evidence tell us is the best way to manage a population of heart failure patients?  What data do we need to measure, how do we empirically define that population of heart failure patients, and what metrics should we track over time to monitor our management of that population?

Deployment System

So then finally, the third is the deployment system.  Once we’ve empirically defined that population of patients or events, and we’ve defined what the standard, the best practice, should be, either a care process or a workflow, then we need some kind of framework or methodology to deploy that standard across the system, so we’re doing it the best way every time, at every site, and in every setting.

So those are the three major systems.  And again, an analytic system alone is not going to result in improvements.  We’ll need to marry all three of these subsystems.

Population Health Management

We also want to change our focus.  Historically there’s been a bit of a (07:20) mentality in healthcare, and frankly in other industries: let’s identify the outliers and cut the tails off the normal bell-shaped curve.  In reality, what we want to do is focus on the inliers and push the whole curve, heightening it and shifting it toward more excellent outcomes, as opposed to just focusing on the outliers.  Philosophically, that’s the approach we all should be taking in healthcare improvement.

Healthcare Analytics Adoption Model

I’m not going to spend a lot of time on this; you have seen it before if you attended some of our other webinars.  But just to put the Healthcare Analytics Adoption Model up, it’s a model we helped develop in collaboration with others in the industry to provide a framework against which organizations can measure themselves and track and target their maturation in analytics, analogous to the EMR Adoption Model.

Really quickly, from the bottom up: at Level 0, there really is no analytic solution, per se.  At Level 1 we begin to co-locate data in a centralized repository.  At Level 2 we begin to standardize vocabularies.  At Levels 3 and 4 we can automate some of our reporting.  At Level 5 we begin to focus on reducing waste and improving efficiency in care delivery.  At Level 6 and above, we really get into managing populations of patients proactively.

So leaving this up, I’d like to pause and just issue a poll question to you all very briefly about where you might see your organizations measured against this model.

Tyler, do you want to ask the question?

Polling Question

[Tyler Morgan]

Absolutely.  So our poll is up.  At what level in the Analytics Adoption Model would you rate your health system’s analytic solutions?  So we’ll leave this poll up for another 5 to 10 seconds to give everyone a chance to respond.

Okay, we’re going to go ahead and close that poll now and let’s see our results.

[Steve Barlow]

Interesting.  That’s great.  That frankly does not come as a huge surprise.  We believe that healthcare as a whole is in its infancy in using analytics, far behind other vertical industries.  So that’s a great representation.  Thank you for participating in that poll.

An Analyst’s Time

Another thing, as we contemplate building a data warehouse and go back and (10:27) ourselves to what our purpose is: we see, too often, very well-prepared and trained statisticians and knowledge workers, who are a rare commodity in healthcare, spending the vast majority of their time on wasteful activities, wasteful from the perspective of operating at the top of their license.  They were trained to interpret and analyze information, yet they spend most of their time hunting, gathering, and manipulating data, and only a small percentage of their time actually doing what they were trained to do.  We need a solution that will flip that on its head and allow analysts to spend the bulk of their time analyzing and interpreting data.

HR – Desired State

Another interesting concept that we have seen is typically as we build analytic systems, we see a distribution like you see in the pyramid in the upper left corner here where there’s a large number of viewers, and that makes sense.  You’ll have folks, consumers of information who receive passive information.  They receive the report in their mailbox, they go out and look at the dashboard report for 5 seconds and they’re on their way.  And then there’s a set of drillers who do a little more interaction with the data and then authors or knowledge workers who are the ones producing information.

We actually see a need for a different distribution, especially in healthcare.  We need to increase the number of authors and knowledge workers to enable the discovery of best practices.  We still certainly see a need for viewers, but fewer drillers.  So the analytic approach and solution needs to enable and facilitate that kind of distribution.

Comparison of prevailing approaches

Enterprise Data Model

I want to quickly go over, at a high level, the three main approaches that we have seen and experienced in building data warehouses and analytic solutions; we’ll go into detail later in the webinar.  We’ve actually got experience in all three of these.

The first is the Enterprise Data Model, which contemplates building an integrated, centralized repository that includes all the information we could ever want to analyze, unencumbered by the source data we already have in our data ecosystem.  So let’s lock ourselves away, model the information we want to measure, and then go back to the systems that contain information within our healthcare system and map and manipulate the data to fit into this model.  That’s option one.  I’ve made the arrows that move data from the source systems into the new model large for a reason: there is a lot more manipulation and transformation in moving data from the source into a target data model that does not necessarily reflect the source.

In theory, if I’m building an (13:50) key system, this makes tremendous sense.  If I’m building an analytic system, however, I have to consider the data that do exist in my source systems rather than being unencumbered by them.

Enterprise Data Model – Still need Subject Area Marts

Option two… actually, before I leave option one: I still need to get to data marts.  In healthcare, if I’m going to manage populations of patients and populations of events, I still need to build registries, if you will, for those populations.  So the enterprise model gets you only so far.  I still need to take the next step to get to data marts.

Bill of Materials Conceptual Model

To give the next model some brief context: the Star Schema, or dimensional, or vertical summary data mart model, was really born out of this bill of materials conceptual model.  In manufacturing, retail, and insurance, this bill of materials model is very predictable.  Typical analyses performed in these industries are simple counts or simple aggregations by various dimensions.

Star Schema Conceptual Model

The Star Schema was designed with that in mind.  I want to be able to count the number of transactions by location, by product, by date, by purchase, as in the example here, and do various aggregations and counts.  It works tremendously well in that scenario.
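The bill-of-materials style of analysis the Star Schema was built for can be sketched in a few lines of Python. This is an illustrative in-memory stand-in, not a database design: the dimension and fact tables are hypothetical, and each fact row points to exactly one member of each dimension, so counts and sums by any dimension are straightforward.

```python
from collections import Counter, defaultdict

# Hypothetical dimension tables: surrogate key -> descriptive label.
location_dim = {1: "Salt Lake City", 2: "Denver"}
product_dim = {10: "Flour", 11: "Eggs"}

# Hypothetical fact table: one row per sales transaction, holding only
# foreign keys into the dimensions plus a numeric measure.
sales_fact = [
    {"location_key": 1, "product_key": 10, "date": "2014-02-01", "amount": 3.50},
    {"location_key": 1, "product_key": 11, "date": "2014-02-01", "amount": 2.25},
    {"location_key": 2, "product_key": 10, "date": "2014-02-02", "amount": 3.50},
    {"location_key": 1, "product_key": 10, "date": "2014-02-03", "amount": 3.50},
]


def count_by(dimension, key_column):
    """Count fact rows grouped by one dimension's label."""
    return Counter(dimension[row[key_column]] for row in sales_fact)


def sum_by(dimension, key_column, measure):
    """Sum a measure grouped by one dimension's label."""
    totals = defaultdict(float)
    for row in sales_fact:
        totals[dimension[row[key_column]]] += row[measure]
    return dict(totals)


print(count_by(location_dim, "location_key"))        # Counter({'Salt Lake City': 3, 'Denver': 1})
print(sum_by(product_dim, "product_key", "amount"))  # {'Flour': 10.5, 'Eggs': 2.25}
```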

Vertical Summary Data Marts

And if you translate that into our world, what we could build quickly are these data marts for a given constituency, population of patients, or domain.  So I can provision a diabetes data mart, although I have some challenges from a data modeling perspective that we’ll explore here soon.  And then I can go back to my source systems and migrate data into these data marts.  But what ends up happening is I don’t have an atomic-level data warehouse.  I’m trying to bring data in to provision very quickly to my respective constituencies, one for each of these data mart boxes.  And I end up with a maintenance challenge over time, as well as impacting the source systems repeatedly for each new extract and each new data mart.  So this approach has its challenges as well.

Adaptive Data Warehouse

A final, alternative approach that we have seen work very well in healthcare is a more adaptive one.  Let’s identify the data that exist in these various source systems.  They are the (16:43) data sets that we need to analyze.  And let’s bring those data in with minimal transformation; the arrows are smaller to indicate less transformation.  Let’s bring those in but link them, as opposed to remodeling, manipulating, and integrating them.  Link them through what we call a data bus: a standard, common vocabulary for core data elements such as patients, providers, locations, diagnoses, procedures, etc.
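A minimal Python sketch of linking through a data bus, assuming two source marts that each keep their own patient identifier and a bus table that maps each (source, local id) pair to one enterprise patient id. All names and values are hypothetical; the point is that the source rows are related through the shared key rather than remodeled.

```python
# Hypothetical source marts: rows kept exactly as each source system stores
# them, each with its own patient identifier and column names.
emr_encounters = [
    {"mrn": "E-100", "visit_date": "2014-01-06", "dx": "250.00"},
    {"mrn": "E-200", "visit_date": "2014-01-09", "dx": "428.0"},
]
lab_results = [
    {"lab_patient_id": "L-7", "test": "HbA1c", "value": 8.1},
    {"lab_patient_id": "L-9", "test": "HbA1c", "value": 6.4},
]

# Hypothetical data bus entry for the core "patient" element: it maps each
# (source, local id) pair to one enterprise patient id.
patient_bus = {
    ("emr", "E-100"): "P1", ("lab", "L-7"): "P1",
    ("emr", "E-200"): "P2", ("lab", "L-9"): "P2",
}


def link(emr_rows, lab_rows, bus):
    """Group both sources' untouched rows under the enterprise patient id."""
    linked = {}
    for row in emr_rows:
        pid = bus[("emr", row["mrn"])]
        linked.setdefault(pid, {"encounters": [], "labs": []})["encounters"].append(row)
    for row in lab_rows:
        pid = bus[("lab", row["lab_patient_id"])]
        linked.setdefault(pid, {"encounters": [], "labs": []})["labs"].append(row)
    return linked


linked = link(emr_encounters, lab_results, patient_bus)
print(linked["P1"]["labs"][0]["value"])  # 8.1
```

Adding a new source would mean adding its rows and its bus entries, without reshaping the existing source marts.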

Once we have those data in (and we can bring them in very quickly because we’re applying minimal transformation), we can quickly build subject area marts from those sources, which we call source marts.  There is more transformation at that step, but, as a preview of what we’ll talk about in a few minutes, we are binding the data late, as opposed to binding early as data move from the source systems into the source marts of the data warehouse.

Classic Star Schema Deficiencies

Before I turn it over to my colleague, let me wrap up by explaining the classic Star Schema deficiencies.  Again, in healthcare we’re managing populations, and we’re managing many-to-many relationships.  The Star Schema is built around one-to-many relationships.  It is designed for counting transactions; it isn’t as concerned with events, with states of change over time, or with related states such as co-morbidities and attribution.

Sample Diabetes Registry Data Model

So here, if we look conceptually at a population of patients, in this case diabetics, it’s not just a count of a given product by location and other dimensions.  I need to know how many diabetics I have; what their last A1c, eye exam, and foot exam were; what the history was over the last two years for each of these measures; what medications they are on; what the history of their office visits is; what comorbid conditions they might have; and what interventions have been employed to help them improve their A1c rates.  That is far more complicated than a simple bill of materials construct.
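To show why a registry strains a one-to-many star, here is a small Python sketch that assembles registry rows from atomic data: a history of A1c results per patient and a many-to-many patient-to-condition list for comorbidities. The data, the field names, and the 365-day eye-exam window are hypothetical assumptions for illustration.

```python
from datetime import date

# Hypothetical atomic rows feeding a diabetes registry.
a1c_results = [  # (patient_id, result_date, a1c_percent)
    ("P1", date(2013, 3, 1), 8.4), ("P1", date(2013, 9, 1), 7.6),
    ("P2", date(2013, 6, 15), 6.9),
]
conditions = [  # many-to-many: one patient, many conditions
    ("P1", "diabetes"), ("P1", "heart failure"), ("P1", "hypertension"),
    ("P2", "diabetes"),
]
eye_exams = {"P2": date(2013, 11, 2)}  # most recent eye exam per patient


def diabetes_registry(as_of):
    """Build one registry row per diabetic patient as of a given date."""
    registry = {}
    diabetics = {pid for pid, cond in conditions if cond == "diabetes"}
    for pid in sorted(diabetics):
        history = sorted((d, v) for p, d, v in a1c_results if p == pid and d <= as_of)
        last_exam = eye_exams.get(pid)
        registry[pid] = {
            "last_a1c": history[-1][1] if history else None,
            "a1c_history": [v for _, v in history],
            "comorbidities": sorted(c for p, c in conditions if p == pid and c != "diabetes"),
            "eye_exam_in_last_year": last_exam is not None and (as_of - last_exam).days <= 365,
        }
    return registry


reg = diabetes_registry(as_of=date(2014, 2, 1))
print(reg["P1"]["last_a1c"], reg["P1"]["comorbidities"])  # 7.6 ['heart failure', 'hypertension']
```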

So to help illustrate and understand these concepts, we’re going to walk through an exercise; hopefully some of you were introduced to it in the video.  I’m going to turn it over to Cherbon for a few minutes now to walk us through some fun exercises.

Measurement System Exercise Webinar

[Cherbon VanEtten]

Thank you, Steve.  So based on feedback from people like yourselves, what we’ve designed is a short analogy that compares data warehousing to shopping.  We used this analogy to highlight in a non-technical way the differences between the three models Steve just described.

The Enterprise Shopping Model

Let’s start by looking at the enterprise shopping model on the screen.  Notice that it’s well-organized and highly structured.

Next, I’m going to put up on the screen a list of items you need to get at the store.  I want you to map those items into the enterprise shopping model.  While doing that, imagine you get a call from your significant other asking you to also get the following items.


Okay.  Now that you’ve attempted this exercise, let’s take a quick poll.

Describe your experience with the enterprise shopping model exercise.  Did it work great, were you frustrated, or did you stop trying?

[Tyler Morgan]

Alright.  We’ve got that poll up and live.  I will just leave this open just for a few seconds to give you a chance to answer that.  Okay.  We’re going to go ahead and close that poll now and let’s share the results.

Enterprise Data Model (Technology Vendors)

[Cherbon VanEtten]

So, I’m not surprised.  It looks like three-fourths of our audience was either frustrated or just stopped trying.  This shopping approach is much like the enterprise data model Steve described, which works very well in industries like retail and banking, where the data you need to capture are more standardized and stable over time.  This model breaks down in healthcare because medical knowledge is always expanding and changing, so it’s impossible to anticipate at the outset what new data will look like and how they will fit into your model.  Additionally, concepts like length of stay and readmission rates end up having different definitions for different audiences.

Using a dimensional model in Healthcare is kind of like shopping for data like this…

So, going to our next exercise, we’re going to look at the dimensional shopping model.  In this model, instead of a shopping list, we have specific recipes we need to make: chocolate chip cookies, a cake, and an apple pie.  Prior to today’s webinar, we sent you a link to a short video that depicts this experience.


I want to just take a quick poll to see how many of you were able to watch the video prior to the webinar.

[Tyler Morgan]

Alright.  We’ve got that poll up.  It looks like folks are voting.  We’ll leave this open for just another 5 seconds…

And the poll is closed and here are the results.

[Cherbon VanEtten]

Great.  So it looks like many of you were able to watch the video, and I hope you enjoyed it.  For those of you who were not able to watch it, I will go through and highlight what was depicted.

Dimensional Shopping Model – Cookie

So we start with our shopper getting a call from the school board to bring cookies to a board meeting.

She goes to the grocery store to get exactly what she needs to make the cookies.  Two cups of shortening, four cups of flour, four eggs, and so on.

Now, imagine our shopper returning home to bake the cookies and getting a second call requesting to bring a cake.

So back to the store she goes to purchase many of the same ingredients.

She returns home to get yet another call and the video ends as our exasperated shopper heads to the store for a third time.

The Dimensional Shopping Model

So this dimensional shopping trip started out fine for our shopper, who got exactly what she needed.  But as soon as we added another recipe, and then another, the redundant trips to the store became exhausting.  So think about how many trips to the store we would need to make.

Dimensional Data Model (Healthcare Point Solutions)

This shopping approach is like the dimensional data model, which starts out really great with a couple of impressive point solutions for a few departments.  But as the demand for analytics grows, the model becomes a mess of redundant data feeds from the source systems to multiple applications.  So organizations looking for an enterprise data strategy, rather than a few point solutions, would quickly become encumbered by the dimensional model, spending all their time designing trips to the store.

The Adaptive Shopping Model

Now, let’s explore our final model, the Adaptive Shopping Model.  In this exercise, you are given a shopping cart with a simple structure that lets you indicate the store you are shopping at and the items you need.  Using the same shopping list we saw earlier, imagine how you would fill out your adaptive shopping model.  Likely, you would select the stores you are familiar with and then organize the items in a way that reflects the layout of the store.  Then, when new items are added to the shopping list, if they don’t fit in the first cart, you can just grab another cart, and so on.

Shopping List Revisited

Now, let’s think about bringing all those items home back to your house.  Could you still make the recipes?  You can and you can do it without all the redundant trips to the store.

Adaptive Data Warehouse

This shopping approach is like the Health Catalyst Adaptive Data Model.  The carts represent source systems that are brought into the data warehouse, and they match the transactional systems exactly.  So it doesn’t take a lot of effort to bring that data into the warehouse; there is no manipulating or forcing the data into a model.  Additionally, you have all the data you need for the analytic applications being requested now and for those that will come in the future, and you can build all of them without having to go back to your sources for more data.  This approach works really well in healthcare because the data requirements are typically not well understood at the outset.  With the adaptive approach, you can build incrementally, adding source systems as well as subject area marts as you need them.


So let’s pause for another poll question.  At this point in our exercise, which shopping approach would you prefer: enterprise, dimensional, or adaptive?

[Tyler Morgan]

Alright.  We’ve got our poll up and I will leave that open for just a few seconds to give you a chance to answer that.

Alright.  We’re going to go ahead and close the poll now and let’s go ahead and share the results.

[Cherbon VanEtten]

Fantastic.  I agree with those that selected adaptive as well.

Like most analogies, this simple one doesn’t work perfectly, but hopefully it will help you remember the characteristics of and differences between each model.  And maybe more importantly, it will help you communicate with a less technical audience and help them understand the differences as well.


So for a final poll question, we would appreciate your feedback: did this analogy help you better understand the differences between the models?

[Tyler Morgan]

Alright.  We’ve got that poll question up.  Please take the time to fill it out.  We’d also like to let you know that our regular post-webinar survey includes an open-ended question where you can give your full feedback about the video, if you had the chance to watch it, and about this exercise.

I’m going to go ahead and close the poll now.

[Cherbon VanEtten]

So, looking at those results, thank you to those who were honest and said “no”.  Whether it’s during the question and answer time or in feedback you submit, we’d love to hear what we could have done differently to clarify the differences better.  Thank you for your participation.

[Steve Barlow]

Great.  Thank you, Cherbon.

To summarize the adaptive model, I want to highlight the three use cases that we see most often in analytics that leverage this model.

The first use case: let’s say I’m a lab administrator, my primary focus is the operations of my lab, and I primarily want metrics like the turnaround times in my lab.  In this approach, I now have a really good analytic repository for analyzing operational measures for my lab system.  I can live in this source mart only; I don’t need to go anywhere else in the data warehouse.
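As an illustration of the lab administrator's use case, here is a sketch of median turnaround time computed directly from rows kept in the lab system's own shape. The column names, the timestamp format, and the collection-to-result definition of turnaround are assumptions; a lab may define turnaround differently.

```python
from datetime import datetime
from statistics import median

# Hypothetical rows from a lab source mart, kept in the lab system's own shape.
lab_orders = [
    {"test": "BMP", "collected": "2014-02-03 08:00", "resulted": "2014-02-03 08:45"},
    {"test": "BMP", "collected": "2014-02-03 09:10", "resulted": "2014-02-03 10:40"},
    {"test": "BMP", "collected": "2014-02-03 11:00", "resulted": "2014-02-03 11:30"},
    {"test": "CBC", "collected": "2014-02-03 08:05", "resulted": "2014-02-03 08:25"},
]


def turnaround_minutes(row, fmt="%Y-%m-%d %H:%M"):
    """Minutes from specimen collection to result (one possible definition)."""
    delta = datetime.strptime(row["resulted"], fmt) - datetime.strptime(row["collected"], fmt)
    return delta.total_seconds() / 60


def median_tat_by_test(rows):
    """Median turnaround time per test type."""
    by_test = {}
    for row in rows:
        by_test.setdefault(row["test"], []).append(turnaround_minutes(row))
    return {test: median(values) for test, values in by_test.items()}


print(median_tat_by_test(lab_orders))  # {'BMP': 45.0, 'CBC': 20.0}
```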

Scenario two: if I’m an analyst assigned to support the diabetes care team, or any other care team, and most of my work will be managing that diabetic population, I can get all, or nearly all, of the information I need in this diabetes data mart, what we would call a subject area data mart.

And finally, if I’m a quality analyst, maybe assigned to various regulatory reporting or other types of analyses, and I need to link disparate data, connecting supplies data with clinical data out of the EMR, with satisfaction data, and with lab data, I can now easily navigate across these previously disparate data sources.  So this model really is meant to support those three key use cases that we see over and over again.

Late-Binding Deeper Dive

Data Modeling Approaches

Now, I’d like to take a deeper dive into this concept of Late-Binding ™.  By way of illustration, these models or approaches may be familiar to you.  We built a continuum to illustrate where each of them falls, from early binding to Late-Binding ™, and we’ll explain more of what that means.

So if you consider the corporate information model, or enterprise model, popularized by folks such as Bill Inmon and Claudia Imhoff, that is clearly an early binding approach, where we need to bind data to rules or vocabularies very early in the process of moving data from its source to its final resting place.

Another example of a more early binding approach is i2b2, popularized by academic medicine and intended to support research.

In the middle is the Star Schema, popularized by Ralph Kimball.  We still have to bind early, but we are very selective about the data we bring over from the source systems.  We provision just the data sets that support our business requirements.

Next is the Data Bus approach, popularized by Dale Sanders: let’s bring the data in from the source systems, connect them to a Data Bus, and link them as opposed to remodeling them.

And then finally, there are file structure association approaches such as MapReduce, Hadoop, and NoSQL, which are becoming more popular but have been around for a number of years.  There really is no binding at the database level or at the application level, only at run time.

Origins of Early vs Late Binding

So, the origins of early binding.  Early on in software engineering, if you recall the days of thousands of lines of code, tightly coupled, hard-coded, single compile, with all functions linked at compile time, that’s where software engineering learned the pains and challenges that can come with early binding.

In the early 1980s, object-oriented programming came into wider use, and we began to see the benefits of binding late, of decoupling and delaying coupling and binding.  Steve Jobs, at NeXT, in his (32:45) there, really began to bring it into the mainstream.

Data Binding in Analytics

If we take that concept and transfer it to the data warehousing and analytics space where we operate, we need to bind atomic data to business rules at some point in the analytics life cycle.  We need to bind to healthcare vocabularies such as ICD, SNOMED, LOINC, CPT, RxNorm, and so on.  There are also common, core elements or entities, such as patients, providers, diagnoses, procedures, and locations, that need to be consolidated and standardized.  Those are examples of binding data to definitions.
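Binding data to a vocabulary can be sketched as attaching a standard code to each row while keeping the local code beside it. The source names, local codes, and target codes below are hypothetical placeholders, not real LOINC codes; unmapped rows are kept with a `None` code so the gap stays visible and can be bound later.

```python
# Hypothetical map from (source system, local lab code) to a standard code.
# The target values are placeholders, not real LOINC codes.
LOCAL_TO_STANDARD = {
    ("hospital_a_lab", "A1C"): "LOINC-HBA1C",
    ("hospital_b_lab", "HGBA1C"): "LOINC-HBA1C",
    ("hospital_a_lab", "GLU"): "LOINC-GLUCOSE",
}


def bind_to_vocabulary(rows, mapping):
    """Return copies of rows with a standard_code added; local codes are kept."""
    return [
        {**row, "standard_code": mapping.get((row["source"], row["local_code"]))}
        for row in rows
    ]


rows = [
    {"source": "hospital_a_lab", "local_code": "A1C", "value": 7.2},
    {"source": "hospital_b_lab", "local_code": "HGBA1C", "value": 8.0},
    {"source": "hospital_b_lab", "local_code": "K", "value": 4.1},
]
bound = bind_to_vocabulary(rows, LOCAL_TO_STANDARD)
a1c_values = [r["value"] for r in bound if r["standard_code"] == "LOINC-HBA1C"]
print(a1c_values)  # [7.2, 8.0]
```

Because the local code survives, the mapping can be revised and reapplied without going back to the source systems.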

The next example is binding data to business rules: calculations and their variations, such as length of stay, patient attribution to a provider, or cost per case.  We need to identify where in the ecosystem and life cycle it makes the most sense to bind data to those rules.  And the principle, if we look at the lesson learned in software engineering, is that the later we bind, the more flexibility we give ourselves.
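Length of stay shows why the binding decision matters: two plausible rules give different answers for the same encounter. Both rules in this Python sketch are hypothetical examples, not definitions endorsed by the speakers. An overnight stay counts as 1 day under a midnight count and about 0.33 days under an elapsed-hours rule.

```python
from datetime import datetime


def los_midnights(admit, discharge):
    """Hypothetical rule A: number of midnights crossed (calendar-day difference)."""
    return (discharge.date() - admit.date()).days


def los_elapsed_days(admit, discharge):
    """Hypothetical rule B: elapsed time in fractional days, rounded to 2 places."""
    return round((discharge - admit).total_seconds() / 86400, 2)


admit = datetime(2014, 2, 3, 23, 30)
discharge = datetime(2014, 2, 4, 7, 30)
print(los_midnights(admit, discharge), los_elapsed_days(admit, discharge))  # 1 0.33
```

Binding either rule early bakes that answer into every downstream report; keeping admit and discharge timestamps lets the choice be made later.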

Analytic Relations

The key is to relate data, not model data

So if we look again at this idea in the Venn diagram, there are different vocabularies.  Maybe a health system has multiple EMR systems or multiple vocabularies.  There certainly is overlap between and among these vocabularies and systems.  From a vocabulary-standardization perspective, the highest value comes from those that intersect across all systems.  Those are candidates for an earlier binding approach.

Over on the left you see some core data elements that also need to be considered for an earlier binding.  But as it states here in black, the key is to relate data, not to model or remodel data.  Healthcare systems have purchased data models from their EMR vendors and other source system vendors.  Let’s not remodel the data.  That’s going to end up costing us a tremendous amount of money and time in all the transformation.

Six Points to bind Data

Let’s quickly look at the points where we can bind data.  What we’re advocating is Late Binding ™: bind as late as possible and feasible according to the business requirements.  We’re not advocating that everything be bound at one point.  Let’s be judicious and thoughtful about where we bind, and leave ourselves the most flexibility possible.

So let’s look at the points of binding.  Each column represents a different phase or stage in the ecosystem; the boundaries of the data warehouse or analytic system are the last three columns on the right, ending with data analysis.  The first point is in the source system column: data can be bound in the source system itself.  As an example, think of a business rule: maybe it’s cost per case, maybe it’s patient attribution to a provider, maybe it’s a readmission calculation.  We want a gold standard for those definitions, so we bind as early as possible, maybe even going into our EMR system, binding to those rules, and creating a new field in one of our source systems.  That’s an example of very early binding.  We don’t leave it up to anybody to make decisions in the analytic space; it’s done in the transaction space.

Second, you may not have all of the necessary inputs to bind to a rule in the source system, or in any single source system; we may need to cut across multiple source systems.  So there’s an opportunity to bind as we move data from a source system into what we call the source mart area.  If you remember the circle diagram, these are the purple circles, where data come with minimal transformation from the source system into a source mart.  In the extraction, transformation, and loading (ETL) routines from the source system to the source mart, there’s an opportunity to bind data.

Third, once all of the data have landed in the source marts, we have an opportunity to derive new data and bind rules there.  Using our example of patient-provider attribution, maybe we want to apply our algorithm after the data have landed within the data warehouse, in the source mart area, so we bind data here at point three.  Or maybe we want to delay further: the fourth point is to bind as data are moved from the source marts into a subject area mart for a population of patients or events.  The ETL from the source marts to the subject area mart is another opportunity to bind data to terminology and to rules.

Fifth, if we want to delay even further, we have an opportunity to bind at the data mart level. All of the inputs remain static and consistent up to this point. With all of the inputs readily available at the data mart layer, we can create a “bound” object that is the derivation for that concept while also retaining its discrete inputs. So we have left ourselves much more flexibility here.

Now, there may be derivations we absolutely want to bind earlier. Say we have exactly one way to calculate cost per case. Regardless of the level at which we analyze that concept, it needs to be bound earlier in the stream.

The last bind point is the visualization layer. There are cases where you retain all of the discrete inputs and bind the data only at the time of analysis. If you look down here at the bottom, the scale goes from low volatility to high volatility. Rules that are going to change constantly you may not want to materialize at the database layer; you may want to leave them to the reporting layer. Certainly there are issues with that as well, but it’s an option.

Binding Principles & Strategy

In summary, the principle behind data binding is to delay binding as long as possible, until a clear analytic use case requires it. The longer we delay, the more flexibility we have. There are, however, uses for early binding. Locking down a consistent definition or a consistent terminology is a well-justified reason to bind early.

Late binding at the visualization layer is appropriate for “what if” scenario analysis. If I’m going to change my model or algorithm at will, that’s a great place to bind late. If I have all the discrete elements available in the visualization layer, then I can do “what if” analyses.
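The following sketch contrasts late binding with materializing a flag. Discrete inputs (admit and discharge dates) are kept, and the readmission rule is evaluated at query time with a window the analyst can vary. The data, field names, and the denominator (all stays) are assumptions made for illustration.

```python
from datetime import date

# Discrete inputs retained all the way to the reporting layer (hypothetical fields).
stays = [
    {"patient_id": "P1", "admit": date(2014, 1, 2), "discharge": date(2014, 1, 5)},
    {"patient_id": "P1", "admit": date(2014, 1, 20), "discharge": date(2014, 1, 22)},
    {"patient_id": "P2", "admit": date(2014, 1, 3), "discharge": date(2014, 1, 4)},
    {"patient_id": "P2", "admit": date(2014, 2, 25), "discharge": date(2014, 2, 27)},
]


def readmission_rate(rows, window_days):
    """Late binding: the rule's parameter is chosen at query time; nothing is stored."""
    by_patient = {}
    for r in rows:
        by_patient.setdefault(r["patient_id"], []).append(r)
    readmits = 0
    for patient_stays in by_patient.values():
        ordered = sorted(patient_stays, key=lambda r: r["admit"])
        for current, following in zip(ordered, ordered[1:]):
            if 0 <= (following["admit"] - current["discharge"]).days <= window_days:
                readmits += 1
    return readmits / len(rows)  # assumed denominator: every stay


for window in (7, 30, 60):  # "what if" the window changed?
    print(window, readmission_rate(stays, window))
```

With this data, the run prints rates of 0.0, 0.25, and 0.5 for the three windows. Each answer comes from the same retained inputs.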

Retain a record of the bindings from the source systems through the data warehouse. This gets back to the technical metadata: as we impose changes on data throughout the ecosystem, we want to track those changes in binding.

Also retain a record of changes to the vocabularies and rules to which we bind the data.
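The following sketch records bindings as technical metadata. It assumes a simple in-memory log in which each entry names the concept, the bind point, the rule version, and the vocabulary. The class, its fields, and the helper functions are hypothetical, not part of any product.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class BindingRecord:
    """One entry in a hypothetical binding log (technical metadata)."""
    concept: str        # e.g. "cost_per_case"
    bind_point: str     # e.g. "source_mart_etl", "subject_area_mart", "visualization"
    rule_version: str   # version of the business rule applied
    vocabulary: str     # terminology release bound to, if any
    recorded_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


binding_log = []  # append-only list of BindingRecord entries


def record_binding(concept, bind_point, rule_version, vocabulary="n/a"):
    entry = BindingRecord(concept, bind_point, rule_version, vocabulary)
    binding_log.append(entry)
    return entry


def history(concept):
    """All recorded bindings for one concept, oldest first."""
    return [e for e in binding_log if e.concept == concept]


record_binding("cost_per_case", "source_mart_etl", "v1")
record_binding("cost_per_case", "source_mart_etl", "v2")
print([e.rule_version for e in history("cost_per_case")])
```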

Thank You!


[Steve Barlow]

Great!  Thank you.  We now have time for some questions.

[Tyler Morgan]

Alright. We’d like to remind everyone that you can ask questions by typing them in the Questions and Answers pane in your control panel. It looks like we’ve got some questions ready for you, Steve.

[Steve Barlow]

Great.  I’ll just read through the questions here.

How is a source mart different from an operational data store? 

That’s a great question, and we may be dealing with semantics. An operational data store is typically a replica of one transaction system with no transformation; the data comes in whole. From a characteristic perspective, a source mart is very similar to an operational data store. The difference we would call out is that a source mart conforms to a standard and gets hooked into the Data Bus, if you will. So, first, there is minimal transformation: we rename the structures, tables and columns, to conform to a standard. Second, there is often some de-normalization from the source into a source mart. An operational data store, by contrast, is typically a replica used for reporting off of that one system alone.
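The following sketch shows the “minimal transformation” step that distinguishes a source mart from a plain replica: column names are mapped to a naming standard, and values pass through untouched. The source column names and the standard names are hypothetical.

```python
# Hypothetical mapping from one EMR's native column names to a shared naming standard.
COLUMN_STANDARD = {
    "MRN_NO": "patient_mrn",
    "VISIT_NUM": "encounter_id",
    "ADM_DT": "admit_datetime",
}


def to_source_mart(row, mapping=COLUMN_STANDARD):
    """Rename columns to the standard; unmapped columns are just lower-cased."""
    return {mapping.get(col, col.lower()): value for col, value in row.items()}


print(to_source_mart({"MRN_NO": "123", "VISIT_NUM": "V9", "ADM_DT": "2014-01-02", "UNIT_CD": "4W"}))
```

Lower-casing unmapped columns is one possible default. A stricter design would reject any column that lacks a standard name.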

Next question, if you have two different EMR applications, would you create two EMR source marts?

Excellent question. Yes, the answer is yes. We would create two source marts for those split EMRs because we philosophically believe we should always have all of the atomic-level data available to us. Now, in that scenario, there might be a need for what we call a core entity consolidation layer. In that layer, you very, very selectively consolidate key entities: patients, if you don’t have a master patient index; providers, if you don’t have a master provider index; locations, diagnoses, procedures, labs, and meds. That’s about it. So you would have a consolidation layer between the source marts and the subject area marts where you combine, not integrate, but combine.
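The following sketch illustrates “combine, not integrate.” Every row from each EMR source mart is kept, and a consolidated key is assigned through a crosswalk. Matching on last name plus date of birth is a deliberately naive stand-in for a real master patient index algorithm, and all names here are hypothetical.

```python
# Two EMR source marts kept separately, each with its own local patient IDs (hypothetical).
emr_a_patients = [{"local_id": "A-100", "last": "Smith", "dob": "1950-04-01"}]
emr_b_patients = [
    {"local_id": "B-7", "last": "SMITH", "dob": "1950-04-01"},
    {"local_id": "B-8", "last": "Jones", "dob": "1962-09-30"},
]


def consolidate_patients(sources):
    """Build a crosswalk from (source, local_id) to a consolidated patient key."""
    crosswalk, ids_by_match = [], {}
    for source_name, rows in sources.items():
        for row in rows:
            match_key = (row["last"].upper(), row["dob"])  # naive matching rule
            consolidated_id = ids_by_match.setdefault(match_key, f"C{len(ids_by_match) + 1}")
            crosswalk.append({"source": source_name, "local_id": row["local_id"],
                              "consolidated_id": consolidated_id})
    return crosswalk


for entry in consolidate_patients({"emr_a": emr_a_patients, "emr_b": emr_b_patients}):
    print(entry)
```

The source marts themselves are untouched. Only the crosswalk is new, so the atomic data stays available.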

Do you move up the levels in sequence, or can you jump over some levels?

I assume you’re referring to the Analytics Adoption Model. I would note that the Analytics Adoption Model isn’t necessarily sequential. For example, you may already be doing some interesting things in your enterprise relative to predictive analytics, yet have no data warehouse. So you’re already doing some of those advanced analytical practices. As for moving up the sequence, if you read each level in detail, it gives you a good roadmap of what you need to do to move to the next level.

How do you pull data fields beyond the 20 bus items from multiple EMRs when, for example, ED lab turnaround time is measured differently in each EMR?

Exactly, and I think we addressed this earlier. That would be an example where you define a standard definition for ED lab turnaround time and then bind to it in the consolidation layer. Now, I would raise a red/yellow flag, let’s call it an orange flag, about how selective you need to be in that consolidation layer. It can quickly turn into the Enterprise Data Model approach, and you end up taking years before you provide analytic solutions to your constituencies.
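The following sketch binds ED lab turnaround time to a single standard definition in the consolidation layer. It assumes the standard is “minutes from order to final result.” The per-EMR adapters and their field names are hypothetical; each adapter re-derives the standard interval from raw timestamps rather than trusting an EMR’s native metric.

```python
from datetime import datetime


def standard_tat_minutes(ordered_at, resulted_at):
    """Assumed enterprise standard: minutes from order to final result."""
    return (resulted_at - ordered_at).total_seconds() / 60


# Hypothetical per-EMR adapters; each EMR names (and natively measures) the interval differently.
ADAPTERS = {
    "emr_a": lambda r: standard_tat_minutes(r["order_dttm"], r["final_result_dttm"]),
    # EMR B's native metric starts at specimen collection, so it is ignored here.
    "emr_b": lambda r: standard_tat_minutes(r["ord_time"], r["rslt_time"]),
}


def bind_ed_lab_tat(source, row):
    return ADAPTERS[source](row)


row_a = {"order_dttm": datetime(2014, 2, 1, 10, 0), "final_result_dttm": datetime(2014, 2, 1, 10, 45)}
row_b = {"ord_time": datetime(2014, 2, 1, 11, 0), "rslt_time": datetime(2014, 2, 1, 12, 30), "native_tat": 60}
print(bind_ed_lab_tat("emr_a", row_a), bind_ed_lab_tat("emr_b", row_b))
```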

How do you best handle privacy and security in the Adaptive Model? How do you mitigate an intrusion?

The Adaptive Model has the same challenges from a security perspective as any of the other approaches. We advocate a strong, robust data stewardship approach that combines database security, application-level security, and policy-level security. Data stewards are assigned to each source data mart as well as each subject area data mart, and they are the ones who authorize access. No access is given except through the appropriate security teams and whatever approach and process you have for authorizing access.
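The following sketch models only the policy-level piece of that stewardship approach: each mart has a named steward, only that steward can grant access, and everything else is denied by default. The registry and function names are hypothetical. In practice, enforcement would live in database and application security, not in a Python dictionary.

```python
# Hypothetical stewardship registry: each mart has one steward who authorizes access.
STEWARDS = {"source_mart_emr_a": "steward_jones", "subject_mart_diabetes": "steward_lee"}
grants = set()  # (user, mart) pairs approved by that mart's steward


def grant_access(approver, user, mart):
    if STEWARDS.get(mart) != approver:
        raise PermissionError(f"{approver} is not the steward for {mart}")
    grants.add((user, mart))


def can_read(user, mart):
    """Default deny: no access unless the mart's steward has granted it."""
    return (user, mart) in grants


grant_access("steward_lee", "analyst1", "subject_mart_diabetes")
print(can_read("analyst1", "subject_mart_diabetes"), can_read("analyst1", "source_mart_emr_a"))
```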

How does a Snowflake Schema fit within the early binding concepts?

Great question. I remember reading an article that Ralph Kimball wrote in the mid-1990s, when the Star Schema was being written about and becoming very, very popular. He wrote about using the Star Schema in healthcare and said, “I’ve discovered a place where we need to add what’s called a helper table.” That’s kind of the precursor to the Snowflake. The Star Schema was meant to be a compact structure supporting counts and aggregations of transactions: a fact table in the center surrounded by the various dimensions. A Snowflake basically connects two or more Star Schemas via shared dimensions. So it’s another way to deal with the many-to-many relationships that the dimensional model typically doesn’t support well, whereas the relational model does.
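The following sketch shows the helper table in the form it is commonly described today: a bridge table between a fact table and a multi-valued dimension. Here it resolves the many-to-many relationship between encounters and diagnoses. It uses Python’s built-in `sqlite3` with hypothetical table names and data. It shows the general bridge-table pattern, not a reproduction of the article mentioned.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE dim_diagnosis (diagnosis_key INTEGER PRIMARY KEY, icd9 TEXT);
-- Helper (bridge) table: one row per diagnosis in a diagnosis group.
CREATE TABLE diagnosis_group_bridge (group_key INTEGER, diagnosis_key INTEGER);
CREATE TABLE fact_encounter (encounter_id TEXT, diagnosis_group_key INTEGER, charges REAL);

INSERT INTO dim_diagnosis VALUES (1, '250.00'), (2, '401.9'), (3, '428.0');
INSERT INTO diagnosis_group_bridge VALUES (10, 1), (10, 2), (20, 1), (20, 3);
INSERT INTO fact_encounter VALUES ('E1', 10, 1200.0), ('E2', 20, 5400.0), ('E3', 10, 800.0);
""")

# Count encounters carrying each diagnosis, resolving the many-to-many via the bridge.
rows = con.execute("""
    SELECT d.icd9, COUNT(DISTINCT f.encounter_id)
    FROM fact_encounter f
    JOIN diagnosis_group_bridge b ON b.group_key = f.diagnosis_group_key
    JOIN dim_diagnosis d ON d.diagnosis_key = b.diagnosis_key
    GROUP BY d.icd9 ORDER BY d.icd9
""").fetchall()
print(rows)
```

Summing `charges` through the bridge would double count encounters with several diagnoses. That pitfall is one reason a bridge is awkward in a pure dimensional model.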

Now, to be clear, we are not advocating that Star Schemas and OLAP structures should never be used. What we are advocating is, first, that the atomic-level data warehouse be a relational source, again allowing you optimal flexibility. Then, as you build the subject area data marts, those also should be atomic and relational. There are so many many-to-many relationships, as we noted on the slide earlier, and resolving those many-to-many relationships cannot be done with a Star Schema. Now, certain measures or content within a subject area data mart, such as supplies, finance, and claims, may lend themselves well to an OLAP structure. In that case, build a Star Schema on top of that data mart if need be, so your tools can leverage that structure. That’s entirely acceptable, and again, you’re allowing yourself optimal flexibility.

How is the subject area data mart in Late-Binding ™ structured in comparison to the fact and dimension tables of a dimensional data mart? I’m struggling to comprehend the difference.

I think I may have just explained that. The subject area data mart is a relational data model. As you saw on the slide with the sample diabetes registry, a patient has many-to-many relationships to medications, visits, labs, vital signs, providers, and so on. So that subject area data mart is relational; I would call it second normal form, if you will, in terms of normalization levels. It is not a Star Schema or dimensional structure. But again, that does not preclude you from building a dimensional structure, e.g. a Star Schema, on top of it where the types of analyses dictate. If you’re going to do primarily counts and transaction aggregations, then it might make sense to build a Star Schema on top of that.

In the Adaptive Data Model, what is the minimum data latency for EMR data used in interactive engineering reports and reminders?

Some thoughts there, philosophically: you can reduce the latency of a data warehouse as much as your business requirements and your budget allow. I would say the default ought to be a daily load, every 24 hours. If you have requirements dictating a more frequent feed from the EMR into the source marts, then understand the consequences of that decision. You’re imposing a set of high-availability requirements on a data warehouse, not unlike what you would impose on a transaction system. So we advise our clients to be very careful and to ensure that the business requirements truly dictate real-time or near-real-time latency. Most often, when they think it through, the requirements don’t.

Now, there are justifications for making it near real time. An example might be readmission prediction. For a patient who was just admitted, we want to know their history and what we are predicting from a readmission perspective, so we need the ADT data more frequently than once per day. In that case, consider bringing in just that subset of data in a more real-time fashion rather than imposing a real-time requirement on your entire data warehouse.
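The following sketch refreshes only a subset of data intraday. It assumes a hypothetical ADT feed with an `updated_at` timestamp and uses a high-water mark. Each pull returns only new admit events since the last watermark, while the rest of the warehouse stays on its daily load. The feed layout and the event filter are assumptions.

```python
from datetime import datetime

# Hypothetical ADT feed rows; only this small subset is refreshed intraday.
adt_feed = [
    {"encounter_id": "E1", "event": "admit", "updated_at": datetime(2014, 2, 3, 8, 5)},
    {"encounter_id": "E2", "event": "admit", "updated_at": datetime(2014, 2, 3, 9, 40)},
    {"encounter_id": "E3", "event": "discharge", "updated_at": datetime(2014, 2, 3, 10, 15)},
]


def incremental_pull(rows, watermark, events=("admit",)):
    """Return new rows of interest since the watermark, plus the advanced watermark."""
    newer = [r for r in rows if r["updated_at"] > watermark]
    wanted = [r for r in newer if r["event"] in events]
    # Advance past every row seen, even ones filtered out, so they are not re-scanned.
    new_watermark = max((r["updated_at"] for r in newer), default=watermark)
    return wanted, new_watermark


batch, wm = incremental_pull(adt_feed, datetime(2014, 2, 3, 9, 0))
print([r["encounter_id"] for r in batch], wm)
```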

How is the concept of binding different from transformation? If they are not different, I guess the change from ETL to ELT can be considered a later binding approach.

Very, very well said. I would say transformation is not necessarily equivalent or synonymous with Late-Binding ™. Sometimes we transform structures for ease of analytic use, for example de-normalizing from third or fourth normal form down to first or second normal form. We’re not necessarily binding the data differently to rules; we are de-normalizing somewhat. We could argue that it is a form of binding. The binding we’re talking about is binding to business rules, e.g. what we talked about earlier: patient-provider attribution, length of stay, cost per case, readmission calculations. Those are the kinds of new calculations that are derived from various inputs.

And ELT certainly can be considered a later binding approach. In fact, the Adaptive Model is heavily ELT as opposed to ETL.
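The following sketch illustrates ELT as later binding, using `sqlite3` as a stand-in warehouse. Source rows are loaded untouched into a raw table, and the length-of-stay rule lives in a SQL view inside the warehouse, so it can be redefined without re-extracting anything. The table names and the rule (whole days between discharge and admit) are assumptions.

```python
import sqlite3

con = sqlite3.connect(":memory:")

# Extract + Load: land the source rows untouched in a raw staging table.
con.execute("CREATE TABLE raw_encounter (enc_id TEXT, admit_dt TEXT, disch_dt TEXT)")
con.executemany(
    "INSERT INTO raw_encounter VALUES (?, ?, ?)",
    [("E1", "2014-01-02", "2014-01-05"), ("E2", "2014-01-20", "2014-01-22")],
)

# Transform later, inside the warehouse: the rule is a view, not a rewritten copy.
con.execute("""
    CREATE VIEW encounter_los AS
    SELECT enc_id, CAST(julianday(disch_dt) - julianday(admit_dt) AS INTEGER) AS los_days
    FROM raw_encounter
""")
print(con.execute("SELECT * FROM encounter_los ORDER BY enc_id").fetchall())
```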

Is binding simply creating referential integrity between databases?

Not necessarily, but it certainly is a part of it. In the case of patients and providers, suppose you have multiple, disparate EMRs in your enterprise and you need to create a master patient index. Then you’re binding and linking those disparate concepts. But referential integrity of the kind found in a source system, where primary and foreign keys are enforced, is not something we advocate in the data warehouse, especially in the Adaptive Model. Since we’re not transforming the data in a significant way, it’s less necessary to impose referential integrity (RI) yet again in the data warehouse.

Now, in the enterprise model and also in the dimensional model, enforcing referential integrity is far more important, because I’m creating a net-new structure and I need to enforce that referential integrity.

I think we have time for a couple more questions here.

What modeling approach is used in the custom subject area data marts (adaptive, etc.)?

Yeah, I think we addressed this question as well. The subject area data marts are relational/adaptive, not dimensional. However, a Star Schema certainly could be built on top of a subject area data mart where business needs require it.

Another question: You didn’t list Business Objects (BO) among your visualization tools. Do you see any constraints in applying the Business Objects tool to the adaptive reporting system?

The answer is no. That was not meant to be an all-inclusive list of visualization tools. There’s nothing that would preclude you from using Business Objects with the adaptive approach.

I think we have time for maybe one more. Those were fast questions.

Does each source mart have its own semantic layer, or is there one semantic layer for the entire Late-Binding ™ Bus Architecture?

“Semantic layer” is a very overloaded term, so let me answer using the two predominantly accepted definitions. Semantic layer A is binding to a standard vocabulary, like LOINC, RxNorm, CPT, ICD-9, etc. That is taken care of, where needed, in the entity consolidation layer, and it would apply across the entire adaptive warehouse. Semantic layer B is the semantic model used by many visualization tools such as Business Objects; an example would be the Business Objects Universe. You really have flexibility to do that anywhere: you could build a universe over a subject area mart, over one source mart, or over multiple source marts.
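The following sketch illustrates the first sense of a semantic layer: binding local lab codes to a standard vocabulary in the consolidation layer. The local codes are hypothetical. The two LOINC codes shown (4548-4 for hemoglobin A1c and 2345-7 for serum/plasma glucose) are given for illustration; any production mapping should be validated against the current LOINC release.

```python
# Illustrative local-code-to-LOINC map; the (source, local code) keys are hypothetical.
LOCAL_TO_LOINC = {
    ("emr_a", "HGBA1C"): "4548-4",  # Hemoglobin A1c/Hemoglobin.total in Blood
    ("emr_b", "A1C"): "4548-4",
    ("emr_a", "GLU"): "2345-7",     # Glucose [Mass/volume] in Serum or Plasma
}


def bind_to_loinc(source, local_code):
    """Return the LOINC code, or None so unmapped codes can be queued for review."""
    return LOCAL_TO_LOINC.get((source, local_code))


print(bind_to_loinc("emr_a", "HGBA1C"), bind_to_loinc("emr_b", "A1C"), bind_to_loinc("emr_b", "XYZ"))
```

Keying the map on the source system as well as the local code keeps two EMRs’ identically named but different tests from colliding.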

I apologize that we didn’t get to all of your questions, but I do believe we’re out of time.