Evaluating an EHR-Centric vs Data Warehouse-Centric Analytics Strategy: Seven Points to Ponder

The relevance of analytics and a data-driven culture in today’s healthcare environment can be compared to the relevance of antibiotics in medicine in the 1940s. We are on the threshold of a new era in modern medicine, and the tipping points are data, technology, and how we combine them to fundamentally change the way we deliver care. To build this culture of data-driven care improvement, healthcare systems need analytic tools to support it. Through the following seven points, we examine the pros and cons of two primary tools, the electronic health record (EHR) and the enterprise data warehouse (EDW), as well as the infrastructure supporting them:

  1. Incorporating data from a wide range of sources
  2. Ease of reporting
  3. The data mart concept
  4. Relevance of each to value-based care
  5. Relevance of each to managing population health
  6. Surfacing results of sophisticated analysis for physicians at the right time
  7. Ability to combine best practices, data, and technology tools into a system of improvement

First, let’s examine the foundational characteristics and philosophies in this matchup of EHR versus data warehouse tools.

A Brief History of the Electronic Health Record

Understanding the history of EHRs and their original purpose informs their current capability.

Before the EHR, doctors had paper charts, essentially file folders with tabs and sections. Some were thick, some were thin, but the chart was the canonical clinical record of the healthcare institution’s interactions with a patient. The chart included clinical notes and the medication record. While bills and statements were not part of the chart, documenting the clinical record made it possible to get paid for services rendered.

A paper record could only be in one place at one time, so the industry sought the benefits of digitizing the patient chart. Thus was birthed the EHR, an electronic representation of clinical interactions with the patient. Its original purpose was to transition from paper documentation that supported billing and charting, to an electronic tool that supported billing and charting. This did away with the carts of charts and pneumatic tubes that moved patient information from the patient room to the diagnostic imaging area to the lab to the physician office, and so on. The EHR was designed as a point-of-care electronic way to capture data and, as it matured and grew in its capabilities, it expanded to other areas, providing software to operate the pharmacy, lab, diagnostic imaging, nuclear medicine, and all these other areas where processes could be automated. This made the clinical process much more efficient and allowed many different groups of people to collaborate on this process. The EHR was valuable, necessary, and useful.

This heritage is part of what EHRs are today. They certainly have expanded their capabilities, but clinicians have also expanded their needs for technology. Unfortunately, there is a relatively high level of dissatisfaction among physicians with their current EHR capability. Many feel that efficiency and productivity have decreased while costs have increased. There’s also a common perception that clinicians are using the EHR merely to document for regulatory and billing purposes, instead of using it as a tool to help care for patients.

The History of the Enterprise Data Warehouse

A discourse on the history of the EDW requires some coverage about the history of data-driven clinical improvement, how we apply best practices and evidence from literature, and how we measure performance in aggregate.

EHRs are 30 years old now. Healthcare providers have become more reliant on EHRs and other technology solutions to help them deliver care for their patients. Health Catalyst is working with one medium-sized academic medical center that has 425 different technology vendors helping it run its institution. In that complex environment, all these vendors have a specific purpose, providing and tracking specific sets of data, e.g., time management, computer hardware and software inventory, the core EHR, cost accounting, patient satisfaction, the OR scheduling system.

How can all this data be applied to deliver better care of patients? How can these assets be used to better understand how the business is performing? How can data be used to predict future performance? How can it be used to guide systemwide strategic decisions? How can it be used to reduce medical errors?

Most institutions have some sort of data analysis capability that comes from various vendors or from homemade tools. However, the demand for data and analysis almost always exceeds available capabilities. Deploying a robust enterprise data warehouse is usually the next major step forward in terms of data management and analysis. The EDW tends to be the one institutionally trusted source of truth that brings together all of the assets from the various technologies and data generators within the organization.

Data must be used to inform the care that is provided both to a single patient and to a panel of patients. This objective should inform and guide every element of a healthcare analytics platform. Because this objective is different from that of an EHR, a different tool is needed, such as a Late-Binding™ Enterprise Data Warehouse, that more effectively taps into physicians’ desire to have useful data that helps them deliver better care.

7-Point Comparison Between the EHR and the EDW

There are clear distinctions between the current characteristics of the EHR and the EDW that define their individual inabilities and capabilities across the vital frontlines of healthcare practice and delivery, outcomes improvement, value-based care, accountable care, and population health.

1. Incorporating Data from a Wide Range of Sources

EHRs historically have not been very good at incorporating data from multiple sources, whereas EDWs are very good at it.

There are two parts to this expertise:

  1. Acquiring the data wherever it’s located

    The data acquisition technique for EHR vendors is the Health Information Exchange (HIE), where a request is made for a patient’s clinical summary or a lab result via an HL7 transaction. EHR vendors have become good at this, but those kinds of transactions are not sufficient for most analytics purposes because access is needed to a broad swath of data. For example, an analyst may need discharge summaries for all patients over the last two years, not just for those discharged today. The HIE-centric method is an insufficient way to transmit data, and many of the standard exchange formats do not include data elements that are useful and valuable for many analytics purposes.

    Many healthcare systems want to do improvement projects related to labor and delivery. Common analytic use cases require gestational age, and the standard HL7 and HIE interchange formats don’t include data fields for gestational age, so there is no way for the HIE to reliably collect it. It can be done, but it requires customizing the transaction, and then all vendors have to customize the same way. This is not easily replicated multiple times and it’s the Achilles’ heel of the HIE/EHR duality.

    Data warehouses, on the other hand, generally use different methods of data acquisition. For example, a best practice is to take every table and every column directly from the database underlying the EHR and move them directly into the warehouse. This solves the problems of standard HIE transactions and gives the organization access to all of the data, including all of an organization’s patients, notes, and lab orders over a long time period. It also gives a broad view of the data in terms of the data elements available.

    In the case of a labor and delivery improvement project, all of the data that’s stored in the EHR is now available, including gestational age that’s recorded in the ultrasound template, the OB template, the maternal fetal medicine template, and in the observation unit when the mother comes in to deliver. This is all accessible in the data warehouse. Now there’s a systematic approach for pervasive data capture and collection. Data warehouses are better than EHRs at providing as much data as possible to facilitate exploration and correlations.

  2. Once collected, correlating all of the data

    Once collected, the capability must exist to link data across multiple sources. This requires techniques and technical infrastructure for linking a patient in data source A with a patient in data sources B and C. The same is true for linking providers from these three data sources. This is a capability that data warehouses are much better at than EHRs.

    As the number of data sources increases, the importance of this capability increases. An EHR can handle two or three sources, but with 20, 30, or 50 data sources (as is typical with most healthcare systems) that all need to be correlated, there’s a sophistication requirement that EHRs lack, but data warehouses have in abundance.
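To make the linking step concrete, here is a minimal sketch of matching the same patient across three sources. It assumes a simple deterministic match on normalized name and birth date; the record fields, source names, and function names are hypothetical, and production patient matching typically relies on more robust (often probabilistic) techniques than this.

```python
from collections import defaultdict


def match_key(record):
    """Build a deterministic match key from normalized names and birth date."""
    return (
        record["last_name"].strip().lower(),
        record["first_name"].strip().lower(),
        record["birth_date"],  # ISO date string, e.g. "1980-02-14"
    )


def link_patients(sources):
    """Group source-specific patient IDs that share the same match key.

    `sources` maps a source name to a list of patient records.
    Returns a dict: match key -> list of (source name, local patient ID).
    """
    linked = defaultdict(list)
    for source_name, records in sources.items():
        for record in records:
            linked[match_key(record)].append((source_name, record["local_id"]))
    return dict(linked)


# Hypothetical sample data: the same person appears in two sources
# with different local identifiers and inconsistent formatting.
sources = {
    "ehr": [
        {"local_id": "MRN-001", "last_name": "Rivera", "first_name": "Ana",
         "birth_date": "1980-02-14"},
    ],
    "claims": [
        {"local_id": "C-9912", "last_name": "RIVERA ", "first_name": "ana",
         "birth_date": "1980-02-14"},
    ],
    "satisfaction": [
        {"local_id": "S-17", "last_name": "Chen", "first_name": "Li",
         "birth_date": "1975-07-01"},
    ],
}

linked = link_patients(sources)
for key, ids in linked.items():
    print(key, ids)
```

The same grouping idea applies to providers. The difficulty the text describes comes from scale: with dozens of sources, each with its own identifiers and data quality problems, this matching logic needs dedicated infrastructure.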

2. Ease of Reporting

With regard to detailed transactional data, healthcare systems must be able to roll up, combine, and structure data in ways that are easy for internal stakeholders to digest for follow-up outcome reporting. This is a useful and valuable function and EDWs usually have a better infrastructure to do this than EHRs.

Fundamentally, the EHR is a transactional system designed and optimized for data capture. The performance metric that matters in an EHR is response time. When a doctor hits the button, the system has to be responsive to minimize wait time. When a nurse makes the electronic request, the EHR needs to pull up a chart quickly. This transactional function of the EHR has led it to be optimized for responsiveness within these transactional situations. EHR software and data architecture are generally designed to support the transactional nature of the users and the functions they need to perform. That style of data design is generally not ideal for reporting and analysis. Reporting becomes difficult because queries have to work around some of the fundamental design decisions that were made in the core EHR data structures.

A data warehouse almost always provides data structures that are optimized for analysis and reporting. Reporting queries run directly against the core EHR can degrade its performance and potentially slow the system while doctors are charting; running them in the warehouse avoids that risk. Also, with data structures that are designed for reporting rather than capture, it’s much easier to build summary tables and rollups and generally organize data in a way that’s better suited for reporting. Finally, some analysis is computationally expensive. In a data warehouse, it can be done once and then stored.
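The "compute once, then store" idea can be illustrated with a small sketch using an in-memory SQLite database. The table names, columns, and sample rows are hypothetical; the point is only that the rollup is materialized as its own table, so reports read the small summary instead of re-aggregating detailed transactions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE encounters (              -- hypothetical detail table
        encounter_id        INTEGER PRIMARY KEY,
        department          TEXT,
        admit_month         TEXT,          -- 'YYYY-MM'
        length_of_stay_days REAL
    );
    INSERT INTO encounters (department, admit_month, length_of_stay_days) VALUES
        ('Cardiology',  '2024-01', 3.0),
        ('Cardiology',  '2024-01', 5.0),
        ('Cardiology',  '2024-02', 4.0),
        ('Orthopedics', '2024-01', 2.0);

    -- Compute the rollup once and store it as a summary table.
    CREATE TABLE encounter_summary AS
    SELECT department,
           admit_month,
           COUNT(*)                 AS encounter_count,
           AVG(length_of_stay_days) AS avg_length_of_stay_days
    FROM encounters
    GROUP BY department, admit_month;
""")

# Reports query the stored summary, not the detailed transactions.
summary = conn.execute(
    "SELECT department, admit_month, encounter_count, avg_length_of_stay_days "
    "FROM encounter_summary ORDER BY department, admit_month"
).fetchall()

for row in summary:
    print(row)
```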

3. The Data Mart Concept

Data marts are the capability for data warehouses to massage and prep data into intermediate forms that are more suitable for analysis. Health Catalyst builds them as an intermediate data structure to facilitate reporting and analysis of data that was captured by one or more transactional systems.

It’s not that EHRs cannot perform this function, but data warehouses are typically much stronger in this area. For example, with an ERP system, data from employees, payroll, GL, A/P, and time tracking populate a reporting database that becomes the data mart for reporting purposes. Typically, an organization has very limited ability to modify or extend that reporting database. It is what it is. In a data warehouse, the vendor typically provides tools to materialize many different data marts from the core set of data to suit various analytics purposes. Creating data marts from a data warehouse offers flexibility and capability that typically aren’t available with the reporting databases provided in an EHR.

If there is a need to take patient satisfaction data and compare them with ERP system data, it could be accomplished in the EHR, but it’s usually difficult and inefficient because the EHR is not designed to have data added from these other systems. A warehouse supports data from many different data marts for many different analysis and reporting use cases. These data marts are created easily and rapidly inside the data warehouse, a function that’s not available inside the individual sources (EHR, financial system, patient satisfaction system, ERP system, etc.).

Facilitating Machine Learning


One of the reasons why it’s important to be able to create lots of data marts quickly is to enable more sophisticated types of analysis, such as machine learning. Machine learning algorithms need well-formed datasets to process, and a data mart is ideally suited to create these concisely defined sets. There are lots of avenues for further analysis with a data mart. The analytics organization needs tools for materializing those data marts quickly and for storing them efficiently. This is what a data warehouse does.
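As a rough illustration of what a "well-formed dataset" might look like, the sketch below builds a tiny data mart with one row per patient by combining encounter data with patient satisfaction scores. All field names and values are hypothetical, and a real warehouse would do this with its own tooling instead of in-memory Python structures.

```python
def build_data_mart(encounters, satisfaction_scores):
    """Return one row per patient: encounter count, total cost, mean satisfaction."""
    mart = {}
    for enc in encounters:
        row = mart.setdefault(
            enc["patient_id"],
            {
                "patient_id": enc["patient_id"],
                "encounter_count": 0,
                "total_cost": 0.0,
                "mean_satisfaction": None,  # stays None if no survey exists
            },
        )
        row["encounter_count"] += 1
        row["total_cost"] += enc["cost"]

    # Collect survey scores per patient, then attach the mean to the mart row.
    scores_by_patient = {}
    for survey in satisfaction_scores:
        scores_by_patient.setdefault(survey["patient_id"], []).append(survey["score"])
    for patient_id, scores in scores_by_patient.items():
        if patient_id in mart:
            mart[patient_id]["mean_satisfaction"] = sum(scores) / len(scores)

    return sorted(mart.values(), key=lambda r: r["patient_id"])


# Hypothetical inputs from two different source systems.
encounters = [
    {"patient_id": "p1", "cost": 1000.0},
    {"patient_id": "p1", "cost": 500.0},
    {"patient_id": "p2", "cost": 200.0},
]
satisfaction_scores = [
    {"patient_id": "p1", "score": 4},
    {"patient_id": "p1", "score": 5},
]

mart = build_data_mart(encounters, satisfaction_scores)
for row in mart:
    print(row)
```

Each row is a concisely defined observation that a machine learning algorithm could consume directly, which is the property the text attributes to data marts.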

4. EHR Versus EDW Under Value-Based Care

In order to effectively participate in value-based care programs, healthcare systems need claims, clinical, and cost accounting data, as well as other eligibility data from the payer, whether that payer is Medicare or commercial. This enables healthcare systems to measure performance against their contracts. These processes, in essence, double down on the need to incorporate data from multiple sources, which again points to the preference of an EDW over an EHR.

Value-based care requires healthcare organizations to have a new set of skills and capabilities. Their cost to deliver care, while it was important in a fee-for-service world, is now more important than ever. An understanding of costs is critical to success in a value-based environment.

In a value-based care contract (with either CMS or commercial payers) there’s a well-defined mechanism to determine if the care being provided is high value or high quality. Typically, a contract is either annual or multiyear (most are multiyear) and it comes with incentives for higher reimbursement.

The terms of most contracts calculate performance on an annual basis. At the end of Year 1, for example, a hospital reports its quality metrics and payers determine whether it gets paid or penalized. The data challenge here is that the hospital is required to synthesize data from multiple sources in order to compute those value metrics. If the hospital is smart, it doesn’t want to wait until the end of the year to know if it is reaching its metrics. It wants to know regularly throughout the year so it has time to make adjustments to obtain the incentive payment at the end of the calendar year.

Historically, because of Meaningful Use (MU), EHRs were required by CMS to measure for a short period, typically 90 days per calendar year. A hospital only had to submit its report once at the end of the calendar year. This became the status quo for how these kinds of quality metrics were captured and reported in the EHR. In the value-based contract, this is not the optimal method because the hospital needs to measure both for cost and performance on quality measures that are relevant to that contract. This means daily or weekly measurement to assess performance, make course corrections and adjustments throughout the year, and avoid surprises.

It’s common to hear from a hospital that received its multimillion-dollar bonus at the end of the year, but had no idea if that was going to be the case. And it doesn’t know if it’s going to receive a bonus or a penalty next year.

Many EHRs were designed around the reporting requirements of MU, but they did not provide the ongoing capability to monitor and manage those metrics every day, week, or month, so there was only limited visibility into performance against the requirements outlined in the value-based contract. Many EHR vendors simply added the value-based metrics to the same infrastructure, which was only designed for annual reporting.

The best way to compute quality metrics in value-based contracts is with a reporting structure that is designed to give daily visibility into performance against a portfolio of possibly 10, 15, or 20 different contracts, each with a different set of quality metrics. This is exactly what data warehouses are designed to do. The EDW can measure daily or weekly so providers can make sure they are on track to earn the incentives in their contracts.
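A minimal sketch of this kind of ongoing check follows. It assumes each measure is a simple rate (patients meeting the measure divided by eligible patients) compared against a contract target; the contract names, measure names, targets, and counts are all hypothetical, and real contract terms are usually more involved.

```python
from dataclasses import dataclass


@dataclass
class ContractMeasure:
    contract: str
    measure: str
    target_rate: float  # minimum share of eligible patients meeting the measure


def year_to_date_status(measure, met_count, eligible_count):
    """Compare the year-to-date rate for one measure with its contract target."""
    rate = met_count / eligible_count if eligible_count else 0.0
    return {
        "contract": measure.contract,
        "measure": measure.measure,
        "rate": rate,
        "on_track": rate >= measure.target_rate,
    }


# Hypothetical portfolio: (measure definition, patients meeting it, eligible patients).
portfolio = [
    (ContractMeasure("Payer A", "diabetes_a1c_control", 0.80), 150, 200),
    (ContractMeasure("Payer B", "flu_vaccination", 0.85), 45, 50),
]

statuses = [
    year_to_date_status(measure, met, eligible)
    for measure, met, eligible in portfolio
]

for status in statuses:
    print(status)
```

Run daily or weekly against fresh warehouse data, a report like this would show which contracts are falling behind while there is still time to adjust.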

Value-based care measures require data from multiple sources and data types to support the value-based contract. Clinical and claims data have to show all costs because the provider is responsible for them even if they were incurred from outside the system. A visit to an outside provider doesn’t show up in the contracted system’s EHR even if the visit is with a provider that’s connected by an ACO. This means the external clinical data needs to be collected and brought into the data warehouse to get the necessary visibility.

New Data Infrastructure Needed for Bundled Payments

The whole idea behind bundled payments is that a consortium of providers will be paid a flat rate for all services related to an episode of care. For example, with hip replacement, the flat rate includes the hospital stay, surgical procedure, anesthesiology, and 90 days of physical therapy and follow-up care. It becomes the primary provider’s responsibility to take the bundled payment from CMS or the commercial payer and determine how it will be distributed. Providers need a sophisticated data capability so the unbundling can be done fairly. Furthermore, CMS bundled payments were optional at first, but they will soon be mandatory, i.e., CMS will only pay for certain clinical episodes through bundling.
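To show the arithmetic behind unbundling, here is a small sketch that splits a flat episode payment among participating providers. The dollar amount, the provider list, and the fixed percentage shares are assumptions for illustration only; the article does not specify how a fair distribution should be calculated, and real arrangements would draw on cost and outcome data.

```python
def distribute_bundle(bundle_cents, shares):
    """Split a flat bundled payment among providers by agreed percentage shares.

    `bundle_cents` is the payment in cents (integers avoid rounding drift).
    `shares` maps provider name -> integer percent; the shares must sum to 100.
    """
    if sum(shares.values()) != 100:
        raise ValueError("shares must sum to 100 percent")

    payouts = {name: bundle_cents * pct // 100 for name, pct in shares.items()}

    # Assign any rounding remainder to the first provider listed.
    remainder = bundle_cents - sum(payouts.values())
    first_provider = next(iter(payouts))
    payouts[first_provider] += remainder
    return payouts


# Hypothetical hip replacement episode paid at a flat $25,000.
shares = {
    "hospital": 60,
    "surgeon": 20,
    "anesthesiology": 8,
    "physical_therapy": 12,
}
payouts = distribute_bundle(2_500_000, shares)

for provider, cents in payouts.items():
    print(f"{provider}: ${cents / 100:,.2f}")
```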

CMS is aggressively adding value-based components to their reimbursement structure. Where some programs used to be optional or where some of the incentive components used to be small, now they are no longer optional or no longer small. A new level of data expertise and capability is required from providers that they haven’t needed before and the data warehouse is required to build this infrastructure.

5. EHR versus EDW in Managing Population Health

Effective population health requires the kind of analysis, risk stratification, focus, and machine learning that are beyond the performance specifications of EHR vendors. Population health needs more sophisticated data science and data analysis.

The very broad definition of population health is a group of providers delivering optimal care to a group of patients. To provide optimal care, those providers need to have strong collaboration and the ability to share data among themselves. If it happens that all providers use the same instance of the same EHR, it results in a strong collaboration platform for managing population health.

However, if all providers in the group do not have the same EHR, they need to figure out how to get beyond their silos to share patient data and insights across the group. This can be done via HIEs with some degree of success, but there isn’t enough richness in the data that goes through most HIEs to do the kind of analysis and risk modeling required of population health. Most HIEs send data based on a trigger event that occurs for a single patient. This event is almost always driven by a provider doing something to or for that patient, for example, a patient being admitted to the hospital. Where the HIE falls short is with a broader query, such as all patients who haven’t been seen by their primary care physician in the last year. This is an important use case for population health and an HIE cannot answer it. The HIE does give some level of collaboration, but not the kind of rich data and capability needed to run a sophisticated population health strategy. Only an EDW can do this.

Again, if everyone has the same EHR, this creates a strong population health platform, but it only works if everyone also uses the same instance of the same EHR. We define instances of EHRs as follows: The EHR is one part software and one part data on which the software operates. An instance of an EHR is that pool of data on which the software works. Some health systems may have all of their hospitals on one version of software and in one pool of data, so they share a medical record number across all of their facilities. However, some health systems have the same EHR version (the software) for all hospitals, but different pools of data (the instances) across all hospitals. Among other things, this means that the hospitals might have different medical record numbers for the same patient, different department identifiers, different physician identifiers, and so on.

If the acute EHR is different from the ambulatory EHR, or if providers collaborate from different institutions, then those EHRs must be stitched together with an HIE to give some level of data collaboration and it still won’t be sufficient. The best solution is a data warehouse that healthcare systems can use to pull data from different instances of their EHRs, identifying patients and then matching them up across all hospitals and all instances with all the data linked via the EDW.

An EDW is the only effective solution when trying to determine how many ED admissions a patient has had in the last year (an important predictor of risk) or how many times a group of patients across different facilities and different EHRs have been to the ED. To do this via EHRs would require a workaround, such as running reports for every EHR and then running the data through Excel. This becomes very complicated.

There are many use cases where only an EDW can run data from multiple facilities: in a cohort of patients 65+, who did not receive a flu shot in the past year? In a cohort of women 50+, who did not receive a mammogram in the past year? In a cohort of anyone 50+, who has not yet received a colonoscopy? The more complex an EHR environment, the more valuable an EDW is at addressing these kinds of clinical issues.
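The first of these questions can be sketched as follows. The example assumes patient records from all facilities have already been linked to a single enterprise patient ID (the linking work described earlier); the field names, facilities, and dates are hypothetical.

```python
from datetime import date, timedelta


def age_on(birth_date, as_of):
    """Age in whole years on the `as_of` date."""
    years = as_of.year - birth_date.year
    if (as_of.month, as_of.day) < (birth_date.month, birth_date.day):
        years -= 1
    return years


def missing_flu_shot(patients, immunizations, as_of, min_age=65):
    """Return IDs of patients aged `min_age`+ with no flu shot in the past 365 days."""
    cutoff = as_of - timedelta(days=365)
    recently_vaccinated = {
        imm["patient_id"]
        for imm in immunizations
        if imm["vaccine"] == "influenza" and cutoff <= imm["given_on"] <= as_of
    }
    return [
        p["patient_id"]
        for p in patients
        if age_on(p["birth_date"], as_of) >= min_age
        and p["patient_id"] not in recently_vaccinated
    ]


# Hypothetical data pooled from several facilities and EHR instances.
patients = [
    {"patient_id": "p1", "birth_date": date(1950, 1, 1)},
    {"patient_id": "p2", "birth_date": date(1955, 3, 10)},
    {"patient_id": "p3", "birth_date": date(1990, 5, 5)},
]
immunizations = [
    {"patient_id": "p1", "vaccine": "influenza", "given_on": date(2023, 10, 15),
     "facility": "Hospital B"},
    {"patient_id": "p2", "vaccine": "influenza", "given_on": date(2022, 10, 1),
     "facility": "Clinic A"},
]

care_gap = missing_flu_shot(patients, immunizations, as_of=date(2024, 6, 1))
print(care_gap)
```

Note that patient p1 is excluded only because the shot given at a different facility is visible in the pooled data, which is the advantage the text attributes to the EDW.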

The EDW Versus the Sandpaper Ping-Pong Paddle

All EHR vendors claim to have their own EDWs, but none of them specialize in data warehousing, so data requests from a second instance of the EHR usually fail. Their hope is to put an entire healthcare system on the same instance of an EHR, rather than make a sophisticated warehouse tool for bringing in other vendors’ data.

We can liken this to buying a new Ping-Pong table. The new table may be high quality, but to round out the set, the table manufacturer packages sandpaper Ping-Pong paddles with the table. They may be okay for playing Ping-Pong, but the manufacturer is really in the business of making Ping-Pong tables, so the included paddles are cheap. Anyone who wants to be any good at Ping-Pong has to buy a good paddle separately.

If all that’s needed is a little better reporting environment for clinical data, then the EDW from an EHR vendor is probably sufficient. However, if the strategy is to bring together data from two costing systems, three EHRs, and eight claims sources, for example, then the data warehouse from the EHR vendor isn’t going to work.

6. Surfacing Analytics to Facilitate Physician Workflow

To facilitate critical clinical decision making, physicians need the results of sophisticated analysis surfaced at the point of care, when they are placing orders or discharging patients. After data is collected and useful insights are extracted, those insights must be made available to physicians. This is where EHRs have a distinct and meaningful advantage over the data warehouse.

One case in point is around readmissions. A patient is anxiously awaiting discharge from the hospital and the attending physician is preparing orders. Sophisticated analysis determines that this patient is a high risk for readmission to the hospital, but how is this communicated to the people who are caring for this patient and discharging her from the hospital? How can they be informed of measures they can take to reduce that risk before the patient leaves the building? Somehow, the discharge nurse needs to know, via the EHR, that this patient is preparing for discharge and is high risk. Now, the discharge nurse can talk to the patient and make sure she has a ride home from the hospital, for example, or that she has a caregiver at home and that she can get her follow-up medication from the pharmacy. Surfacing risks to users of the EHR while they are using the EHR helps to reduce risks.

With a separate data warehouse, the discharge nurse needs to log in to a separate readmissions application to see the list of high-risk patients. In the same way that there is a clear advantage for the EDW when it comes to multiple EHRs, there is a clear advantage for the EHR when it comes to surfacing some analytics.

Data warehouses are trying to close the gap by learning how to surface analytics from the EDW over to the EHR. EDW vendors talk about closed-loop analytics. When data is pulled into the EDW from the EHR, this is the first half of the loop. The analysis, or the result, has to move from the EDW back into the EHR. This is closing the loop.

Here’s an example of analysis that needs to go back into the EHR. One of the greatest challenges to reducing sepsis mortality rates is quickly identifying which patients actually have sepsis. It proceeds very rapidly to death, so the quicker the diagnosis, the better the chance of life-saving interventions.

To identify patients with sepsis more efficiently and effectively, hospitals can run blood cultures, but this can’t be done on every patient entering the hospital. There are other warning signs, such as low blood pressure and high fever. The data warehouse, along with the right analytics app, is good at taking these potentially unrelated warning signs and identifying patients at high risk for sepsis. But if nobody sees the information until the next day because it’s sitting in the data warehouse, it doesn’t help. Showing high-risk patients in the EHR, where the right clinicians can see them and take action, is critical to clinical decision making.

The forward-thinking healthcare analytics platform should be working on techniques and building capability to easily inject analysis from the data warehouse and surface it inside the EHR.

7. Combine Best Practices, Data, and Technology Tools into a System of Improvement

At the end of the day, the objectives of healthcare are to deliver better care for more patients, using a systematic approach to decide which problems to solve, how to solve them, and how to measure performance against goals. The notion that EHRs can achieve all this is a fallacy because EHR implementations merely took paper-based processes and made them electronic. They didn’t take advantage of all the electronic capabilities when they redesigned their tools. Other technology tools available to healthcare providers today enable a different way of practicing medicine that’s informed by data.

For the past 200 years, medicine has been about one provider caring for the one patient in front of them. Now, using data, providers can view information about thousands of patients to open an entirely new way of delivering care. With these tools now in place, healthcare organizations need a systematic approach to use them in a way that helps clinicians deliver better care. There is enormous opportunity here; just having the tool isn’t enough.

The most advanced closed-loop system, the most efficient claims and clinical data aggregation, and the most comprehensive set of data marts will all fail if the tools are not used to inform care. This is not a technology problem; it’s a culture problem. Organizations need a culture of data-driven decision making and a culture of evidence-based medicine in order to succeed.

Practicing medicine without evidence is no different from practicing medicine a thousand years ago. Simply converting a paper medical record to electronic does not lead to evidence-based medicine. Interventions may be better, but this is not a fundamental change in the way care is delivered. Utilizing best practices and broad scale adoption throughout the organization are required.

A Transformative Period for Healthcare

These seven points illustrate a stark comparison between two of the best data and analytics tools for meeting many challenges of healthcare in the 21st century. The capabilities of these tools are converging, yet their strengths are distinct, and healthcare executives need to decide which set of capabilities is more important. Developing a data-driven culture is difficult work, but it has the potential to be every bit as transformative as antibiotics were for patients with bacterial infections in the 1940s. This is the magnitude of change that data offers. There is nearly unlimited potential to use it to take better care of patients.

Consider the options of a standalone EHR or combining an EHR with the data warehouse to achieve this potential. They both have their pros and cons, but we all need to move forward quickly to claim the benefits of using data to better care for our patients.
