The Healthcare Data Warehouse: Lessons from the First 20 Years
The enterprise data warehouse (EDW) at Intermountain Healthcare went live in 1998, followed by the EDW at Northwestern Medicine in 2006. Twenty years on, in 2018, analytics and technology continue to drive healthcare’s most significant advancements and daily activities, impacting healthcare from executive decision making to the frontlines of care and patient experience. To understand how the EDW has evolved as a pivotal tool and forecast its future role, healthcare IT participants can learn from the first-hand experiences of healthcare EDW early adopters and champions.
Dale Sanders was the chief architect and strategist for both the Intermountain and Northwestern EDWs. Lee Pierce assumed leadership of the Intermountain EDW in 2008. Andrew Winter assumed leadership of the Northwestern EDW in 2009 and transitioned the responsibility to Shakeeb Akhter in 2016.
This report is based on a webinar in which Sanders, Pierce, and Akhter considered their past healthcare IT decisions and the broader digital trajectory of health. It’s not an exhaustive summary of the discussion but captures highlights of the presenters’ histories in healthcare IT in three categories—people, processes, and technology—with three central themes:
- What they did right.
- What they’d do differently.
- Thoughts on the future.
People: The Foundation for a Successful Healthcare Data Warehouse
Successful healthcare IT starts with the right people and a team culture that balances social, technological, and domain skills. The strength of the healthcare EDW today speaks to the many people behind its development and implementation, but Sanders, Pierce, and Akhter, as early digital healthcare leaders, also recognize early workforce decisions they could have made differently.
What They Did Right: People
Three skillsets (social, domain, and technical) have formed a successful hiring framework for EDW teams. Social skills help team members collaborate and work through challenges together, rather than taking a defensive approach. Additionally, healthcare IT can’t run on technology skills alone; contributors must also have deep healthcare domain understanding.
In addition to hiring around the three skillsets above, successful EDW leaders partnered with and empowered data analysts from the beginning, positioning them as the primary producers of analytics. Once they had proved the value of the EDW with some solid successes, these leaders engaged other executives in data and analytics strategy and execution. They formed governance committees to get business leadership invested in data and in how that data and analytics are used.
What They’d Do Differently: People
Some early EDW adopters, especially those newer to healthcare, overlooked the cultural issues surrounding data and legacy source systems. Compared with other industries, healthcare data is unusually sensitive, and legacy source system teams can feel threatened by the EDW. CIOs were not trying to replace source systems (in fact, they often communicated that EDWs would not exist without source systems), but it took a while for legacy teams to understand and trust that.
Different types of health systems (e.g., integrated delivery networks [IDNs] and academic medical centers) also presented unique challenges for early EDW leaders. An academic medical center, for example, wouldn’t be as culturally prepared to take on data warehousing goals as an IDN. Whereas an IDN might have pursued clinical variability reduction, best practices, and evidence-based care, its academic counterpart may have had a culture of controlled variability. Early EDW leaders had to pivot and focus more on the academic organization’s research needs and less on clinical operations and clinical variability reduction.
As well, some CIOs depended too much on matrixed technology resources, especially database administrators and system administrators, who were skilled in transaction databases, not analytics. Analytics is a specialty, so early adopters who relied too heavily on matrixed resources lacked that expertise. Educational programs for analytics skills (e.g., SQL training) can help organizations build effective analytics teams by teaching staff how to traverse the large amount of data in the EDW and which data structures and tables to use for a given analysis.
Recruiting clinical and operational subject matter experts to help with understanding data would also have boosted early EDW practice. Technically skilled people and people with EMR experience provided value, but somebody in the data warehousing and analytics department who really understands healthcare data can add even more. For example, a team member who knows DRG, ICD-9, and ICD-10 codes and how they’re used, along with claims data, can shorten development times and help put in controls such as standardized vocabularies.
Thoughts on the Future: People
EDW experts predict that data science will increasingly influence EDW hiring in the future. Traditional SQL programming will remain an important skillset, but as data becomes a strategic corporate asset (Figure 1), those programmers need to start building data science and machine learning skills, as well as the non-relational technical skills big data requires. Additionally, a new role, the digitician, will keep the digital profile of the patient constantly updated and maintained more effectively than current methods. Today, a health system only sees a patient, on average, three times per year, which isn’t enough to understand the patient digitally. The digitician would round out the patient’s digital profile with a full picture of patient health (e.g., environment and socioeconomic status) beyond the traditional encounter and beyond the EHR.
Figure 1: Data as a strategic corporate asset
Moving forward, organizations also need to improve data literacy for their leaders. The C-suite needs to know how to ask for analytics help and which questions to ask; for example, how do they actually use the insights to improve decision making and change the processes they want to impact?
Processes: From Design and Code Reviews to the Impact of AI
Effective data warehousing processes are rooted in design and code reviews and lightweight data governance. Advanced analytics (e.g., AI) and innovations in treatment and diagnosis will impact these processes, however, changing the nature and priorities of how healthcare manages data.
What They Did Right: Processes
Design and code reviews have been, and remain, a critical part of healthcare analytics. EDW pioneers implemented design and code reviews to encourage reliability around safety, as well as analytics accuracy. In a case reported from the UK, a mistake in the mammography screening algorithms in the National Health Service resulted in about 450,000 patients failing to get proper mammography screening. An analytics algorithm error caused the mistake, demonstrating the real patient safety issues associated with analytics and the value of design and code reviews.
Lightweight data governance (the lower portion of Figure 2), governing to the least extent necessary for the greatest common good, contributed to the EDW’s early success. Both too little and too much data governance had their pitfalls. There’s a reason, for example, that not every case goes to the Supreme Court in the US justice system. By creating what amounts to a Supreme Court of data governance, then trying to implement the principles and the values of data governance at a very distributed level, organizations succeeded with a lightweight approach.
Figure 2: The healthcare data and analytics process
What They’d Do Differently: Processes
EDW pioneers may have benefited from a different approach to prioritization management, including a governance process to formalize prioritization. They would have needed to carefully balance these goals with actual capability, as demand will always outpace EDW capability.
Initially, CIOs also could have better managed expectations around data and analytic quality validation. Revealing data too early to clinicians and researchers might have set unrealistic expectations for early EDW impact.
Thoughts on the Future: Processes
AI and data science will drive major changes in process, edging out data governance. Process is going to be more about algorithm and model governance, which will make analytic validation very challenging. There’s a different notion of DevOps when it comes to AI, and IT professionals will need to learn what it means to apply DevOps concepts to AI and machine learning algorithms.
Changes in diagnosis and treatment will also impact data and analytics processes, as bio-integrated sensors will increasingly enable diagnosis and treatment and put more data in patient hands than in the healthcare system. To keep up, health system data and analytics platforms have to constantly update and upload data to cloud-based AI algorithms. Those algorithms will diagnose the patient’s condition, calculate a composite health-risk score, and recommend options for treatment or maintaining health. The algorithm will also suggest providers based on variables such as quality of care, volume of care, etc. In addition, the algorithm will allow the patient to socially interact with other patients like them, extending the patient’s resources.
Technology: From Late Binding and Beyond
Late binding was a critical innovation in early healthcare data and analytics. Today, modern demands and capabilities require even more agility, as well as advanced security capabilities.
What They Did Right: Technology
Successful early EDW leaders ignored the Enterprise Data Model (EDM) in favor of late binding. In a fluid environment, an EDM is outdated as soon as it’s complete. Also, due to the nature of the EDM process (continuous modeling and mapping), data architects never finish mapping. Every time there’s a change in the environment, they have to go back and change the model, the ETL, and the downstream analytics. As an alternative to the EDM, late binding doesn’t require expensive ETL tools, because most of the ETL happens further downstream, in smaller and more object-oriented pieces, rather than in the massive ETL an EDM requires.
In addition, effective EDW leaders recognized the value of Microsoft in the early 2000s. They took a risk using Microsoft products, even when their organizations historically used other software and technology companies. That risk has paid off with today’s more manageable, more automated Microsoft-supported EDWs.
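To make the idea concrete, here is a minimal Python sketch of late binding. It is an illustration only, not the presenters’ implementation: the source names, codes, and helper functions (`SOURCE_ROWS`, `SEX_BINDINGS`, `bind_sex`, `count_by_sex`) are all hypothetical. Raw rows are kept as each source system delivers them, and the mapping to a shared vocabulary is applied only when an analysis reads the data.

```python
from collections import defaultdict

# Hypothetical raw rows, stored as each source system delivered them.
SOURCE_ROWS = [
    {"source": "emr_a", "patient_id": "p1", "sex": "F"},
    {"source": "emr_a", "patient_id": "p2", "sex": "M"},
    {"source": "billing_b", "patient_id": "p3", "sex": "2"},
    {"source": "billing_b", "patient_id": "p4", "sex": "1"},
]

# Binding rules kept as data (hypothetical codes), not baked into the load.
SEX_BINDINGS = {
    "emr_a": {"F": "female", "M": "male"},
    "billing_b": {"1": "male", "2": "female"},
}


def bind_sex(row, bindings=SEX_BINDINGS):
    """Map a source-specific code to the shared vocabulary at read time."""
    return bindings.get(row["source"], {}).get(row["sex"], "unknown")


def count_by_sex(rows, bindings=SEX_BINDINGS):
    """Example analysis: the binding happens here, inside the use case."""
    counts = defaultdict(int)
    for row in rows:
        counts[bind_sex(row, bindings)] += 1
    return dict(counts)


if __name__ == "__main__":
    print(count_by_sex(SOURCE_ROWS))
```

In this sketch, onboarding a new source means adding one entry to the binding table; the stored rows and the load step stay untouched, which is the agility the late-binding approach is meant to provide.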
What They’d Do Differently: Technology
Even though late binding was an asset in EDW development, architects may have relied too much on it at times, creating challenges when it comes to data modeling. If users practice only late binding, they get a proliferation of data objects in the database that are hard to manage, and performance issues (e.g., load time and load management) emerge.
Late binding works well for a while because it’s agile and adaptive. But without an effective way to manage so many data objects, and without reusing some of those objects when necessary, data inconsistency and governance problems emerge. Today it appears that EDW users may have relied too much on late binding.
While healthcare is dynamic, there are still consistent data structures. For example, CMS measures tend to have lower volatility. By binding to those data structures in the intermediate space between enterprise and late binding, the EDW can perform more efficient analytics. Greater reuse and support can also increase data governance efficiency and make results more consistent.
Another early EDW misstep was too much faith in an enterprise standard business intelligence (BI) tool. A standard provides a common BI tool and eases maintenance, but a common tool also puts constraints on data interaction, which limits the agility that effective analytics requires. Data scientists need the freedom to use the BI tool that works best with their processes and needs.
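One way to picture this intermediate binding is a small registry of shared, governed definitions that every analysis reuses instead of re-creating its own copy. The Python sketch below is an assumption-laden illustration: the registry, the decorator, and the "adult" rule are hypothetical and are not taken from any CMS specification or from the presenters’ systems.

```python
# Registry of stable definitions that are bound once and then reused.
SHARED_DEFINITIONS = {}


def shared_definition(name):
    """Register a reusable rule; refuse duplicates so one version exists."""
    def register(func):
        if name in SHARED_DEFINITIONS:
            raise ValueError(f"definition {name!r} is already registered")
        SHARED_DEFINITIONS[name] = func
        return func
    return register


@shared_definition("adult")
def is_adult(patient):
    # Hypothetical low-volatility rule used only for illustration.
    return patient["age"] >= 18


def count_matching(patients, definition_name):
    """Analyses look the rule up by name instead of rewriting it."""
    rule = SHARED_DEFINITIONS[definition_name]
    return sum(1 for p in patients if rule(p))


if __name__ == "__main__":
    sample = [{"age": 17}, {"age": 18}, {"age": 64}]
    print(count_matching(sample, "adult"))
```

Rejecting duplicate registrations is a design choice in this sketch: it keeps a single object per stable definition, which addresses the proliferation and inconsistency problems described above, while volatile logic can still be bound late.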
Thoughts on the Future: Technology
Read-only batch-related warehouses are already outdated. The industry needs more real-time capability—slow and constant trickles of data into the enterprise warehouse platform. Batch loads cause huge performance spikes on the source system, as well as the data warehouse, and lead to slow decision making. Data uploads every day, or every week, don’t support timely decisions.
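As a rough illustration of trickle loading, the Python sketch below polls a source for rows changed since a stored high-water mark and upserts only those rows. This is one common pattern rather than the approach the presenters describe, and the row layout (`id`, `modified_at`) and the `trickle_load` function are assumptions made for the example; change data capture or message streaming are alternatives.

```python
from datetime import datetime


def trickle_load(source_rows, warehouse, watermark):
    """Copy rows modified after `watermark` and return the new watermark.

    `warehouse` is a dict keyed by row id, standing in for a real table.
    Rows stamped exactly at the watermark are treated as already loaded.
    """
    changed = [r for r in source_rows if r["modified_at"] > watermark]
    for row in sorted(changed, key=lambda r: r["modified_at"]):
        warehouse[row["id"]] = row  # upsert by primary key
        watermark = row["modified_at"]
    return watermark


if __name__ == "__main__":
    source = [
        {"id": 1, "modified_at": datetime(2018, 1, 1, 8, 0), "value": "a"},
        {"id": 2, "modified_at": datetime(2018, 1, 1, 9, 0), "value": "b"},
    ]
    warehouse = {}
    mark = trickle_load(source, warehouse, datetime.min)
    # A later poll moves only what changed since the last run.
    source.append({"id": 1, "modified_at": datetime(2018, 1, 1, 9, 30), "value": "a2"})
    mark = trickle_load(source, warehouse, mark)
    print(len(warehouse), mark)
```

Run on a short interval, each poll moves a handful of rows, which avoids the load spikes of a nightly batch on both the source system and the warehouse.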
Future warehouse platforms must have cloud-based hybrid transaction and analytic architectures. For this reason, today’s Health Catalyst® Data Operating System (DOS™) is cloud based (namely, Azure). The cloud offers unmatched agility and security.
Figure 3 shows a typical modern healthcare data warehouse architecture. Data sources are on the left, with different file structures feeding into the platform. Data integration breaks up into three parts, and AI tools are a natural part of the pipeline. The lower levels contain compute and AI clustering, as well as the transaction data storage. On the far right, arrows now go back into the platform, whereas in a traditional data warehouse everything flows from left to right. This modern architecture allows applications to write back into the environment. Future applications will support workflow, providing a hybrid combination of analytics and workflow in the same user experience.
Figure 3: A typical modern data warehouse architecture
Twenty Years Later, Healthcare Data Warehouse Architecture Is Still Evolving
Sanders opened the presentation admitting that trial and error defined his analytics and data warehousing journey. “Generally speaking,” he said, “I learned what was right by first doing what was wrong, in life in general, but certainly in analytics and data warehousing.” Even though Sanders, Pierce, and Akhter achieved many early wins that shaped the course of the EDW and healthcare analytics and data warehousing today, their initial missteps are equally influential in the present and future of the industry. With analytics skills and technology ever advancing and innovations in diagnosis and treatment transforming care delivery, analytics and data warehousing leaders who maintain a similar spirit of agility and humility will have the biggest impact on outcomes improvement.