Aiding Analytics Adoption Via Metadata-Driven Architecture: If You Build It, They Will Come

Successful healthcare organizations have come to understand that return on analytics investment depends on achieving outcomes systematically and at scale in clinical, financial, and operational areas. At Health Catalyst, the core mission for every team member is to focus on outcomes improvement for the healthcare systems we support. One of the primary measures for outcomes improvement is adoption and consistent use of analytics by end users. While there are many approaches to improving adoption, this article describes how a metadata-driven architecture, as the backbone of enterprise data warehouse (EDW) and analytics ecosystems, enables faster analytics adoption among users.

Before describing a few best practice scenarios in analytics adoption, here is a quick look at what metadata-driven architecture is.

Metadata-Driven Architecture

A good analogy for describing a metadata-driven architecture is the manufacturing process of die-cast products. There are three key components: a reusable mold or die, the machine, and the molten metal. To produce the desired output, the molten metal is passed at high pressure through the reusable mold assembled in the machine. Similarly, in a metadata-driven architecture, the metadata repository we build for the EDW and analytical applications is the reusable mold, the ETL engine is the machine, and the source data is the molten metal.

How to Approach a Metadata Repository

In a white paper, Dale Sanders, Executive Vice President for Health Catalyst, described the metadata repository in a simple and pragmatic way, likening it to the Yellow Pages for an EDW. Every part of the data movement and transformation is captured in metadata format and stored in the repository. Dale also recommends that a metadata repository be a 50/50 combination of human-generated content (commentary) and computer-generated content (facts), and that it help various types of users interact with it through a wiki-style interface.

ETL Engine Basics

An ETL engine functions as the heart of the metadata architecture. The way we set it up at Health Catalyst, the ETL engine reads the metadata repositories (source and target), creates necessary ETL logic, and moves data from source to destination(s) into various analytical models.
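To make the idea concrete, here is a minimal sketch of an engine that reads one mapping from a metadata repository, generates the load statement, and runs it. It is not the Health Catalyst implementation: the MAPPING structure, the table and column names, and the use of an in-memory SQLite database are all assumptions made for illustration.

```python
import sqlite3

# Hypothetical metadata repository entry: one source-to-target mapping.
# Every table and column name here is invented for illustration.
MAPPING = {
    "source_table": "src_encounter",
    "target_table": "edw_encounter",
    "columns": [
        # (source expression, target column)
        ("enc_id", "encounter_id"),
        ("pat_id", "patient_id"),
        ("julianday(discharge_dt) - julianday(admit_dt)", "los_days"),
    ],
}


def build_load_sql(mapping):
    """Generate an INSERT ... SELECT statement from mapping metadata."""
    targets = ", ".join(target for _, target in mapping["columns"])
    sources = ", ".join(source for source, _ in mapping["columns"])
    return (
        f"INSERT INTO {mapping['target_table']} ({targets}) "
        f"SELECT {sources} FROM {mapping['source_table']}"
    )


def run_load(conn, mapping):
    """Execute the generated statement and return the number of rows loaded."""
    cursor = conn.execute(build_load_sql(mapping))
    conn.commit()
    return cursor.rowcount


# Stand-in source and target tables so the sketch runs end to end.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE src_encounter (enc_id, pat_id, admit_dt, discharge_dt)")
conn.execute("CREATE TABLE edw_encounter (encounter_id, patient_id, los_days)")
conn.executemany(
    "INSERT INTO src_encounter VALUES (?, ?, ?, ?)",
    [(1, 100, "2024-01-01", "2024-01-04"), (2, 101, "2024-02-10", "2024-02-11")],
)

rows_loaded = run_load(conn, MAPPING)
loaded = conn.execute(
    "SELECT encounter_id, patient_id, los_days FROM edw_encounter ORDER BY encounter_id"
).fetchall()
```

The point of the design is that adding or changing a load means editing the mapping, not the engine: build_load_sql stays the same for every table.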

To provide a visual for the metadata-driven architecture, Figure 1 shows how we organize our data flow design. This diagram depicts three core components of a metadata-driven architecture (metadata repository, ETL engine, and data sources).


Figure 1: An example of a data flow design that is supported by the Health Catalyst Analytics Platform.

With this high-level understanding of metadata-driven architecture for EDW and analytics, let’s review some of the best practice scenarios that significantly drive or contribute to user adoption.

Automating ETL Processes for Data Analysts

One of the common and recurring problems for an analytics team in a health system is how much time an analyst should spend hunting and gathering data versus analyzing and interpreting it. Data analysts have core clinical, financial, and operational knowledge, in combination with technical skills, that is best put to use when they engage, listen, and help end users with their questions. Gathering data, loading it, and managing ETL processes are the low-hanging fruit that can be automated to free up data analysts’ time. When data analysts spend less time hunting and gathering data, they engage more with end users, answer their questions, and develop reports and applications that drive adoption of analytics across the user base.

A health system can consider a metadata-driven architecture as a strategic approach to automating ETL processes for analytics and business intelligence solutions. To achieve such automation, consider some of these aspects:

  • Data analysts should spend minimal time on data mapping. Consider the option of loading source data in its native format (with only the minimum necessary cleansing, transformation, summarization, and assignment of rules).
  • Identify ETL tools that abstract the majority of data-loading scripts, but allow data analysts to create, review, validate, and complete the data mapping using a simple Graphical User Interface (GUI). At the end of this process, there should be a metadata repository (recall the reusable molds in the die-cast manufacturing process above).
  • Think about an ETL engine that can automatically read metadata repositories and churn out ETL scripts for the data loading process.
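As one way to picture the first two points, the sketch below drafts a pass-through mapping directly from a source table's column list, so that an analyst only has to review and annotate it instead of writing it by hand. The function name draft_mapping, the "status" and "definition" fields, and the table names are hypothetical; SQLite's PRAGMA table_info stands in for whatever schema catalog a real source system exposes.

```python
import sqlite3


def draft_mapping(conn, source_table, target_table):
    """Build a pass-through mapping from the source table's own column list.

    Each source column maps to a target column of the same name, which
    mirrors loading the data in its native format. An analyst would review
    the draft and fill in definitions or renames before it is stored.
    """
    # PRAGMA table_info returns one row per column: (cid, name, type, ...).
    columns = conn.execute(f"PRAGMA table_info({source_table})").fetchall()
    return {
        "source_table": source_table,
        "target_table": target_table,
        "columns": [
            {"source": name, "target": name, "type": col_type, "definition": ""}
            for _, name, col_type, *_ in columns
        ],
        "status": "draft",  # hypothetical review flag for the analyst
    }


# Stand-in source table so the sketch runs on its own.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE src_lab_result (result_id INTEGER, pat_id INTEGER, value TEXT)"
)
mapping = draft_mapping(conn, "src_lab_result", "edw_lab_result")
```

A GUI for reviewing the mapping would sit on top of a structure like this; the reviewed result is what ends up in the metadata repository.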

We have repeatedly observed that automating ETL processes for data loading using a metadata-driven architecture not only frees data analysts’ time for engaging with users, but also reduces the overall time to build a single source of truth. Clinicians and other end users are able to access, research, and engage with data within three to four months of project kickoff. We strongly recommend that health systems thinking about transforming their existing data warehouse consider and evaluate a metadata-driven architecture approach.

Enhance Data Literacy to Improve Data Trust

As an enabler for data governance policies, a metadata repository enhances data literacy among users, which improves trust in data.

We embrace a simple and pragmatic approach to data governance that has a triple aim of ensuring data quality, building data literacy, and maximizing data utilization. As we have observed in many performance improvement initiatives, lack of trust in the data among end users is never a trivial problem. Whenever certain results are presented, users often question the underlying data. While questioning data is a healthy way to validate its integrity, one should be careful not to let it distract from the goals and aims of improvement. One of the best approaches to improving trust is to educate users about various aspects of the data, including its quality, consistency, and timeliness.

A single metadata repository for the EDW and analytics is a strategic asset to educate and improve data literacy among users. Here are some ways a metadata repository can help to improve data literacy.

  • Data dictionary: As various data sources are provisioned in the EDW, business/technical definitions, source to target lineage, PHI sensitivity, and other aspects of data are captured and stored in metadata, providing a rich data dictionary. With a simple GUI to expose the data dictionary, various types of users can browse the repository and find the answers to many questions about data and its validity (where did it come from, when was it loaded, what logic was applied, if any, where is it being used, etc.).
  • Understand various uses of the data: With the help of the data dictionary, a data analyst can review, learn, and validate how certain measures are defined and used across various improvement programs. For example, a simple search for ‘LOS’ in a metadata repository might return more than one result, each with a different definition. An LOS for a surgery department might be different from an LOS for the ED, and by tracking the lineage of each definition back to its data model, data analysts are exposed to the different use cases and improvement projects that are using the EDW. Having seen the EDW being adopted by peer analysts and peer improvement projects, data analysts not only learn about LOS but, more importantly, develop the trust in the EDW that is critical for increasing adoption.
  • Collaboration among users: Analysts working on improvement projects unearth various issues about data elements that might be critical for other users. A metadata repository where users can report data quality issues can be a great collaboration platform for improving data literacy. For example, one of the population health quality measures tracked in a primary care setting is the number of patients who receive flu shots. Patients can receive shots at any qualified health center (a pharmacy, PCP office, employer event, etc.). This data may not automatically be captured in an EMR, or may not be reported in claims, unless a nurse at the PCP’s office makes an effort to ask the patient about flu shot status and records the answer in the notes. Because this is a critical measure for regulatory reporting, some health systems may have an approach to capture this information in a discrete field and supply it to the EMR or directly to the EDW (through a data entry app). For a data analyst supporting a regulatory reporting project, knowing the source of flu shot data and its timeliness (captured weekly, monthly, quarterly, or yearly) will be critical to meeting the needed threshold for reporting purposes. Users are more likely to collaborate and share learnings if they have a single metadata repository, which in turn improves data literacy.
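The ‘LOS’ search described above can be sketched as a lookup over data dictionary entries. This is only an illustration: the DictionaryEntry fields, the definitions, the data model names, and the lineage strings are invented, and a real repository would hold far more detail (load times, PHI sensitivity, reported issues).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DictionaryEntry:
    name: str              # measure or data element name
    definition: str        # business definition (illustrative text)
    data_model: str        # improvement project or data model using it
    source_lineage: tuple  # source tables the value is derived from


# Hypothetical dictionary contents, for illustration only.
ENTRIES = [
    DictionaryEntry(
        "LOS",
        "Days from inpatient admission to discharge",
        "Surgery improvement",
        ("adt.admission", "adt.discharge"),
    ),
    DictionaryEntry(
        "LOS",
        "Hours from ED arrival to ED departure",
        "ED throughput",
        ("ed.tracking",),
    ),
    DictionaryEntry(
        "Flu shot status",
        "Whether the patient reported or received a flu shot this season",
        "Primary care quality",
        ("emr.immunization", "data_entry_app.flu_shot"),
    ),
]


def search(entries, term):
    """Return every entry whose name matches the term, ignoring case."""
    wanted = term.strip().lower()
    return [entry for entry in entries if entry.name.lower() == wanted]


los_results = search(ENTRIES, "los")
los_models = [entry.data_model for entry in los_results]
```

Here the same measure name comes back with two definitions, and each result carries the data model and source lineage an analyst would follow to see who else is using it.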

Improving Agility of Data Analysts

So far, we have looked at how a metadata-driven architecture can automate ETL processes and improve data literacy among users. But can it also empower an agile team to meet its aims and goals? As health systems engage in outcomes improvement projects, they form a permanent cross-functional agile team comprising various roles (data analysts, SME, physician champion, nurse lead, source application admin, etc.). This permanent agile team is responsible for defining aims and goals, building measures to monitor interventions (process, outcomes, and balanced measures), and enabling adoption of best-known practice intervention protocols in the health system’s workflow.

One of the expectations of this permanent agile team is managing change requests. It is critical for data analysts maintaining analytics to turn around change requests in a few hours rather than days or weeks. For example, some of the change requests can be:

  • Adjusting the analytical measures that are being monitored due to constant regulatory changes (from CMS).
  • Clinicians constantly changing interventions based on evolving, evidence-based medicine. To provide the best possible care to patients, related measures constantly change, as well.
  • End users constantly requesting changes to the visualization of analytics to make better use of them.

How can a metadata-driven architecture enable a data analyst to be productive and turn around change requests quickly? To construct change requests, data analysts can browse the metadata repository for the needed data elements and intervention measures, and see any reported data quality issues. If a data element needed for the change request is not available in the EDW, data provisioning from source systems can be completed faster using an automated ETL process through the metadata-driven architecture. Use case data model components (cohort definitions, inclusion/exclusion rules, metric definitions, etc.) can be made available in the metadata repository. Data analysts can review, modify, and re-publish those components, per change requests, back to the metadata repository. With a metadata-driven architecture, dashboards and reports can then be refreshed automatically to publish results to end users.
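The review, modify, and re-publish cycle can be sketched as follows. Everything here is an assumption for illustration: the in-memory repository dictionary, the (field, operator, value) rule format, the cohort name, and the patient records are invented, and a real platform would persist versions and trigger report refreshes itself.

```python
import copy
import operator

# Hypothetical in-memory stand-in for the metadata repository, keyed by
# component name. Each publish appends a new version instead of overwriting.
repository = {}

# Operators allowed in rules; a rule is a (field, operator, value) tuple.
OPS = {">=": operator.ge, "==": operator.eq}


def publish(name, definition):
    """Store a new version of a component and return its version number."""
    versions = repository.setdefault(name, [])
    versions.append(copy.deepcopy(definition))
    return len(versions)


def latest(name):
    """Return a copy of the most recently published version."""
    return copy.deepcopy(repository[name][-1])


def in_cohort(patient, definition):
    """True if the patient meets every inclusion rule and no exclusion rule."""
    def matches(rule):
        field, op, value = rule
        return OPS[op](patient[field], value)

    included = all(matches(rule) for rule in definition["include"])
    excluded = any(matches(rule) for rule in definition["exclude"])
    return included and not excluded


publish("flu_shot_cohort", {
    "include": [("age", ">=", 65)],
    "exclude": [("hospice", "==", True)],
})

patients = [
    {"id": 1, "age": 70, "hospice": False},
    {"id": 2, "age": 52, "hospice": False},
    {"id": 3, "age": 81, "hospice": True},
]
before = [p["id"] for p in patients if in_cohort(p, latest("flu_shot_cohort"))]

# Change request: lower the age threshold, then re-publish the component.
revised = latest("flu_shot_cohort")
revised["include"] = [("age", ">=", 50)]
new_version = publish("flu_shot_cohort", revised)
after = [p["id"] for p in patients if in_cohort(p, latest("flu_shot_cohort"))]
```

Because the cohort logic lives in the published definition and not in report code, the change is a metadata edit; anything that reads the latest version picks it up on its next refresh.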

Figure 2 shows how a data analyst can take various use case data model components available in the metadata repository to work on specific change request scenarios.


Figure 2: How a data analyst can use various use case data model components available in the metadata repository to work on specific change request scenarios.

Faster Data Results in Faster Adoption

Organizations that Health Catalyst works with across the country are embracing the metadata-driven architecture for their EDW and analytics platform and finding great success. Community Health Network (CHNw) is an excellent example of how one healthcare system successfully organized data from multiple technologies, including several EHRs and other unintegrated data sources, using an EDW. Among its many improvements, CHNw boosted operational efficiency by 70 percent and achieved its data integration goals within 12 months.

This kind of speed-to-value and adoption is typical of the outcomes possible for organizations that implement the appropriate technology. As described throughout, surfacing the location, lineage, and usage of data for data analysts enables early use of the data and broad distribution of the analytics. Any health system embarking on a journey in search of broader adoption and easier digestion of data should consider a metadata-driven architecture.
