Data Warehouse Tools: Faster Time-to-Value for Your Healthcare Data Warehouse
Editor’s Note: This completes a two-part series that looks at some data warehouse tools Health Catalyst has developed to help improve the time-to-value when building an enterprise data warehouse.
One of the biggest challenges organizations face when implementing an enterprise data warehouse (EDW) is the time it takes from the start of the project to the point where it’s delivering value. Typically, this has been anywhere from a 12 to 24 month process. This is where data warehouse tools can help.
Under any circumstances that’s a long time to wait. In fact, it can feel like an eternity, especially if you have population health management initiatives that can’t be launched until the EDW is fully functional. But now, with all the pressure coming down on hospitals and health systems regarding accountable care and moving to a pay-for-performance model, taking that long to complete an EDW implementation is simply unacceptable.
It’s a problem we at Health Catalyst began addressing a couple of years ago. Our original goal was to cut that time-to-value down to six months – a goal we have achieved, by the way. But now that we’re there, we’re seeing we can still improve on that. So while the rest of the industry is still measuring time-to-value in years, our objective is to bring it down to three to four weeks. As the old saying goes, make no small plans.
Problems Around Healthcare Metadata
One common reason that EDW implementations take so long is how much manual work goes into making it possible to extract data from source systems, such as EHRs or the general ledger, and bring it into the data marts for the EDW. The first step is determining where the needed data resides in each source system. That’s what metadata — data about the data, such as the names of tables and columns – tells us.
Mapping that metadata manually, as it is normally done, is a tedious, labor-intensive chore. Take a typical EHR system for example. An EHR can have more than 1,000 tables, each of which can contain 100 or more columns. To visualize how much data that is, think of 1,000 Excel spreadsheets with 100 columns each, all open at the same time. That is a lot of metadata to examine and assign a value to.
What makes it worse, however, is that there hasn’t been any consistency in working with the metadata. The long, arduous effort normally expended on metadata mapping can’t be re-used the next time you are working with that same source system. The result is every new implementation requires the same amount of work.
Data Warehouse Tools: The Source Mart Designer
It was this issue that drove Health Catalyst to develop our Source Mart Designer (SMD) tool. The SMD is a knowledge base that collects everything we learn about mapping metadata from all the different source systems and makes it available in a single, comprehensive resource. Rather than starting over each time, we can go to the SMD and pull out that knowledge, helping us greatly reduce the amount of time spent just determining where to begin.
But it isn’t just about knowing where to start. The SMD also automates much of the mapping we used to perform manually. You can point the SMD tool at the source system and the tool will pull out the raw data, giving us an incredible head start on the mapping process. This type of automation not only greatly speeds up the process but also improves the quality and consistency of the work. Greater consistency, of course, helps drive re-use and speeds up the process even more for the next time.
The source mapping process, facilitated by SMD, is an essential step in creating the proper metadata to drive our automated extract, transform, and load (ETL) process. Our unique, metadata-driven ETL process enables a high degree of automation for nightly ETL practices. A few simple, Health Catalyst-supplied ETL scripts are used to load any number of disparate source systems into the EDW.
Once a new source system is mapped, an ETL script is selected to load the new source system. The ETL process is driven entirely by the metadata of that source. This high degree of automation is a huge advantage over traditional data warehousing systems in that ETL developers are unnecessary because maintenance of hundreds of ETL scripts is nonexistent. Metadata is the heart of the Health Catalyst solution, and SMD makes the creation and maintenance of that metadata simple and intuitive.
To fully understand the impact the SMD is having, consider that there are at least 60 significant EHR, patient satisfaction, general ledger, and other source systems in healthcare. To date, we have mapped more than half of those, with more added to our knowledge base every day. The more systems we have mapped, the more we can re-use that knowledge, enabling us to deliver faster time-to-value for the EDW at a lower cost.
Making the Metadata More Useful
Improving time-to-value involves more than simply mapping metadata from the source system to the EDW, however. Once it’s there, the next step is for Health Catalyst’s experts to rename it into terms readable by humans, making it easier to search.
For example, the EHR source system may have a column labeled Pat_ID. While that may make sense to someone who has been in healthcare IT software a long time, it’s one of many names that could be used. It’s also not the way a person unfamiliar with that code would search for that information.
Knowing this, the Health Catalyst team will change that column heading to the more intuitive PatientID. When users search for “Patient Identifier” information, they will be far more likely to receive the results they want.
The uses of suffixes, like ID in the case of PatientID, convey additional meaning to a name. Another example is the addition of a suffix such as NBR (indicating that the column contains a numerical value) or CNT (a count such as how many times a patient has had a particular type of test). The team uses its extensive knowledge of the source system, supplemented by the knowledge base in the SMD, to change all the cryptic source system names into ones reflecting the way humans think. All of that data is then placed into the metadata repository so users can browse through it with another tool we’ve developed called Atlas.
The Search for Metadata
The Atlas tool is designed to simplify and speed up search by making it more intuitive. It’s very similar to the way Google works for general internet searching. If you’re a clinician who wants to find a patient identifier in the EHR, you wouldn’t enter Pat_ID as your search term unless you already knew that’s the term you needed. You would enter PatientID because that’s how you think of it.
Atlas can browse for that term, or something similar, in the metadata and not only find the identifier but tell you additional information about it, such as the source system it came from, the table it came from, and the type of data it is (a number, for example). By taking cryptic, proprietary terms and turning them into terms a human would use and search for, we are able to speed up the process of building subject area marts and metrics based on the data. It also means users who are building the subject area marts don’t need to consult an expert in the source system to help them find what they need. They can type in the terms they would use intuitively and Atlas will find the metadata they need.
Improving Population Health Management
This capability for intuitive search, without a requirement to understand all the proprietary terms, is critical when pursuing a population health management or ACO strategy. Because those systems incorporate data from many different hospitals, clinics, primary care physicians, specialists, etc., there is a high likelihood that several different EHRs from several different vendors are being used.
Once all of those different EHRs are mapped into the EDW with our SMD, Atlas can be used to find all instances of Patient Identifiers in the different systems without having to know what proprietary term is used by each of the EHRs. Users can type in “Patient Identifier,” and it will find that information in the data mart, regardless its native EHR.
The same is true for any source system – general ledger, patient satisfaction, etc. If you need a Patient Identifier from any of those systems, you can use the same intuitive language to search for it. The result is it makes it easier to function in a heterogeneous environment which is the direction the entire healthcare industry is headed. Until we have complete standardization and interoperability – which may be well off into the future – these tools provide the next best thing. And in a timely fashion.
One of our best examples of using SMD and Atlas to improve time-to-value comes from the work we did with Indiana University Health (IU Health). It was the first time we’d seen the Cerner EHR, yet using SMD, we were able to map it in just 90 days. That’s a process that used to us take anywhere from 180 days to a year. Even better, as a result of that project, we were able to incorporate what we learned about Cerner into our knowledge base, so the next time we face a Cerner EHR we will be able to do it even faster.
Accelerating Healthcare Data Warehouse Time-to-Value
It’s all about getting that tedious front-end process completed quickly, so you can start improving patient outcomes/satisfaction and lowering costs sooner instead of trying to solve a mountain of technical problems. Once you’re there, you can build applications based on metadata in the data marts rather than having to use the painfully slow process of going all the way back to the source systems, finding an expert who knows how the data is labeled in that system, and then bringing it back in for the application.
This intelligent application of metadata also saves time and improves accuracy for nightly extractions of data from the source systems. Rather than having to use humans to perform the extraction, the metadata will indicate all the table names, column names, and other information about the source system that is used to extract and update the data needed for reporting and other uses.
In most systems, creating metadata is a manual process that can get out of sync quickly. With our approach, the metadata is updated and reports are rebuilt automatically, which continues to deliver improved time to value on a daily basis.
The bottom line: in Health Catalyst’s view, metadata is king. It drives the value. The SMD and Atlas are the data warehouse tools you use to manage that kingdom.
Over the past few years, healthcare organizations in general have gotten much better at capturing data. The adoption of EHRs, driven by Meaningful Use requirements, along with many other technologies, has given us the potential to know more than we’ve ever known about healthcare, especially as it relates to population health management (PHM).
The key word in that sentence, however, is “potential.” All that data is lying there waiting to be used, and we’re accumulating more of it every day. Yet much of it remains right where it is, locked away in source systems rather than helping us actually understand and manage the health of populations.
It’s not for lack of will. It’s more a matter of how difficult it has always been to extract the data you need (and only the data you need) to build a subject area mart that will allow you to make calculations and use business intelligence (BI) tools to create visualizations that indicate trends.
The Health Catalyst Late-Binding™ Data Warehouse consolidates the data from EHRs and other hospital systems into one unified repository simplifying the process of combining disparate data sources into knowledge.
Data Warehouse Tools: The Subject Area Mart Designer
It was this issue that led Health Catalyst to create a tool we call the Subject Area Mart Designer (SAMD). It uses the work performed with our SMD (described above) to simplify the process of accessing the data you extract each night to create visualizations on a particular area. A SAM essentially is a subset of data focused on a particular subject.
Say, for example, you want to look at how well your organization is managing its entire diabetic patient population. In the past, pulling the information you needed out of the various source systems so you could perform the calculations was a time-consuming, arduous, manual task that required a lot of specialized IT knowledge. In fact, it could take months to build an individual subject area mart in your EDW using that method.
By relying on the metadata in the EDW’s source mart, which uses human-readable, easily searchable terms rather than the source data and its proprietary terminology, the SAM Designer simplifies much of the process. Since you don’t have to go all the way back to the source system and root around in there to determine where the information you need is located, or how it’s labeled, you don’t have to be an IT expert (or wait until one becomes available) to create a SAM.
Here’s where it’s made a huge difference in delivering value. In the past, a data architect would sit with a clinician and try to interpret what that clinician was saying. For all intents and purposes they were speaking different languages, which made the process of building the SAM very slow.
Now, clinicians and data architects can actively work together to build the SAM. In most cases a SAM can be built in just one day – which means the organization starts realizing value from that SAM immediately. Once the basic SAM is in place you can make continual improvements that increase its value even further.
Looking again at the example of the diabetic patients, if you want to see how the organization is doing managing that population you can build a SAM goes into the metadata and draws in the relevant fields and types of data for managing those patients, such as HbA1c levels, and filters out anything you don’t need. Once that data is there you can run calculations such as:
- How many patients in that population have their HbA1c levels under control, i.e., below 7 or whatever number you choose to use?
- How many of the patients who are out of control have I seen recently?
- Are any patients trending outside their normal A1c values, where they could be putting themselves at risk?
In short, you can ask whatever questions you need to manage the health of that particular population and get the answers back quickly.
You can see what a huge improvement that is in time-to-value. That’s why Health Catalyst is making such a huge investment in this particular tool. We’ve already released a version of the SAM Designer to rave reviews from our customers. The focus is on giving business analysts and the more tech-savvy clinicians the ability to build SAMs without any IT involvement at all.
It’s also a tool we’ve been using extensively internally to create SAMs for our clients, and to build our applications. For example, our analytics applications team is using the same tool our customers use to create SAMs we can offer as stand-alone applications to aid clients looking for more of a turnkey-type approach (as opposed to building the SAMs themselves).
Making Technology Work for Clinicians
Of course, since everyone in the U.S. practices medicine a little differently, there’s also a need to manage those differences. Our plan is to ship SAMs as starter sets to provide a very high beginning point, and let the organizations customize the details (such as changing parameters or enhancing metrics) based on their needs.
While I am a technologist at heart and find all the details that goes into extracting and using data fascinating, I also realize that I am in the minority. The key in delivering time-to-value in healthcare is getting the technology out of the way of the clinicians to ensure they’re spending their time practicing medicine instead of their software skills. Making it easy to get to the data needed in order to create calculations and visualizations is critical to managing the health of populations effectively. The SAM Designer is one more way we’re accomplishing that.
Have you used metadata in your EDW? Have you had to recreate it for each source system? What has your experience been around building SAMs? Have you done it? If so, what sort of time to value were you able to generate? Have you ever tried using metadata to shorten it?