The Dangers of Data Shopping: The Mad Scramble for Information

To understand why data shopping keeps many healthcare organizations from maximizing the value of their data through analytics, it helps to think about your shopping options in the real, offline world.

One option is a strip mall with a number of independent stores. If I want to purchase a pair of jeans, some boots, a water storage system, and a backpack for a hike on Saturday, I will likely need to visit four separate stores. The selection may be limited, so if I want to make the purchase without having to drive somewhere else, I will have to accept what I can get.

The second option is a large warehouse club where everything is under one roof, often with ample choices, allowing me to get exactly what I need in a single trip. And, since all of the merchandise has been vetted by a single buying group, if I am confident in the quality of the warehouse club, I can be confident in the individual products it sells.

Today, many organizations have their data stored like merchandise in the strip mall, in siloed, independent repositories throughout the ecosystem. Those repositories may contain clinical, financial, patient satisfaction, customer relationship management, and many other types of data. Sometimes the same type of data (although not necessarily the same quality) may be stored in two or more systems.

This lack of centralization often leads to a phenomenon known as “data shopping.” Essentially, data shopping is the practice of analysts and knowledge workers searching throughout the ecosystem to obtain the information they need to answer a pressing business question.

Ideally, these analysts and knowledge workers would access a single source of truth, such as a Late-Binding™ Enterprise Data Warehouse (EDW), to mine the data their analytics require. Absent that, however, being the resourceful types they are, they will obtain it from wherever they can find it. Yet this piecemeal approach to data analysis can have challenging downstream consequences on the efficacy and consistency of the analytics.

The Dangers of Enabling Data Shopping

There are several dangers that result from a data shopping approach. One of the most significant is that scattered data yields no single source of truth. The data quality may be high, low, or somewhere in between. Additionally, the definitions of the data from the different sources are often inconsistent.

Just as important, the data required to perform a particular analysis may exist in both high and low quality in different repositories. As a result, the same analysis can produce different results depending on where the data was sourced. If the data used isn’t clean and reliable, the organization runs the risk of making poorly informed decisions.
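To make the point concrete, here is a minimal sketch in Python. The two repositories, the patient IDs, and the diabetes flags are all invented for illustration; they are not drawn from any real system. The sketch simply shows how one question ("what share of these patients are diabetic?") gets two answers when the same patients are recorded differently in two places.

```python
# Hypothetical data: the same five patients as flagged in two separate repositories.
# True means the repository records the patient as diabetic.
clinical_repo = {"p1": True, "p2": True, "p3": False, "p4": False, "p5": False}
billing_repo = {"p1": True, "p2": True, "p3": True, "p4": False, "p5": False}


def diabetes_rate(repo):
    """Share of patients flagged as diabetic in one repository."""
    return sum(repo.values()) / len(repo)


def disagreements(repo_a, repo_b):
    """Patient IDs present in both repositories whose flags differ."""
    shared = repo_a.keys() & repo_b.keys()
    return sorted(pid for pid in shared if repo_a[pid] != repo_b[pid])


print(diabetes_rate(clinical_repo))  # rate according to the clinical source
print(diabetes_rate(billing_repo))   # rate according to the billing source
print(disagreements(clinical_repo, billing_repo))  # patients to reconcile
```

A list of disagreeing records like this is one plausible starting point for reconciliation, since it tells data stewards exactly which patients need review.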

Here’s a real-life example of what can happen. Health Catalyst was working with a healthcare provider that was focused on reducing the rate of elective inductions of labor before the baby achieved a gestational age of 39 weeks. Clinical evidence has established that the risks and complications related to induction of labor are reduced significantly after the 39-week mark.

One of the keys to driving this clinical quality improvement was knowing when the 39 weeks had passed. That was difficult to determine in this organization because the data was captured in 14 different locations and 10 different formats. Before we could establish a baseline rate of pregnant mothers induced electively before 39 weeks, we first had to establish a single source of truth in the EDW regarding when the baby reached 39 weeks. Once knowledge workers knew where to look, they were able to run the reports that helped drive down elective inductions.
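As a rough illustration of what establishing that single definition can involve, the Python sketch below converts a few gestational-age formats into one unit (whole days) and then applies the 39-week test. The formats shown ("38w4d", "38+4", "38 weeks 4 days", "270 days") and the function names are assumptions made for this example; the article does not describe the actual formats or the method the organization used.

```python
import re

FULL_TERM_DAYS = 39 * 7  # the 39-week threshold expressed in days


def gestational_age_days(raw):
    """Return gestational age in whole days, or None if the format is not recognized.

    The accepted formats are hypothetical examples, not the provider's real ones.
    """
    text = str(raw).strip().lower()

    # Weeks with optional days: "38w4d", "38 weeks 4 days", "39w"
    m = re.fullmatch(r"(\d+)\s*(?:w|wk|wks|weeks?)\s*(?:(\d+)\s*(?:d|days?))?", text)
    if m:
        return int(m.group(1)) * 7 + int(m.group(2) or 0)

    # Weeks plus days: "38+4"
    m = re.fullmatch(r"(\d+)\s*\+\s*(\d+)", text)
    if m:
        return int(m.group(1)) * 7 + int(m.group(2))

    # Days only: "270 days", "270d"
    m = re.fullmatch(r"(\d+)\s*(?:d|days?)", text)
    if m:
        return int(m.group(1))

    return None


def reached_39_weeks(raw):
    """True/False once the value is parsed; None when the value needs manual review."""
    days = gestational_age_days(raw)
    return None if days is None else days >= FULL_TERM_DAYS


for value in ["38w4d", "38+4", "38 weeks 4 days", "270 days", "39w", "unknown"]:
    print(value, gestational_age_days(value), reached_39_weeks(value))
```

Returning None for unrecognized values, rather than guessing, keeps questionable records visible so they can be reviewed instead of silently skewing the baseline rate.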

A second, related danger of poor data quality is that it becomes difficult to get clinicians on board with clinical quality improvement programs. If they don’t trust the data, they won’t trust the conclusions it drives.

This issue became evident when Health Catalyst worked with another provider on a population health management (PHM) program for diabetes care. This was the first experience with PHM for the physicians who managed this population, so they were wary about it.

It turned out they were right to be wary. When we showed them the analytics, they immediately pointed out flaws, such as a patient who was not a diabetic or a particular patient who had not been in for a year. We enlisted their help to clean up the data and the program moved forward.

When it came time to begin a similar program for patients with asthma, we didn’t bring data right away; we started by asking them questions. But they told us we needed to show them the data, because they now had a level of trust in it they hadn’t had before.

Another danger of data shopping is creating a situation where expert knowledge workers, who may have advanced degrees in statistical analysis, are spending the bulk of their time on activities they were not trained in, such as hunting for and gathering data, scrubbing it, and making it useful for their analyses. In the process, they become producers of data rather than consumers of it. By the time they’re done, there’s little time left for meaningful analysis.

Warning Signs of Data Shopping

So, how do you know if your organization is in a data shopping mode? There are some obvious warning signs.

  • There are multiple data repositories where a knowledge worker can go to get answers to similar questions. This is an indicator that the same data elements are being captured, probably in different ways, within different areas.
  • There is a growing number of decentralized analysts or information consumers within the organization. While there will always be a need for analytics within specific departments to support operations, the organization should have a core group who perform most of the analytics across the enterprise.
  • The organization worked hard to hire brilliant analysts with tremendous training and experience, but those analysts are spending most of their time hunting and gathering data and making it consumable rather than working their magic.

According to attendees of the Healthcare Analytics Summit, 40 percent of data analysts spend 80 percent of their time gathering data, and another 39 percent spend 60 percent of their time gathering data (see chart below).

Chart: Time spent gathering data. Healthcare Analytics Summit survey results: 79 percent of data analysts spend more than half of their time gathering data.

Organic Growth of Analytics

If data shopping causes so many problems, why does it occur? It results primarily from building new, disparate systems to capture data sets without having an overall data governance plan in place.

Many of the information needs within a healthcare organization surface organically. When analysts or knowledge workers within a department attempt to answer specific questions, they use whatever data they can find. Since the typical analyst is motivated to be as responsive as possible, everyone does what’s most expeditious. Once the path of least resistance has been established, it tends to be followed over and over again.

The problem is, in the quest to answer the immediate needs, little thought is given to the structure of the entire data ecosystem. Data isn’t often thought of and treated as a strategic asset that needs to be managed carefully, like human or financial capital. It’s just there. Soon there are little pockets of data everywhere, and those who want to use it need to go shopping.

Compounding this process is the way many chief information officers view their jobs vis-a-vis data. Since the beginning of the computer age, the bulk of CIO budgets has been focused on capturing, storing, and securing data. A much smaller percentage is dedicated to how the data will actually be used.

EDW pioneer Ralph Kimball says enlightened CIOs should allocate as much of their budgets to getting data out of their systems as they do to getting it in. Getting there, however, will require healthcare organizations to move away from siloed data systems in favor of a centralized approach that incorporates comprehensive data governance.

Preventing/Solving the Data Shopping Dilemma

Some organizations are just getting into analytics and don’t yet have established patterns. Others may be well into their analytics efforts and now realize they have a data shopping issue. Either way, the advice for avoiding or solving it is the same.

It begins with having a plan that starts with the end in mind. Two key points are 1) creating a single source of truth (such as an EDW) where everyone in the organization can go to obtain data that is clean, accurate, and consistent, and 2) devoting as much time and as many resources to pulling out the data as are spent on storing and securing it. This is all part of treating data as a strategic asset rather than simply capturing it and locking it away.

Establishing a single source of truth is the first priority. It will not only save analysts time in finding and making the data consumable; it will also ensure that all knowledge workers are starting from the same point and using the same data.

Once the plan is in place and a single source of truth has been established, the next step is to deploy a data governance structure that focuses on the three pillars of data governance:

  1. Data quality, including having processes in place to improve the quality of data over time.
  2. Data utilization and access to ensure data is made available ubiquitously and securely to all users (within their authorization levels).
  3. Data literacy, i.e., providing education to elevate the level of understanding on how to access and use information properly. This includes a clear understanding of who the data stewards are for each area.

For organizations that are already in data shopping mode, it’s time to invoke the first law of holes: when you find yourself in one, stop digging. Acknowledge the situation and start moving toward establishing a single source of truth and a data governance structure, even if that means stopping or reducing work on analytics for a little while. It may be temporarily painful, but the rewards will make it worth the pain.

Don’t Head to the Mall

Data shopping occurs when there are no better alternatives. Rather than forcing knowledge workers to search for data throughout the enterprise like holiday shoppers going from strip mall to strip mall in search of the perfect gift, bring it all together.

Build a centralized, single source of truth, and create a centralized core set of data consumers who can serve as high-level strategic analysts and share their work across departments. Make sure those analysts are spending the bulk of their time consuming data and developing insights. Make the process of getting data out of the system as easy as getting data into it. Establish a data-driven culture that treats data as a strategic asset, just like human and financial capital. And finally, establish a data governance program that focuses on the three pillars.
