Wal-Mart and the Birth of the Data Warehouse
In no other field of study are measurement and true knowledge more complex, elusive and subjective than in healthcare. We are measuring ourselves and, in so doing, the observer becomes the observed. As a result, uncovering the truth in healthcare is simultaneously fascinating and daunting.
That’s where the healthcare enterprise data warehouse (EDW) can help. The essence of data warehousing is measurement that leads to understanding. When properly handled, the insights from a healthcare EDW drive behavioral changes that lead to quality improvement.
It’s a straightforward enough process: Measure > Understand > Change behavior > Improve quality. But how does a healthcare EDW accomplish this alchemic transformation of raw data into quality improvement? To answer that question we need first to understand what a data warehouse is, and why it is essential for quality improvement. Along the way, we’ll get to the title of this post and Wal-Mart’s central role in the history of the data warehouse.
What is a Data Warehouse?
Because data warehousing is a relatively new form of information technology in healthcare (only 30% of healthcare organizations have anything that even remotely resembles a data warehouse), an introduction to the basics is probably worthwhile. In general, a data warehouse is a centrally managed and easily accessible copy of data collected from the transaction information systems of a corporation or health system. These data are aggregated, organized, catalogued and structured to facilitate population-based queries, research and analysis. In recent years, data warehouses have also been used to support what I call “secondary use of transaction data,” which means retrieving and interacting with single records. For example, I’ve used data from a data warehouse to support patient account management, coding, lab results review over the web, alerting infection control when an MRSA/VRE patient is admitted, and reviewing individual patient records during an EMR downtime. In other words, although a data warehouse is primarily used for population-based analytics, the data can also be used for other purposes at the transaction level.
The data in a data warehouse comes from multiple source systems. Source systems can be internal systems, such as the EHR, or external systems, such as those associated with the state or federal government (e.g., mortality data, cancer registries).
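To make the two kinds of use concrete, here is a minimal sketch in Python using an in-memory SQLite database. The table name, column names and rows are all hypothetical and invented for illustration; a real healthcare EDW would be far larger and modeled very differently. The first query is population-based (an aggregate over every record), and the second is the transaction-level "secondary use" described above (pulling one patient's records).

```python
import sqlite3


def build_warehouse():
    """Create a toy warehouse table fed by two hypothetical source systems."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE encounters (
               patient_id    TEXT,
               source_system TEXT,
               admit_year    INTEGER,
               mrsa_flag     INTEGER  -- 1 if flagged MRSA-positive, else 0
           )"""
    )
    # Invented rows for illustration only; not real patient data.
    rows = [
        ("P001", "ehr", 2012, 0),
        ("P002", "ehr", 2012, 1),
        ("P003", "registration", 2013, 0),
        ("P002", "registration", 2013, 1),
        ("P004", "ehr", 2013, 0),
    ]
    conn.executemany("INSERT INTO encounters VALUES (?, ?, ?, ?)", rows)
    return conn


def mrsa_counts_by_year(conn):
    """Population-based query: flagged encounters and total encounters per year."""
    cur = conn.execute(
        """SELECT admit_year, SUM(mrsa_flag), COUNT(*)
           FROM encounters
           GROUP BY admit_year
           ORDER BY admit_year"""
    )
    return cur.fetchall()


def encounters_for_patient(conn, patient_id):
    """Transaction-level query: every stored encounter for a single patient."""
    cur = conn.execute(
        """SELECT admit_year, source_system, mrsa_flag
           FROM encounters
           WHERE patient_id = ?
           ORDER BY admit_year""",
        (patient_id,),
    )
    return cur.fetchall()


if __name__ == "__main__":
    conn = build_warehouse()
    print(mrsa_counts_by_year(conn))
    print(encounters_for_patient(conn, "P002"))
```

Both questions are answered from the same copy of the data, which is the point: the warehouse is built for the first kind of query but can serve the second as well.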
As a relatively new specialty in healthcare information technology, data warehousing suffers from a lingering confusion about its characteristics – in particular, those features that distinguish a data warehouse from a typical database. To clarify, I offer the following as characteristics of a data warehouse:
- Dependent on multiple source systems. A data warehouse is populated by at least two source systems, also called transaction and/or production systems. Examples include EHRs, billing systems, registration systems and scheduling systems. In large enterprises, it is not unusual for a data warehouse to contain data from as many as 50 different source systems, internal and external.
- Cross-organizational analysis. Data warehouses are designed specifically to enable data analysis across business and clinical processes. That means the ability to analyze and “link” data across multiple source systems that support various business processes, particularly the full continuum of care for a patient, from birth to death. For example, a data warehouse could provide the ability to analyze the relationship between data in an EHR problem list that is coded in SNOMED and data in a billing system that is coded in ICD.
- Trends, metrics and reports. A data warehouse is designed specifically to help identify trends and previously unknown relationships in business processes. The data output is characterized by metrics and reports. In large enterprises (15,000 employees and more), it is not unusual for a data warehouse to produce hundreds of reports and process tens of thousands of queries per month.
- Large. It is not unusual for data warehouses in today’s information-intense environments to contain billions of records constituting dozens and even hundreds of terabytes of data.
- Historical. A data warehouse stores many years of data, typically at least five and sometimes as much as 30 years’ worth.
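The cross-organizational analysis described in the list above, linking a SNOMED-coded problem list to an ICD-coded billing system, can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the patient IDs and records are invented, the function name is hypothetical, and the two-entry crosswalk is a stand-in for a properly maintained SNOMED-to-ICD mapping, which is far larger and not one-to-one.

```python
from collections import defaultdict

# Hypothetical, deliberately tiny crosswalk from SNOMED CT concepts to ICD-10 codes.
# A real warehouse would load a maintained mapping rather than hard-code one.
SNOMED_TO_ICD10 = {
    "44054006": "E11.9",  # type 2 diabetes mellitus (illustrative pairing)
    "38341003": "I10",    # hypertensive disorder (illustrative pairing)
}

# Invented extracts from two source systems, keyed by a shared patient ID.
problem_list = [("P001", "44054006"), ("P002", "38341003"), ("P003", "44054006")]
billing_claims = [("P001", "E11.9"), ("P002", "E11.9"), ("P003", "E11.9")]


def unbilled_problems(problems, claims, crosswalk):
    """Return problem-list entries whose mapped ICD code never appears in billing."""
    billed = defaultdict(set)
    for patient_id, icd_code in claims:
        billed[patient_id].add(icd_code)

    gaps = []
    for patient_id, snomed_code in problems:
        expected_icd = crosswalk.get(snomed_code)
        if expected_icd is None:
            continue  # no mapping available; skip rather than guess
        if expected_icd not in billed.get(patient_id, set()):
            gaps.append((patient_id, snomed_code, expected_icd))
    return gaps


if __name__ == "__main__":
    for gap in unbilled_problems(problem_list, billing_claims, SNOMED_TO_ICD10):
        print(gap)
```

Neither source system could answer this question on its own. It only becomes answerable once both extracts sit side by side with a common patient identifier, which is what the warehouse provides.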
Think of a data warehouse as a very large, very specialized kind of library – a centralized, logical and physical collection of data and information that is reused over and over to achieve greater understanding, or to stimulate new knowledge. Like a well-stocked library, the use cases for a well-designed EDW are nearly limitless. Similarly, the ROI of a data warehouse is as difficult to calculate as the ROI of a library to a community or university.
Quite often, as we’ll see, the greatest benefits of a data warehouse are not planned for or predicted.
Back to Wal-Mart
It probably won’t surprise you to learn that the roots of data warehousing lie outside of healthcare. EDWs have existed in various forms and under many names since the 1960s, though their true origin is difficult to pinpoint. Military command-and-control and intelligence, manufacturing, banking, finance, and retail markets were among the earliest adopters. In the mid-1990s, data warehousing appeared as an IT subspecialty.
Around that same time, Wal-Mart began to achieve wide acclaim for its mastery of supply chain management. Behind that mastery was Wal-Mart’s data warehouse. The world’s largest retailer leveraged transaction data collected by its point-of-sale systems to achieve unprecedented insight into the purchasing habits of its 100 million customers and the logistics guiding its 25,000 suppliers.
Wal-Mart’s data warehouse, which in 1992 became the first commercial EDW to reach 1 terabyte of data, began, like many good things, as an accident. One of the retailer’s computer operators, tired of retrieving archival tapes for historical sales data, secretly “borrowed” excess storage space on a company server, where he downloaded and stored the data from the most-requested tapes.
Such a giant data stash couldn’t stay secret for long, and it didn’t. When Wal-Mart managers found it, they quickly realized the enormous value of timely and widespread access to data. Thus was born the Wal-Mart data warehouse. Soon, every transaction in 6,000 Wal-Mart stores was available for analysis in the data warehouse within seven minutes. This treasure trove of data enabled Wal-Mart to react in near real time to sales and supply data.
In one story of analytic agility, a Wal-Mart manager on the East Coast prominently displayed an on-sale computer, the day after Thanksgiving, driving a spike in sales that far exceeded the sales of other stores in the region. Alerted to the sales anomaly by EDW analysts, Wal-Mart corporate