Practical Big Data (Health Data Management)


[Written April 1, 2014 by Elizabeth Gardner]

Big Data is like teen sex, says Shafiq Rab, CIO of Hackensack (N.J.) University Medical Center. “Everyone says they’re doing it, but no one really knows how.”

Rab discounts the piles of structured data in his clinical and financial systems (“anything that Oracle and SQL servers can handle is not Big Data,” he says), but his team has built an engine that crawls through the unstructured data in the EHR to correlate certain variables and analyze how they’re affecting costs and outcomes.

“When you talk about Big Data, you’re talking about every component of a person’s life that may directly or indirectly affect their health,” he says. “If you’re missing any of them, you don’t know everything about the patient. Health care is no longer about taking a pill, or treating a heart attack. Now we’re trying to prevent those acute events.”

Brett Davis, consulting principal at Deloitte, says he sees a lot of “bridges to nowhere” being built to serve Big Data. “People think, ‘If we just aggregate all this data, we’ll get insights out the other end,’ and they build tech without applications in mind.”

Nonetheless, a Big Data strategy is essential. “The shift to value-based medicine makes it imperative to be able to characterize populations and what works for whom and at what cost,” Davis says.

Data warehouse vendor Health Catalyst, which evolved out of the pioneering clinical analytics developed at Utah’s Intermountain Health Care, has amassed 1,400 hospital and clinic clients since its founding in 2008. CEO Dan Burton says other industries may be ahead of health care in the use of Big Data, but that’s because their jobs are easy in comparison. “Health care data is so complex relative to what you’d need to analyze in another industry,” he says. “It’s an order of magnitude more complex than a bank.”

Burton says that a properly executed Big Data strategy can pay for itself quickly enough that organizations shouldn’t shy away because of cost. He estimates that a single 200-bed hospital can get started with a Health Catalyst data warehouse for an investment somewhere in the six figures. “Now you can quantify waste and lower your cost structure by $5 million a year, and then it’s just a question of whether you’re serious about making those changes,” he says.

Following are profiles of several providers that are starting to make real gains from Big Data strategies.

Silverton (Ore.) Hospital

Mining Claims, Speeding Payment

Sean Riesterer, director of revenue cycle and reimbursement for Silverton (Ore.) Health, knew he had to speed up the time between rendering of service and submission of the bill, but he didn’t know how until he began comparing his organization with more than 1,000 others through a claims database maintained by RelayHealth. One in five of the hospital’s claims were taking between five and 10 weeks to get submitted, and the overall average was 30 days, too long for the financial health of the 48-bed organization.

“Claims data is great whether you’re talking clinical, operational or financial analysis,” Riesterer says. “It’s the best mirror we have that comes even close to being standardized. If you figure that we generate 10,000 claims a month, then we are talking ‘Big Data.’” The data is also close to real time, so Riesterer can see trends quickly, as well as the effects of any changes. “Being able to tap into it daily saves me a lot of time.”

RelayHealth’s clearinghouse processes about 20 percent of claims nationally, and using that data, Riesterer was able to identify 50 hospitals that were in the Pacific region, had 25 to 100 beds, and offered both emergency and obstetric services. He backloaded a year’s worth of his own data into RelayHealth’s analytics programs, and quickly identified which specialties were lagging in service-to-submission, compared with peer organizations.
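The benchmarking Riesterer describes boils down to grouping claims by specialty, averaging the service-to-submission lag, and flagging specialties that trail a peer figure. Here is a minimal sketch in Python; the claim records, specialty names and the 13-day benchmark are all invented for illustration, not drawn from RelayHealth’s actual tools.

```python
from datetime import date
from statistics import mean

# Hypothetical claim records: (specialty, date of service, date of submission).
claims = [
    ("surgery",    date(2014, 1, 3),  date(2014, 2, 14)),
    ("surgery",    date(2014, 1, 10), date(2014, 2, 20)),
    ("obstetrics", date(2014, 1, 5),  date(2014, 1, 12)),
    ("obstetrics", date(2014, 1, 8),  date(2014, 1, 20)),
]

def avg_lag_by_specialty(records):
    """Average service-to-submission lag in days, grouped by specialty."""
    lags = {}
    for specialty, served, submitted in records:
        lags.setdefault(specialty, []).append((submitted - served).days)
    return {s: mean(days) for s, days in lags.items()}

lag = avg_lag_by_specialty(claims)
# Specialties whose average lag exceeds a hypothetical peer benchmark of 13 days.
laggards = {s: d for s, d in lag.items() if d > 13.0}
```

At real volume (10,000 claims a month) the same grouping would run against the warehouse rather than an in-memory list, but the shape of the question is identical.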

Documentation delays were the main culprit: One physician would finish his or her charts within five days, while another might take 40, with no apparent difference in the quality of either the care or the documentation.

“Clinicians can have a different perception of documentation, so it’s important to have data that stands up,” Riesterer says. Having reliable data on the performance of other hospitals, as well as details on claim rejections and denials, helped him work with department heads to remedy documentation deficiencies and correct recurring errors in the claims.

It took Silverton only a few months to reduce the average service-to-submission time to 13 days, and its overall accounts receivable dropped by about 20 percent as well. Riesterer doubled his cash on hand.

“I’m not going to say we’ve conquered the world, but having access to a large amount of claims data has greatly improved our revenue cycle and showed us things that were harming our cash flow,” he says.

Crystal Run Healthcare, Middletown, N.Y.

Making the ACO Transition

Big Data at Crystal Run Healthcare, a rapidly growing group practice and accountable care organization in New York’s Hudson Valley, is actually only a couple of terabytes, small enough to fit on a desktop back-up drive. But it still chokes many standard analytical tools, says Chief Medical Officer Gregory Spencer, M.D. It’s enough data to warrant a data warehouse, which Crystal Run has built with Health Catalyst. The warehouse includes not only those 2 TB of internal data (EHR, lab, radiology, general ledger and practice management) but also claims data from payers. That’s particularly vital for the Medicare Shared Savings ACO patients, about 11,000 out of the practice’s total panel of 200,000. “Medicare recipients can go to whoever takes Medicare, so you can’t lock them down [into using the ACO’s providers], even though you’re going to be responsible for their care,” Spencer says. Having access to their claims data, regardless of where they’re getting their care, can reveal patterns that could otherwise do major damage to the practice’s cost structure.

Analysis showed a number of cost anomalies. For example, a single dermatologist had markedly higher pathology spending than any other provider taking care of the ACO patients. “It was a solo practice that seemed to biopsy everything that walked through the door,” Spencer says. “Maybe it’s not the wrong thing to do, but maybe there’s a better or different way to do it where there’s less waste. It’s potentially actionable, though it’s going to be a weird discussion.”

Crystal Run is now looking at bringing in socioeconomic data, based on ZIP codes and other factors, to help predict which patients will do better with more attention, and weather data, so that it can correlate which patients are likely to be no-shows during bad weather. Staff can reach out to those patients several days in advance and reschedule them.
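The weather correlation Crystal Run envisions amounts to comparing no-show rates under different conditions and acting when the gap is large. A toy sketch follows; the visit records are fabricated, and a real model would use individual patient histories and forecasts rather than a single boolean.

```python
# Hypothetical visit history: (patient_showed_up, bad_weather) pairs.
visits = [
    (True, False), (True, False), (False, True), (True, True),
    (False, True), (True, False), (False, True), (True, True),
]

def no_show_rate(records, bad_weather):
    """Fraction of appointments missed under the given weather condition."""
    subset = [showed for showed, bw in records if bw == bad_weather]
    return sum(1 for showed in subset if not showed) / len(subset)

# If the bad-weather rate is markedly higher than the fair-weather rate,
# staff can call those patients several days ahead and reschedule.
bad = no_show_rate(visits, bad_weather=True)
fair = no_show_rate(visits, bad_weather=False)
```

On this fabricated data the bad-weather no-show rate is 60 percent against zero in fair weather, which is exactly the kind of gap that would justify the proactive outreach Spencer describes.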

Spencer sees that his datasets are going to balloon over the next few years, particularly once genomic data gets into the mix. He advocates that every organization consider a data warehouse, “or at least a data parking lot,” so that it’s prepared for the sophisticated analysis that value-based care is going to require. “You don’t have to trend every single thing, but you should start gathering vitals and labs and other interesting pieces sooner than you think you have to.” While EHR vendors are beginning to build more sophisticated analytics into their product offerings, Spencer says data warehousing goes beyond the expertise most vendors have. “It’s a fairly separate skill during the setup, but once the warehouse is up and humming, you can do queries and reporting over many different kinds of data and systems.”

Moffitt Cancer Center, Tampa

Populating Clinical Trials

The term “Big Data” often conjures up images of big servers and cool visualization software that mines multiple data sources for dazzling insights. But that’s not the hard part, says Mark Hulse, CIO of Moffitt Cancer Center. The hard part is loading up the warehouse with clean, reliable data so that the insights coming out of it are not only dazzling, but accurate.

Moffitt is on its second data warehouse. The first one suffered from a poor user interface (“It was easy to get the data in, but difficult to get it out,” says Hulse) and questionable data quality. When consulting firm Deloitte became involved in 2009, it recommended starting from scratch, with Oracle’s health data management platform.

The warehouse’s initial task was cohort identification, a key mission for a research organization. “We do a lot of clinical trial work, and we want to know how many patients fit the criteria for a particular trial,” Hulse says. “It’s often a last-ditch attempt to prolong the patient’s life, so it’s important to get on a trial as quickly as possible. The way it’s usually done, through chart review, can take months or years, but we can find them within a few minutes.”
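Cohort identification of the kind Hulse describes is, at bottom, a filter across joined warehouse tables: diagnosis, stage, age and lab values narrowed to the trial’s eligibility criteria. The sketch below uses SQLite to make the idea concrete; the table layout, column names and trial criteria are all invented and bear no relation to Moffitt’s actual schema.

```python
import sqlite3

# Build a tiny in-memory stand-in for a clinical data warehouse.
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE patients (id INTEGER, age INTEGER, diagnosis TEXT, stage INTEGER);
CREATE TABLE labs (patient_id INTEGER, test TEXT, value REAL);
INSERT INTO patients VALUES (1, 62, 'NSCLC', 4), (2, 55, 'NSCLC', 2),
                            (3, 70, 'melanoma', 3), (4, 48, 'NSCLC', 4);
INSERT INTO labs VALUES (1, 'creatinine', 1.1), (2, 'creatinine', 0.9),
                        (3, 'creatinine', 1.0), (4, 'creatinine', 2.4);
""")

# Hypothetical trial criteria: stage-IV NSCLC, age 18-75, creatinine below 1.5.
cohort = db.execute("""
    SELECT p.id FROM patients p
    JOIN labs l ON l.patient_id = p.id AND l.test = 'creatinine'
    WHERE p.diagnosis = 'NSCLC' AND p.stage = 4
      AND p.age BETWEEN 18 AND 75 AND l.value < 1.5
""").fetchall()
```

Run against a populated warehouse, a query of this shape is what turns months of chart review into the minutes Hulse cites: patient 4 here matches on diagnosis and stage but is excluded by the lab cutoff, which no amount of eyeballing charts surfaces as quickly.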

Data sources include Moffitt’s cancer registry database and outside cancer registries, Moffitt’s EHR, lab test values and medication records, and its electronic intake form, which lives on a separate platform from the EHR and includes hundreds of patient-supplied variables. The warehouse also draws tumor and blood sample information from the bio-banking system, as well as patient consent forms for research protocols. Moffitt also has gene expression profiling on about 30,000 tumors, and targeted exome sequencing on 3,000 tumors. Eventually, Hulse hopes to do more with natural language processing to mine the information in free-text notes, which may answer the often puzzling question of whether a cancer patient is actually getting better, and with pixel analysis of tumor images.

To oversee the activities of the warehouse, Moffitt has created two new departments. The data quality and standards department makes sure the data going into the warehouse has been collected and recorded consistently and can be relied on. The information shared services department has a staff of “data concierges” who help users select the right variables to identify patients eligible for a trial.

“Querying a database is an art in itself,” says Chief Health Information Officer Dana Rollison, M.D., who oversees both departments. “You’d think it would be straightforward to find the patients once you have the data, but you can come up with different numbers depending on how you query.”
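Rollison’s point can be shown with a toy example: the same clinical question, counted two ways, yields two different numbers. The encounter records and diagnosis code below are invented for illustration.

```python
# Hypothetical encounter records: (patient_id, diagnosis_code).
encounters = [(1, "C50"), (1, "Z08"), (2, "C50"), (2, "C50"), (3, "Z08")]

# Formulation A: count *encounters* carrying the code of interest.
encounters_with_c50 = sum(1 for _, dx in encounters if dx == "C50")

# Formulation B: count *distinct patients* who ever carried the code.
patients_with_c50 = len({pid for pid, dx in encounters if dx == "C50"})
```

Here the same question, “how many with code C50?”, answers 3 by encounters and 2 by patients. Multiply that ambiguity across hundreds of variables and the value of a data concierge who knows which formulation the trial actually needs becomes obvious.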

North Memorial Health Care, Robbinsdale, Minn.

Pinpointing Patient Care Challenges

North Memorial Health Care’s data warehouse has proven so useful throughout the organization that it’s had to institute a governance committee to decide which Big Data projects to concentrate on. “Our resources are limited and we have to prioritize,” says Director of Clinical Quality Jeffrey Vespa, M.D. “If everyone is working on their important project, the measurement and reporting office can’t decide what takes precedence without a governance structure.”

The warehouse, built by Health Catalyst, marries operational, financial and clinical data into what Vespa calls “super data.”

“You can pull on one lever and see how it affects all the others,” he says. “We didn’t used to be able to do that.” The warehouse contains mostly internal data for now, though Vespa looks forward to incorporating claims data and other outside data sources.

Through analyzing and acting on the data, the organization has had several big wins, including reducing its elective early-term deliveries by 75 percent in six months, reducing the incidence of chronic lung disease in premature newborns by 35 percent, and increasing its charges by $5.7 million over three years just by improving physician notes to include enough clinical data for billing.

Patients coming to the hospital for observation were a potential financial risk, since observation stays are reimbursed at a lower rate. Vespa used the warehouse’s analytical applications to profile observation patients and determine whether they could be handled in more efficient ways. They grouped the patients into two categories: genuine observation patients, with conditions like kidney stone pain, possible appendicitis or low-risk chest pain, and elderly or co-morbid patients who couldn’t go home because they had no support system. The first group could be cared for more efficiently if they were kept near one another in the emergency room and seen by a group of physicians especially skilled at evaluating those conditions, and the second had to be referred to social workers for the support needed to get them home. After changing the protocols for observation patients, their average length of stay dropped from 32 hours to 11 hours.
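The two-way split Vespa describes can be expressed as a simple triage rule. The sketch below uses hypothetical criteria (condition list, age cutoff, support flag); the real protocol would be clinically defined and far more nuanced.

```python
# Conditions the article names as genuine observation candidates.
OBS_CONDITIONS = {"kidney stone pain", "possible appendicitis", "low-risk chest pain"}

def route(patient):
    """Return the care pathway for an observation patient (hypothetical rule)."""
    if patient["condition"] in OBS_CONDITIONS:
        # Clustered ED beds, seen by a physician group skilled in these workups.
        return "ED observation cohort"
    if patient["age"] >= 65 and not patient["has_support_at_home"]:
        # Needs home support arranged before discharge is realistic.
        return "social work referral"
    return "standard workup"

route({"condition": "kidney stone pain", "age": 40, "has_support_at_home": True})
```

The analytic work, of course, was not writing the rule but using the warehouse to discover that these two populations existed and were being handled identically.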

“This project turned our observation patients from a loss to a profit,” Vespa says.

Texas Children’s Hospital, Houston

Reducing Variability

The surgeons at Texas Children’s Hospital already had a sense that their patients were staying in the hospital too long, on average, after having their appendixes out, and the cost of the procedure varied more than the complications or co-morbidities could explain. When the hospital established a data warehouse in 2011, they were able to compare clinical data with billing data, and determined that family anxiety about post-appendectomy care was contributing to the long stays. They developed an order set for discharge that included providing the family with much more detailed instructions. As for surgical costs, the culprit was surgeon preferences, and the surgeons are now analyzing where they can make changes.

“They would never drop an oil well here in Texas without understanding a lot of data from a lot of places,” says Kathleen Carberry, director of the hospital’s outcomes and impact service, who’s in charge of putting together data from multiple sources to identify areas for improvement. She envies the wealth of geological, environmental, seismic and chemical data that allow those decisions, and looks forward to the day when she has the same deep insight into health care. “Our warehouse still has a long way to go,” she says.

TCH’s data warehouse, built with Health Catalyst and analyzed with QlikView visualization software, contains data from its Epic EHR and eight other internal systems, both clinical and financial. It also draws patient satisfaction data from Press Ganey, market information from the hospital’s strategic planning group and data from national cardiac surgery registries, among other sources.

Stringent data governance is essential, says I.T. Director John Henderson, because it’s both inconvenient and potentially dangerous to try to scrub the data once it comes into the warehouse. “We assume that the data is accurate as it comes from the source,” he says. When problems are found in the analysis that trace back to mistakes in the data, those responsible for entering it are charged with changing the data entry process so that the information can be trusted. “You have to educate people to have a low tolerance for dirty data,” he says.


Last September, two computer science students from the University of St. Andrews in the U.K. attempted to pin down a definition of Big Data, publishing “Undefined by Data: A Survey of Big Data Definitions.” Their round-up included:

* Gartner: The “Four V’s” definition: volume, velocity, variety, veracity.

* Oracle: The derivation of value from traditional relational database-driven business decision-making, augmented with new sources of unstructured data such as blogs, social media, sensor networks and image data.

* Intel: Generating a median of 300 terabytes of data weekly. Includes business transactions stored in relational databases, documents, e-mail, sensor data, blogs and social media.

* Microsoft: The process of applying serious computing power, the latest in machine learning and artificial intelligence, to seriously massive and often highly complex sets of information.

* Application definition (arrived at by analyzing the Google Trends results for “big data”). Large volumes of unstructured and/or highly variable data that require the use of several different analysis tools and methods, including text mining, natural language processing, statistical programming, machine learning and information visualization.

* Method for an Integrated Knowledge Environment (MIKE2.0) definition. A high degree of permutation and interaction within a dataset, rather than the size of the dataset. “Big Data can be very small, and not all large datasets are Big.”

* National Institute of Standards and Technology (NIST): Data that exceeds the capacity or capability of current or conventional [analytic] methods and systems.

Doug Fridsma, M.D., chief science officer for the ONC, offers a definition that will resonate with almost everyone: “More data than you’re used to,” he says. “Some people deal with petabytes and it’s easy, but if you’re a small practice, just your own data is more data than you’re used to.”

© 2014 SourceMedia. All rights reserved.