Those in Big Data and healthcare analytics circles will seldom hear the phrase “less is more.” In a clinical setting, however, there is an important lesson to learn about the effective execution of predictive analytics: we should not confuse more data with more insight. More data is simply more: more tables, more lists, more replicates, more clinics, more controls, more rows, tables of tables and lists of lists. You get the idea. In short, for predictive analytics to be effective in a clinical venue, a specific focus will always trump global utility.
Healthcare data is positioned for momentous growth as it approaches big data scale. While more data can translate into better-informed medical decisions, our ability to leverage this mounting knowledge is only as strong as our data strategy. Hadoop offers the capacity and versatility to meet growing data demands and turn information into actionable insight.
Specific use cases where Hadoop adds value to a data strategy include:
While many people are looking to Big Data to solve healthcare’s data problems, Big Data won’t offer many solutions for some time to come. For one, healthcare doesn’t have “Big” data: it lacks the volume, velocity, or variety seen in industries such as banking, where Big Data has been used successfully. For another, Big Data is being offered as the answer to almost every question, from cancer to Alzheimer’s, and that is blinding us to the reality of healthcare analytics. A better path toward solving healthcare’s problems would be to improve data literacy among consumers, physicians, and administrators alike. Learning to ask the right questions about data and to read it correctly will get us further down the road to improvement than the latest buzzword (in this case, “Big Data”) ever will.
Health system leaders have questions about big data: When will I need it? How should I prepare? What’s the best way to use it? It’s important to separate the hype of big data from the reality. Where big data stands in healthcare today is a far cry from where it will be in the future. Right now, the best use cases are in academic and research-focused healthcare institutions. Most healthcare organizations are still tackling issues with their transactional databases and learning to use those databases effectively. But soon, once the issues of expertise and security have been addressed, big data will play a major role in care management, predictive analytics, prescriptive analytics, and genomics for everyday patients. The transition to big data will be easier if health systems adopt a late-binding approach to their data now.
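The text names late binding without defining it. One common reading is: keep source data in its raw form and apply business rules and definitions (such as who belongs in a patient cohort) at query time rather than at load time, so a definition can change without reloading everything. The sketch below illustrates that idea under stated assumptions; the `LabResult` type, the sample values, and both cohort rules are hypothetical and chosen only to show the contrast.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class LabResult:
    patient_id: str
    test: str
    value: float


# Raw source records kept as-is; values are made up for illustration.
RAW_RESULTS: List[LabResult] = [
    LabResult("p1", "HbA1c", 6.7),
    LabResult("p2", "HbA1c", 6.3),
    LabResult("p3", "HbA1c", 7.4),
]

Rule = Callable[[LabResult], bool]


def rule_v1(r: LabResult) -> bool:
    """Hypothetical cohort definition: HbA1c at or above 6.5."""
    return r.test == "HbA1c" and r.value >= 6.5


def rule_v2(r: LabResult) -> bool:
    """A revised (equally hypothetical) definition with a stricter cutoff."""
    return r.test == "HbA1c" and r.value >= 7.0


def early_bind(records: List[LabResult], rule: Rule) -> Dict[str, bool]:
    """Apply the rule once at load time and keep only the derived flag."""
    return {r.patient_id: rule(r) for r in records}


def late_bind(records: List[LabResult], rule: Rule) -> List[str]:
    """Keep raw records and apply whichever rule is current at query time."""
    return sorted(r.patient_id for r in records if rule(r))


# Early binding: only the rule_v1 flags survive the load, so adopting
# rule_v2 would mean re-running the load against the source systems.
LOADED_FLAGS = early_bind(RAW_RESULTS, rule_v1)

# Late binding: the raw values are still present, so a new rule is a new query.
print(late_bind(RAW_RESULTS, rule_v1))  # ['p1', 'p3']
print(late_bind(RAW_RESULTS, rule_v2))  # ['p3']
```

The design choice being illustrated is where the rule lives. In `early_bind` the rule is baked into stored data. In `late_bind` it is just a parameter to the query, which is what makes evolving definitions cheap.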
Many organizations, especially those handling huge amounts of data, such as Facebook, use Hadoop for their processing needs. So what exactly are Big Data and Hadoop, and what are their implications for healthcare? Hadoop is a distributed processing and storage platform. Its use is rare in the healthcare industry, but healthcare analytics hasn’t necessarily stalled because of this. In fact, the volume of data healthcare produces doesn’t justify Hadoop-level processing power. This article answers questions such as: What is Hadoop? What drives its adoption in other industries? How might it affect healthcare analytics? How would clinicians use data sources outside their environment? What drawbacks currently stand in the way of further adoption?
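To make “distributed processing” concrete, the sketch below simulates the MapReduce model that Hadoop’s processing layer implements. It runs in a single Python process. It does not use Hadoop’s actual API. The partitions stand in for data blocks stored on separate nodes, and the diagnosis codes are illustrative sample records only.

```python
from collections import defaultdict
from itertools import chain
from typing import Dict, Iterable, Iterator, List, Tuple

# Each inner list stands in for a block of records held on a different node.
# The ICD-10 codes are illustrative sample data, not real patient records.
PARTITIONS: List[List[str]] = [
    ["E11.9", "I10", "E11.9"],
    ["I10", "J45.909"],
    ["E11.9", "I10"],
]


def map_phase(partition: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Emit (key, 1) for every record, as a mapper would on its own node."""
    for code in partition:
        yield code, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group intermediate values by key, as the framework does between phases."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Combine all values for each key into a single count."""
    return {key: sum(values) for key, values in groups.items()}


def run_job(partitions: List[List[str]]) -> Dict[str, int]:
    """Map every partition, then shuffle and reduce the combined output."""
    mapped = chain.from_iterable(map_phase(p) for p in partitions)
    return reduce_phase(shuffle(mapped))


print(run_job(PARTITIONS))  # {'E11.9': 3, 'I10': 3, 'J45.909': 1}
```

On a real cluster, the map tasks run in parallel on the nodes that hold each block, which pays off only when the data is too large for one machine. At the volumes typical of healthcare, the same result is a single `SELECT code, COUNT(*) ... GROUP BY code` against a relational database.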
The term Big Data seems to be everywhere. It can be defined by three characteristics: volume, velocity, and variety. Traditional data management techniques include the ever-popular approach of using SQL to interact with a relational database. Here’s how healthcare fits into Big Data as defined above:
i. Volume: A typical healthcare firm stores less than 500 terabytes of data, compared with almost 4,000 terabytes at an investment firm. A traditional database engine can handle far more data than most health organizations produce, so healthcare systems should carefully weigh the tradeoffs before switching to big data tools.
ii. Velocity: The speed at which some applications generate new data can overwhelm a system’s ability to store it. However, most healthcare data is entered by employees, who are unlikely to generate data fast enough to overwhelm a typical SQL database.
iii. Variety: Most large healthcare institutions hold three different forms of data. The first two, discretely codified billing transactions and discretely codified clinical transactions, are well suited to relational data models. The third form consists of blobs of free text. Although this text is stored electronically, very little analysis is done on it today, because SQL cannot effectively query or process these large strings. There is much opportunity for progress in this area.
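SQL can match substrings in text with `LIKE`, but a match alone says nothing about meaning. The sketch below uses Python’s built-in `sqlite3` module to show that a `LIKE` search for a finding also returns a note that rules the finding out. It then applies a deliberately crude negation check in Python. The sample notes and the `NEGATION_CUES` list are hypothetical. The check is far simpler than production clinical text processing and is not offered as a recommended method.

```python
import re
import sqlite3
from typing import List, Tuple

# Hypothetical free-text notes.
NOTES: List[Tuple[int, str]] = [
    (1, "Chest X-ray shows pneumonia in the right lower lobe."),
    (2, "No evidence of pneumonia."),
    (3, "Patient denies cough; lungs clear."),
]

# Hypothetical, deliberately tiny list of negation cues.
NEGATION_CUES = ("no", "denies", "without", "negative for")


def sql_like_matches(term: str) -> List[int]:
    """Return ids of notes whose text contains the term anywhere."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE notes (id INTEGER PRIMARY KEY, body TEXT)")
    conn.executemany("INSERT INTO notes VALUES (?, ?)", NOTES)
    rows = conn.execute(
        "SELECT id FROM notes WHERE body LIKE ? ORDER BY id", (f"%{term}%",)
    ).fetchall()
    conn.close()
    return [row[0] for row in rows]


def affirmed_mentions(term: str) -> List[int]:
    """Return ids of notes mentioning the term with no negation cue before it."""
    hits = []
    for note_id, body in NOTES:
        # Split into rough sentences on periods and semicolons.
        for sentence in re.split(r"[.;]\s*", body.lower()):
            idx = sentence.find(term.lower())
            if idx == -1:
                continue
            before = sentence[:idx]
            negated = any(
                re.search(rf"\b{re.escape(cue)}\b", before) for cue in NEGATION_CUES
            )
            if not negated:
                hits.append(note_id)
                break  # one affirmed mention is enough for this note
    return hits


print(sql_like_matches("pneumonia"))   # [1, 2]
print(affirmed_mentions("pneumonia"))  # [1]
```

The `LIKE` query counts note 2 as a pneumonia case even though it says the opposite. That gap between finding a string and understanding it is why free-text blobs remain largely unanalyzed.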