4 Essential Lessons for Adopting Predictive Analytics in Healthcare


changed more frequently (hotspots) than surrounding locations. These hotspot changes are highly characteristic of RET kinase and can be modeled with high accuracy when looking only at the RET gene. But if this gene-specific signal is diluted into a mutation data set derived from the entire genome, that specificity and accuracy are completely lost13.

The very features that characterize a condition well are the attributes that can train an accurate predictor. But if those features (variables) do not stand out above the background noise, the predictor only predicts the noise well. The full power of prediction is best realized when specific variables are gathered, a targeted clinical need is met and participants are willing to act.
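The dilution effect can be shown with a toy example. This is a minimal illustration, not the analysis from the cited study. The records are hypothetical, and the codon positions in `HOTSPOT_CODONS` are illustrative assumptions. The "predictor" is a one-line rule that flags any mutation at a hotspot codon.

On RET-only data the rule is perfect. The same rule is then applied to a pooled, genome-wide set. There, other genes carry unrelated mutations at the same codon numbers, and the rule's accuracy collapses.

```python
# Toy illustration of signal dilution. All records are hypothetical.
# Each record is (gene, codon, is_pathogenic).
HOTSPOT_CODONS = {634, 918}  # illustrative hotspot positions (assumption)


def hotspot_rule(record):
    """Predict 'pathogenic' whenever the mutation sits at a hotspot codon."""
    _gene, codon, _label = record
    return codon in HOTSPOT_CODONS


def accuracy(records):
    """Fraction of records where the rule's prediction matches the label."""
    correct = sum(hotspot_rule(r) == bool(r[2]) for r in records)
    return correct / len(records)


# Gene-specific set: pathogenic RET mutations cluster at the hotspots,
# benign RET mutations fall elsewhere.
ret_only = (
    [("RET", 634, 1)] * 8
    + [("RET", 918, 1)] * 8
    + [("RET", 100 + i, 0) for i in range(16)]
)

# Genome-wide background: other genes have benign changes that happen to
# share the same codon numbers, and pathogenic changes at other codons.
background = (
    [(f"GENE{g}", 634 if g % 2 == 0 else 918, 0) for g in range(40)]
    + [(f"GENE{g}", 200 + g, 1) for g in range(40)]
)
pooled = ret_only + background

print(f"RET only: {accuracy(ret_only):.2f}")  # RET only: 1.00
print(f"Pooled:   {accuracy(pooled):.2f}")    # Pooled:   0.29
```

The rule itself never changes; only the context does. In the pooled set, the hotspot feature no longer stands out from the background.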

Integrated Prediction

Lesson #2: Don’t confuse insight with value. People who work in a “database discipline” understand that data plus context equals knowledge. In a similar vein, prediction within a comprehensive data warehouse environment is superior to prediction in standalone applications. The potential synergy between such an environment and the existing Rothman Index, an early indicator of wellness14, illustrates this. This proven algorithm captures trends from multiple data feeds of vital signs, lab values and nursing assessments. Taken as a whole, this data often provides early warning that a patient is beginning to deteriorate, even when a careful human observer could not possibly “connect the dots” across so many disparate data points at once. One key to the algorithm’s success is first obtaining all of the necessary data: assessing only part of the picture often yields an incorrect view.
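The Rothman Index’s actual formula is not reproduced here. As a hypothetical illustration of why the whole picture matters, the sketch below combines three feeds (vital signs, lab values, nursing assessments) into a weighted composite score. The following are all assumptions chosen for the example:

- the 0-to-1 deviation scale,
- the weights in `WEIGHTS`,
- the value of `ALERT_THRESHOLD`.

No single feed crosses the threshold on its own, but together they do.

```python
# Hypothetical composite early-warning score; NOT the Rothman Index formula.
# Each feed is summarized as a deviation score: 0.0 = normal, 1.0 = worst.
WEIGHTS = {"vitals": 0.40, "labs": 0.35, "nursing": 0.25}  # assumed weights
ALERT_THRESHOLD = 0.5  # assumed cut-off


def composite_score(feeds):
    """Weighted sum over whichever feeds are available."""
    return sum(WEIGHTS[name] * value for name, value in feeds.items())


def should_alert(feeds):
    return composite_score(feeds) >= ALERT_THRESHOLD


# Vitals still look acceptable, but labs and nursing notes are drifting.
patient = {"vitals": 0.1, "labs": 0.8, "nursing": 0.9}

print(should_alert(patient))  # True (score is about 0.545)
for name, value in patient.items():
    # Any single feed, viewed in isolation, stays below the threshold.
    print(name, should_alert({name: value}))  # False for every feed
```

A missing feed simply contributes nothing to this score. That is exactly how a partial view can understate risk, and a real system would need to handle missing feeds explicitly.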

Level of Trust

Lesson #3: Don’t overestimate the ability to interpret the data. Another interesting irony is the level of trust that people place in computational prediction or forecasting. With hurricane season winding down, isn’t it comforting to know that computer models accurately predicted seven of the last three storms? Yes, you read that right: the computer overshot by more than 130%. What it really comes down to is a comparison between the weather forecast and someone whose joints ache whenever a storm is coming: which one is truly more accurate? At the end of the day, it either rains or it doesn’t, regardless of what the forecast said. It’s all about the outcome.
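The size of the miss in the storm example is a relative-error calculation: the difference between forecast and outcome, divided by the outcome.

```latex
\text{relative error} = \frac{\text{forecast} - \text{actual}}{\text{actual}}
  = \frac{7 - 3}{3} = \frac{4}{3} \approx 133\%
```

Expressed as a ratio instead, the forecast is 7/3, or about 233% of the actual count. The two framings are easy to mix up, so it is worth stating which one a forecast-accuracy figure uses.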

The same holds true for predictions in healthcare. The complication is that comprehensive outcomes data is often missing in our current healthcare system. When the “final outcome” is not captured, the utility of machine-learning tools is severely limited, and this becomes one of the obstacles to widespread adoption and trust. Without a class outcome (label) to train on, supervised models cannot easily be built, because they learn to map input variables to a known outcome.
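A minimal sketch of why missing outcomes matter: a supervised model can learn only from records that carry a label. The records, the single risk feature and the readmission outcome below are hypothetical. The model is deliberately the simplest possible one, a one-feature nearest-centroid classifier.

Records whose outcome was never captured (`None`) contribute nothing to training. If no outcomes of one class are recorded at all, no model can be fitted.

```python
from statistics import mean
from typing import List, Optional, Tuple

# Hypothetical encounters: (risk_feature, outcome), where outcome is
# 1 = readmitted, 0 = not readmitted, None = outcome never captured.
Record = Tuple[float, Optional[int]]

records: List[Record] = [
    (0.90, 1), (0.80, None), (0.20, 0), (0.70, None),
    (0.85, 1), (0.10, None), (0.30, 0), (0.60, None),
]


def fit_nearest_centroid(rows: List[Record]) -> Tuple[float, int]:
    """Return (decision threshold, number of labeled rows used).

    In one dimension, assigning a point to the nearer class centroid is
    the same as comparing it to the midpoint between the two centroids.
    """
    labeled = [(x, y) for x, y in rows if y is not None]  # drop unlabeled rows
    positives = [x for x, y in labeled if y == 1]
    negatives = [x for x, y in labeled if y == 0]
    if not positives or not negatives:
        raise ValueError("need labeled examples of both outcomes")
    return (mean(positives) + mean(negatives)) / 2, len(labeled)


def predict(x: float, threshold: float) -> int:
    """Assumes the positive class has the higher centroid, as it does here."""
    return int(x > threshold)


threshold, n_used = fit_nearest_centroid(records)
print(f"trained on {n_used} of {len(records)} records")  # trained on 4 of 8 records
print(predict(0.70, threshold))  # 1
```

Half of the hypothetical records are unusable for training here. Real outcome gaps can be much larger.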

In reality, however, clinicians make judgments and medical decisions using incomplete information every day. Granted, these are typically sound judgments based on training, past experience and the collective knowledge of trusted colleagues. Still, treatment decisions made on incomplete information and educated guesses are quite common in the current health system.

In the end, the goal is the same: to leverage historical patient data to improve current patient outcomes. Predictive analytics is a powerful tool in this regard.

Figure 2. Overview of the machine learning modeling process.


State of the Industry

Lesson #4: Don’t underestimate the challenge of implementation. So many options exist for developing predictive algorithms or stratifying patient risk that healthcare personnel face a daunting task in sorting through all the buzzwords and marketing noise. Healthcare providers need to partner with groups that have a keen understanding of the leading academic and commercial tools and the expertise to develop appropriate prediction models. Representative examples of open source tools include popular software such as R and Weka. The statistical package R is distributed through the Comprehensive R Archive Network (CRAN), a network of servers that includes a mirror hosted at the Fred Hutchinson Cancer Research Center. R is a widely used open source tool, with thousands of specific libraries, or “packages,” for a variety of applications. As of September 2013, the CRAN package repository featured 4,849 available packages, and that number is growing exponentially. Packages are user-submitted and shared to assist with statistical computing for topics such as biology, genetics, finance, neural networks, time series modeling and many others.

The Waikato Environment for Knowledge Analysis (Weka) brings several standard machine learning techniques together in a software workbench released under the GNU General Public License. Written in Java, it provides tools for data pre-processing, feature selection, classification, regression, clustering, association rules and visualization. It is developed and hosted by the Computer Science Department at the University of Waikato in New Zealand. Using Weka, a specialist in a particular field can apply machine-learning methods to discover useful knowledge in datasets far too large to analyze by hand.
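Weka’s native input format is ARFF (Attribute-Relation File Format). An ARFF file has a header that declares the relation and its attributes, followed by comma-separated data rows, with `?` marking a missing value.

The sketch below writes a small dataset in that format using only the Python standard library, so the file could then be opened in Weka. The relation name, attribute names and rows are hypothetical. The helper `to_arff` is an illustrative assumption that handles only numeric attributes plus one nominal class attribute.

```python
from typing import List, Optional, Sequence


def to_arff(relation: str,
            numeric_attrs: Sequence[str],
            class_attr: str,
            class_values: Sequence[str],
            rows: Sequence[Sequence[Optional[object]]]) -> str:
    """Serialize rows to ARFF text; the last element of each row is the class."""
    lines: List[str] = [f"@RELATION {relation}", ""]
    for name in numeric_attrs:
        lines.append(f"@ATTRIBUTE {name} NUMERIC")
    # Nominal attributes list their allowed values in braces.
    lines.append(f"@ATTRIBUTE {class_attr} {{{','.join(class_values)}}}")
    lines += ["", "@DATA"]
    for *features, label in rows:
        # ARFF uses '?' for a missing value.
        values = ["?" if v is None else str(v) for v in features]
        lines.append(",".join(values + [str(label)]))
    return "\n".join(lines) + "\n"


# Hypothetical readmission data: age, heart rate, class label.
example = to_arff(
    "readmission",
    ["age", "heart_rate"],
    "readmitted",
    ["yes", "no"],
    [(67, 88, "yes"), (45, None, "no"), (72, 101, "yes")],
)
print(example)
```

Weka can also load CSV files. ARFF’s advantage is that attribute types, including the allowed class values, are declared explicitly rather than inferred.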

Representative examples of commercial offerings include Spotfire and Greenwave Systems. Spotfire is a long-standing data visualization toolkit that originated at the University of Maryland in 1996 and was acquired by TIBCO, becoming part of its product suite, in 2007. Like SAS, SPSS and other analytics vendors, Spotfire supports in-database analysis on Oracle, Microsoft SQL Server and Teradata data warehouse platforms. Visualization applications cover a
