In Healthcare Predictive Analytics, Big Data Is Sometimes a Big Mess

Those in big data and healthcare analytics circles will seldom hear the phrase, “less is more.” In a clinical setting, however, there is an important lesson to learn about the effective execution of predictive analytics: Health systems should not confuse more data with more insight.

More data is simply more—more tables, more lists, more replicates, more clinics, more controls, more rows, more tables of tables and lists of lists, etc. In short, for predictive analytics to be effective in a clinical setting, a specific focus will always trump global utility.

Specificity in predictive analytics is essential for two reasons: First, specificity improves algorithm performance and accuracy. Second, specificity improves the associated intervention’s effectiveness.

Successful Predictive Analytics in Healthcare Does Not Depend on Big Data

The key to successful predictive analytics implementation is rooted more in upfront planning than in harnessing big data. It begins well upstream of the predictor and its implementation, and includes four parts.

First, healthcare organizations must accurately model the workflow and detail the specific questions they want the computer to address.

Second, health systems need to collect the necessary data specific to and characteristic of the problems they are trying to solve. Gathering this data is often guided by three questions:

  • What is known about the specific patient?
  • What is known about that population?
  • What supplementary data can be leveraged from external and public sources?

The goal of this second step is to stay specific to a system’s original question. Using the computer to help with feature selection can be especially useful during this step.
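As a rough illustration of letting the computer help with feature selection, the sketch below ranks candidate features by the strength of their correlation with the outcome. It is a minimal example under stated assumptions, not a recommended selection method: the records are hand-made, the column names (prior_admissions, age, zip_digit, readmitted) are hypothetical, and a simple correlation ranking stands in for whatever selection technique a team actually uses.

```python
from statistics import mean, pstdev

def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    mx, my = mean(xs), mean(ys)
    sx, sy = pstdev(xs), pstdev(ys)
    if sx == 0 or sy == 0:
        return 0.0  # a constant column carries no information
    cov = mean((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / (sx * sy)

def rank_features(rows, outcome):
    """Return (feature, |correlation with outcome|) pairs, strongest first."""
    ys = [row[outcome] for row in rows]
    features = [name for name in rows[0] if name != outcome]
    scores = {
        name: abs(pearson([row[name] for row in rows], ys))
        for name in features
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Hypothetical, hand-made records for illustration; not real patient data.
patients = [
    {"prior_admissions": 0, "age": 54, "zip_digit": 3, "readmitted": 0},
    {"prior_admissions": 1, "age": 61, "zip_digit": 7, "readmitted": 0},
    {"prior_admissions": 0, "age": 47, "zip_digit": 1, "readmitted": 0},
    {"prior_admissions": 3, "age": 66, "zip_digit": 4, "readmitted": 1},
    {"prior_admissions": 4, "age": 58, "zip_digit": 8, "readmitted": 1},
    {"prior_admissions": 2, "age": 72, "zip_digit": 2, "readmitted": 1},
]

ranked = rank_features(patients, "readmitted")
for name, score in ranked:
    print(f"{name}: {score:.2f}")
# prior_admissions ranks first and zip_digit ranks last on this toy data
```

A ranking like this is only a starting point. Features that score well still need clinical review before they are tied to the system's original question.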

Third, health systems must recognize the weaknesses and leverage the strengths of various algorithm approaches:

  • Linear regression is typically used to predict continuous outcomes.
  • Logistic regression is used to predict categorical or discrete outcomes, such as whether an event occurs.
  • Naive Bayes deals with missing data much better than many other approaches.
  • Support vector machine algorithms are powerful as non-linear classifiers (and have excellent performance in binary classification), but they are computationally demanding to train and run, and are sensitive to noisy data.
  • Human-readable approaches (e.g., rules-based and regression classifiers) can be implemented directly in many SQL reporting environments, while machine-readable approaches (e.g., random forests and neural nets) may require additional programming.
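To make the last point concrete, the sketch below shows why a regression classifier travels easily into a SQL report: a fitted logistic model is just an intercept and a handful of coefficients, so it can be written out as a single scalar expression. The coefficients, the column names (prior_admissions, age_over_65), and the helper functions are hypothetical, and the sketch assumes a SQL dialect that provides an EXP function.

```python
import math

def logistic_to_sql(intercept, coefficients):
    """Render a fitted logistic model as a SQL scalar expression.

    Column names are inserted as-is, so they must be trusted identifiers.
    """
    terms = [repr(float(intercept))]
    terms += [f"({float(w)!r} * {col})" for col, w in coefficients.items()]
    linear = " + ".join(terms)
    return f"1.0 / (1.0 + EXP(-({linear})))"

def logistic_predict(intercept, coefficients, row):
    """Compute the same probability in Python, for checking the SQL output."""
    z = intercept + sum(w * row[col] for col, w in coefficients.items())
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical coefficients for illustration; not fitted to real data.
intercept = -3.0
coefficients = {"prior_admissions": 0.8, "age_over_65": 0.5}

print(logistic_to_sql(intercept, coefficients))
print(logistic_predict(intercept, coefficients,
                       {"prior_admissions": 2, "age_over_65": 1}))
```

A random forest or neural net has no comparably short closed form, which is why those approaches usually need a separate scoring service or additional programming.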

The final step, and perhaps the most important step, is finding the appropriate clinical group and environment for implementation. Without the proper framework in place, and without the willingness to intervene and give context for meaningful use, prediction is not useful; rather, it is often a waste of time and money.

Don’t Trade Utility for Big Data Hype

With so much hype surrounding market buzzwords, such as big data and predictive analytics, it can be daunting for healthcare organizations to sort through all the noise in this space. One guiding principle can help: do not trade useful for glamorous.

In healthcare, the tradeoff of a more generalized prediction model that inputs big data and global features is that targeted utility is lost or diluted. The features that effectively characterize a condition are the same attributes that can train an accurate predictor. But if those features do not stand out above the background noise, then the predictor only finds the noise; for this reason, prediction focused on a specific clinical setting or patient need will always trump a generic predictor in terms of accuracy and utility.
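The dilution effect can be sketched with a toy calculation. In the example below, a feature separates outcomes clearly inside a targeted cohort, but its standardized effect shrinks once the cohort is pooled with a general population in which the same feature is unrelated to the outcome. All values are synthetic and chosen by hand to show the direction of the effect. The effect_size helper is an assumption for illustration, not a standard clinical metric.

```python
from statistics import mean, pstdev

def effect_size(values, labels):
    """Difference in the feature's mean between outcome groups,
    divided by the feature's overall standard deviation."""
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    spread = pstdev(values)
    return 0.0 if spread == 0 else (mean(pos) - mean(neg)) / spread

# Synthetic targeted cohort: the feature tracks the outcome closely.
cohort_values = [9, 8, 7, 3, 2, 1]
cohort_labels = [1, 1, 1, 0, 0, 0]

# Synthetic general population: the feature is unrelated to the outcome.
general_values = [2, 8, 5, 2, 8, 5]
general_labels = [1, 1, 1, 0, 0, 0]

targeted = effect_size(cohort_values, cohort_labels)
pooled = effect_size(cohort_values + general_values,
                     cohort_labels + general_labels)

print(f"targeted cohort: {targeted:.2f}")
print(f"pooled data:     {pooled:.2f}")
# The pooled effect is smaller than the targeted one on this toy data.
```

The larger the unrelated population grows relative to the targeted cohort, the further the pooled effect falls toward zero, which is the sense in which a generic predictor ends up fitting background noise.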

The full power of clinical prediction is best realized when the computational question is carefully defined, specific variables are gathered, a targeted need is met, and participants are willing to act. For predictive analytics, it’s the intervention that matters most. After all, it’s the intervention—not the predictor—that will improve patient care.


