The Top Three Recommendations for Successfully Deploying Predictive Analytics in Healthcare


Successfully deploying predictive analytics is an area of critical concern for health systems as its use continues to evolve in the healthcare industry. Over the past five years, advances in healthcare around data availability and open source tools have made using predictive analytics much easier. In fact, many healthcare organizations have dabbled in predictive analytics. Despite these advances, however, organizations are still struggling to make predictive analytics routine, pervasive, and actionable.

Three key recommendations for scaling predictive analytics and machine learning can help your organization successfully deploy this advanced capability:

  1. Fully leverage your healthcare system’s analytics environment.
  2. Standardize tools and methods using production quality code.
  3. Deploy predictive analytics with strategies for intervention.

Defining Predictive Analytics and Machine Learning: Much More than Prediction

The Gartner IT glossary defines predictive analytics as a method of data mining (the analysis of large data sets to discover patterns) that has “an emphasis on prediction.” In other words, the method uses pattern recognition to predict future events. This is, however, only the surface of predictive analytics, particularly in the case of healthcare.

In healthcare, you need more capability than prediction alone. Your organization needs to know how to use data to improve patient outcomes, and have the wherewithal to act and intervene in a data-driven environment. The more experience an organization has with using data for improvement, the more prepared it is for predictive analytics.

Machine Learning Is a Technique Used by Predictive Analytics

Machine learning is often included in predictive analytics discussions, and while the two are related, it’s valuable to understand that they’re not the same. According to Gartner, machine learning is the “technical discipline that provides computers with the ability to learn from data (observations) without being explicitly programmed.” It’s designed to help extract knowledge from data by using algorithms to make predictions. Data scientists use machine learning as a technique to produce predictive models. In other words, machine learning is the main technique behind predictive analytics today. It’s also the tool behind other recent technological advances, such as facial and speech recognition.

The History of Predictive Analytics in Healthcare

Predictive analytics is a hot topic in healthcare today, but its roots in the industry go back to the late 1980s. The Charlson Index was introduced in 1987 as a risk predictor for mortality. It uses information on a patient’s comorbidities, along with factors such as age, to determine the risk of dying. More recently, in 2010, the LACE index (length of stay, acuity of admission, comorbidities, and emergency department visits within the past six months) was introduced with the goal of predicting hospital readmissions. The index uses those four factors to determine how likely a patient is to be readmitted, drawing on data from multiple sites across the country.

Today, however, we’re realizing that the Charlson and LACE index tools aren’t as effective as we need them to be. They tend to be too general and, therefore, not accurate for specific populations. For instance, when predicting readmissions for chronic obstructive pulmonary disease (COPD) patients, LACE will typically do a poorer job than a simple model based on factors specific to COPD readmissions.

It’s time for healthcare to do more with predictive analytics and grow beyond these one-size-fits-all models toward models built on more detailed data and tailored to the specific patient population in question.

Opportunities for Growth and Innovation in Healthcare Predictive Analytics

There’s a tremendous opportunity for growth and innovation in healthcare predictive analytics—to build on these classic predictive approaches and develop more specific models—motivated by four key factors:

  1. Limitations of index models: General models based on national data sets don’t necessarily apply to specific health systems and situations.
  2. Data availability: Health systems have increasingly more detailed data.
  3. More pervasive analytics: Health systems are using analytics more frequently to improve outcomes.
  4. Better machine learning tools: We have access to many open source tools (including R and Python libraries) and high-quality online learning.

Three Recommendations for Making Predictive Analytics Routine, Actionable, and Pervasive

The goal is to make predictive analytics and machine learning in healthcare routine, actionable, and pervasive. Currently, however, most healthcare organizations treat predictive analytics as a siloed activity. Data scientists typically pull data into their favorite analysis tool (such as R or Python) and perform a variety of data manipulations in addition to running the machine learning algorithms used to develop predictive models. While tools like R and Python are essential for data analysis and predictive analytics, they can make it difficult to operationalize certain aspects of the predictive analytics workflow—especially data manipulation. This is where the three key recommendations for scaling predictive analytics come into play:

Recommendation #1: Fully Leverage Your Analytics Environment

Before we discuss the analytics environment in detail, it’s helpful to introduce a bit of machine learning jargon. A feature is an individual measurable property of a phenomenon being observed. You can also call it an input parameter—or an input variable—to a predictive model. In a healthcare data warehouse, features that you might look at with your machine learning algorithm include clinical registries, comorbidity models, and outcomes you want to predict (e.g., readmissions or length of stay).

Your data scientists, however, need to be able to do more than just read this data. They will need to create their own features. The key to leveraging your analytics environment most efficiently is creating new features in a production analytics system, such as a data warehouse—not as one-time manipulations in an analysis tool such as R or Python.

For example, the Health Catalyst® predictive model for complications for diabetic patients uses a polypharmacy feature—the number of medications a patient is receiving at a given point in time. To use polypharmacy in the predictive model, the data scientist must get medication data from the data warehouse. In the warehouse format, however, the data is difficult to apply to a predictive model—it has multiple medication records per patient, each with its own start and end dates. Further, medication dates may be missing for a variety of reasons. This data must be aggregated into a simple polypharmacy number calculated for any given point in time. The data scientist needs to transform “raw” medications data into a simple format that a machine learning algorithm can use.

The above process is known as feature engineering. Jason Brownlee at Machine Learning Mastery describes it as “the process of transforming raw data into features that better represent the underlying problem to the predictive models, resulting in improved model accuracy.” For improved model accuracy, you need data that relates to what you want to predict, and you need that data in a format your machine learning algorithm can use (hence the feature engineering described in the polypharmacy example). Data scientists must be able to engineer features for predictive analytics to be successful. A data warehouse environment is the ideal feature engineering platform because data transformations can be deployed and operationalized as part of the warehouse’s regular data processing routine.

Recommendation #2: Standardize Tools and Methods Using Production Quality Code

The two main roles on your predictive analytics team are data scientists and machine learning engineers. The data scientist formulates hypotheses about the data, determines what might drive the prediction in question, and tests their model’s predictive capability. A machine learning engineer (a relatively new role) blends the skills of a data scientist and a software engineer to develop production software. The machine learning engineer’s software needs to be reliable and reusable (standardized) for a production environment.

The Predictive Analytics Process

A data scientist’s role is to create the predictive model. They start with many features (between 30 and 50) that they believe are predictive of a particular outcome. Input features come from literature, best practice, and clinician input. The data scientist then uses algorithms—such as linear regression models, random forest models, or neural networks—to discover patterns in the features and how they relate to the outcome being predicted. The goal is to find the right combination of input parameters with an algorithm that gives the best result. It’s an iterative process, and the data scientist typically needs to run multiple algorithms on various feature combinations before landing on the ideal model. The machine learning engineer’s job is to automate this workflow.
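The iterative loop described above can be sketched with scikit-learn. This is a minimal illustration on synthetic data—a real project would start from the 30–50 clinically chosen features, and would likely compare more candidates and tune each one:

```python
# A minimal sketch of the model-selection loop: try several algorithm
# candidates, score each with cross-validation, and keep the best.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for a feature matrix with 30 input features.
X, y = make_classification(n_samples=500, n_features=30, random_state=0)

candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

# Cross-validated AUC for each candidate model.
scores = {
    name: cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()
    for name, model in candidates.items()
}
best = max(scores, key=scores.get)
print(best, round(scores[best], 3))
```

Automating exactly this kind of loop—with standardized scoring so every model is judged the same way—is the machine learning engineer’s contribution.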

Developing a Machine Learning Code Base for Production Quality Code

Developing a machine learning code base is an important part of the predictive analytics process because it keeps the data scientist focused on the true utility of the predictive model. With a machine learning code base, you’ll standardize your methodologies and ensure that you’re deploying best practices. Furthermore, any predictive models that are in production must have production quality code.

Standardizing methodologies also helps you better measure how well your model is working by ensuring that your data scientists are using the same methods to judge model performance.

There are four key techniques for developing production quality code:

  1. Version control: A long-standing technique among software developers, version control refers to keeping a software system, which consists of many versions and configurations, well organized. This allows multiple developers to collaborate on a single code base, establish stable versions, and continue to work on the code base without adversely affecting those stable versions.
  2. Unit testing: In software development, unit testing allows you to individually test the smallest testable parts of an application (units). A code base has multiple functions, so you need to be able to test them all independently and make sure they’re all still working if you add new functionality. It’s well understood in the software development world that untested code is essentially broken code.
  3. Documentation: Carefully document your code so that developers can accurately find what they need.
  4. Continuous integration: Continually merge developer working copies into a shared mainline—the base of a project on which development progresses. This helps to ensure stable, robust production code.
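To make the unit-testing point concrete, here is a hedged illustration: a small, hypothetical feature-engineering helper with one independent test per behavior. The helper and its contract are invented for this sketch, but the pattern is the standard one:

```python
# Hedged example of unit testing a feature-engineering helper.
from datetime import date

def length_of_stay_days(admit: str, discharge: str) -> int:
    """Whole days between admission and discharge (same-day stay = 0)."""
    a = date.fromisoformat(admit)
    d = date.fromisoformat(discharge)
    if d < a:
        raise ValueError("discharge precedes admission")
    return (d - a).days

def test_counts_whole_days():
    assert length_of_stay_days("2024-03-01", "2024-03-04") == 3

def test_same_day_stay_is_zero():
    assert length_of_stay_days("2024-03-01", "2024-03-01") == 0

def test_rejects_reversed_dates():
    try:
        length_of_stay_days("2024-03-04", "2024-03-01")
    except ValueError:
        pass  # expected
    else:
        raise AssertionError("expected ValueError")

# Run the tests directly; a test runner such as pytest would discover
# and run them automatically as part of continuous integration.
test_counts_whole_days()
test_same_day_stay_is_zero()
test_rejects_reversed_dates()
```

When every function in the code base has tests like these, adding new functionality—or a new data scientist—doesn’t silently break existing models.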

Two Technologies for Developing a Machine Learning Code Base

Health systems have ample choices for developing a machine learning code base, including (but not limited to) two key technologies:

  • R: open source, deeply entrenched in healthcare, and more familiar to analysts.
  • Python: open source, new approach with lots of momentum, and more familiar to developers.

Scaling People: Another Call for a Robust Code Base

Many organizations are daunted by the idea of hiring large teams of data scientists for machine learning and predictive analytics. But a strong code base that’s easy to use can help you effectively use the people you have. For example, data architects and analysts have a good base knowledge of healthcare data and given the right tools and training, they can become effective feature engineers and generate machine learning models.

Recommendation #3: Deploy with a Strategy for Intervention

A primary goal—and the most important of the three recommendations for deploying predictive analytics—is deploying your model with a strategy for intervention. In other words, keep in mind how your model will be used, who will use it, and how it will drive patient care.

Case Study: One Health System’s Successful Strategy for Intervention

The following use case—a 30-day all-cause readmission risk model for the Chronic Obstructive Pulmonary Disease (COPD) population—shows a successful strategy for prioritizing and customizing interventions. A health system with a project focused on improving care around COPD wanted to add a predictive risk model to their process. Health Catalyst helped the organization develop a model based on 20 features. The model not only determines how many patients in a unit are at risk for readmission and need intervention (and who they are), but also identifies factors that are driving their risk.

The health system is using the predictive tool to help prioritize the daily rounds of pulmonary navigators who coordinate interventions for the COPD population. As a result, the health system anticipates an increase in intervention effectiveness and improved outcomes while leveraging the same number of resources. Provider teams will be able to have conversations around a common understanding of risk and associated risk factors to quickly determine which interventions are most appropriate—rather than trying to gather additional data to determine who needs which interventions.

An important part of the success of this model is ongoing collaboration with the goal of creating something that everyone, from developers to end users (doctors, nurses, etc.), can understand and rely on. Successful deployment depends on the adoption and use of the model.

The Future of Machine Learning and Predictive Analytics in Healthcare

The next meaningful step for predictive analytics in healthcare is to host these models within EHRs so that predictive data shows up in clinical workflows—also known as closed loop architecture. Current EHRs support proprietary interoperability frameworks and can be configured to show predictive algorithms within clinical workflows. New technology such as SMART on FHIR (pronounced “fire”) will support systemwide interoperability so that medical applications (such as predictive analytics) can integrate into different EHR systems at the point of care. The goal of this integration is faster identification of high-risk patients and faster intervention.

Technology like SMART on FHIR will also add important value to predictive analytics by allowing for an application program interface (API) that can expose data from the data warehouse that the EHR might not have but that might add prediction value (see the right-hand side of the graphic above). This includes socioeconomic, claims, environmental, and genomic data—each potentially very important in predicting outcomes.
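As a hedged illustration of the kind of data such an API could expose, the sketch below extracts two candidate features from a FHIR R4 Patient resource. The sample resource is fabricated, but the field names (`birthDate`, `address[].postalCode`) follow the public FHIR Patient schema; the postal code is the sort of field that can be joined to socioeconomic or environmental data sets:

```python
# Illustrative only: pulling prediction-relevant fields out of a FHIR
# Patient resource, as an interoperability API might return it.
import json
from datetime import date

sample = json.loads("""
{
  "resourceType": "Patient",
  "id": "example",
  "birthDate": "1952-07-14",
  "address": [{"postalCode": "84101"}]
}
""")

def patient_features(resource: dict, today: date) -> dict:
    """Extract age and postal code as candidate model features."""
    birth = date.fromisoformat(resource["birthDate"])
    # Age in whole years as of `today`.
    age = today.year - birth.year - (
        (today.month, today.day) < (birth.month, birth.day)
    )
    postal = resource.get("address", [{}])[0].get("postalCode")
    return {"age": age, "postal_code": postal}

print(patient_features(sample, date(2024, 1, 1)))  # → {'age': 71, 'postal_code': '84101'}
```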

Additional Reading

Would you like to learn more about this topic? Here are some articles we suggest:

  1. There Is A 90% Probability That Your Son Is Pregnant: Predicting the Future of Predictive Analytics in Healthcare
  2. Three Approaches to Predictive Analytics in Healthcare
  3. 4 Essential Lessons for Adopting Predictive Analytics in Healthcare
  4. Patient Flight Path Analytics: From Airline Operations to Healthcare Outcomes
  5. In Healthcare Predictive Analytics, Big Data Is Sometimes a Big Mess