An Inside Look at Building Machine Learning for Healthcare

As the integration of machine learning in healthcare builds momentum, the arguments about what constitutes machine learning and how it is defined are beginning to take shape. But the salient question centers on where the actual learning comes in. Where in the analytics workflow does learning occur? To establish the validity of whether a process invokes actual learning, let’s break down the components of a machine learning system.

Creating a Machine Learning Model

The idea behind machine learning in healthcare is that the machine (the computer and statistical software) learns from historical data. The mechanism of understanding, the cerebrum of machine learning, is an algorithm embedded in the software that learns from the historical values associated with specific input features or variables. It then applies that learning to new data to predict an outcome.

Some pragmatic statistical algorithms, like random forest or logistic regression, leverage the large amount of data that now exists in healthcare. At a basic level, the algorithm learns the relationship between the inputs to a particular process and its subsequent results. Applying unique healthcare features and client-specific data to such an algorithm yields a true predictive model.

To better understand how the learning part of machine learning works, consider the process for developing a predictive model.

Model building begins with identifying pieces of information, i.e., explanatory variables, that the model can use to determine if someone is likely to experience a certain outcome. For example, when predicting some type of outcome, age is often an input variable. In machine learning, these explanatory or predictive variables are referred to as features, and applying domain knowledge to create useful features is known as feature engineering. Either historical or current healthcare data is then pulled for any given patient from a data source like a Subject Area Mart (SAM) via the enterprise data warehouse (EDW). This can be clinical data from the EMR, lab results, financial data, claims data, or other data from multiple external sources.
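As a rough illustration of feature engineering, the sketch below turns one raw patient record into model-ready features. The record layout, the field names, and the derived features (age in years, admissions in the prior 365 days) are hypothetical and are not drawn from any actual SAM or EDW schema.

```python
from datetime import date

def build_features(record, as_of):
    """Turn one raw patient record into a flat dict of numeric features."""
    birth = record["birth_date"]
    # Age in whole years on the as_of date.
    had_birthday = (as_of.month, as_of.day) >= (birth.month, birth.day)
    age = as_of.year - birth.year - (0 if had_birthday else 1)
    # Domain knowledge (assumed here): recent utilization is worth counting.
    recent = [d for d in record["admission_dates"] if 0 <= (as_of - d).days <= 365]
    return {
        "age_years": age,
        "admissions_last_365d": len(recent),
        "systolic_bp": float(record["systolic_bp"]),  # raw value arrives as text
    }

# Hypothetical raw record, as it might look after being pulled from a data source.
raw = {
    "birth_date": date(1950, 6, 15),
    "admission_dates": [date(2023, 1, 10), date(2023, 11, 2), date(2021, 3, 5)],
    "systolic_bp": "142",
}
features = build_features(raw, as_of=date(2024, 1, 1))
print(features)
# {'age_years': 73, 'admissions_last_365d': 2, 'systolic_bp': 142.0}
```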

Then healthcare data in the form of features is combined with an algorithm to create a machine learning model. The algorithm does the work by learning how the variables associated with some outcome, and the outcome itself, are related. It learns this from the historical data that’s gathered at a health system. The algorithm not only identifies and maps the relationships between features and outcomes, but it also helps identify which features are truly predictive. For example, the algorithm might learn that age has a positive correlation with a specific outcome, blood pressure has a negative correlation, and gender has no significant correlation. This is called machine learning because the algorithm learns the relationships from historical data rather than being explicitly programmed with them. The matching of the algorithm and data in this way leads to a custom model, specific to that dataset or health system.
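To make the learning step concrete, here is a minimal sketch of logistic regression fitted by plain gradient descent on synthetic data. Everything in it is an assumption made for illustration: the data is randomly generated rather than real, the features are pre-standardized, and the "true" effects (age raises risk, blood pressure lowers it, gender does nothing) are planted so that the fitted weights can be compared against them. Nothing tells the algorithm these relationships; it recovers them from the examples.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(rows, labels, lr=0.5, epochs=200):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n, k = len(rows), len(rows[0])
    weights, bias = [0.0] * k, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * k, 0.0
        for x, y in zip(rows, labels):
            # Prediction error for this patient under the current weights.
            err = sigmoid(bias + sum(w * v for w, v in zip(weights, x))) - y
            grad_b += err
            for j in range(k):
                grad_w[j] += err * x[j]
        bias -= lr * grad_b / n
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
    return weights, bias

# Synthetic "historical" data with planted relationships (illustrative only).
rng = random.Random(0)
rows, labels = [], []
for _ in range(2000):
    age_z, bp_z, gender = rng.gauss(0, 1), rng.gauss(0, 1), rng.randint(0, 1)
    true_logit = 1.5 * age_z - 1.0 * bp_z  # gender deliberately has no effect
    rows.append([age_z, bp_z, gender])
    labels.append(1 if rng.random() < sigmoid(true_logit) else 0)

weights, bias = fit_logistic(rows, labels)
for name, w in zip(["age", "blood_pressure", "gender"], weights):
    print(f"{name}: {w:+.2f}")
```

The fitted weight for age should come out clearly positive, the weight for blood pressure clearly negative, and the weight for gender close to zero, which mirrors the positive, negative, and no-significant-relationship cases described above.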

The healthcare machine learning model then yields predictions about a specific healthcare outcome. During regular use, the model produces patient-specific predictions at whatever frequency is appropriate. If the model is designed to predict which hospital inpatients are most likely to be readmitted, a daily prediction might suffice, while using machine learning to supplement an early warning system for sepsis might require more real-time predictions. Regardless of the prediction frequency, the model continues to look for changes in data values and updates its outcome predictions accordingly. This same updating could be applied to a universe of variables that are constantly changing because of the patient’s changing condition or an intervention.
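The sketch below shows what updating a prediction looks like once a model exists. The weights, the bias, and the feature names are hypothetical stand-ins for an already-trained model, not values from any real one. The same patient is scored twice, and the prediction moves because one input value changed.

```python
import math

# Hypothetical weights standing in for an already-trained model.
WEIGHTS = {"age_years": 0.03, "resp_rate": 0.10, "prior_admissions": 0.40}
BIAS = -5.0

def predict_risk(features):
    """Return a probability between 0 and 1 from the current feature values."""
    z = BIAS + sum(WEIGHTS[name] * features[name] for name in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-z))

patient = {"age_years": 70, "resp_rate": 18, "prior_admissions": 1}
morning = predict_risk(patient)

# A new vital-sign reading arrives; rescoring picks up the change.
patient["resp_rate"] = 28
afternoon = predict_risk(patient)

print(f"morning risk:   {morning:.3f}")    # 0.332
print(f"afternoon risk: {afternoon:.3f}")  # 0.574
```

Whether this rescoring runs once a day or on every new reading is a scheduling decision; the scoring function itself is the same in both cases.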

Models Can Be Regularly Retrained

Once an outcome occurs (e.g., a patient with chronic obstructive pulmonary disease (COPD) is readmitted within 30 days or goes 30 days without being readmitted), that data point is fed back into the SAM. Whether or not each person was readmitted becomes data that teaches the model, and the model is ultimately adjusted.

Figure 1: The Health Catalyst Machine Learning Outcome Loop.

After some time (say a month, a quarter, or a year), enough outcomes data becomes available and the model is retrained. The algorithm looks at all the historical data again, along with the recently added data, to see whether the relationships it learned before still hold. If the variables no longer appear to interact the same way, the model is tuned to predict more accurately based on what it has learned over time.
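A deliberately simple sketch of retraining follows. The "model" here is just an observed readmission rate per patient group, which is a stand-in for a real algorithm such as logistic regression or random forest, and the group names and counts are invented. The point is only the shape of the loop: train on historical outcomes, append new outcomes, train again on everything.

```python
from collections import defaultdict

def train(outcomes):
    """'Learn' a readmission rate per group from (group, readmitted) pairs."""
    counts = defaultdict(lambda: [0, 0])  # group -> [readmitted, total]
    for group, readmitted in outcomes:
        counts[group][0] += int(readmitted)
        counts[group][1] += 1
    return {group: r / t for group, (r, t) in counts.items()}

# Invented historical outcomes: 100 patients per group.
historical = (
    [("age_65_plus", True)] * 30 + [("age_65_plus", False)] * 70
    + [("under_65", True)] * 10 + [("under_65", False)] * 90
)
model_v1 = train(historical)

# Later, newly observed outcomes are added and the model is retrained on all data.
new_outcomes = [("age_65_plus", True)] * 5 + [("age_65_plus", False)] * 20
model_v2 = train(historical + new_outcomes)

print(model_v1["age_65_plus"], model_v2["age_65_plus"])  # 0.3 0.28
```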

Changes may not be evident in the data from one day to the next if the historical dataset covers thousands of patients, or if the daily patient volume is low. The outcomes of a hundred patients will be diluted by a dataset of tens of thousands of patients, for example. The model may change, but the change is subtle. A significant change might only emerge over a quarter or a year.
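The dilution effect is easy to check with arithmetic. The figures below are made up for illustration: 20,000 historical patients with a 20% readmission rate, and 100 new patients with a noticeably higher 30% rate.

```python
# Invented numbers, chosen only to show the scale of the effect.
historical_patients, historical_readmits = 20_000, 4_000   # 20% readmitted
new_patients, new_readmits = 100, 30                       # 30% readmitted

old_rate = historical_readmits / historical_patients
combined_rate = (historical_readmits + new_readmits) / (historical_patients + new_patients)
shift_in_points = (combined_rate - old_rate) * 100

print(f"old rate:      {old_rate:.4f}")        # 0.2000
print(f"combined rate: {combined_rate:.4f}")   # 0.2005
print(f"shift: {shift_in_points:.2f} percentage points")  # 0.05
```

Even though the new patients were readmitted at one and a half times the historical rate, the overall rate moves by only about 0.05 percentage points, which is why a day of new outcomes rarely changes a model in any visible way.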

Also, a health system may want to avoid straining its computing resources, so the model can be retrained less frequently, especially if significant change isn’t expected between retraining runs.

A model is usually retrained automatically, but it can also be manually checked to see if additional features would help in its training. Perhaps a certain variable or feature may not have been helpful to the model last year, but the patient population has changed or the data for that feature has improved. For example, a hospital system might have only recently begun collecting data about a specific intervention that might have a direct impact on outcomes. Now the model can check to see if it would benefit from adding that feature. 

Figure 2: The possibilities of machine learning in healthcare.

Sidebar: Imagine a future where real-time data flows are pervasive and routine. A message flows in real time into the data environment and is then leveraged by a machine learning-enabled analytics engine that has access to the full breadth of healthcare data. That message is then enriched with additional information, perhaps a risk score for readmission or sepsis, or potential treatment plans. The message, now enriched with additional analytic value, can provide insight into what action needs to be taken to help improve the outcome for the patient and provider. It continues to flow through the data system, not to a report or dashboard that someone must review at some later date, but to the workflow system where the point of decision is made.

The 3 Most Important Components of Machine Learning: Data, Data, and More Data

Without patient outcomes data, algorithms cannot be trained to identify patterns in treatment that lead to positive outcomes for specific patient types. This underscores the importance of collecting healthcare data. So far, the industry has had to infer outcomes from proxies, such as readmission rates. Without outcomes data that includes quality and length of life information, clinicians can only guess about the effectiveness of treatments, and pattern recognition algorithms have nothing to learn from. Indeed, Patient Reported Outcomes (PROs) are generally a missing piece of data in healthcare. Routine collection of PROs will greatly enhance our ability to predict outcomes that matter to patients.

Strength in machine learning and artificial intelligence comes from the granularity, quality, and breadth of data that is collected, cataloged, and bound. Bindings and data quality management serve as pre-processors to machine learning and give algorithms a human-aided jump start.

In the past year, machine learning algorithms and models have become more commoditized, but without data, they are useless. Data content has become the most valuable asset in machine learning.

Why Completing the Feedback Loop Is Important

The machine learning lifecycle isn’t complete without looping the actionable information back to those who need it to manage care. In his article on machine learning and care management, Dr. John Haughom wrote, “to best gauge efficacy and value [of machine learning], both the predictor and the intervention must be integrated within the same system and workflow routinely used by clinicians as they deliver care.” In the past, this has been possible if the workflow includes an analytic dashboard such as a QlikView or Tableau application. Ideally, the closed-loop analytics structure goes further and makes more of the algorithm content available within the EHR itself.

Machine Learning in Action

Predictive models for COPD readmission risk are integrated into the workflow at Health Catalyst partner sites. One site was using LACE scores, which are typically available only at discharge. To have more timely and more accurate insight, the site wanted a custom risk model that would predict 30-day readmission risk for the COPD population.

With this model, a pulmonary navigator or a member of the nursing staff consults a daily worklist of high-risk and moderately high-risk patients, updated every morning, to prioritize the day (whom to see first and which interventions are appropriate). A patient might be at end-stage COPD and considered high risk for readmission, so the focus is on arranging palliative care or suitable home care. Another patient who is at moderately high risk might benefit more from efforts to reduce readmission risk, so the focus is on medication management, patient education, and arranging continuing care referrals.
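A daily worklist of this kind could be assembled along the lines of the sketch below. The risk thresholds (0.60 and 0.30), the tier labels, and the patient records are hypothetical; a real site would choose cutoffs with its clinicians, and the risk values would come from the predictive model rather than being typed in.

```python
def build_worklist(patients, high=0.60, moderate=0.30):
    """Return high and moderately high-risk patients, highest risk first."""
    worklist = []
    for p in sorted(patients, key=lambda item: item["risk"], reverse=True):
        if p["risk"] >= high:
            tier = "high"
        elif p["risk"] >= moderate:
            tier = "moderately high"
        else:
            continue  # lower-risk patients stay off the worklist
        worklist.append({"patient_id": p["patient_id"], "risk": p["risk"], "tier": tier})
    return worklist

# Hypothetical morning census with model-predicted 30-day readmission risk.
census = [
    {"patient_id": "A", "risk": 0.72},
    {"patient_id": "B", "risk": 0.12},
    {"patient_id": "C", "risk": 0.41},
    {"patient_id": "D", "risk": 0.65},
]
worklist = build_worklist(census)
for row in worklist:
    print(row["patient_id"], row["tier"], row["risk"])
# A high 0.72
# D high 0.65
# C moderately high 0.41
```

The tier only orders the list. Choosing between palliative care, home care, medication management, or education remains a clinical decision.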

Essentially, a predictive model like this allows a care team to prioritize within a high COPD-patient census at any given time.

Health Catalyst predictive models exist for other classifications, including but not limited to:

  • Central Line-Associated Bloodstream Infection (CLABSI) Future Risk
  • Congestive Heart Failure Readmissions Risk
  • Respiratory (COPD, Asthma, Pneumonia, and Respiratory Failure) Readmission Risk
  • Predictive Appointment No-shows
  • Pre-surgical Risk (Bowel)
  • Propensity to Pay
  • Sepsis Risk in the ED
  • Length of Stay
  • Physician Variation in Radiology Use
  • Many additional projects in development or planned

The Machine Learning Curve

Machine learning technology is accelerating at a rate beyond Moore’s Law. Machine learning algorithms and models are doubling in capability every six months. Algorithms in healthcare are generating enough interest, and sometimes controversy, that some are calling for FDA involvement to govern their responsible use. All this attention underscores the importance of properly defining machine learning, so that authentic technologies proliferate and further the work of delivering better care more efficiently in a rapidly changing healthcare environment.
