IU Health Risk Model for CLABSI Shows Great Potential



“Healthcare has been rapidly changing the last few years and we need to be able to adjust our work to keep up. To date, our improvement work has been focused on analysis of gaps after the infection had already occurred. We have built this tool to identify and analyze the variables and measures that affect patient outcomes in real time, allowing us to act on those before the patient gets an infection.”

– Kristen Kelley MPH, BS


Central line-associated bloodstream infections (CLABSIs) are serious and sometimes fatal. Per the Centers for Disease Control and Prevention (CDC), about one in 20 patients gets an infection while receiving medical care. Approximately 41,000 hospital patients and 37,000 hemodialysis patients with central lines develop a bloodstream infection. Nationally, one in four patients with a CLABSI dies.

Indiana University Health (IU Health) is a 17-hospital system considered one of the busiest in the U.S., with 130,000 patient admissions annually. A nationally recognized healthcare system, IU Health Academic Center recognized the importance of reducing healthcare-associated infections (HAIs) and embarked on a mission to achieve zero central line-associated bloodstream infections.

This mission was intermittently challenged by the difficulty inherent in identifying and tracking all patients with central lines and the multitude of risk factors that predispose each patient to harm. Too much time was spent collating large amounts of data, and efforts focused largely on data reporting rather than on comprehensive analysis and prevention. It is well known that identifying patients quickly and ensuring that preventive interventions are performed reliably reduce the risk of infection. IU Health realized that analyzing process measure trends in real time and identifying patients at high risk of CLABSI could provide a pathway to apply additional scrutiny and intervene quickly with these patients, ultimately reducing the rate of CLABSI.

Using retrospective data accessible through its analytics platform, IU Health developed a predictive model to identify the leading measures and variables affecting CLABSI outcomes and to pinpoint which patients were most at risk, moving it toward its goal of zero CLABSI.

Ultimately, this led to the development of a CLABSI risk prediction model that predicts which patients with a central line will develop a CLABSI with an estimated 87 percent accuracy and a false positive rate of 0.16. Using education and focused interventions, IU Health decreased the CLABSI rate by 20 percent over 6 months.


Indiana University Health is a 17-hospital system considered one of the busiest in the U.S., with 130,000 patient admissions annually. It is nationally ranked in multiple specialties on the list of “America’s Best Hospitals” by U.S. News & World Report, which indicates its commitment to quality and patient safety. IU Health leaders are dedicated to improving the efficiency, quality, and cost-effectiveness of care, including continuous improvement of its rate of healthcare-associated infections (HAIs).

Central line-associated blood stream infections, or CLABSIs, are a type of HAI. A central line is an intravascular catheter that terminates at or close to the heart, or in one of the great vessels, which is used for infusion, withdrawal of blood, or hemodynamic monitoring.1 When not placed correctly or kept clean, central lines can cause serious bloodstream infections that can be deadly. CLABSIs are most prevalent in patients in the ICU and patients receiving dialysis.

Like many organizations, IU Health uses the National Healthcare Safety Network (NHSN) risk-adjusted standardized infection ratio to monitor CLABSI performance. IU Health has invested significant resources to determine the factors associated with developing CLABSIs and the preventive interventions that can reduce the risk of CLABSIs. IU Health recognized the need to strive for high compliance with infection prevention guidelines and set out to develop a tool that would allow it to identify when best practice guidelines were not being met and to predict which patients were most at risk for developing a CLABSI.


CLABSIs are serious and sometimes fatal. Per the Centers for Disease Control and Prevention (CDC), about one in 20 patients gets an infection while receiving medical care. Approximately 41,000 hospital patients and 37,000 hemodialysis patients with central lines develop a bloodstream infection. One in four patients with a CLABSI will die.2

The CDC reported that there were 58 percent fewer bloodstream infections in hospital ICU patients with central lines in 2009 compared to 2001. This translated to about 18,000 patients in 2009. Despite this decrease, CLABSIs still result in thousands of deaths each year and add millions of dollars in potentially avoidable healthcare costs.3

Experts have developed evidence-informed national guidelines for preventing CLABSI, yet the various factors that increase the risk of CLABSI, and the relative weight of each of those factors on the overall risk for infection, are not well understood.


While evidence-based preventive actions have demonstrated benefits in reducing CLABSI, ensuring that these interventions have been completed and evaluating the effectiveness of prevention activities within an organization can be challenging. Historically, ICUs, where the largest number of critically ill patients with central lines receive care, have been the focus of CLABSI improvement initiatives. Today, though, patients with central lines are in inpatient units throughout the hospital, as well as in outpatient facilities. Identifying and tracking these patients can be difficult, and all it takes is one lapse in a key evidence-based intervention to introduce bacteria that can cause an infection. Confirming that CLABSI prevention activities have been provided to every single patient every single time can be even more difficult.

Staff time, patient priorities, fluctuating patient volume, and patient flow all create barriers to providing prevention activities in a consistent and timely way to every patient with a central line. Additional factors such as varying levels of understanding among members of the care team on the importance of every intervention, staff turnover, and the presence or absence of supportive educational materials and decision support tools may further impede full compliance with CLABSI prevention activities. Moreover, competing priorities and time constraints sometimes limit the ability of leaders to provide meaningful feedback to all care team members regarding their individual performance.

Thankfully, not every patient who has a central line develops an infection. The therapeutic benefit of having a central line usually outweighs the risk of developing an infection. But with the consequences of developing an infection being so dire, IU Health wanted to ensure that the therapeutic benefit of central lines was maximized while the risk of infection was minimized.


To minimize the risk of infection, an organization must first have a reliable way to identify patients with central lines and a visible way to monitor the care that those patients receive. To support its goal of decreasing HAIs, and to ensure accurate surveillance and reporting, IU Health implemented the Health Catalyst Analytics Platform built using the Late-Binding™ Data Warehouse architecture (EDW), the CLABSI Prevention Advanced Application, and the Instant Data Entry Application (IDEA).

The CLABSI application enables users at IU Health to efficiently find, review, and document CLABSI cases to support NHSN reporting and to review outcomes and trends of CLABSI across its institutions and health systems. It is also used to facilitate compliance with practices shown to decrease CLABSI, such as best practices for insertion, device utilization, and maintenance bundle use.

Many key details regarding the circumstances of a CLABSI event are not found in the medical record. The IDEA app is used to bridge these gaps in the EHR data. For each CLABSI, infection preventionists, clinical nurse specialists, and frontline staff review the case in detail. To gain a complete picture of the patient’s overall risk and the factors contributing to the infection, IU Health uses the IDEA CLABSI form to gather additional data, including the date when the infection was first identified, insertion and maintenance data, and information regarding potential gaps in care that are not readily identifiable (or even collected) in the medical record (see Figure 1). This allows IU Health to complete a comprehensive review of each CLABSI case and to review trends in support of its effort to ensure that each patient receives the standard level of evidence-based preventive care.


Figure 1. IDEA application: CLABSI huddle form

The CLABSI applications have helped to improve compliance with best practices and to decrease the incidence of CLABSI by making compliance data visible. However, IU Health believes that reviewing cases for gaps in care long after they occur is not the best way to identify the interventions needed to prevent a similar case from occurring again. Gaps identified in one case do not necessarily predict gaps in future cases. The sense was that too much of the prevention effort centered on retrospective review rather than on proactive, preventive steps in the here and now. IU Health identified that the ability to predict which patients are most likely to develop a CLABSI could give clinicians the opportunity to intervene and proactively ensure that the patients at highest risk of infection receive all the CLABSI prevention activities, 100 percent of the time.


IU Health wondered: for patients with a central line, what is the risk of CLABSI over the entire hospital encounter? Could the information in the EDW be used to predict which patients are at highest risk for CLABSI, or to identify when the risk factors that put a patient at risk for harm occur? As humans, we are well aware of the risks in our surroundings, so why couldn’t that same situational understanding be applied here? Could we reliably predict which patients would develop an infection? If a predictive risk model was developed, could it be effectively integrated into the CLABSI application? And could it update dynamically, in real time, as the patient’s risk factors changed?

IU Health and Health Catalyst embarked on a partnership to develop and integrate predictive analytics into the CLABSI application, developing a risk model to predict which patients with a central line are at the highest risk of CLABSI over the entire encounter. To develop an accurate CLABSI predictive risk model, a work team composed of clinical experts (including physicians and infection preventionists), analysts, a statistician, the decision support team, and team members from Health Catalyst followed an iterative, collaborative process. The team first identified the variables thought to contribute to increased risk of CLABSI and the source data that could be used in the risk model.

Clinical experts conducted a thorough analysis of research and literature about CLABSI, identifying intrinsic and extrinsic CLABSI risk factors, and identifying patient populations that are the most susceptible to developing a CLABSI.

  • Intrinsic risk factors include the non-modifiable characteristics of the patients like age, gender, and underlying diseases or conditions.
  • Extrinsic factors include potentially modifiable factors associated with central line insertion or maintenance, such as prolonged hospitalization, or multiple central lines.
  • Patients more susceptible to infection include children and neonates, males, and patients with hematological and immunological deficiencies, cardiovascular disease, and gastrointestinal diseases.

Initially, the work team used the literature and team member clinical expertise to identify more than 40 variables thought to contribute to increased risk for CLABSI. The team narrowed its focus, selecting 23 variables for which there was a reliable data source in the EHR.

Using retrospective data from its own patient population, the team ran two-sample t-tests (a statistical test of whether the means of two groups differ significantly) on the proposed input variables against the CLABSI result data to determine which variables should be included in the final model. Sixteen input variables had the most significant impact on CLABSI prediction (see Figure 2). Adding input variables beyond those sixteen did not improve model accuracy.
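As a rough illustration of this screening step, the sketch below computes a two-sample t statistic for one candidate variable, comparing patients who developed a CLABSI with those who did not. It uses Welch's unequal-variance form, which is an assumption; the source does not say which variant the team used. The variable (central line days) and all values are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Return (t statistic, approximate degrees of freedom) for two samples."""
    n_a, n_b = len(group_a), len(group_b)
    # Squared standard error of each group mean
    se_a = variance(group_a) / n_a
    se_b = variance(group_b) / n_b
    t = (mean(group_a) - mean(group_b)) / sqrt(se_a + se_b)
    # Welch-Satterthwaite approximation for the degrees of freedom
    df = (se_a + se_b) ** 2 / (se_a ** 2 / (n_a - 1) + se_b ** 2 / (n_b - 1))
    return t, df


# Hypothetical central line days for patients with and without a CLABSI
line_days_clabsi = [14, 21, 18, 25, 17, 30]
line_days_no_clabsi = [5, 9, 7, 12, 6, 8, 10, 4]

t_stat, dof = welch_t(line_days_clabsi, line_days_no_clabsi)
print(f"t = {t_stat:.2f}, df = {dof:.1f}")
```

A large absolute t statistic suggests the variable differs between the two groups and is worth keeping as a candidate input. Turning it into a p-value requires the t distribution, which a statistics package would normally supply.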


Figure 2. Sixteen variables included in the predictive model

The initial risk model was relatively simple, using the total count of risk factors to identify those at higher risk. Intuitively, the more risk factors an individual has, the greater her or his overall risk. While this makes basic sense, the work team understood that it is the combination and interaction of risk factors that ultimately determines overall risk. With this understanding, the work team sought to leverage machine learning to build a model based on retrospective data in the EDW. Health Catalyst developed preliminary models on the data using two machine learning algorithms: logistic regression and random forest. The logistic regression algorithm can be thought of as a regression model where the dependent (or target) variable is categorical. It is a special case of the generalized linear model. The random forest algorithm is an ensemble method; here, 2,000 decision trees were used, with only four features (i.e., columns) used in each decision tree. This setup was used to train models on the previously identified 16 variables on 70,218 retrospective CLABSI cases. Once the models were developed, the Receiver Operating Characteristic (ROC) curve (a plot of the true positive rate against the false positive rate) was used to identify the best performing model (see Figure 3).
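The difference between a simple count of risk factors and a weighted model can be sketched in a few lines. The example below is illustrative only: the risk factor names, weights, and intercept are hypothetical and are not the variables or coefficients of the IU Health model. It shows how logistic regression passes a weighted sum of inputs through a sigmoid, so two patients with the same number of risk factors can receive different predicted probabilities.

```python
from math import exp


def count_score(flags):
    """Initial approach: the number of risk factors present."""
    return sum(flags.values())


def logistic_probability(flags, weights, intercept):
    """Logistic regression: weighted sum of inputs passed through a sigmoid."""
    z = intercept + sum(weights[name] * value for name, value in flags.items())
    return 1.0 / (1.0 + exp(-z))


# Hypothetical weights; a real model would learn these from retrospective data
weights = {"multiple_lines": 1.2, "dialysis": 0.9, "age_over_65": 0.2}
intercept = -4.0

patient_a = {"multiple_lines": 1, "dialysis": 1, "age_over_65": 0}
patient_b = {"multiple_lines": 0, "dialysis": 1, "age_over_65": 1}

for name, patient in (("A", patient_a), ("B", patient_b)):
    p = logistic_probability(patient, weights, intercept)
    print(f"patient {name}: count={count_score(patient)}, probability={p:.3f}")
```

Both hypothetical patients have two risk factors, yet the weighted model ranks patient A higher because the factors carry different weights. A random forest goes a step further by letting the effect of one factor depend on the values of others.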


Figure 3. Area under the curve graph used to select the best performing model

Note that the area under the ROC curve (AU_ROC) provides a score that indicates overall model performance.4 Classification systems that cannot distinguish between two groups have an area under the curve (AUC) equal to 0.5. An AUC of 0.80-0.90 is considered good performance. The higher the AU_ROC score, the better the model is performing. The score was 0.79 for the logistic model and 0.87 for the random forest model, which is why the random forest model was ultimately chosen for deployment. The ROC plot shows the true positive rate that corresponds to each false positive rate, so IU Health could make a deliberate decision about whether it preferred more false positives or more false negatives. Considering the mortality risk of patients with CLABSI, the IU Health team decided to err on the side of having more false positives, and the threshold between a “Yes case” and a “No case” was adjusted accordingly in the application.
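A minimal sketch of both ideas, using hypothetical risk scores and outcomes rather than IU Health data: the AUC is computed as the probability that a randomly chosen CLABSI patient receives a higher score than a randomly chosen non-CLABSI patient, and the true and false positive rates are then compared at two candidate thresholds.

```python
def auc(scores, labels):
    """Probability that a random positive outscores a random negative."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    # Each positive/negative pair counts 1 for a win and 0.5 for a tie
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def rates(scores, labels, threshold):
    """True positive rate and false positive rate at a given threshold."""
    tp = sum(s >= threshold and y == 1 for s, y in zip(scores, labels))
    fp = sum(s >= threshold and y == 0 for s, y in zip(scores, labels))
    return tp / labels.count(1), fp / labels.count(0)


# Hypothetical risk scores and outcomes (1 = developed a CLABSI)
scores = [0.92, 0.81, 0.40, 0.67, 0.35, 0.30, 0.22, 0.15, 0.10, 0.05]
labels = [1, 1, 1, 0, 0, 1, 0, 0, 0, 0]

print("AUC:", auc(scores, labels))
for threshold in (0.5, 0.25):
    tpr, fpr = rates(scores, labels, threshold)
    print(f"threshold={threshold}: TPR={tpr:.2f}, FPR={fpr:.2f}")
```

In this toy data, lowering the threshold from 0.5 to 0.25 raises the true positive rate from 0.50 to 1.00 while the false positive rate rises from about 0.17 to 0.33. That is the kind of trade-off a team accepts when it chooses to err on the side of more false positives.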

During the development process, the Gini importance criterion in the random forest algorithm was used to narrow the feature set down from 23 to 16 columns, with the reduced set being used in the deployed model at IU Health.
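Gini importance is generally computed by adding up, for each feature, the drop in Gini impurity produced by every split that uses that feature across all trees in the forest. The sketch below shows that building block for a single split on hypothetical labels. It is not the team's implementation, and a full forest would repeat this over thousands of splits.

```python
def gini(labels):
    """Gini impurity of a list of binary labels (0 or 1)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def impurity_decrease(parent, left, right):
    """Drop in Gini impurity from one split, with children weighted by size."""
    n = len(parent)
    weighted_children = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(parent) - weighted_children


# Hypothetical node of 10 patients (1 = developed a CLABSI)
parent = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

# Split on an informative hypothetical feature vs. an uninformative one
informative = impurity_decrease(parent, [1, 1, 1, 1, 0], [0, 0, 0, 0, 0])
uninformative = impurity_decrease(parent, [1, 1, 0, 0, 0], [1, 1, 0, 0, 0])
print(f"informative: {informative:.3f}, uninformative: {uninformative:.3f}")
```

Features whose splits rarely reduce impurity accumulate low importance scores, which makes them natural candidates to drop when narrowing a feature set.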


The CLABSI predictive risk model ultimately chosen for implementation by the team at IU Health predicts which patients with a central line will develop a CLABSI with an estimated 87 percent accuracy.

  • CLABSI risk model AU_ROC performance is 0.871.
    • AUC is a measure of quality for classification models, measuring the ability of the model to correctly predict an event or outcome. A score of 0.871 for the CLABSI risk model indicates that the model is good at separating patients who will develop a CLABSI from those who will not.
  • The CLABSI predictive risk model’s true positive rate = 0.81 and the CLABSI predictive risk model’s false positive rate = 0.16.
    • Of the patients who do not subsequently develop a CLABSI, it is estimated that about 16 percent will be identified as high risk by the CLABSI predictive risk model.
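The two rates above can be made concrete with a small confusion-matrix calculation. The counts below are hypothetical, chosen only so that the rates match the reported 0.81 and 0.16; the 10 percent CLABSI prevalence they imply is an assumption, not a figure from IU Health. The sketch also computes the share of flagged patients who go on to develop a CLABSI (the positive predictive value), which depends on prevalence and cannot be read from the false positive rate alone.

```python
def confusion_rates(tp, fn, fp, tn):
    """Return sensitivity (TPR), false positive rate, and precision (PPV)."""
    tpr = tp / (tp + fn)  # share of CLABSI patients flagged as high risk
    fpr = fp / (fp + tn)  # share of non-CLABSI patients flagged as high risk
    ppv = tp / (tp + fp)  # share of flagged patients who develop a CLABSI
    return tpr, fpr, ppv


# Hypothetical cohort: 100 patients develop a CLABSI, 900 do not
tpr, fpr, ppv = confusion_rates(tp=81, fn=19, fp=144, tn=756)
print(f"TPR={tpr:.2f}, FPR={fpr:.2f}, PPV={ppv:.2f}")
```

With these assumed counts, 81 of the 225 flagged patients develop a CLABSI. A lower real-world prevalence would reduce that share further, even with the same true and false positive rates.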

The CLABSI predictive risk model data is accessible in an easy-to-use visualization that has filters for hospital or unit of interest, time period of interest, and patient probability of infection (high, medium, or low) (see Figure 4). The top three risk factors for each patient are displayed, providing immediate insight into the patient-specific risk factors.


Figure 4. CLABSI risk model probability visualization

Informed by their risk factor analysis and using education and focused interventions with staff caring for patients with central lines, IU Health decreased the CLABSI rate by 20 percent over 6 months.


Having developed a risk model that predicts, with high accuracy, which patients with a central line will be the most likely to develop a CLABSI, data architects are integrating the risk scores into the CLABSI application. While the risk model was developed specifically for CLABSI, the infrastructure used to develop the model is now available for other patient populations, setting the stage for the development of additional predictive risk models.

The team at IU Health is planning a pilot on two test units. The pilot will focus on integrating the CLABSI risk data into the nursing workflow, with nursing staff using the data to focus CLABSI preventive interventions on the patients with the highest risk scores first. During the pilot, clinicians will use the CLABSI application to identify the patients with a central line who are at highest risk for developing a CLABSI. Clinicians will then evaluate whether the central line is an absolute necessity and, if not, discontinue the line. For those patients for whom the central line cannot be discontinued, clinicians will ensure the patient receives all of the CLABSI prevention activities. IU Health believes that by leveraging the CLABSI risk score in this way, it should be able to drive its CLABSI rate down even further.

IU Health plans to evaluate the impact of using the CLABSI predictive risk model on the overall rate of infection. It also plans to share its data and CLABSI predictive risk model through presentation and publication.


1, 2, 3. Centers for Disease Control and Prevention. (2011). Central line-associated bloodstream infection (CLABSI). Retrieved from https://www.cdc.gov/hai/bsi/bsi.html

4. Tape, T. G. (n.d.). Interpreting Diagnostic Tests. Retrieved from http://gim.unmc.edu/dxtests/roc3.htm


Health Catalyst is a mission-driven data warehousing, analytics, and outcomes improvement company that helps healthcare organizations of all sizes perform the clinical, financial, and operational reporting and analysis needed for population health and accountable care. Our proven enterprise data warehouse (EDW) and analytics platform helps improve quality, add efficiency and lower costs in support of more than 50 million patients for organizations ranging from the largest US health system to forward-thinking physician practices.


