Text Analytics in Healthcare—Two Promising Frameworks that Meet Its Unique Demands

We live in a world where analytics is applied to text to enhance our daily experiences in amazing ways, whether we are searching on Google or looking for recommendations on Yelp.

But in healthcare delivery today, fewer than five percent of health systems make truly significant use of the vast trove of clinical information found only in free-text fields such as physician notes, nurse documentation, radiology reports, and pathology reports.

Part of the challenge is that healthcare requires a degree of precision far beyond what’s acceptable for a Google search or Yelp recommendation.

When It Comes to Text Analytics, Healthcare Is Different

Let’s take a hypothetical example in which a health system is analyzing whether to participate in a downside risk contract with an insurance carrier. As part of this effort, the health system wants to improve its diabetes registry, specifically by finding patients who have mentions of diabetes in their notes but no ICD-10 coded diagnosis in their record.

Here’s a sampling of diabetes references we might find by using a simple text search on clinical notes:

A simple text search fails us in two ways. It returns a number of false positives, in which the diabetes reference isn’t about the patient or states that the patient doesn’t have diabetes. It also misses patients whose notes use an abbreviation such as “NIDDM” (non-insulin-dependent diabetes mellitus).
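To make the failure concrete, here is a minimal Python sketch of a naive keyword search. The note snippets and patient IDs are invented for illustration and are not taken from any real record; `naive_search` is a hypothetical helper, not part of any product.

```python
import re

# Hypothetical note snippets, invented purely for illustration.
NOTES = {
    "pt-001": "Patient has type 2 diabetes, on metformin.",
    "pt-002": "Mother has diabetes. Patient denies polyuria.",
    "pt-003": "No evidence of diabetes.",
    "pt-004": "History of NIDDM, diet controlled.",
}


def naive_search(notes, keyword="diabetes"):
    """Return the sorted IDs of notes that contain the keyword, ignoring case."""
    pattern = re.compile(re.escape(keyword), re.IGNORECASE)
    return sorted(pid for pid, text in notes.items() if pattern.search(text))


hits = naive_search(NOTES)
print(hits)
```

In this toy data set the search returns pt-002 (a family member’s diabetes) and pt-003 (a negated mention) alongside the one true positive, pt-001, and it misses pt-004 because the note says “NIDDM” rather than “diabetes.”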

Promising Frameworks for Clinical Awareness

Fortunately, there is a strong open-source community of medical and technology professionals who have built great frameworks for not only searching text, but also sorting out clinical context (e.g., when the mention of diabetes is family history or the patient doesn’t have diabetes). There are two prominent examples of promising frameworks designed to meet healthcare’s unique clinical text analytics needs:


ConText

A number of clinical natural language processing (NLP) products rely on an algorithm called ConText, developed by Dr. Wendy Chapman and her team when she was at the University of Pittsburgh. In short, ConText determines the following for each clinical mention:

  • Negation (e.g., “patient has no diabetes”)
  • Experiencer (e.g., “patient’s mother is diabetic”)
  • Temporality (e.g., distinguishing a current condition from a history of the condition)

Dr. Chapman, now Chair of the Department of Biomedical Informatics at the University of Utah School of Medicine, continues her work with ConText. You can download the University’s implementation of the ConText algorithm from GitHub. ConText is used in a number of commercial clinical NLP solutions.
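The three properties ConText assigns (negation, experiencer, and temporality) can be illustrated with a much-simplified, trigger-based sketch in Python. This is not the ConText implementation: the trigger lists are tiny and invented, the function `classify_mention` is hypothetical, and only words that appear before the target term in the same sentence are considered.

```python
import re

# Tiny illustrative trigger lists (assumptions, not the published ConText lexicon).
NEGATION_TRIGGERS = ["no", "denies", "without", "negative for"]
EXPERIENCER_TRIGGERS = ["mother", "father", "sister", "brother", "family history"]
HISTORICAL_TRIGGERS = ["history of", "previous", "in the past"]


def _has_trigger(text, triggers):
    """Return True if any trigger occurs in text as a whole word or phrase."""
    return any(re.search(r"\b" + re.escape(t) + r"\b", text) for t in triggers)


def classify_mention(sentence, target):
    """Classify the first mention of target using triggers that precede it."""
    lowered = sentence.lower()
    position = lowered.find(target.lower())
    if position == -1:
        return None  # the target term does not occur in this sentence
    scope = lowered[:position]  # only text before the mention is inspected
    return {
        "negated": _has_trigger(scope, NEGATION_TRIGGERS),
        "experiencer": "other" if _has_trigger(scope, EXPERIENCER_TRIGGERS) else "patient",
        "temporality": "historical" if _has_trigger(scope, HISTORICAL_TRIGGERS) else "current",
    }


for sentence, term in [
    ("Patient has no diabetes.", "diabetes"),
    ("Patient's mother is diabetic.", "diabetic"),
    ("History of diabetes.", "diabetes"),
]:
    print(sentence, classify_mention(sentence, term))
```

A production-quality approach would need a much larger trigger lexicon, triggers that follow the term, and rules for where a trigger’s scope ends; the sketch only shows the general shape of the idea.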


cTAKES

Another interesting clinical NLP framework is cTAKES (clinical Text Analysis and Knowledge Extraction System), an open-source project started by a team of physicians and technologists at the Mayo Clinic in 2006. cTAKES includes the functionality of ConText and adds the ability to identify the type of clinical term (e.g., whether something like “diabetes” is a symptom, procedure, diagnosis, or medication). cTAKES uses the Unified Medical Language System (UMLS) classification to achieve this. As with ConText, you can download a copy, in this case from the cTAKES website. cTAKES is an excellent research tool for clinical NLP, though it has not been widely adopted for general use in clinical text analytics.
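The idea of identifying the type of a clinical term can be sketched as a dictionary lookup. The Python below is only an illustration under stated assumptions: the four-entry dictionary is hand-made and stands in for a real terminology such as the UMLS, the category labels are the informal ones used in this article, and `tag_terms` is a hypothetical function, not a cTAKES API.

```python
import re

# Hypothetical mini-dictionary standing in for a real terminology source.
TERM_TYPES = {
    "diabetes": "diagnosis",
    "nausea": "symptom",
    "metformin": "medication",
    "appendectomy": "procedure",
}


def tag_terms(text):
    """Return (term, type) pairs for dictionary terms, in order of appearance."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [(token, TERM_TYPES[token]) for token in tokens if token in TERM_TYPES]


tagged = tag_terms("Patient with diabetes reports nausea; started metformin.")
print(tagged)
```

Combining a lookup like this with context rules for negation, experiencer, and temporality is what lets a system tell a patient’s current diagnosis apart from other kinds of mentions.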

Context and Specificity Address Healthcare’s Unique Demands

Powerful and easy-to-use text analytics solutions, like Google, are in widespread use in other industries. Healthcare poses a harder problem: a solution must go beyond finding matching text and interpret the context correctly to achieve the precision required in patient care. The good news is that solid frameworks, such as ConText and cTAKES, exist to address these challenges.

We need to see two things happen next. First, the major healthcare analytics companies must integrate these frameworks into their solutions. Then, health systems will need to put these text analytics solutions through their paces in order to truly illuminate how text can be used to improve healthcare outcomes.

I hope this helps provide a better understanding of some of the challenges—and solutions—within clinical text analytics.
