Precise Patient Registries: The Foundation for Clinical Research & Population Health Management

[Dale Sanders]

Today, I’d like to open by stating a few assertions and criticisms of the current state. We’ll follow that with a discussion of the history and the definition of a patient registry: what it is all about, how that definition has changed over time, what we should be doing differently, and how we design precise patient registries. I’ll walk you through an example from my registry work when I was at Northwestern University. And then we’ll dive into the nitty-gritty data details of a few sample registries to give you an idea of the kind of data that feeds into these.

Acknowledgements & Thanks

So, I want to acknowledge and thank a handful of folks. Steve Barlow and I worked very closely together at Intermountain Healthcare on the development of the early registries there and we both learned a lot in that process. That was back in 1998. Cessily Johnson is our vocabulary specialist. We also worked together at Intermountain and now she works with us at Health Catalyst. Darren Kaiser and I worked together at Northwestern, and we took what I’d learned at Intermountain, improved upon it, and grew out the registry program there at Northwestern. Anita Parisot helped me with these slides today. Tracy Vayo was also a good friend from Intermountain and helped me with some of the clinical content. And as usual, there are many other folks whose shoulders I stand upon. So, this knowledge is nothing but a reflection of what they have done to help me. So thank you to all of them and all of you.

Poll Question

Have you ever been directly involved in the design and development of a patient registry?

So I’d like to open up with a poll question just to get a feel for the demographics of the crowd and whether you’ve been involved in the design and development of a patient registry or not. So Tyler, if you could pop that up, that would be great, friend.

[Tyler Morgan]

Alright. We have that poll question up and the poll question is have you ever been directly involved in the design and development of a patient registry? Yes or no?

And while we’ve got everyone filling that out, I’d like to remind everyone that, yes, we will be sending out copies of the slides when the webinar is done. We’ve had a couple of questions about that. Also, if you have any additional questions you would like to ask, go ahead and type those into the questions pane on your control panel.

We’ll go ahead and close that poll and let’s share the results. Well Dale, it looks like we have 40% say yes and 60% no.

[Dale Sanders]

Well, that’s awesome. That’s higher than I thought, so that’s great. For those of you that have been involved, feel free to submit questions, and your feedback would be great to share with the rest of the audience. Thank you.

Assertion #1

Okay. So assertion #1 on this topic: I would argue that without precise definitions and registries of patient types, you really can’t have precise clinical research and you can’t have precise comparisons across the industry, and it’s especially risky when you’re taking on financial risk in ACOs and capitated reimbursement programs. You certainly can’t have precise and personalized healthcare and you can’t have predictable clinical outcomes. And so this I believe is one of the most fundamental things that we need to improve in healthcare. The standard definitions and the precise content of these registries are very foundational because everything that we do is built upon these, or should be in a way.

Assertion #2

Assertion #2 is we can’t keep building disease registries at each organization from scratch. It’s too expensive, it takes too long to do that, and it doesn’t support standardized disease reporting, surveillance and comparative medicine. I am grateful that there has been some federal involvement in the last few years as a consequence of the Affordable Care Act, but the projects I think are moving too slowly. And so I think it’s a great opportunity for us to do something that is more collaborative with the federal government, and initially probably less consensus-based than what I’m seeing there. I think we’re starving “better” in the pursuit of “perfection” in the federal programs right now.

Healthcare Analytics Adoption Model

Some of you have probably seen this slide before. This is the Healthcare Analytics Adoption Model that we advocate, and you can see that we purposely placed registries at a very foundational level, level 2, of the Analytics Adoption Model. So at level 1, you’re establishing the basic infrastructure of the data warehouse and the content of the data warehouse, and the very next most important step is to organize that core content around standard vocabulary and patient registries. And then virtually everything else above that in these higher levels will leverage either the strength or the weakness of those patient registries, depending on how precise they are. So it’s very important for us and we try to practice what we preach at Health Catalyst in this regard.

Achieving “High Resolution” Medicine

I’ve been using this term as a metaphor. “High resolution” medicine I think is impossible without these precise registries. So right now we have this very odd and sort of pixelated view of what a disease is or what a patient type is. But it’s not rocket science to get to a clearer picture and understanding. And so, this pursuit of “high resolution” medicine really is founded upon our ability to precisely define these registries and come together as an industry about those definitions.

Patient Registry Definitions

So let’s talk about a few of the patient registry definitions that are out there in the industry, starting with the California Healthcare Foundation, which does a lot of good work in this area: “Computer applications used to capture, manage, and provide information on specific conditions to support organized care management of patients with chronic disease.” I think that’s a reasonable definition but I think it’s too narrow. I think there are lots of other types of registries that are important beyond chronic disease and we’ll talk about those, but, you know, it’s a reasonable definition.

AHRQ’s Patient Registry Definition

AHRQ says that it’s “an organized system that uses observational study methods to collect uniform data (clinical and other)”, and I like that because it should be more than just clinical, “to evaluate specified outcomes for a population defined by a particular disease, condition, or exposure and that serves one or more predetermined scientific, clinical, or policy purposes.” That’s a pretty good definition, I think. Not bad. As for “evaluate specified outcomes”, I think we could debate whether that’s the only purpose of these, but it’s, again, a pretty good definition.

AHRQ’s Patient Registry Definition

And then AHRQ has an additional definition from the National Committee on Vital and Health Statistics and this is really about how do we understand at a national and global level, what’s happening with patients on a very consistent basis and how do we disseminate that for policy-making and national strategy. So again, a pretty good definition. And I might add that, you know, all these definitions are emerging in just the last few years.

Patient Registry Definitions

This is the definition that I used when I was in Medical Informatics back at Northwestern. “It’s a database designed to store and analyze information about the occurrence and incidence of a particular disease, procedure, event, device, or medication and for which the inclusion criteria are defined in such a manner that minimizes variability and maximizes precision of inclusion within the cohort.” And that’s my favorite definition. Imagine that.

History of Patient Registries

So history of patient registries, they go back a long way. The first reference that I could find for a patient registry was in 1926. It was the first cancer registry established at Yale-New Haven Hospital. Then there was the first state data cancer registry. Then in 1973, we established the SEER program at the National Cancer Institute. And then by 1993 most states had passed laws requiring cancer registries. The earliest reference that I could see of a healthcare delivery organization using a formal disease registry strategy to improve care was at GroupHealth and they called that a “clinically related information system.”

But, you know, for many years we’ve looked at these registries as a requirement or sort of an outgrowth of external reporting, these state, regional, and national reporting requirements around disease states. But I would argue, and I think most of you, the 40% that have been involved in this, would recognize, that these are now becoming very internal registries, probably of greater importance to the future of healthcare than I think the external reporting is. The definition and the use of registries have pivoted in the last few years so that internal registries are at least as important, if not more so, than external registries. So if you’re in a healthcare delivery organization, you really do need to have a disease or patient registry strategy of some kind. I think it’s critically important in all aspects, financial, clinical, and research, that you establish these internal registries about the patients that you treat and that you are at risk for covering.

What’s a Diabetic Patient?

So to give you some idea, some of the struggles that I’ve seen in this revolve around the definition of a diabetic patient and how we define that. At Intermountain we started this in 1998 or 1999 and it took us about 18 months to achieve consensus on the definition of a diabetic patient. And when you think about the progressive nature of Intermountain, this speaks I think loudly to the fact that we have such a disconnect around these definitions. If an organization like Intermountain took that long in 1999, most organizations would probably never even undertake it.

At Northwestern, I took everything that we had done at Intermountain and put that out on the table, and it took us only 6 months to achieve consensus because we started from something other than scratch. We borrowed what we did at Intermountain, which had lots of credibility, we looked at other evidence-based sources that had emerged in the six years that had transpired, and then of course our local physicians put their fingerprints on these definitions as well. But it took a lot less time to define a diabetic patient at Northwestern.

Then I took this to the Cayman Islands, again leveraging everything that had been done at Intermountain and Northwestern, and then we brought in the British Medical Journal to help as well, and it took us only 6 weeks to achieve consensus about the definition.

So each time we reuse these, we stand on the shoulders of those who have gone before us and it’s accelerating these definitions, but now we need to do that across the industry. I might point out that under the Shared Savings Program and HEDIS, the good news is there’s nice overlap. There are 54 ICD codes between the two of those that define what a diabetic patient is, and they line up precisely and accurately. Meaningful Use only uses 43 ICD codes. So you see a significant difference in definition even at the federal level. It’s a very inconsistent approach to this. We’ve got to pull these together and, by the way, we also have to broaden beyond ICDs. We’ll talk a little bit about that in a couple of slides here, and how important it is to move beyond just ICDs for these disease registries if you want to become precise.
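[Editor's illustration] The kind of comparison described above, checking whether two programs define a patient type with the same code list, can be sketched as a simple set comparison. This is a minimal sketch only. The function name, the program labels, and the short code lists are illustrative placeholders, not the actual Shared Savings, HEDIS, or Meaningful Use value sets.

```python
def compare_value_sets(a: set[str], b: set[str]) -> dict[str, set[str]]:
    """Report which codes two definitions share and which are unique to each."""
    return {
        "shared": a & b,   # codes both definitions agree on
        "only_a": a - b,   # codes only the first definition includes
        "only_b": b - a,   # codes only the second definition includes
    }


# Placeholder code lists for two hypothetical programs (not real value sets).
program_a = {"250.00", "250.01", "250.02", "357.2", "362.01"}
program_b = {"250.00", "250.01", "250.02"}

diff = compare_value_sets(program_a, program_b)
for label, codes in diff.items():
    print(label, sorted(codes))
```

When "only_a" and "only_b" are both empty, the two definitions line up exactly. Anything left over in either one is a patient-type definition gap that somebody has to reconcile.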

Sources of “Standard” Registry Definitions

So there is a growing number, unfortunately or fortunately depending on how you look at it, of standard registry definitions. There is growing convergence and I think there is growing awareness that we have to pull these definitions together, but there’s still a lot of disagreement and variability, and it’s very confusing and very time consuming now as a healthcare delivery organization to decide which of these standard definitions to use. These are some that you can reference. And an interesting thing: when I was in the Cayman Islands, of course I had to look at OECD and WHO data at a level I never had before, and they have dramatically different definitions of a lot of these disease states. So we’ve got a lot of work to do as professionals in this area.

US National Library of Medicine

I was very encouraged again, as an outgrowth of the Affordable Care Act, NLM launched the Value Set Authority Center and I applaud everyone that’s involved with that. I was involved for a time with the Value Set Authority Center but it was moving a little too slowly for me and my impatience kind of took me away from it. But I still pay attention to it, I still watch it, and I’m still hopeful that the precision and the standardization of the registries will be facilitated by the Value Set Authority Center. I have my doubts, again, because I think their progress has been a little too slow, but nevertheless, I’m hopeful. So I encourage those of you who have an interest in this to keep an eye on that initiative and do what you can to push it along.

You know, what I’ve basically decided to do is stop participating to a large degree in the Value Set Authority Center and just watch it, and we decided to pull this initiative completely within Health Catalyst. So, our clinical content team is going through all these important high value disease states and we’re rounding out these definitions. Hopefully, we’ll figure out some way to transition that back into some kind of an open source environment, but we decided to take the matter into our own hands and it’s moving quite quickly.

Precise Patient Registries Example


This slide is I think a great animation of the challenge that we have with the registries, especially when we rely only on ICD9. So this is real data from a real environment from a real healthcare delivery organization, and the numbers represented in those bubbles are the number of patients identified as asthma patients. So using ICD9, we had 29,000 patients. Using the problem list, we identified 22,000. What we call supplemental ICDs, so not directly related to asthma but from which you could infer that maybe the patient does have asthma, added another 38,000. And then looking at the medications and inferring, hmm, this person is on some kind of an inhaler related to asthma but they may not be diagnosed with an ICD9 or a problem list entry, added another 72,000.

Precise Patient Registries Example


And so when you combine all of these together, you see the effect is pretty significant. So initially a first pass would say we had 29,000 asthma patients. But if you add those additional rules, it’s 101,000 plus the 29,000. And so, that’s the significance of the misalignment in precision if you look at just ICD9 codes. So if you’re at financial risk, if you’re a clinical researcher, if you’re really trying to manage your entire population of asthmatic patients, you can’t rely just on ICD9 codes. You’re going to miss a lot of precision and it’s going to look like that pixelated photograph that I showed earlier. Well, we can’t have high resolution medicine without broadening the definition beyond ICD9.
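[Editor's illustration] The effect of adding identification sources beyond ICD9 can be sketched as a union of patient ID sets, one set per source. This is a minimal sketch under assumptions: the function name, the source labels, and the tiny patient lists are hypothetical and are not the data from the slide.

```python
def build_cohort(sources: dict[str, set[str]], baseline: str) -> dict:
    """Combine patient IDs from several sources and measure the lift over one baseline source."""
    base = sources[baseline]
    combined = set().union(*sources.values())  # each patient counted once
    added_by_source = {
        name: len(ids - base)                  # patients this source finds that the baseline misses
        for name, ids in sources.items()
        if name != baseline
    }
    return {
        "baseline_count": len(base),
        "combined_count": len(combined),
        "additional_count": len(combined - base),
        "added_by_source": added_by_source,
    }


# Hypothetical patient IDs per identification source.
sources = {
    "icd9": {"p1", "p2", "p3"},
    "problem_list": {"p2", "p4"},
    "supplemental_icd": {"p5", "p6"},
    "medications": {"p4", "p6", "p7", "p8"},
}

result = build_cohort(sources, baseline="icd9")
print(result)
```

In this toy data the per-source additions sum to more than the total number of additional patients, because the same patient can be found by more than one source. The union is what keeps each patient counted once.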


Very interesting coincidence. I think it was yesterday or the day before, an article was published on what is actually kind of an interesting study. It looked at the effect of coding a patient as diabetic on their treatment protocols. So it’s a little bit different, it just came out, and this is the paper. The slides will be available, so you can go out and find it. I encourage everyone to read the paper. It’s very interesting.

Medscape Summary of Article

But here’s a summary from Medscape of the article. There were 11.5 million patients in the study, from 9,000 primary care clinics across the US. And what the study found was that 5.4% of those likely to have diabetes in the database were undiagnosed by an ICD9. And in certain “hot spots”, Arizona, North Dakota, South Carolina, Indiana, etc., the undiagnosed proportion went up to 16%. Patients without an ICD for diabetes received worse care and had worse outcomes. Very interesting, right? So you’d wonder, well, why is that the case? Why did these folks receive worse care just because they didn’t have an ICD code and they were undiagnosed according to ICD?

Medscape Summary of Article

Well, a quote from Dr. Holt, who led the study, I think reveals that, and that is, “It may be that a ‘free-text’ entry was added to the record, but unless it is coded electronically, the patient has not been included in the diabetes register and cannot therefore benefit from the structured care that depends on such inclusion.” So what that tells me is there’s a significant dependence in that diabetes register on the ICD diagnosis code, and that’s precisely what I am arguing we need to move away from. If you broaden your approach and you broaden the definition of diabetes so that you can infer it from medications and other sources, it’s going to improve the ability to include those patients on the register and then they won’t suffer from poor care.

Types of Registries, Not Necessarily Disease Oriented

So, let’s talk about registries other than disease registries because I think these are critically important as well for patient safety, clinical research and other areas. You need to have product registries, especially around high-risk products or expensive products. It’s important to have a health services registry so that you understand things at kind of the patient relationship management level, things like office visits, hospitalizations, procedures, full episodes of care. Creating a referring physician registry ends up being very helpful in the facilitation of care coordination. Creating a primary care physician registry has the same kind of impact, so it’s also important. So these are all non-patient registries. They are not so much patient-centric from a disease standpoint but more from a procedure and a process standpoint, and these are equally important to our understanding of precise and high resolution medicine.

More Types of Registries

More types of registries: scheduling events, driving reminders for research and standards of care protocols, so that you know which patients are going to which departments and how often. It gives a sense of workflow. Of course, the mortality registry is critical, a very important thing to know about your patients. Research patient registries are critically important, and we’ve seen huge value in academic medical centers where we create these precise registries and we allow the researchers then to match those up against genomic data, and it’s hugely valuable to them for phenotyping and things like that. And typically my experience has been that the poor researchers at academic medical centers are starving for data, and it’s a relatively easy area to address quickly and provide very high value if you can add a platform of data to provide those registries. And we’ll talk about the role of the data warehouse in that shortly.

Then of course there’s the traditional, you know, disease and condition registries, the syndromes and things like that. And then in the properly formed data warehouse, you could combine all these and start looking at the correlations and the causations between all of these kinds of things. There’s kind of an unlimited opportunity to mine this data when you start combining those.

Innumerable Uses & Benefits

So everybody pretty much knows this, but I’ll bring this out anyway. There are innumerable use cases. For the physician organization and for physicians individually, it gives them an understanding of how well they’re managing the diseases of their patients. It also answers who else is treating patients like this, and gives them access to that kind of information. That’s especially important in challenging conditions with small-n populations of patients. It would be really nice to have a precise definition of those patients so the physicians could interact with colleagues who are treating those patients in some fashion.

How does my drug perform in disease prevention, progression, and cure? Those are the drug manufacturers. They’re really interested in this kind of thing. Clinicians and researchers are asking how is this disease expressed in the genome? How do I analyze patient trends and outcomes for a disease? How do I know which drug as a consumer would work best for me? This is when we’re getting into personalized healthcare. And this new use case, relatively new, what other patients are matching my specific profile for disease, medication procedure or device? And can I interact with them socially?

A couple of years ago, I tore my ACL and my MCL and fractured my tibia skiing, and I tried to find other men of my age, my size, my activity level that I could interact with and talk to about the rehab, but it was virtually impossible. Every time I tried to find a similar match, there was usually a very significant difference in some portion of our lives that didn’t allow very effective communication about best practices for rehab. So I think this is an area of particular interest. Allowing patients to socially interact around these registries that are very specific will be very helpful, I believe.

Patients exist in one of three states, relative to a patient registry

You know, patients exist in one of three states, relative to a registry. First, they can be at risk of becoming a member of the registry. So they fit a profile that could lead to inclusion, but they’re not quite there yet, right? For example, obesity as a precursor to membership on the diabetes or hypertension registry. Then, unfortunately, some patients progress and they become a member of the registry, and then they fit the inclusion criteria. And then at some point, hopefully for good reasons, a patient that was once on the registry can move off. So they no longer satisfy the inclusion criteria and they could be disease-free, and it’s important to track those patients as well because you want to understand all the good reasons for making that transition from on the registry to off the registry. So it’s not only that you want to keep a current registry of patients that meet the inclusion criteria; it’s also really important to keep a historical registry of patients that were on the registry and moved off.

This is my cartoon about that process, drawn slightly differently but basically the same thing: a patient is at risk of being on the registry, and can we intervene and stop the progression? Right? That’s what we’re trying to do right here: stop the progression and understand the profile of that patient, what we call the “patient flight path”. That’s a trademark we have at Health Catalyst. What kind of a glide path and flight path are these patients on, and can we intervene before they become a member of the registry? Once they unfortunately become a member of the registry, we need to understand how they got there. I say unfortunate, but the reality is in some cases, like a procedure or a medication registry or a device registry, it may not be unfortunate that they’re on that registry. It may be just a fact of their treatment and that’s okay. There’s no problem with that. But what we want to understand is what’s going on with those patients, how we move them off the registry with proper treatment, and what their progression is once they have these diseases, devices, procedures, medications, and that sort of thing.

And then there’s the past state: the patient was on the registry, and they can move off the registry through a variety of patterns, through remission or improvement in care, so they go back maybe to the at-risk state, or we’ve cured the condition. Or of course, I always joke that we all are going to fly that plane into the ground at some point, but the goal is to have as comfortable and as long a flight path as we possibly can in life. And of course what we’re trying to do is avoid this loop. We don’t want a revolving door where we move patients off a registry for one of these good reasons and then they go right back to being at risk and move back onto the registry. So that’s another point of intervention: how do we make sure the patients don’t get back on the registry?
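[Editor's illustration] The three states and the "revolving door" described above can be sketched as a small state history per patient. This is a minimal sketch under assumptions: the class names, field names, dates, and reasons are all hypothetical. The point is only that every transition is kept, so the historical registry survives when a patient moves off.

```python
from dataclasses import dataclass, field
from datetime import date
from enum import Enum
from typing import Optional


class RegistryState(Enum):
    AT_RISK = "at_risk"   # fits a precursor profile, not yet included
    MEMBER = "member"     # currently meets the inclusion criteria
    FORMER = "former"     # was on the registry, has moved off


@dataclass(frozen=True)
class Transition:
    on: date
    from_state: Optional[RegistryState]
    to_state: RegistryState
    reason: str


@dataclass
class RegistryMembership:
    patient_id: str
    history: list[Transition] = field(default_factory=list)

    @property
    def state(self) -> Optional[RegistryState]:
        """Current state is the destination of the most recent transition."""
        return self.history[-1].to_state if self.history else None

    def move(self, to_state: RegistryState, on: date, reason: str) -> None:
        """Record a transition; nothing is ever deleted from the history."""
        if to_state is self.state:
            raise ValueError(f"patient is already in state {to_state.value}")
        self.history.append(Transition(on, self.state, to_state, reason))

    def returns_to_registry(self) -> int:
        """Count how many times the patient came back onto the registry."""
        entries = sum(1 for t in self.history if t.to_state is RegistryState.MEMBER)
        return max(entries - 1, 0)


# Hypothetical patient history.
patient = RegistryMembership("p1")
patient.move(RegistryState.AT_RISK, date(2012, 3, 1), "precursor profile")
patient.move(RegistryState.MEMBER, date(2012, 9, 15), "meets inclusion criteria")
patient.move(RegistryState.FORMER, date(2013, 6, 1), "remission")
patient.move(RegistryState.MEMBER, date(2014, 1, 10), "meets inclusion criteria again")
print(patient.state, patient.returns_to_registry())
```

A nonzero return count is the revolving door, and those patients are candidates for the second point of intervention.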

Patient Registry Engine

This is a diagram of what I call a patient registry engine. For the most part in healthcare today, this represents the various forms of data and data sources which you can feed into a registry engine to create very precise registries. Now, over time, this is going to increase, and I think one of the notable missing items from this right now is genomics data, but we’ll include other sources of data, including social data and things like that, eventually in this as well.

But right now, this is pretty much the ecosystem of data that we have in healthcare to develop a precise disease registry. And if we were taking a traditional approach to these registries, at best, we would look at ICD9 codes. But as we talked about earlier, that’s too imprecise. So we have to look at all of these other areas, all these other data sources, and look for the opportunity that they offer to provide that high resolution picture of a patient. Every one of these is a potential new pixel in that high resolution image of a patient and the disease state of that patient. Obviously some of these won’t play a part, but generally speaking each of these adds just a little more precision, a little more resolution to that picture. And then once you get those criteria for inclusion and what we call structured exclusion, which we’ll talk a little bit more about, you’ve got a precise disease registry, and there are innumerable use cases that you can use that registry to support. And you want to tie it to cost and reimbursement data as well. More and more we have to start measuring the numerator and denominator of healthcare value, which is quality divided by cost. So every disease registry nowadays should have cost and reimbursement data hanging off of it. It’s valuable to researchers, as well as healthcare operations.
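[Editor's illustration] One way to picture a registry engine is as a set of named inclusion rules, one per data source, where the registry records which rules put each patient there. This is a minimal sketch under assumptions: the record layout, rule names, code list, and medication list are illustrative placeholders, not a clinical definition and not the engine described in the talk.

```python
from typing import Callable

Patient = dict                      # hypothetical flat patient record
Rule = Callable[[Patient], bool]

# Placeholder targets for an asthma-style registry (illustrative only).
TARGET_CODES = {"493.90"}
TARGET_MEDS = {"albuterol"}

# One rule per data source; each adds "a little more resolution".
INCLUSION_RULES: dict[str, Rule] = {
    "diagnosis_code": lambda p: bool(set(p.get("icd_codes", [])) & TARGET_CODES),
    "problem_list": lambda p: any("asthma" in s.lower() for s in p.get("problems", [])),
    "medication": lambda p: bool(set(p.get("medications", [])) & TARGET_MEDS),
}


def evaluate(patient: Patient) -> list[str]:
    """Return the names of every inclusion rule the patient satisfies."""
    return [name for name, rule in INCLUSION_RULES.items() if rule(patient)]


def build_registry(patients: list[Patient]) -> dict[str, list[str]]:
    """Map each included patient ID to the evidence that qualified them."""
    registry = {}
    for p in patients:
        evidence = evaluate(p)
        if evidence:
            registry[p["id"]] = evidence
    return registry


patients = [
    {"id": "a", "icd_codes": ["493.90"], "medications": ["albuterol"]},
    {"id": "b", "problems": ["Asthma, mild intermittent"]},
    {"id": "c", "icd_codes": ["401.9"]},
]

registry = build_registry(patients)
print(registry)
```

Keeping the evidence alongside each member makes it possible to see how many patients would be missed by a diagnosis-code-only definition, and gives a natural place to hang cost and reimbursement data later.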

The Healthcare Process vs. Supportive Data Sources

So this is just a high level diagram of the healthcare process, starting with registration and scheduling, diagnosis, orders and procedures, encounter documentation, results and outcomes, the patient’s perception, billing and accounts, and claims processing. And as many of you know, most of these are very disparate. Now, you know, some of the EMR vendors have created a broader ecosystem around a single database. It’s not quite as disparate as it used to be a few years ago, but there are still a lot of disparate information systems that float around in support of this overall process.

The Healthcare Process vs. Supportive Data Sources

And these are just some of those. This is a reflection of things as they were at Northwestern but it’s typical of many organizations. They’ve got all sorts of disparate information systems associated with this care process and this represents all the spreadsheets and the databases and things that sit on desktops and that kind of thing.

Geometrically More Complex in Accountable Care and Most IDNs

So the only real option that we have so far for integrating this data is to put it into an enterprise data warehouse, and it’s a geometrically more complex environment if you take that diagram and apply it to an integrated delivery network or an accountable care organization. You’ve got not just one diagram like that but you may have 10 or 15 diagrams, and you’re trying to pull disparate information from all of these supporting information systems. And so far the only thing that we can offer to solve that problem is an enterprise data warehouse. And I say that with a bit of regret because I’m still at heart a CIO looking to have as lean an environment as possible and spend fewer dollars on IT and only high value dollars on IT. So the fact that we have this requirement for a data warehouse is a bit of a symptom of a problem actually. It’s not addressing the root cause, but in my lifetime I don’t see that changing.

A well designed data warehouse can be the platform that feeds many of these registries, and more, in an automated fashion

So with a well-designed data warehouse you can actually feed all these different registries, including internal and external registries, by creating a little registry application for each of these on top of that platform of data. Once you’ve got it integrated, if you’ve got a state registry that requires cancer patients to be defined in some way, fine, no problem. That’s the definition that we’ll use for that state registry and we’ll pump that data up to the state, and we do that automatically through the enterprise data warehouse rather than through a lot of the manual processes that we have right now.

And the same thing applies here. Federal registries and that kind of thing, population health management registries for internal work, research registries: all of these can plug into the enterprise data warehouse platform, and you can extend this to professional societies, STS information and that sort of thing as well. So a well-designed data warehouse should be able to flexibly and adaptably support whatever kind of variation in these definitions and whatever kind of internal and external reporting requirements you have around these registries.
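[Editor's illustration] The idea of one integrated platform feeding many registries, each with its own definition, can be sketched as a list of registry definitions applied to the same records. This is a minimal sketch under assumptions: the registry names, record fields, and inclusion tests are hypothetical stand-ins for whatever a state, an internal program, or a research group would actually require.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class RegistryDefinition:
    name: str
    audience: str                     # e.g. "state", "internal", "research"
    include: Callable[[dict], bool]   # that audience's own inclusion definition


def feed_registries(records: list[dict], definitions: list[RegistryDefinition]) -> dict[str, list[str]]:
    """Apply every registry definition to the same integrated records."""
    return {
        d.name: [r["id"] for r in records if d.include(r)]
        for d in definitions
    }


# Hypothetical definitions layered on one shared data platform.
definitions = [
    RegistryDefinition("state_cancer_report", "state", lambda r: r["cancer_dx"]),
    RegistryDefinition("internal_population_health", "internal",
                       lambda r: r["cancer_dx"] and r["attributed"]),
    RegistryDefinition("research_cohort", "research",
                       lambda r: r["cancer_dx"] and r["consented_research"]),
]

# Hypothetical integrated records, as they might come out of a warehouse.
records = [
    {"id": "p1", "cancer_dx": True, "attributed": True, "consented_research": False},
    {"id": "p2", "cancer_dx": True, "attributed": False, "consented_research": True},
    {"id": "p3", "cancer_dx": False, "attributed": True, "consented_research": True},
]

feeds = feed_registries(records, definitions)
print(feeds)
```

The design choice is that a new reporting requirement becomes one more definition in the list rather than one more manual extraction process.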

Mini-Case Study from Northwestern University Medicine, 2006

So let’s dive into a few of the things that we went through as a team at Northwestern.

Target Disease Registries

These were the target disease registries that we identified as important. And again, this is a little bit dated, this was in 2006-2007, but I don’t think it’s changed much across the industry. And the reason we decided that these were important is that they either had some sort of compliance-related reporting requirement, a Joint Commission kind of thing, (31:47) maybe, or there was a significant research emphasis on the campus. We know just through our professional observation that these are high-risk patients, high dollar volume, significant impact in the community. So we got together and decided that these were the most important registries to target first at Northwestern, and actually there was a significant subset that we went after: diabetes, end-stage renal, and heart failure were the first that we really approached. It would be nice to hear from the Northwestern folks. I kind of lost touch with my team there, so I don’t know how this has progressed since then, and it would be interesting to know.

Inclusion & Exclusion for Heart Failure Clinical Study

So heart failure is a good example and I want to acknowledge Dr. David Baker at Northwestern for this content. He was interested in a study of heart failure patients, and so he defined these ICD codes and we put that filter on the enterprise data warehouse and created the initial denominator of patients that he was interested in. And then he wanted to develop exclusion criteria for beta-blocker use. And so that became a numerator over the top of this general heart failure denominator. And these were the criteria that he defined for us in that study. So this gives you an idea of kind of how to go about things from a research perspective. But I think what you can see is, and we all agreed upon this once we got into it, we started off with ICD9. It’s a good place to start but it wasn’t specific enough. And so we developed other criteria similar to what I described earlier to make the study inclusion criteria, as well as the numerator of exclusions, more precise.

Disease Registry “Exclusions”

So our first attempts, and then this notion of exclusion is kind of important because what it amounts to is unique numerators on the broader denominator. And our first attempts in adjusting the numerator were a little, I would say, kind of immature and imprecise but you know, again, we were just beginning so we didn’t really know what we were doing at the time, so we were plowing new fields here. And so, the approach that we took was to answer this question, “why should this patient be excluded from the registry?”, even though they appear to meet the inclusion criteria. We decided that in this particular case, they had a conflicting clinical condition or genetic condition, the patient is deceased, obviously that’s an important exclusion criterion, or they were no longer in the care of this facility or physician.

And so, we decided to exclude those patients. We really didn’t track those exclusions so much. We pretty much removed them from the registry based upon these, but we found out later that this approach to exclusions might not be a very good idea.

Our View on “Exclusion” Evolved

And so, we’ve evolved our definition of what exclusion really means. Because the notion of exclusion sort of implies that you can ignore these patients, but you can’t. Not only can you not ignore them from a clinical perspective, you can’t ignore them from a population and community-based perspective and you certainly can’t ignore them from a general research perspective because there’s something to be gained from understanding the entire patient population. So, rather than exclude them, just filter them as a special denominator.

So as we evolved our understanding of this, we came to the conclusion that these categories are more accurate, cognitive inability, economic inability to participate in a protocol, physical inability, all of these affect kind of this ability for that patient to participate in the research study or the clinical protocol but they’re specialized numerators and you need to account for those because about 30% of patients go into one or more of these categories. It’s very significant. Again, you can’t just ignore them, right? It’s important that they be included in some way. They can’t be excluded.
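A minimal sketch of "filter, don't exclude": every patient stays in the registry, and the categories named above become flags that can be counted and filtered on. The enum, the patient ids, and the registry layout are assumptions made up for illustration.

```python
from collections import Counter
from enum import Enum


class Limitation(Enum):
    COGNITIVE = "cognitive inability"
    ECONOMIC = "economic inability"
    PHYSICAL = "physical inability"


# Hypothetical registry: patient id -> set of limitation flags.
# Nobody is deleted; an empty set means no known limitation.
registry = {
    "p1": set(),
    "p2": {Limitation.ECONOMIC},
    "p3": {Limitation.COGNITIVE, Limitation.PHYSICAL},
}


def protocol_eligible(reg):
    """Patients with no flagged limitation, as a filtered view."""
    return sorted(pid for pid, flags in reg.items() if not flags)


def limitation_counts(reg):
    """How many patients fall into each category (a patient can be in several)."""
    return Counter(flag for flags in reg.values() for flag in flags)


eligible = protocol_eligible(registry)
counts = limitation_counts(registry)
```

Because the flagged patients are still rows in the registry, they remain available for population-level and research questions even when a given protocol filters them out.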

Institute of Medicine of the National Academies

Advising the nation – improving health

Again, nice to see some progress on this at the federal level, at the IOM level. There’s a group now studying this. It’s the recommended Social and Behavioral Domains and Measures for EHRs. I’m a little concerned that they think that this is just the world of EHRs. EHRs play a part in this but certainly data warehouses are playing an important part as well. So I hope that that study broadens the perspective a bit. But if you’re interested in understanding the evolution of this social and behavioral data that’s reflected in what we learned at Northwestern in 2007, the folks at the IOM are working on this and lots of foundations supported this as well, the Robert Wood Johnson Foundation in particular. So we really appreciate these folks looking at this, because, as we know, about 80% of healthcare is attributable to social and behavioral domains. A lot of what we’re trying to affect with accountable care actually lies outside in the community and in the home. So it will be interesting to see how this evolves.

Diabetes Registry Data Model

So this is an example of a high-level data model, the diabetes registry data model. And if you’re a data modeler, you’ll not even get too excited about this, and I’m okay with that. Our approach to analytics and data warehousing is actually to minimize and simplify data models as much as possible. And this is the diabetes data model that we used at Intermountain Healthcare and is still for the most part the same diabetes data model that is now used. We used it at Northwestern, we use it at Health Catalyst. It’s a very simple data model. And you could replace diabetes here with virtually any other condition or disease but all of this other data that surrounds that registry is going to stay essentially the same.

The precision of this registry is what’s important, and then you bring in everything else associated with that precise registry. So it’s very replicable. It’s a very simple model to repeat. And of course all the use cases are pretty straightforward. How many patients do you have, that’s a basic question. What was the result for all the clinical measures, what are the medications that they’re on, how long have they been taking each, what was addressed in each of their visits for the last couple of years, which doctors have they seen and why, how many admissions have they had and why, what co-morbidities are present. And of course, imagine a whole series of these, congestive heart failure, hypertension being out here. With a well-designed data warehouse, it’s easy to link those registries together to identify these comorbid conditions. And then which interventions are having the biggest impact on LDL. Those are all typical analytic use cases you can run against this model but it has to be precise at the center of it here.
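Linking registries to find comorbid patients comes down to intersecting registry membership on a shared patient identifier. The sketch below assumes each registry is reduced to a set of patient ids; the registry names and ids are invented for illustration and do not reflect the actual warehouse schema.

```python
# Hypothetical registries keyed by condition, each a set of patient ids.
registries = {
    "diabetes": {"p1", "p2", "p3"},
    "heart_failure": {"p2", "p3", "p4"},
    "hypertension": {"p3", "p5"},
}


def comorbid(regs, first, *rest):
    """Patients who appear on every one of the named registries."""
    return regs[first].intersection(*(regs[name] for name in rest))


def registries_for(regs, patient_id):
    """All registries a given patient appears on."""
    return sorted(name for name, members in regs.items() if patient_id in members)


diabetes_and_hf = comorbid(registries, "diabetes", "heart_failure")
all_three = comorbid(registries, "diabetes", "heart_failure", "hypertension")
```

The same join works for any pair of conditions, which is why the precision of each individual registry matters: an imprecise member list is carried straight into every comorbidity count built on it.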

Building the Diabetes Registry

So here’s the high-level diagram of what we did at Northwestern when we built the registry. Darren Kaiser, whom I mentioned earlier, was the architect behind this. We dove into Epic-Clarity, we pulled out problem list, orders, and encounters initially. Same kind of thing with Cerner, and this is orders, which is where we got medications and things like that and labs as well. And then we also went into IDX and the billing system and brought that into the inclusion-exclusion criteria of the registry engine and produced the diabetes registry from that with these attributes and things.

Data Quality & The Disease Registry

One of the interesting things of course in this journey is you will inevitably find poor data quality and there were lots of data quality problems. The BMI data quality one just happens to be one example of many and kind of a humorous example. You can see some pretty significant BMIs out here that were entered into Epic and Cerner. They didn’t make a whole lot of sense. And so, we had to have a formal way of looking at this data from a data quality perspective, and then going back to the patient record, going back to the source of the data collection and fixing that because, you know, if you added all this up, there’s a significant number of patients out here that would suffer if we didn’t identify their true BMI. Their care would suffer if we didn’t properly identify that. So this was not just a big data quality improvement initiative, but it was a clinical quality improvement initiative to get this data cleaned up.

Investigating Bad Data

And then, you know, you can see some of these birth weights. There are some high numbers here. And by the way, you see all these nulls in very high numbers. Then you’ve got patients that weighed 7359 lbs. It’s a little frustrating to me that EMRs can be deployed, both at the software engineering level as well as the configuration level, without making these kinds of common sense data quality checks a part of the upfront data entry process. It’s crazy that we can actually enter that kind of data. But that’s the nature of the beast we have to deal with. And so, this data quality journey is an important part of these registries that you have to go through.
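The kind of common sense check described above can be sketched as a plausibility-range test that labels each value rather than silently dropping it, so that nulls and impossible values can be sent back to the source for correction. The ranges and field names below are assumptions chosen for illustration, not clinically validated limits.

```python
from typing import Optional

# Hypothetical plausibility ranges; a real deployment would have
# clinicians and data stewards agree on these.
PLAUSIBLE_RANGES = {
    "bmi": (10.0, 100.0),
    "weight_lbs": (1.0, 1000.0),
}


def check_value(field_name: str, value: Optional[float]) -> str:
    """Classify a value as 'missing', 'implausible', or 'ok'."""
    if value is None:
        return "missing"
    low, high = PLAUSIBLE_RANGES[field_name]
    return "ok" if low <= value <= high else "implausible"


def audit(rows, field_name):
    """Map each patient id to the status of one field, for a data quality report."""
    return {pid: check_value(field_name, row.get(field_name)) for pid, row in rows.items()}


rows = {
    "p1": {"weight_lbs": 182.0},
    "p2": {"weight_lbs": 7359.0},  # the kind of entry mentioned in the talk
    "p3": {"weight_lbs": None},
}
report = audit(rows, "weight_lbs")
```

The same function could just as well run at data entry time as in the warehouse; the talk's point is that the check is cheap and is better applied up front.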

Closed Loop Analytics

And we also found, you know, in this work at Northwestern, and I think we’re all starting to see it more recently, that you have to close the loop of analytics back to the point of care. You have to expose disease registries back to the point of care. And we did this at Northwestern. Interventions that are overdue, recommended testing and things like that, those are all the kinds of reminders that you can drive from analytics back to the point of care. It can’t be on a piece of paper that’s separate from the physician’s workflow or the nurse’s workflow.
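As a rough sketch of how a registry can drive reminders for overdue interventions, the function below compares the date each intervention was last done against a recheck interval. The intervention names and intervals are hypothetical placeholders, not clinical guidance, and the sketch says nothing about how the result is delivered into the EHR.

```python
from datetime import date, timedelta

# Hypothetical recheck intervals, for illustration only.
INTERVALS = {
    "a1c_test": timedelta(days=180),
    "eye_exam": timedelta(days=365),
}


def overdue_interventions(last_done, today, intervals=INTERVALS):
    """Return interventions never recorded or last done longer ago than the interval."""
    due = []
    for name, interval in intervals.items():
        last = last_done.get(name)
        if last is None or today - last > interval:
            due.append(name)
    return sorted(due)


reminders = overdue_interventions({"a1c_test": date(2014, 1, 1)}, today=date(2014, 9, 1))
```

In this example both items are flagged: the A1c test is more than 180 days old and no eye exam has been recorded at all.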

So we implemented this in Epic by invoking web services within the Epic programming points, we invoked external web solutions within Hyperspace, and we fed data from the data warehouse back into Epic in these various ways for the CUIs and the FYI Flags and Health Maintenance Topics, and that kind of thing.

And this is the high-level cartoon of that process, where we plugged into the EHR user interface the standard stuff here, right? Encounter data, inbox, orders, all that kind of thing. But then we opened that API and we were able to put population management and cost data back at the point of care so this physician would have a broader understanding of not just patient encounter data but population management and economics of care as well. And we did it through an EHR services-oriented architecture call, the API. This was in a SQL Server data warehouse at Northwestern. This is of course a MUMPS/Caché environment here but our programmers there were very capable and we plugged it all in. But not every organization can afford programmers to do that. And so, we need to figure out a way to get this to be a more common thing in healthcare and I’m really happy to see progress from Geisinger and Cleveland Clinic.

Geisinger & Cleveland Clinic Make It Commercially Available

And I want to give a shout out to Brent Hicks, a good friend and a programmer, a teammate of mine at Northwestern. He’s now at the Cleveland Clinic and he’s actually providing a lot of the technical support and the technical brains behind this that we started at Northwestern and now is emerging at the Cleveland Clinic. But they’ve developed commercially available tools that fit right here in this EHR services-oriented architecture. I think both the Cleveland Clinic and Geisinger are using FHIR from HL7 to do that. But you can see we’re going to have commercially available options now to do what we had to patchwork together at Northwestern with duct tape and baling wire. So great news ahead that’s going to allow us to insert analytics back to the point of care.

Nitty Gritty Data Details

So let’s go through some of the nitty-gritty data details here and I want to thank Tracy Vayo for doing this, a Health Catalyst teammate, former Intermountain teammate. Just to give you an example of sort of the work that goes into this. It’s not rocket science but it’s not simple either and it’s quite laborious to produce these precise registries, which is one reason that we can’t keep repeating these over and over again in the industry. We have to find a way to either define these faster federally or we have to figure out a way to do it faster and more commercially available maybe through a commercial outsource agreement or something like that. So, let’s dive into these.

Poll Question

Does your organization have a patient registry data governance and stewardship process?

But before we do that, I want to pause for just a little data collection here and ask another poll question and ask, does your organization have a patient registry data governance and stewardship process? And I want to emphasize the importance of this because those definitions, be they external or internal, have to have some sort of governance and stewardship to make sure that they stay consistent, and if evidence suggests that the definition should evolve that they evolve properly. So this is evidence-based medicine at the data registry level, creating stewardship over that and making sure that it evolves.

So Tyler, could you pop that up for us please, friend?

[Tyler Morgan]

Alright. I’ve got the poll up and have it up for just a minute or so. And while I’m letting you finish filling that out, again, I would like to remind everyone that, yes, we will be providing the slides and also the responses to these poll questions after the webinar. And if you do have more questions for us, you can enter that into the questions pane.

I’m going to go ahead and close the poll now. And let’s share the results. Okay. Dale, we’ve got 8% answered yes and it’s very active; 25% answered yes, somewhat; another 25% answered no, but we are talking about it; 8% answered no, not at all; and 33% answered my organization does not manage patient registries.

[Dale Sanders]

Great. Well it’s good, again, to hear that the 8% and the 25%, the top two yesses, that that’s happening. So, congratulations to those of you that are involved in those organizations and it’s going to be an important part of what we do in the future.


So let’s go into these. I asked Tracy to put together a few examples. So, think back to the diagram with the data sources on the left, the disease registry engine and logic in the middle, and then the disease registry itself on the right-hand side. So what I asked Tracy to do was put together a profile, and this is by the way the emerging profile that we are using for these disease states in Health Catalyst disease registries. I do want to note that it’s not exhaustive. We’re still working on these. It’s a lot of work. But this is how we’re going about it.

So there are the diagnostic criteria, and there’s actually a missing column here and that is, over time we’ll want to list the source for these definitions. So who’s decided that an A1c greater than 6.5%, which evidence, which body has decided that that’s the clinically known definition for a diabetic patient? We want to add another column over here that lists that source of reference and it could be that it’s the Health Catalyst Clinical Team, it could be that it’s CMS. But it’s important that we reference these so that we all go back to some form of standardization over time. So here you see the diagnostic criteria, then all the ICDs that could be and should be involved in the precision of that registry.


And continue to go on. The SNOMED codes that you’re probably going to retrieve from the problem list hopefully. Lab tests as listed above, plus these. Radiology, there’s not really a radiology result that applies to the diagnosis of diabetes, so that would not apply in this case. And then common medications, so that if you see a patient taking one or more of these medications, you can probably infer that they have diabetes, even though they may not have a lab test or an ICD9 associated with their record.
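A registry profile like the one on the slide can be held as a small data structure, including the "missing column" that records who defined each set of criteria. The sketch below is an assumption-laden illustration: the field names, the two medications, and the placeholder source are invented, and only the A1c threshold and the use of ICD codes, labs, and medications as evidence come from the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RegistryProfile:
    name: str
    icd9_prefixes: tuple    # code prefixes that count as diagnostic evidence
    lab_rule: tuple         # (lab name, threshold); a value above it counts
    medications: frozenset  # medications that suggest the condition
    source: str             # who defined these criteria


# Illustrative values only, not an exhaustive or authoritative definition.
DIABETES = RegistryProfile(
    name="diabetes",
    icd9_prefixes=("250",),
    lab_rule=("a1c_percent", 6.5),
    medications=frozenset({"metformin", "insulin glargine"}),
    source="PLACEHOLDER: cite the defining body here",
)


def evidence_for(profile, icd9_codes, labs, medications):
    """Return which kinds of evidence in a patient record match the profile."""
    found = set()
    if any(code.startswith(profile.icd9_prefixes) for code in icd9_codes):
        found.add("icd9")
    lab_name, threshold = profile.lab_rule
    if labs.get(lab_name, 0.0) > threshold:
        found.add("lab")
    if profile.medications & set(medications):
        found.add("medication")
    return found


full = evidence_for(DIABETES, {"250.00"}, {"a1c_percent": 7.2}, {"metformin"})
meds_only = evidence_for(DIABETES, set(), {}, {"metformin"})
```

The `meds_only` case is the one described above: no lab result and no ICD code, but a medication that lets you probably infer the condition. A medication alone is weak evidence, which is one reason to keep the kinds of evidence separate rather than collapsing them into a single yes or no.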


Another example, hypertension. Again, we have the diagnostic criteria, we have all the different ICDs, and by the way this is where I’m a big fan of ICD10. I think the added precision of ICD10 is going to help us with the precision of these disease registries. So I’m a big advocate of ICD10. SNOMED again for hypertension. Lab tests. Radiology, probably not much there that we can mine, and then common medications as well.

And then eventually we’ll want to add another row as we become more sophisticated with our natural language processing. That row would list terms within clinical notes that we can use to extract and infer hypertension, even though it may not be accurately reflected or collected in any of these other sources of data.

Sepsis/ Severe Sepsis/ Septic Shock

And then finally to give an acute care example, the sepsis, and septic shock diagnostic criteria. And again, a very important registry to have. You can kind of question the real-time value of a registry of this type but certainly it has huge value retrospectively to identify how patients got on that registry and how do we intervene to prevent it from happening in the future and how do we treat those patients to get them off of that registry.


And again, it’s kind of a fuzzy diagram but that gives the numbers and the codes associated with the various forms of sepsis, septic shock and others. And then you have SNOMED, lab tests again that we look at, radiology. There are pathology reports obviously and that would probably show up in lab tests. And then common medications that suggest the patient was suffering from some form of sepsis. And that’s the sort of thing that all feed into the registry engine for these different states of patient existence.

In Conclusion

Okay. So, that concludes things, friend. And again, I just would make the assertion that precise registries are required for precise, high-resolution healthcare. So much of what we do depends on these registries and our dependence should be growing and it is. They are tough to build. We can’t keep building these things from scratch. Federal efforts are moving a little too slowly for my preference but God bless them for trying. Precise registries can be a commercial differentiator in the vendor space but most vendors are still stuck on ICD codes and billing codes. So keep an eye on that. Precision is important and it’s going to evolve. Just like that pixelated photo, we’re going to keep improving the resolution of these registries for many years to come. So, the data environment has to be flexible, has to be adaptable technically for those changes, and your stewardship and governance processes have to keep up with that evolution.

Feel free to contact me. There’s my email address and my Twitter handle, and again thank you so much for joining us today and please share your thoughts with me personally, or in the question and answer section that’s about to follow.

Thank you

Upcoming Educational Opportunities

And just to plug too, there’s another upcoming educational opportunity on kind of an introduction to Healthcare Data Warehousing Analytics provided by Jared Crapo, one of our senior consultants.

Okay. Thank you.

Tyler, I’m all done, friend. If you have any questions, I would be happy to entertain those.

[Tyler Morgan]

Alright. We do have quite a few questions. I just want to confirm though, Dale, do you have the time to stay on to answer all these questions?

[Dale Sanders]

I have time. I have to catch an airplane, so I can stay until quarter past the hour probably.

[Tyler Morgan]

Alright. We’ll see how many we can get through then.


How in your model do you account for false positives, in other words, someone that had been incorrectly coded into a registry? Wow, that’s a great question. Now, we have encountered that. Let me think about how we’ve approached it. The more data that you add, the easier it is to identify a false positive. So if for example there’s been a coding error and they show up as having diabetes or whatever, but they don’t show up as having any medications or any other data points that indicate they have diabetes, then the confidence level that you have in that patient actually having diabetes is lower.

And so, one of the things that we played with at Northwestern and at Health Catalyst that’s kind of a whole another level of complexity is actually assigning a confidence level attribute to each member of the registry, so that you know for certain if you’ve got a lab result, a medication, a SNOMED code, an ICD9 code, and it all points towards diabetes, then you can be 99% certain that that patient has diabetes. If they only have one of those data elements associated with the record, then there’s some question about that.
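To make the idea of a confidence level attribute concrete, here is a deliberately naive sketch that scores a registry member by the share of independent evidence types pointing at the condition. This is an assumption for illustration only: the talk says no formal algorithm existed yet, and the equal weighting used here is not a method from Northwestern or Health Catalyst.

```python
# Evidence types named in the talk: lab result, medication, SNOMED code, ICD code.
EVIDENCE_TYPES = frozenset({"lab", "medication", "snomed", "icd"})


def confidence(evidence):
    """Fraction of known evidence types present for a patient (0.0 to 1.0)."""
    return len(set(evidence) & EVIDENCE_TYPES) / len(EVIDENCE_TYPES)


def flag_possible_false_positives(members, threshold=0.5):
    """Registry members whose score falls below a hypothetical review threshold."""
    return sorted(pid for pid, ev in members.items() if confidence(ev) < threshold)


members = {
    "p1": {"lab", "medication", "snomed", "icd"},
    "p2": {"icd"},  # a lone billing code, possibly a coding error
    "p3": {"lab", "medication"},
}
to_review = flag_possible_false_positives(members)
```

A real scoring scheme would presumably weight the evidence types differently, since a confirmed lab result says more than a single billing code, but even this crude count separates the patient with one data element from the patient where everything agrees.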

We don’t have a formal algorithmic way to attribute that confidence level yet but we’re working with an outside company that actually comes from the aircraft world and predicting aircraft failures and things like that, and we’re hopeful that we can get that kind of algorithms and formality from them.

But that’s a great question though, very good question. Thank you.

How strong is the movement nationally to get manual chart abstraction automated? I do understand that EMRs need to generally improve in their digital data quality before automated registries can become a reality, but I was just wondering how much talk there is currently about eliminating manual chart abstraction? Well, I mean it’s a constant topic of conversation. So I don’t really know the answer to that yet. It is a problem and we’ve tried various ways of using natural language processing to aid that process. And it helps in coding for billing purposes. There is some pretty nice NLP-based tooling for helping with sort of maximizing billing and avoiding compliance issues. You can do some reasonably good things in radiology and pathology with NLP because of the structured nature of the note. But outside those areas, NLP as a tool for avoiding manual chart abstraction has been less than impressive and it’s just the nature of human language and the complexity of that that makes it difficult.

And of course the other side of this is we can’t keep expecting our physicians and nurses to be the digital samplers. We keep adding more and more clicks to the lives of physicians and nurses. Click one more button please because we need that discrete data element. I wish there was some sort of challenge that would say, you know what, software engineers and EMRs, we’re going to give you 50 mouse clicks from start to finish. 50 mouse clicks is all I’m going to give you as a physician as I interact with your electronic medical record. You figure out how to maximize the value of those mouse clicks. I’m not going to give you any more. And I think that constraint might drive some greater emphasis on better clinical notes, which would also help reduce the need for manual chart abstraction.

How does ICD10 affect the reporting on a diabetic patient? And I think you did allude to this earlier about precision. Yeah, I think that’s basically the answer, it’s going to help with precision. And I’m a big advocate of ICD10 for that and other reasons. That precision of clinical diagnosis is going to go way up. And I’m not overlooking the challenges that it will put on nurses and physicians but at the end of the day, we are going to end up with very little impact on workflow. I went through this as a consequence of being in the Cayman Islands and being associated with international healthcare that’s been on ICD10 for a long time. I can speak very loudly to the benefits of it and the pain of transition is not as big as we make it in the US and the impact that it has on nurses and doctors isn’t as big as we try to make it either.

How would you distinguish or would you distinguish between a patient registry and a clinical outcomes registry? I don’t think I’d make a distinction. I think the clinical outcomes is an attribute of the core disease registry honestly. That’s how I would look at that. But I’m certainly open to having my mind opened by some other creative thought on that, but I can see a clinical outcomes registry as an attribute of the core disease registry and that’s how I’ve always treated it and designed disease registries in that fashion. In fact, we’re building right now the patient flight path profiler at Health Catalyst around that concept. So imagine slicing the diabetes registry in such a way that you identify those patients who have had the best outcomes, and defining the best outcome is actually a little tricky but we think we’ve got a pretty good definition of that. So you slice that as a numerator in the disease registry, those patients that have had those outcomes, and that then defines the best flight path for all other patients, and we try to figure out how to get other patients on that flight path.

What is the role of patient-generated data in the registry definition? Well it’s going to grow, that’s for sure. And a nod to my dear friend and colleague, Joe Boyce, at Heartland Health; he is one of the first CMIO/CIOs to actually start incorporating patient-reported data in a clinical record. I don’t know if they have incorporated that as attributes in the patient registry yet but it’s going to be huge, right? We have to start allowing patients, and I think there is a caveat, patients that are capable cognitively and otherwise, to contribute data back to the registry so that it fine-tunes the precision of that registry. I think it’s especially important for outcomes. The fact that we don’t do a better job collecting patient-reported outcomes yet as an industry is a tragedy. So yeah, it’s becoming very important, especially those socioeconomic factors, the data that’s not as easy for us to measure within the four walls of a healthcare delivery organization.

An elderly man over 65 that just had a CABG, it would be nice if that data had been captured somewhere, to know if he was living alone at home with no support. And that’s the sort of data that isn’t typically collected by a clinician. That’s the sort of thing that you only know if you’re a member of the family or that patient. But it’s critically important to readmissions.

Just for data mining and knowledge, it would seem to me that a patient’s data should remain on the registry, even if the registry criteria, which landed him on the registry, have been resolved. This allows the analytics team to continue to use the data to understand patients with similar profiles, to gather data or know who at a point in time met certain criteria. That should not be a reason to remove someone from the registry since there is great value in knowing the background about that person. Any comments? Yeah, I agree. I think we’re probably seeing the same thing. When I say remove, it depends on how you model the data. You can flag it as an attribute, patient no longer satisfies the criteria for being on the registry, but yet you keep them in the registry. For those reasons we talked about earlier, you definitely want to know those patients that have moved off the registry for good reasons. So I totally agree. As a matter of fact, the patients that have moved off the registry for anything other than mortality are the folks who we want to understand because they’ve had, we hope, some sort of good outcome that moved them off the registry. So I totally agree.

Traditionally, registry data stewardship resides with quality departments. Does this still work given the technology involved? Well, where should it reside…I think the quality department is as good a place as any to start because a lot of those registries that we’re kind of facing right now have a compliance and regulatory reporting dimension.

But my experience is that the data stewards that are best assigned to a registry are the clinicians at the frontlines of care who are capable of sitting down, having the dialogue with the data architects and data analysts about how to translate the definition, the diagnostic criteria, and the overall definition of what constitutes a particular diagnosis and translating that into data. And you know, the quality department, there’s quite a few very qualified nurses and physicians who work in the quality department but their expertise tends to be very broad in general and I think the best data stewards tend to be the physicians and nurses and clinicians that are more directly tied to the patient types of concern. So for diabetes it’s more likely an endocrinologist, maybe a primary care physician, general internal medicine doctor. Those are the folks that I think are most qualified where they are more closely focused on the disease registry as a big part of what they do day in and day out.

What do you do with patients who are on multiple registries and how do you avoid the waste that would go into duplicated work? I’m not exactly sure what they mean by that. Duplicated registries, I suppose, you know, you might have an internal breast cancer registry for your own internal clinical process improvement and research and then you might have to also submit that same patient’s data to a state regulatory body. If that’s the kind of redundancy we’re talking about, I don’t have any concerns with doing that in a data warehouse. It’s not a big burden to do that. The redundancy doesn’t cost much. So I’m not sure if that’s the question or not, maybe we can follow up with that a little bit later. But then again there’s the other kind of redundancy, which is comorbid conditions where a patient might show up on several registries. And again, I haven’t seen that being a big problem and even though you may duplicate some of their data across those registries, the risk, the cost of that data duplication is kind of inconsequential. So it would be interesting to know if there are other thoughts that can clarify my thinking on that. That would be great.

Were the Northwestern registries all internal registries created from scratch? Yeah, those that we listed were all for the most part internal, created from scratch. Of course, we had the other registries that we were required by the state or professional societies or federal government to support. But yeah, they were all built from scratch. Yup.

Yeah, I have a dream that at some point there will be like these little snippets of code that you might even consider a registry API and that you’ll be able to buy those standard definitions off the shelf from some vendor or maybe it’s the federal government, I don’t know, but I just foresee a library of disease registries where the definitions are reliable, you can take that code, you can plug it into your system. If you want to fingerprint it with your local physicians, you can. But, you know, 80% to 90% of the work is already done for you, if not 100% of it.

That’s kind of what we’re trying to do in Health Catalyst. I mean it is essentially what we’re trying to do. We’re trying to create all of these disease registries that are very precise and we’ll pull those off the shelf every time we go into a client’s site. But I’d like to make that less Health Catalyst-specific. I’d like for those to be more available across the industry.

If you could repeat the reference to the patient-reported outcomes, the expert that’s working on that that you said is your dear friend? Oh, Joe Boyce, Dr. Joe Boyce, he is the CMIO/CIO. I wouldn’t necessarily call these patient outcomes yet. It’s just patient participation and contributing data to their Cerner medical record. I don’t know if Joe has actually moved into the collection of outcomes data yet or not. But just the fact that he’s allowing patients through their portal to contribute data back into their EHR I think is both revolutionary and so blindingly common sense, it’s amazing we haven’t done it before. But kudos to Joe for doing that.

What methods have you used for managing patient identity when needing to integrate multiple sources? Well there’s kind of a variety of patient-matching tools that are on the market now. All the EMR vendors have one. The standard for many years was Initiate. I was kind of always grumpy at Initiate because they basically took an open source, federally funded algorithm and commercialized it and then they charged an arm and a leg for it.

But the patient-matching algorithms and APIs are becoming more and more commodity, and so there are a number of them on the market and I would be happy to share those with you. I mean, Oracle has got one, IBM has got one, all the EMR vendors have one. I’m trying to think of a couple others, but they’re out there. And relatively easy to use and very valuable of course, especially in these environments that you’ve just described.

[Tyler Morgan]

Thank you, Dale. Now, before we close this webinar, we do have one last poll question. Now, our webinars are meant to be educational about various aspects affecting our industry, particularly from a data warehousing and analytics perspective. We have had many requests, however, for more information about what Health Catalyst does and what our products are. If you are interested in a Health Catalyst introductory demo, please take the time to respond to this last poll question.

Now, shortly after this webinar, you will receive an email with links to the recording of this webinar, the presentation slides, and the poll question results. Also, please look forward to the transcript notification we will send you once it is ready.

On behalf of Dale Sanders, as well as the folks at Health Catalyst, thank you for joining us today. This webinar is now concluded.

[Dale Sanders]

Thanks everyone. Have a great day. Bye bye.