AI for Healthcare Leaders: The New AI Frontier for Improved Leadership Decision Making

Jason Jones:                  Thank you all for joining today, spending some of your time with us. This is a topic that I care deeply about, and yet it’s not one that I’ve seen broadly embraced out there in the world. I’ll be really interested to hear your feedback and thoughts through this process. The topic is how it is that we can think about AI for leaders in healthcare. There are a couple of things that we’ll talk about. One is, why is it that AI should be thought of as complementary or augmentation and not artificial or replacement?

Another is to try to identify some really specific tools that you could use at your own organization as they say at the Institute for Healthcare Improvement or IHI, next Tuesday. The last is to describe how your organization can move forward to implement augmented intelligence specifically for leaders. With that, we have our first poll question, Sarah.

Sarah Stokes:                All right. Let me go ahead and launch that. Okay. In our first poll question, kicking things off here, we’d like to know what is your role? Your options are one, senior leadership C-Suite, or you’re an EVP or above. Your option two is department, hospital or regional leader. Option three is clinical or operational staff. Option four is program or large project manager, and option five is IT or analytics staff. We’ll give you just a few moments to get your responses in there.

Appreciate you being pretty quick on the draw there everyone. If you joined us just now, please know we are recording today’s session. You will have access to the slides. Okay. I’m going to go ahead and close that poll and share the results. Okay. It looks like 10% are senior leadership, 16% are department, hospital or regional leaders, 11% are clinical or operational staff, 27% are program or large project managers, and 36% are IT or analytics staff.

Jason Jones:                  Wow. This is a really wonderful mix and thank you so much. One of the things I was hoping is that it wouldn’t all be those of us who are in the IT and analytics staff part because one of the things that I’ve seen that can be discouraging for those folks is if they try to do some great work, take some risks, and then the people they’re trying to support say, “Just go back and do what you’ve always done.” That’s one of the things that we’ll close off with as well: how it is that we can sort of shrink the gap between how it is that we may be able to use data, and how it is that we’re actually using data within our organizations today.

Really, really grateful both for those of you who come from the sort of decision support world, but also thank you so much to those of you who came as our clients and customers, those of you who need to make decisions, especially leadership decisions, in health and healthcare. Okay. The next thing that we’re going to talk about is how it is that we think about AI at the highest level of leadership. After that, we’ll talk about separating signal from noise and being able to achieve a future orientation, which is so critical for leaders. Then we’ll close and spend a fair bit of time on what may, for some of you, be quite an unexpected application of these tools. We’ll go on to our next poll question.

Sarah Stokes:                All right. In this poll question, we’d like to know on a scale of one to five, how likely is your Board of Directors or your senior leadership team, if you don’t have a Board, to have a machine as a voting member by 2025? Your options are one, 0% inconceivable. That’s a great word choice. Option two 25%, option three 50% chance, option four 75%, and option five 100%. You are certain that your Board will have a machine as a voting member by 2025.

Jason Jones:                  Yeah. Kind of an allusion to The Princess Bride right there. We know how that went. Some of us.

Sarah Stokes:                Okay. I’m going to go ahead and close that poll and we’re going to share the results.

Jason Jones:                  Yes. So many of you, most of you, the majority of you agree with it being completely inconceivable. I love that response and I don’t know whether to agree with you or not, but you are in really good company. This was not some kind of a random question. This has actually been surveyed. So, what we have on the right hand side here is a survey saying what’s the likelihood that each of the items on this list might occur by 2025.

Sarah, I am interested in your opinion, what on this list strikes you as interesting or you think differently about, is there anything that kind of pops out here for you?

Sarah Stokes:                I mean, the driverless cars one scares me a little bit still. But them equaling 10% of all cars on US roads, that seems pretty ambitious. But there’s actually a lot of really cool stuff that also scares me a little bit, but it’s pretty cool stuff too, like the robotic pharmacist. But I can see why people are leaning towards wearables since we’re already seeing that.

Jason Jones:                  Especially wearable clothes.

Sarah Stokes:                Yes, wearable clothes.

Jason Jones:                  Someone whose-

Sarah Stokes:                Clothes are quite popular right here.

Jason Jones:                  Yes, I’ve personally been shocked by an electric blanket. This one kind of makes me nervous. But anyway, what also is interesting is look at what’s at the bottom of the list. At the very bottom of the list here, what we have is the first AI machine on a corporate Board of Directors. You all who said, “No, that’s impossible,” are in really good company. The other thing that is worth pointing out is that it is the only thing on the list where people thought there was less than a 50-50 chance that this could happen. Why might that be?

One thing I’ll point out is that surveys themselves can often tell us just as much about the respondents as they can otherwise, so who was actually responding to this? The survey was conducted among 800 executives and experts. Pretty much one of the takeaways that I have from this is that among executives, we tend to think that everybody else’s job can be automated but not ours. How do we think about that in health and healthcare? How do we think about who our most educated and most practiced individuals are? Some of us might argue those are actually our physicians, our nurses, our pharmacists, where they’ve spent perhaps 12 years in school and another 10 to 20 years in practice. Where is it that we focus our efforts in predictive modeling, and AI, and machine learning? It’s on those people.

It’s not to say that we can’t deliver benefit there, but why do we feel as though they’re the people who need the most help? Yet somehow if we reach the highest level of leadership in the corporate world, that now… Yeah, not sure we need that help so much. I’m not saying what the right answer is, it’s just interesting to think about. By the way, this is a wonderful document that was put out by the World Economic Forum and an interesting read. The link is at the bottom and you can have the slides and check it out if you’d like to.

Sarah Stokes:                Before you leave that topic, Phillip just asked for a clarification. Do you mean a voting member or using extended AI to help formulate decisions?

Jason Jones:                  That is a great question. That kind of comes down to, how do we think about our board members, especially in health and healthcare? I’ve seen healthcare organizations with boards comprised almost exclusively of people outside of health and healthcare. There we would very quickly say, “Geez, how do we make sure we get that healthcare perspective as well, especially as it comes to quality, and clinical, and operational issues?” What I intended, as I think the survey question intended to ask, is a voting member. How do we think about that? I mean, we have voting members, as I said, who are not clinicians. That’s standard.

We have voting members who perhaps are our lawyers or HR professionals or in the airline industries or something else, and we don’t really think twice. We actually think it’s beneficial to have that perspective voting at the table. How might we think about AI? Again, not saying it’s the right answer, but thank you for the question. Yes, I did mean an actual voting member. So, we’ll go forward. Here we have a self-driving car picture, which is a little bit scary to Sarah at the moment. But Sarah, I’m going to ask you, would you like a self-driving car?

Sarah Stokes:                For me, I don’t know.

Jason Jones:                  Do you think it could be safe?

Sarah Stokes:                It’s tough, right? Because you think, “Well, are people that great of drivers?” We all have our own opinions on that, and something operating based on an algorithm would probably do better. But yeah, I think for me I’d need to see a bit more testing and I’m a pretty risk averse person, so I think it’s not for me just yet.

Jason Jones:                  How about once it’s safe? Never had an accident, I’d say.

Sarah Stokes:                I guess, yeah. I was going to say what’s the definition of safe? But I mean, yeah, then for sure I think it could be good and it would help mitigate any of those distractions or things that drivers have, right? That’s part of the big reasoning to have a self-driving car I think about.

Jason Jones:                  Let me ask you this because… Let’s just pretend that that’s starting to become a reality and there are some signs that it is. That’s why your opinion matters so much and each of our opinions matters so much. Now, this conversation wouldn’t have really mattered 20 years ago, right? Because you could say, “Oh, I want a self-driving car.” Well, guess what? You can’t have it.

Sarah Stokes:                The technology doesn’t exist.

Jason Jones:                  Right. Now, we’re getting to the point where we can have a meaningful discussion and as we think about AI, we can now start to have a meaningful discussion of what is it that we would like AI to do for us because it’s becoming technically feasible, which is wonderful. Computers are getting faster, algorithms are getting better, data capture in some respects is getting better and that’s needed for the AI to work. But now let’s just pretend that you said, “Yeah, okay, once it’s safe, I’m good with the self-driving car.”

Now the next question is, would you like that self-driving car to choose where you’re going? Not only is it going to drive you there, but you’re going to hop in and you don’t tell it where to go. It just decides and it goes. Would you like that?

Sarah Stokes:                No.

Jason Jones:                  Okay.

Sarah Stokes:                I want to maintain that control over my life.

Jason Jones:                  Okay. You dropped out pretty quickly and I kind of thought about a question to the attendees and thank you again for participating. Usually there are about a quarter of people who say, “Yeah, I actually don’t care if it decides where to go. I don’t like decisions.” They’re not quite as controlling as you are. And I’m glad you are because Sarah is actually running everything. It’s good she’s controlling it. But then the next question is would you like the car to not only drive and not only decide where to go, but actually to just go there, come back and tell you about it?

Sarah Stokes:                No, no.

Jason Jones:                  So you’re just hanging out on your couch?

Sarah Stokes:                Yeah. I mean-

Jason Jones:                  You will not go anywhere.

Sarah Stokes:                I guess I’m assuming I have the option of a normal drive-it-myself type car, which-

Jason Jones:                  No, no.

Sarah Stokes:                … if I don’t, then I guess I would go out. I’d be a little more amenable. But in my mind, I’m picturing more like suggestions. If my car says, “Well…” If I can tell my car I want to eat and it says, “Okay, I’m going to take you to a good restaurant,” then maybe I’m okay with that. But it’s something about maintaining some level of control.

Jason Jones:                  In this case, it would have not only decided to go to the restaurant, it would have gone there and come back and left you at home. Perhaps the only place you would be willing for that to be okay is if it was going to the DMV.

Sarah Stokes:                Yes. That’s accurate. Yeah. There’s a few places like that.

Jason Jones:                  Why does this conversation matter so much today? It matters because the technology is coming not just for self-driving cars, but how we think about AI more broadly and that’s why this conversation matters because we should start to think about what is it that we would like the AI to do and that’s why it’s augmented and not just artificial. We shouldn’t just assume that it’s going to replace whatever it was that we were doing. We should think about what is it that we would like it to replace? Maybe we only want it to replace the driving and we don’t want to replace the decision making, in your case. This matters even more in health and healthcare because so few of us would share an opinion about what is it that we want, what counts as health for us? What is it that we want from healthcare?

As we apply it at the board level or the leadership level, we know every organization has its own values and its own preferences and we should think carefully about how it is that we want to incorporate those values and preferences and how it is that we want to retain agency over them.

Sarah Stokes:                You did get a funny comment from Eric Tate who said you just described parenting.

Jason Jones:                  Yes. In some cases, yes. It depends on the age of your kids I think and how last night went in some cases. Good comment, Eric. Okay, so let’s take a look at separating signal from noise. It’s something that computers do actually quite well and we’re going to first go to a poll question.

Sarah Stokes:                All right. In this poll question, we’d like to know how effectively does your board or senior executive team leverage data to support future decisions and your options are one, never. They don’t look at data. Option two is not often. They’re mostly looking through the rear view mirror. Option three is frequently. They leverage data and discuss interpretations. Option four is always. They are actively leveraging analytics to drive decisions, and option five is you’re just not totally sure or this isn’t applicable.

We’ll give you just a few moments there. They’re having to think a little bit more about this one. If you joined us late, we are recording today’s session. You will have access to the slides. So please keep an eye out for an email with those items tomorrow. Okay. I think we’re going to go ahead and close that poll.

Jason Jones:                  Okay.

Sarah Stokes:                We will share those results.

Jason Jones:                  Thank you.

Sarah Stokes:                0% with never.

Jason Jones:                  That’s really good. And that’s actually different in this group from other groups that we’ve asked this question of. So either everybody’s shifting or you all represent different organizations from what we’ve seen before. Perhaps that’s why you’re here attending today. Really impressive that half of you are frequently looking at and leveraging data in discussions. That’s wonderful. Okay, so I’ll be curious, although we don’t ask whether or not you’re leveraging the data or providing the data or getting the data provided to you in the way that I was imagining.

Challenges with leadership reporting. One is to separate signal from noise and the other is focusing decisions on the future. What are some examples of separating signal from noise? We might ask, is hospital A better than hospital B? We might ask, have we improved over time? We might ask, where should we set an improvement goal and how can we be sure that the goal that we set for ourselves isn’t actually in the normal variation that we would expect? This can turn out to be actually quite difficult for human beings and computers can help us quite a bit as we’ll see in a moment. There are organizations I know of that have set improvement goals, worked on them for two years, and then realized that actually we achieved our goal and it turns out that that achievement was in the range of normal variation.
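As a rough illustration of the "is our goal inside normal variation?" check, here is a minimal sketch. It assumes the measure is a monthly rate with a known numerator and denominator, and it borrows the three-sigma band of a simple p-chart. The function names and the monthly counts are hypothetical and not from the talk.

```python
import math

def normal_variation_band(successes, totals, z=3.0):
    """Return (low, high): the band of month-to-month variation we would
    expect if the underlying rate did not change (p-chart style limits)."""
    p = sum(successes) / sum(totals)      # pooled baseline rate
    n = sum(totals) / len(totals)         # average monthly denominator
    half_width = z * math.sqrt(p * (1 - p) / n)
    return max(0.0, p - half_width), min(1.0, p + half_width)

def goal_is_within_noise(goal, successes, totals):
    """True if hitting the goal in a single month could be explained by
    normal variation alone."""
    low, high = normal_variation_band(successes, totals)
    return low <= goal <= high

# Hypothetical six months of data: patients meeting the measure / eligible patients
successes = [150, 158, 149, 155, 152, 156]
totals = [200, 200, 200, 200, 200, 200]

low, high = normal_variation_band(successes, totals)
print(f"Expected monthly range: {low:.1%} to {high:.1%}")
for goal in (0.80, 0.90):
    print(f"Goal {goal:.0%} within normal variation: "
          f"{goal_is_within_noise(goal, successes, totals)}")
```

A goal that falls inside the band can be "achieved" in a given month without any real change in the underlying process, which is the trap described above.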

These are real things that actually happened to us. The other is, one of the saddest things I’ve heard actually at an organization is that we use data in our organization to justify the past instead of making decisions for the future. This is tricky and it’s easy because what are we looking at when we look at data? By definition, when we look at data, we have to look at something that happened in the past. How can we help people reorient to the future? Not, “Let me come here so I can slap you for your performance six months ago.”

But where do we think that will be in a year? Now we really start to get into sort of the equivalence of if you had a self-driving car, would you like to go to a restaurant? If so, which one? Are we satisfied with the performance that we think we might attain in a year? If we’re not satisfied, what do we think we might change? If we are satisfied, how do we ensure that the performance trend that we’re on is actually sustainable and not on the backs of people who are going to break with their superhuman efforts over time.

Again, these are the two challenges that we’re going to focus on for a moment around leadership reporting, separating signal from noise, and then trying to achieve or facilitate this future orientation as it relates to decisions. We’ll just look at one image quickly and it can be a little bit of an eye chart and requires a little bit of an orientation. I’ll just step you through it. What we’re looking at is a key measure of performance. In this case, we have seven different geographies in the system, overall. Those are indicated on the Y axis. On the X axis, we have the level of performance attainment.

For instance, the top performer in the system, this geography number one, looks like it’s around 83%. The vertical blue line indicates that the target was around, say, 77, 78%. By the way, this green bar on the right indicates the direction of good. This might be something like say hemoglobin A1c control in diabetic patients where more is better, but if we were looking at something like hospital falls, then less would be better.

In this case, being to the right is a good thing. The lowest performance in the organization is down here and it looks like it’s around about 64%, say. We can quickly get each of the different point estimates for where our geographic performance is. But there are a couple of things that I’ll point out quickly and it’s where we get to use the same sort of AI, machine learning, and statistics tools that we know and love in other places. They’re in fact the same tools that we can use to, for instance, predict the probability of readmission for an individual heart failure patient walking out the hospital door right now.

We can use those same tools to help us understand what’s the nature of the variation that we’re seeing across our hospitals. One thing that you’ll see is that these lines are different widths. Why are these lines of different widths? Well, they’re of different widths because the geographies are actually of very different sizes from each other. So, where you see a very narrow width… Up here, geographies one and two are very large. Geography five is very large, but this seventh geographic region is actually quite small and that’s why we’re seeing the error bar be so wide here. We’re using here sort of standard statistical and machine learning techniques to figure out our certainty around our estimate, the same things that you would expect to see in any peer reviewed publication.
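To make the link between geography size and error-bar width concrete, here is a minimal sketch using a Wilson score interval for a proportion. The talk does not say which interval method the chart uses, so the choice of Wilson, the helper names, and the patient counts are all assumptions for illustration.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Approximate 95% Wilson score interval for a proportion."""
    p_hat = successes / n
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half

def compare_to_target(successes, n, target):
    """Classify a geography by whether its whole interval clears the target."""
    low, high = wilson_interval(successes, n)
    if low > target:
        return "better than target"
    if high < target:
        return "worse than target"
    return "not distinguishable from target"

# Hypothetical counts: one large and one small geography
target = 0.775
for name, successes, n in [("large geography", 8300, 10000),
                           ("small geography", 64, 100)]:
    low, high = wilson_interval(successes, n)
    print(f"{name}: {successes / n:.1%} "
          f"(interval {low:.1%} to {high:.1%}, width {high - low:.1%}), "
          f"{compare_to_target(successes, n, target)}")
```

With the same method applied to both, the small geography's interval is many times wider simply because it has fewer patients behind its estimate.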

But here we’re going to apply it again to leadership decision making. The next thing that you might notice is that some of the shapes are different. An upward triangle means that we’re statistically significantly better than the target. The downward facing triangle says that we’re statistically significantly worse than the target. After that, you might notice that there are actually some different colors. Each of the triangles that we see can have a different color. Blue is indicative of the top performers, red is indicative of the bottom performers. You’ll see there are some different shades. Sarah, I only ever graduated to the eight-color crayon box. Can you tell us, what are these colors?

Sarah Stokes:                The bottom of the three looks more of a pink, like a hot pink. Then we have kind of purple above that. Both those look purple.

Jason Jones:                  Okay, okay. Thanks. One of the things that we can do for people who, for instance, may be color blind or otherwise our eyes are just not good enough to detect the differences is over on the right hand side you’ll see some letters and those letters match perfectly with the colors and so what’s that telling us? It’s not only telling us that geographies one and two are better than the target, but it’s also telling us that they’re actually identical to each other if we think about their performance. Similarly, both of these hospitals… Sorry, not hospitals. Geographies three and four are tied with each other, but different from this fifth geography, which again is different from the sixth and again is different from the seventh.

I want to be careful here because there are different ways of figuring out different techniques that we can use for figuring out which geographies or hospitals or something else are really different from each other, but what we’re doing is facilitating that dialogue among leaders who now are not making it up in their own minds. They’re not just saying, “Oh, well, I think geographies three and four are really the same and the same as five.” At least now we have the opportunity to see while the computer is making visible to all of us that three and four are similar and five is different. Do we agree with that assessment?

Here we’re talking about difference in terms of performance, but we can also of course bring a lot of our other intelligence to bear. Different in what other ways? Are we looking at rural communities versus urban communities, or communities where we have a very high market share versus ones where we don’t, or whatever else, new leadership versus seasoned leadership or something like that? We can bring a lot of our other intelligence and knowledge to the discussion. But what we’re asking the computer to do for us, the job that we’re asking the computer to do, is to make visible for all of us to see where it is that it is seeing important differences.

We’re going to extend that a little further. So far what we’ve been talking about is separating signal from noise. Is one geography different from another? Are either of them better or worse than the target? But the next thing that we’re going to look at is try to shift the discussion away from where are we today and start to think about where will we be in the future and how do we feel about that? Those are the gray diamonds that you see here. Now, not only do we know that geographies one and two are at the top, but actually geography one is trending towards continued improvement, whereas geography two looks like it may be faltering a little bit. Perhaps more differently, we look at these two hospitals here, our two geographies here, I’m sorry, that are tied with each other today, both doing worse than the target, but whereas geography three might be sliding back ever so slightly, geography four seems to be accelerating quite quickly.
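The talk does not describe how the gray diamonds are computed. As a minimal sketch of the idea, assuming nothing more than a straight-line trend fitted to recent monthly rates, a forecast could look like this. The function name and the two series are hypothetical.

```python
def linear_forecast(values, periods_ahead):
    """Fit a least-squares line to equally spaced values and project it
    forward, clamped to [0, 1] because the measure is a rate."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations to fit a trend")
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(values))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    projected = intercept + slope * (n - 1 + periods_ahead)
    return min(1.0, max(0.0, projected))

# Hypothetical monthly rates for two geographies that are tied today
geography_three = [0.72, 0.72, 0.71, 0.71, 0.70, 0.70]  # sliding back slightly
geography_four = [0.64, 0.65, 0.66, 0.68, 0.69, 0.70]   # accelerating

for name, series in [("geography three", geography_three),
                     ("geography four", geography_four)]:
    print(f"{name}: now {series[-1]:.0%}, "
          f"projected in 6 months {linear_forecast(series, 6):.0%}")
```

A real forecast would also carry its own uncertainty and would likely account for seasonality, but even this simple projection is enough to shift the conversation from where each geography is today to where it is heading.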

How might the discussion go differently with that information on this plot? As I indicated before, we may now say, “Ooh, geography three. You don’t seem to be making progress towards the target. What’s getting in your way?” Geography four, “You seem to be accelerating really quickly. Do you think this is sustainable? Are you actually confident that you’re going to be able to improve at the level that we’re seeing or do you feel you may need some additional support or reinforcement?” It can totally change the nature of the discussion. Again, not because the computer is right and the know-all and the be-all and end-all of our decisions, but because it’s able to provide to us information that it’s able to assimilate quite well, to do the math for us. Then that liberates us, in the case of the self-driving car, to text away if we want to once it becomes safe.

But in this case, it frees us up to have discussions of, “Okay, geography three, if you’re not really sure what to do next to reverse your performance, then who are you like? Who might we pair you with? Who might you reach out to, to learn what’s working for them and apply that in your circumstance?” That’s when our human intelligence tends to be much, much more valuable than calculating rates or doing forecasts or anything else.

I’ll leave out the description on the bottom for another time because that tends to throw people, or take a little while to incorporate, but suffice it to say that what we’re looking at here is an ability to understand the nature of the problem that the organization may be facing as a learning system and how it is that we might approach it. We’ll make one other comment and that last comment is, how does the organization feel about a small geography that not only is underperforming but is actually not showing signs of improvement and how do we want to think about that through the lens of our organizational values? That’s not a one-size-fits-all.

If we look at the bulk of the population of our members, they’re up here and maybe we should feel good about it. If as an organization we’re focused on equity, then we need to be deeply concerned here and think about possibly shifting resources to this geography or something else, finding some way to help them. Now, the thing that usually causes the gravest concern, especially among leaders and where we could really use your help going forward, is placing these gray diamonds in here. As soon as we place these gray diamonds, people immediately say, “That’s stupid. That’s wrong. I don’t believe it. Analyst, please go back to the basement and don’t bother me again.”

Two perspectives on that. One is, again, more for your reading later, but actually computers do quite well at forecasting and we have several studies that have been published over the years here demonstrating reasonably unambiguously that computers tend to do better than even expert human beings. However, it turns out that when we pair computers with humans, we can do even better. But just want to convey that actually computers can do quite well at forecasting and often better than humans. Please consider that.

From a leadership perspective, if you’re blessed to be in that role, please consider how it is that you can support the rest of us who are trying to support you by allowing us to take reasonable risks and see how it is that we can present better information to you to support your decisions. Again, forecasting tends to be the stumbling block and please consider whether it’s really as obviously terrible as you might think or if we can actually help. Okay.

Sarah Stokes:                You did have one question just about the type of graph you were showing before with the diamonds, if you knew the name of that type of graph, Simon was asking.

Jason Jones:                  Oh, wonderful question. Thank you, Simon. Thank you for typing that in immediately and being able to help us figure out where we were so we could address the question. Yes, so I always think about the Lucky Charms commercial, but this is not called the Lucky Charms chart. This is actually called a forest plot. The idea behind a forest plot, if you’ve seen it in the peer-reviewed medical literature, especially in evidence synthesis, is that it gives us an understanding about how it is that we’re doing across subgroups or across different studies or things like that. The idea, as the name suggests, is that it allows us to see the forest as well as the trees. We’re not looking at just geography one. We can actually see the whole system and each tree as it were within the system. This is called a forest plot. Thank you for that question.

Okay. Now, we’re going to shift gears dramatically and talk about an unexpected application. We’re going to start with, what do we think of when we think of quality in health and healthcare? The Institute of Medicine gave us a perspective on that a while ago. I happen to be partial to the AHRQ or Agency for Healthcare Research and Quality page that describes the six domains or dimensions that the IOM did, in one easily digestible page that I can keep on my phone if I ever forget. The dimensions are safe, effective, patient or person centered, timely, efficient. At the bottom here, we have equitable. I like to think… Although I don’t know and maybe some of you in the audience do.

I like to think that equity was placed at the bottom of the list not because it was an afterthought, but because as human beings, one of the biases we have with our memory is that we tend to remember the last thing we heard. Maybe that was why equity was placed at the bottom. If anyone knows the answer by the way, I would love for you to let me know. So, what is equitable? Well, equitable is… the definition here in slightly larger font is that quality does not vary by personal characteristics such as gender, ethnicity, geography, and socioeconomic status. Of course, we can place other things in there like primary spoken language or other things.

There are a lot of dimensions where we would like to say to someone, “You don’t need to adapt. We should be able to adapt to you. We don’t want you to have to move to a different state or county. We don’t want you to have to change your age or your ethnicity. We should be able to adapt our delivery system to ensure the highest possible quality, taking into account how you present yourself.” We’re going to go onto another poll question here.

Sarah Stokes:                All right. Okay. In this poll question, we’d like to know how do you assess or address healthcare equity in your system? Your options are, one, we don’t, it’s not a priority at this time. Two, a health equity group that discusses opportunities. Option three is a scalable quantitative approach to finding opportunity. Option four is formal equity evaluation or optimization embedded in major initiatives. Five is other. We’ll give you just a few moments here.

Again, we are recording today’s session. You will have access to the recording and slides starting tomorrow. All right. It looks like the votes are starting to slow down. I’m going to go ahead and close that poll and share our results. So kind of across the board here, no major, I guess 27% is our majority here with a health equity group that discusses opportunities, but it’s not a clear winner, I’d say. What would you have expected to find here?

Jason Jones:                  I am really impressed by the 24% of equity evaluation and optimization is embedded in major initiatives. This is the first time we’ve seen something like that. Kudos to you who have shifted up that side of the curve already. What I’m seeing in this chart is what we call a bi-modal distribution. We have people who are not really focusing on equity as a top priority or they talk about it but don’t really have a scalable approach or have it embedded into their major initiatives. Then we have another group of people who have taken it all the way to the other side.

By the way, for those of you who have ever done work in medication adherence, you’ll often find this to be the case. We’ll often say, “Oh well, patients are adherent to their medication 50% of the time.” But if you take a closer look, often what we see, especially for asymptomatic chronic conditions, is that there’s a group of super-adherers and a group of non-adherers and really nobody in the middle. It’s a little bit like having two and a half kids. None of us really has two and a half kids. Here, we’re starting to see, we have sort of super-adherers or super equity folks, and then we have people on the other end of the spectrum.

I hope this discussion addresses both of those ends. Bell, I saw a question from you about cultural competency and I’m hoping that some of what we’re going to talk about here feels relevant to you. So, thank you for that question.

Sarah Stokes:                We did have a question about how many respondents we had to the poll just now. I think we had a ballpark around 60.

Jason Jones:                  Around 60?

Sarah Stokes:                Yep. Erin, that’s your answer.

Jason Jones:                  Thanks. I did notice, I think that was our lowest response rate yet, and I don’t know how much of that was because people just weren’t sure. That’s commonly the case. Okay. I’ll just point out, at least for me personally, Sarah mentioned that I came to Health Catalyst from Kaiser Permanente and perhaps all of you know, but if you don’t, Bernard Tyson, the CEO of Kaiser Hospital Foundation Systems, actually passed away not long ago, very recently, a week ago, a little over a week ago now. He was one of my heroes in this space along with many of the people that he brought on board. It’s somehow timely for me that we’re talking about this in a way, hopefully, that he might be proud of himself.

Okay. Where are we starting from with AI? I think it might be fair to say, especially recently, that there’s a deep concern that our AI will actually exacerbate disparities. On the left-hand side here, we have something that was published on Ron Wyden’s site talking about how he wanted to make sure that going forward, people did not experience what his parents experienced trying to buy a house in the 1960s, where his family was discriminated against. Then more recently, we’ve had a publication in no less than Science that talks about racial bias in algorithms: an algorithm that, sort of sadly and ironically, was specifically designed to help us focus on our most vulnerable members was actually introducing bias into the process. Both of these are worth checking out. The links are at the top. There are many others, but I think the place that we’re starting from is there is grave concern that AI will make the world worse.

Where I hope we get to together in this discussion is that, yes, that could absolutely be the case. We have specific examples we can point to. But just like self-driving cars being made safer, and just as we may want them to respect our preferences and goals and what we count as good, we have an opportunity if we think about AI as augmented intelligence and think about what it is that we would like the computer to do for us. Maybe we can think about how we design it to actually help us improve equity and specifically how it can help us at the leadership level in organizations.

For anyone who noticed, you might’ve seen that the people who won the Nobel Prize in economics this year did so for applying randomized controlled trials to the field of economics and specifically public good projects. You might say to yourself, “How could somebody, almost in 2020, win a Nobel Prize for something that we have used in health and healthcare for at least 70 years?” You’d be right to question that, but I’m still glad that they did their work. It’s worth checking out their work if you haven’t.

Now, what I’m going to do though is take actually something out of economics and the journey that we’ll go on is, how can we take an idea that came out of economics and apply it to help us think about health and healthcare equity? Sarah, when you see this picture, and if I were to tell you that this was a heat map of where it was that we had achieved financial equity in the world and where it was that we were struggling, what things pop out to you where we see that we’re doing well or not well, and is there anything that surprises you in this map?

Sarah Stokes:                I mean, I’m assuming green is good, typically.

Jason Jones:                  Green is good, yes.

Sarah Stokes:                Red is bad, so it looks like the Southern areas of Africa are hurting the worst in this area. But it is interesting that the US is in a pink shade. I think I expect that we’d be in green, Canada’s lit up green, and interesting that some areas of Eastern Europe are lighting up really green too. That’s surprising to me.

Jason Jones:                  Isn’t it? Thanks for that. What we’re looking at that’s behind the colors, and deep apologies for those of you who are red-green color blind, I wanted to stay true to the original map and do realize that we’re biased against people with red-green color blindness. But what we’re looking at here is a map of something called the Gini Index. The Gini Index is named after a person, and it’s about a hundred years old now. So maybe one of you can win a Nobel Prize for applying a hundred-year-old technique from economics to health and healthcare. There’s still the opportunity, I think.

But what the Gini Index does is it says how are we distributing wealth within a country? Sarah, your interpretation was spot on. This isn’t saying that people in Southern Africa make less money than people in Canada or, for that matter, that people in the United States make less money than people in Canada. What it’s saying is within the United States, within Canada, within Sweden, within South Africa, how equally are our financial resources distributed? Just to call out a couple of specific ones there, we have South Africa, which does in fact, at the time of this report generation, have the deepest inequity at a Gini Index of 0.63.

The United States, which you were right to call out as not in the green zone, is at 0.41. Then we start to see some of the ones that we would expect like the United Kingdom and especially Denmark. We tend to think of the Scandinavian countries as being quite equitable. I was once yelled at by someone who actually lived in the Ukraine saying that this is insane, that the Ukraine could be the best in the world, but at least by this measure it was. I should probably take Ukraine off and leave it at Denmark. But these are some specific numbers that we’re looking at across the globe.
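To make the idea of "how equally is something distributed" concrete, here is a minimal sketch of one standard way to compute a Gini Index from a list of amounts. The function name and the two toy "countries" are illustrative assumptions, not figures from the map; real country values such as 0.41 or 0.63 come from national income data.

```python
def gini(values):
    """Gini Index of a list of non-negative amounts.

    0.0 means everyone holds the same amount; values near 1.0 mean
    almost everything is held by very few people.
    """
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Rank-weighted sum: people holding more get larger rank weights.
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


# Hypothetical toy populations (assumed data, for illustration only).
equal_country = [50, 50, 50, 50]   # everyone holds the same amount
unequal_country = [0, 0, 0, 200]   # one person holds everything

print(gini(equal_country))    # 0.0
print(gini(unequal_country))  # 0.75
```

Note that both toy populations hold the same total. The index only describes how the total is spread within each one, which is the distinction drawn above between comparing countries and looking within a country.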

Now, here’s the leap that we’re going to make. It turns out that the Gini Index is mathematically and algebraically linked to something that if you’ve ever built a predictive model, you know and love, and that’s the area under the receiver operating characteristic curve. That is a measure of how well can we separate, for instance, between two people who are leaving our hospital today with heart failure, which one is going to be readmitted? If we correctly identify the person who gets readmitted, then we get a bonus point. If we say the person who was going to be re-admitted doesn’t get readmitted, then we lose.

We summarize that as what’s called the area under the receiver operating characteristic curve. But it turns out that that AUROC or C statistic or C index, as you might’ve heard published about or used, is linked directly to the Gini. What that allows us to do then is ask the following question: should we restate what counts as equitable in the context of health and healthcare quality as the predictive model that we wished we couldn’t build? If we’re trying to identify who’s going to be readmitted to the hospital, we hope we can identify things like the total number of medications they’re on or the recency of their illness or their access to care because these are all things that we can act upon and do something about.
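The pairwise "bonus point" description above can be sketched directly in code. This assumes the link being referred to is the commonly used identity Gini = 2 × AUROC − 1; the function names, risk scores, and outcomes are hypothetical.

```python
def auroc(scores, outcomes):
    """Share of (readmitted, not readmitted) pairs ranked correctly.

    A pair earns a full point when the readmitted patient has the
    higher score, and half a point when the scores are tied.
    """
    pos = [s for s, y in zip(scores, outcomes) if y == 1]
    neg = [s for s, y in zip(scores, outcomes) if y == 0]
    if not pos or not neg:
        raise ValueError("need patients both with and without the outcome")
    points = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                points += 1.0
            elif p == q:
                points += 0.5
    return points / (len(pos) * len(neg))


def gini_from_auroc(auc):
    """Assumed conversion: Gini = 2 * AUROC - 1."""
    return 2 * auc - 1


# Hypothetical predicted risks and observed readmissions (1 = readmitted).
risk = [0.9, 0.8, 0.3, 0.2]
readmitted = [1, 0, 1, 0]

auc = auroc(risk, readmitted)
print(auc)                   # 0.75
print(gini_from_auroc(auc))  # 0.5
```

Under this conversion, a model that cannot separate the two groups at all (AUROC of 0.5) maps to a Gini of 0, and a perfect separator maps to 1.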

We hope we’re not able to predict their readmission on the basis of things like gender, ethnicity, or geography or socioeconomic status. What we’re doing here is rather than looking at a list of countries like the United States, and Canada, and Ukraine, we are looking at a list of possible measures that a healthcare organization can have. There are a lot on the list. I’m just going to highlight a couple. Hemoglobin A1c less than eight is probably something you’re used to. Hypertension control, colorectal cancer screening. Hopefully, these are measures that at least some of you recognize. What we’ve done is we’ve built a predictive model using features that we hope that we could not use, like ethnicity, like primary spoken language, and so on, and then converted that to a Gini, which I said we can do through simple algebra.
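As a rough sketch of "the predictive model we wished we couldn't build", the example below scores each person using only demographic fields and converts the resulting AUROC to a Gini. The stand-in "model" is simply the observed rate of meeting the measure within each demographic group, scored on the same records it was computed from, which is optimistic and not necessarily the modeling approach used in the work described. The field names and records are hypothetical.

```python
from collections import defaultdict


def auroc(scores, outcomes):
    """Pairwise AUROC; ties between a positive and a negative count half."""
    pos = [s for s, y in zip(scores, outcomes) if y]
    neg = [s for s, y in zip(scores, outcomes) if not y]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def equity_gini(records, features, outcome):
    """Gini of a model that predicts `outcome` from `features` alone.

    Values near 0 suggest the demographic features say little about who
    meets the measure; larger values suggest more inequity.
    """
    met, seen = defaultdict(int), defaultdict(int)
    for r in records:
        key = tuple(r[f] for f in features)
        seen[key] += 1
        met[key] += r[outcome]
    # Stand-in model: each person's score is their group's observed rate.
    scores = []
    for r in records:
        key = tuple(r[f] for f in features)
        scores.append(met[key] / seen[key])
    return 2 * auroc(scores, [r[outcome] for r in records]) - 1


def make(language, flags):
    """Build hypothetical records for one language group."""
    return [{"language": language, "a1c_lt_8": f} for f in flags]


even = make("English", [1, 0]) + make("Spanish", [1, 0])
skewed = make("English", [1, 1, 1, 0]) + make("Spanish", [1, 0, 0, 0])

print(equity_gini(even, ["language"], "a1c_lt_8"))    # 0.0
print(equity_gini(skewed, ["language"], "a1c_lt_8"))  # 0.5
```

Running the same function over every quality measure would give one Gini per measure, which is the kind of ranked list described here.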

As we look at these measures, for this particular organization, that can actually feel somewhat good. It turns out that the worst measure they have has a Gini Index that’s actually better than where the United States’ Gini Index is for financial and economic equity, right in the range of what the United Kingdom has been able to achieve. Some of the least equitable measures still are in the general range of Denmark. We find this fairly repeatedly in different organizations that we’ve had the pleasure of working with around equity. We tend to find that actually we’re doing okay.

Now, there are some possible challenges. Maybe we’re not collecting the data correctly and the like. But the important thing is to be able to make that leap for ourselves of thinking about equity as the predictive model that we wished that we couldn’t build. Because as soon as we do that, we have the entire toolbox of AI, and machine learning, and predictive analytics open up to us so that we can turn that toolbox into a tool for good. Not only is the AI not making disparities worse, we can now leverage that same toolbox to help us understand where might an organization focus.

I’ve highlighted three here because these are ones that are commonly focus areas for equity for organizations, but often on the basis of not really being able to leverage data or the tools that we have in the space. Once we’ve ranked things like this, we can actually take it a step further. If any of you are familiar with the concept of feature importance, what we’ll do next is layer on top, not just how would we rank our ability to deliver equitable care per this measure, but what’s actually contributing to the inequity?

For instance, three features we could consider are geography, race, and age. If we look only at those and you look for the longest red bars, then you can see this measure number 844 has the greatest geographic inequity. If we look at age, we see an inequity around measure 435. These are a little bit silly, but it’s just a way… They are real measures. But it’s just a way where hopefully you can see quickly not just where we have the greatest inequity overall, but what is driving or associated with the inequity.

I’ll zoom in a little because the focus for a particular organization that we have the pleasure of working with was actually around hemoglobin A1c control. Specifically, if you’re familiar with this measure, you can be less than eight, less than nine, so on and so forth. They had selected hemoglobin A1c less than eight as a focus area. Why? Because they had hoped to improve racial disparities. What the data opened up is, yeah, there is opportunity to improve equity within hemoglobin A1c less than eight. However, it was principally being driven by age. Again, this is a value judgment on the basis of the organization.
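One crude way to ask "what is associated with the inequity?" is to compute the equity Gini one feature at a time and rank the results. This is only a stand-in for the feature importance view described above, which may have been produced quite differently; the field names and the eight records are hypothetical and were constructed so that the measure tracks age but not race.

```python
from collections import defaultdict


def auroc(scores, outcomes):
    """Pairwise AUROC; ties between a positive and a negative count half."""
    pos = [s for s, y in zip(scores, outcomes) if y]
    neg = [s for s, y in zip(scores, outcomes) if not y]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def single_feature_gini(records, feature, outcome):
    """Gini of a model that scores people by their group's observed rate."""
    met, seen = defaultdict(int), defaultdict(int)
    for r in records:
        seen[r[feature]] += 1
        met[r[feature]] += r[outcome]
    scores = [met[r[feature]] / seen[r[feature]] for r in records]
    return 2 * auroc(scores, [r[outcome] for r in records]) - 1


def rank_drivers(records, features, outcome):
    """Features sorted from most to least associated with the inequity."""
    ginis = {f: single_feature_gini(records, f, outcome) for f in features}
    return sorted(ginis.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical rows: (age band, race group, met the A1c < 8 measure).
rows = [
    ("younger", "X", 0), ("younger", "Y", 0),
    ("younger", "X", 0), ("younger", "Y", 1),
    ("older", "X", 1), ("older", "Y", 1),
    ("older", "X", 1), ("older", "Y", 0),
]
records = [{"age_band": a, "race": g, "a1c_lt_8": m} for a, g, m in rows]

for feature, value in rank_drivers(records, ["age_band", "race"], "a1c_lt_8"):
    print(feature, value)
```

In this toy data the age band carries all of the signal (Gini of 0.5) and race carries none (Gini of 0), mirroring the kind of finding described for this organization. Whether that is a problem to act on remains a value judgment.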

If their focus was really on racial inequity, then maybe they should look somewhere else and maybe the answer is, “There are no opportunities for racial inequity,” in which case they should feel great or maybe they’re saying, “No, until this gets to zero, we don’t feel satisfied and we’re going to keep moving ahead.” Again, that’s a value judgment, just like whether you want to go to a restaurant and if so, which one?

Let’s take one step further then. It turns out that the measure immediately below hemoglobin A1c control actually had the greatest racial inequity across all measures that this organization had, and that measure had to do with antidepressants. There are several measures around antidepressants. This one in particular was the continuation phase of antidepressants. The machine, the AI, has been able to identify for us not only where we have inequities generally, but where to look given our preferences, given our values, given where we want to focus as an organization. If our focus is on race disparities, then perhaps we should look at antidepressant continuation. What’s really cool is that for this organization, it turned out that they had another focus area. One focus area was improving health and healthcare equity. Another area that we had as a focus was, let’s improve mental health and wellness.

Now, the computer’s actually helped us find the perfect interrelationship that we would always hope to be able to find within an organization, especially as leaders or, for that matter, anyone else in the organization who is struggling with initiative overload. Here we’ve actually been able to combine focus areas for two major initiatives, two major aspirations of the organization. We see both that if we want to improve racially driven disparities in care, we can focus on antidepressants, and not just any antidepressants but specifically the continuation phase. If we want to focus on mental health and wellness, we now have a guiding principle to tell us where. Should we focus on inpatient care, outpatient care? Should we focus on schizophrenia? Should we focus on early diagnosis? There are so many ways that we could focus.

Thankfully, in this case, we had the perfect collision or synergy, however you want to think about it, of our aims actually coming together, and doing so not because the AI told us what to do, but because the computer’s really good at crunching through the hundreds of different measures, the hundreds of different dimensions that we might be able to consider, and bringing to the surface and visualizing and making transparent to us all what it’s seeing, not so that we defer judgment, but actually so that our judgment can become even more important.

Our value system becomes even more important because we’re able to have those discussions transparently and leveraging the data assets that we have tried so hard to get our members, and our clinicians, and everybody else to donate for our benefit. So, just to review some of the topics that we’ve talked about and then if you have any questions, try to address some of them. We know AI gets a boatload of attention at the point of care and service. It’s super sexy, everyone’s excited. I hope that by the end of this, you can see some opportunities for how it is that we may be able to leverage those same tools in the space of leadership. Where as an organization should we focus? How should we allocate resources? What’s reasonable when we set up goals for accountability and the like? Because it turns out that many of the tools that we know and love at the point of care and service actually apply just as well at the leadership level, although admittedly through a very different lens.

If you’re in the leadership position, what you can do is to help support attempts to leverage the data in the kinds of ways that we’ve been talking about and support your analytic and technical staff, trying to figure out how we close the very real gaps today. This is not easy. We can easily represent a readmission risk within an electronic health record. We can easily represent the willingness or the predicted probability of a member being willing to quit smoking in a care management or population health tool. It’s difficult for us today to be able to represent some of the ideas and the techniques that we’ve been talking about within our standard business intelligence tools. But we’re not going to get there until leaders get behind the effort and start pushing us and the industry in that direction.

With that, we’re done. But Sarah, you have at least one other question.

Sarah Stokes:                I do. I’ll go ahead and launch that right now. Okay. This is our final poll question before we dive into the Q&A time. While today’s webinar was focused on the important role that AI can play in leadership decision making, some of you may want to learn about the work that Health Catalyst is doing in this space, or maybe you’d like to learn about our other products and professional services.

If you would like to learn more, please answer this poll question. I’m going to go ahead and leave that open as we dive into these questions. We have about five minutes right now. Are you able to stay on for five minutes extra if we need to?

Jason Jones:                  Sure if there are questions.

Sarah Stokes:                Okay. So we’re going to-

Jason Jones:                  Can we actually go to that last one that you were just on, the A1c?

Sarah Stokes:                Last one from Paul.

Jason Jones:                  Yeah, go ahead.

Sarah Stokes:                I was just saying it was from Paul. Paul’s question was for A1c. Is the age disparity a contributor for the younger or older population?

Jason Jones:                  Paul, that is such a terrific question. So many people that I have spoken with assume, and this is true for hypertension as well, that the challenge is in the older populations, for instance, in A1c and hypertension. What we’ve seen routinely at different organizations is it’s actually the older population that achieves the highest levels of control there, and there are all kinds of hypotheses why. Other things that we’ve seen, although you didn’t ask this directly, is that it’s actually different across different race and ethnic groups. We want to look at not just how is age contributing, it turns out, but what is it that, for instance, we may be able to see in the attainment of hemoglobin A1c control in younger Hispanic populations at some organizations and apply that, for instance, in whites. It’s a wonderful question and really great to do a deeper dive.

Sarah Stokes:                All right, maybe we’ll move on to this next question that just came in from Christina because it seems to also be about the HbA1c. Christina asks on the slide and discussion about the HbA1c and anti D. This appears that you’re correlating HbA1c people with those that have anti D. Am I even saying that properly?

Jason Jones:                  Yeah. Antidepressants.

Sarah Stokes:                Okay. But isn’t it possible the populations do not overlap directly? So, how can you deduce that this elevates a single population? In other words, a person A may have a high HbA1c but not be depressed and require antidepressant medicine, wouldn’t one have to do a second level of analysis to form that deduction?

Jason Jones:                  Yeah.

Sarah Stokes:                She said, “Perhaps I missed a step in the analysis.”

Jason Jones:                  Yeah. You know what? Christina, that is a terrific question because what it highlights is that I was not at all clear in my description, so thank you. When we’re looking at the Gini Index here, we’re actually not assuming any overlap, although we can look at overlap across the populations, just like we weren’t assuming overlap between the countries of South Africa and the Ukraine or Canada. The nice thing is that we’re able to look at each of these populations and each of these measures, either independently or in conjunction.

Let me give you a specific example of why that matters so much. At one organization that we looked at, their greatest opportunity was actually in long-term high-dose morphine equivalent or opiate utilization. It was driven principally by age, to the hemoglobin A1c question earlier. It was actually the older people who were struggling more with long-term higher-dose opiates, but the organization wasn’t quite sure what to do with that, not because they didn’t have any hypotheses, but because the population was so small. They actually decided to focus their efforts disproportionately on larger populations like hypertension, at least in the short term, because they saw there was also opportunity there and the ability to test what might work, whereas the opiate population was so small that they weren’t sure how to design tests to achieve improvements. Hopefully, that helps. If I’ve not answered your question, please just put back in your question and I’ll try to address it in a better way. Thank you.

Sarah Stokes:                Okay. I’m going to circle back to this question from Bell to see if we fully addressed it. She had asked, does or can this AI take cultural competency into account when making recommendations to overcome any racial bias introduced by zip codes or other metrics like them?

Jason Jones:                  Bell, if you’re still on, I would love to know if you feel that we’ve started to address this question, but I think we’ve not actually. We’ve seen how AI may be able to be used for the benefit of achieving equity. But what we’ve not directly addressed is whether, within the context of, for instance, achieving hemoglobin A1c control or reducing readmission or something like that, we’re exacerbating bias or disparities. The Science article that I had brought up earlier has a nice description about how it is that we can think about assessing for bias. Then what we can do is to start to design not just our predictive models but our interventions in such a way that we test for how it is that we can optimize for equity.

What might be a specific example of that? One that I’ve worked on is trying to consider when is the best time to do population health outreach and what types of languages should we take into account for that? We can design a predictive model that helps us understand which of our members or patients might be amenable to the intervention, but then explicitly look at that through the lens of race or primary spoken language or geography or something else and actually implement that as part of our optimization plan. That’s a wonderful way to be able, I hope, to more directly address the concern, the very real concern, that you raised. If that helps, please let me know. If I missed your point, please let me know that too.

Sarah Stokes:                Okay. I think we’re just going to do one or two more as we are past the top of the hour now. Dan had asked earlier, does the Gini or Gini-

Jason Jones:                  Gini.

Sarah Stokes:                … Gini Index measure wealth inequality or income inequality? He says he could see that being an important distinction.

Jason Jones:                  Yeah, it is an important distinction and it’s been used both ways. I’ll just point out, there’s a… I didn’t put it here, but there’s a wonderful literature in the education space around how they have used the Gini Index to some effect as well. Just to be clear, Gini is a general mathematical concept that essentially has to do with how equally something is dispersed in a population. It can be applied in many places. It’s most commonly applied in the econometrics space, but we’ve seen great work done in education. I’m hoping we’ll see more in health and healthcare as well. Thank you for that question, Dan.

Sarah Stokes:                Okay. I think we’re going to call this our last question from Simon who said, “Some researchers use the P percent rule to account for fairness. Could you comment on the validity of that metric?”

Jason Jones:                  Oh, I would need a little more clarification on this one. Simon, if you’re still on, maybe you can say a little bit more or we’re also happy to field questions via email and then post those. So, happy to do that. But that may be the only way that I can address the question. I’m sorry.

Sarah Stokes:                Okay. I think we’re going to go ahead and wrap up on that then. Is that all right?

Jason Jones:                  Yeah. Thank you.