Seven Ways DOS™ Simplifies the Complexities of Healthcare IT


Healthcare leaders need to motivate change and constantly advocate for innovation, particularly in the IT ecosystem. I personally approach this from a position of humility, fully understanding that I have not mastered what I am advocating. I’m not satisfied as a patient or as a citizen with the current trajectory of digital health at the macro level. Nor is Health Catalyst satisfied with where it is on this trajectory. We are far from perfect, with plenty of flaws, which is why we are intentionally disrupting ourselves with the idea of the Health Catalyst® Data Operating System (DOS). I will offer some criticisms about healthcare IT, with the understanding that it is from the position of wanting to do better, while being personally and organizationally accountable.

The idea of DOS is as important for C-Suite executives as it is for IT-domain leaders. Software runs everything today, and executives need to understand these technical topics because the most expensive capital purchase won’t be a hospital, but an EHR. Recent ransomware attacks on health systems are a case in point. If leadership can’t own up to the notion that software now runs the company, for better or worse, it is behind the times. The healthcare CEOs who thrive going forward will understand their software technology and data. They will be the leaders who rise to the top in the next generation of U.S. healthcare.

The Components of DOS

What is the Health Catalyst DOS and why is it different from traditional data warehousing? Is it real or just a buzz phrase? Can it be implemented? If so, what are the implementation options? And why does healthcare need one now more than ever?

As Figure 1 illustrates, DOS combines real-time, granular data and domain-specific (e.g., healthcare), reusable analytic and computational logic about that data, into a single computing ecosystem for developing applications. DOS can support the real-time processing and movement of data from point to point, as well as batch-oriented loading and computational analytic processing on that data. This amounts to the merging of a data warehouse and an HIE.

Figure 1: The Health Catalyst Data Operating System

This DOS involves three layers, each supported by multiple components:

Layer #1: Data Platform

Catalyst Analytics Platform – Batch-oriented, less-than-real-time analytics calculations and computational services.

Core Data Services – Pattern recognition, Natural Language Processing (NLP), governance tools, metadata repositories, and data quality tools across subject areas.

Real-time Data Services – Real-time data streaming and processing, a reference Lambda Architecture (the ability to process real-time and batch-oriented data for analytics in the same ecosystem), and HL7. An emerging improvement on Lambda is Kappa Architecture, which handles both real-time and batch-oriented analytics in the same environment from a single incoming data stream and a single code set.

Layer #2: Fabric and Machine Learning

The fabric is the layer of clinical and business logic and services laid on top of the granular data and services. This layer includes open application program interfaces (APIs), with an emphasis on developing FHIR-based services, where possible. Where it’s not published or possible, Health Catalyst will extend FHIR or pursue other means, but FHIR will be the default for services in the fabric layer.
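As a rough illustration of the kind of payload a FHIR-based service exchanges, the sketch below parses a minimal FHIR Patient resource in Python. The fields used (resourceType, name, birthDate) follow the published FHIR Patient structure. The sample values and the helper name display_name are invented for illustration and do not represent any Health Catalyst API.

```python
import json

# A minimal FHIR Patient resource. The sample values are invented.
patient_json = """
{
  "resourceType": "Patient",
  "id": "example-1",
  "name": [{"family": "Rivera", "given": ["Ana", "Maria"]}],
  "birthDate": "1980-04-12"
}
"""


def display_name(resource: dict) -> str:
    """Return 'Given Family' from the first name entry of a Patient resource."""
    if resource.get("resourceType") != "Patient":
        raise ValueError("expected a Patient resource")
    names = resource.get("name", [])
    if not names:
        return ""
    first = names[0]
    parts = first.get("given", []) + [first.get("family", "")]
    return " ".join(p for p in parts if p)


patient = json.loads(patient_json)
print(display_name(patient), patient["birthDate"])
```

Because the resource is plain JSON with a predictable shape, the same few lines work against any system that exposes Patient resources this way, which is the appeal of making FHIR the default for fabric services.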

Also, a native part of the fabric is the notion of machine learning, which is embedded as part of virtually everything Health Catalyst does.

Layer #3: Applications

The top layer of DOS consists of applications built by Health Catalyst, hospital and clinic IT development teams, and third parties.

These layers comprise the high-level DOS architecture, but a key point is that Health Catalyst is developing the fabric to lay over a variety of topologies and data platforms, including platforms other than what Health Catalyst produces. The fabric will be compatible with, for example, IBM, Oracle, Epic, Cerner, and homegrown data warehouses. A Health Catalyst granular data platform is only one option and an important part of the future.

There are three big differences between DOS and a traditional data warehouse:

  1. Real-time transaction data and analytic computations in a single ecosystem that supports everything.
  2. A fabric of microservices and data bindings that can lie on top of any data system, not just the Health Catalyst platform.
  3. System design that uses open APIs, making the data platform and fabric services available to third-party application developers and allowing for future extensibility.

Seven Attributes of DOS

The healthcare DOS is defined by seven attributes:

  1. Reusable clinical and business logic – Registries, value sets, and other data logic lie on top of the raw data, where they can be accessed, reused, and updated through open APIs in the healthcare IT environment, specifically enabling third-party application development against them.
  2. Streaming data – Near- or real-time data streaming from the source all the way through to the expression of that data through DOS, that can support transaction-level exchange of data or analytics processing.
  3. Integrated structured and unstructured (text) data in the same environment – This will eventually incorporate images.
  4. Closed loop capability – Methods for expressing knowledge in DOS, including the ability to deliver that knowledge at the point of decision making (e.g., back into the workflow of source systems, such as an EHR).
  5. Microservices architecture – The ability to update constantly, with continuous development and release. This eliminates the painful upgrades to which healthcare has become accustomed and desensitized. In addition to abstracted data logic, open microservices APIs exist for DOS operations, such as authorization, identity management, data pipeline management, and DevOps telemetry. These microservices also enable third parties to develop applications on DOS without having to recreate them.
  6. Machine learning – DOS natively runs machine learning models and enables rapid development and utilization of those models, embedded in all applications. This is a primary strength of the big data Hadoop ecosystem that came out of Silicon Valley. It is natively designed to support machine learning and computational analytics that traditional relational databases cannot.
  7. Agnostic data lake – Some or all of DOS can be deployed over the top of any healthcare data lake. The reusable forms of logic must support different computation engines (e.g., SQL, Spark SQL, SQL on Hadoop).

These are the required attributes for becoming a healthcare DOS and meeting the future needs of data in this industry.
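To make the first and seventh attributes more concrete, here is a minimal sketch of logic that is defined once and reused across computation engines. It assumes a value set is just a named list of codes. The table name diagnosis, the column names, and the ValueSet class are hypothetical. E10.9 and E11.9 are ICD-10-CM diabetes codes used only as sample content.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ValueSet:
    """A named, reusable list of codes (hypothetical structure)."""
    name: str
    codes: tuple


# Sample content only: two ICD-10-CM diabetes codes.
DIABETES = ValueSet("diabetes", ("E10.9", "E11.9"))


def to_sql(value_set, table="diagnosis", column="icd10_code"):
    """Render the registry logic as plain SQL text for a SQL-speaking engine."""
    quoted = ", ".join("'" + c.replace("'", "''") + "'" for c in value_set.codes)
    return f"SELECT DISTINCT patient_id FROM {table} WHERE {column} IN ({quoted})"


def in_memory(value_set, rows):
    """Apply the same registry logic to rows already loaded in Python."""
    return sorted({r["patient_id"] for r in rows if r["icd10_code"] in value_set.codes})


rows = [
    {"patient_id": 1, "icd10_code": "E11.9"},
    {"patient_id": 2, "icd10_code": "I10"},
    {"patient_id": 3, "icd10_code": "E10.9"},
]
print(to_sql(DIABETES))
print(in_memory(DIABETES, rows))
```

The point of the design is that the value set is data, not code buried in one report, so updating it in one place changes every consumer, whichever engine runs the query.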

Why DOS Is Important in Healthcare Right Now

A convergence of things happening around the country (Figure 2) is driving the business need for DOS.

Figure 2: What’s driving the need for DOS in healthcare

New big data technology has emerged subsequent to open source collaboration in Silicon Valley. We are fortunate to be here at this point in history to take advantage of what Facebook, Google, Amazon, Twitter, and others have developed for the public to consume free of charge. The value we can derive from these services is significant.

Healthcare expenses continue to rise and are anticipated to hit 20 percent of GDP by 2025. This is cannibalizing the U.S. economy, and if healthcare cannot change the trajectory through digitization, it spells trouble. We are eating up the future of the country in healthcare expenses.

Physicians are burned out in large part because of the technology they are now using. It’s taking time away from clinical care and human decision making. More than 50 percent of their time is spent in front of a computer instead of a patient. This has to change. Personal health records (PHRs) have been unsuccessful on several fronts, interoperability being one. It’s also in no one’s economic interest to surrender data to a PHR that’s transportable from one facility to another. Until PHRs are successful, patients can never really be at the center of care.

FHIR is emerging and we should be very optimistic about this. There have been concerns with HL7 and its message-oriented architecture in the past, but credit is due to the rebels within HL7 who started FHIR. It’s a very well-founded framework.

HIEs have largely been unproductive. When arguing from the position of data or logic, HIEs have been unsuccessful on many levels. Economic models and technical usability of the data within the EHR have not worked, and it’s time to do something different.

The good news is that EHRs have been deployed like never before, so the digitization process is starting. Data is available now that wasn’t in the past. The problem is that those EHRs are still not well received. 54 percent of doctors surveyed in 2016 said the EHR has detracted from efficiency, and only 25 percent said it has improved efficiency. There is a lot of opportunity for working with doctors to improve EHRs.

Content Is King, the Network Is Kong

When looking at modern businesses, data content is becoming the driving force behind business strategy and value. Companies like GE, Tesla, Google, Facebook, Amazon, United Healthcare, and Optum, all understand the value of data content and are pursuing it. But the network around that data is as important as the data content itself.

Consider Metcalfe’s Law—the value of a telecommunications network is proportional to the square of the number of connected users of the system—to understand the value of the community around data, versus the hub and spoke model. Sticky relationships occur with great data content and a network of people around that content. The reason Google Plus never took off (and yet Facebook is still accelerating) is the combined content and network of people who make that sticky relationship with Facebook difficult or impossible to transport to Google Plus. The executive of the future must understand the need for data content and a network of people—patients, healthcare providers, physicians, and researchers—around that data. This will create the sticky relationships successful businesses need going forward.
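The arithmetic behind Metcalfe’s Law can be sketched in a few lines. This assumes the simple n-squared form of the law with an arbitrary constant k, and it counts connections in a fully connected network versus a hub-and-spoke network in which each user connects only to a central hub. The function names are illustrative.

```python
def metcalfe_value(users: int, k: float = 1.0) -> float:
    """Network value under Metcalfe's Law: proportional to users squared."""
    return k * users ** 2


def mesh_links(users: int) -> int:
    """Distinct user-to-user connections when everyone can reach everyone."""
    return users * (users - 1) // 2


def hub_and_spoke_links(users: int) -> int:
    """Connections when each user is linked only to a central hub."""
    return users


for n in (10, 100, 1000):
    print(n, metcalfe_value(n), mesh_links(n), hub_and_spoke_links(n))
```

The gap between the two link counts grows quickly as users are added, which is the quantitative version of the argument for building a community around the data.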

The Healthcare Digitization Index

The McKinsey Global Institute produces a Healthcare Digitization Index (Figure 3) that is a product of data assets, data usage, and skilled labor. Essentially, this translates to what kind of data an industry has, how the industry is using it, and whether the industry has the skilled labor to take advantage of that data. Healthcare is one of the least digitized sectors among large U.S. industries. The graph in Figure 3 plots this low digitization score against the y-axis of three-year changes in post-tax profit margin to show that healthcare is extremely anemic.

Figure 3: The Healthcare Digitization Index

There’s a strong correlation between digitization index and post-tax profit margin. As margins get tighter to manage, executives need to understand the importance of digitization to retain whatever competitive edge they might still have.

C-Level Advice for a Digital Healthcare Future

Population health, value-based care, and precision medicine are data centric, so executives need a strategic data acquisition strategy that goes beyond bricks and mortar. It’s imperative to think about the data needed for managing population health, risk contracting, and precision medicine, and how it will be acquired.

Healthcare organizations need a chief analytics or chief data officer. This is critically important. And will this be the CIO or a new position? Regardless, someone must be appointed to fill the role to manage, and be the executive cheerleader for, this critical asset.

Physicians and nurses are overmeasured and undervalued, and this is in large part because they are controlled by data entry and poor software. C-Suites should push all vendors to follow modern, open software APIs, including, but not limited to, FHIR. This cannot be relegated to others, which would minimize its importance in the organization. C-Suites need to be aware of the impact software has on the business and of the capabilities of these open APIs. The DOS concept is necessary and can be created by leveraging and expanding the capability of the enterprise data warehouse, if one exists.

How DOS Addresses Healthcare System Needs

DOS addresses seven substantial U.S. healthcare system needs:

#1. Complexities of Healthcare Data Management & Acquisition

The first need starts with a shark tank story from a business perspective. I was in the audience as healthcare IT startups pitched great software applications and creative ideas about healthcare. As brilliant as they were, none offered a solution for the underlying healthcare data they needed. All had decent demo data, but no answer for the massive acquisition of data and the scalability of that acquisition across an entire industry. Nor did they have an answer for the clinical and business logic that resides on top of that data. Startups like this, with great ideas and applications, need data. They cannot possibly afford to build the data infrastructure and skills Health Catalyst offers. Nor can the industry afford it. It is not scalable.

Computer science has greatly expanded modern programming languages at the top of its ecosystem (Figure 4). There are many things we can now build quickly with different libraries and awesome programming environments. This goes beyond the languages, to the DevOps tools that support the languages, giving us the ability to manage and measure applications once they’re in the field.

Figure 4: The data content layer needs to be updated in the computer science ecosystem

Modern databases and the modern movement of technology exist thanks to Hadoop and the big data Apache ecosystem. This all sits on top of modern operating systems, like iOS, Android, Windows, and Linux.

But application development still needs a solution for the middle layer, the raw data content that’s bound and organized according to the domain it needs to support. Great programmers take advantage of great languages (at the top of the ecosystem) and technology (at the bottom), but it’s still painful for them to recreate the data content and organized logic around that content that exists in healthcare. This would require building a dozen or more Health Catalyst-type organizations in the industry. It’s unscalable.

#2. Integrating Data in Mergers and Acquisitions

A new company isn’t integrated until the data is integrated. Executives jump into mergers and acquisitions and, within a few months, realize they can’t pull together basic financial reports about the new company, much less complicated clinical quality measures that put their reimbursement at risk. HIEs are not sufficient for this kind of data integration. They barely support rudimentary clinical integrations, much less balancing a new company’s general ledger. Ripping and replacing EHRs with a single common vendor is not an affordable strategy for interoperability. Besides, hybrid vigor is a good thing in this context. It’s not a good idea, long term, to put all the organization’s digital and data eggs in one vendor’s platform.

Rip and replace is not an answer for mergers and acquisitions (M&A). Keep the existing, disparate source systems, like finance, supply chain, registration, scheduling, A/R, and EHRs. These are just a few of many source systems to deal with in an M&A. But they can all be virtually integrated with DOS, and transaction-level data can be shared the same as with an HIE. DOS can integrate data for common metrics around finance, clinical quality, and utilization, without replacing those source systems.

#3. Enabling a Personal Health Record

Healthcare needs to finally enable a PHR. A patient could have multiple records depending on how many places she has lived and her various care environments. It’s up to her to figure out how to consolidate all that data into a concise PHR that she can move around and share as she feels appropriate. This is not putting the patient at the center of healthcare.

With DOS, a fabric can lie over all these disparate systems and pull them together. Healthcare should think about pulling these systems into a single, effective PHR, such as Microsoft’s HealthVault. Regardless, we can provide a better PHR than current offerings. Patients are not willing to manually enter and consolidate all their personal health data into one repository. They don’t have time. That content has to be seeded by the data that exists in the different facilities and treatment areas.

#4. Scaling Existing, Homegrown Data Warehouses

Homegrown data warehouses are easy to start and build, but expensive to evolve and maintain. There are a lot of them in healthcare and there is no easy way to retire them. Ripping and replacing with another vendor solution isn’t an option, as mentioned earlier.

This is what motivated Health Catalyst to develop DOS. We can lay Health Catalyst (and other) applications over DOS fabric, which can then be laid over the top of homegrown data warehouses. We should expand this market because the value to the industry isn’t necessarily in the aggregation of granular data (this is quickly becoming a commodity). The value is in the logic that resides on top of the fabric and applications, and the models that reside on top of the granular data.

#5. The Human Health Data Ecosystem

When I was working on the Alberta Health Services population health initiative, we concluded that only eight percent of the data needed for precision medicine and population health resides in today’s EHRs. Even less data is available for healthy patients. Figure 5 shows the datasets that come from other sources. The ability to have a scalable platform for ingesting data and a scalable fabric on top of that is only going to get more challenging if this is not addressed. We will never achieve precision medicine and population health without something like DOS.

Figure 5: Only eight percent of data needed for precision medicine and population health resides in EHRs

Ingesting healthcare data is a commodity

Ingesting healthcare data into a data lake or data warehouse is now essentially a commodity, thanks to open source technology and a late-binding, schema-on-read approach to data models. It’s fast and cheap to ingest data, but understanding the data content, data models, and vastly complicated nuances of healthcare data will not be commoditized in our lifetime. The analytic logic or data bindings that apply to that data, how to organize it into meaningful chunks, and the technology and skills to deliver to the right person at the right time, in the right modality, all contribute to this complexity.
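A minimal sketch of the late-binding, schema-on-read idea: records are stored exactly as received, and a binding is applied only when the data is read. The source names lab_a and lab_b, the field names, and the BINDINGS table are all hypothetical and stand in for the far messier reality described above.

```python
import json

# Raw landing zone: records kept exactly as received, with no upfront schema.
raw_store = [
    '{"src": "lab_a", "pt": "123", "test": "HbA1c", "val": "7.2"}',
    '{"src": "lab_b", "patient_id": 456, "loinc_name": "HbA1c", "result": 6.1}',
]

# Bindings applied at read time (late binding). Field names are hypothetical.
BINDINGS = {
    "lab_a": {"patient_id": "pt", "test": "test", "value": "val"},
    "lab_b": {"patient_id": "patient_id", "test": "loinc_name", "value": "result"},
}


def read_bound(raw_line: str) -> dict:
    """Parse one raw record and map it to a common shape when it is read."""
    rec = json.loads(raw_line)
    binding = BINDINGS[rec["src"]]
    return {
        "patient_id": str(rec[binding["patient_id"]]),
        "test": rec[binding["test"]],
        "value": float(rec[binding["value"]]),
    }


results = [read_bound(line) for line in raw_store]
print(results)
```

Storing the raw lines is the cheap part. The BINDINGS table is where the healthcare knowledge lives, and in practice it is the part that takes expertise to build and maintain.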

The mundane process of keeping up with changes in the source system data—change data capture—is enormously complicated. Data quality management and scaling all this for a single healthcare system is not going to become a commodity—ingesting data is.
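One simple way to approach change data capture is to compare successive snapshots of a source table. The sketch below assumes each snapshot is a dictionary keyed by primary key and uses a content hash to detect updates. It is only one of several approaches (log-based capture is another), and the function names and sample encounter data are illustrative.

```python
import hashlib
import json


def row_hash(row: dict) -> str:
    """Stable fingerprint of a row's content."""
    payload = json.dumps(row, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def capture_changes(previous: dict, current: dict) -> dict:
    """Compare two snapshots keyed by primary key and classify the differences."""
    inserts = sorted(k for k in current if k not in previous)
    deletes = sorted(k for k in previous if k not in current)
    updates = sorted(
        k for k in current
        if k in previous and row_hash(current[k]) != row_hash(previous[k])
    )
    return {"insert": inserts, "update": updates, "delete": deletes}


yesterday = {
    "enc-1": {"patient_id": "123", "status": "admitted"},
    "enc-2": {"patient_id": "456", "status": "discharged"},
}
today = {
    "enc-1": {"patient_id": "123", "status": "discharged"},
    "enc-3": {"patient_id": "789", "status": "admitted"},
}
changes = capture_changes(yesterday, today)
print(changes)
```

Even this toy version hints at the difficulty: it needs a reliable primary key, full snapshots, and agreement on what counts as a change, none of which can be taken for granted across real source systems.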

Data content and sources

The volume of data content and sources in the Health Catalyst library illustrates how impossible it would be for the industry to scale them if left up to individuals to do on their own. Health Catalyst has a long list of different data source systems, which is just the beginning of the healthcare data ecosystem. With every data source, we understand data models, and how to access, bind, and use the data across the continuum of care.

EMR data sources
  1. Affinity – ADT/Registration
  2. Allscripts – Ambulatory EMR Clinicals
  3. Allscripts Enterprise/Touchworks – Ambulatory EMR
  4. Allscripts Sunrise – Acute EMR Clinicals
  5. Aprima EMR
  6. Cerner – Acute EMR Clinicals
  7. Cerner – PowerWorks Ambulatory EMR
  8. Cerner HomeWorks – Other
  9. CPSI – Acute EMR Clinicals
  10. eClinicalWorks – Ambulatory EMR Clinicals
  11. Epic – Acute EMR Clinicals
  12. Epic – Ambulatory EMR Clinicals
  13. GE (IDX) Centricity – Ambulatory EMR Clinicals
  14. McKesson Horizon – Acute EMR Clinicals
  15. McKesson Horizon Enterprise Visibility
  16. Meditech 5.66 EHR w/DR
  17. NextGen – Ambulatory Practice Management
  18. Quality Systems (Next Gen) – Ambulatory EMR Clinicals
  19. Siemens Soarian Clinicals – Inpatient EMR
HR/ERP data sources
  1. API Healthcare – Time and Attendance
  2. iCIMS
  3. Kronos – HR
  4. Kronos – Time and Attendance
  5. Lawson – HR
  6. Lawson – Payroll
  7. Lawson – Time and Attendance
  8. Maestro
  9. MD People
  10. Now Solutions Empath – HR
  11. Oracle (PeopleSoft) – HR
  12. PeopleStrategy/Genesys – HR
  13. PeopleStrategy/Genesys – Payroll
  14. Ultimate Software Ultipro – HR
  15. WorkDay
Claims data sources
  1. 835 – Denials
  2. Adirondack ACO Medicare
  3. Aetna – Claims
  4. Anthem – Claims
  5. Aon Hewitt – Claims
  6. BCBS Illinois
  7. BCBS Vermont
  8. Children’s Community Health Plan (CCHP) – Payer
  9. Cigna – Claims
  10. CIT Custom – Claims
  11. Cone Health Employee Plan (United Medicare) – Claims
  12. Discharge Abstract Data (DAD)
  13. Hawaii Medical Service Association (HMSA) – Claims
  14. HealthNet – Claims
  15. Healthscope
  16. Humana (PPO) – Claims
  17. Humana MA – Claims
  18. Kentucky Hospital Association (KHA) – Claims
  19. Medicaid – Claims
  20. Medicaid – Claims – CCO
  21. Merit Cigna – Claims
  22. Merit SelectHealth – Claims
  23. MSSP (CMS) – Claims
  24. NextGen (CMS) – Claims
  25. Ohio Hospital Association (OHA) – Claims
  26. ProHealth – Claims
  27. PWHP Custom – Claims
  28. QXNT – Claims
  29. UMR Claims Source
  30. Wisconsin Health Information Organization (WHIO) – Claims
Clinical specialty data sources
  1. Allscripts – Case Management
  2. Apollo – Lumed X Surgical System
  3. Aspire – Cardiovascular Registry
  4. Carestream – Other
  5. Cerner – Laboratory
  6. eClinicalWorks – Mountain Kidney Data Extracts
  7. GE (IDX) Centricity Muse – Cardiology
  8. HST Pathways – Other
  9. ImageTrend
  10. ImmTrac
  11. Lancet Trauma Registry
  12. MacLab (CathLab)
  13. MIDAS – Infection Surveillance
  14. MIDAS – Other
  15. MIDAS – Risk Management
  16. Navitus – Pharmacy
  17. NHSN
  18. NSQIPFlatFile
  19. OBIX – Perinatal
  20. OnCore CTMS
  21. Orchard Software Harvest – Pathology
  22. PACSHealth – Radiology
  23. Pharmacy Benefits Manager
  24. PICIS (OPTUM) Perioperative Suite
  25. Provation
  26. Quadramed Patient Acuity Classification System – Other
  27. QXNT/Vital – Member
  28. RLSolutions
  29. SafeTrace
  30. Siemens RIS – Radiology
  31. SIS Surgical Services
  32. StatusScope – Clinical Decisions
  33. Sunquest – Laboratory
  34. Sunrise Clinical Manager
  35. Surgical Information System
  36. TheraDoc
  37. TransChart – Other
  38. Varian Aria – Oncology
  39. Vigilanz – Infection Control
HIE data sources
  1. Adirondack ACO Clinical Data from HIXNY (HIE)
  2. ADT HIE Patient Programs
  3. Vermont HIE
Patient satisfaction data sources
  1. Fazzi – Patient Satisfaction
  2. HealthStream – Patient Satisfaction
  3. NRC Picker – Patient Satisfaction
  4. PRC – Patient Satisfaction
  5. Press Ganey – Patient Satisfaction
  6. Sullivan Luallin – Patient Satisfaction
Master reference and terminology data content
  1. AHRQ Clinical Classification Software (CCS)
  2. Charlson Deyo and Elixhauser Comorbidity
  3. Clinical Improvement Grouper (Care Process Hierarchy)
  4. CMS Hierarchical Condition Category
  5. CMS Place Of Service
  6. LOINC
  7. National Drug Codes (NDC)
  8. NPI Registry
  9. Provider Taxonomy
  10. RxNorm
  11. CMS/NQF Value Set Authority Center
Other healthcare data sources
  1. 2010 US Census Detail for State of Colorado
  2. Affiliate Provider Database
  3. All Payer All Claims (certain States) —In process UT, CO, MA
  4. Alliance Decision Support
  5. Allscripts – Ambulatory Practice Management
  6. Allscripts – Patient Flow
  7. Allscripts EHRQIS – Quality
  8. Avaya
  9. Axis (MDX)
  10. Bed Ready – Other
  11. Cerner Signature
  12. CMS Standard Analytical Files
  13. Daptiv
  14. Echo Credentialing – Provider Management
  15. ePIMS
  16. First Click-Wellness
  17. FlightLink
  18. GE (IDX) Centricity – Practice Management
  19. HCUP (NRD, NIS, NED Sample sets)
  20. Health Trac
  21. HealtheIntent
  22. Hyperion
  23. InitiateEMPI
  24. Innotas
  25. IVR Outreach Detail
  26. MIDAS – Credentialing Module
  27. Morrisey Medical Staff Office for Web (MSOW)
  28. National Ambulatory Care Reporting System (NACRS)
  29. Nextgate EMPI
  30. Onbase
  31. PHC Legacy EDW
  32. QXNT/Cactus – Provider
  33. SMS Legacy – Other
  34. Truven Quality
  35. University HealthSystem Consortium – Clinical and Operational Resource Database
  36. University HealthSystem Consortium – Regulatory

This is all the data we have in the U.S. healthcare ecosystem today and we have barely started. Imagine what the future data ecosystem looks like. We must create a more scalable way of ingesting that data, organizing it, and delivering it back to the point of decision making. Neither traditional data warehousing nor what is emerging from the EHRs will scale to this volume and variety of data sources.

#6. Providers Becoming Payers

The insurance industry is the tail wagging the healthcare dog. The current payer insurance economic model isn’t working. To improve the situation, providers need to model an assumed financial risk and compete with, or completely disintermediate, insurance companies. With DOS, providers have more and better data to model and manage risk than insurers. This is the hybrid we need in the future: providers becoming payers to change the situation that’s so unhealthy for the industry.

#7. Extend the Life and Current Value of EHR Investments

DOS can extend the life and value of current EHR investments. Initially, the expectations of EHRs were high. We haven’t quite reached the trough of those expectations yet (Figure 6), but as we start to optimize EHRs and try to make them work in different revenue cycles, with population health and different reimbursement models, reality will eventually settle in.

Figure 6: The expectation of EHRs over time

With open APIs and DOS, we can reduce this trough’s depth and increase (and achieve) the expectations that we set for EHRs a while back. EHR vendors need to participate in the development of DOS and open APIs to make their products better.

Dr. Robert Pearl, CEO of the Permanente Medical Group, said that healthcare is using “information technology from the last century.” This is a big statement from an executive who leads 9,000 physicians and 34,000 staffers at one of the more impressive healthcare systems in the world.

The inevitable technology lifecycle impacts the demand for EHRs (Figure 7). We’ve spent more than $36 billion on EHR Incentive Program payments. Federal incentives artificially stretched demand, but that has passed. The underlying software and database technologies of EHRs were commoditized long ago. The demand curve portends trouble for EHRs in the near future.

Figure 7: Lifecycle vs. demand for technology products

Nobody has the appetite to replace all the outdated, last-century technology, but with DOS and open APIs we can change the trajectory. This pivot toward extending and reinventing products needs to start while in the comfort zone of the maturity phase. This is where Health Catalyst is now. We don’t want to wait to be disrupted by someone else, so we’re going to disrupt ourselves. We can improve this curve for the EHR vendors to the betterment of the industry.

Collaboration Role Models

Vendor collaboration from Facebook, Google, Amazon, Microsoft, and Twitter is a role model for healthcare. The evidence is very clear that healthcare has a long way to go toward achieving this kind of partnership. Some EHR vendor app stores appear to support open APIs, like FHIR, but by contract, any application submitted allows the vendor to take the intellectual property and profit from it. We need to collaborate on standardization and compete on innovation, which is exactly what the vendors in Silicon Valley are doing. They know that, at the end of the day, innovation, not mundane standards, achieves significant advancements.

The Rapid Pace of Change

The list of relevant open-source technology products available from Silicon Valley (Figure 8) changes literally every week. This is an example of what can be done when the focus is on collaboration, followed by innovation around that collaboration and standardization.

Figure 8: The fleeting matrix of open-source technology products

The same thing applies to software development tools. We are at the beginning of a software technology renaissance and healthcare has to take advantage of it. We must put pressure on ourselves to do better.

The Possibilities of Open, Standard Software APIs

“EHRs would become commodity components in a larger platform that would include other transactional systems and data warehouses running myriad apps, and apps could have access to diverse sources of shared data beyond a single health system’s records.” This statement about open, standard APIs may sound threatening to EHR vendors, but only if they don’t participate in what’s happening. They can leverage the concepts in these open APIs to be more competitive and extend their product lifecycle and value.

We can leverage open APIs technologically now more than ever before. I have a deep history of open-system standards evolution. I know the major patterns of success and failure that started back in 1983 with my first exposure to Abstract Syntax Notation One (ASN.1). Now, success can be characterized by things like FHIR and JavaScript Object Notation (JSON), two standards that are indicative of the current renaissance.

When my team and I built the data warehouse at Northwestern Memorial Healthcare, we didn’t call it DOS, but we had what amounted to an early version of it in 2006 (Figure 9).

Figure 9: A prelude to DOS

This shows that DOS is a very tangible concept. We now have the tools, techniques, and more data content than we’ve ever had before, enabling us to build it like never before. In Northwestern’s data warehouse, we supported analytics and near real-time exchanges of single records, and we were pushing data point-to-point before there were HIEs. We had text data and discrete data in a single platform. We ran analytics and batch processing computations on that data. We could also pull up single records and display them in an application (e.g., single lab results served up through the data warehouse). This was all running on an early version of Microsoft SQL Server, which is much better now, with the ability to handle mixed environments. Add the big data technology coming out of Silicon Valley and this concept is easily achievable. DOS is not just a pipe dream. We’re going to do this and it’s not that far away.

Hybrid Big Data-SQL Architecture

In a paper titled “The Data Warehouse DBMS Market’s ‘Big’ Shift,” Gartner analysts Mark Beyer and Roxane Edjlali wrote, “Because traditional data warehouse practices will be outdated by the end of 2018, data warehouse solution architects must evolve toward a broader data management solution for analytics.”

This is why we are disrupting ourselves now. We have the ability to pull data in and take advantage of streaming pipelines through tools like Kafka and Spark (Figure 10).

Figure 10: The Hadoop, big data ecosystem provides options that we never had before, technologically and financially

We can run analytics on that data, run Elasticsearch, populate a SQL or a NoSQL data warehouse, and then push this out through APIs to the EHRs and other source transaction systems. We now have technology around the data that we’ve never had before.

Lambda Architecture

Lambda Architecture (Figure 11) is a conceptual design supported by the big data world that says incoming data can be split into two branches: one for batch computations and one for real-time transactions and computations.

Figure 11: Lambda Architecture

The outputs of both branches are served up to end users through the serving layer. Underlying all of this is storage for the historical data as well as for the computed results.
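The two-branch split can be sketched in a few lines of Python. This is a minimal, in-memory illustration under stated assumptions, not Health Catalyst code: the class name LambdaPipeline, the event fields, and the choice of a simple per-patient count as the "computation" are all hypothetical. In practice the batch branch would typically run on something like Hadoop or Spark and the real-time branch on a streaming engine.

```python
from collections import defaultdict


class LambdaPipeline:
    """Toy Lambda Architecture: one input, a batch branch and a real-time branch."""

    def __init__(self):
        self.master_log = []                    # append-only history (batch branch input)
        self.batch_view = {}                    # results recomputed from the full history
        self.realtime_view = defaultdict(int)   # incremental results since the last batch run

    def ingest(self, event):
        # Every incoming event is sent down both branches.
        self.master_log.append(event)
        self.realtime_view[event["patient_id"]] += 1

    def run_batch(self):
        # Batch branch: recompute the view from all historical data,
        # then reset the real-time view it has now absorbed.
        counts = defaultdict(int)
        for event in self.master_log:
            counts[event["patient_id"]] += 1
        self.batch_view = dict(counts)
        self.realtime_view.clear()

    def query(self, patient_id):
        # Serving layer: merge the batch result with the real-time delta.
        return self.batch_view.get(patient_id, 0) + self.realtime_view.get(patient_id, 0)


if __name__ == "__main__":
    pipeline = LambdaPipeline()
    pipeline.ingest({"patient_id": "p1", "test": "glucose", "value": 98})
    pipeline.ingest({"patient_id": "p1", "test": "glucose", "value": 104})
    pipeline.run_batch()
    pipeline.ingest({"patient_id": "p1", "test": "glucose", "value": 110})
    print(pipeline.query("p1"))  # batch count plus real-time delta
```

The point of the design is that the serving layer can answer a query immediately from the real-time branch while the batch branch remains the authoritative recomputation over all history.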

Kappa Architecture

Kappa Architecture (Figure 12) has some appeal. It also comes out of the big data world, and it uses one incoming data source and one code set for both real-time and batch-oriented analytics.

Figure 12: Kappa Architecture (thanks to Julian Forgeat of Google)
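As a rough sketch of the "one incoming data source, one code set" idea, the Python below keeps a single append-only log and a single processing function. Live events and batch-style recomputation both go through that same function; a batch run is simply a replay of the log from the beginning. The names KappaStream and process and the event fields are hypothetical, and a real deployment would more likely use a durable log such as Apache Kafka rather than a Python list.

```python
def process(state, event):
    """The single code path, used for both live events and replays."""
    totals = state.setdefault(event["patient_id"], {"count": 0, "sum": 0.0})
    totals["count"] += 1
    totals["sum"] += event["value"]
    return state


class KappaStream:
    """Toy Kappa Architecture: one log, one processing function."""

    def __init__(self):
        self.log = []     # the single, append-only source of truth
        self.state = {}   # continuously updated real-time view

    def append(self, event):
        # Real-time path: record the event, then process it immediately.
        self.log.append(event)
        process(self.state, event)

    def replay(self, processor=process):
        # "Batch" path: rebuild a view by replaying the whole log
        # through a processor (the same one by default).
        state = {}
        for event in self.log:
            processor(state, event)
        return state


if __name__ == "__main__":
    stream = KappaStream()
    stream.append({"patient_id": "p1", "value": 98.0})
    stream.append({"patient_id": "p1", "value": 104.0})
    # Replaying the log reproduces the real-time view exactly.
    print(stream.replay() == stream.state)
```

Passing a different processor to replay is how a changed computation would be re-run over history in this sketch, without maintaining a separate batch code base.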

Both Lambda and Kappa Architectures can be implemented with a combination of open-source tools, like Apache Kafka, Apache HBase, Apache Hadoop (HDFS, MapReduce), Apache Spark, Apache Drill, Spark Streaming, Apache Storm, and Apache Samza. Microsoft and other vendors are blending these two environments so SQL and NoSQL work together seamlessly.

A few people will say this is too hard or impossible. My response to that is we created a precursor to this at Northwestern in 2006 without the tools we have today. And just because something is hard doesn’t mean we shouldn’t do it. Climbing Denali is hard and hiking the Appalachian Trail is time consuming. They’re both difficult for different reasons. But it doesn’t mean they can’t or shouldn’t be done. The alternative is not doing it, which ignores all the reasons mentioned earlier for forging ahead. If we don’t, then we stay in the lower left quadrant of McKinsey’s Digitization Index. As patients and citizens, we cannot allow this to continue.

Health Catalyst Initial Fabric Services

Some detail into the initial fabric services will provide a sense of how we’re approaching this layer in DOS:

  1. Fabric.Identity and Fabric.Authorization microservices

Fabric.Identity provides authentication (i.e., verifying the user is who he/she is claiming to be). Fabric.Authorization stores permissions for various user groups and, once given a user, returns the effective permissions for that user.

  2. Fabric.MachineLearning microservice

A microservice that plugs into a data pipeline (like ours) and runs machine learning models written in R, Python, and TensorFlow. It encapsulates all the machine learning tools inside so all you need to do is supply a machine learning model.

  3. Fabric.EHR set of microservices

Enables SQL bindings, machine learning models, and application code to show data and insights inside the EHR workspace using SMART on FHIR.

  4. Fabric.PHR set of microservices

Provides the ability to download, share, and update a personal health record. Integrates data from all available EMRs in a patient’s health ecosystem.

  5. Fabric.Terminology set of microservices

Provides the ability for application developers to leverage local and national terminology mapping, and update services.

  6. Fabric.FHIR microservice

A data service that sits on top of any data platform (Health Catalyst EDW, data lake, Hadoop, etc.). Applications using this data service become portable to any other data platform. It uses data to FHIR mappings (written in SQL, Hive SQL, etc.) to map data and implements an Analytics on FHIR API using a cache based on Elasticsearch.

  7. Fabric.Telemetry

Provides infrastructure to web and mobile applications to send telemetry data to our Azure cloud, and provides tools to analyze it using Elasticsearch.
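To make the authorization idea in the first service concrete, here is a minimal Python sketch of resolving a user's effective permissions from group memberships. It is an assumption-laden illustration, not the Fabric.Authorization API: the group names, permission strings, and the function effective_permissions are all hypothetical, and the sketch assumes that effective permissions are simply the union of the permissions granted to each of the user's groups.

```python
# Hypothetical permission store: permissions are granted to groups, not users.
GROUP_PERMISSIONS = {
    "clinicians": {"patient:read", "labs:read"},
    "analysts": {"reports:read", "reports:write"},
}

# Hypothetical group memberships for already-authenticated users.
USER_GROUPS = {
    "alice": ["clinicians", "analysts"],
    "bob": ["analysts"],
}


def effective_permissions(user):
    """Return the union of permissions across all of the user's groups."""
    permissions = set()
    for group in USER_GROUPS.get(user, []):
        permissions |= GROUP_PERMISSIONS.get(group, set())
    return permissions


def is_allowed(user, permission):
    """Check one permission against the user's effective permissions."""
    return permission in effective_permissions(user)


if __name__ == "__main__":
    print(sorted(effective_permissions("alice")))
    print(is_allowed("bob", "patient:read"))
```

Keeping authentication (who the user is) separate from authorization (what the user may do) means an application only has to ask one question of each service.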

The real-world script example in Figure 13 gives tangible proof of how we are converting our relational data models into FHIR information models. This is one of the scripts the team has developed for that conversion.

Figure 13: Actual example of programming for FHIR mapping (SQL version)

The output into FHIR is shown in Figure 14.

Figure 14: FHIR output from a mapping script

Converting what we have in the relational world into FHIR is difficult, but not impossible. It’s going to be time consuming, but we can accelerate it.
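The kind of mapping involved can be illustrated with a small Python sketch that turns one relational row into a FHIR Patient resource. This is not the script from Figure 13, which is written in SQL. The column names (patient_id, mrn, last_name, and so on), the identifier system URI, and the function row_to_fhir_patient are assumptions made for illustration; the output fields (resourceType, identifier, name, gender, birthDate) follow the standard FHIR Patient resource.

```python
import json
from datetime import date

# Assumed source coding for sex; FHIR's administrative gender codes are
# male, female, other, and unknown.
GENDER_MAP = {"M": "male", "F": "female", "O": "other"}


def row_to_fhir_patient(row):
    """Map one row of a hypothetical relational patient table to a FHIR Patient."""
    return {
        "resourceType": "Patient",
        "id": str(row["patient_id"]),
        "identifier": [
            # "urn:example:mrn" is a placeholder system URI, not a real one.
            {"system": "urn:example:mrn", "value": row["mrn"]}
        ],
        "name": [{"family": row["last_name"], "given": [row["first_name"]]}],
        "gender": GENDER_MAP.get(row["sex"], "unknown"),
        "birthDate": row["birth_date"].isoformat(),  # FHIR dates are YYYY-MM-DD
    }


if __name__ == "__main__":
    sample_row = {
        "patient_id": 1001,
        "mrn": "MRN-0001",
        "last_name": "Doe",
        "first_name": "Jane",
        "sex": "F",
        "birth_date": date(1980, 5, 17),
    }
    print(json.dumps(row_to_fhir_patient(sample_row), indent=2))
```

Much of the time-consuming work is in the details this sketch glosses over: local codes that need terminology mapping, missing values, and one-to-many relationships such as multiple names or identifiers per patient.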

The Measures Builder Library

We are porting more than 200 Health Catalyst reusable value sets (Figure 15) into a content management system and code repository called the Measures Builder Library (MBL).

Figure 15: Sampling of the 200+ Health Catalyst reusable value sets

We can reuse these in Health Catalyst and third-party applications. There are also now more than 2,000 value sets and quality measures from CMS and the National Quality Forum (NQF) library, and this is just a portion of what we have to measure and keep track of in healthcare. From MBL, we’ll be able to express those value sets and reuse them in the microservices of the fabric.

The same concept applies with DOS machine learning models. These will reside in the Fabric.MachineLearning service described earlier, and any application can invoke these models. Figure 16 shows the Health Catalyst machine learning models in three phases of development.

Figure 16: Three phases of machine learning models in development for DOS
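The reuse pattern here, where models live in one place and any application invokes them by name, can be sketched as a simple registry. The Python below is illustrative only: ModelRegistry, the model name, and the toy scoring rule are hypothetical stand-ins, not the Fabric.MachineLearning interface or a real trained model.

```python
class ModelRegistry:
    """Toy registry: models are registered once and invoked by name."""

    def __init__(self):
        self._models = {}

    def register(self, name, predict_fn):
        # predict_fn is any callable that maps a feature dict to a prediction.
        self._models[name] = predict_fn

    def predict(self, name, features):
        if name not in self._models:
            raise KeyError(f"no model registered under {name!r}")
        return self._models[name](features)


def readmission_risk(features):
    # Toy rule standing in for a trained model; the coefficients are made up.
    score = 0.1 + 0.2 * features.get("prior_admissions", 0)
    return min(score, 1.0)


if __name__ == "__main__":
    registry = ModelRegistry()
    registry.register("readmission-risk", readmission_risk)
    # Any application can now call the shared model by name.
    print(registry.predict("readmission-risk", {"prior_admissions": 2}))
```

Because callers depend only on the model's name and its input features, a model can be retrained or replaced behind the registry without changing the applications that use it.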

Figure 17 shows how we manage and reuse the explosion of measures and value sets in the industry through MBL.

Figure 17: Measures Builder Library (MBL) is a content management system and set of APIs that allows registries, value sets, and other measures to be consistently managed, verified, governed, and reused for application development

Through this system, app developers can reference the 2,000+ NQF and CMS value sets both programmatically and manually without having to hunt them down. Health Catalyst and our health system partners will contribute where value sets cannot be automated. The Health Catalyst Precise Registry Builder will feed MBL, then we will push this out to the Health Catalyst fabric and make it available to our applications, third-party applications, and client applications.
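A minimal Python sketch of what referencing a value set programmatically might look like follows. It is not the MBL API: ValueSet, ValueSetLibrary, the value set name, and the version string are hypothetical, and the two ICD-10-CM codes are included only as a small illustrative sample, not as a complete or official value set.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ValueSet:
    """A named, versioned set of codes from one code system."""
    name: str
    version: str
    code_system: str
    codes: frozenset


class ValueSetLibrary:
    """Toy content library: publish value sets once, reuse them everywhere."""

    def __init__(self):
        self._sets = {}

    def publish(self, value_set):
        # Keyed by name and version so older versions stay retrievable.
        self._sets[(value_set.name, value_set.version)] = value_set

    def get(self, name, version):
        return self._sets[(name, version)]

    def contains(self, name, version, code):
        return code in self.get(name, version).codes


if __name__ == "__main__":
    library = ValueSetLibrary()
    library.publish(ValueSet(
        name="diabetes-type-2",        # hypothetical identifier
        version="2017-01",             # hypothetical version label
        code_system="ICD-10-CM",
        codes=frozenset({"E11.9", "E11.65"}),  # illustrative sample only
    ))
    # An application checks a diagnosis code against the shared definition.
    print(library.contains("diabetes-type-2", "2017-01", "E11.9"))
```

Versioning the key is a deliberate choice in this sketch: a registry or measure built against one version of a value set keeps producing the same results even after a newer version is published.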

Role Model Software Development for the Fabric

In addition to building DOS, we want to be role models in software development for the fabric because this effort should be led by healthcare, not Silicon Valley. Health Catalyst is following these attributes of role model development to implement and achieve the concepts in DOS:

  • Open Source and Collaborative Development: Our code is publicly available. External developers can submit enhancements.
  • Open and Modular: All APIs will be publicly published. Customers can pick and choose from the Health Catalyst components or replace any component with their own or a third party’s.
  • Secure by Design: Security services make it easy to build security into any application.
  • Microservices Architecture: REST-based services that can be called from web, mobile, or BI tools.
  • Big Data: Leverages big data technologies to provide a high-speed and reliable platform.
  • Easy Install and Updates: All services install via Docker.
  • Scalable: All services are designed to run in multiple nodes and cluster themselves automatically.

We will know we’ve reached role model status once we can demonstrate these eight software development vital signs:

  1. Successfully implement DOS.
  2. Fast, simple releases every two weeks. Constant improvement of our apps.
  3. Analytics-driven UI and applications (intelligent user interfaces, driven by situational awareness of the physician, nurse, patient, etc.).
  4. Constantly consuming and expanding the data ecosystem as the enabler of great apps, not apps as the enabler of data.
  5. Machine learning and pattern recognition that clearly amazes all of us with its value to humanity.
  6. Economic scalability. We’re so efficient with our products, which work across multiple OS and data topologies, that it’s economically efficient to constantly deploy.
  7. Auto-fill analytics. This is a play on words, but how do we, through pattern recognition and machine learning, anticipate next steps in our partners’ decision making?
  8. Google, Facebook, Amazon, and Microsoft come to us for advice about software success and value.

We will do our best to become this role model.

Ongoing DOS Development and Maintaining Focus

Health Catalyst partners can track development, ask questions, request features, and review roadmaps and release notes about DOS in the Health Catalyst Community. This is a community effort with a lot of uphill work ahead.

There will be those who want this to fail because they are afraid of being disrupted and want to protect the status quo. There will be those who expect failure because of the degree of difficulty and because healthcare IT doesn’t have a great reputation for breakthrough achievement.

But there are many more who hope DOS succeeds and these are the people we work for. As patients and members of a global community, this is something we need to do, can do, and will do.
