Enterprise Data Warehouse / Data Operating System

What are the typical benefits of your solution?

Response: Leveraging Health Catalyst’s proprietary methodology and design for data engineering, known as the Late-Binding™ approach, Catalyst will integrate and unlock the client’s EMR, financial, patient satisfaction, and other data sources. This rich data will become the bedrock upon which meaningful analytics are created to support the client’s understanding of areas of excellence within the system as well as areas that need improvement.

Once the client’s data is aggregated and exposed, we will apply Catalyst’s analytics and use the results to drive improvements in care delivery while reducing costs. Catalyst professionals will empower administrators and clinicians to identify and implement concrete solutions to achieve these objectives.

This approach makes us very different from other vendors. It allows organizations to save valuable time and money by quickly forming the necessary infrastructure to allow for analysis that can begin addressing the clinical, financial, and operational topics of importance to your organization. This approach is also reflected in the investment commitments that we ask you to make with our company. The basis of our approach is that as you achieve value, you will want to make further investments with us to go even further with your initiatives.

To illustrate this value and difference, allow us to share two specific examples of successes other hospitals have enjoyed based on the Health Catalyst difference. Health Catalyst has successfully worked with many of our customers to reduce both hospital mortality and readmissions. In one specific example, one of our customers saw a 22% reduction in septicemia mortality rates and a savings of $1.3 million in just the first 12 months of working with Health Catalyst. Using Health Catalyst tools, another hospital partner identified that the incidence of readmissions for heart failure (HF) patients was a significant challenge for them. Together, we worked on a program to reduce HF readmissions, and after just six months, they enjoyed a 21% reduction in 30-day HF readmissions and a 14% reduction in 90-day readmissions. These are both powerful examples of the unique approach and difference that Health Catalyst brings to bear for our customers.

Does your solution provide versioning support, including rollback versioning?

Response: Yes. Catalyst provides content versions and continual upgrades. If the solution needs to be rolled back to a previous version, Catalyst provides that capability.

What are your key solution differentiators?

Response: A few of the many Health Catalyst differentiators are:

Time to Value

  • Catalyst’s unique data warehousing approach allows data sources to be easily integrated into the Health Catalyst platform. The Late-Binding™ Data Warehouse approach forgoes complex data modeling up front, favoring comprehensive feeds of source data into the data platform. Our library of pre-existing data feeds, combined with our advanced toolset for creating new data feeds, allows Health Catalyst to claim the fastest implementation times in the industry. The platform can be installed and configured in as few as three months with real ROI in as few as six months.


  • Catalyst’s architecture supports ultimate flexibility. Because we do not perform heavy data modeling or aggregation up front, our solution always allows drill-down to the most granular data collected in the source system. The Health Catalyst platform supports quality improvement, financial reporting, operational reporting, and research. We train our clients on how to use our platform for custom development and the extension of the platform. Although we have a growing and rich application layer, we realize the uses of the data warehouse can never be limited to applications that we provide.

Rich Content and Application Layer

  • Once data have been loaded into the Health Catalyst platform, a rich ecosystem of data marts supports an interactive visualization layer. Health Catalyst has data marts that define populations using lab, medication, coding, and other criteria; data marts that assess patient risk and allow stratification in our visualizations; data marts that catalog and assess regulatory metrics; data marts that look deeply into clinical areas for improvement; and many, many more. These data marts power an extremely interactive visualization layer that allows users to filter on a host of variables and drill into patient- and encounter-level detail.

Low-Risk Contracting

  • We at Health Catalyst believe so strongly in our team and our platform’s ability to deliver real business value in a short time that our contracts are structured to remove risk from our clients. If you are not satisfied with the Health Catalyst system, after nine months we offer you the chance to receive a full refund and we will part ways. This is in contrast to our competitors, many of which would still be mid-implementation at this time.

Catalyst has deep experience in healthcare data warehousing and analytics. Our team is assembled from technical and clinical leaders in healthcare analytics from leading organizations like Intermountain Healthcare, Northwestern Medicine, and PeaceHealth. We pride ourselves on our experienced team, which has over 500 years of combined healthcare data warehousing experience.

Can the data dictionary be altered or modified?

Response: Yes. Catalyst provides EDW Atlas, which provides easy-to-use interfaces for modifying the data dictionary. Catalyst understands that for a successful implementation, the data dictionary must be able to be customized to fit the unique needs for each client.

What type of analytics do you provide to support big data?

Response: We support structured and unstructured analysis of text and discrete data. Our queries and algorithms are designed to support descriptive, predictive, and prescriptive analytics.

What is your approach to big data?

Response: Our solution is a combination of relational and non-relational platforms. We believe this strategy is the best approach to analytics in healthcare, given the volume, structure, and type of data that currently exists in the healthcare data ecosystem. From a product perspective, we use Microsoft SQL Server 2012, PolyBase, HDInsight, and Microsoft’s Parallel Data Warehouse. Microsoft SQL Server alone, without the Big Data extensions, scales easily to the petabyte level. We utilize PolyBase to serve as the Big Data/RDBMS translation interface engine.

How do you manage master data?

Response: We manage master data through ETL using our SSIS-based technology stack. When available, we leverage existing master data sources, such as an EMPI for patients and provider credentialing systems for providers. When a client does not have an EMPI, we use matching algorithms to identify and resolve duplicate patient records. We employ similar methods for providers.

We use our metadata-driven ETL platform in conjunction with pattern-matching technology.
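
As a rough illustration of the duplicate-resolution step described above, the following Python sketch pairs records that share a birth date and have similar names. It is a simplified stand-in, not the Catalyst matching algorithm: the field names, the `likely_duplicates` helper, the 0.85 threshold, and the sample records are all hypothetical, and the similarity score comes from Python's standard `difflib` module.

```python
from difflib import SequenceMatcher
from itertools import combinations


def normalize(record):
    """Lower-case and strip the fields used for matching."""
    return (
        record["last_name"].strip().lower(),
        record["first_name"].strip().lower(),
        record["dob"],
    )


def similarity(a, b):
    """Ratio in [0, 1] from difflib; 1.0 means the strings are identical."""
    return SequenceMatcher(None, a, b).ratio()


def likely_duplicates(records, threshold=0.85):
    """Return pairs of record IDs with the same birth date and similar names."""
    pairs = []
    for left, right in combinations(records, 2):
        l_last, l_first, l_dob = normalize(left)
        r_last, r_first, r_dob = normalize(right)
        if l_dob != r_dob:
            continue  # different birth dates: not treated as candidates here
        score = (similarity(l_last, r_last) + similarity(l_first, r_first)) / 2
        if score >= threshold:
            pairs.append((left["id"], right["id"]))
    return pairs


# Hypothetical sample records, for illustration only.
records = [
    {"id": 1, "last_name": "Johnson", "first_name": "Katherine", "dob": "1961-04-02"},
    {"id": 2, "last_name": "Jonson", "first_name": "Katherine ", "dob": "1961-04-02"},
    {"id": 3, "last_name": "Smith", "first_name": "Alan", "dob": "1975-11-30"},
]

print(likely_duplicates(records))
```

In practice, candidate pairs like these would typically be reviewed or scored against additional attributes before records are merged.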

What tools do you provide for metadata management?

Response: The Catalyst platform includes a Metadata Management set of tools in the Atlas application. Atlas is Catalyst’s web-based metadata management tool. Atlas stores, searches and displays metadata, including data mart, table, and column descriptions for both the source systems and source marts. Data architects use Atlas to manage and manipulate the metadata that is critical to the operation of the Catalyst EDW.

Atlas includes a Google-like search capability that allows users to search all metadata for terms and returns exact lexical matches as well as similar text matches found in the metadata system.

In addition, Atlas manages the metadata for the mapping of tables and columns from source systems to source marts in the data warehouse. As such, it is a critical tool for the development and maintenance of Source Marts.

What type of metadata do you provide?

Response: We provide business, technical, and process metadata. As part of the platform, Catalyst provides a metadata tool that allows business data stewards to modify, enhance, and augment business metadata. Technical metadata is also created and used by the platform to run the ETL. Process execution metadata is generated by the ETL processes and is used by the management tools.

How is data updated or refreshed?

Response: The underlying data in the platform is updated or refreshed using the Health Catalyst ETL engine in batches according to user-defined schedules. Some source systems are updated as often as every 10–15 minutes, while the bulk of them are updated nightly. Additionally, the Catalyst Analytics Framework (SAM Engine) can be configured to refresh the analytics from the source data either on a schedule at a certain time of day or when it detects that certain dependent data elements have been changed or refreshed. The Catalyst Analytics Framework supports many different visualization tools, such as QlikView, Tableau, and BusinessObjects, which can be configured to run directly against the calculated analytics in the platform or to cache the analytical application or report and update it on a regular schedule. That ability varies by visualization tool.
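
The two refresh triggers described above (a scheduled time of day, or a change in dependent data) can be sketched in Python as follows. This is only an illustration of the decision logic under simple assumptions; the `needs_refresh` function and its parameters are hypothetical and do not represent the SAM Engine's actual interface.

```python
from datetime import datetime, time


def needs_refresh(now, last_refresh, scheduled_time, dependency_updates):
    """Decide whether an analytic should be refreshed.

    now, last_refresh: datetime values
    scheduled_time: time of day for the scheduled refresh
    dependency_updates: dict mapping a source name to the datetime it was last loaded
    """
    # Trigger 1: today's scheduled time has passed and no refresh has run since.
    scheduled_today = datetime.combine(now.date(), scheduled_time)
    if last_refresh < scheduled_today <= now:
        return True
    # Trigger 2: any dependent source was loaded after the last refresh.
    return any(loaded > last_refresh for loaded in dependency_updates.values())
```

For example, an analytic scheduled for 2:00 a.m. that last refreshed the previous night would be refreshed at the next check after 2:00 a.m., and one whose lab source was reloaded mid-morning would be refreshed without waiting for the next scheduled run.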

How do you handle natural language processing (NLP) of unstructured data?

Response: Health Catalyst is currently developing an NLP subsystem that is intended to become part of the standard Health Catalyst Late-Binding™ Data Warehouse platform. We refer to the project as Pragmatic Natural Language Processing, or pNLP. As the name suggests, we are not attempting to solve the entire NLP problem for healthcare, which is still largely unsolved. We are focused on a pragmatic subset that will enable the extraction of additional data currently trapped in various unstructured text blocks.

We are focused on basic text processing, using techniques like regular expressions, word tokenization, sentence segmentation, word normalization and stemming, and edit distance (Levenshtein), to extract information of value from physician notes and other clinical documents. Examples of potential targets include medications (frequently misspelled), diagnoses, and common codes (e.g., ICD-9, ICD-10, RxNorm, NDC). As a starting point, we are leveraging common, existing technologies that support Approximate Regular Expressions (AREs). AREs support minimal edit distance calculations using Levenshtein edit distance algorithms to determine the degree of insertions, deletions, and substitutions in text. These techniques help manage misspellings and other problems common to text. Technologies like eGrep and TRE are examples of representative technologies that perform these functions very well.

Negation is another feature we will support. Studies show that 50% of concepts in clinical reports are negated, so support for negation is essential. NegEx and ConText are two technologies we are exploring as part of our solution to deal with common negation as well as pseudo-negation terms. Examples of potential negations include: “patient denies chest pain,” “no indication of congestive heart failure,” and “no shortness-of-breath.” The aforementioned tools address many of these negation cases.

Regular expressions with negation support are of fairly low value on their own. Therefore, we are adding support for algorithms that are sequences of regular expressions which form a pipeline to operate on sentences and larger text blocks. Algorithms allow one to link a number of regular expressions together (basically subroutines) in interesting combinations and facilitate reuse of regular expression components. Concept discovery and indexing, leveraging technologies like MetaMap from the NIH and KnowledgeMap, allow us to do more general data mining over results generated by the regular expression pipelines. These techniques are typically called Named Entity Recognition techniques.

Lastly, we plan to include support for common lexicons. ICD, SNOMED-CT, UMLS, etc. will be leveraged for concept searches. We are exploring the inclusion of other UMLS lexicons that are typically licensed as optional extensions to the lexicon sets. In summary, we are leveraging two complementary approaches, Approximate Regular Expressions and Concept Discovery & Indexing, to create a pragmatic NLP solution that will be fully integrated into the Health Catalyst solution in a future release.
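
To make the edit-distance and negation ideas above concrete, here is a minimal Python sketch. It is not pNLP, TRE, NegEx, or ConText: it implements a plain Levenshtein distance, matches words against a small hypothetical medication lexicon, and flags a concept as negated when a trigger word precedes it in the sentence. The lexicon, the trigger list, and the function names are assumptions made for illustration.

```python
import re


def levenshtein(a, b):
    """Minimum number of insertions, deletions, and substitutions turning a into b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


# Hypothetical lexicon and negation triggers, for illustration only.
MEDICATIONS = ["metoprolol", "lisinopril", "warfarin"]
NEGATION_TRIGGERS = re.compile(r"\b(denies|no|without)\b", re.IGNORECASE)


def find_medications(sentence, max_distance=2):
    """Return lexicon terms within max_distance edits of any word in the sentence."""
    words = re.findall(r"[a-z]+", sentence.lower())
    found = []
    for term in MEDICATIONS:
        if any(levenshtein(word, term) <= max_distance for word in words):
            found.append(term)
    return found


def is_negated(sentence, concept):
    """True when a negation trigger appears before the concept in the sentence."""
    position = sentence.lower().find(concept.lower())
    if position == -1:
        return False
    return NEGATION_TRIGGERS.search(sentence[:position]) is not None


print(find_medications("Patient taking metoprolal and lisinipril daily"))
print(is_negated("Patient denies chest pain", "chest pain"))
```

A trigger-before-concept check like this is far cruder than dedicated negation tools, which also handle pseudo-negation and the scope of each trigger; it is shown only to illustrate why negation has to be handled alongside pattern matching.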

How do you handle very large amounts of data with fast query and calculation performance?

Response: The EDW is built upon a Microsoft SQL Server Platform, which is capable of handling large amounts of data and performing calculations while maintaining performance. Furthermore, fundamental to Health Catalyst’s EDW design is the concept of a Late-Binding™ Data Warehouse.

This Late-Binding™ architecture avoids unnecessary complexity that delays time to value and leads to a very fragile and inflexible data warehouse infrastructure that cannot adapt to rapidly changing analytic use cases or new data content. This approach has proven many times more scalable and adaptable to new analytic use cases and data content than the methodologies that utilize early binding, tightly coupled data models and vocabulary management. (For more information on our Late-Binding™ warehouse, please see our website.)

The late binding of attributes allows users to adjust or modify as needed. This leads to higher performance and enables the data to be loaded into multiple visualization applications for further analysis, summary, and drill-down.

What is your database management system?

Response: The Health Catalyst Data Warehouse Platform is built on Microsoft SQL Server 2012 Enterprise Edition. Extract, Transform, and Load (ETL) processes are built, scheduled, and executed with Microsoft SQL Server Integration Services (SSIS). MS SQL Server Management Studio (SSMS) is used for day-to-day management of the SQL Server environment. SQL Server Reporting Services (SSRS) is also part of the MS SQL Server product family and can be leveraged for simple reporting tasks.

To date, Microsoft SQL Server 2012 has scaled to meet the demands of Health Catalyst customers, but we acknowledge that very large, single-instance implementations are likely to require the aforementioned advanced relational database management system features currently unavailable in MS SQL Server 2012 Enterprise Edition. We are prepared to adapt to satisfy those requirements.

When does normalization occur in your data warehouse?

Response: There are two uses of the term “normalization” in healthcare data management. In traditional database design, normalization is the term used to describe the process of organizing the fields and tables of a relational database to minimize redundancy and dependency. In healthcare, we use the term normalization to describe the process of rationalizing data to a common vocabulary or definition, such as Patient ID, SNOMED, CPT, LOINC, RxNorm, etc. The term can also be used to describe the process of standardizing the representation of clinical algorithms and business rules, such as calculating length of stay, provider-patient attribution, and defining cohorts of patients for inclusion and exclusion from patient registries. Catalyst has deep and successful experience in all of these definitions of the term. Our Late-Binding™ Data Warehouse methodology is unique in the industry in its ability to quickly adapt and map to the constantly changing vocabulary standards, business rules, and clinical algorithms in healthcare. The diagram below shows the two layers at which data is normalized in the Catalyst architecture.


What data import and export capabilities do you have?

Response: Data can be imported into the EDW using traditional methods from flat files (such as Excel) or other databases using import wizards, which are typically part of the relational database system. The Catalyst EDW includes a flat-file ETL interface for importing formatted files on a daily or on-demand basis. Data can also be exported to Excel files or other databases via secure, standard ODBC-style database drivers and connection strings.

What is your data quality and validation process?

Response: Some people in the data warehousing community advocate a position that involves heavy downstream cleansing of source system data. We have found that this is a risky practice that has great potential for damaging credibility with the analyst communities that the EDW is trying to serve. The danger lies in the fact that the downstream data cleansing requires extensive subject matter expertise, is prone to error, and is not transparent to the users. When an analyst community distrusts the data from the EDW, it is very hard to win that trust back. We advocate a methodology that maintains data fidelity with the source system and provides feedback to source system data stewards about data quality issues. Once those data quality issues are fixed, the corrected data automatically flows into the EDW.

We have a data validation framework that helps us maintain the fidelity between the EDW and the source system. One of the primary features compares source table row counts and EDW row counts at varying degrees of granularity to identify when a table is out of sync with the source system.
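
The row-count comparison described above can be sketched in Python as follows. This is a simplified illustration rather than the Catalyst validation framework: the `out_of_sync` function, the (table, load date) keys, and the counts are hypothetical, and in practice the counts would come from queries against the source system and the EDW.

```python
def out_of_sync(source_counts, edw_counts):
    """Return the keys whose row counts differ between the source and the EDW.

    Each argument maps a key such as (table, load_date) to a row count.
    A key missing on one side is treated as a count of zero.
    """
    keys = set(source_counts) | set(edw_counts)
    return sorted(
        key for key in keys
        if source_counts.get(key, 0) != edw_counts.get(key, 0)
    )


# Hypothetical counts at (table, load date) granularity, for illustration only.
source = {("encounter", "2024-01-01"): 1200, ("encounter", "2024-01-02"): 1185}
edw = {("encounter", "2024-01-01"): 1200, ("encounter", "2024-01-02"): 1170}

print(out_of_sync(source, edw))
```

Comparing at a finer granularity, such as by load date rather than by whole table, narrows down where a table has drifted from its source.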

What EMRs and external sources do you integrate with?

Response: The Late-Binding™ architecture of the Health Catalyst data warehouse streamlines and simplifies the process of bringing data into the warehouse. A Source Mart brings a new data source into the data warehousing platform and provides the necessary linkages with Atlas. The platform includes sophisticated tools that enable Health Catalyst or our clients to quickly develop, test, and deploy Source Marts. There is also a library of pre-configured Source Marts for popular clinical, financial, and ancillary systems, including the following:

  • EMR: MEDITECH, Cerner, Epic, McKesson, Siemens, Allscripts
  • Financial: Lawson – GL, Lawson – AP, Lawson – MM, PeopleSoft – GL, PeopleSoft – AP, PeopleSoft – Supply Chain, PeopleSoft Payroll, MEDITECH, McKesson
  • Human Resources: UltiPro – HR, Lawson – HR, Time Card Tracking, Kronos, API
  • Patient Satisfaction: Press Ganey, NRC Picker
  • Standard Costing: EPSi, Alliance
  • Cardiovascular: LUMEDX (Apollo)
  • Radiology: GE Healthcare, McKesson, Philips Healthcare, Fujifilm Med Sys, Agfa Healthcare, Merge Healthcare
  • Laboratory: Sunquest Lab, Cerner Labs, MEDITECH, SSC Soft Computer, McKesson

  • Pharmacy: MEDITECH, Cerner, McKesson, Epic, Siemens

What exporting tools do you offer?

Response: Microsoft SQL Server 2012 provides several tools and a number of mechanisms for extracting and formatting data for export purposes. At the simplest level, SQL Server Integration Services includes an “Import and Export Wizard” that provides a simple graphical user interface to create data extractions in a number of forms, including comma-separated values (CSV), XML, and other textual forms.

Another extraction option is direct connection to the database via SQL Server database connectors and extracting the desired data directly from the database for importing into another database or into applications like Power Pivot, Excel, and Access. ETL jobs can be used to extract data directly from the warehouse on an ad hoc basis or during regularly scheduled ETL processes. Health Catalyst also provides a facility within our Cohort Finder application that allows select patient information to be exported by end users into Excel spreadsheets.

The Health Catalyst EDW contains extensive metadata for each database, table, and column to simplify the extraction process.