How to Turn Data Analysts into Data Scientists
There is a large shortage of healthcare data scientists, which is hindering the ability of health systems to leverage the power of artificial intelligence (AI). What health systems do have are data analysts who understand the organization’s data, people, and workflows. What if there were a way to harness existing data analysts so that health systems could leverage AI faster?
An August 2018 LinkedIn Workforce Report described the demand for data scientists as “off the charts” and estimated a national shortage of more than 150,000 people with data science skills. So how many data scientists in the U.S. have healthcare experience? According to a recent study of LinkedIn data, only about 180 work in the healthcare field.
This shortage should incentivize organizations to better utilize their existing data analysts, and, where possible, eventually turn some data analysts into data scientists. For data analysts, it creates a tremendous opportunity to expand their skillsets and increase their job security. Data analysts are the untapped engine for AI.
Categorization of AI Use Cases in Healthcare
To learn what types of AI use cases are appropriate for data analysts to tackle, it’s helpful to categorize AI use cases into four buckets as shown in Figure 1 below.
Figure 1: Healthcare Use Cases
Data analysts should focus on the first two buckets, simple use cases with known patterns, because they can follow established patterns to solve them. Typically, as they dig deeper into the problem, they will find that the data fits the defined pattern and they can continue on their own. Other times, the data will not fit the pattern, and they will need to manipulate the data or apply different algorithms; at that point, they can seek advice from a data scientist.
Data scientists can focus on the remaining two buckets: complex use cases with known patterns and use cases without known patterns. For these, data scientists can rely on data analysts to manipulate the data.
By addressing each category of use cases differently, health systems can foster a good partnership between data scientists and data analysts where everyone is working at the top of their license.
Top-Down vs. Bottom-Up Artificial Intelligence (AI)
While many people think data scientists spend their days creating algorithms and doing predictive analysis, most data scientists spend 80 percent of their time collecting, cleaning, and organizing data, a process known as data wrangling. Not only is this an ineffective use of their time, but data scientists also don’t enjoy it: 76 percent of data scientists view data preparation as the least enjoyable part of their work.
Most healthcare organizations employ a top-down AI strategy. In this model, the organization hires a few data scientists and then trains them to understand its data, problems, and people. It typically takes a year or more before the data scientists are comfortable enough to hand over a prototype or working solution to the data analysts, who then have to integrate it with production code.
While there’s nothing wrong with this strategy, and there are plenty of use cases where it works well, there is a complementary strategy, bottom-up AI, that puts the skills of data analysts to work first.
Here, existing data analysts shape the data, understand the problem, and design the solution. They work with healthcare data scientists (in-house or consultants) to find the best approach and validate results. Then they work with clinicians and others to embed insights into existing workflows.
This strategy works well for certain use cases and makes better use of existing data analysts, so the work gets done faster.
In Figure 1, the bottom-up AI model is appropriate for the first two categories of use cases, while the top-down AI model is the best fit for the remaining two.
How to Turn Data Analysts into Healthcare Data Scientists
Start With What They Already Know
Data analysts are already about 90 percent of the way to becoming healthcare data scientists. They already know:
- The organization’s data.
- How to wrangle the data.
- Feature engineering. Even though data analysts may not realize it, they already know how to create features.
- The people in the organization.
- The workflows.
- How to validate results with users.
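As an illustration of the point about features, here is a minimal Python sketch of the kind of derived columns an analyst may already build in SQL: age at admission, length of stay, and a count of prior admissions. The `Encounter` record, its field names, and the 365-day look-back window are hypothetical assumptions, not part of any particular data model.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Encounter:
    # Hypothetical encounter record; field names are illustrative assumptions.
    patient_id: str
    birth_date: date
    admit_date: date
    discharge_date: date


def age_at_admission(enc: Encounter) -> int:
    """Whole years of age on the admission date."""
    years = enc.admit_date.year - enc.birth_date.year
    # Subtract one if the birthday has not yet occurred in the admission year.
    if (enc.admit_date.month, enc.admit_date.day) < (enc.birth_date.month, enc.birth_date.day):
        years -= 1
    return years


def length_of_stay_days(enc: Encounter) -> int:
    """Length of stay in days (discharge date minus admission date)."""
    return (enc.discharge_date - enc.admit_date).days


def prior_admissions(enc: Encounter, history: list[Encounter], lookback_days: int = 365) -> int:
    """Count the same patient's earlier admissions within the look-back window."""
    return sum(
        1
        for other in history
        if other.patient_id == enc.patient_id
        and other.admit_date < enc.admit_date
        and (enc.admit_date - other.admit_date).days <= lookback_days
    )


def build_features(enc: Encounter, history: list[Encounter]) -> dict[str, int]:
    """Turn one encounter into a row of model-ready features."""
    return {
        "age": age_at_admission(enc),
        "length_of_stay": length_of_stay_days(enc),
        "prior_admissions_365d": prior_admissions(enc, history),
    }


visits = [
    Encounter("p1", date(1960, 6, 15), date(2023, 1, 10), date(2023, 1, 14)),
    Encounter("p1", date(1960, 6, 15), date(2023, 9, 1), date(2023, 9, 3)),
]
print(build_features(visits[1], visits))  # {'age': 63, 'length_of_stay': 2, 'prior_admissions_365d': 1}
```

Each function is one feature, which keeps the logic easy to validate with users one column at a time.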
Identify the Best Candidates Amongst Your Data Analysts
The definition of a data analyst varies across organizations, but the best candidates to become healthcare data scientists:
- Have analytical skills to create insights from data.
- Have domain knowledge to understand the problems at hand.
- Have experience with SQL or Python.
- Have shown an ability to learn new skills.
- Communicate well with end users.
Teach Them to Think Like a Data Scientist
Data analysts are not going to become data scientists overnight, but a little knowledge and some consulting from a data scientist will help them get started on their AI journey.
The first step in a data analyst’s AI journey is learning to think like a data scientist. For example, a data analyst tends to think deterministically: “If A, then B.” A data scientist thinks in probabilities: “If A, then most probably B, but less probably C.”
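The difference can be shown in a few lines of Python. This sketch contrasts a hard-coded rule with a function that returns a probability for each outcome, estimated as the relative frequency of what happened in past cases where A was true. The outcome names and counts are made-up illustrations, not real data.

```python
from collections import Counter


def analyst_rule(a: bool) -> str:
    # Deterministic thinking: if A, then B.
    return "B" if a else "not B"


def scientist_estimate(outcomes_when_a: list[str]) -> dict[str, float]:
    # Probabilistic thinking: how likely is each outcome when A is true?
    # Estimated here as the share of past cases that ended in each outcome.
    counts = Counter(outcomes_when_a)
    total = sum(counts.values())
    return {outcome: n / total for outcome, n in counts.most_common()}


# Illustrative history: outcomes observed in 10 past cases where A was true.
history = ["B"] * 7 + ["C"] * 2 + ["D"]
print(analyst_rule(True))           # B
print(scientist_estimate(history))  # {'B': 0.7, 'C': 0.2, 'D': 0.1}
```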
A data analyst also tends to think in terms of how long a project will take: “Done in five days.” A data scientist first explores whether the question is even answerable with the data she has.
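A quick feasibility check is one way to explore whether a question is answerable before promising a timeline. The sketch below is a hypothetical example for a yes/no outcome: it counts rows, measures how often the outcome is missing, and counts positive cases. The thresholds (at least 100 labeled rows and 20 positive cases) are arbitrary assumptions for illustration, not established cutoffs.

```python
def feasibility_report(rows: list[dict], target: str,
                       min_labeled: int = 100, min_positives: int = 20) -> dict:
    """Summarize whether a yes/no target looks learnable from these rows.

    The default thresholds are illustrative assumptions, not standard rules.
    """
    labeled = [r for r in rows if r.get(target) is not None]
    positives = sum(1 for r in labeled if r[target])
    missing_rate = 1 - len(labeled) / len(rows) if rows else 1.0
    return {
        "rows": len(rows),
        "labeled": len(labeled),
        "missing_target_rate": round(missing_rate, 3),
        "positives": positives,
        "answerable": len(labeled) >= min_labeled and positives >= min_positives,
    }


# Hypothetical extract: 5 readmissions, 90 non-readmissions, 5 unknown outcomes.
rows = [{"readmitted": True}] * 5 + [{"readmitted": False}] * 90 + [{"readmitted": None}] * 5
print(feasibility_report(rows, "readmitted"))
```

With only five positive cases, the report flags the question as not yet answerable, which is a conversation to have before committing to “done in five days.”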
Learn the Common Classes of Algorithms in Healthcare
Once a data analyst begins thinking like a data scientist, the next step is to learn the common classes of algorithms found in healthcare. Figure 2 illustrates three examples:
Figure 2: Common Classes of Algorithms in Healthcare
- Regression – In this example of linear regression, the patient’s age is on one axis and length of stay in the hospital is on the other. From these data points, an analyst can fit a trend line, the straight line that minimizes the squared vertical distances to the points, so that it gives the average length of stay expected at each age. Then, when a new patient comes in, she can find the patient’s age on the line and read off a predicted length of stay.
- Classification – In this binary classification example, the analyst has data points showing patients that were readmitted and patients that were not. If he draws a line separating the red from the green data points, he can classify who’s likely to be readmitted and who’s not likely to be readmitted.
- Clustering – In this example of clustering, the data points fall into several distinct groups. To know why, the analyst would need to explore the data further. For example, the clusters might be different subgroups of patients with diabetes, with each cluster made up of people who behave in a similar way.
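The regression and classification examples can be sketched in plain Python. The regression part fits an ordinary least-squares line of length of stay on age. For classification, the sketch swaps in a nearest-centroid classifier, one simple way to “draw a line” between two groups: with two features, its decision boundary is a straight line. The features (age and prior admissions) and all numbers are made-up assumptions for illustration.

```python
Point = tuple[float, float]


def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least squares: returns (slope, intercept) for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fit_centroids(points: list[Point], labels: list[bool]) -> dict[bool, Point]:
    """Average position of each class, e.g. readmitted (True) vs. not (False)."""
    centroids = {}
    for label in (True, False):
        members = [p for p, lab in zip(points, labels) if lab == label]
        centroids[label] = (
            sum(p[0] for p in members) / len(members),
            sum(p[1] for p in members) / len(members),
        )
    return centroids


def classify(point: Point, centroids: dict[bool, Point]) -> bool:
    """Predict the class whose centroid is nearest (squared Euclidean distance)."""
    def sq_dist(c: Point) -> float:
        return (point[0] - c[0]) ** 2 + (point[1] - c[1]) ** 2
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


# Regression: hypothetical ages and lengths of stay (days).
slope, intercept = fit_line([40, 50, 60, 70], [2, 3, 4, 5])
predicted_los = slope * 80 + intercept  # about 6 days for an 80-year-old

# Classification: hypothetical (age, prior admissions) per patient.
points = [(70, 3), (75, 4), (68, 2), (45, 0), (50, 1), (40, 0)]
labels = [True, True, True, False, False, False]
centroids = fit_centroids(points, labels)
print(classify((72, 3), centroids))  # True: likely to be readmitted
print(classify((42, 0), centroids))  # False: not likely to be readmitted
```

In practice, features on very different scales (age in years vs. a count of admissions) are usually standardized first, because raw distances are dominated by the larger-scale feature.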
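Regression and classification are considered supervised machine learning because the training data includes the answer (such as the actual length of stay, or whether the patient was readmitted) for the model to learn from. Clustering is considered unsupervised machine learning because no such answer is available: the algorithm finds the groups, and the analyst must work out why the data is clustered in this way.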
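To make the unsupervised case concrete, here is a minimal k-means sketch in Python. K-means is one common clustering algorithm; the article does not name a specific one. Note that the function receives only the points, with no labels, and returns a group number for each point; explaining what each group means is still the analyst’s job. The two features, the sample values, and the choice of k = 2 are illustrative assumptions.

```python
import random

Point = tuple[float, float]


def kmeans(points: list[Point], k: int, iterations: int = 20, seed: int = 0) -> list[int]:
    """Plain k-means (Lloyd's algorithm): returns a cluster index for each point.

    Only the points are used; no labels or answers are supplied.
    """
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # start from k randomly chosen data points
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins the cluster with the nearest center.
        assignment = [
            min(range(k), key=lambda j: (p[0] - centers[j][0]) ** 2 + (p[1] - centers[j][1]) ** 2)
            for p in points
        ]
        # Update step: move each center to the mean of the points assigned to it.
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:  # keep the previous center if a cluster ends up empty
                centers[j] = (
                    sum(p[0] for p in members) / len(members),
                    sum(p[1] for p in members) / len(members),
                )
    return assignment


# Hypothetical (HbA1c %, clinic visits per year) for six patients with diabetes.
patients = [(6.5, 2), (6.8, 3), (7.0, 2), (9.5, 10), (10.0, 12), (9.8, 11)]
groups = kmeans(patients, k=2)
print(groups)  # the first three patients share one label, the last three the other
```

Which group is labeled 0 or 1 depends on the random starting centers, and on messier data different starts can produce different groupings, so the number of clusters and their interpretation are worth reviewing with a data scientist.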
Follow the Six Steps from Data Analyst to Data Scientist
Once a data analyst has learned the basics of data science, she has a good start on her AI journey. There are six steps to incorporating data science into her everyday work:
- Using basic data science skills, define the use case.
- Prepare the data.
- Develop a model.
- Deploy the model.
- Surface insight and guidance.
- Review with data scientists and learn.
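One way to picture the six steps is as a small Python pipeline, one function per step. This is a hypothetical sketch: every function name, the toy “model” (the observed readmission rate for each number of prior admissions), the 0.5 threshold, and the guidance strings are illustrative assumptions, not a prescribed workflow or any product’s API.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class UseCase:
    """Step 1: define the use case: the question, the outcome, and who acts on it."""
    question: str
    target: str
    audience: str


def prepare_data(raw: list[dict], target: str) -> list[dict]:
    """Step 2: prepare the data (here, only drop rows with no recorded outcome)."""
    return [r for r in raw if r.get(target) is not None]


def develop_model(rows: list[dict], feature: str, target: str) -> dict:
    """Step 3: develop a model. Toy stand-in: observed outcome rate per feature value."""
    counts: dict = {}
    for r in rows:
        n, positives = counts.get(r[feature], (0, 0))
        counts[r[feature]] = (n + 1, positives + int(bool(r[target])))
    return {value: pos / n for value, (n, pos) in counts.items()}


def deploy_model(model: dict, default: float) -> Callable[[object], float]:
    """Step 4: deploy, i.e. expose a scoring function for new records."""
    return lambda value: model.get(value, default)


def surface_insight(score: float, threshold: float = 0.5) -> str:
    """Step 5: turn a score into guidance that fits an existing workflow."""
    return "flag for follow-up call" if score >= threshold else "routine discharge"


def review_questions(case: UseCase, rows: list[dict]) -> list[str]:
    """Step 6: questions to take to a data scientist during review."""
    return [
        f"Is '{case.target}' the right outcome for: {case.question}",
        f"Are {len(rows)} labeled rows enough to trust these rates?",
    ]


case = UseCase("Which patients are likely to be readmitted?", "readmitted", "care managers")
raw = [
    {"prior_admissions": 0, "readmitted": False},
    {"prior_admissions": 0, "readmitted": False},
    {"prior_admissions": 2, "readmitted": True},
    {"prior_admissions": 2, "readmitted": True},
    {"prior_admissions": 2, "readmitted": False},
    {"prior_admissions": 1, "readmitted": None},  # outcome unknown, dropped in step 2
]
rows = prepare_data(raw, case.target)
score = deploy_model(develop_model(rows, "prior_admissions", case.target), default=0.0)
print(surface_insight(score(2)))  # flag for follow-up call (2 of 3 were readmitted)
print(surface_insight(score(0)))  # routine discharge
```

Keeping each step as its own function makes it easy to swap the toy model for a real one later while leaving the data preparation and the workflow guidance unchanged.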
When Is a Data Analyst Ready to Be a Data Scientist?
Purely technical skills are unlikely to be the stumbling block in growing into a data scientist, but the transition is more than a change in job title. Once the data analyst acquires the skills outlined here, she will also need to build soft skills, such as understanding workflows across the organization. She’ll need hands-on experience building predictive models and statistical analyses related to machine learning, and she’ll need to keep expanding her AI skills.
Recommended Resources to Get Started
As data analysts work toward learning and incorporating data science skills, they can draw on several good resources:
- ai – Machine Learning Community.
- Think Like a Data Scientist: Tackle the Data Science Process Step-by-Step – a great book for learning the basics of data science.
- Data Science Central – Online Resource for Big Data Practitioners.
- Books by Hadley Wickham.
- Healthcare data scientists from the field.
Data Scientists Helping Data Analysts Do More for the Organization
Combining the bottom-up AI approach with growing data analysts into data scientists will produce far better results than either approach can achieve on its own. One of the best ways to get started is for data analysts to bring the data to data scientists and treat each project as a partnership with an exchange of ideas. Elitism is not helpful. The goal should be to get people to the wrong answer faster, and then work together to find solutions.