
Summary

In October 2020, The World Economic Forum published the report “The Future of Jobs”. This report has deep insights on technological adoption in the next five years, and it maps the jobs and skills of the future including a deep dive into Data and AI Skills. The report shows that technological adoption continues expanding, while skills availability remains the #1 barrier to that adoption. Businesses and governments around the world are investing significantly in upskilling and reskilling programs, with a significant percentage of that investment going towards transitions into Data and AI Jobs. Demand for Data Analysts, Data Scientists, and AI Specialists is high, but the skills gap that needs to be addressed to successfully transition into those roles is large.

Some of the key findings:

Skills gaps continue to be high. This includes skills like critical thinking, analysis, problem-solving, and skills in self-management such as active learning, resilience, stress tolerance and flexibility. On average, companies estimate that around 40% of workers will require reskilling of six months or less, and 94% of business leaders report that they expect employees to pick up new skills on the job, a sharp increase from 65% in 2018.
Online learning is on the rise. There has been a four-fold increase in the number of individuals seeking out opportunities for learning online through their own initiative, a five-fold increase in employer provision of online learning opportunities to their workers and a nine-fold enrollment increase for learners accessing online learning through government programs.
The window of opportunity to reskill and upskill workers has become shorter. The share of core skills that will change in the next five years is 40%, and 50% of all employees will need reskilling.
The large majority of employers recognize the value of human capital investment. 66% of employers surveyed expect to get a return on investment in upskilling and reskilling within one year. Employers expect to offer reskilling and upskilling to over 70% of their employees, but employee engagement in those courses is lagging, with only 42% of employees taking up employer-supported reskilling and upskilling opportunities.

Over the past decade, a set of ground-breaking, emerging technologies has signaled the start of the Fourth Industrial Revolution. By 2025, the capabilities of machines and algorithms will be more broadly employed than in previous years, and the work hours performed by machines will match the time spent working by human beings. This augmentation of work will disrupt the employment prospects of workers across a broad range of industries and geographies, and we will see job growth in the ‘jobs of tomorrow’, such as roles at the forefront of the data and AI economy, as well as new roles in engineering, cloud computing and product development.

Technological Adoption

The past two years have seen a clear acceleration in the adoption of new technologies. Cloud computing, big data and e-commerce remain high priorities, following a trend established in previous years. However, there has also been a significant rise in interest in encryption, and a significant increase in the number of firms expecting to adopt robots and artificial intelligence. These new technologies are set to drive future growth across industries, as well as to increase the demand for new job roles and skill sets. Figure 1 shows technologies likely to be adopted by 2025 (by share of companies surveyed).

By 2025 the average estimated time spent by humans and machines at work will be at parity based on today’s tasks. Algorithms and machines will be primarily focused on the tasks of information and data processing and retrieval, administrative tasks and some aspects of traditional manual labor. The tasks where humans are expected to retain their comparative advantage include managing, advising, decision-making, reasoning, communicating and interacting.

Emerging Jobs

Similar to the last survey in 2018, the leading positions in growing demand are roles such as Data Analysts and Scientists, AI and Machine Learning Specialists, Robotics Engineers, Software and Application developers as well as Digital Transformation Specialists. However, job roles such as Process Automation Specialists, Information Security Analysts and Internet of Things Specialists are newly emerging among a cohort of roles which are seeing growing demand from employers. The emergence of these roles reflects the acceleration of automation as well as the resurgence of cybersecurity risks. Figure 2 shows the top 20 job roles in increasing demand across industries, with Data Analysts, Data Scientists, and AI Specialists ranked with the highest demand overall.

Figure 2 – Top 20 job roles in increasing demand across industries

These emerging jobs have been organized in clusters, and this report presents a unique analysis which examines key learnings gleaned from job transitions into those emerging clusters using LinkedIn and Coursera data gathered over the past five years. The main clusters are: Data and AI, Cloud Computing, Engineering, Content Production, Marketing, People and Culture, and Product Development and Sales. Figure 3 shows Data and AI roles organized according to the scale of each opportunity within the cluster.

Figure 3 – Data and AI Job Cluster

Emerging Skills

The ability of global companies to harness the growth potential of new technological adoption is limited by skills shortages. Figure 4 shows that skills gaps in the local markets and inability to attract the right talent remain among the leading barriers to the adoption of new technologies.

Figure 4 – Perceived barriers to the adoption of new technologies

Skill shortages are more acute in emerging professions. Business leaders consistently cite difficulties when hiring for Data Analysts and Scientists, AI and Machine Learning Specialists as well as Software and Application Developers.

To address skills shortages, companies are investing in upskilling and reskilling programs. However, employee engagement in those courses is lagging, with only 42% of employees taking up employer-supported reskilling and upskilling opportunities. There are also significant challenges in the number of skills that need to be developed, especially for emerging roles in Data Science and Artificial Intelligence. Figure 5 illustrates the skills gap that needs to be closed for individuals to transition into these roles, with Artificial Intelligence, NLP, Data Science and Signal Processing representing the largest number of skills to be developed for a successful transition.

Figure 5 – Typical skills gaps across successful job transitions

Furthermore, the report uses data from Coursera learners to estimate how far learners aiming to transition into Data and AI are from the optimal level of mastery, and quantifies the days of learning the average worker needs to reach that level (Figure 6).

Figure 6 – Top 10 skills by required level of mastery and time to achieve that mastery

Mastery score is the score attained by those in the top 80% on an assessment for that skill. Mastery gap is measured as a percentage representing the score among those looking to transition to the occupation as a share of the score among those already in the occupation.
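
The mastery gap definition above reduces to a simple ratio. Here is a minimal sketch in Python; the function name and the scores are hypothetical and chosen only for illustration, they are not taken from the report.

```python
def mastery_gap_pct(transition_score: float, incumbent_score: float) -> float:
    """Score of people looking to transition, as a percentage of the
    score of people already in the occupation (per the definition above)."""
    if incumbent_score <= 0:
        raise ValueError("incumbent_score must be positive")
    return 100.0 * transition_score / incumbent_score


# Hypothetical assessment scores, not figures from the report.
gap = mastery_gap_pct(transition_score=45.0, incumbent_score=60.0)
print(f"Mastery gap: {gap:.1f}%")
```

Under this definition, a value closer to 100% means that learners looking to transition already score nearly as well as people working in the occupation.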

In conclusion, technological adoption continues expanding, and skills availability remains the #1 barrier to that adoption. Businesses and governments around the world are investing significantly in upskilling and reskilling programs, with a significant percentage of that investment going towards transitions into Data and AI Jobs. Demand for Data Analysts, Data Scientists, and AI Specialists is high, but the skills gap that needs to be addressed to successfully transition into those roles is large.

Data Science is an emerging field, but it is definitely not a new field. Yet, many people still struggle to define Data Science as a field, and more importantly, struggle to define the set of skills that collectively define a “Data Scientist”.

What is data science?

Data Science is a cross-disciplinary set of skills found at the intersection of statistics, computer programming, and domain expertise. Perhaps one of the simplest definitions is illustrated by Drew Conway’s Data Science Venn Diagram (Figure 1), first published on his blog in September 2010. Discussions about this field, however, go as far back as 50 years. If you are interested in learning more about the history of the Data Science field, you can read it in the 50 Years of Data Science paper written by David Donoho.

Figure 1 – Drew Conway’s Data Science Venn Diagram

The bottom line is that Data science comprises three distinct and overlapping areas: a set of math and statistics knowledge which provides the ability to understand and model datasets, a set of computer programming/hacking skills to leverage algorithms that can analyze and visualize data, and the domain expertise needed to ask the right questions, and put the answers in the right context.

It is important to call attention to the “Danger Zone” above, as there is nothing more dangerous than aspiring Data Scientists who do not have the appropriate math and statistical foundation to model data.

What skills define the role of Data Scientists?

A Data Scientist is not just a computer programmer, or just a statistician, or just a business analyst. In order to be a data scientist, individuals need to acquire knowledge from all these disciplines, and at the minimum develop skills in the following areas:

1. Probability, Statistics, and Math foundation. This includes probability theory, sampling, probability distributions, descriptive statistics (measures of central tendency and dispersion, etc.), inferential statistics (correlations, regressions, the central limit theorem, confidence intervals, development and testing of hypotheses, etc.) and linear algebra (working with vectors and matrices, eigenvectors, eigenvalues, etc.).

2. Computer Programming. Throughout the years, SAS has probably been the most commonly used programming language for Data Science, but adoption of the open source languages Python and R has increased significantly. If you are starting today to acquire data science skills, my recommendation would be to focus on Python. Looking at worldwide searches on Google for “R Data Science” and comparing them to “Python Data Science”, the trends are clear (Figure 2). Interest in Python has surpassed R, and continues on a positive trend. This makes sense given that Python allows you to create models and also to deploy them as part of an enterprise application, so within the same platform data scientists and app developers can work together to build and deploy end-to-end models. R, while easier in some cases for modeling purposes, was not designed as a multi-purpose programming language.

Figure 2 – Worldwide searches for “R Data Science” vs. “Python Data Science”. Google Trends (June 2018)

3. Data Science Foundation. This involves learning what data science is and its value in specific use cases. It also involves learning how to formulate problems as research questions with associated hypotheses, and applying the scientific method to business problems. Data Science is an iterative process, so it is critical to have a solid understanding of the methodologies used in the execution of this iterative process (Define the problem, Gather Information, Form hypothesis, Find/Collect data, Clean/Transform data, Analyze Data and Interpret Results, Form new hypothesis).

Figure 3 – Data Science Iterative Cycle

4. Data Preparation/Data Wrangling. Data is by definition dirty, and before it can be analyzed and modeled, it needs to be collected, integrated, cleaned, manipulated and transformed. Although this is the domain of “Data Engineers”, Data Scientists should also have a solid understanding of how to construct usable, clean datasets.

5. Model Building. This is the core of the data science execution, where different algorithms are used to train models with data (structured and unstructured) and the best algorithm is selected. At this stage, data scientists need to make basic decisions around the data such as how to deal with missing values, outliers, unbalanced data, multicollinearity, etc. They need to have solid knowledge of feature selection techniques (which data to include in the analysis), and be proficient in the use of techniques for dimensionality reduction such as principal component analysis. Data scientists should be able to test different supervised and unsupervised algorithms such as regressions, logistic regressions, decision trees, boosting, random forest, Support Vector Machines, association rules, classification, clustering, neural networks, time series, survival analysis, etc. Once different algorithms are tested, the “best” algorithm is selected using different model accuracy metrics. Data scientists should also be skilled in data visualization techniques, and should have solid communication skills to properly share the results of the analysis and the recommendations with nontechnical audiences.

6. Model deployment. A very important part of building models is to understand how to deploy those models for consumption from a data application. While this is typically the domain of machine learning engineers and application developers, data scientists should be familiar with the different methods to deploy models.

7. Big Data Foundation. A lot of organizations have deployed big data infrastructure such as Hadoop and Spark. It is important for data scientists to know how to work with these environments.

8. Soft Skills. Successful data scientists should also have the following soft skills:

a. Ability to work in teams. Because of the inter-disciplinary nature of this field, it is by definition a team sport. While every data scientist on a team will need a good foundation in all the skills defined above, the depth of skills will vary among them. This is not a field for individualistic stars, but a field for natural team players.

b. Communication Skills. Data scientists need to be able to explain the results of their analysis and the implications of those results in nontechnical terms. The best analysis can go to waste if it is not properly communicated.
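
To make the Model Building step (item 5 above) more concrete, here is a minimal sketch in Python of handling missing values, testing more than one algorithm, and selecting the “best” one with an accuracy metric on held-out data. Everything in it is an assumption made for illustration: the synthetic data, the mean-imputation strategy, the two candidate models, the choice of RMSE as the metric, and all function names. It uses only the standard library; in practice a library such as scikit-learn would typically be used instead.

```python
import math
import random
from statistics import mean


def impute_missing(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def fit_mean_baseline(xs, ys):
    """Baseline model: always predict the training mean of y."""
    avg = mean(ys)
    return lambda x: avg


def fit_linear(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return lambda x: intercept + slope * x


def rmse(model, xs, ys):
    """Root mean squared error of a model's predictions."""
    return math.sqrt(mean([(model(x) - y) ** 2 for x, y in zip(xs, ys)]))


# Synthetic data (an assumption for illustration): y = 2x + 1 plus noise.
rng = random.Random(42)
data = [(float(i), 2.0 * i + 1.0 + rng.gauss(0.0, 1.0)) for i in range(40)]

# Hold out every fourth point for evaluation; train on the rest.
test = [p for i, p in enumerate(data) if i % 4 == 0]
train = [p for i, p in enumerate(data) if i % 4 != 0]

# Simulate two missing feature values in the training set, then impute them.
train_x = impute_missing([None if x in (5.0, 22.0) else x for x, _ in train])
train_y = [y for _, y in train]
test_x = [x for x, _ in test]
test_y = [y for _, y in test]

# Train each candidate and score it on the held-out data.
candidates = {"mean baseline": fit_mean_baseline, "linear regression": fit_linear}
scores = {
    name: rmse(fit(train_x, train_y), test_x, test_y)
    for name, fit in candidates.items()
}
best = min(scores, key=scores.get)

for name, score in scores.items():
    print(f"{name}: held-out RMSE = {score:.2f}")
print("selected:", best)
```

Scoring on data that was not used for training is the important design choice here: a model chosen by its error on the training data alone can look better than it really is.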

Last but not least, it is important to remember that the most important characteristic of great data scientists is CURIOSITY. Data Scientists should be relentless in their search for the best data and the best algorithm, and should also be lifelong learners as this field is advancing very rapidly.

In summary, if you are interested in the Data Science field, or if you are exploring ways to develop your skills, make sure that you are addressing all these areas, and especially make sure not to end up in the danger zone having programming skills and domain knowledge but lacking the math and statistics foundation needed to model data correctly.


