Data science skills in the energy sector: closing the gap - Elisa Evans

Comment by Elisa Evans, Digital & Data Consultant at Energy Systems Catapult.

A recent report produced by Dr Stephen Haben and me took an introductory look at the data science skills landscape across the energy sector. The report considered some of the tools and techniques deployed by the industry, took a cursory look at types of teams, and examined considerations of leadership.

Perhaps the most pertinent finding of the report was the struggle organisations have in recruiting the necessary talent and in training staff in the skills required to implement operational data science models. We observed that just under 40% of the organisations in the report found it difficult to recruit the necessary data science skills.

These results were taken from organisations which already had some degree of data science resource, which means the struggle is likely to be much more pronounced amongst less data mature companies in the energy sector.

With the digitalisation of the energy sector, data science is going to be a fundamental component for enabling Net Zero and will be a requirement for any organisation that wishes to be competitive in the sector. Despite the growing demand for data scientists, there is a shortage of suitably skilled candidates. We made the following recommendations to address this shortage:

  1. Enabling training for future data scientists. Prospective data scientists must have all the resources and training available to them to ensure they can be immediately valuable to the energy sector.
  2. Upskilling: many organisations don’t have the time to keep their teams up to date with the latest methods and technologies. There need to be easily accessible resources so that cutting-edge research and development in data science can be identified and used across the sector.
  3. Reskilling: as well as upskilling the workforce, there will be a need to repurpose existing skills. Data science plays a crucial role in leveraging what we know from existing vectors and can learn from early adoption of new technology. We will need a workforce that adapts, continually develops data science approaches, and embraces digital tools (such as digital twins) to accelerate us to Net Zero.
  4. Support building data science/analytics teams: recruiting, upskilling and reskilling are irrelevant if the frameworks and infrastructure are not in place for data scientists to flourish and optimise their outputs. This includes understanding the skills needed within the organisation, choosing the leadership appropriately, and ensuring that the team has the right mix of personnel.

There were four attributes which were particularly challenging to find when recruiting new data science employees, even for organisations who reported little difficulty in recruiting.

These were:

  • Seniority and practical experience delivering solutions to business problems
  • Advanced skills in data science
  • Advanced coding and software development skills
  • Getting candidates with sufficient domain knowledge

Below we consider these attributes in further detail and share additional resources.

Seniority (Problem Specification and Solution Design)

Many data science teams were relatively new (created in 2010 or after) and their leads tended to have less than five years’ experience in the role. Interestingly, although technical skill was acknowledged as a valuable attribute for a lead, many soft skills were also highlighted as desirable, such as communication, stakeholder management, troubleshooting, decisiveness and strategic thinking. Because these skills are transferable, they may be acquired through previous experience in other sectors, but ideally they are blended with technical skill to enable optimal leadership.

In particular, one of the challenges is finding employees who have sufficient knowledge of data science to define business problems effectively and then deliver a solution that really addresses the business need. It is widely recognised that the most common reasons for the failure of data science projects relate more to alignment with business needs and navigation of business decision-making processes than to technical limitations of the project.

Resources

The following may be helpful in accelerating the development of these skills:

  • Project lifecycle tools for Data Science – for example this detailed Data Science Project Checklist
  • Guides that help place Data Science within a wider business context – for example Google’s Rules of ML
  • General training on skills like requirements gathering, stakeholder management, problem structuring, design thinking, project management and solution architecture.

Advanced skills in data science

Fundamental data science skills such as data analytics, visualisation techniques, feature engineering and model development are quite common, and those with rudimentary data analysis experience can be upskilled in them fairly quickly. However, more complex skills and knowledge of advanced techniques are more difficult to come by. As demonstrated by recent tools such as ChatGPT, AI and machine learning are rapidly developing fields, and it is difficult to keep up with the latest technologies and methods such as reinforcement learning. Keeping up requires access to the latest learnings and to the data necessary to train these increasingly complex models.

Universities are a primary source of the latest techniques and research. However, as covered in our “Data Science: From Academia to Industry” report, such learnings are often behind paywalls or are not written in a digestible way. Further, without open code and data, it is difficult to validate the models or understand the nuances of the methodologies deployed. Universities therefore need to encourage more open science, which promotes accessibility and reproducibility.

It should be noted that many of the new tools, such as the Stable Diffusion model which underlies the impressive text-to-image models, are being shared openly, supporting uptake within the data science community. Further, academia is no longer the sole gatekeeper of the latest developments in AI and machine learning: major tech organisations are now at the forefront of some of the most powerful models and research. This does mean that many advancements may not be shared (even in paywalled journals) due to commercial risks and IP concerns.

Resources

The following are some of the ways to support the development of new skills and knowledge of data scientists:

  • Encourage research to be shared openly to promote reproducibility and accessibility. See for example the Turing Way guide on open science.
  • Academics and researchers should aim to publish in open access journals and share their findings with the wider industry at events and conferences.
  • Data science research should be shared with code to enable replicability and more rigorous science; for example, consider Papers with Code or the Open Sustainable Technology.
  • Data science challenges hosted on websites such as Kaggle should be utilised to develop skills and identify new advanced methodologies.
  • Webinars and conferences, for example our Value in Energy Data seminars.

Advanced Coding and Software Development Skills

Programming advanced machine learning algorithms for research or exploration is a different capability from that required by operational organisations that need to implement reliable business products. As the respondents in the survey noted, “it was difficult to find someone who can perform the entire data science pipeline from data analysis to model building, productionising, testing, and evaluation”. Although such skills can be taught on the job, this can require a significant investment of time from the organisation.

It may be suggested that academia could be one resource for filling the data science skills gap, but as shown in our recent supplementary report, “Data Science: From Academia to Industry”, many data science master’s courses are inadequate for teaching advanced data science coding skills.

Software development skills require an understanding of the latest technology stacks being deployed, as well as fundamental best practices such as code review, unit testing, and even proper commenting. These are not currently taught within university data science courses, which tend to focus only on the machine learning models themselves, with a view to their use in research rather than operations.

The following are some resources which may help current and future data scientists develop their coding skills:

  • Skills bootcamps for coding and software development skills.
  • Online courses such as Coursera and Udemy.
  • Gain experience with version control platforms such as GitHub, GitLab, and Bitbucket, which are cornerstones of collaborative software development.
  • Utilise ready-made solutions to create well-defined, standardised pipelines and templates for data-centric projects. For example, Cookiecutter Data Science provides a project structure with code templates focused on data analytics.

Note, there are other resources as detailed in the Data Science: From Academia to Industry report, especially the one on Code Development for Academics.

Domain Knowledge

Although perhaps something that can be learnt on the job, domain knowledge can help speed up the development of models and provide deeper insights into the data and associated processes than general data science skills alone. Such knowledge can better isolate the causes of different model behaviours, help quickly identify model constraints and limitations, and help develop informed features and inputs to feed the models.

Learning data science skills as well as gaining sufficient knowledge about the energy sector can be complicated, even with dedicated courses in these topics. Further, although some university data science courses are focused on particular sectors, such as UCL’s Energy Systems and Data Analytics course, they are relatively few and insufficient to meet the increasing needs of the energy industry.

Hence there is a need for fundamental energy networks information to be available, especially with links to the core data-centric applications and data science products.

Resources

Now that we have started to understand the data science skills gap in the sector, we hope that it provides a foundation to build upon and investigate further. Our recommendations are just a starting point, but they will help the energy sector move forwards in the right direction.

If you’d like to find out more, or to discuss how we can investigate these challenges further, get in touch.

Read the Report

Data Science Skills in the Energy Sector: Survey Results