Reinforcement Learning: A deep-dive with digiLab – Dr Charlotte Avery

Comment by Charlotte Avery, Data Scientist at Energy Systems Catapult.

Decarbonising our energy consumption in a timely manner is critical. This is complicated by many factors, not least the fact that renewable generation is intermittent.

Today, keeping the lights on relies on National Grid performing a continuous balancing act between electricity supply and demand at a national scale. In the future, we might well see the development of a more decentralised system, where electricity is generated, bought, traded, stored, and consumed in ‘local energy markets’, and the balancing act is managed across many sub-sections of the network simultaneously.

This setup is advantageous in a future where the uptake of low-carbon technologies (which run on electricity rather than gas) is high, making the balancing act on a centralised grid system very challenging.

Either way, when there is not enough supply to meet demand, we can't simply turn on the wind or the sun to generate more green energy, so we end up firing up polluting coal power stations instead.

To facilitate the UK's acceleration towards Net Zero, we can optimise the amount of energy we use: reducing energy waste, taking energy from the grid when the penetration of renewables on the transmission system is high, and using less energy (or discharging batteries to supply it) when there is not enough green energy available. Furthermore, in a decentralised system, we can optimise the energy flow between local systems and the storage, distribution, and demand within these individual systems.

Automated solutions using Artificial Intelligence (AI) and Machine Learning (ML) can play an important role in helping us optimise our energy use in exactly this way. Reinforcement Learning (RL) – a particular type of ML technique – has gained increasing interest from tech innovators and the data science community owing to its astonishing successes: beating world champions at the game of Go, and playing a key role in ChatGPT, the chatbot that is transforming the way we use the internet.

Reinforcement Learning and decarbonisation

Perhaps we should be using RL to tackle arguably the hardest challenge on Earth right now – decarbonising energy consumption to mitigate the devastating effects of climate change.

RL has been found to be a promising solution for optimised decision-making and control; in other words, RL algorithms can act as a controller for a process – for example, controlling the charging and discharging of an energy storage battery over time. But using RL for such solutions is non-trivial, and there are many complications along the algorithm development journey.
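
To make the 'RL as controller' idea concrete, here is a minimal sketch of what a battery-control environment might look like in Python. The reset/step interface mirrors the common Gym convention, and the capacity, carbon-intensity profile, and reward shaping are illustrative assumptions rather than details from any real deployment:

```python
import numpy as np

class BatteryEnv:
    """Minimal, hypothetical battery-control environment (Gym-style)."""

    def __init__(self, capacity_kwh=10.0, max_rate_kw=2.0, horizon=24):
        self.capacity = capacity_kwh
        self.max_rate = max_rate_kw
        self.horizon = horizon  # one day at hourly resolution
        # Assumed grid carbon-intensity profile (gCO2/kWh), varying over the day
        self.carbon = 200.0 + 150.0 * np.sin(np.linspace(0.0, np.pi, horizon))

    def reset(self):
        self.t = 0
        self.soc = 0.5 * self.capacity  # state of charge (kWh)
        return self._state()

    def _state(self):
        # Observation: normalised state of charge and time of day
        return np.array([self.soc / self.capacity, self.t / self.horizon])

    def step(self, action):
        # action in [-1, 1]: -1 = discharge at full rate, +1 = charge at full rate
        flow = float(np.clip(action, -1.0, 1.0)) * self.max_rate
        flow = float(np.clip(flow, -self.soc, self.capacity - self.soc))
        self.soc += flow
        # Charging (flow > 0) is penalised in proportion to how carbon-intensive
        # the grid currently is; discharging at dirty times earns credit.
        reward = -flow * self.carbon[self.t] / 1000.0
        self.t += 1
        done = self.t >= self.horizon
        return self._state(), reward, done, {}
```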

Progress in the RL-energy space is slowed on two fronts: companies with RL skillsets want to apply them in the energy sector but don't know which problems RL is best suited to, while the energy innovation sector often lacks the skills required to understand and implement RL-based solutions.

To help bridge this gap, I investigated example use cases of Reinforcement Learning for energy solutions:

  • Carbon Re are using RL to reduce carbon emissions in cement production.
  • E.ON are using RL to determine the optimal placement of wind turbines in wind farms.
  • digiLab are using RL to determine the optimal locations for solar panels on rooftops in local areas.
  • Microsoft's Project Bonsai used RL to improve the energy efficiency of HVAC systems, uncovering recommendations that ran counter to what a human operator would assume.
  • digiLab are using RL to optimise room temperature control in their ‘TwinCity’ solution.
  • Debmalya Biswas published a report on a successful implementation of RL-based HVAC control in a factory in Switzerland.
  • Google used RL to achieve a 40% improvement in the energy efficiency of their data centre cooling.
  • Phaidra, a company founded by the team behind Google's data centre improvements, focuses on providing RL-based solutions for optimising industrial systems.

How RL is being successfully applied in these scenarios, and where RL shows potential for future innovation in the energy space, is outlined in detail in our white paper.

In conversation with digiLab

To get more insight into how innovation is happening in the RL-energy space, I spoke to digiLab about how they are taking advantage of RL to develop cutting-edge solutions in the energy sector:

digiLab are using deep-RL (i.e., RL agents combined with neural networks) methodologies to design the distribution of solar panels across rooftops in local areas, and to design flexible energy management systems for commercial buildings.

The advantage of using RL for these problems is that RL can find solutions in large state spaces with time-varying components, and is capable of integrating quite sporadic data types.

In other words, RL can be useful in scenarios where you have lots of varying parameters such as roof angle, panel facing direction, physical placement etc., and variation over time in factors like strength of solar energy and the development of the built environment.
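
As a toy illustration of how such parameters might enter an RL problem, the observation an agent sees could simply concatenate static roof features with time-varying ones. Everything here (names, scales, normalisations) is a hypothetical sketch, not digiLab's actual representation:

```python
import numpy as np

def encode_state(roof_pitch_deg, azimuth_deg, free_area_m2,
                 irradiance_kwh_m2, month):
    """Hypothetical observation vector for a rooftop-solar placement agent."""
    return np.array([
        roof_pitch_deg / 90.0,            # roof angle, normalised to [0, 1]
        np.sin(np.radians(azimuth_deg)),  # panel facing direction, encoded
        np.cos(np.radians(azimuth_deg)),  #   without a discontinuity at 0/360
        free_area_m2 / 100.0,             # remaining physical placement space
        irradiance_kwh_m2 / 8.0,          # strength of solar energy (seasonal)
        month / 12.0,                     # time-varying component
    ], dtype=np.float32)
```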

One problem with RL is the cost of training. RL agents learn optimal control strategies by interacting with an external environment through a ‘trial-and-error’ approach (analogous to a child exploring and learning in their surrounding world).
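
As a rough sketch of that trial-and-error loop, the snippet below reuses the hypothetical BatteryEnv from earlier; the RandomAgent is a placeholder for a real deep-RL agent, whose learn() method would update a neural network from each transition:

```python
import numpy as np

class RandomAgent:
    """Placeholder for a deep-RL agent: acts randomly and learns nothing."""

    def act(self, state):
        return np.random.uniform(-1.0, 1.0)  # exploratory action in [-1, 1]

    def learn(self, state, action, reward, next_state, done):
        pass  # stand-in for a gradient update from experience

env = BatteryEnv()      # hypothetical environment sketched above
agent = RandomAgent()
for episode in range(1000):          # trial and error over many simulated days
    state, done = env.reset(), False
    while not done:
        action = agent.act(state)
        next_state, reward, done, _ = env.step(action)
        agent.learn(state, action, reward, next_state, done)
        state = next_state
# Only after this simulated phase would the trained policy be deployed
# against real hardware.
```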

Because of this trial-and-error process, the training phase is particularly costly in terms of time and computational power. Agents are often trained in simulated environments ‘off-line’ prior to being deployed in the real world (which can be computationally difficult to do). I asked digiLab how this problem can be addressed:

High computational power is becoming more accessible and computing time is becoming cheaper. Because of this, data acquisition (e.g., using digital twins) and data cleaning will become less of an issue moving forward. One way we can improve RL algorithms is not only through human feedback (i.e., getting a human to intervene with the reward function) but also through anticipation feedback.

Anticipation feedback is essentially where you add in anticipation information (from pre-training) into the RL states to avoid aimlessly searching through paths.
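
As a rough sketch of that idea, anticipation feedback can be thought of as concatenating forecast information from a pre-trained model onto the raw state, so the agent doesn't have to rediscover future structure by blind trial and error. The forecast_model here is a hypothetical placeholder, not a named digiLab component:

```python
import numpy as np

def augmented_state(raw_state, forecast_model, horizon=6):
    """Append anticipation information (e.g. a pre-trained model's forecast of
    solar output or demand over the next few steps) to the agent's observation."""
    anticipation = forecast_model.predict(raw_state, steps=horizon)
    return np.concatenate([np.asarray(raw_state), np.asarray(anticipation)])
```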

Stepping back and thinking about the broad picture, what is the motivation for pursuing solutions in the RL-energy space, and how important will RL solutions be?

For renewable energy solutions there are a lot of design choices. When designing a solar-powered energy system, for example, you need to determine the best places to put transformers and solar panels. As well as dealing with a large state space, you have a time-varying component which becomes important, say, if you want to design a solution where you can sell solar energy back to the grid at optimal times. RL is an advanced solution for such problems.
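
As a hypothetical illustration of that time-varying component, a reward for an agent deciding when to sell solar energy back to the grid might weight energy flows by tariffs that change through the day (the function and tariff values below are assumptions for illustration only):

```python
def reward(exported_kwh, imported_kwh, t, export_price, import_price):
    """Hypothetical reward: earn the export tariff for energy sold back to
    the grid at hour t, pay the import tariff for energy drawn from it."""
    return exported_kwh * export_price[t] - imported_kwh * import_price[t]

# Illustrative tariffs (£/kWh) with an evening export peak: the agent is
# rewarded for learning to export when export_price is high.
export_price = [0.04] * 16 + [0.15] * 4 + [0.04] * 4
import_price = [0.30] * 24
```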

Another example is optimising the control parameters in nuclear fusion reactors; RL can be used as a method for determining feature importance, which can inform and improve human-managed control. Furthermore, stakeholders in the energy industry are going to have so much data to process, and the challenge will be to do this sensibly so humans can make decisions from it. RL adds value here as it is all about optimal control – making the most of what you have.

I asked digiLab what their advice would be for companies who are looking to develop new solutions in the RL-energy space, as they are doing at present.

Outsource your AI problems – starting from zero is expensive! As more AI and ML tools are developed, we should be using and adapting these tools for new innovations.

Lessons learned

  • Reinforcement Learning is most advantageous to solve complex problems with a huge number of parameters and time-varying components, especially where many factors depend on each other in non-trivial ways, and when you have sporadic data types.
  • There are multiple success stories of using RL to improve energy efficiency and cut carbon emissions in industrial and commercial settings, and there is ongoing development in this area.
  • Reinforcement Learning operations (that is, the underlying model development, environment development, and testing and validation methods) are not very mature – there are not loads of sophisticated resources out there to aid the development of RL solutions. Some RL operations challenges are addressed in the open-source deep-RL library AgileRL, and OpenAI Gym attempts to address the need for better benchmarks and the lack of standardised environments in RL development (the classic Gym interaction loop is sketched below). However, the Gym environments are focused on generic use cases of RL (such as game playing), and sophisticated environments for ‘real-world’ applications in the energy sector tend to be developed and kept in-house.
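
For reference, the classic OpenAI Gym interaction loop looks like the following (pre-0.26 Gym API; an energy-sector environment would expose the same reset/step interface but, as noted above, is typically kept in-house):

```python
import gym

env = gym.make("CartPole-v1")    # generic benchmark task, not an energy problem
observation = env.reset()
done = False
while not done:
    action = env.action_space.sample()   # random policy standing in for an agent
    observation, reward, done, info = env.step(action)
env.close()
```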

Overall, Reinforcement Learning is an exciting prospective solution to complex control and optimisation challenges; however, deployment doesn't come without its difficulties.

Many of these hurdles are addressed in our white paper, which also explores how RL compares to alternative, competing methods, as well as the risks that come with RL implementations (including where and, importantly, where not to apply RL).

Read the report: Prospects for Reinforcement Learning
