Why developing household level energy forecasts is difficult: the double penalty effect – Dr Stephen Haben

Comment by Dr Stephen Haben, Senior Data Scientist at Energy Systems Catapult.

Smart meters provide multiple opportunities for energy and cost saving products and services that can help the UK meet its Net Zero ambitions. They provide half-hourly measurements of individual household consumption, which gives suppliers better visibility of how and when occupants use their energy (Figure 1).


Figure 1: Illustration of smart meter usage and how different demands are produced from various appliances.

Advanced applications that smart meters can enable in combination with controllable tech (Figure 2) include:

  • Smart, dynamic tariffs: new tariffs can respond to real-time electricity prices, providing occupants the opportunity to shift their demand to save money, and reduce network strain.
  • Demand side response (DSR): households can sign up to DSR schemes where they are paid to turn their usage up or down in response to network needs.
  • Smart storage control: batteries (either standalone or utilised from an electric vehicle) can help shift demand or better utilise renewable solar energy by charging when the sun is shining and demand is low, and discharging when demand and costs are high and there is lower renewable generation.
  • Intelligent heating systems: by learning occupants’ preferences and responses to outdoor conditions, smart heating systems can save money, while not compromising customers’ comfort levels.

Figure 2. Illustration of a smart home and various controllable appliances and devices.

Each of these applications (and more) requires, to varying degrees, an accurate estimate of the household’s near-term future energy usage. Devices cannot respond appropriately without anticipating how this usage is likely to change. However, accurately creating and assessing household level forecasts is not trivial.


Figure 3. Transmission system demand. Picture generated using National Grid ESO Data Portal

Demand forecasting has been implemented for decades but primarily at the national level, with millions of consumers (commercial and domestic) all bundled together (Figure 3). As a result, national demand is relatively smooth and regular, since the uncertainties and deviations in individual demand typically cancel each other out. Contrast this with household level demand (Figure 1), which is spikey and can vary from day to day due to the dependence on each occupant’s behaviour, the different household appliances, and the decisions and responses to weather conditions and other events.

To illustrate this irregularity, consider a scenario where the sole occupant has reasonably similar weekday behaviour. They wake up at 7AM every day, use the electric shower, cook breakfast and then go to work. They return at around 5PM, when the heating is timed to come on, plug in their electric car so that it is charged for the following day, and cook their evening meal. This behaviour will result in an electricity profile something like the blue line shown in Figure 4, with the main peak of the day in the evening (like many households).


Figure 4. Typical weekday electricity profile for our illustrative occupant (blue), and their shifted late profile (pink)

Now suppose that, for some reason (say they forget to set their alarm), on one random day a week they wake up at 8AM and their entire day is shifted by about an hour: they arrive at work an hour late and therefore come home an hour late, resulting in a shifted peak demand. This is illustrated in pink in Figure 4. The question is, how do we create a prediction for this household’s energy usage so that they can benefit from the services outlined above?

An obvious solution is to use the blue profile as the forecast; that way, four out of the five weekday profiles are predicted very accurately. But what about the fifth day? Since we don’t know on which day the late profile will occur, we can’t use the pink profile. If we use the blue profile on that day, our forecast will in fact be scored as terrible. Why? Well, the way forecasters measure the accuracy of a forecast is to take the difference between the forecast and the actual demand at each time step of the day and then aggregate these differences to produce a final ‘error score’. This is where the double penalty effect comes in. If we forecast a peak and miss its timing, even only slightly, it will produce relatively large errors. This is because you get one large error for missing the actual peak, and another large error for predicting a peak which isn’t there. This is illustrated in Figure 5. Even more concerning, the score would be much smaller if instead you had simply forecast a flat line, since you only get the one large penalty.


Figure 5. Illustration of the double penalty effect on the peaks in Figure 4. The two arrows indicate the large contributions from the peak errors. The first error is caused by missing the actual peak, and the second large error is from predicting a peak which didn’t occur.
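
To make the arithmetic behind the double penalty concrete, here is a minimal Python sketch. The profile shape, the 0.5 kW base load, the 3 kW peak and all function names are invented for illustration; they are not taken from real smart meter data. The sketch scores a correctly shaped forecast whose peak is one hour early, and a flat forecast at the daily mean, against the same ‘late’ day using two common point-wise error measures.

```python
def make_profile(n_steps, peak_start, peak_width=2, base_kw=0.5, peak_kw=3.0):
    """Toy daily profile: constant base load plus one rectangular peak."""
    return [peak_kw if peak_start <= t < peak_start + peak_width else base_kw
            for t in range(n_steps)]


def mean_absolute_error(forecast, actual):
    """Average of the absolute differences at each time step."""
    return sum(abs(f - a) for f, a in zip(forecast, actual)) / len(actual)


def root_mean_squared_error(forecast, actual):
    """Square root of the average squared difference at each time step."""
    squared = sum((f - a) ** 2 for f, a in zip(forecast, actual))
    return (squared / len(actual)) ** 0.5


N_STEPS = 48  # half-hourly readings in one day

# Assumed toy data: the usual peak starts at 5PM (step 34), the late one at 6PM.
actual_late = make_profile(N_STEPS, peak_start=36)
peaky_forecast = make_profile(N_STEPS, peak_start=34)
flat_forecast = [sum(peaky_forecast) / N_STEPS] * N_STEPS  # daily mean, no peak

mae_peaky = mean_absolute_error(peaky_forecast, actual_late)
mae_flat = mean_absolute_error(flat_forecast, actual_late)
rmse_peaky = root_mean_squared_error(peaky_forecast, actual_late)
rmse_flat = root_mean_squared_error(flat_forecast, actual_late)

print(f"MAE   peaky forecast: {mae_peaky:.3f}   flat forecast: {mae_flat:.3f}")
print(f"RMSE  peaky forecast: {rmse_peaky:.3f}   flat forecast: {rmse_flat:.3f}")
```

With these assumed numbers the flat forecast scores better on both measures, even though it says nothing about when the peak will occur. The peaky forecast is penalised at four time steps (two for the missed peak, two for the phantom peak), whereas the flat forecast is penalised heavily at only two.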

The scenario above is of course only one way that a peak may ‘shift’ from day to day. In fact, there are other unpredictable ways in which energy usage behaviour may change: traffic delays, shift changes, after work activities, etc. The real situation is therefore likely to be much more complicated than the one above. Even so, it suggests that the double penalty effect can occur frequently enough to be disruptive.

Why does it matter?

The double penalty errors are likely to proliferate, especially in spikey data where the high magnitude peaks are likely to move around. The effect has long been observed in weather forecasting, where the amount of rain may be precisely predicted, but the position may be slightly wrong. The double penalty effect in household electricity forecasts was first described in this paper. The authors compared hundreds of public smart meters and found that a flat forecast was better than a more realistic spikey forecast for 35% of the households. This suggests that the double penalty effect is likely to be quite prevalent and to have serious implications for many of the household applications discussed above.


Figure 6. Top picture shows energy flows from grid and battery to household. Bottom picture shows the state of charge (SoC) of the battery. The forecast informs the battery when to charge from the grid (pink) increasing the energy within the battery. From the forecast estimate of the peak, the battery then discharges (blue) and the house therefore uses less energy from the grid. The total energy from the grid is represented by the grey plus pink bars on the top graph.

To illustrate one example, consider using a forecast to schedule a battery storage device. The forecast can help prepare the battery to charge and discharge at the most appropriate times, reducing the peak and using energy at times of lowest demand (see Figure 6). An incorrect forecast can mean using the battery inefficiently or, worse, can create a larger peak than originally generated (if it charges when a grid demand peak occurs).

Now, consider using the pink forecast in Figure 4, which predicts the correct magnitude but has estimated the timing slightly incorrectly. Despite this, the forecast still provides useful information: the battery can still be prepared, since it is known that a peak is coming, albeit without knowing the precise timing. In contrast, if we tried to use the flat forecast, we would have no information concerning the distribution of demand and could not use it to optimise the battery device.

The problem is that the standard error metrics will have told us that the flat forecast is the best one and, if used on their own, will select a method which is completely inappropriate for the application being considered.
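
The contrast between the metric and the application can be sketched in a few lines of Python. Everything here is a simplifying assumption made for illustration: the toy profiles, the battery size (3 kWh, 2 kW), the rule of discharging evenly over a window around the forecast peak, and the function names. Charging, losses and tariffs are ignored.

```python
STEP_HOURS = 0.5  # half-hourly time steps


def make_profile(n_steps, peak_start, peak_width=2, base_kw=0.5, peak_kw=3.0):
    """Toy daily profile: constant base load plus one rectangular peak."""
    return [peak_kw if peak_start <= t < peak_start + peak_width else base_kw
            for t in range(n_steps)]


def schedule_discharge(forecast, energy_kwh, max_power_kw, margin_steps):
    """Spread the stored energy evenly over a window around the forecast peak.

    The window covers every step where the forecast exceeds its own mean,
    widened by `margin_steps` on each side to hedge against timing errors.
    A flat forecast has no such steps, so no discharge is scheduled.
    """
    mean = sum(forecast) / len(forecast)
    peak_steps = [t for t, f in enumerate(forecast) if f > mean + 1e-9]
    schedule = [0.0] * len(forecast)
    if not peak_steps:
        return schedule
    start = max(min(peak_steps) - margin_steps, 0)
    end = min(max(peak_steps) + margin_steps, len(forecast) - 1)
    n_window = end - start + 1
    power = min(max_power_kw, energy_kwh / (n_window * STEP_HOURS))
    for t in range(start, end + 1):
        schedule[t] = power
    return schedule


def grid_peak(actual, discharge):
    """Largest half-hourly demand drawn from the grid after battery discharge."""
    return max(max(a - d, 0.0) for a, d in zip(actual, discharge))


N_STEPS = 48
actual_late = make_profile(N_STEPS, peak_start=36)      # peak arrives an hour late
peaky_forecast = make_profile(N_STEPS, peak_start=34)   # right size, wrong time
flat_forecast = [sum(peaky_forecast) / N_STEPS] * N_STEPS

peak_with_peaky = grid_peak(
    actual_late, schedule_discharge(peaky_forecast, 3.0, 2.0, margin_steps=2))
peak_with_flat = grid_peak(
    actual_late, schedule_discharge(flat_forecast, 3.0, 2.0, margin_steps=2))

print(f"Grid peak using peaky forecast: {peak_with_peaky:.2f} kW")
print(f"Grid peak using flat forecast:  {peak_with_flat:.2f} kW")
```

Under these assumptions the mistimed forecast still lowers the grid peak from 3 kW to 2 kW, because the widened discharge window happens to cover the late peak, while the flat forecast leaves the peak untouched. A real controller would be considerably more sophisticated, but the ranking is the opposite of the one given by the point-wise error scores.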

Of course, the actual challenge is more complicated than the above. For one, the flat forecast is obviously a very extreme benchmark to compare to, and other, more powerful, methods are likely to be applied. However, this makes selecting an appropriate forecast model more complicated, since it is much harder to reject a reasonable looking forecast (unlike the flat one). In addition, creating a forecast in the first place usually depends on identifying patterns in the historic data. This is more difficult if the history isn’t consistent from one day to the next (for example, the peaks shifting by an hour or two).

What can we do about it?

This poses the obvious question: what can we do about it? If our traditional methods for selecting and scoring household-level forecasts are unreliable and can lead to suboptimal results in our applications, what are the options? Below we discuss some alternative approaches to mitigate or remove the double penalty effect.

However, before doing so, there are two fundamental principles which should be followed when dealing with smart meter data:

  1. Investigate the fundamental properties of the data. There is still much we don’t know about smart meter data (and low voltage level data, for that matter). Analysts have detailed knowledge of what works at system level from decades of investigation and experience. However, this doesn’t necessarily mean the same assumptions will apply at the distribution or household level. For example, recent work has shown that temperature may not be a significant driver of demand for low voltage feeders, despite it being a known driver of national level demand. One reason for this could be that heating is currently driven primarily by gas usage, although this could change as we move to electrified heating. Regardless, it highlights that thorough exploratory data analysis and assumption testing are vital to achieve high quality and accurate results.
  2. Understand the heterogeneity of the data. Since we are dealing with relatively volatile data consisting of very different numbers of connections and types of occupants, it is highly likely that there will be no one-size-fits-all solution to low voltage-level forecasting. It is almost certain that there will be a small proportion of households who have very stochastic energy usage and whose demand will therefore be impossible to predict. These will clearly require different approaches from those households who are completely predictable and for whom every week is virtually the same.

Notwithstanding the above points, there are some possible technical approaches which could mitigate the double penalty effect:

  • Create new error metrics: There are ways to reduce the double penalty effect. One such family of methods is “time-series matching” algorithms such as dynamic time warping, which aligns two profiles by shifting and/or stretching them. These are often used in speech recognition, for example, to show that recordings come from the same speaker, regardless of how fast they are talking. However, there are limitations to these types of approaches and there is often a subjective choice required to ensure their usefulness.
  • Adapt the application mechanisms: Rather than trying to create forecasts which are optimal according to a particular metric, instead adapt how recorded data is used within the application. For example, instead of producing a single forecast which we think represents the most likely scenario, we could create multiple possible forecasts, one for each type of scenario identified in the data (in the above example there are two types: the blue typical scenario and the pink late scenario). In the storage application a schedule could therefore be produced for each individual type so that all bases are covered and there are no surprises due to shifting peaks.
  • Probabilistic forecasts: If there is sufficient data, then rather than modelling individual profiles, a full probabilistic distribution can be estimated which describes the full range of possible values that the profile could take. Multivariate distributions can also incorporate the interdependencies between demand at different time steps ensuring that the possible demand scenarios are all captured (Figure 7). These are much more complicated to model and understand, but there are methods such as the Energy Score which can help assess and compare different types of forecasts.
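
As a rough illustration of the time-series matching idea in the first bullet, the sketch below implements the textbook dynamic time warping (DTW) recursion in plain Python. The toy profiles and the `max_shift` parameter name are assumptions made for this example; `max_shift` limits how many time steps one profile may be shifted relative to the other, and choosing it is one instance of the subjective choice mentioned above.

```python
def make_profile(n_steps, peak_start, peak_width=2, base_kw=0.5, peak_kw=3.0):
    """Toy daily profile: constant base load plus one rectangular peak."""
    return [peak_kw if peak_start <= t < peak_start + peak_width else base_kw
            for t in range(n_steps)]


def dtw_distance(a, b, max_shift=None):
    """Dynamic time warping distance with absolute-difference step cost.

    cost[i][j] holds the cheapest way of aligning the first i points of `a`
    with the first j points of `b`. If `max_shift` is given, point i may only
    be matched to points within `max_shift` steps of i.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        lo = 1 if max_shift is None else max(1, i - max_shift)
        hi = m if max_shift is None else min(m, i + max_shift)
        for j in range(lo, hi + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(cost[i - 1][j],      # stretch b
                                    cost[i][j - 1],      # stretch a
                                    cost[i - 1][j - 1])  # advance both
    return cost[n][m]


N_STEPS = 48
actual_late = make_profile(N_STEPS, peak_start=36)
peaky_forecast = make_profile(N_STEPS, peak_start=34)
flat_forecast = [sum(peaky_forecast) / N_STEPS] * N_STEPS

dtw_peaky = dtw_distance(peaky_forecast, actual_late)
dtw_flat = dtw_distance(flat_forecast, actual_late)
dtw_peaky_tight = dtw_distance(peaky_forecast, actual_late, max_shift=1)

print(f"DTW, any shift allowed:  peaky {dtw_peaky:.2f}   flat {dtw_flat:.2f}")
print(f"DTW, max shift 1 step:   peaky {dtw_peaky_tight:.2f}")
```

With unlimited warping the mistimed forecast is matched perfectly to the late day and now beats the flat forecast. Restricting the warp to a single half-hour step brings a penalty back, because the peak is two steps away. Unconstrained warping may also be too forgiving for applications where timing does matter, so the allowed shift would need to be chosen with the application in mind.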

Figure 7. Example of actual load (blue) and several forecast scenarios (grey). In reality there would be thousands of forecast scenarios to estimate the possible energy profiles.
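
To show how a score of this kind treats a set of scenarios, here is a small sketch of the commonly used sample estimate of the Energy Score: the average distance between each scenario and the observation, minus half the average distance between pairs of scenarios (lower is better). The toy week, the five-scenario ensemble and all names are assumptions for illustration; as the caption notes, a real ensemble would contain far more scenarios.

```python
import math


def make_profile(n_steps, peak_start, peak_width=2, base_kw=0.5, peak_kw=3.0):
    """Toy daily profile: constant base load plus one rectangular peak."""
    return [peak_kw if peak_start <= t < peak_start + peak_width else base_kw
            for t in range(n_steps)]


def euclidean(u, v):
    """Euclidean distance between two whole-day profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def energy_score(scenarios, actual):
    """Sample Energy Score of an ensemble of scenarios (lower is better)."""
    m = len(scenarios)
    accuracy = sum(euclidean(x, actual) for x in scenarios) / m
    spread = sum(euclidean(x, z) for x in scenarios for z in scenarios) / (2 * m * m)
    return accuracy - spread


def weekly_mean_score(scenarios, week):
    """Average Energy Score over the observed days in `week`."""
    return sum(energy_score(scenarios, day) for day in week) / len(week)


N_STEPS = 48
typical = make_profile(N_STEPS, peak_start=34)
late = make_profile(N_STEPS, peak_start=36)
flat = [sum(typical) / N_STEPS] * N_STEPS

week = [typical] * 4 + [late]  # assumed: four typical weekdays, one late day

score_scenarios = weekly_mean_score([typical] * 4 + [late], week)  # both types
score_single = weekly_mean_score([typical], week)                  # typical only
score_flat = weekly_mean_score([flat], week)                       # flat only

print(f"Scenario ensemble: {score_scenarios:.2f}")
print(f"Typical day only:  {score_single:.2f}")
print(f"Flat forecast:     {score_flat:.2f}")
```

In this toy week the ensemble containing both the typical and the late scenario scores best, the single typical-day forecast comes second, and the flat forecast comes last, which is the ordering the battery application would also prefer.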

Takeaways: gaps and needed actions

Data knowledge gaps

Data being collected at the household level and low voltage substations has new features and characteristics which have not been present in traditional datasets (such as national level demand).

Action: Take time to explore and understand the patterns, prominent features, and relationships in new data types. This will help you understand what the opportunities and limitations of this data are, and what adaptations or mitigations are required.

Data gaps

There are very few open datasets from smart meters or secondary substations, which hinders knowledge, limits deeper understanding and slows down innovation, especially in areas of low voltage and household level data analytics.

Actions: The changing of aggregated smart meter data to be presumed open is a significant stride forward for enabling further research into analytics, but there is still work to be done. The opening of individual smart meter data will be necessary to support the development of new, robust and resilient methodologies and techniques. Furthermore, it will be important to share and link these datasets to other valuable datasets, for example weather data, to ensure that the rich diversity and interdependencies are identified and captured.

Research gaps

As the examples above have demonstrated, there is not a simple correspondence between the measures and metrics that are commonly used and the performance of the applications. We need to be able to identify the most important features of our datasets within the context of the intended products and services, so the most detrimental effects are taken into account. Furthermore, we need to consider the wider interactions and interdependencies in this system, otherwise our applications will perform suboptimally.

Actions: In addition to the need for more data (see above), innovation will require new whole-systems led research and investigations to fill our knowledge gaps and ensure we can make the most of the data we are collecting and the digital technologies we are deploying. Energy companies will need to fund research and universities to move our knowledge forward, rapidly deploy the learnings into operations, and help us meet our Net Zero targets.

Community gaps

The double penalty is an example of an important and disruptive feature which is still not fully investigated and has no easy solutions. Further, awareness of this issue is minimal, even within the relevant research community. As smart meter and secondary substation data becomes more widely available, there are likely to be other similar issues which not only hinder progress but may create a domino effect of inefficiencies in cases where technologies are dependent on each other, for example, smart homes feeding into smart local area networks.

Actions: It is important that interdisciplinary researchers working in the areas of smart local energy systems are actively engaging and sharing with each other, and that learning is disseminated widely to ensure innovation can progress as rapidly as possible. Energy Systems Catapult is supporting this through our engagement with academic groups, the ADViCE programme, our webinar series with experts in data and digital, amongst many other activities, but this is the responsibility of the entire energy innovation community. Net Zero is the biggest challenge of our lifetime and will require collaboration and engagement on multiple different fronts if it is to be achieved.

If this is something you are interested in getting involved in then we’d love to speak to you.
