A Guide to Cultivating an Open Ecosystem Part 2 - Damon Roberts
Comment by Damon Roberts, Digital Consultant at Energy Systems Catapult.
Part 2: Open Data Licenses
Special thanks to Robbie Morrison for his input to this post.
As detailed in Part 1 of this blog series, releasing data openly is a powerful way to benefit society. However, it must be released in a manner that enables users to make use of it with confidence. This post aims to set out clearly why understanding licence terms matters and what it means to release data openly, and to analyse the two licences currently recommended by Ofgem.
This article does not constitute legal advice. It aims to explain, in simple terms, what applying a particular licence means for a data provider and for data users.
In the context of data, a licence is an agreement between the provider of a dataset and someone wishing to use that data, which defines how it may be used. The licence serves two equal purposes: the first is to protect the author’s work, and the second is to give the user confidence in how they can use the content. If data is made available publicly but without a licence, all rights are retained by the provider, and no one can make use of it without explicit permission.
Beyond the two licences analysed in this post, there are numerous other licences available for data, ranging from those that release data into the public domain to relatively restrictive non-commercial and share-alike licences. If a licence restricts a particular field of endeavour, for example by prohibiting commercial use, it is not considered open. The full range of data licences will not be explored in this post. If a more complex licence is required, the key rule is to use an existing, approved and accepted licence rather than modifying one or creating one from scratch, as doing so typically reduces the legal interoperability of the data being licensed.
What happens when you apply a licence?
When data or other content has an open data licence applied to it and the content is made public, the licence is typically irrevocable. Even if the provider stops distributing the dataset, any copies or versions being used by data consumers will still be valid, and subject to the terms of the original licence.
The major benefit of using an established licence, such as CC-BY-4.0, is that data consumers can be confident in using the data in their research or products, with no risk of legal repercussions, assuming they honour the terms of the licence. Open data is at its most valuable when it ‘is legally secure, community curated, and avoids duplication of effort’. Applying CC-BY-4.0 to data in the energy sector achieves this aim.
Metadata can, and should, be licensed separately from the underlying data to enable easier cataloguing of data without the overhead of maintaining a list of attributions. The Creative Commons CC0 1.0 Universal (CC0 1.0) Public Domain Dedication is well suited to this purpose.
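To illustrate the separation of metadata and data licences described above, here is a minimal Python sketch of a catalogue record. The field names, the dataset, the provider, and the URL are all hypothetical and chosen only for illustration; the licence values are the SPDX short identifiers for CC0 1.0 and CC BY 4.0. It is not a prescribed schema, just one way the idea might look in practice.

```python
import json

# Hypothetical catalogue record: every field name and value here is illustrative.
record = {
    "metadata_licence": "CC0-1.0",   # SPDX identifier for CC0 1.0 Universal
    "metadata": {
        "title": "Example Demand Profiles",
        "description": "Half-hourly demand for an example substation.",
        "publisher": "Example Network Operator",
    },
    "data_licence": "CC-BY-4.0",     # SPDX identifier for CC BY 4.0
    "data_url": "https://example.org/data/demand.csv",
}


def catalogue_entry(rec):
    """Build a catalogue listing from a record whose metadata is under CC0.

    Because the metadata carries no attribution requirement, a catalogue can
    copy it without keeping a list of attributions. The data licence is carried
    along so that users of the data itself know which terms apply.
    """
    if rec["metadata_licence"] != "CC0-1.0":
        raise ValueError("metadata is not CC0; attribution may be required")
    entry = dict(rec["metadata"])
    entry["data_licence"] = rec["data_licence"]
    entry["data_url"] = rec["data_url"]
    return entry


print(json.dumps(catalogue_entry(record), indent=2))
```

The design choice in this sketch is simply that the two licences live in separate fields, so a catalogue can reuse the description freely while the attribution requirement stays attached to the data.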
CC (Creative Commons) BY 4.0
CC BY 4.0 is the least restrictive Creative Commons licence available, and the preferred choice of the two options recommended by Ofgem. The end user of the data is only required to attribute the original provider, dataset title, and licence. It is good practice, but not required, to detail any changes made when adapting or modifying the dataset. When a data provider releases data under CC-BY-4.0, they still own their dataset, and are free to sell or commercialise it as they wish.
Data licensed under CC BY 4.0 can therefore be used in a closed, commercial setting, and adaptations built upon it can be resold or used privately, assuming the attribution requirement is still fulfilled. One specific line from the licence text, Section 3(a)(4), can cause confusion.
3(a)(4) If You Share Adapted Material You produce, the Adapter’s License You apply must not prevent recipients of the Adapted Material from complying with this Public License.
This could be read as requiring that any adapted material also be licensed openly, but this is not the case. As long as the original work is attributed, the data user can apply a more restrictive licence to their specific modifications. What this line restricts is releasing your modifications under a public domain dedication, or another licence that does not require attribution, as this would result in the requirement to attribute the original author being lost.
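The attribution requirement described in this section can be sketched in a few lines of Python. This is only an illustration of the three elements named above (provider, dataset title, and licence), with an optional note of changes as good practice. The `Dataset` class, the `attribution` function, and the example values are hypothetical, and the wording of the resulting line is an assumption rather than a form mandated by the licence.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dataset:
    """Hypothetical description of a dataset released under an open licence."""
    title: str
    provider: str
    licence: str = "CC BY 4.0"


def attribution(ds: Dataset, changes: str = "") -> str:
    """Return a one-line attribution naming the title, provider and licence.

    If a description of changes is supplied it is appended, reflecting the
    good practice of detailing modifications to the original dataset.
    """
    line = f'"{ds.title}" by {ds.provider}, licensed under {ds.licence}.'
    if changes:
        line += f" Changes made: {changes}."
    return line


ds = Dataset(title="Example Demand Profiles", provider="Example Network Operator")
print(attribution(ds))
print(attribution(ds, changes="resampled to hourly means"))
```

A data user who builds a closed or commercial product on the dataset would still carry a line like this alongside the adapted material, which is how the attribution to the original provider is preserved.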
OGL (Open Government Licence) v 3.0
The OGL is broadly similar to CC-BY-4.0, with even fewer restrictions. It still requires attribution to the original source; however, it has fewer clarifications around database rights, warranties and liabilities, and attributions. The OGL is designed to be used by public sector organisations releasing data, and therefore some of the wording is specific to the public sector and may not be appropriate for use by businesses, institutions, and individuals. In practice, this is unlikely to cause issues. However, to maximise both the number of potential data users and the protections around warranty and liability, CC-BY-4.0 would be preferable for the majority of private sector organisations.
Current and Future Impact
The success of IUK’s PFER (Prospering from the Energy Revolution) programme, which made use of data already made available through DBP, and ESC’s partnership with National Grid Electricity Distribution (NGED) (formerly Western Power Distribution) on a series of data science challenges highlight the breadth of possibilities opened up by openness.
Open-source software and open data encourage collaboration and knowledge-sharing across communities, industries and individuals, allowing for more efficient and effective innovation. By openly sharing data and software tools, anyone can work together to develop new solutions and improve existing ones, leading to better informed problem solving, and ultimately more sustainable and cost-effective technology, systems, and policies.