Why Open Data is Important

Open data is part of the broader open science movement. Advocates recognise that, in a world powered by data, it is valuable for data to be equally accessible to everyone. They argue that open data can advance science and scientific communication and help boost innovation and collaboration.

Here, we’ll define open data and outline its advantages and disadvantages. We’ll also explore the FAIR Data Principles and explain how to ensure your data are open.

What is open data?

Scientific data refers to information that is produced through research. Examples include gene markers, temperature recordings, and medical records.

Open data are data that can be freely used, reused, and redistributed by anyone, subject at most to the requirement to attribute and share-alike. Attribution and share-alike are terms commonly used in Creative Commons licensing: attribution means giving credit to the creator, and share-alike means distributing any adapted work under the same license terms.

Generally, open data must meet three criteria:

  1. Accessibility: The data must be freely and easily accessible, for example through an open data portal, in a usable format and in their entirety.
  2. Reuse and redistribution: Permission should be clearly stated regarding reuse and redistribution, typically through a license.
  3. Availability: The data must be equally available to anyone.

Various countries have introduced mandates requiring data produced from public funding be made openly available. In many cases, these mandates are supported by national or even international repositories, such as the European Union Open Data Portal or the Global Data Index.

Overall, open data is part of a global movement to ensure that data are accessible and usable to all. The movement’s values are supported by the FAIR Data Principles.

What are the FAIR Data Principles?

The FAIR Guiding Principles for Data Management and Stewardship emphasise standardisation and machine-readability to cope with the increase in volume, complexity, and creation speed of data.

  • Findable: Metadata and data should be easy to find by humans and computers.
  • Accessible: The user needs to know how the data can be accessed.
  • Interoperable: Data must be able to be integrated with other data and to work with applications for analysis, storage, and processing.
  • Reusable: Data must be clearly described so they can be replicated or combined in different settings.

The goal of the FAIR Principles is to optimise the reuse of data. The abbreviation FAIR/O is often used to indicate that a dataset or database complies with the FAIR Principles and has an open license.

Here, data refers to digital objects, and metadata refers to information about those digital objects, which can help ensure data are interoperable. Interoperability is an important aspect of data management and openness.

Why is interoperability important?

Interoperability refers to the ability of different systems to work together (to inter-operate) and to combine different datasets. This is often achieved by producing standardised and clear metadata, which improves usability and enables users to easily trace data back to their source.

For example, interoperability would allow several separate databases to be pooled into one centralised source. Open data advocates often refer to creating a ‘commons’ of data, meaning a large, open, and accessible collection of data.
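
As a rough illustration of pooling, the sketch below maps two small datasets with different field names and units onto one shared schema and tags each record with its source so it can be traced back. The datasets, the field names, the `MAPPINGS` table, and the `pool` helper are all hypothetical and invented here for illustration. Real repositories rely on agreed metadata standards rather than hand-written mappings.

```python
# Two hypothetical datasets recording the same kind of measurement
# under different field names and units.
station_a = [{"site": "A1", "temp_c": 12.5}, {"site": "A2", "temp_c": 14.0}]
station_b = [{"location": "B1", "temp_f": 59.0}]

# Hypothetical mappings from each source's fields to a shared schema.
# Each entry maps a source field to (target_field, converter).
MAPPINGS = {
    "station_a": {
        "site": ("site_id", str),
        "temp_c": ("temperature_c", float),
    },
    "station_b": {
        "location": ("site_id", str),
        # Convert Fahrenheit to Celsius so all records share one unit.
        "temp_f": ("temperature_c", lambda f: round((f - 32) * 5 / 9, 2)),
    },
}


def pool(sources):
    """Merge records from several sources into one list with a common schema."""
    pooled = []
    for name, records in sources.items():
        mapping = MAPPINGS[name]
        for record in records:
            row = {"source": name}  # keep provenance so data can be traced back
            for field, value in record.items():
                target, convert = mapping[field]
                row[target] = convert(value)
            pooled.append(row)
    return pooled


commons = pool({"station_a": station_a, "station_b": station_b})
```

Because every pooled record carries the same field names and a `source` tag, it can be analysed alongside the others and still traced to where it came from.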

Furthermore, the amount of data being generated far surpasses what anyone can read or effectively manage. Ensuring data are interoperable can enable automation by making them easier for artificial intelligence to read.

Benefits of open data

Open data is important to people advocating for improving transparency, encouraging collaboration, and boosting innovation.

When data are published openly, reviewers can check that the research they support is reproducible and free from errors. Open data can also provide deeper insight into the research process. Moreover, Dorothy Bishop, Professor of Developmental Neuropsychology, argued in a recent blog post that

‘People can invent raw data as well as summary data, but realistic data are not easy to fake, and requiring open data would slow down the fraudsters and make them easier to catch.’

At a time when paper mills and fraudulent research are rife, this is highly valuable.

Next, open data can enable researchers to build upon and share information to gain deeper insights. This is valuable for interdisciplinary researchers and international collaborations, particularly in countries with lower funding for producing data.

Finally, open data reduces the chance of the same data being collected by different researchers, lowering costs and increasing efficiency. By pooling data, researchers can proactively build upon each other’s insights and findings, thereby enriching analytical capabilities.

Challenges of publishing data openly

The widespread open publication of data does face some challenges.

First, there are privacy and security concerns. Certain forms of data, such as health data, involve personal information that contributors may not consent to sharing. Further, the mosaic effect could occur: information from different anonymised datasets is combined to obtain personal information. Certain forms of data must therefore be evaluated and screened to ensure that personal information is secure and to decide whether they should ever be shared openly.
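
To make the mosaic effect concrete, here is a minimal sketch that links an ‘anonymised’ dataset to a second dataset through shared attributes. All records, field names, and the `link` helper are hypothetical and invented for illustration. The example assumes that both datasets retain the same quasi-identifiers (postcode, birth year, and sex).

```python
# Hypothetical "anonymised" health dataset: names are removed, but
# quasi-identifiers (postcode, birth year, sex) are retained.
health = [
    {"postcode": "4051", "birth_year": 1984, "sex": "F", "diagnosis": "asthma"},
    {"postcode": "4052", "birth_year": 1990, "sex": "M", "diagnosis": "diabetes"},
]

# Hypothetical public register that shares the same quasi-identifiers.
register = [
    {"name": "Person X", "postcode": "4051", "birth_year": 1984, "sex": "F"},
    {"name": "Person Y", "postcode": "4053", "birth_year": 1975, "sex": "M"},
]

QUASI_IDENTIFIERS = ("postcode", "birth_year", "sex")


def key(record):
    """Build a comparable key from a record's quasi-identifiers."""
    return tuple(record[q] for q in QUASI_IDENTIFIERS)


def link(anonymised, identified):
    """Return records re-identified by an exact match on quasi-identifiers."""
    index = {}
    for person in identified:
        index.setdefault(key(person), []).append(person["name"])
    matches = []
    for record in anonymised:
        names = index.get(key(record), [])
        if len(names) == 1:  # a unique match re-identifies the record
            matches.append({"name": names[0], "diagnosis": record["diagnosis"]})
    return matches


reidentified = link(health, register)
```

Neither dataset reveals a named person’s diagnosis on its own, but the combination does for any record whose quasi-identifiers match exactly one person. Screening before release typically looks for such unique combinations.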

Second, running a data repository is very resource intensive. This includes high startup and operational costs, and also high amounts of computing power. Estimates for the price of open data initiatives range between €20,000 and €100,000 per organisation. The need for security and long-term preservation increases the resource intensiveness of repositories.

How to prepare your open data

Sharing your data openly requires ensuring they are structured clearly and are easily understandable by humans and machines. The common measures for well-structured data include accuracy, completeness, timeliness, consistency, and reliability.

Good metadata management is key. This provides your data with context and structure, answering questions like who produced them, what they are, where they are stored, how you produced them, and why. This also ensures that your data are machine readable, enabling them to be integrated into other databases and reused by others.
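
As a small sketch of what machine-readable metadata might look like, the example below stores answers to the questions above in a simple record, checks that none are missing, and serialises the record to JSON. The field names, the example values, and the `missing_fields` helper are assumptions made for illustration and do not follow any particular metadata standard. In practice, your repository or funder will usually specify the schema to use.

```python
import json

# Hypothetical required fields, loosely following the questions above:
# who, what, where, how, and why.
REQUIRED_FIELDS = ("title", "creator", "description", "location", "method", "purpose")

# Hypothetical metadata record for an example dataset.
metadata = {
    "title": "Example temperature recordings",
    "creator": "Example Research Group",                      # who produced the data
    "description": "Hourly air temperature readings",         # what they are
    "location": "https://example.org/datasets/temperature",   # where they are stored
    "method": "Automated weather station sensors",            # how they were produced
    "purpose": "Monitoring local temperature trends",         # why they were produced
}


def missing_fields(record):
    """Return the required fields that are absent or empty in a record."""
    return [field for field in REQUIRED_FIELDS if not record.get(field)]


problems = missing_fields(metadata)

# Serialise to JSON so that other systems can parse the record.
metadata_json = json.dumps(metadata, indent=2)
```

An empty `problems` list means every question has an answer, and the JSON text can be published alongside the data.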

Finally, ensure your data have a clear license. This can easily be achieved by assigning them a Creative Commons license, which lets you decide whether and how they can be reused and redistributed.

Make sure to speak to your publisher and/or funder about their data policy before you share your data anywhere.

Why open data is important

Open data is important for increasing transparency, encouraging collaboration, and boosting innovation. The movement is supported by the FAIR Principles. These provide users with a guide on how to ensure that data are reusable and machine readable.

As governments around the world introduce mandates on sharing data openly, it is worth seeing how you can leverage open data in your research and potentially contribute your own.
