
How Open Data Empowers Knowledge Generation
In a world powered by data, open data provides equal access and reuse rights. This enables researchers to build upon existing data to create further insights, and it enables governments and industries to leverage information to innovate.
Governments and institutions around the world are mandating open data because they recognize the inherent value of data access and reuse in advancing knowledge generation.
Here, we explain the importance of data, what open data is, and how open data empowers knowledge generation.
Why data are important
Data come in various forms, representing information such as statistics or lists of names, and require analysis to transform them into insights.
Collecting good data and having data analysis capabilities are key to making informed decisions. Processes can be optimized and patterns understood, enabling you to predict the outcomes of new strategies.
Data form our understanding of the world by integrating knowledge with modern technology. This, in turn, informs all aspects of society, including healthcare decisions, government policies, and business strategies.
Because of this, data are very valuable and often stored with restricted access. This can be to preserve privacy and ensure security, but also to ensure that others do not benefit from access.
What is open data?
Open data are data that can be freely used, reused, and redistributed by anyone, subject at most to the requirements to attribute and share-alike. Attribute and share-alike are terms commonly used in Creative Commons licensing: attribution means giving credit to the creator, and share-alike means sharing any adapted versions under the same licence terms as the original.
The FAIR Guiding Principles for Data Management and Stewardship emphasise machine-readability and help to standardise data practices. FAIR stands for:
- Findable: Metadata and data should be easy to find by humans and computers.
- Accessible: The user needs to know how the data can be accessed.
- Interoperable: Data must be able to be integrated with other data and to work with applications for analysis, storage, and processing.
- Reusable: Data must be clearly described so they can be replicated or combined in different settings.
The common goal of open data and the FAIR Principles is to optimise access to and the reuse of data.
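To make the four principles more concrete, here is a minimal Python sketch that checks a dataset's metadata record for gaps. It is only an illustration: the field names (identifier, access_url, file_format, and so on), their grouping under each principle, and the example record are all assumptions made for this sketch, not part of the FAIR Principles or of any official metadata standard.

```python
# Hypothetical metadata fields, grouped by the FAIR principle each one
# loosely supports. This grouping is an assumption made for illustration.
FAIR_FIELDS = {
    "Findable": ["identifier", "title", "keywords"],
    "Accessible": ["access_url"],
    "Interoperable": ["file_format"],
    "Reusable": ["license", "description"],
}


def fair_gaps(record):
    """Return {principle: [missing fields]} for fields that are absent or empty."""
    gaps = {}
    for principle, fields in FAIR_FIELDS.items():
        missing = [name for name in fields if not record.get(name)]
        if missing:
            gaps[principle] = missing
    return gaps


# A made-up record describing a dataset of butterfly sightings.
example_record = {
    "identifier": "doi:10.0000/example",  # placeholder, not a real DOI
    "title": "Butterfly sightings 2024",
    "keywords": ["butterflies", "citizen science"],
    "access_url": "https://example.org/data/butterflies.csv",
    "file_format": "text/csv",
    "description": "Sightings recorded by volunteers.",
    # "license" is deliberately left out to show a gap
}

print(fair_gaps(example_record))  # {'Reusable': ['license']}
```

A record that passes a check like this is not automatically FAIR, but a missing licence or identifier is a quick signal that others may struggle to find or reuse the data.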
Funding agencies, higher education institutions, and publishers may ask you to create a data management and sharing plan to include in a research proposal. To learn more, see our article What is a Data Management and Sharing Plan?
Citizen science
An element of open science, citizen science refers to scientific work that is performed by members of the public, often without formal scientific qualifications, to support the work of scientists.
A common practice in citizen science is data collection. A famous example of this is the collection of data about butterflies, as citizens use an app to record butterfly sightings. This helps to track population levels and the effects of climate change.
Citizens can provide data that would otherwise be expensive, time-consuming, and resource-intensive to collect. In essence, the more people you have collecting data, the more data you’ll be able to collect, the more varied they’ll be, and the quicker you’ll be able to get them.
If data are produced by citizens, then it is natural to publish them openly. Access restrictions would prevent the very people who collected the data in citizen science projects from accessing their own work.
How open data enables knowledge generation
Open data has a range of benefits for researchers and broader society as it empowers knowledge generation. Here, we outline some of these benefits.
Supports researchers
Collecting data can be a resource-intensive and time-consuming process, often requiring access to hardware and software to track and preserve them. Researchers who have limited resources may instead want to use data created by others, but there may be a fee to access them or restrictions on reuse that limit their ability to conduct research.
Furthermore, access restrictions could lead researchers to spend resources gathering data that already exist, dedicating time and money to duplicated work rather than to creating insights or collecting different data.
Ensuring that data are open, however, enables researchers to build on existing data, which may save time and resources or even enable them to conduct research that otherwise would not have been possible.
Moreover, open data can enable interdisciplinary research by empowering researchers to engage with and utilize data from outside of their discipline. Interdisciplinary research can lead to deeper research and new approaches being developed.
In short, data restrictions can lead to inefficient resource use, whilst open data can empower researchers to leverage data to create deeper insights.
New technologies
New digital technologies are being developed by utilizing the vast datasets that are available.
For example, artificial intelligence (AI) tools built on large language models, such as ChatGPT, are trained using a huge volume of data to output text and images based on user prompts.
The quality of these outputs depends on the quality of the data the models are trained with. Therefore, ensuring high-quality data are openly accessible and machine-readable can increase the accuracy of AI models. This is highly important given the popularity of these tools; ChatGPT boasts 700 million weekly users as of August 2025.
Another example of an emerging technology that is powered by data is the digital twin. A digital twin is a virtual replica of a physical object, person, or process that can be used to increase understanding and perform tests. Digital twins are linked to real data sources in the environment and update in real time to reflect changes in the physical world.
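The idea of a virtual replica that updates from real data and can then be used for tests can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not how production digital twins are built: the PumpTwin class, the sensor readings, and the linear "degrees per percent of load" model are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class PumpTwin:
    """Virtual stand-in for a hypothetical water pump."""

    temperature_c: float = 20.0
    history: list = field(default_factory=list)

    def update(self, reading_c: float) -> None:
        """Mirror a new temperature reading from the physical pump."""
        self.temperature_c = reading_c
        self.history.append(reading_c)

    def predict_after_load(self, extra_load_pct: float,
                           degrees_per_pct: float = 0.25) -> float:
        """What-if test: estimate temperature under extra load.

        The linear relationship is an assumption made for this sketch.
        """
        return self.temperature_c + extra_load_pct * degrees_per_pct


twin = PumpTwin()
for reading in [21.5, 23.0, 24.5]:  # made-up sensor readings
    twin.update(reading)

print(twin.temperature_c)                          # 24.5
print(twin.predict_after_load(extra_load_pct=20))  # 29.5
```

The what-if call runs against the virtual model only, which is the point: the test costs nothing in physical materials, and the physical pump is never put under the extra load.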
The properties, qualities, and capabilities of a digital twin model are informed entirely by data. Access to open data can enable researchers and members in industries to utilize common data to gain a deeper understanding of something without having to create and test physical models.
Digital twins enable data-driven decision making, the monitoring of complex systems, product simulation, and management of an object’s entire life cycle.
These deep insights into systems can save time by enabling rapid iterations and can optimise product design by removing the need to build and test individual prototypes. Moreover, by reducing the requirement for physical materials and resources during product design, digital twins can increase sustainability.
Real-world applications
Data inform decisions in many industries. These include healthcare decisions, like which treatments and medications to prescribe, and business decisions, like which departments to expand and products to develop.
This is because data provide qualitative and quantitative evidence that reveals patterns and effects.
Open data creates opportunities for anyone to benefit from shared information. This particularly supports institutions in low-to-middle income countries or areas and those in early stages of their development.
The World Health Organization (WHO) maintains the Global Health Observatory, a resource featuring data on various health topics. These include air pollution, child mortality, tuberculosis, and much more. All resources, reports, and data are freely accessible.
For example, the WHO hosts the publicly accessible global individual patient data platform for drug-resistant tuberculosis treatment. This gathers the outcomes of over 10,000 drug-resistant tuberculosis patients, aiming to enhance research on new medicines, repurposed medicines, and patient outcomes.
In sectors like healthcare and environmental maintenance, utilizing openly available data can save lives.
Open data, open knowledge
Data inform decisions, technologies, and future research. However, many datasets are restricted by access fees and reuse limitations.
Open data creates equal access and reuse rights. By establishing banks of common knowledge that anyone can leverage, new knowledge can be generated by researchers, industries, and governments, potentially saving lives or advancing technologies.
Open data is part of open science, the mission to ensure openness throughout the entire research process. MDPI is committed to supporting researchers in opening their outputs and boosting the visibility of their work.
Our article, All You Need to Know About Open Access, covers a range of topics that can help boost your understanding and keep you up to date.










