Finding Data
An enormous amount of research data is now available online, either openly or under restricted access. Despite this abundance, finding the right data for your research project or question is often difficult. The tips below may help you find data suitable for your project.
You also need to consider whether the data is actually reusable: does it have a suitable license, and does it come with enough metadata and documentation for reuse?
Finding a dataset
You can find both open and restricted datasets by searching their metadata, for example with keyword searches.
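As a rough illustration of what a keyword search over metadata does, the sketch below filters a small list of metadata records by keyword. The records, the field names (`title`, `description`, `keywords`, `access`), and the `search_metadata` helper are all hypothetical and not taken from any particular repository; real repositories provide their own search interfaces.

```python
def search_metadata(records, query):
    """Return records whose title, description, or keywords contain every query term."""
    terms = query.lower().split()
    matches = []
    for record in records:
        # Combine the searchable metadata fields into one lowercase string.
        searchable = " ".join(
            [record.get("title", ""), record.get("description", "")]
            + list(record.get("keywords", []))
        ).lower()
        # Keep the record only if every query term appears somewhere in it.
        if all(term in searchable for term in terms):
            matches.append(record)
    return matches


# Hypothetical metadata records, invented for illustration only.
records = [
    {
        "title": "Urban air quality measurements",
        "description": "Hourly NO2 readings",
        "keywords": ["air quality", "pollution"],
        "access": "open",
    },
    {
        "title": "Patient interview transcripts",
        "description": "Qualitative health study",
        "keywords": ["health", "interviews"],
        "access": "restricted",
    },
]

for hit in search_metadata(records, "air quality"):
    print(hit["title"], "-", hit["access"])
```

Note that the search works on metadata only, so it can surface restricted datasets as well as open ones, even when the data itself is not directly downloadable.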
You can find data via:
Browse discipline-specific and multidisciplinary repositories directly, such as Zenodo, Open Science Framework, and Figshare.
Search for discipline-specific data repositories on Re3data or FAIRsharing, or look at this list of data repositories.
See Data Repositories for more information.
Search data journals and research articles. You can start by looking at our chapter on Data Articles.
Use your network to find datasets.
Use specific data search tools:
Check the license
Once you have found a dataset, check its license to see whether you can actually reuse the data.
Commonly used open licenses include Creative Commons licenses, the Open Government Licence, and the Open Data Commons Attribution License. See our chapter on Licensing for more information.
Not all datasets that are available to researchers are open datasets. If you want to use a restricted dataset, you need to check how to apply for access and what restrictions apply to its use. Using restricted datasets is more likely to involve a cost, so plan for this in advance. Restricted datasets are still licensed, and there should be a clear application process, such as a data request form or an email address for access enquiries.
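If you are screening many candidate datasets, a first-pass license check can be scripted. The sketch below is a minimal example that assumes each metadata record carries a `license` field holding an SPDX identifier; that field name, the `reuse_status` helper, and the short list of identifiers are illustrative assumptions, not an exhaustive or authoritative list. It does not replace reading the license text itself.

```python
# SPDX identifiers for a few commonly used open data licenses.
# Illustrative assumption only: extend or change this to suit your project.
OPEN_LICENSES = {"CC0-1.0", "CC-BY-4.0", "CC-BY-SA-4.0", "OGL-UK-3.0", "ODC-By-1.0"}
_OPEN_LICENSES_LOWER = {identifier.lower() for identifier in OPEN_LICENSES}


def reuse_status(record):
    """Give a first-pass reuse hint based on the record's license field."""
    license_id = (record.get("license") or "").strip()
    if not license_id:
        return "no license stated: contact the data owner before reusing"
    # Compare case-insensitively, since identifiers are not always typed consistently.
    if license_id.lower() in _OPEN_LICENSES_LOWER:
        return "open license: reuse allowed under the license terms"
    return "not a known open license: read the terms or apply for access"


# Hypothetical records, invented for illustration only.
print(reuse_status({"title": "Survey A", "license": "CC-BY-4.0"}))
print(reuse_status({"title": "Survey B", "license": "Custom data use agreement"}))
print(reuse_status({"title": "Survey C"}))
```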
Check the metadata and documentation for reusability
Once an initial check of the metadata suggests that the data may be useful to you, you’ll need to evaluate the dataset more closely.
The following questions may help you to do so:
What was the original research question?
How was the data collected?
Are the collection and processing methods appropriate to answer my research question?
Is the data collection process well documented? Which instruments were used, and with what settings or parameters?
Are the data collection protocols shared?
Is there sufficient information available to understand the dataset and its context/origin?
Is the information complete, understandable and consistent?
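Some of these questions can be turned into a simple completeness check on a dataset's documentation. The sketch below is one possible approach under stated assumptions: the field names in `REQUIRED_FIELDS` and the `missing_documentation` helper are hypothetical and loosely mirror the questions above. Such a check can only tell you whether information is present, not whether it is appropriate for your research question.

```python
# Hypothetical documentation fields, loosely mirroring the questions above.
REQUIRED_FIELDS = [
    "research_question",
    "collection_method",
    "instruments",
    "protocol",
    "context",
]


def missing_documentation(record):
    """Return the documentation fields that are absent or empty in a record."""
    missing = []
    for field in REQUIRED_FIELDS:
        value = record.get(field)
        # Treat missing keys, None, and blank strings as undocumented.
        if value is None or not str(value).strip():
            missing.append(field)
    return missing


# Hypothetical record, invented for illustration only.
record = {
    "research_question": "How does traffic volume affect NO2 levels?",
    "collection_method": "Fixed roadside sensors, hourly sampling",
    "instruments": "",
    "context": "Measurements taken in a single city centre",
}

print(missing_documentation(record))
```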
Giving credit for use of data
Once you have used someone else's dataset, you’ll need to cite the data to give credit to the original data creator(s).
You need to do this clearly in your research documentation as well as in any research articles you publish. See Citing Research Objects for more information about how to properly cite datasets.
Always check how the original dataset should be cited: sometimes researchers want you to cite the accompanying publication instead of the dataset itself. This information is generally available in README files or in the repository's metadata.
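The rule of preferring the creators' requested citation can be sketched in code. The example below assumes a metadata record with hypothetical fields (`preferred_citation`, `creators`, `year`, `title`, `publisher`, `doi`) and a hypothetical `format_citation` helper. The fallback layout (creators, year, title, publisher, identifier) is just one common pattern, so follow the style required by your journal or the repository where it differs.

```python
def format_citation(record):
    """Build a dataset citation, preferring the citation the creators ask for."""
    preferred = record.get("preferred_citation")
    if preferred:
        # The creators have stated how they want to be cited (for example,
        # by pointing to an accompanying publication), so use that as-is.
        return preferred
    creators = "; ".join(record["creators"])
    return (
        f"{creators} ({record['year']}). {record['title']} [Data set]. "
        f"{record['publisher']}. https://doi.org/{record['doi']}"
    )


# Hypothetical record with an invented DOI, for illustration only.
dataset = {
    "creators": ["Doe, J.", "Roe, R."],
    "year": 2021,
    "title": "Example survey data",
    "publisher": "Example Repository",
    "doi": "10.1234/example",
}

print(format_citation(dataset))
```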