In the Duke Research Data Repository, metadata are the pieces of information like Title, Creator, or Description that help people and indexing machines understand what a dataset is about. In many fields of research, the word metadata refers to contextual information like experimental conditions that might be recorded inside a dataset; here, we’re referring to information about files that will be stored outside the dataset. Metadata provides the “at a glance” information readers need to assess a dataset and enables dataset discovery through search tools.
Because metadata is so important for finding and interpreting datasets, the Duke Research Data Repository requires several pieces of metadata in the submission process. Other metadata is recommended. If you have questions about these metadata guidelines, please email us.
The following descriptive information about your dataset is required for deposit with the Duke Research Data Repository. These fields are marked with a red asterisk in the dataset submission form, and the form will not let you proceed until you complete them.
Provide a descriptive name for the dataset. Consider the following guidelines:
If including data underlying a publication, use the following structure:
Data from: Title of Publication (Article, Monograph, Report, etc.)
Data and scripts from: Title of Publication (Article, Monograph, Report, etc.)
If your data are not associated with a publication, but with a larger study/project, consider including descriptive information that will readily identify your dataset. The following details are often helpful:
Some examples:
List the names of the people involved in creating or authoring the data. These are the individuals who will be listed in the data citation. There is a secondary contributor field (see below) that is available for individuals who contributed to the dataset but who should not be included in the data citations as a creator or author.
The Creator field has several parts for each individual. To format the information in each field:
After you fill out the fields for one creator, click the “Add group” button to create a new set of fields for another creator.
In this field, please enter a name and email address. This contact is responsible for answering questions about the data, documentation, and/or code for this project. Contact information may be helpful to other researchers if they need additional information or guidance when re-using the data or reproducing the research. Providing this information may also lead to opportunities for collaboration.
You can change the contact person for a dataset through the metadata editing process, if necessary.
Enter any terms or topics that describe your research or would help make it discoverable in the Research Data Repository. Click the + button to the right of the field to add more keywords. As part of the data curation process, the terms you enter here may be normalized to an established heading from among the Library of Congress Subject Headings or U.S. National Library of Medicine Medical Subject Headings, where appropriate. (If appropriate terms cannot be identified, the keyword will be applied to your dataset as you have supplied it.)
Please provide a general description of the research that produced the data, including the research question(s). We strongly encourage robust descriptions of your actual data including information on where and how the data were collected/generated, data types, methodological information helpful for reproducibility (e.g., programs used), and other contextual details about the data; however, we would accept an article or grant abstract. Information in this field will help other users to better understand how your research contributes to the field and whether or not they may be able to reuse your data in their own research.
Licenses tell other people how they may use the material you have shared in the Duke Research Data Repository. Deposits will default to a CC0 public domain dedication. By applying a CC0 waiver to your data, you are releasing your data for reuse and redistribution without restrictions under copyright or database law. Users of data within the RDR are expected to give credit to data creators by following data citation norms. To learn more, see the Creative Commons web page or consult the RDR Licensing Page.
To add an additional license (for example, to license some parts of your deposit CC0 but another part under a GPL software license), click the + button to the right of the field.
Choosing a subject helps group your deposit with similar datasets. To choose more than one subject, use the + button to the right of the drop-down menu.
Choose a creator’s Duke department affiliation from the drop-down menu. Use the + button to the right to add additional departments. For non-Duke creators, please add their information to your README file.
Choose a creator’s Duke center or institute affiliation from the drop-down menu. Use the + button to the right to add additional centers or institutes. For non-Duke creators, please add their information to your README file.
Please list any other person(s) or organization(s) who may have contributed to the creation of the dataset, but whom you do not wish to include in the full dataset citation. These fields are formatted like the Creator fields described above, with an optional “Role” field to specify the contributor’s role in the research process. You can also add an ORCID for a contributor (format the ORCID as a URL) and search for Affiliations to associate with the Contributors.
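If you keep ORCIDs as bare identifiers, a small script can convert them to URL form before you fill out the form. The sketch below is illustrative only: the function names are hypothetical, and it assumes the standard https://orcid.org/ address form is what the submission form expects. It also checks the final character against the ISO 7064 MOD 11-2 checksum that ORCID identifiers use, which catches most typing mistakes.

```python
import re

# Accepts a bare ORCID or one already written as an orcid.org URL.
ORCID_PATTERN = re.compile(
    r"^(?:https?://orcid\.org/)?(\d{4}-\d{4}-\d{4}-\d{3}[\dX])$"
)


def orcid_check_digit(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def orcid_as_url(value: str) -> str:
    """Return the ORCID as a URL (hypothetical helper, not part of the RDR)."""
    match = ORCID_PATTERN.match(value.strip())
    if match is None:
        raise ValueError(f"Not a recognizable ORCID: {value!r}")
    orcid = match.group(1)
    digits = orcid.replace("-", "")
    if orcid_check_digit(digits[:15]) != digits[15]:
        raise ValueError(f"ORCID checksum does not match: {value!r}")
    return f"https://orcid.org/{orcid}"


# 0000-0002-1825-0097 is the sample identifier used in ORCID's own documentation.
print(orcid_as_url("0000-0002-1825-0097"))  # https://orcid.org/0000-0002-1825-0097
```

A failed checksum usually means a digit was mistyped or transposed, so it is worth confirming the identifier with the contributor before entering it.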
If it is important for the analysis or reuse of your data, please provide the date(s) on which your data were collected or generated. Be as specific as you can in defining the beginning and end of the data collection period. Dates should be formatted numerically as YYYY-MM-DD. A range would be formatted like "2025-01-01 to 2025-10-30".
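If you are preparing many deposits, it can help to check collection dates before entering them. The following is a minimal sketch, not part of the repository itself: the function name is hypothetical, and it assumes a range is always two YYYY-MM-DD dates separated by the word "to" with single spaces, as in the example above.

```python
import re
from datetime import date

# Strict YYYY-MM-DD shape; calendar validity is checked separately below.
DATE_SHAPE = re.compile(r"\d{4}-\d{2}-\d{2}")


def parse_collection_dates(value: str) -> tuple[date, date]:
    """Parse a single date or a 'START to END' range into (start, end).

    Hypothetical helper for checking values before typing them into the form.
    """
    parts = [part.strip() for part in value.strip().split(" to ")]
    if len(parts) not in (1, 2):
        raise ValueError(f"Expected one date or 'START to END': {value!r}")
    for part in parts:
        if DATE_SHAPE.fullmatch(part) is None:
            raise ValueError(f"Date is not in YYYY-MM-DD form: {part!r}")
    # fromisoformat raises ValueError for impossible dates such as 2025-02-30.
    start = date.fromisoformat(parts[0])
    end = date.fromisoformat(parts[-1])
    if end < start:
        raise ValueError(f"End date precedes start date: {value!r}")
    return start, end


start, end = parse_collection_dates("2025-01-01 to 2025-10-30")
print(start.isoformat(), end.isoformat())  # 2025-01-01 2025-10-30
```

A single date is returned as a range whose start and end are the same day, so later code can treat both cases uniformly.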
Enter information about publications, datasets, code deposits, or other resources that reference or connect to this dataset. For example, a publication based on the submitted data might be entered as below:
A connection to related code on GitHub might have the following information in each field:
If your deposit is associated with a publication but your article has not come out yet, you can modify this field later through the metadata edit process.
This field accommodates information about the geographic location in which your study data were collected. Enter a value here if this information is significant for the analysis of the data collected. This information may take the form of a country, state, or locality. To add another location, click the + button to the right of the field. Note: during the data curation process, this information may be normalized to an entry in the GeoNames geographical database, and may thus differ slightly in appearance or specificity upon dataset publication.
Please list the primary language in which the data and supplementary content are written. This field is meant to be limited to spoken languages; if you wish to include information about programming languages relevant to your dataset, please include that information in the description or in the dataset documentation. If there are multiple spoken languages, please note that in the description.
Please list the primary funding agency or agencies that supported the research project that generated or collected the data. Identifying the funding agency can help demonstrate compliance with any data sharing requirements made by the agency and connect your research with a larger body of work.
The funding source field has multiple parts. In the Funding Agency field, type in your first funder and click the magnifying glass icon, then choose your funder. The next two fields will auto-populate from that selection. (If you cannot find your funder, try expanding an abbreviation to the full name, for example searching for “National Institutes of Health” instead of “NIH.”) You may enter your grant number and grant title next, then a link to a grant page (e.g., an NIH RePORTER record).
To add another funding source, click the + Add group button. If you have multiple grants from the same funder, please use the + Add group button to add a new entry for each one.
Metadata is a supplement to, not a substitute for, dataset documentation such as README files, codebooks, data dictionaries, or methodology reports. The key difference is that metadata is best for making a research product easier to find, while data documentation makes it possible for others to correctly interpret and replicate your work.
The simplest, fastest form of documentation is the README, a plain-text document that records the need-to-know information about a project or file. Download the Duke RDR README template to access a fill-in-the-blank document customized for Duke Research Data Repository deposits. Cornell's "Writing READMEs for Research Data" and "Writing READMEs for Code and Software" are practical guides that can get you started.
For more information, or for guidance on how to best compose documentation for your dataset, please contact us.