Duke Research Data Repository Frequently Asked Questions

We follow best practices for data sharing and archiving. We will provide you with a DOI to include with your manuscript proof or granting agency documentation, apply standardized metadata for discovery, assign a standard Creative Commons license, and keep your data safe through a preservation infrastructure. We have also outlined our compliance with the NIH Desirable Characteristics for Repositories. Learn more about changing expectations in grant funding requirements related to the OSTP "Nelson" memo and the NIH Data Management and Sharing Policy, which is now in effect.
The FAIR Guiding Principles were developed by a diverse group of stakeholders across the scholarly landscape to provide “a concise and measurable set of principles...act as a guideline for those wishing to enhance the reusability of their data holdings.” FAIR stands for Findable, Accessible, Interoperable, and Reusable. FAIRness is established through the use of standardized machine-readable metadata, persistent identifiers (DOIs), clear license terms, transportable formats, and contextual documentation. To learn more about FAIR, see: https://www.go-fair.org/fair-principles/
We will allow an embargo of up to one year if needed. If you need to assign an embargo or would like to request an embargo longer than one year, please contact us at datamanagement@duke.edu.
It depends on the data submission, but we try to turn around deposits as quickly as possible (typically within 1 to 3 business days). If your data are sent to the Data Curation Network (see below), this may add time, but in these cases we can provide you with a DOI prior to publication. Your data will go through a curatorial review to ensure that your deposit package is complete. If additional information or changes are required, responding promptly to our request will ensure the fastest turnaround.
  • A digital object identifier (DOI) for persistent access
  • Standardized Dublin Core metadata for discovery
  • A customized citation for proper attribution
  • Curatorial review to support optimum reuse
  • File format transformations for longer term reuse (when possible)
  • Long-term archiving and preservation

A Digital Object Identifier, or DOI, is a type of persistent identifier used to uniquely identify an information resource or other object. Persistent identifiers are long-lasting references to a document, file, web page, or other object, usually digital. Typically, these references may be entered into a web browser and will resolve to the specified resource. All objects in the Research Data Repository are assigned an Archival Resource Key (ARK) on ingest, which will serve as the basis for a persistent link to each object in the data package. Additionally, datasets will be assigned a DOI at the level of the complete data package, which will be included in the dataset citation and will provide persistent access to the data.
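As an illustration of how persistent identifiers resolve, the Python sketch below turns a bare DOI or ARK into a link you could paste into a web browser. It assumes the public resolvers https://doi.org/ (for DOIs) and https://n2t.net/ (for ARKs); the RDR's own persistent links may use a different base URL, and the identifiers in the usage lines are hypothetical placeholders.

```python
# Sketch only: assumes the public doi.org and n2t.net resolvers.
# The RDR's own persistent links may use a different base URL.

DOI_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def doi_to_url(doi: str) -> str:
    """Normalize a DOI (bare, 'doi:'-prefixed, or already a URL) to an https://doi.org/ link."""
    doi = doi.strip()
    for prefix in DOI_PREFIXES:
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    # Every DOI name starts with the "10." directory indicator.
    if not doi.startswith("10."):
        raise ValueError(f"not a DOI: {doi!r}")
    return "https://doi.org/" + doi


def ark_to_url(ark: str) -> str:
    """Prefix an ARK such as 'ark:/NAAN/Name' with the n2t.net resolver."""
    ark = ark.strip()
    if not ark.startswith("ark:"):
        raise ValueError(f"not an ARK: {ark!r}")
    return "https://n2t.net/" + ark


if __name__ == "__main__":
    # Hypothetical identifiers (99999 is the ARK test NAAN, not a real RDR object).
    print(doi_to_url("doi:10.0000/hypothetical-dataset"))
    print(ark_to_url("ark:/99999/fk4hypothetical"))
```

The DOI at the package level is the one to cite; the ARKs identify the individual objects inside the package.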

We are currently unable to restrict access to datasets to registered users. If you do need to restrict access to your data, we may be able to help you find alternative options. Contact us at datamanagement@duke.edu.

We may perform the following steps during the curation process:

  • Open the data files to verify their contents
  • Check whether there is enough documentation to describe the attributes and content of the data as well as the context of the research process (e.g., data sources, programs, etc.)
  • Review and run scripts (when appropriate)
  • If the data are on human subjects, perform a review to confirm that all direct identifying information has been removed (e.g., the 18 HIPAA identifiers)
  • Review data for other legal or ethical considerations that might impact data sharing (e.g., licenses, copyright, etc.)
  • Check for typos, duplicate files or other errors
  • Identify additional suggestions for how to enhance the data package

We are also a member institution of the Data Curation Network (DCN), which expands our capacity to curate data from a large number of disciplines and data types by giving us access to a network of curators across 10 institutions. The DCN also provides another useful conceptual model for curation (see the CURATED steps).

We offer data curation to enhance your data package as a complete published collection, and to ensure that it is optimized for long-term preservation. This includes checking for documentation on data context (methods, file relationships, software needed and version) and attribute definitions (rows, columns, values) and migration to open, stable formats when possible (or creating an open, stable version alongside the original). Data curation helps to ensure that your data meet the FAIR guiding principles.

The Data Curation Network (DCN) is a cross-institutional staffing model intended to leverage the expertise of data curators at partner institutions. The DCN aims to

  • provide expert data curation services for Network partners and end users,
  • create and openly share data curation procedures and best practices,
  • support training and development opportunities for an emerging data curator professional community, and
  • expand into a sustainable entity that grows beyond our initial partner institutions.

If your data (subject area and format) match the expertise of one of the DCN curators, we have the option to send your dataset to the DCN for review. We will always ask you before doing so!

We encourage the Duke community to determine the repository that best meets their needs. For instance, your funder or publisher may instruct you to use a particular repository, or your scholarly community may have a disciplinary or content-type repository that people commonly publish in (which is a great option!). Duke also supports or is a member of a number of disciplinary repositories you may want to consider:

We are happy to help you prepare your data for deposit in other repositories if needed or help you identify an appropriate repository. Contact us at datamanagement@duke.edu for assistance.

At this time, the RDR cannot accept any data that would require special access conditions or that are considered either sensitive or restricted according to the Duke Data Classification. Examples of data that could be considered sensitive include human subjects data containing personally identifiable information (PII), protected health information (PHI), export-controlled data, geographic locations of endangered species, data bound by data use agreements, etc. If your data need to be restricted, please contact us at datamanagement@duke.edu. We may be able to help you find an alternative archival solution.

While it's true that redundant copies can help assure you that your data are safe, multiple copies of datasets can confuse users of your data and can be difficult to keep in sync. We discourage the deposit of data in multiple locations, but we understand that sometimes it may be necessary to keep a copy of the data elsewhere. If your data have already been assigned a digital object identifier (DOI) by another repository or database, we will not assign one for the copy that will reside in the RDR. We will also cross-reference any other copies of your dataset in the metadata of the RDR copy.

Yes! We can create a collection if you have a number of datasets (at least 3) related to one specific project. For instance see these two examples (example 1, example 2). If you would like to create a collection of datasets you have deposited into the RDR, please contact us (datamanagement@duke.edu) and we will have you complete a brief form to create the collection.

The Duke Research Data Repository provides 300 GB of preservation storage per deposit for Duke researchers (defined as graduate students, postdoctoral researchers, research staff, and faculty) at no cost. For larger datasets, please contact us to discuss whether the RDR can accept your deposit given the scope and scale of your data. Additional preservation costs may also be assessed based on the size of the submission.

For projects planning for data preservation and storage for grant applications, please contact us at datamanagement@duke.edu for planning and tracking purposes. We can also provide you with boilerplate language or Letters of Support as appropriate.

Globus is a system for transferring large files. The RDR uses Globus to support the upload of datasets over 10GB into the repository and the download of datasets over 2GB. See our Globus documentation for more information.
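To make the size rules concrete, here is a small Python sketch that checks a deposit against the 300 GB no-cost allowance and the Globus thresholds for uploads over 10GB and downloads over 2GB. It assumes "GB" means decimal gigabytes (10^9 bytes) and that "over" means strictly greater than; the RDR's own accounting may differ, so treat the result as a rough guide and contact datamanagement@duke.edu for anything near a limit.

```python
# Sketch of the RDR size thresholds.
# Assumption: "GB" means decimal gigabytes (10**9 bytes); the RDR may measure differently.
GB = 10**9
FREE_DEPOSIT_LIMIT = 300 * GB   # no-cost preservation storage per deposit (Duke researchers)
GLOBUS_UPLOAD_OVER = 10 * GB    # uploads larger than this go through Globus
GLOBUS_DOWNLOAD_OVER = 2 * GB   # downloads larger than this go through Globus


def plan_deposit(total_bytes: int) -> dict:
    """Summarize how a deposit of the given total size would be handled."""
    return {
        "upload_via_globus": total_bytes > GLOBUS_UPLOAD_OVER,
        "within_free_allowance": total_bytes <= FREE_DEPOSIT_LIMIT,
        # Beyond 300 GB, contact the curators; additional costs may apply.
        "contact_curators": total_bytes > FREE_DEPOSIT_LIMIT,
    }


def download_via_globus(file_bytes: int) -> bool:
    """True if a download of this size needs Globus rather than the browser."""
    return file_bytes > GLOBUS_DOWNLOAD_OVER


if __name__ == "__main__":
    print(plan_deposit(50 * GB))
    print(download_via_globus(3 * GB))
```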

Duke University Libraries has outlined its commitment to the long-term preservation of and access to the assets curated in its digital repositories. The Duke Research Data Repository (RDR) is included under the Libraries' general digital Preservation Policy. While we currently do not have a formal retention schedule for data published in the RDR, we anticipate keeping datasets for at least 25 years (unless a shorter preservation period is selected and paid for during the submission process according to our Pricing for Storage policy). We will consider usage statistics and the value of the data to the research community when assessing the ongoing preservation of data. Any data removed from the repository will be transferred back to the depositor.

If you leave the university, we will continue to retain and make your data available under the Data Deposit Agreement. If you leave Duke, please send us any updated contact information so we can update your dataset record. Providing your ORCiD identifier (which will follow you throughout your career) at time of deposit can help us keep your contact information current.

Generally, data deposited with the RDR should be in their final, publication-ready form. However, if an error is discovered, or if additional data files or documentation are needed, we are able to make slight modifications to an already published dataset. The new version will receive a new DOI, and the two versions will be linked in our system. We will continue to provide access to the previous version of all files to ensure persistent access to previously published materials (if you need to permanently remove files, see our Deaccessioning Policy). Changes to the dataset metadata can be made at any time.

To request a modification to your dataset, navigate to your dataset in the repository, click on the "Request Modifications" button, and complete the form (please see below). One of our curators will be in touch to process your request.

Request modifications button

Request modifications form

Data deposited with the RDR should be in its final, publication-ready form; the RDR is not an appropriate solution for data that are being actively managed in the course of your research. However, we also understand that errors may be discovered post-publication or additional data or documentation files may need to be added. In these cases, the RDR staff can help you create a new version of your dataset. We will continue to provide access to the previous version of all files to ensure persistent access to previously published materials (if you need to permanently remove files, see our Deaccessioning Policy).

Versioning should not be used as a method to add files to a collection from different waves of a study or as a method to preserve data during the active research phase of a project. If you foresee your dataset evolving over time then consider a "release cycle" for your data. If you need to version content already within the repository or have questions about creating a plan for publishing dynamic data, contact datamanagement@duke.edu.

When a dataset has been versioned, the dataset will be assigned a new DOI, and the bibliographic citation for the dataset will be amended with a "V2" distinction. A "Provenance" field will be added to the metadata with a narrative explanation of what has changed between the versions.

Citation and provenance for a versioned dataset

At the bottom of the page, you will see a list of the dataset's versions, with a link to any other versions of the data that have been published in the RDR.

Versions of a dataset

Clicking on the active link to a previous version of the data, either from this page or from outside the RDR application, will take you to the earlier version of the dataset. A large red banner across the top will alert you to a superseding version of the data, with a link.

A version of a dataset superseded by a new version

If you know the full citation for the publication with which your data are associated, please provide it at the time of initial submission. On the "Submission Information" page of the Data Submission Form, you will see a field requesting "any related publications, datasets or other published materials that you would like to associate with the study/dataset record."

If you are unsure of the full citation at the time of data submission, you can always provide that information when it becomes available, either by emailing us directly at datamanagement@duke.edu, or by filling out the brief form you will see when requesting a modification of your dataset.

Links to associated publications will then appear under "Related Materials" in the metadata for your dataset, along with a link to view the publication.

Links to associated publications

Yes! As part of our curation service, we will enter the citation for your dataset in your Scholars@Duke profile. Datasets deposited with the RDR will be visible by expanding the "Selected Publications" section under "Publications & Artistic Works." The data may be accessed in the RDR either by clicking on the "Data Access" button, or via the link included in the dataset record (viewable by clicking the hyperlinked title in the dataset citation).

Scholars@Duke listing for a dataset

Duke University Libraries generally supports the use of open-source software, and to this end, the RDR uses a locally customized version of the Samvera community's Hyrax framework for repository front-ends. Hyrax is a highly customizable Ruby on Rails engine and, in the RDR's iteration, deploys Fedora 4 (a version of the Fedora Commons repository system for digital asset management) as its persistence layer and integrates with Apache Solr and Blacklight.

We can accept supplementary files; however, they must be accompanied by the underlying data that is used to create the tables and figures in those files (which are typically static PDF files). If you are not able to provide the data for any reason (proprietary, sensitive, consent not obtained), you can use OSF for sharing the static PDF tables. For more information on the OSF please see this page.

If you need repository solutions for other types of scholarly materials (open access publications, presentations, etc.), Duke University Libraries supports a number of options. If you are working on team-based research, you may also wish to consult the DUL guidelines for preserving and disseminating team-based research products.

We would love to hear from you regarding our services! You may provide feedback via this survey or feel free to reach out to us directly at datamanagement@duke.edu.

Submission Guidelines

Readying Your Data Deposit

Review the guidelines below when preparing your data for publishing or consult our Curation Checklist.

Describing Your Data

Prepare to describe your dataset so that it can be discovered in the repository. The Data Deposit Form will guide you through what you need to include. At a minimum, you will need to provide a title, author list, contact information, a description, and keywords.

Tips for metadata:


Title: When including data underlying a publication, use "Data from: Title of Publication" (Article, Monograph, Report, etc.). Example: Data and scripts from: Clustering and assembly dynamics of a one-dimensional microphase former

When including data associated with a larger study or project, use "Name of Study/Project, Data Details" (time range, location, other descriptive information). Example: IPHEx-Southern Appalachian Mountains -- Rainfall Data 2008-2014

Creator: Include those individuals who were involved in creating or authoring the data. These individuals will be listed within the data citation. A separate Contributor field is available for individuals who contributed to the dataset but should not be listed as creators/authors.

Description: For data underlying a publication, you may use the article abstract. Other information may include study aims, methodology details, and other contextual details such as programs, software, or equipment used. Learn more about our metadata fields for research data.

Formatting Your Data

Save your files in a preservation-friendly format. Proprietary formats can cause problems with long-term preservation as software platforms change and previous versions become obsolete. If you wish to include a proprietary format that is commonly used in your field or discipline, you may do so, but full preservation can only be assured for sustainable file formats. You might also consider including the proprietary format with a preservation-friendly derivative. If you aren't sure if there is a preservation-friendly format for your files or you are unable to uncouple your data from proprietary software, please contact datamanagement@duke.edu.

Documenting Your Data

Provide appropriate documentation that clearly describes your data so that others can interpret and use your files. The documentation may be README files, data dictionaries, codebooks, instruments, user's manual, and/or fully commented code. The important thing to keep in mind is that someone else will need to understand how to use your data, and they will not know all of the nuances in your file names, labels, data values, etc. without guidance. Need help getting started creating documentation? See this Cornell guide for writing README files.

Pro Tip: To enhance reproducibility, be sure to include the name of the software (and version) you used to collect and analyze your data within your documentation.
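If your analysis was done in Python, a sketch like the one below can collect version strings to paste into a README. It uses only the standard library (`platform` and `importlib.metadata`); the package names in the usage lines are hypothetical examples, and other environments (R, MATLAB, Stata, etc.) have their own ways to report versions.

```python
# Sketch: gather software versions for a README.
# Assumption: the analysis ran in Python; the package list below is hypothetical.
import platform
from importlib import metadata


def software_versions(packages):
    """Return README-ready lines naming Python and each package with its installed version."""
    lines = [f"Python {platform.python_version()}"]
    for name in packages:
        try:
            lines.append(f"{name} {metadata.version(name)}")
        except metadata.PackageNotFoundError:
            # Flag missing packages instead of failing, so the list stays complete.
            lines.append(f"{name} (not installed)")
    return lines


if __name__ == "__main__":
    # Replace with the packages your project actually used.
    for line in software_versions(["numpy", "pandas"]):
        print(line)
```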

Organizing Your Data

Organize your files in a logical order, but do not use nested folders (folders within folders). For example, place all code in a single folder called "code" with no subfolders inside it. Regardless of the structure, we ask that you include a README file (see above) that describes what you are depositing. Use descriptive file names that help users understand the contents of each file and distinguish it from other files, and remember to define any abbreviations in your README file. Do not include special characters (e.g., +, =, /) in your file names, as these cause issues for our system and others.
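Before uploading, you could run a quick check like the Python sketch below over your deposit's file list. The allowed character set (letters, digits, underscore, hyphen, period) is a conservative assumption of ours rather than an official RDR rule, and the sketch treats any path with more than one folder level as nested.

```python
# Sketch: flag paths that break the organization guidelines.
# Assumptions: the "safe" character set is our own conservative choice, not an
# official RDR rule; paths use "/" separators relative to the top of the deposit.
import re

SAFE_NAME = re.compile(r"^[A-Za-z0-9._-]+$")


def check_paths(paths):
    """Return a list of (path, problem) tuples; an empty list means no issues found."""
    problems = []
    for path in paths:
        parts = path.split("/")
        # One folder level is fine ("code/analysis.R"); two is nesting ("code/old/analysis.R").
        if len(parts) > 2:
            problems.append((path, "nested folder"))
        for part in parts:
            if part and not SAFE_NAME.match(part):
                problems.append((path, f"special characters in {part!r}"))
    return problems


if __name__ == "__main__":
    for path, problem in check_paths(["README.txt", "code/old/run.R", "data/a+b.csv"]):
        print(f"{path}: {problem}")
```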

Licensing Your Data

Decide on the appropriate license for your data. Datasets submitted to the RDR will have a default CC0 public domain dedication assigned. The RDR strongly recommends the use of a CC0 waiver to encourage the broadest reuse of the data, and expects data users to follow the growing scholarly best practice of properly attributing and citing data producers. Because data may not be copyrightable in many jurisdictions, the CC0 waiver also removes legal questions related to the copyright status of datasets. If CC0 is not appropriate for your data, you can elect to apply an alternative Creative Commons license during the submission process.

Leave Files Uncompressed

Compressed files cannot receive the same level of preservation as uncompressed files. There may be situations where providing a zipped file is preferable in order to maintain a more complex file hierarchy or reduce the size of your deposit. We can accommodate these deposits but suggest compressing files only when necessary.

Sharing Human Subjects Data

Prepare any human subjects data to be ethically shared. Before depositing data about human subjects, you must ensure that you are following the terms of your approved IRB protocol. Typically this would include having obtained consent for data sharing (see this guide on recommended language for informed consent), including what information you plan to share, as well as any de-identification steps you will perform to protect participant confidentiality. The DUHS IRB has approved the RDR as an appropriate repository for sharing de-identified human subjects data. You may reference the following protocol number: Pro00108231. The Campus IRB does not require this same level of approval. When you submit human subjects data to the RDR, we will perform a review to make sure that there is no directly identifiable information in the data as well as ask for a copy of your consent form (or proof of consent waiver) and IRB protocol. If your data are de-identified but still could pose a risk of harm due to deductive disclosure, we can work with you to determine a more appropriate repository that provides restricted data support (researchers must apply to use, have proof of IRB approval, etc.).  

Data Coming from PACE (or covered by DUHS IRB)

If you are depositing data from PACE or are a DUHS researcher whose work is covered by the DUHS IRB, please ensure that you have obtained proper consent or waiver of consent to share data and that you have addressed data sharing in your IRB protocol. If you have not, you will need to amend your protocol. Consult DOCR or the IRB for appropriate data sharing language. 

If you have any questions about how to prepare your data for the deposit, we are happy to help! Contact us at datamanagement@duke.edu with your question or to set up an appointment.


In order to submit datasets larger than 10GB in size for curation and publication, or to download datasets or files from the Research Data Repository that are larger than 2GB, you will need to use the Globus file transfer service. For those who use Globus often, this may become the most convenient way to retrieve files from the RDR, regardless of size.

What is Globus?

Globus is a nonprofit platform created by the University of Chicago and Argonne National Laboratory that enables the simple transfer of digital files as large as petabytes (a petabyte is 1,000 terabytes or 1,000,000 gigabytes) between established endpoints, one of which can be your work or personal computer. Please note that if you encounter time-outs or errors during transfer or download, your personal computer or laptop might not be powerful enough; we recommend using an existing server for the transfer instead.

How do I use Globus to transfer files to and from my computer?

Below we describe some of the key steps to getting started with Globus, especially how to download files from the RDR and send us larger files for deposit into the repository. Globus also has detailed "How To" walkthroughs for basic and more complicated setup processes. We have also created some RDR specific walk-through videos of the steps outlined below.

Note if your lab or department’s IT infrastructure is already using Globus you can skip to the RDR specific steps for downloading or uploading data.

Set up a Globus account

The first step is to set up an account with the Globus Web App. You can do so using an existing organizational login, including Duke University, or through Google, ORCiD, or your Globus ID. To begin setting up an account, navigate to the Globus login page.

If you are setting up an account to deposit data, you must log in using your Duke credentials (click the blue "Continue" button to go to a Shibboleth login page and enter your Duke NetID and password).

Globus login screen

Install Globus Connect Personal

Once you have set up a Globus account, you will want to establish a personal endpoint on your computer so that you can transfer data to it (for download) and from it (for upload). To do this, you will need to install Globus Connect Personal to connect to the Globus Web App (instructions below). An "endpoint" is one of the two file transfer locations. You install Globus Connect Personal onto the system you plan to use (server, cluster, storage system, laptop, desktop, etc.) and configure it so that it has access to the area where the data you want to transfer is stored or where you want to download data to. The process is similar to mapping to a network drive or using an FTP service. Please see:

Downloading Data using Globus

Globus may be used to download datasets or individual files that are larger than 2GB.

  1. Navigate to the dataset or file you wish to download and click on the "Get Data from Globus" button. RDR Globus download button
  2. You will be prompted to log in to Globus. You may do so through an existing organizational login, or through Google, ORCiD, or using your Globus ID. Globus login screen
  3. After logging in, you will be taken to the File Manager screen. In the path field at the top will be the RDR system ID for the dataset you wish to download. In the panel to the left of the screen, you will see a list of the files associated with the dataset. This will include both an export manifest that includes SHA-1 checksums with which you can verify the accuracy and fixity of the files you download, and an export README file that contains metadata and some other contextual information about the dataset.
    Please note: if you are attempting to download a single large file from a dataset, you may have to navigate the dataset's hierarchy in this pane to find the correct file.
    Click on "Transfer or sync to..." on the right to specify a location to transfer the data. Globus download transfer screen
  4. At this stage you can either search for an existing Globus collection and begin the data transfer, or use Globus Connect Personal to move the data to your local machine. Your destination endpoint icon should be green (with the icon tooltip reading "collection ready") if your device is ready to download files from Globus. If the icon is red (such as "RDR Submissions" here) with an "x" (or if the icon tooltip reads "collection offline"), please check that Globus Connect Personal is running on your computer (for additional information, please refer to the Globus getting started documentation). Globus select file destination screen
  5. In the left-hand panel, either select all if you wish to transfer the entire dataset, or select the specific files you wish to transfer. In the right-hand panel, select the directory to which you would like to transfer the data (note: if you do not select a location, Globus defaults to whatever location you previously set up in Globus Connect Personal). Globus transfer file selection screen
  6. When the endpoints have been established, the "Start" button will become actionable (dark blue). Clicking "Start" will produce a green pop-up indicating a successful transfer request. Globus transfer file start button Globus transfer request successful
  7. Because Globus is designed to handle large files that may take some time to download, transfers will pause when your computer is no longer connected to the Internet and will resume automatically when it reconnects. Files that have been completely downloaded will appear in the destination folder on your computer. Clicking on "Activity" in the left-most panel will allow you to check the status of your current transfers.
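The SHA-1 checksums in the export manifest can be checked locally once a transfer finishes. This Python sketch assumes you have already read the manifest into a dictionary mapping each file's relative path to its hex SHA-1 digest (the manifest's exact layout is not described here); it reports any file that is missing or does not match.

```python
# Sketch: verify downloaded files against SHA-1 checksums from the export manifest.
# Assumption: the manifest has already been parsed into {relative_path: hex_sha1};
# its actual layout is not covered here.
import hashlib
from pathlib import Path


def sha1_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large downloads need not fit in memory."""
    digest = hashlib.sha1()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(folder, expected):
    """Return the relative paths that are missing or whose SHA-1 does not match."""
    folder = Path(folder)
    failures = []
    for rel_path, checksum in expected.items():
        target = folder / rel_path
        if not target.is_file() or sha1_of(target) != checksum.strip().lower():
            failures.append(rel_path)
    return failures
```

An empty result means every listed file arrived intact; any path it returns is worth transferring again.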

Uploading Data using Globus

  1. Follow the instructions above to set up your account and configure your endpoint via Globus Connect Personal.
  2. You will receive an email from Globus that contains a URL to access your share where you will transfer your files. The endpoint collection name will have your NetID and a date stamp. If you are not already logged in, you will be taken to the Globus login page when you click on this link. Click the blue "Continue" button to go to a Shibboleth login page and enter your Duke NetID and password.
  3. Once you are logged in, you will see your File Manager screen. On the top right of your File Manager, you will see a "Panels" menu. In order to see both the source and transfer panels, select the middle option. Globus File Manager screen
  4. In the left-hand panel, you'll see the collection that you will be transferring your files to. It will be labeled with your NetID and date stamp and will be empty. In the right-hand panel, you will navigate to the endpoint you want to transfer from. To do this, click on the collection search bar (marked with a magnifying glass icon) to navigate to the files on the endpoint you defined when you installed Globus Connect Personal. You may need to use the “up one folder” arrow depending on how you have mapped your endpoint. To begin the upload, make sure the right side of your window, your collection, is active (in the dual panel view the inactive panel is gray) and that the files you want to transfer are selected: Globus File Manager screen
  5. Click on the Start button at the top of your file manager window to begin the transfer process; your files will then begin to transfer. You can view processing messages in the Activity section of Globus. To view your transferred files, click on “refresh list” and they should display. Any issues with the upload typically also trigger an email message. If you receive any error messages and cannot move forward with your upload, please contact datamanagement@duke.edu. Globus File Manager start transfer button

If you are in need of further assistance, extensive documentation is available through the Globus website. Additionally, please feel free to drop us a line at datamanagement@duke.edu.

A note about Troubleshooting Globus

By default, Globus Connect Personal starts automatically in the background when you start your computer. When you are not using Globus, to avoid receiving error messages and other notifications, find the Globus icon in the bottom right-hand corner of your screen (or in the menu bar at the top of the screen on a Mac), right-click it (or Ctrl-click on a Mac), and click "Quit Globus Connect Personal."

The RDR Curation team would like to thank the University of Michigan Deep Blue Data for their support as we integrated Globus with the repository.


What is metadata and why is it important?

For the purposes of deposit with the Duke Research Data Repository, metadata is the high-level descriptive information about a dataset that is used in the discovery and identification of data. Metadata may include characteristics of the dataset such as its Title, Creator, and Description, and can help other researchers to understand more about the data and whether it may be of use. Deposit with the Research Data Repository requires that depositors provide information for a small number of metadata fields, and strongly encourages provision of several others.

Metadata is not a substitute for adequate dataset documentation, such as README files, codebooks, data dictionaries, or methodology reports. Dataset documentation is meant to serve as a more comprehensive record of the methodology, coding decisions, measurement tools and analytic processes that make it possible for others to correctly interpret and replicate your work. For more information, or for guidance on how to best compose documentation for your dataset, please contact one of our Research Data Management Consultants.

For more in-depth documentation about metadata for the Research Data Repository, please see here. Metadata from the RDR can be extracted or harvested in bulk using our OAI-PMH feed; to do so, access the "Dataset" ListSet.
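For bulk harvesting, an OAI-PMH client repeats `ListRecords` requests, following each `resumptionToken` until none is returned. The Python sketch below shows that pattern using only the standard library. `BASE_URL` and the set name are hypothetical placeholders: look up the RDR's actual endpoint and use a `ListSets` request to find the real identifier for the "Dataset" set. Records are requested as unqualified Dublin Core (`oai_dc`), which every OAI-PMH repository must support.

```python
# Sketch: harvest dataset titles over OAI-PMH.
# Assumptions: BASE_URL is a hypothetical placeholder, and the set spec you pass
# must be looked up with a ListSets request against the real endpoint.
import xml.etree.ElementTree as ET
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "https://example.org/oai"  # hypothetical placeholder
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(set_spec=None, resumption_token=None):
    """Follow-up requests carry only the verb and the resumptionToken."""
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
        if set_spec:
            params["set"] = set_spec
    return BASE_URL + "?" + urlencode(params)


def parse_page(xml_text):
    """Return (titles, resumption token or None) from one ListRecords response."""
    root = ET.fromstring(xml_text)
    titles = [el.text for el in root.iterfind(".//dc:title", NS)]
    token_el = root.find(".//oai:resumptionToken", NS)
    # An empty resumptionToken element marks the last page.
    token = token_el.text if token_el is not None and token_el.text else None
    return titles, token


def harvest_titles(set_spec=None):
    """Fetch every page of the list (requires network access to a real endpoint)."""
    titles, token = [], None
    while True:
        with urlopen(list_records_url(set_spec, token)) as resp:
            page_titles, token = parse_page(resp.read())
        titles.extend(page_titles)
        if not token:
            return titles
```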

What metadata are required for deposit?

The following descriptive information about your dataset is required for deposit with the Research Data Repository. These are fields marked with a red asterisk in the Data Submission Form.


Provide a descriptive name for the dataset. Consider the following guidelines:

  • If including data underlying a publication, use the following structure:
    • Data from: Title of Publication (Article, Monograph, Report, etc.)
  • If your data are not associated with a publication, but with a larger study/project, consider including descriptive information that will readily identify your dataset. The following details are often helpful:
    • Name of Study/Project
    • Time Range
    • Location
    • Topic or general research area
    • Wave or phase
  • Some examples:
    • Data and scripts from: Clustering and assembly dynamics of a one-dimensional microphase former
    • Data from: Resistance to flow on a sloping channel covered by dense vegetation following a dam-break
    • Neurobiology of social reward valuation in adults with a history of anorexia nervosa
    • IPHEx-Southern Appalachian Mountains -- Rainfall Data 2008-2014

Creator

List the name(s) of the person(s) and/or organization(s) involved in creating or authoring the data. Everyone listed here will appear in the data citation. A secondary Contributor field (see below) is available for individuals who contributed to the dataset but should not be listed as creators or authors in the data citation. For personal names, please enter each name in the following format: Last Name, First Name Middle Initial (middle initials are optional). If multiple names should be associated with the dataset, please separate names with a semicolon.

Description

Please provide a brief, general description of the research that produced the data, including the research question(s). For data supporting a publication, including the article abstract is generally sufficient. You may also incorporate methodological details or contextual details about the programs, software, or equipment used. Information in this field will help other users better understand how your research contributes to the field and whether they may be able to reuse your data in their own research.

Subject

Enter any terms or topics that describe your research or would help make it discoverable in the Research Data Repository. Multiple terms may be entered for each dataset; please separate terms with a semicolon. As part of the data curation process, the terms you enter here may be normalized to an established heading from the Library of Congress Subject Headings or the U.S. National Library of Medicine Medical Subject Headings, where appropriate (if an appropriate term cannot be identified, the term will be applied to your dataset as you supplied it).

Contact information

In this field, please enter a name, phone number, email address, and/or ORCID iD for a designated contact for the dataset. We strongly recommend including an ORCID iD; please contact us at datamanagement@duke.edu if you would like assistance setting up an account. This contact is responsible for answering questions about the data, documentation, and/or code for this project. Contact information may help other researchers who need additional information or guidance when reusing the data or reproducing the research, and providing it may also lead to opportunities for collaboration. Multiple values may be entered for this field; please separate each contact with a semicolon.

Additional metadata we encourage you to supply

Contributor

Please list any other person(s) or organization(s) who contributed to the creation of the dataset but whom you do not wish to include in the full dataset citation. For personal names, please enter each name in the following format: Last Name, First Name Middle Initial (middle initials are optional). If multiple names should be associated with the dataset, please separate names with a semicolon.

Organizational affiliation

This field refers to any academic department, school, research center, or other organizational affiliation you wish to see acknowledged in the metadata record for the dataset. It also provides an implicit secondary point of contact for future users of the data, should the explicit contact information (see above) included with the dataset become out of date. Multiple values may be entered for this field; please separate multiple values with a semicolon. Unfortunately, individuals entered as creators or contributors cannot be explicitly associated with their organizational affiliations at this time. If such an association is required, it can be made by including that information as free text in the Description field (outlined above) or in the dataset documentation.

Geographic location

This field accommodates information about the geographic location where your study data were collected. Enter a value here if this information is significant for the analysis of the data. This information may take the form of a country, state, or locality. Note: during the data curation process, this information will be normalized to an entry in the GeoNames geographical database and may therefore differ slightly in appearance or specificity upon dataset publication.

Dates of collection

If it is significant for the analysis or reuse of your data, please provide the date or range of dates that correspond to the span of time that the data were collected or generated. Be as specific as you can in defining the beginning and end of the data collection period.

Language

Please list the language(s) in which the data and supplementary content are written. This field is meant to be limited to spoken languages; if you wish to include information about programming languages relevant to your dataset, please include that information in the description or in the dataset documentation. If there are multiple languages, please separate each with a semicolon.

Format

Please list the formats of the files associated with your dataset (e.g., PDF, CSV, DICOM, WAV, TIFF), separating each with a semicolon. We recommend that you save your files in a preservation-friendly format. Proprietary formats can cause problems for long-term preservation as software platforms change and previous versions become obsolete. If you wish to include a proprietary format, you may do so, but full preservation can only be assured for sustainable file formats. Enumerating the formats included in your dataset will help users determine what software will be required to reuse your data.

Related publications

Enter a citation to any publication(s) that make use of or reference this dataset. This may include articles, other datasets, or any other published materials that you would like to associate with this data package. Include a full citation where possible, but a DOI, URL, or other persistent identifier may be sufficient. If the publication has not yet been released, please note that in the field and plan to notify the Research Data Curation Team once it has been published. Multiple citations may be included in this field; please separate each with a semicolon.

Funder or funding agency

Please list the primary funding agency or agencies that supported the research project that generated or collected the data. Identifying the funding agency can help demonstrate compliance with any data sharing requirements made by the agency and connect your research with a larger body of work. Multiple agencies may be included; please separate each with a semicolon.

Grant number

Enter any grant numbers associated with the funding that supported the research generating your data. Multiple values may be entered; please separate each with a semicolon.

A word about Creative Commons licenses

Datasets will default to a CC0 public domain dedication. By applying a CC0 waiver to your data, you are releasing your data for reuse and redistribution without restrictions under copyright or database law. Users of data within the RDR are expected to properly attribute data producers in accordance with community and data citation norms. To learn more, see the Creative Commons CC0 web page or consult the RDR Licensing Policy.

If a CC0 waiver is not appropriate for your data, please see the Creative Commons website to identify another license.