Please review these guidelines before submitting your data to the Duke Research Data Repository. Following these tips will help ensure that your data can be quickly and easily added to the repository. These guidelines are also available in flyer form in our Curation Checklist.
Datasets deposited within the RDR will need to be in a “flat structure,” which can be accomplished in one of two ways:
Provide documentation that clearly describes your data so that others can interpret and use your files. The documentation may be README files, data dictionaries, codebooks, instruments, user manuals, and/or fully commented code. At a minimum, the RDR requires a README file with each submission. Need help creating documentation? Download the RDR's plain text README template to get started, and see Cornell's guide for writing README files for additional guidance.
The important thing to keep in mind is that someone else will need to understand how to use your data, and they will not know all of the nuances in your file names, labels, data values, etc. without guidance.
Pro tip: To enhance reproducibility, include in your documentation the names and version numbers of the software programs you used to collect and analyze your data.
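For analyses run in Python, the version information mentioned above can be collected automatically instead of typed by hand. The following is a minimal sketch, not an RDR tool: the function name `software_versions` and the example package names are assumptions made for illustration, and the output is meant to be pasted into your README.

```python
import platform
from importlib import metadata


def software_versions(packages):
    """Return README-ready lines naming the interpreter and each package version."""
    lines = [f"Python {platform.python_version()}"]
    for name in packages:
        try:
            lines.append(f"{name} {metadata.version(name)}")
        except metadata.PackageNotFoundError:
            # Report the gap instead of failing, so the list stays complete.
            lines.append(f"{name} (not installed)")
    return lines


if __name__ == "__main__":
    # Example package names only; list the packages your analysis actually uses.
    print("\n".join(software_versions(["numpy", "pandas"])))
```

Software used outside Python (instruments, statistical packages, and so on) still needs to be recorded by hand.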
To be discoverable in the repository, datasets must be described with metadata: small pieces of information in a standardized format that allow people and machines to understand the contents and context of a dataset. The Data Deposit Form will guide you through the required and optional metadata elements. At minimum, you will need to provide a title, an author list with contact information, a description, and keywords. See our metadata page for tips on describing your data.
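If you draft your metadata in a file before opening the deposit form, a quick check can confirm that none of the minimum elements is empty. This is a hedged sketch: the dictionary keys below are hypothetical names chosen for illustration and are not the field names used by the Data Deposit Form.

```python
# Hypothetical keys standing in for the minimum metadata elements listed above.
REQUIRED_FIELDS = ("title", "authors", "contact", "description", "keywords")


def missing_required(record):
    """Return the required fields that are absent or empty in a draft record."""
    missing = []
    for field in REQUIRED_FIELDS:
        value = record.get(field)
        if isinstance(value, str):
            value = value.strip()  # treat whitespace-only text as empty
        if not value:
            missing.append(field)
    return missing


draft = {
    "title": "Example dataset title",
    "authors": ["Example Author"],
    "contact": "",
    "description": "Short description of the data.",
    "keywords": [],
}
print(missing_required(draft))
```

An empty list means every minimum element has a value; it says nothing about whether the values are good descriptions.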
Save your files in a preservation-friendly format whenever possible. Proprietary formats that can only be opened or saved by specific programs can cause problems with long-term preservation as software platforms change and previous versions become obsolete. If you wish to include a proprietary format that is commonly used in your field or discipline, you may do so, but full preservation can only be assured for sustainable file formats. You might also consider including the proprietary format alongside a preservation-friendly derivative. If you aren't sure whether there is a preservation-friendly format for your files, or you are unable to uncouple your data from proprietary software, please contact datamanagement@duke.edu.
The RDR recommends the use of a CC0 waiver to encourage the broadest reuse of the data and expects users of data to follow scholarly best practices to properly attribute and cite data producers. Since in many jurisdictions data may not be copyrightable, the CC0 waiver also removes legal questions related to the copyright status of datasets. If CC0 is not appropriate for your data, you can elect to apply an alternate Creative Commons license or another open data or software license while completing the submission form. See our licensing page to learn more.
The Duke Research Data Repository accepts human subjects research data if it is not subject to any access restrictions and if depositors can demonstrate that the data have been appropriately prepared for ethical sharing. When depositing data about human participants, you must ensure that:
Participants have been properly consented for data sharing and the terms outlined in your participant consent form and approved IRB protocol align (one should not contradict the other). For help with developing language in consent forms about data sharing, see this guide from ICPSR or the “Data Sharing” section of the DUHS English Standard language page. You will be required to provide a copy/sample of your consent form and IRB protocol at the time of deposit.
Data have been fully de-identified to at least HIPAA Safe Harbor standards (all direct identifiers removed).
Data would pose little to no risk to participants if their identities were inadvertently discovered.
We will review your submission against the criteria above before publishing the data. See our human data sharing policy for more information. If your data are de-identified but could still harm your participants through deductive disclosure, we can work with you to determine a more appropriate repository.
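As one small part of preparing tabular data for the de-identification requirement above, columns holding direct identifiers can be dropped programmatically. The sketch below is illustrative only: the column names in `DIRECT_IDENTIFIERS` are hypothetical, the function is not an RDR tool, and removing named columns does not by itself satisfy HIPAA Safe Harbor, which also covers items such as dates and small geographic units, nor does it address deductive disclosure.

```python
import csv
import io

# Hypothetical column names; replace with the identifier columns in your own data.
DIRECT_IDENTIFIERS = {"name", "email", "phone", "ssn"}


def drop_identifier_columns(csv_text, identifiers=DIRECT_IDENTIFIERS):
    """Return CSV text with every column named in `identifiers` removed."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # Compare case-insensitively so "Name" and "name" are both caught.
    kept = [c for c in (reader.fieldnames or []) if c.strip().lower() not in identifiers]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=kept, extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    for row in reader:
        writer.writerow(row)  # values for dropped columns are ignored
    return out.getvalue()


sample = "id,Name,age\n1,Ann,34\n2,Bo,51\n"
print(drop_identifier_columns(sample))
```

Always inspect the result by hand afterward, since identifying details can also appear in free-text fields and file names.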
The DUHS IRB has approved the RDR as an appropriate repository for sharing de-identified human subjects data. You may reference the following protocol number: Pro00108231.