We follow best practices for data sharing and archiving. We will provide you with a DOI to include with your manuscript proof or granting agency documentation, apply standardized metadata for discovery, assign a standard Creative Commons license, and keep your data safe through a preservation infrastructure. We have also outlined our compliance with the NIH Desirable Characteristics for Repositories and provide boilerplate language about our repository that you can use in your Data Management & Sharing Plan. Learn more about changes in grant funding requirements related to the OSTP "Nelson" memo and the current NIH Data Management and Sharing Policy.
The RDR supports an embargo for up to one year from the time of dataset publication. To request an embargo, note it in the “notes to RDR curator” field at the bottom of the Submission form or let us know prior to data publication. As we are an open access repository, we generally only recommend embargoing data files when a dataset underlies a publication that has not yet been published and the depositor does not want the files public during peer review. Embargoed files will be viewable on the dataset page but cannot be downloaded. If you need to share embargoed files with a publisher for peer review, please contact datamanagement@duke.edu and we can provide a letter with access information.
Note: if you would like to modify embargoed files without creating a new version (see below) or lift an embargo prior to the selected date (one year from dataset publication), contact us and we can assist.
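As a rough illustration of the one-year embargo limit, the sketch below computes the latest possible lift date from a publication date. It assumes that "one year" means the same calendar date in the following year, and that a February 29 publication date rolls back to February 28; both are assumptions for illustration, not stated RDR policy. The function name `latest_embargo_end` is hypothetical.

```python
from datetime import date


def latest_embargo_end(published: date) -> date:
    """Return the latest embargo lift date, assuming a one-year maximum.

    Assumption: "one year" is read as the same calendar date next year.
    """
    try:
        return published.replace(year=published.year + 1)
    except ValueError:
        # Only reached for Feb 29 when the next year is not a leap year.
        # Assumption: fall back to Feb 28.
        return published.replace(year=published.year + 1, day=28)


if __name__ == "__main__":
    print(latest_embargo_end(date(2023, 5, 1)))
```

If you need an exact date for a publisher or funder, confirm it with a curator rather than relying on this calculation.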
Data deposited with the RDR should be in its final, publication-ready form; the RDR is not an appropriate solution for data that are being actively managed in the course of your research. However, we understand that errors may be discovered post-publication or that additional data or documentation files may need to be added. In these cases, depositors can submit a new version of a dataset. We will continue to provide access to the previous version of all files to ensure persistent access to previously published materials (if you need to permanently remove files, see our deaccessioning policy).
When a dataset has been versioned, the system will:
Versioning should not be used as a method to add files to a collection from different waves of a study or as a method to preserve data during the active research phase of a project. If you foresee your dataset evolving over time, then consider a "release cycle" for your data. If you have questions about creating a plan for publishing dynamic data, contact datamanagement@duke.edu.
To version your dataset (adding/removing/modifying the files within a dataset), follow these instructions:
Note: If you are versioning a dataset that is over 25 GB and you need to add, delete, or replace individual files over 10 GB in size, please contact datamanagement@duke.edu for assistance prior to versioning.
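The size rule in the note above can be expressed as a small check. This is only a sketch: the function name `should_contact_curators` is hypothetical, sizes are assumed to be supplied in GB by the caller, and the thresholds are simply the 25 GB and 10 GB figures quoted in the note.

```python
DATASET_LIMIT_GB = 25  # dataset size threshold quoted in the note
FILE_LIMIT_GB = 10     # individual file size threshold quoted in the note


def should_contact_curators(dataset_gb: float, changed_file_sizes_gb: list[float]) -> bool:
    """Return True when the note's conditions for contacting curators are met.

    Both conditions must hold: the dataset is over 25 GB, and at least one
    file being added, deleted, or replaced is over 10 GB.
    """
    has_large_file = any(size > FILE_LIMIT_GB for size in changed_file_sizes_gb)
    return dataset_gb > DATASET_LIMIT_GB and has_large_file


if __name__ == "__main__":
    print(should_contact_curators(30, [12.5, 0.4]))
```

When in doubt about whether a change qualifies, contacting the curators first is the safer choice.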
If you need to make minor adjustments to your metadata only (not modifying files), follow these instructions:
Note: If you need to make minor changes to your README file without creating a new version (see versioning above), contact datamanagement@duke.edu and we can make minor updates to README files by request.
It depends on the data submission, but we try to curate deposits as quickly as possible (two to three days). However, if your data are large, complex (thousands of files), or based on human participants, review and processing may take additional time. If your data are sent to the Data Curation Network (see below for more information), this may also add time, but we can provide you with a DOI prior to publication in these cases. Your data will go through a curatorial review to ensure that your deposit package is complete. If additional information is needed or changes are required, responding promptly to the request will ensure the fastest turnaround.
After submitting a dataset, depositors will receive an email with a draft DOI for their dataset. The DOI may be shared with a publisher; however, note that the DOI will not resolve (link to your dataset on the web) until your data has completed the curation process and is published within the repository. If your data are from human subjects, please do not share the DOI until a curator has approved the acceptance of your data based on our human data policy requirements for consent, de-identification, and overall data sensitivity. For datasets over 25 GB, a DOI can be requested after data files are submitted if the researcher needs the DOI quickly to send to a publisher.
At this time, the RDR cannot accept any data that would require special access conditions or are considered either sensitive or restricted according to the Duke Data Classification. Examples of data that could be considered sensitive include any human participants data that contains either personally identifiable information (PII) or protected health information (PHI), unconsented Duke patient data, insufficiently consented data, and/or data for which de-identification is not sufficient to reduce the risk of harm to participants from accidental/inappropriate disclosure. Other examples of sensitive data may include data under export control, data that include specific geographic locations of endangered species and/or areas (poaching/vandalism risk), or any data Duke is obligated to protect. If your data needs to be restricted, please contact us at datamanagement@duke.edu. We may be able to help you find an alternative archival solution.
We are unable to restrict access to datasets by registered users at this time. If you need to restrict access to your data, we may be able to help you find alternative options. Contact us at datamanagement@duke.edu.
The Duke Research Data Repository provides 300 GB of preservation storage per deposit for Duke researchers (defined as graduate students, postdoctoral researchers, research staff, and faculty) at no cost. For larger datasets, please contact us to discuss whether the RDR can accept your deposit given the scope and scale of your data. Additional preservation costs may also be assessed based upon the size of the submission.
For projects planning for data preservation and storage for grant applications, please contact us at datamanagement@duke.edu for planning and tracking purposes. We can also provide you with boilerplate language or Letters of Support as appropriate.
The Duke Research Data Repository accepts code/scripts that produce, transform, or process data as well as original software produced in a project. To see examples of how other researchers have shared their code, search the repository for “code” or “scripts.” If you maintain a code repository on GitHub, you can link to it from your RDR deposit by providing the GitHub URL as a “Related Resource” in the submission workflow. You can also archive a snapshot of your GitHub repository in Zenodo, a public data repository supported by CERN, and provide the link to that Zenodo deposit as a Related Resource as well.
When considering other options for repository solutions for other types of scholarly materials (open access publications, presentations, etc.), the libraries support a number of options. If you are working on team-based research, you may also wish to consult the DUL guidelines for preserving and disseminating team-based research products.
We encourage members of the Duke community to determine the repository that best meets their needs. For instance, your funder or publisher may instruct you to use a particular repository, or your scholarly community may have a disciplinary or content-type repository that people commonly publish in (which is a great option!). Duke also supports or is a member of a number of disciplinary repositories you may want to consider:
We are happy to provide advice on how to prepare your data for deposit in other repositories if needed or help you identify an appropriate repository. Contact us at datamanagement@duke.edu for assistance.
While it's true that redundant copies can help assure you that your data are safe, multiple copies of datasets can confuse users of your data, and can often be difficult to keep in sync. We discourage the deposit of data in multiple locations, but we understand that sometimes it may be necessary to keep a copy of the data elsewhere. In these cases, include the link to the other copy of your data in the Related Resources section of the submission form.
If you leave the university, we will continue to retain and make your data available under our stated Retention Policy with a minimum retention of 25 years. If you leave Duke, please send us updated contact information so we can update your dataset record. Providing your ORCID identifier (which will follow you throughout your career) at the time of deposit can help us keep your contact information current.
Yes! We can create a collection if you have a number of datasets (at least 3) related to one specific project. For instance, see this example. If you would like to create a collection of datasets you have deposited into the RDR, please contact us at datamanagement@duke.edu and we will have you complete a brief form to create the collection.
If you know the full citation for the publication with which your data are associated, please provide it at the time of initial submission. Within the submission form, you will see a field called “Related Resources” where you can add the citation.
If you are unsure of the full citation at the time of data submission, you can always provide that information when it becomes available by updating the metadata or contacting datamanagement@duke.edu.
Links to associated publications will then appear under "Related Items" in the metadata for your dataset, along with a link to view the publication.
Yes! As part of our curation service, we will enter the citation for your dataset in your Scholars@Duke profile. Datasets deposited with the RDR will be visible by expanding the "Selected Publications" section under "Publications & Artistic Works." The data may be accessed in the RDR either by clicking on the "Data Access" button, or via the link included in the dataset record (viewable by clicking the hyperlinked title in the dataset citation).
We can accept supplementary files; however, they must be accompanied by the underlying data that is used to create the tables and figures in those files (which are typically static PDF files). If you are not able to provide the data for any reason (proprietary, sensitive, consent not obtained), you can use Open Science Framework (OSF) for sharing the static PDF tables. For more information on the OSF, please see this page.
Globus is a system for transferring large files. The RDR uses Globus to support the upload of datasets over 50 GB into the repository. Files are also available for download via Globus by clicking the “Download data from Globus” button at the top of the dataset. See our Globus documentation for more information.
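For planning purposes, the size thresholds mentioned on this page (Globus uploads for datasets over 50 GB, and 300 GB of no-cost preservation storage per deposit) can be combined into a simple checklist. The sketch below is illustrative only: the function name `deposit_notes` and the wording of the notes are hypothetical, and sizes are assumed to be given in GB.

```python
GLOBUS_UPLOAD_GB = 50   # datasets over this size are uploaded via Globus
NO_COST_LIMIT_GB = 300  # no-cost preservation storage per deposit


def deposit_notes(size_gb: float) -> list[str]:
    """Return planning notes for a deposit of the given size in GB."""
    notes = []
    if size_gb > GLOBUS_UPLOAD_GB:
        notes.append("upload via Globus")
    if size_gb > NO_COST_LIMIT_GB:
        notes.append("contact curators: exceeds the 300 GB no-cost allocation")
    return notes


if __name__ == "__main__":
    for size in (10, 120, 450):
        print(size, deposit_notes(size))
```

An empty list means neither threshold applies; it does not replace the curatorial review that every deposit goes through.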
We may perform the following steps during the curation process:
We are also a member institution of the Data Curation Network (DCN), which expands our capacity to curate data from a large number of disciplines and data types by accessing a network of curators across 19 institutions. The DCN also provides another useful conceptual model for curation (see the CURATED steps).