Frequently Asked Questions for the Duke Research Data Repository

Does depositing data in the RDR meet journal sharing or grant funding requirements (e.g., Desirable Characteristics of Data Repositories)?
Am I able to place an embargo on my data?
What if I need to make changes/create a new version of my data?
What are the steps to create a new version of my dataset?
How can I make updates to only the metadata for my dataset?
How long will it take for my data to be published?
When can I get a DOI for my data?
My data is considered sensitive, can I deposit it in the RDR?
Can I restrict access to my data so that only approved users can download it?
Do I have to pay to deposit data in the RDR?
What if I have other types of materials to share that are not data (code/scripts/software)?
What if I would like to deposit my data somewhere else?
If I deposit my data in the RDR, can I deposit it somewhere else too?
What happens to my data if I leave Duke?
Is there a way to group all my datasets in the RDR together for access?
How can I link my dataset to its associated publication?
Will my datasets appear in my Scholars@Duke profile?
Do you accept supplementary files (PDF with static text/tables/figures) that accompany journal articles?
What is Globus?
What does the data curation process entail?

Does depositing data in the RDR meet journal sharing or grant funding requirements (e.g., Desirable Characteristics of Data Repositories)?

We follow best practices for data sharing and archiving. We will provide you with a DOI to include with your manuscript proof or granting agency documentation, apply standardized metadata for discovery, assign a standard Creative Commons license, and keep your data safe through a preservation infrastructure. We have also outlined our compliance with the NIH Desirable Characteristics for Repositories and provide boilerplate language about our repository that you can use in your Data Management & Sharing Plan. Learn more about changes in grant funding requirements related to the OSTP "Nelson" memo and the current NIH Data Management and Sharing Policy.

Am I able to place an embargo on my data?

The RDR supports an embargo for up to one year from the time of dataset publication. If you need an embargo, you can request one in the “notes to RDR curator” field at the bottom of the Submission form or let us know prior to data publication. As we are an open access repository, we generally only recommend embargoing data files when a dataset underlies a publication that has not yet been published and the depositor does not want the files public during peer review. Embargoed files will be viewable on the dataset page but cannot be downloaded. If you need to share embargoed files with a publisher for peer review, please contact datamanagement@duke.edu and we can provide a letter that provides access information.

Note: if you would like to modify embargoed files without creating a new version (see below) or lift an embargo prior to the selected date (one year from dataset publication), contact us and we can assist.

What if I need to make changes/create a new version of my data?

Data deposited with the RDR should be in its final, publication-ready form; the RDR is not an appropriate solution for data that are being actively managed in the course of your research. However, we understand that errors may be discovered post-publication or additional data or documentation files may need to be added. In these cases, depositors can submit a new version of a dataset. We will continue to provide access to the previous version of all files to ensure persistent access to previously published materials (if you need to permanently remove files, see our deaccessioning policy).

When a dataset has been versioned, the system will:

Retain all files previously published
Assign a new DOI to the new version of the dataset
Include a “Version note” with a narrative explanation of what has changed between the versions
List the dataset's versions at the bottom of the page, with a link to any other versions of the data that have been previously published in the RDR
Display a banner across the top of the page alerting you that the dataset has been versioned, with a link to the most recent version
Show only the most recent version in search results, with the Version History listed at the bottom of the page
See this example

Versioning should not be used as a method to add files to a collection from different waves of a study or as a method to preserve data during the active research phase of a project. If you foresee your dataset evolving over time, then consider a "release cycle" for your data. If you have questions about creating a plan for publishing dynamic data, contact datamanagement@duke.edu.

What are the steps to create a new version of my dataset?

To version your dataset (adding/removing/modifying the files within a dataset), follow these instructions:

Log into the RDR using your Duke credentials
Go to the dataset you would like to version
Click on the “Version” button at the top of the dataset record and select “Create new version”
All metadata will be pre-populated from the previous version of the dataset. Make changes as needed to the metadata
Add/remove/replace the files that are changing in this new version
Indicate what has been modified in the “Version Note” field
Click “Submit” at the bottom of the page
A curator will then do a cursory review of your new dataset prior to publishing the dataset
You will receive an email with the DOI for the new version of your dataset

Note: If you are versioning a dataset that is over 25 GB and need to add/delete/replace individual files over 10 GB in size, please contact datamanagement@duke.edu for assistance prior to versioning.
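
Before versioning, it can help to check whether a dataset crosses the size thresholds in the note above. The following is a minimal Python sketch, not an RDR tool: the function names are hypothetical, and it assumes that 1 GB means 10^9 bytes, which this FAQ does not specify.

```python
import os

GB = 10**9  # assumption: decimal gigabytes; the FAQ does not say which definition applies
DATASET_LIMIT = 25 * GB  # dataset size named in the note above
FILE_LIMIT = 10 * GB     # individual file size named in the note above


def file_sizes(root):
    """Map each file under `root` (as a relative path) to its size in bytes."""
    sizes = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            sizes[os.path.relpath(path, root)] = os.path.getsize(path)
    return sizes


def should_contact_curators(sizes, changed_files):
    """Return True when the dataset is over 25 GB and at least one file
    being added, deleted, or replaced is over 10 GB.

    `sizes` maps file names to sizes in bytes and should cover every file
    involved, including files that are about to be added.
    """
    dataset_is_large = sum(sizes.values()) > DATASET_LIMIT
    large_change = any(sizes.get(name, 0) > FILE_LIMIT for name in changed_files)
    return dataset_is_large and large_change
```

For example, `should_contact_curators(file_sizes("my_dataset"), ["raw/run1.h5"])` (both paths are placeholders) returns True only when both conditions in the note hold. When in doubt, contacting datamanagement@duke.edu is the safer choice.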

How can I make updates to only the metadata for my dataset?

If you need to make minor adjustments to your metadata only (not modifying files), follow these instructions:

Log into the RDR using your Duke credentials
Go to the dataset for which you would like to modify the metadata
Click on the “Version” button at the top of the dataset record and select “Update current version”
All metadata will be pre-populated from the previous version of the dataset. Make changes as needed to the metadata
Click “Submit” at the bottom of the page
A curator may further enhance your metadata to ensure it is well-structured; however, your metadata changes will go into effect immediately.

Note: If you need to make minor changes to your README file without creating a new version (see versioning above), you can contact datamanagement@duke.edu and we can make minor updates to README files by request.

How long will it take for my data to be published?

It depends on the data submission, but we try to curate deposits as quickly as possible (two to three days). However, if your data are large, complex (thousands of files), or based on human participants, review and processing may take longer. If your data are sent to the Data Curation Network (see below for more information), this may also add time, but we can provide you with a DOI prior to publication in these cases. Your data will go through a curatorial review to ensure that your deposit package is complete. If additional information is needed or changes are required, responding promptly to the request will ensure the fastest turnaround.

When can I get a DOI for my data?

After submitting a dataset, depositors will receive an email with a draft DOI for their dataset. The DOI may be shared with a publisher; however, note that the DOI will not resolve (link to your dataset on the web) until your data has completed the curation process and is published within the repository. If your data are from human subjects, please do not share the DOI until a curator has approved the acceptance of your data based on our human data policy requirements for consent, de-identification, and overall data sensitivity. For datasets over 25 GB, a DOI can be requested after data files are submitted if the researcher needs the DOI quickly to send to a publisher.

My data is considered sensitive, can I deposit it in the RDR?

At this time, the RDR cannot accept any data that would require special access conditions or are considered either sensitive or restricted according to the Duke Data Classification. Examples of data that could be considered sensitive include any human participants data that contains either personally identifiable information (PII) or protected health information (PHI), unconsented Duke patient data, insufficiently consented data, and/or data for which de-identification is not sufficient to reduce the risk of harm to participants from accidental/inappropriate disclosure. Other examples of sensitive data may include data under export control, data containing specific geographic locations of endangered species and/or areas (poaching/vandalism risk), or any data Duke is obligated to protect. If your data needs to be restricted, please contact us at datamanagement@duke.edu. We may be able to help you find an alternative archival solution.

Can I restrict access to my data so that only approved users can download it?

We are unable to restrict access to datasets to specific registered users at this time. If you need to restrict access to your data, we may be able to help you find alternative options. Contact us at datamanagement@duke.edu.

Do I have to pay to deposit data in the RDR?

The Duke Research Data Repository provides 300 GB of preservation storage per deposit for Duke researchers (defined as graduate students, postdoctoral researchers, research staff, and faculty) at no cost. For larger datasets, please contact us to discuss whether the RDR can accept your deposit, given the scope and scale of your data. Additional preservation costs may also be assessed based upon the size of the submission.

For projects planning for data preservation and storage for grant applications, please contact us at datamanagement@duke.edu for planning and tracking purposes. We can also provide you with boilerplate language or Letters of Support as appropriate.

What if I have other types of materials to share that are not data (code/scripts/software)?

The Duke Research Data Repository accepts code/scripts that produce, transform, or process data as well as original software produced in a project. To see examples of how other researchers have shared their code, search the repository for “code” or “scripts.” If you maintain a code repository on GitHub, you can link to it from your RDR deposit by providing the GitHub URL as a “Related Resource” in the submission workflow. You can also archive a snapshot of your GitHub repository in Zenodo, a public data repository supported by CERN, and provide the link to that Zenodo deposit as a Related Resource as well.

For other types of scholarly materials (open access publications, presentations, etc.), the libraries support a number of repository options. If you are working on team-based research, you may also wish to consult the DUL guidelines for preserving and disseminating team-based research products.

What if I would like to deposit my data somewhere else?

We encourage members of the Duke community to determine the repository that best meets their needs. For instance, your funder or publisher may instruct you to use a particular repository, or your scholarly community may have a disciplinary or content-type repository that people commonly publish in (which is a great option!). Duke also supports or is a member of a number of disciplinary repositories you may want to consider:

MorphoSource (3D and 2D media that represent physical objects)
Qualitative Data Repository (qualitative and mixed methods data)
Inter-University Consortium for Political and Social Research (ICPSR) (social and behavioral sciences)
Vivli (clinical data)

We are happy to provide advice on how to prepare your data for deposit in other repositories if needed or help you identify an appropriate repository. Contact us at datamanagement@duke.edu for assistance.

If I deposit my data in the RDR, can I deposit it somewhere else too?

While it's true that redundant copies can help assure you that your data are safe, multiple copies of datasets can confuse users of your data, and can often be difficult to keep in sync. We discourage the deposit of data in multiple locations, but we understand that sometimes it may be necessary to keep a copy of the data elsewhere. In these cases, include the link to the other copy of your data in the Related Resources section of the submission form.

What happens to my data if I leave Duke?

If you leave the university, we will continue to retain and make your data available under our stated Retention Policy with a minimum retention of 25 years. If you leave Duke, please send us updated contact information so we can update your dataset record. Providing your ORCID iD (which will follow you throughout your career) at the time of deposit can help us keep your contact information current.

Is there a way to group all my datasets in the RDR together for access?

Yes! We can create a collection if you have a number of datasets (at least 3) related to one specific project. For instance, see this example. If you would like to create a collection of datasets you have deposited into the RDR, please contact us at datamanagement@duke.edu and we will have you complete a brief form to create the collection.

How can I link my dataset to its associated publication?

If you know the full citation for the publication with which your data are associated, please provide it at the time of initial submission. Within the submission form, you will see a field called “Related Resources” where you can add the citation.

If you are unsure of the full citation at the time of data submission, you can always provide that information when it becomes available by updating the metadata or contacting datamanagement@duke.edu.

Links to associated publications will then appear under "Related Items" in the metadata for your dataset, along with a link to view the publication.

Will my datasets appear in my Scholars@Duke profile?

Yes! As part of our curation service, we will enter the citation for your dataset in your Scholars@Duke profile. Datasets deposited with the RDR will be visible by expanding the "Selected Publications" section under "Publications & Artistic Works." The data may be accessed in the RDR either by clicking on the "Data Access" button, or via the link included in the dataset record (viewable by clicking the hyperlinked title in the dataset citation).

Do you accept supplementary files (PDF with static text/tables/figures) that accompany journal articles?

We can accept supplementary files; however, they must be accompanied by the underlying data that is used to create the tables and figures in those files (which are typically static PDF files). If you are not able to provide the data for any reason (proprietary, sensitive, consent not obtained), you can use Open Science Framework (OSF) for sharing the static PDF tables. For more information on the OSF, please see this page.

What is Globus?

Globus is a system for transferring large files. The RDR uses Globus to support the upload of datasets over 50 GB into the repository. Files are also available for download via Globus by clicking the “Download data from Globus” button at the top of the dataset. See our Globus documentation for more information.

What does the data curation process entail?

We may perform the following steps during the curation process:

Open the data files to determine their contents
Check whether there is enough documentation to describe the attributes and content of the data as well as the context of the research process (e.g., data sources, programs)
Review and run scripts (when appropriate)
If the data are on human subjects, perform a review to determine that all direct identifying information is removed (e.g., the 18 HIPAA identifiers)
Check for typos, duplicate files, or other errors
Identify additional suggestions for how to enhance the data package
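
Depositors can run a similar duplicate-file check themselves before submitting. The Python sketch below is only an illustration and is not the curators' actual tooling; the function names are hypothetical, and comparing SHA-256 digests is one common way to find files with identical contents.

```python
import hashlib
import os
from collections import defaultdict


def sha256_of(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large data files are not read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root):
    """Return groups of relative paths under `root` that share a SHA-256 digest,
    which in practice means their contents are identical."""
    by_hash = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            by_hash[sha256_of(path)].append(os.path.relpath(path, root))
    # Keep only digests shared by more than one file, in a stable order.
    return sorted(sorted(paths) for paths in by_hash.values() if len(paths) > 1)
```

Calling `find_duplicates("my_dataset")` (a placeholder folder name) returns a list of groups, each naming files that could be consolidated before deposit. Files with the same name but different contents are not flagged, since the comparison looks only at contents.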

We are also a member institution of the Data Curation Network (DCN), which expands our capacity to curate data from a large number of disciplines and data types by accessing a network of curators across 19 institutions. The DCN also provides another useful conceptual model for curation (see the CURATED steps).