Duke Research Data Repository Frequently Asked Questions

We follow best practices for data sharing and archiving. We will provide you with a DOI to include with your manuscript proof or granting agency documentation, apply standardized metadata for discovery, assign a standard Creative Commons license, and keep your data safe through a preservation infrastructure. We have also outlined our compliance with the NIH Desirable Characteristics for Repositories. Learn more about changes in grant funding requirements related to the OSTP "Nelson" memo and the NIH Data Management and Sharing Policy, which is now in effect.

The FAIR Guiding Principles were developed by a diverse group of stakeholders across the scholarly landscape to provide “a concise and measurable set of principles...act as a guideline for those wishing to enhance the reusability of their data holdings.” FAIR stands for Findable, Accessible, Interoperable, and Reusable. FAIRness is established through the use of standardized machine-readable metadata, persistent identifiers (DOIs), clear license terms, transportable formats, and contextual documentation. To learn more about FAIR, see: https://www.go-fair.org/fair-principles/
We will allow an embargo of up to one year if needed. If you need to assign an embargo or would like to request an embargo longer than one year, please contact us at datamanagement@duke.edu.
It depends on the data submission, but we try to turn around deposits as quickly as possible. However, if your data are large, complex (thousands of files), or based on human participants, review and processing may take additional time. If your data are sent to the Data Curation Network (see below), this may also add time, but we can provide you with a DOI prior to publication in these cases. Your data will go through a curatorial review to ensure that your deposit package is complete. If additional information is needed or changes are required, responding promptly to our request will ensure the fastest turnaround.
  • A digital object identifier (DOI) for persistent access
  • Standardized Dublin Core metadata for discovery
  • A customized citation for proper attribution
  • Curatorial review to support optimum reuse
  • File format transformations for longer term reuse (when possible)
  • Long-term archiving and preservation

A Digital Object Identifier, or DOI, is a type of persistent identifier used to uniquely identify an information resource or other object. Persistent identifiers are long-lasting references to a document, file, web page, or other object, usually digital. Typically, these references may be entered into a web browser and will resolve to the specified resource. All objects in the Research Data Repository are assigned an Archival Resource Key (ARK) on ingest, which will serve as the basis for a persistent link to each object in the data package. Additionally, datasets will be assigned a DOI at the level of the complete data package, which will be included in the dataset citation and will provide persistent access to the data.
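
As a small illustration of how a DOI can be entered into a web browser, the sketch below turns a DOI string into a link through the public https://doi.org/ resolver. The sample DOI is a made-up placeholder, and the loose pattern check is an assumption for illustration only, not a validation rule used by the RDR.

```python
import re
from urllib.parse import quote

# Loose shape check: "10.", a registrant code, "/", then a suffix.
# This is an illustrative assumption, not an official DOI grammar.
DOI_SHAPE = re.compile(r"^10\.\d{4,9}/\S+$")


def doi_to_url(doi: str) -> str:
    """Return the https://doi.org/ link for a bare or 'doi:'-prefixed DOI."""
    cleaned = doi.strip()
    if cleaned.lower().startswith("doi:"):
        cleaned = cleaned[4:].strip()
    if not DOI_SHAPE.match(cleaned):
        raise ValueError(f"does not look like a DOI: {doi!r}")
    # Percent-encode characters that are unsafe in a URL path, keeping "/".
    return "https://doi.org/" + quote(cleaned, safe="/")


# Hypothetical DOI used only as an example.
print(doi_to_url("doi:10.1234/example-dataset.v1"))
```

The same link form works for any registered DOI, which is why a DOI in a dataset citation keeps working even if the repository's own page addresses change.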

Standard practice is that DOIs are issued after a dataset is curated and made public (or embargoed). In the case where a researcher needs a DOI immediately to provide to a publisher, a researcher may request a DOI after submitting the data but prior to making that data publicly available. In these cases, RDR staff will reserve the DOI for a limited period of time but the DOI will not resolve until the data record is made public. Researchers may also receive a DOI and embargo a dataset (see below) where the DOI points to public metadata but the submitted files are kept private; this is ideal when a dataset is still under review by a journal publication and the author foresees changes being made. The RDR requires the submission of data files prior to issuing DOIs.

We are unable to restrict access to datasets by registered user at this time. If you do need to restrict access to your data, we may be able to help you find alternative options. Contact us at datamanagement@duke.edu.

We may perform the following steps during the curation process:

  • Open the data files to determine their contents
  • Check to determine if there is enough documentation to describe the attributes and content of the data as well as the context of the research process (e.g., data sources, programs)
  • Review and run scripts (when appropriate)
  • If the data are on human subjects, perform a review to determine that all direct identifying information is removed (e.g. HIPAA 18).
  • Check for typos, duplicate files or other errors.
  • Identify additional suggestions for how to enhance the data package.

We are also a member institution of the Data Curation Network (DCN), which expands our capacity to curate data from a large number of disciplines and data types by accessing a network of curators across 19 institutions. The DCN also provides another useful conceptual model for curation (see the CURATED steps).

We offer data curation to enhance your data package as a complete published collection, and to ensure that it is optimized for long-term preservation. This includes checking for documentation on data context (methods, file relationships, software needed and version) and attribute definitions (rows, columns, values) and migration to open, stable formats when possible (or creating an open, stable version alongside the original). Data curation helps to ensure that your data are meeting the FAIR guiding principles.

The Data Curation Network (DCN) is a cross-institutional staffing model intended to leverage the expertise of data curators at partner institutions. Ideally, the DCN exists to

  • provide expert data curation services for Network partners and end users,
  • create and openly share data curation procedures and best practices, and
  • support training and development opportunities for an emerging data curator professional community.

If your data (subject area and format) match the expertise of one of the DCN curators, we have the option to send your dataset to the DCN for review. We will always ask you before doing so!

We encourage the Duke community to determine the repository that best meets their needs. For instance, your funder or publisher may instruct you to use a particular repository, or your scholarly community may have a disciplinary or content-type repository that people commonly publish in (which is a great option!). Duke also supports or is a member of a number of disciplinary repositories you may want to consider:

We are happy to help you prepare your data for deposit in other repositories if needed or help you identify an appropriate repository. Contact us at datamanagement@duke.edu for assistance.

At this time, the RDR cannot accept any data that would require special access conditions or are considered either sensitive or restricted according to the Duke Data Classification. Examples of data that could be considered sensitive include human subjects data containing personally identifiable information (PII), protected health information (PHI), export controlled data, geographic locations of endangered species, and data bound by data use agreements. If your data need to be restricted, please contact us at datamanagement@duke.edu. We may be able to help you find an alternative archival solution.

While it's true that redundant copies can help assure you that your data are safe, multiple copies of datasets can confuse users of your data, and can often be difficult to keep in sync. We discourage the deposit of data in multiple locations, but we understand that sometimes it may be necessary to keep a copy of the data elsewhere. If your data have already been assigned a digital object identifier (DOI) by another repository or database, we will not assign one for the copy that will reside in the RDR. Likewise, we will make cross-references to any other copies of your dataset in the metadata of the RDR copy.

Yes! We can create a collection if you have a number of datasets (at least 3) related to one specific project. For instance, see these two examples (example 1, example 2). If you would like to create a collection of datasets you have deposited into the RDR, please contact us (datamanagement@duke.edu) and we will have you complete a brief form to create the collection.

The Duke Research Data Repository provides 300 GB of preservation storage per deposit for Duke researchers (defined as graduate, post-doctoral, research staff, and faculty) at no cost. For larger datasets, please contact us to discuss the feasibility for the RDR to accept your deposit based upon the scope and scale of your data. Additional preservation costs may also be assessed based upon the size of the submission.

For projects planning for data preservation and storage for grant applications, please contact us at datamanagement@duke.edu for planning and tracking purposes. We can also provide you with boilerplate language or Letters of Support as appropriate.

Globus is a system for transferring large files. The RDR uses Globus to support the upload of datasets over 10GB into the repository and the download of datasets over 2GB. See our Globus documentation for more information.

Duke University Libraries has outlined its commitment to the long-term preservation of and access to the assets curated in its digital repositories. The Duke Research Data Repository (RDR) is included under the Libraries' general digital Preservation Policy. While we currently do not have a formal retention schedule for data published in the RDR, we anticipate keeping datasets for at least 25 years (unless a shorter preservation period is selected and paid for during the submission process according to our Pricing for Storage policy). We will consider usage statistics and the value of the data to the research community when assessing the ongoing preservation of data. Any data removed from the repository will be transferred back to the depositor.

If you leave the university, we will continue to retain and make your data available under the Data Deposit Agreement. If you leave Duke, please send us any updated contact information so we can update your dataset record. Providing your ORCiD identifier (which will follow you throughout your career) at time of deposit can help us keep your contact information current.

Generally data deposited with the RDR should be in its final, publication-ready form. However, if an error is discovered, or if additional data files or documentation are needed, we are able to make slight modifications to an already published dataset. The new dataset will receive a new DOI, and the two will be linked in our system. We will continue to provide access to the previous version of all files to ensure persistent access to previously published materials (if you need to permanently remove files, see our deaccessioning policy). Changes to the dataset metadata can be made at any time.

To request a modification to your dataset, navigate to your dataset in the repository, click on the "Request Modifications" button, and complete the form (please see below). One of our curators will be in touch to process your request.

Request modifications button

Request modifications form

Data deposited with the RDR should be in its final, publication-ready form; the RDR is not an appropriate solution for data that are being actively managed in the course of your research. However, we also understand that errors may be discovered post-publication or additional data or documentation files may need to be added. In these cases, the RDR staff can help you create a new version of your dataset. We will continue to provide access to the previous version of all files to ensure persistent access to previously published materials (if you need to permanently remove files, see our deaccessioning policy).

Versioning should not be used as a method to add files to a collection from different waves of a study or as a method to preserve data during the active research phase of a project. If you foresee your dataset evolving over time then consider a "release cycle" for your data. If you need to version content already within the repository or have questions about creating a plan for publishing dynamic data, contact datamanagement@duke.edu.

When a dataset has been versioned, the dataset will be assigned a new DOI, and the bibliographic citation for the dataset will be amended with a "V2" distinction. A "Provenance" field will be added to the metadata with a narrative explanation of what has changed between the versions.

Citation and provenance for a versioned dataset

At the bottom of the page, you will see a list of the dataset's versions, with a link to any other versions of the data that have been published in the RDR.

Versions of a dataset

Clicking on the active link to a previous version of the data, either from this page or from outside the RDR application, will take you to the earlier version of the dataset. A large red banner across the top will alert you to a superseding version of the data, with a link.

A version of a dataset superseded by a new version

If you know the full citation for the publication with which your data are associated, please provide it at the time of initial submission. On the "Submission Information" page of the Data Submission Form, you will see a field requesting "any related publications, datasets or other published materials that you would like to associate with the study/dataset record."

If you are unsure of the full citation at the time of data submission, you can always provide that information when it becomes available, either by emailing us directly at datamanagement@duke.edu, or by filling out the brief form you will see when requesting a modification of your dataset.

Links to associated publications will then appear under "Related Materials" in the metadata for your dataset, along with a link to view the publication.

Links to associated publications

Yes! As part of our curation service, we will enter the citation for your dataset in your Scholars@Duke profile. Datasets deposited with the RDR will be visible by expanding the "Selected Publications" section under "Publications & Artistic Works." The data may be accessed in the RDR either by clicking on the "Data Access" button, or via the link included in the dataset record (viewable by clicking the hyperlinked title in the dataset citation).

Scholars@Duke listing for a dataset

Duke University Libraries generally support the use of open-source software, and to this end, the RDR uses a locally customized version of the Samvera community's Hyrax framework for repository front-ends. Hyrax is a Ruby on Rails engine that is highly customizable, and in the RDR's iteration, deploys Fedora 4 (a version of the Fedora Commons repository system for digital asset management) as its persistence layer, and integrates with Apache Solr and Blacklight.

We can accept supplementary files; however, they must be accompanied by the underlying data that is used to create the tables and figures in those files (which are typically static PDF files). If you are not able to provide the data for any reason (proprietary, sensitive, consent not obtained), you can use OSF for sharing the static PDF tables. For more information on the OSF please see this page.

When considering other options for repository solutions for other types of scholarly materials (open access publications, presentations, etc.), the libraries support a number of options. If you are working on team-based research, you may also wish to consult the DUL guidelines for preserving and disseminating team-based research products.

We would love to hear from you regarding our services! You may provide feedback via this survey or feel free to reach out to us directly at datamanagement@duke.edu.

Submission Guidelines

Readying Your Data Deposit

Review the guidelines below when preparing your data for publishing or consult our Curation Checklist.

Describing Your Data

Prepare to describe your dataset in order for it to be discoverable in the repository. The Data Deposit Form will guide you through what you need to include. At the minimum, you will need to provide a title, author list, contact information, a description, and keywords.

Tips for metadata:

Title:

When including data underlying a publication - Data from: Title of Publication (Article, Monograph, Report, etc.) Example - Data and scripts from: Clustering and assembly dynamics of a one-dimensional microphase former

When including data associated with a larger study/project - Name of Study/Project, Data Details (Time Range, Location, Other Descriptive Information) Example - IPHEx-Southern Appalachian Mountains -- Rainfall Data 2008-2014

Creator: Include those individuals who were involved in creating or authoring the data. These individuals will be listed within the data citation. Another contributor field is available for listing individuals who contributed to the dataset but should not be included as creator/author(s).

Description: For data underlying a publication you may use the article abstract. Other information may include study aims, methodology details, and other contextual details such as programs, software, or equipment used. Learn more about our metadata fields for research data.

Formatting Your Data

Save your files in a preservation-friendly format. Proprietary formats can cause problems with long-term preservation as software platforms change and previous versions become obsolete. If you wish to include a proprietary format that is commonly used in your field or discipline, you may do so, but full preservation can only be assured for sustainable file formats. You might also consider including the proprietary format with a preservation-friendly derivative. If you aren't sure if there is a preservation-friendly format for your files or you are unable to uncouple your data from proprietary software, please contact datamanagement@duke.edu.

Documenting Your Data

Provide appropriate documentation that clearly describes your data so that others can interpret and use your files. The documentation may be README files, data dictionaries, codebooks, instruments, user manuals, and/or fully commented code. The important thing to keep in mind is that someone else will need to understand how to use your data, and they will not know all of the nuances in your file names, labels, data values, etc. without guidance. Need help creating documentation? Download the RDR README plain text template to get started and see this Cornell guide for writing README files.

Pro Tip: To enhance reproducibility, be sure to include the name of the software (and version) you used to collect and analyze your data within your documentation.

Organizing Your Data

Organize your files in a logical order, but do not use nested folders (folders within folders). For example, place all code in a folder called "code" with no additional subfolders inside it. Regardless of the structure, we ask that you include a README file (see above) that provides a description of what you are depositing. Use descriptive file names that will help users understand the file contents and differentiate them from other files, and remember to define any abbreviations in your README file. Do not include special characters (e.g., +, =, /) in your file names, as these cause issues for our system and others.
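
The organizing guidance above can be checked before deposit with a short script. The following is a minimal sketch, not an RDR tool. It works on a list of relative file paths. The allow-list of characters and the expectation that the README sits at the top level are assumptions made for illustration, since the guidance only names +, =, and / as examples.

```python
import re
from pathlib import PurePosixPath

# Assumed allow-list: letters, digits, dot, underscore, hyphen.
# The guidance only names +, =, and / as examples of special characters.
SAFE_NAME = re.compile(r"^[A-Za-z0-9._-]+$")


def check_deposit(relative_paths):
    """Return a list of human-readable problems found in the given file paths."""
    problems = []
    top_level_names = set()
    for raw in relative_paths:
        parts = PurePosixPath(raw).parts
        # A file may sit at the top level or inside one folder, no deeper.
        if len(parts) > 2:
            problems.append(f"nested folders: {raw}")
        for part in parts:
            message = f"special characters in name: {part}"
            if not SAFE_NAME.match(part) and message not in problems:
                problems.append(message)
        if len(parts) == 1:
            top_level_names.add(parts[0].lower())
    # Assumption: the README is expected at the top level of the deposit.
    if not any(name.startswith("readme") for name in top_level_names):
        problems.append("no README file at the top level")
    return problems


# Hypothetical deposit layout used only as an example.
for problem in check_deposit(["code/helpers/util.py", "data/run+1=final.csv"]):
    print(problem)
```

Working from a list of paths keeps the check independent of any one operating system. The list can be built from a real folder with `pathlib.Path.rglob` if needed.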

Licensing Your Data

Decide on the appropriate license for your data. Datasets submitted to the RDR will have a default CC0 public domain dedication assigned. The RDR strongly recommends the use of a CC0 waiver to encourage the broadest reuse of the data and expects users of data to follow growing scholarly best practice to properly attribute and cite data producers. Since in many jurisdictions data may not be copyrightable, the CC0 waiver also removes legal questions related to the copyright status of datasets. If CC0 is not appropriate for your data, during the submission process you can elect to apply an alternative Creative Commons license.

Leave Files Uncompressed

Compressed files cannot receive the same level of preservation as uncompressed files. There may be situations where providing a zipped file is preferable in order to maintain a more complex file hierarchy or reduce the size of your deposit. We can accommodate these deposits but suggest only compressing files when necessary.

Sharing Human Subjects Data

Prepare any human subjects data to be ethically shared. When depositing data about human participants, you must ensure that:

  1. Participants have been properly consented for data sharing, and the terms outlined in your participant consent form and approved IRB protocol align (one should not contradict the other). For help with developing language in consent forms about data sharing, see this guide from ICPSR or the “Data Sharing” section of the DUHS English Standard language page. You will be required to provide a copy/sample of your consent form and IRB protocol at the time of deposit.
  2. Data are fully de-identified, at least by HIPAA Safe Harbor standards (all direct identifiers removed).
  3. Data pose little to no risk to participants if their identities were to be inadvertently discovered.

We will perform a review checking for the above prior to publishing the data. See our policy for more information. If your data are de-identified but still could result in potential harm to your participants due to deductive disclosure, we can work with you to determine a more appropriate repository. The DUHS IRB has approved the RDR as an appropriate repository for sharing de-identified human subjects data. You may reference the following protocol number: Pro00108231.

Globus

In order to submit datasets larger than 10GB in size for curation and publication, or to download datasets or files from the Research Data Repository that are larger than 3GB, you will need to use the Globus file transfer service. For those who use Globus often, this may become the most convenient way to retrieve files from the RDR, regardless of size.
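
To judge whether a transfer crosses these size limits, a rough check along the following lines may help. This is only a sketch: the thresholds are copied from the paragraph above, and treating a gigabyte as 10**9 bytes is an assumption, since the text does not say which definition applies.

```python
import os

# Thresholds taken from the guidance above; treating 1 GB as 10**9 bytes
# is an assumption, since the text does not say which definition applies.
UPLOAD_THRESHOLD_GB = 10
DOWNLOAD_THRESHOLD_GB = 3
BYTES_PER_GB = 10 ** 9


def folder_size_bytes(folder):
    """Sum the sizes of all regular files under a folder."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(folder):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path):
                total += os.path.getsize(path)
    return total


def needs_globus(size_bytes, threshold_gb):
    """True when a transfer is larger than the given threshold."""
    return size_bytes > threshold_gb * BYTES_PER_GB
```

For a deposit folder (here a hypothetical `my_deposit`), `needs_globus(folder_size_bytes("my_deposit"), UPLOAD_THRESHOLD_GB)` indicates whether the upload is over the limit.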

What is Globus?

Globus is a nonprofit platform created by the University of Chicago and Argonne National Laboratory that enables the simple transfer of digital files as large as petabytes (a petabyte is 1,000 terabytes or 1,000,000 gigabytes) from established endpoints, one of which can be your work or personal computer.

How do I use Globus to transfer files to and from my computer?

Below we describe some of the key steps to getting started with Globus, especially how to download files from the RDR and send us larger files for deposit into the repository. Globus also has detailed "How To" walkthroughs for basic and more complicated setup processes. We have also created some RDR-specific walk-through videos of the steps outlined below.

Note: if your lab or department’s IT infrastructure is already using Globus, you can skip to the RDR-specific steps for downloading or uploading data.

Set up a Globus account

The first step is to set up an account with the Globus Web App. You can do so using an existing organizational login, including Duke University, or through Google, ORCiD, or your Globus ID. To begin setting up an account, navigate to the Globus login page.

If you are setting up an account to deposit data, you must log in using your Duke credentials (i.e., click the blue "Continue" button to go to a Shibboleth login page and enter your Duke NetID and password).

Globus login screen

Install Globus Connect Personal

Once you have set up a Globus account, you will want to establish a personal endpoint on your computer so that you can transfer data to it (for download) and from it (for upload). To do this, you will need to install Globus Connect Personal to connect to the Globus Web App (instructions below). An "endpoint" is one of the two file transfer locations. You install Globus Connect Personal onto the system you plan to use (server, cluster, storage system, laptop, desktop, etc.) and configure it so that it has access to the area where the data you want to transfer is stored or where you want to download data to. The process is similar to mapping to a network drive or using an FTP service. Please see:

Downloading Data using Globus

Globus may be used to download datasets or individual files that are larger than 3GB.

  1. Navigate to the dataset or file you wish to download and click on the "Get Data from Globus" button. [Screenshot: RDR Globus download button]
  2. You will be prompted to log in to Globus. You may do so through an existing organizational login, or through Google, ORCiD, or using your Globus ID. [Screenshot: Globus login screen]
  3. After logging in, you will be taken to the File Manager screen. In the path field at the top will be the RDR system ID for the dataset you wish to download. In the panel to the left of the screen, you will see a list of the files associated with the dataset. This will include both an export manifest that includes SHA-1 checksums with which you can verify the accuracy and fixity of the files you download, and an export README file that contains metadata and some other contextual information about the dataset.
    Please note: if you are attempting to download a single large file from a dataset, you may have to navigate the dataset's hierarchy in this pane to find the correct file.
    Click on "Transfer or sync to..." on the right to specify a location to transfer the data. [Screenshot: Globus download transfer screen]
  4. At this stage you can either search for an existing Globus collection and begin the data transfer, or use Globus Connect Personal to move the data to your local machine. Your destination endpoint icon should be green (with the icon tooltip reading "collection ready") if your device is ready to download files from Globus. If the icon is red (such as "RDR Submissions" here) with an "x" (or if the icon tooltip reads "collection offline"), please check that Globus Connect Personal is running on your computer (for additional information, please refer to the Globus getting started documentation). [Screenshot: Globus select file destination screen]
  5. In the left-hand panel, either select all if you wish to transfer the entire dataset, or select the specific files you wish to transfer. In the right-hand panel, select the directory to which you would like to transfer the data (note: if you do not select a location, Globus defaults to whatever location you previously set up in Globus Connect Personal). [Screenshot: Globus transfer file selection screen]
  6. When the endpoints have been established, the "Start" button will become actionable (dark blue). Clicking "Start" will produce a green pop-up indicating a successful transfer request. [Screenshots: Globus transfer file start button; Globus transfer request successful]
  7. Because Globus is designed to handle large files that may take some time to download, a transfer will pause when your computer is no longer connected to the Internet and will resume automatically when reconnected. Files that have been completely downloaded will appear in the destination folder on your computer. Clicking on "Activity" in the left-most panel will allow you to check the status of your current transfers.
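
Once a transfer has finished, the SHA-1 checksums in the export manifest (step 3) can be compared against the downloaded files. The sketch below shows one way to do that with Python's standard library. The manifest layout it expects, one `<sha1-hex> <relative path>` pair per line, is an assumption for illustration. Check the actual export manifest and adjust `parse_manifest` to match.

```python
import hashlib
from pathlib import Path


def sha1_of_stream(stream, chunk_size=1024 * 1024):
    """Compute the SHA-1 hex digest of a binary stream, reading in chunks."""
    digest = hashlib.sha1()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def parse_manifest(text):
    """Assumed layout: one '<sha1-hex> <relative path>' pair per line."""
    expected = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        checksum, _, name = line.partition(" ")
        expected[name.strip().lstrip("*")] = checksum.lower()
    return expected


def verify_download(manifest_text, download_dir):
    """Return {relative path: 'ok' | 'mismatch' | 'missing'}."""
    results = {}
    for name, checksum in parse_manifest(manifest_text).items():
        path = Path(download_dir) / name
        if not path.is_file():
            results[name] = "missing"
            continue
        with path.open("rb") as handle:
            matches = sha1_of_stream(handle) == checksum
        results[name] = "ok" if matches else "mismatch"
    return results
```

Reading in chunks keeps memory use flat for the large files Globus is typically used for. A "mismatch" or "missing" result suggests the file should be transferred again.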

Uploading Data using Globus

  1. Follow the instructions above to set up your account and configure your endpoint via Globus Connect Personal.
  2. You will receive an email from Globus that contains a URL to access your share where you will transfer your files. The endpoint collection name will have your NetID and a date stamp. If you are not already logged in, you will be taken to the Globus login page when you click on this link. Click the blue "Continue" button to go to a Shibboleth login page and enter your Duke NetID and password.
  3. Once you are logged in, you will see your File Manager screen. On the top right of your File Manager, you will see a "Panels" menu. In order to see both the source and transfer panels, select the middle option. [Screenshot: Globus File Manager screen]
  4. In the left-hand panel, you'll see the collection that you will be transferring your files to. It will be labeled with your NetID and date stamp and will be empty. In the right-hand panel, you will navigate to the endpoint you want to transfer from. To do this, click on the collection search bar (marked with a magnifying glass icon) to navigate to the files on the endpoint you defined when you installed Globus Connect Personal. You may need to use the “up one folder” arrow depending on how you have mapped your endpoint. To begin the upload, make sure the right side of your window, your collection, is active (in the dual panel view the inactive panel is gray) and that the files you want to transfer are selected. [Screenshot: Globus File Manager screen]
  5. Click on the Start button at the top of your file manager window to begin the transfer process. Your files will then begin to transfer. You can view processing messages in the Activity section of Globus. To view your transferred files, click on “refresh list” and they should display. Any issues with the upload typically also result in an email message. If you receive any error messages and cannot move forward with your upload, please contact datamanagement@duke.edu.

If you are in need of further assistance, extensive documentation is available through the Globus website. Additionally, please feel free to drop us a line at datamanagement@duke.edu.

A note about Troubleshooting Globus

By default, Globus Connect Personal will automatically start in the background when you start your computer. To avoid receiving error messages and other notifications when you are not using Globus, find the Globus icon in the bottom right-hand corner of your screen (or in the menu/status bar at the top for Mac), right-click (or ctrl-click for Mac), and click "Quit Globus Connect Personal."


The RDR Curation team would like to thank the University of Michigan Deep Blue Data for their support as we integrated Globus with the repository.

Metadata

What is metadata and why is it important?

For the purposes of deposit with the Duke Research Data Repository, metadata is the high-level descriptive information about a dataset that is used in the discovery and identification of data. Metadata may include characteristics of the dataset such as its Title, Creator, and Description, and can help other researchers to understand more about the data and whether it may be of use. Deposit with the Research Data Repository requires that depositors provide information for a small number of metadata fields, and strongly encourages provision of several others.

Metadata is not a substitute for adequate dataset documentation, such as README files, codebooks, data dictionaries, or methodology reports. Dataset documentation is meant to serve as a more comprehensive record of the methodology, coding decisions, measurement tools and analytic processes that make it possible for others to correctly interpret and replicate your work. For more information, or for guidance on how to best compose documentation for your dataset, please contact one of our Research Data Management Consultants.

For more in-depth documentation about metadata for the Research Data Repository, please see here. Note: RDR metadata can be harvested using our OAI-PMH feed.
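
For readers unfamiliar with OAI-PMH, the sketch below shows the general shape of a harvest: build a `ListRecords` request for Dublin Core (`oai_dc`) records, then read the titles and any resumption token from the XML response. The base URL is a hypothetical placeholder, not the RDR's feed address, and the code makes no network request itself. The verb, argument names, and XML namespaces come from the OAI-PMH 2.0 protocol.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Hypothetical base URL; substitute the feed address from the RDR documentation.
BASE_URL = "https://repository.example.edu/oai"

# Namespaces defined by OAI-PMH 2.0 and Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, resumption_token=None):
    """Build a ListRecords request for Dublin Core (oai_dc) records."""
    if resumption_token:
        # Per the protocol, a resumption token is sent without other arguments.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    return f"{base_url}?{urlencode(params)}"


def parse_page(xml_text):
    """Return (titles, resumption_token) from one ListRecords response."""
    root = ET.fromstring(xml_text)
    titles = [el.text for el in root.iterfind(".//dc:title", NS)]
    token_el = root.find(".//oai:resumptionToken", NS)
    token = token_el.text if token_el is not None and token_el.text else None
    return titles, token
```

A full harvest would fetch `list_records_url(BASE_URL)`, pass the response body to `parse_page`, and repeat with the returned token until it is `None`.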

What metadata are required for deposit?

The following descriptive information about your dataset is required for deposit with the Research Data Repository. These are fields marked with a red asterisk in the Data Submission Form.

Title

Provide a descriptive name for the dataset. Consider the following guidelines:

  • If including data underlying a publication, use the following structure:
    • Data from: Title of Publication (Article, Monograph, Report, etc.)
  • If your data are not associated with a publication, but with a larger study/project, consider including descriptive information that will readily identify your dataset. The following details are often helpful:
    • Name of Study/Project
    • Time Range
    • Location
    • Topic or general research area
    • Wave or phase
  • Some examples:
    • Data and scripts from: Clustering and assembly dynamics of a one-dimensional microphase former
    • Data from: Resistance to flow on a sloping channel covered by dense vegetation following a dam-break
    • Neurobiology of social reward valuation in adults with a history of anorexia nervosa
    • IPHEx-Southern Appalachian Mountains -- Rainfall Data 2008-2014

Creator

List the name(s) of the person(s) and/or organization(s) involved in creating or authoring the data. These are individuals who will be listed in the data citation. There is a secondary contributor field (see below) that is available for individuals who contributed to the dataset but should not be included in the data citations as a creator or author. For personal names, please enter each name in the following format: Last name, First Name Middle Initial (middle initials are optional). If multiple names should be associated with the dataset, please separate names with a semicolon.
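
If you are assembling names from a spreadsheet or script, the formatting convention above can be sketched in Python as follows. The helper names and the sample people are hypothetical, and the space after each semicolon is our assumption; the submission form itself only asks that values be separated with a semicolon.

```python
def format_creator(last, first, middle_initial=""):
    """Format one personal name as 'Last name, First Name Middle Initial'."""
    given = f"{first} {middle_initial}".strip()
    return f"{last}, {given}"


def join_values(values):
    """Join multiple field values with semicolons, skipping blank entries."""
    return "; ".join(value.strip() for value in values if value.strip())


def split_values(field):
    """Split a semicolon-separated field back into its individual values."""
    return [part.strip() for part in field.split(";") if part.strip()]


# Hypothetical creators, the second without a middle initial.
creators = join_values([
    format_creator("Doe", "Jane", "Q."),
    format_creator("Roe", "Richard"),
])
print(creators)
```

The same join_values and split_values helpers would apply to the other multi-value fields described on this page, such as Keywords, Contact, and Formats.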

Description

Please provide a brief, general description of the research that produced the data, including the research question(s). For data supporting a publication, including the article abstract is generally sufficient. Other information to consider incorporating in your description may include methodological details, or contextual details about programs, software, or equipment used. Information in this field will help other users to better understand how your research contributes to the field and whether or not they may be able to reuse your data in their own research.

Keywords

Enter any terms or topics that describe your research or would help make it discoverable in the Research Data Repository. Multiple terms may be entered for each dataset; please separate terms with a semicolon. As part of the data curation process, the terms you enter here may be normalized to an established heading from among the Library of Congress Subject Headings or U.S. National Library of Medicine Medical Subject Headings, where appropriate. If an appropriate heading cannot be identified, the term will be applied to your dataset as you have supplied it.

Contact

In this field, please enter a name, phone number, email address, and/or ORCID for a designated contact for the dataset. We strongly recommend including an ORCID; please contact us at datamanagement@duke.edu if you would like assistance setting up an account. This contact is responsible for answering questions about the data, documentation, and/or code for this project. Contact information may be helpful to other researchers if they need additional information or guidance when reusing the data or reproducing the research. Providing this information may also lead to opportunities for collaboration. Multiple values may be entered for this field; please separate each contact with a semicolon.

Additional metadata we encourage you to supply

Contributor

Please list any other person(s) or organization(s) who contributed to the creation of the dataset but whom you do not wish to include in the full dataset citation. For personal names, please enter each name in the following format: Last name, First Name Middle Initial (middle initials are optional). If multiple names should be associated with the dataset, please separate names with a semicolon.

Organizational affiliation

This field refers to any academic department, school, research center or other organizational affiliation you wish to see acknowledged in the metadata record for the dataset. This also provides an implicit secondary point of contact for future users of the data, should the explicit Contact information (see above) included with the dataset become out of date. Multiple values may be entered for this field; please separate multiple values with a semicolon. Unfortunately, individuals entered as creators or contributors cannot be explicitly associated with their organizational affiliation at this time. If this is required, an explicit association can be made by including that information as free text in the Description field, outlined above, or included in the dataset documentation.

Geographic location

This field accommodates information about the geographic location in which your study data were collected. Enter a value here if this information is significant for the analysis of the data collected. This information may take the form of a country, state, or locality. Note: during the data curation process, this information will be normalized to an entry in the GeoNames geographical database, and may thus differ slightly in appearance or specificity upon dataset publication.

Dates of collection

If it is significant for the analysis or reuse of your data, please provide the date or range of dates that correspond to the span of time that the data were collected or generated. Be as specific as you can in defining the beginning and end of the data collection period.

Languages

Please list the language(s) in which the data and supplementary content are written. This field is meant to be limited to spoken languages; if you wish to include information about programming languages relevant to your dataset, please include that information in the description or in the dataset documentation. If there are multiple languages, please separate each with a semicolon.

Formats

Please list the formats of the files associated with your dataset (e.g. PDF, CSV, DICOM, WAV, TIFF), and please separate each with a semicolon. We recommend that you save your files in a preservation-friendly format. Proprietary formats can cause problems with long-term preservation as software platforms change and previous versions become obsolete. If you wish to include a proprietary format, you may do so, but full preservation can only be assured for sustainable file formats. Enumerating the formats included in your dataset will help users determine what software will be required to reuse your data.
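
For datasets with many files, a short script can help enumerate the formats present. The Python sketch below walks a dataset folder and returns the distinct file extensions as a semicolon-separated list; the function name is hypothetical. It treats the extension as a stand-in for the format, which is only an approximation (files without an extension are skipped, and an extension does not guarantee the underlying format), so please review the result before entering it.

```python
from pathlib import Path


def list_formats(dataset_dir):
    """Return the distinct file extensions found under dataset_dir as an
    upper-case, alphabetically sorted, semicolon-separated string."""
    extensions = {
        path.suffix.lstrip(".").upper()
        for path in Path(dataset_dir).rglob("*")
        if path.is_file() and path.suffix
    }
    return "; ".join(sorted(extensions))
```

For example, calling list_formats on a folder containing a.csv, sub/c.tiff, and a README with no extension would return "CSV; TIFF".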

Related publications

Enter a citation to any publication(s) that make use of or reference this dataset. This may include articles, other datasets, or any other published materials that you would like to associate with this data package. Include a full citation where possible, but a DOI, URL, or other persistent identifier may be sufficient. If the publication has not yet been released, please make note of that in the field and plan to update the Research Data Curation Team when it has been published. Multiple citations may be included in this field; please separate each with a semicolon.

Funder or funding agency

Please list the primary funding agency or agencies that supported the research project that generated or collected the data. Identifying the funding agency can help demonstrate compliance with any data sharing requirements made by the agency and connect your research with a larger body of work. Multiple agencies may be included; please separate each with a semicolon.

Grant number

Enter any grant numbers associated with the funding that supported the research generating your data. Multiple values may be entered; please separate each with a semicolon.

A word about Creative Commons licenses

Datasets will default to a CC0 public domain dedication. By applying a CC0 waiver to your data, you are releasing your data for reuse and redistribution without restrictions under copyright or database law. Users of data within the RDR are expected to properly attribute data producers in accordance with community and data citation norms. To learn more see the Creative Commons CC0 web page or consult the RDR Licensing Policy.

If a CC0 waiver is not appropriate for your data, please see the Creative Commons website to identify another license.


References
  • Guide to Creating Metadata for Your Dataset in Deep Blue Data. (2018, May 15). Retrieved from https://deepblue.lib.umich.edu/data/metadata-guidance
  • Guide to writing "readme" style metadata. (n.d.). Retrieved from https://data.research.cornell.edu/content/readme

Grant Support Provided by the Duke Research Data Repository

Including the Duke Research Data Repository (RDR) in a grant application is increasingly common given the rise of data management and sharing policies from funding agencies, such as the 2023 NIH Data Management and Sharing Policy. The RDR is committed to partnering with Duke researchers to meet the data sharing requirements set forth by both funders and journal publishers, following data sharing best practices and in alignment with the FAIR (Findable, Accessible, Interoperable, and Reusable) Guiding Principles.

Including the RDR in a Grant Proposal

If you would like to indicate that the Duke Research Data Repository is your data sharing solution in a data management and sharing plan, we ask that you first ensure the following:

  • The data is under 300 GB
  • The data can be classified as public (no restrictions or sensitivities)
  • The data can be completely open access (we cannot provide controlled access or any registration/verification process for data)
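
The three criteria above can be expressed as a simple self-check. The Python sketch below is illustrative only: the function and parameter names are hypothetical, and whether the data can be classified as public or shared openly remains a judgment for you (and, where relevant, the curation team) rather than something a script can determine.

```python
SIZE_LIMIT_GB = 300  # size limit stated in the criteria above


def scope_issues(size_gb, is_public, can_be_open_access):
    """Return the criteria a dataset fails; an empty list means it is in scope."""
    issues = []
    if size_gb >= SIZE_LIMIT_GB:
        issues.append("size is not under 300 GB")
    if not is_public:
        issues.append("data cannot be classified as public")
    if not can_be_open_access:
        issues.append("data cannot be completely open access")
    return issues


# A 120 GB public dataset that can be fully open passes all three checks.
print(scope_issues(120, is_public=True, can_be_open_access=True))
```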

If your data meet these criteria and our overall policies, you may include the following information in your grant application. We have general boilerplate that can be used across funders and customized boilerplate language for NIH plans.

There are no costs associated with depositing data within the RDR for Duke researchers; however, if you are required to provide a Letter of Support to demonstrate the institutional commitment to your grant, please contact us.

If your data are not in scope for the RDR due to sensitivities or size, we are happy to work with you to identify other repository options. We can also review any data management and sharing plan on request; simply email datamanagement@duke.edu or use DMPTool.org to get started on your plan.

General boilerplate language

The data will be deposited into the Duke Research Data Repository (RDR), an openly accessible preservation archive maintained by the Duke University Libraries. The RDR will assign appropriate metadata (Dublin Core) for discoverability and provide a Digital Object Identifier (DOI) for persistent access and unique identification of the data. All data will be made openly accessible without restrictions on direct download and will be findable with standard indexing tools, including Google Datasets and DataCite Commons. Reuse conditions and expectations will be communicated to end users through the assignment of a standardized Creative Commons license or waiver. The data will be preserved in the RDR for a minimum of 25 years according to the RDR Retention Policy. When the data are transferred to the RDR, data curators will review deposits to help ensure they are complete and in a structure and format that supports long-term preservation and the FAIR Guiding Principles. The RDR has policies and procedures that comply with the NSTC Desirable Characteristics for Data Repositories. The RDR provides for automated backup of all data, which provides an added layer of protection and security for the data.

Boilerplate language for NIH plans

Element 4: Data Preservation, Access, and Associated Timelines

Repository where scientific data and metadata will be archived:

The data will be deposited into the Duke Research Data Repository (RDR), an open-access preservation archive maintained by Duke University Libraries. The RDR has policies and procedures that comply with the NIH Desirable Characteristics for Data Repositories. All data within the RDR are accessible to anyone with an internet connection worldwide. Collaborators at other institutions may also contribute data when working with Duke investigators. The data will be preserved in the RDR for the long term according to RDR policies and procedures. When the data are transferred to the RDR, data curators will review deposits to help ensure they are complete and in a structure and format that supports long-term preservation, access, and reuse. The RDR provides for automated backup of all data, which provides an added layer of protection and security for the data.

How scientific data will be findable and identifiable:

The RDR will assign appropriate metadata (Dublin Core) for discoverability and provide a DataCite Digital Object Identifier (DOI) for persistent access and unique identification of the data. All data in the RDR are findable with standard indexing tools and are included in Google Datasets and DataCite Commons.

When and how long the scientific data will be made available:

Data will be shared either at the time of publication or at the end of the performance period, whichever comes first. All data deposited with the Duke Research Data Repository will be retained for a minimum of 25 years according to its stated Retention Policy. Data will be retained and remain publicly accessible even in the event of an investigator leaving Duke.

Element 5: Access, Distribution, or Reuse Considerations

B. Whether access to scientific data will be controlled:

All data will be made openly accessible without restrictions on direct download and there will be no additional limitations placed on these data. Reuse conditions and expectations will be communicated to end users through the assignment of a standardized Creative Commons license or waiver available within the repository metadata record.

RDR and the Desirable Characteristics for Data Repositories

Below we describe how the RDR adheres to the Desirable Characteristics for Data Repositories established by the National Science and Technology Council (NSTC). The RDR also has documented alignment with the NIH-specific Desirable Characteristics for Data Repositories in this document.

"The repository provides broad, equitable, and maximally open access to datasets and their metadata free of charge in a timely manner after submission, consistent with legal and policy requirements related to maintaining privacy and confidentiality, Tribal and national data sovereignty, and protection of sensitive data."

All datasets published within the RDR are freely and openly available under a selected Creative Commons waiver or license with no charge for access. Human data must be de-identified, and all consent documentation must be submitted to curation staff for review.

"The repository ensures datasets are accompanied by documentation describing terms of dataset access and use (e.g., reuse licenses and need for approval by a data use committee)."

All datasets published within the RDR are freely and openly available as described in a selected Creative Commons waiver or license with no charge for access. No approval process or controlled access is allowable.

"The repository has documented capabilities for ensuring that administrative, technical, and physical safeguards are employed to comply with applicable confidentiality, risk management, and continuous monitoring requirements for sensitive data."

RDR does not accept sensitive data; however, all datasets are reviewed during the curation process to avoid any inadvertent disclosure of confidential information. Data deposits require secure Duke authentication for upload and only approved RDR staff members have access to the system.

"The repository provides documentation on policies for data retention."

The RDR has a published Retention Policy with a stated minimum retention period of 25 years.

"The repository has a plan for long-term management of data, including maintaining integrity, authenticity, and availability of datasets; has contingency plans to ensure data are available and maintained during and after unforeseen events."

Duke University Libraries maintains a library-wide digital preservation policy and strategy.

"The repository assigns a dataset a citable, unique persistent identifier (PID or DPI), such as a digital object identifier (DOI), to support data discovery, reporting (e.g., of research progress), and research assessment (e.g., identifying the outputs of Federally funded research). The unique PID points to a persistent location that remains accessible even if the dataset is de-accessioned or no longer available."

All datasets published through the RDR are assigned a DOI via DataCite. No datasets are removed unless approved according to the DUL Deaccessioning Policy, and any deaccessioned datasets will resolve to a tombstone page.

"The repository ensures datasets are accompanied by metadata to enable discovery, reuse, and citation of datasets, using schema that are appropriate to, and ideally widely used across, the communities that the repository serves."

All datasets contain depositor-supplied Dublin Core metadata, which is then reviewed and normalized against local and national controlled vocabularies by curation staff.

"The repository provides or facilitates expert curation and quality assurance to improve the accuracy and integrity of datasets and metadata."

All datasets are subject to curatorial review prior to publication. Duke is also a member of the Data Curation Network, providing a network of curation expertise for various data types.

"The repository ensures datasets are accompanied by metadata that describe terms of reuse and provides the ability to measure attribution, citation, and reuse of data (e.g., through assignment of adequate and openly accessible metadata and unique PIDs)."

All datasets published through the RDR are given a dataset citation to facilitate proper attribution. Page views and download counts are available for datasets, and discovery and reuse are facilitated by DOIs and inclusion in the DataCite global registry.

"The repository allows datasets and metadata to be accessed, downloaded, or exported from the repository in widely used, preferably non-proprietary, formats consistent with standards used in the disciplines the repository serves."

Depositors are encouraged to use, and where possible files are converted to, open, preservation-friendly formats.

"The repository has mechanisms in place to record the origin, chain of custody, version control, and any other modifications to submitted datasets and metadata."

Public provenance information is provided for modifications made to versioned datasets and private curation logs are maintained with each dataset.

"The repository supports authentication of data submitters. The repository has technical capabilities that facilitate associating submitter PIDs with those assigned to their deposited digital objects, such as datasets."

Data deposits require secure Duke authentication for upload and only approved RDR staff members have access to the system. ORCIDs are requested from primary contact authors.

"The repository has a plan for long-term management of data, building on a stable technical infrastructure and funding plans."

DUL maintains a library-wide digital preservation policy and strategy. The RDR is also a member of the Samvera Community that supports the Hyrax platform used by the repository.

"The repository has documented measures in place to meet well established cybersecurity criteria for preventing unauthorized access to, modification of, or release of data, with levels of security that are appropriate to the sensitivity of data (e.g., the NIST Cybersecurity Framework: https://www.nist.gov/cyberframework)."

The RDR only houses data that can be fully open and accessible without restriction. This means that the data are not sensitive or restricted according to Duke’s classification standard. The RDR is housed on Duke servers professionally managed by the central Duke Office of Information Technology and in accordance with Duke security policies.

The Duke Research Data Repository is a completely open access repository and therefore does not provide additional security features for human data as described by the Desirable Characteristics. Only human participant data that have proper consent for data sharing, are fully de-identified, AND would present no harm to participants if their identities were inadvertently discovered should be deposited in the RDR. Our approach to human data is outlined in our collections policy and submission guidelines.