This cef39_202111291421 file was generated on 20211201 by CHRISTINE CRUTE

-------------------
GENERAL INFORMATION
-------------------

Title of Dataset: Data from: RNA sequencing of rabbit placentas exposed to a PFAS-mixture

Author Information (Name, Institution, Address, Email)

	Principal Investigator: Christine Crute, Duke University, cef39@duke.edu
	Associate or Co-investigator: Liping Feng, Duke University, lipingfeng@duke.edu
	Alternate Contact(s):

Date of data collection (single date, range, approximate date): <20210801>

Geographic location of data collection:

Information about funding sources or sponsorship that supported the collection of the data: HHS | National Institutes of Health (NIH), Grant/Award Number: 1K01TW010828-01

--------------------------
SHARING/ACCESS INFORMATION
--------------------------

Licenses/restrictions placed on the data, or limitations of reuse:

Recommended citation for the data: TBD

Citation for and links to publications that cite or use the data: TBD

Links to other publicly accessible locations of the data: N/A

Links/relationships to ancillary or related data sets: N/A

--------------------
DATA & FILE OVERVIEW
--------------------

File list (filenames, directory structure (for zipped files) and brief description of all data files):

File list (each file is gzip-compressed, .fastq.gz):
	Cntl-1_S33_L002_R1_001
	Cntl-1_S33_L002_R2_001
	Cntl-2_S34_L002_R1_001
	Cntl-2_S34_L002_R2_001
	Cntl-3_S35_L002_R1_001
	Cntl-3_S35_L002_R2_001
	PFAS-mix-1_S36_L002_R1_001
	PFAS-mix-1_S36_L002_R2_001
	PFAS-mix-2_S37_L002_R1_001
	PFAS-mix-2_S37_L002_R2_001
	PFAS-mix-3_S38_L002_R1_001
	PFAS-mix-3_S38_L002_R2_001

Directory Structure (file name components):
- S1: You can ignore it. It is the sample number, based on the order in which samples are first listed in the sample sheet for one lane, starting with 1. For example, S1 indicates that the sample ID is the first listed in the sample sheet. The files in this dataset are numbered S33 to S38.
- L001: The lane number. All files in this dataset are from lane 2 (L002).
- R1: The read. In this example, R1 means Read 1.
  For a paired-end run, there is at least one file with R2 in the file name for Read 2.
- 001: You can ignore it. It is always 001.

Relationship between files, if important for context: CNTL = controls; PFAS-mix = rabbits dosed with PFAS-mixture

Additional related data collected that was not included in the current data package: N/A

If data was derived from another source, list source: Duke Data Service

If there are multiple versions of the dataset, list the file updated, when and why update was made: N/A

--------------------------
METHODOLOGICAL INFORMATION
--------------------------

Description of methods used for collection/generation of data:
RNA extracted from one male and one female placenta per dam was pooled into 3 exposed and 3 control samples. Total RNA quality and concentration were assessed on a 2100 Bioanalyzer (Agilent Technologies) and Qubit 2.0 (ThermoFisher Scientific), respectively, and extracts with an RNA Integrity Number (RIN) greater than 7 were processed for sequencing. Library preparation followed the manufacturer’s protocol for the commercially available Roche KAPA Stranded mRNA-Seq Kit. Briefly, mRNA transcripts were captured using magnetic oligo-dT beads, fragmented using heat and magnesium, and reverse transcribed using random priming. During 2nd strand synthesis, the cDNA:RNA hybrid was converted into double-stranded cDNA (dscDNA) and dUTP was incorporated into the 2nd cDNA strand, effectively marking the second strand. Illumina sequencing adapters were then ligated to the dscDNA fragments, which were amplified to produce the final RNA-seq library. The strand marked with dUTP was not amplified, allowing strand-specific sequencing. Libraries were indexed using a dual indexing approach, allowing multiple libraries to be pooled and sequenced on the same Illumina NovaSeq 6000 sequencing platform.
Before pooling and sequencing, fragment length distribution and library quality were assessed on an Agilent Fragment Analyzer (Agilent Technologies). All libraries were then pooled in equimolar ratio and sequenced. Sequencing was done on one lane of a NovaSeq 6000 S-Prime flow cell at 50 bp paired-end. Once generated, sequence data were demultiplexed and FASTQ files were generated using the Bcl2Fastq conversion software provided by Illumina.

Methods for processing the data:
Fastq files were run through fastp version 0.20.1 to control and improve the sequence read quality before starting downstream analysis (Chen et al., 2018). Alignment of the trimmed reads was performed using STAR version 2.7.9a (Dobin et al., 2013). The reference genome assembly OryCun2.0 for Oryctolagus cuniculus (rabbit) was used with the corresponding RefSeq annotation (collected from NCBI datasets) to generate the STAR index. Reads were then aligned to the genome index with the '--quantMode GeneCounts' option to produce gene counts. The aligned reads were sorted and indexed using samtools version 1.9 (Li et al., 2009). Read counts produced by the STAR aligner were gathered into a count matrix that was then used as input for differential gene expression analysis using DESeq2 version 1.28.1 (Love et al., 2014). Adaptive shrinkage (Stephens, 2017) was used to adjust the expression differences observed for lowly expressed genes. Significantly differentially expressed genes were identified by filtering the shrunken results on adjusted p-value < 0.1.

Software- or Instrument-specific information needed to interpret the data, including software and hardware version numbers: Illumina NovaSeq 6000

Standards and calibration information, if appropriate: N/A

Environmental/experimental conditions: exposure to PFAS-mixture

Describe any quality-assurance procedures performed on the data: Fastq files were run through fastp version 0.20.1 to control and improve the sequence read quality before starting downstream analysis.
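
Illustrative example (not part of the original analysis code): the processing methods state that per-sample read counts from STAR were gathered into a count matrix. The Python sketch below shows one way this could be done from STAR's ReadsPerGene.out.tab output, which has four tab-separated columns (gene ID, unstranded counts, counts for read 1 on the RNA strand, counts for read 2 on the RNA strand) preceded by four summary rows. Which count column the authors used is not stated in this README; column 4 is assumed here because dUTP-based stranded libraries are usually counted that way. The function names, gene IDs, and counts are hypothetical.

```python
import csv
import io

# Summary rows that STAR writes before the per-gene rows.
SUMMARY_ROWS = {"N_unmapped", "N_multimapping", "N_noFeature", "N_ambiguous"}


def read_star_counts(handle, column=3):
    """Read one ReadsPerGene.out.tab file into {gene_id: count}.

    column is a 0-based index: 1 = unstranded, 2 = read 1 on the RNA
    strand, 3 = read 2 on the RNA strand (assumed here, see lead-in).
    """
    counts = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0] in SUMMARY_ROWS:
            continue  # skip blank lines and the four summary rows
        counts[row[0]] = int(row[column])
    return counts


def build_count_matrix(per_sample):
    """Combine {sample: {gene: count}} into (genes, samples, rows).

    rows[i][j] is the count for genes[i] in samples[j]; genes missing
    from a sample are filled with 0.
    """
    samples = sorted(per_sample)
    genes = sorted(set().union(*(per_sample[s].keys() for s in samples)))
    rows = [[per_sample[s].get(g, 0) for s in samples] for g in genes]
    return genes, samples, rows


# Toy input with made-up gene IDs and counts, standing in for a real file.
toy_file = io.StringIO(
    "N_unmapped\t10\t10\t10\n"
    "N_multimapping\t5\t5\t5\n"
    "N_noFeature\t1\t20\t2\n"
    "N_ambiguous\t0\t0\t0\n"
    "geneA\t12\t1\t11\n"
    "geneB\t7\t0\t7\n"
)
toy_counts = read_star_counts(toy_file)
genes, samples, rows = build_count_matrix(
    {"Cntl-1": toy_counts, "PFAS-mix-1": {"geneA": 3}}
)
```

With real data, each of the six samples would be read from its own ReadsPerGene.out.tab file, and the resulting matrix (genes as rows, samples as columns) is the shape of input that DESeq2 expects.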

People involved with sample collection, processing, analysis and/or submission: Christine Crute, Duke Center for Genomic and Computational Biology (GCB)

--------------------------
DATA-SPECIFIC INFORMATION
--------------------------

Number of variables: N/A

Number of cases/rows: N/A

Variable list, defining any abbreviations, units of measure, codes or symbols used: N/A

Missing data codes: N/A

Specialized formats or other abbreviations used: N/A
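
Illustrative example (not part of the original data package): the file naming convention described under "Directory Structure" can be split into its components programmatically. The Python sketch below parses a name such as PFAS-mix-1_S36_L002_R1_001 into sample name, treatment group, sample number, lane, and read. The function and field names are hypothetical, and deriving the group by removing the trailing replicate number is an assumption based on the file list in this README.

```python
import re
from typing import NamedTuple

# <sample>_S<sample number>_L<lane>_R<read>_001, optionally ending in .fastq.gz
FASTQ_NAME = re.compile(
    r"^(?P<sample>.+)_S(?P<sample_number>\d+)_L(?P<lane>\d{3})"
    r"_R(?P<read>[12])_001(?:\.fastq\.gz)?$"
)


class FastqName(NamedTuple):
    sample: str         # e.g. "PFAS-mix-1"
    group: str          # e.g. "PFAS-mix" (exposed) or "Cntl" (control)
    sample_number: int  # position in the sample sheet; can be ignored
    lane: int           # flow cell lane
    read: int           # 1 or 2 for paired-end data


def parse_fastq_name(name):
    """Split a FASTQ file name from this dataset into its components."""
    match = FASTQ_NAME.match(name)
    if match is None:
        raise ValueError(f"unexpected FASTQ file name: {name!r}")
    sample = match.group("sample")
    # Assumption: the sample name ends in "-<replicate number>".
    group = sample.rsplit("-", 1)[0]
    return FastqName(
        sample=sample,
        group=group,
        sample_number=int(match.group("sample_number")),
        lane=int(match.group("lane")),
        read=int(match.group("read")),
    )


parsed = parse_fastq_name("PFAS-mix-1_S36_L002_R1_001")
```

Grouping the parsed names by sample pairs each R1 file with its R2 mate, and the group field separates the three control samples from the three PFAS-mixture samples.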