1. Dataset title: Data and scripts from: A Structure Database and In Silico Spectral Library for Comprehensive Suspect Screening of Per- and Polyfluoroalkyl Substances (PFASs) in Environmental Media by High-resolution Mass Spectrometry. 

2. Principal Investigator: Gordon Getzinger, gjg3@duke.edu

3. Files:
scripts/README.md: A vignette demonstrating database construction and use. 
scripts/make_masslist.R: R functions required for creating the database. 
scripts/pfas_masslist.py: Python functions required for creating the database. 
sql/schema.sql: The SQLite database schema - can be used to build the SQLite database from the flat file tables.
sql/PFAScreeneR.sql: Complete database in SQL export form - can be used to build the SQLite database directly and independently of flat file tables.  
Flat files for individual tables in the SQLite database:
	csv/list_membership.csv: 
		InChIKey: Hashed international chemical identifier (InChI). 
		List: Name of the input molecule list. 
	csv/molecules.csv:
		ID: First fourteen characters of the InChIKey. 
		InChIKey: Hashed InChI. 
		InChI: IUPAC international chemical identifier. 
		SMILES: Simplified molecular-input line-entry system (SMILES) representation of the molecule. 
		MolStructure: Chemical table file in MDL format. 
	csv/mol_props.csv:
		ID: First fourteen characters of the InChIKey. 
		MolForm: Molecular formula. 
		ExactMass: The neutral, monoisotopic exact mass (Dalton). 
		FormalCharge: Sum of charges assigned to atoms in the molecule. 
		HBD: Number of hydrogen-bond donors. 
		HBA: Number of hydrogen-bond acceptors. 
	csv/rxns.csv:
		Precursor_InChIKey: InChIKey of the precursor of a reaction. 
		Product_InChIKey: InChIKey of the product of a reaction. 
		rxn_type: Class of reaction (i.e., hydrolysis or biotransformation). 
	csv/cfm_neg.csv:
		ID: First fourteen characters of the InChIKey. 
		spectrum: The standard output of the CFM algorithm, run in negative ionization mode, containing the m/z, predicted intensity, and SMILES of each predicted fragment ion. 
	csv/cfm_pos.csv:
		ID: First fourteen characters of the InChIKey. 
		spectrum: The standard output of the CFM algorithm, run in positive ionization mode, containing the m/z, predicted intensity, and SMILES of each predicted fragment ion. 
	csv/bio_rxns.csv:
		Precursor_InChIKey: InChIKey of the precursor of a reaction. 
		Product_InChIKey: InChIKey of the product of a reaction.
		Reaction: The reaction name from Biotransformer/EnviPath. 
		Reaction ID: The reaction identifier from Biotransformer/EnviPath.
		Enzymes(s): The enzyme identifier(s) from Biotransformer/EnviPath. 
		Biosystem: The name of the biosystem from Biotransformer/EnviPath. 
	csv/hyd_rxns.csv:
		Precursor_InChIKey: InChIKey of the precursor of a reaction. 
		Product_InChIKey: InChIKey of the product of a reaction.
		Reaction: The reaction name of the applied hydrolysis reaction.
		RxnStep: The number of reaction steps starting from an input parent compound required to form the structure. 
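
Example (a minimal sketch, not part of the deposited scripts): the SQL export described above can in principle be loaded with Python's standard sqlite3 module. The function names build_db_from_export, list_tables, and short_id, and the output file name PFAScreeneR.sqlite, are hypothetical and chosen for illustration only. The sketch assumes the export is a plain SQL text file and that it is run from the dataset root. short_id simply applies the rule stated for the ID field (first fourteen characters of the InChIKey).

```python
import sqlite3
from pathlib import Path


def build_db_from_export(sql_path, db_path):
    """Create an SQLite database by executing every statement in an SQL export file."""
    script = Path(sql_path).read_text(encoding="utf-8")
    conn = sqlite3.connect(db_path)
    try:
        conn.executescript(script)
        conn.commit()
    finally:
        conn.close()


def list_tables(db_path):
    """Return the names of all tables in the database, sorted alphabetically."""
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]


def short_id(inchikey):
    """Return the first fourteen characters of an InChIKey (the ID field)."""
    return inchikey[:14]


if __name__ == "__main__":
    # Assumed paths: run from the dataset root and write to a new database file.
    # Re-running against an existing database may fail on duplicate CREATE statements.
    export = Path("sql/PFAScreeneR.sql")
    if export.exists():
        build_db_from_export(export, "PFAScreeneR.sqlite")
        print(list_tables("PFAScreeneR.sqlite"))
```

Listing the tables after the build is a quick way to check that the database contains the tables described by the flat files; the actual table names are whatever the export defines.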

4. File formats: csv, sql, R, py, md

5. Versioning: 

20201115: Initial submission. 
20201118: Updated csv files, added sql and scripts paths. 
20201119: Added table names and field descriptions to README. Updated title. 
20210201: Updated database schema and added SQL export file for database.