1. Dataset title: Data and scripts from: A Structure Database and In Silico Spectral Library for Comprehensive Suspect Screening of Per- and Polyfluoroalkyl Substances (PFASs) in Environmental Media by High-resolution Mass Spectrometry.

2. Principal Investigator: Gordon Getzinger, gjg3@duke.edu

3. Files:

   scripts/README.md: A vignette demonstrating database construction and use.
   scripts/make_masslist.R: R functions required for creating the database.
   scripts/pfas_masslist.py: Python functions required for creating the database.
   sql/schema.sql: The SQLite database schema; can be used to build the SQLite database from the flat file tables.
   sql/PFAScreeneR.sql: Complete database in SQL export form; can be used to build the SQLite database directly, independently of the flat file tables.

   Flat files for individual tables in the SQLite database:

   csv/list_membership.csv:
      InChIKey: Hashed International Chemical Identifier (InChI).
      List: Name of the input molecule list.

   csv/molecules.csv:
      ID: First fourteen characters of the InChIKey.
      InChIKey: Hashed InChI.
      InChI: IUPAC International Chemical Identifier.
      SMILES: Simplified molecular-input line-entry system molecule representation.
      MolStructure: Chemical table file in MDL format.

   csv/mol_props.csv:
      ID: First fourteen characters of the InChIKey.
      MolForm: Molecular formula.
      ExactMass: The neutral, monoisotopic exact mass (Dalton).
      FormalCharge: Sum of charges assigned to atoms in the molecule.
      HBD: Number of hydrogen-bond donors.
      HBA: Number of hydrogen-bond acceptors.

   csv/rxns.csv:
      Precursor_InChIKey: InChIKey of the precursor of a reaction.
      Product_InChIKey: InChIKey of the product of a reaction.
      rxn_type: Class of reaction (i.e., hydrolysis or biotransformation).

   csv/cfm_neg.csv:
      ID: First fourteen characters of the InChIKey.
      spectrum: The standard output created by the CFM algorithm, run in negative ionization mode, containing the m/z, predicted intensity, and SMILES of predicted fragment ions.

   csv/cfm_pos.csv:
      ID: First fourteen characters of the InChIKey.
      spectrum: The standard output created by the CFM algorithm, run in positive ionization mode, containing the m/z, predicted intensity, and SMILES of predicted fragment ions.

   csv/bio_rxns.csv:
      Precursor_InChIKey: InChIKey of the precursor of a reaction.
      Product_InChIKey: InChIKey of the product of a reaction.
      Reaction: The reaction name from Biotransformer/EnviPath.
      Reaction ID: The reaction identifier from Biotransformer/EnviPath.
      Enzymes(s): The enzyme identifier(s) from Biotransformer/EnviPath.
      Biosystem: The name of the biosystem from Biotransformer/EnviPath.

   csv/hyd_rxns.csv:
      Precursor_InChIKey: InChIKey of the precursor of a reaction.
      Product_InChIKey: InChIKey of the product of a reaction.
      Reaction: The name of the applied hydrolysis reaction.
      RxnStep: The number of reaction steps, starting from an input parent compound, required to form the structure.

4. File formats: csv, sql, R, py, md

5. Versioning:

   20201115: Initial submission.
   20201118: Updated csv files; added sql and scripts paths.
   20201119: Added table names and field descriptions to README. Updated title.
   20210201: Updated database schema and added SQL export file for database.
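
Example: The file listing above states that sql/PFAScreeneR.sql can be used to build the SQLite database directly, and that the ID field is the first fourteen characters of the InChIKey. The following is a minimal Python sketch of both steps using only the standard library. It is not taken from the deposited scripts: the function names molecule_id and build_database are hypothetical, and the sketch assumes the SQL export is plain SQL that SQLite can execute as a script.

```python
import sqlite3
from pathlib import Path
from typing import List


def molecule_id(inchikey: str) -> str:
    """Return the table ID: the first fourteen characters of the InChIKey."""
    return inchikey[:14]


def build_database(sql_path: str, db_path: str) -> List[str]:
    """Execute an SQL export against a SQLite file and list the resulting tables.

    Assumption: the export contains only statements SQLite accepts as a script.
    """
    script = Path(sql_path).read_text(encoding="utf-8")
    conn = sqlite3.connect(db_path)
    try:
        # executescript runs every statement in the file in order.
        conn.executescript(script)
        conn.commit()
        # sqlite_master is SQLite's built-in catalogue of schema objects,
        # so this query works without knowing the table names in advance.
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]
```

With the dataset unpacked, a call such as build_database("sql/PFAScreeneR.sql", "PFAScreeneR.sqlite") should create the database file and return its table names; the output file name here is an arbitrary choice. The vignette in scripts/README.md remains the reference for the intended workflow.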