This GibbsKo_DATASET_Readme.txt file was generated on 2022-05-03 by Dennis Ko


GENERAL INFORMATION

1. Title of Dataset: Human variation impacting MCOLN2 restricts Salmonella Typhi replication by magnesium deprivation

2. Author Information
    A. Principal Investigator Contact Information
        Name: Dennis Ko
        Institution: Department of Molecular Genetics & Microbiology and Department of Medicine
        Address: 213 Research Drive | Box 3053 DUMC | Durham, N.C. 27710
        Email: dennis.ko@duke.edu

3. Date of data collection (single date, range, approximate date): 2007-2022

4. Geographic location of data collection: Seattle, WA and Durham, NC using LCLs collected as part of the International HapMap Project and 1000 Genomes Project

5. Information about funding sources that supported the collection of the data: These data were generated with support from R01AI118903, F31AI136313, F31AI143147


SHARING/ACCESS INFORMATION

1. Licenses/restrictions placed on the data: CC0

2. Links to publications that cite or use the data: These data are part of "Gibbs et al. 2022. Human variation impacting MCOLN2 restricts Salmonella Typhi replication by magnesium deprivation"

3. Links to other publicly accessible locations of the data: NA

4. Links/relationships to ancillary data sets: NA

5. Was data derived from another source? No.

6. Recommended citation for this dataset: Cite the Gibbs et al. 2022 preprint, or the manuscript once accepted


DATA & FILE OVERVIEW

1. File List: GWAS summary statistics for S. Typhi intracellular replication in 952 LCLs generated using QFAM-parents in PLINK with adaptive permutation.

2. Relationship between files, if important: NA

3. Additional related data collected that was not included in the current data package: See the manuscript Gibbs et al. 2022

4. Are there multiple versions of the dataset? No


METHODOLOGICAL INFORMATION

1. Description of methods used for collection/generation of data: For details of methods refer to manuscript Gibbs et al. 2022.

Hi-HOST screening of 951 LCLs from parent-offspring trios for S. Typhi intracellular replication occurred in two large sets. In one, S. Typhi intracellular replication was one of 79 host-pathogen phenotypes measured as part of the Hi-HOST Phenome Project (H2P2) (2). H2P2 measured replication in 527 LCLs from four populations in the 1000 Genomes Project (34): ESN (Esan in Nigeria), GWD (Gambians in Western Divisions in The Gambia), IBS (Iberian Population in Spain), and KHV (Kinh in Ho Chi Minh City, Vietnam). In this dataset, we determined that replication is a quantitative trait suitable for GWAS due to its inter-individual variation (mean of 1.7-fold with standard deviation of 0.3), high experimental repeatability (~75% of variance is due to inter-individual variation in two-way ANOVA), and substantial heritability (h^2 = 0.33 with p = 0.002 in parent-offspring regression) (2).

To these 527 LCLs, we added previously unpublished data on S. Typhi replication from 424 LCLs from four populations in the HapMap project: CEU (Utah residents with ancestry from northern and western Europe), YRI (Yoruba in Ibadan, Nigeria), CHB (Han Chinese in Beijing, China), and JPT (Japanese in Tokyo, Japan) (35).

For all 951 LCLs, we used flow cytometry to quantify intracellular bacterial burden as the median fluorescent intensity (MFI) of GFP in the living (7AAD–) and GFP+ host cells, which contain viable GFP-tagged S. Typhi (see the manuscript for details of this fluorescence-based gentamicin protection assay). From these MFI measurements, we calculated intracellular replication, or permissivity, as the ratio of burden at 24 hours post-infection (hpi) to burden at 3.5 hpi. Each LCL was measured on three sequential passages, and the phenotype used for GWAS was calculated as the mean of these three independent assays. Each batch of LCLs measured during Hi-HOST screening was z-score transformed to reduce inter-batch experimental variation: Z = (x - μ_batch) / σ_batch.

2.
Methods for processing the data: For details of methods refer to manuscript Gibbs et al. 2022.

Genotypes were obtained from HapMap r28 and 1000 Genomes Project Phase 3, with imputation using 1000 Genomes Project Phase 3. Filters included minor allele frequency (MAF) < 0.05, SNP missingness > 0.2, and sample genotype missingness > 0.2, resulting in a total of 8386469 SNPs for subsequent analysis. Genome-wide association analysis was carried out using the QFAM-parents approach in PLINK v1.9 (5, 36) with adaptive permutations ranging from 1000 to a maximum of 10^9.

3. Instrument- or software-specific information needed to interpret the data: text file

4. Standards and calibration information, if appropriate: No.

5. Environmental/experimental conditions: LCLs were maintained in the lab at 37 °C in a 5% CO2 atmosphere and were grown in RPMI 1640 media (Invitrogen) supplemented with 10% fetal bovine serum (FBS), 2 mM glutamine, 100 U/ml penicillin-G, and 100 µg/ml streptomycin.

6. Describe any quality-assurance procedures performed on the data: NA

7. People involved with sample collection, processing, analysis and/or submission: Data generation and processing conducted by Kyle Gibbs, Liuyang Wang, and Dennis Ko


DATA-SPECIFIC INFORMATION FOR: HiHOST_SummaryStats_STyphiReplication

Permutation results file generated using QFAM-parents in PLINK

1. Number of variables: 7

2. Number of cases/rows: 8368471

3. Variable List: The file contains the following columns:

    CHR        Chromosome
    SNP        SNP ID
    BETA       Regression slope for real data
    EMP_BETA   Sample mean of permutation regression slopes
    EMP_SE     Sample stdev of permutation regression slopes
    EMP1       Empirical p-value
    NP         Number of permutations performed

4. Missing data codes:

5. Specialized formats or other abbreviations used:
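
Example of reading the file: the following is a minimal Python sketch, not part of the original analysis, showing one way the columns in the variable list above could be parsed and filtered by empirical p-value. It assumes the file is whitespace-delimited with a header row, as PLINK output typically is, and that missing values appear as "NA" (this README does not specify a missing data code). The function names, the p-value cutoff, and the three toy rows are invented for illustration only and are not taken from the dataset.

```python
FLOAT_COLS = ("BETA", "EMP_BETA", "EMP_SE", "EMP1")


def parse_float(text):
    # Assumption: missing values are written as "NA"; the README leaves this unspecified.
    return None if text.upper() in ("NA", "NAN") else float(text)


def iter_summary_stats(lines):
    """Yield one dict per SNP from whitespace-delimited lines with a header row."""
    header = None
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if header is None:
            header = fields  # first non-blank line holds the column names
            continue
        rec = dict(zip(header, fields))
        for col in FLOAT_COLS:
            rec[col] = parse_float(rec[col])
        rec["NP"] = int(float(rec["NP"]))  # tolerate counts written like 1e+09
        yield rec


def select_hits(records, max_p):
    """Keep records whose empirical p-value (EMP1) is below max_p."""
    return [r for r in records if r["EMP1"] is not None and r["EMP1"] < max_p]


# Toy rows invented for illustration; they are NOT values from the dataset.
toy = [
    " CHR           SNP   BETA  EMP_BETA  EMP_SE   EMP1          NP",
    "   1  rs_example_1   0.12     0.001    0.05    0.5        1000",
    "   1  rs_example_2   0.45     0.002    0.05  1e-07  1000000000",
    "   2  rs_example_3     NA        NA      NA     NA           0",
]
rows = list(iter_summary_stats(toy))
hits = select_hits(rows, max_p=1e-6)  # hypothetical cutoff, not from the manuscript
print([r["SNP"] for r in hits])
```

To apply the same functions to the real file, pass an open file handle instead of the toy list, for example `with open("HiHOST_SummaryStats_STyphiReplication") as fh: hits = select_hits(iter_summary_stats(fh), 1e-6)`. The path and any file extension are assumptions. Because the reader is a generator, the roughly 8.4 million rows are streamed rather than held in memory all at once.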