# Data and scripts for: Localized orbital scaling correction for periodic systems (Mahler et al., arXiv:2202.01870 (2022), accepted in PRB).

## Introduction

Includes band gaps, total energies, and associated data computed with the density functional approximation (DFA) PBE and with the (screened) localized orbital scaling correction (s)LOSC. Molecular data are computed with the `QM4D` code, developed in our group; bulk data are computed with [`Quantum ESPRESSO`](https://quantum-espresso.org) version 6.5 (to which (s)LOSC for bulk materials is a local modification) and with a locally modified version of [`wannier90`](http://www.wannier.org/) version 3.1.

These data consist of three subfolders:

* `main_data/`:
* `sweep_gamma/`:
* `sweep_virtuals/`:

Each subfolder contains an [`R` script](https://r-project.org) (version 4.1.2), which generates figures (`.png` files) and LaTeX data tables (`.txt` files) from raw data (`.csv` files). Below, we describe the input files (with their variables) and the output files from each script.

## `main_data/sLOSC.R`

Band gap and total energy calculations with the PBE density functional approximation and the sLOSC correction on the SC/40+ dataset of 43 semiconductors and large-gap insulators, as well as on a subset of 19 molecules from the G2/97 test set. In particular,

* Mean absolute percentage error (MAPE) as the Coulomb screening parameter $\alpha$ is tuned (Fig. S2 of the Supplemental Material);
* Calculated and experimental gaps from PBE, LOSC2, and sLOSC (Fig. 1 of the main text).

Note that the set of molecules from which the ionization potential (IP) is computed is not the same as the set of molecules from which the electron affinity (EA) is computed; the files `mol_ip_names.csv` and `mol_ea_names.csv` allow us to merge the lists and obtain the band gaps for the two sets' intersection.

### Inputs

#### `sc40_dfa.csv`:

The PBE (DFA) calculations leading to the band gap of the SC/40+ dataset.

* name: The chemical formula.
* homo: The DFA valence band maximum, in electronvolts.
* lumo: The DFA conduction band minimum, in electronvolts.
* gap: The DFA band gap, in electronvolts.

#### `sc40_losc.csv`

The sLOSC calculations leading to the band gap of the SC/40+ dataset.

* name: The chemical formula.
* k: The Coulomb screening parameter of sLOSC, called $\alpha$ in the paper.
* homo: The sLOSC valence band maximum, in electronvolts.
* lumo: The sLOSC conduction band minimum, in electronvolts.
* gap: The sLOSC band gap, in electronvolts.

#### `sc40_ref.csv`

Experimental band gaps of the SC/40+ systems.

* name: The chemical formula.
* gap: The reference (experimental) band gap, in electronvolts.

#### `sc40_energy.csv`

Total energy of SC/40+ calculations.

* name: The chemical formula.
* k: The Coulomb screening parameter of sLOSC, called $\alpha$ in the paper.
* edfa: The DFA total energy computed with `Quantum ESPRESSO`, in rydbergs.
* delosc: The sLOSC correction to the total energy (in rydbergs).

#### `sc40_lattice.csv`

Experimental lattice parameters characterizing the SC/40+ dataset. For sources, see the paper.

* name: The chemical formula.
* ncoord: The coordination number.
* atomwt: The weight of the first atom in the lattice.
* atomwt2: The weight of the second atom in the lattice (if applicable).
* lattice: The lattice type: di (diamond), zb (zincblende), wu (wurtzite), rs (rocksalt), ccp (cubic close packed).
* param: The lattice parameter (in angstroms).
* param2: The second lattice parameter in angstroms (if applicable).

#### `strukturbericht.csv`:

Strukturbericht parameters characterizing the lattices of the SC/40+ dataset.

* lattice: The lattice type (see `sc40_lattice.csv`).
* strukturbericht: A4 (diamond); B3 (zincblende); B4 (wurtzite); B1 (rocksalt); A1 (cubic close packed).

#### `mol_ip_raw.csv`:

The DFA (PBE), sLOSC, and reference (CCSD(T)) ionization potentials of the molecular systems.
The coupled-cluster singles-and-doubles with perturbative triples method, CCSD(T), extrapolated to the infinite-basis limit, is used for the reference value. See Table S5 of the Supporting Information of Su et al., J. Phys. Chem. Lett. 11, 1528 (2020), doi:10.1021/acs.jpclett.9b03888 for details.

* name: A placeholder name for each system.
* k: The sLOSC screening parameter $\alpha$.
* tau: The exchange factor $\tau$, invariably 1.2378 (see the paper).
* ip_dfa: The DFA calculated ionization potential (IP), in electronvolts.
* ip_losc: The sLOSC calculated ionization potential (IP), in electronvolts.
* ip_ref: The CCSD(T) calculated ionization potential (IP), in electronvolts.

#### `mol_ea_raw.csv`:

The DFA (PBE), sLOSC, and reference (CCSD(T)) electron affinities of the molecular systems. The coupled-cluster singles-and-doubles with perturbative triples method, CCSD(T), extrapolated to the infinite-basis limit, is used for the reference value. See Table S5 of the Supporting Information of Su et al., J. Phys. Chem. Lett. 11, 1528 (2020), doi:10.1021/acs.jpclett.9b03888 for details.

* name: A placeholder name for each system.
* k: The sLOSC screening parameter $\alpha$.
* tau: The exchange factor $\tau$, invariably 1.2378 (see the paper).
* ea_dfa: The DFA calculated electron affinity (EA), in electronvolts.
* ea_losc: The sLOSC calculated electron affinity (EA), in electronvolts.
* ea_ref: The CCSD(T) calculated electron affinity (EA), in electronvolts.

#### `mol_ip_energy.csv`:

The DFA (PBE) and sLOSC total energy calculations for the molecules we computed the IPs of.

* name: The placeholder name.
* k: The sLOSC Coulomb screening parameter $\alpha$.
* edfa: The DFA (PBE) total energy, in hartrees.
* delosc: The sLOSC correction to the total energy, in hartrees.

#### `mol_ea_energy.csv`:

The DFA (PBE) and sLOSC total energy calculations for the molecules we computed the EAs of.

* name: The placeholder name.
* k: The sLOSC Coulomb screening parameter $\alpha$.
* edfa: The DFA (PBE) total energy, in hartrees.
* delosc: The sLOSC correction to the total energy, in hartrees.

#### `mol_ip_names.csv`:

The chemical formulas that match the placeholder names for the IP molecules.

* name: The placeholder name.
* formula: The chemical formula.

#### `mol_ea_names.csv`:

The chemical formulas that match the placeholder names for the EA molecules.

* name: The placeholder name.
* formula: The chemical formula.

### Outputs

#### `CalcVsExp_sc40.png`:

The scatterplot of DFA (PBE), unscreened sLOSC (LOSC2), and sLOSC band gaps compared to the experimental values for the SC/40+ dataset. This is Fig. 1 of the paper.

#### `CalcVsExp_sc40_LOSC2.png`:

The scatterplot of DFA (PBE) and unscreened sLOSC (LOSC2) band gaps, without the sLOSC results, compared to the experimental values for the SC/40+ dataset.

#### `CalcVsExp_mol.png`:

The scatterplot of DFA (PBE), unscreened sLOSC (LOSC2), and sLOSC band gaps compared to the experimental values for the molecules considered. This is Fig. S4 of the Supplemental Material attached to the paper.

#### `mape.png`:

The mean absolute percentage error (MAPE) in bulk and molecular band gap as the Coulomb screening parameter $\alpha$ is varied. This is Fig. S2 of the Supplemental Material attached to the paper.

#### `mae.png`:

The mean absolute error (MAE) in bulk and molecular band gap as the Coulomb screening parameter $\alpha$ is varied.

#### `mse.png`:

The mean squared error (MSE) in bulk and molecular band gap as the Coulomb screening parameter $\alpha$ is varied.

#### `dataTable_mol.txt`:

The LaTeX table capturing the raw data (formula, DFA/sLOSC/reference band gaps, and DFA/sLOSC total energies) from the molecular systems computed. This is Table S3 of the Supplemental Material.

#### `dataTable_sc40.txt`:

The LaTeX table capturing the raw data (formula, lattice type and parameters, DFA/sLOSC/reference band gaps, and DFA/sLOSC total energies) from the bulk systems computed.
This is Table S2 of the Supplemental Material.

___

## `sweep_gamma/sLOSC_gamma.R`

Mean absolute error in band gap as space/energy localization parameter $\gamma$ is tuned.

### Inputs

#### `sc40_data.csv`:

Collected DFA and sLOSC data from the SC/40+ dataset as both the Coulomb screening parameter $\alpha$ and the spatial/energy mixing parameter $\gamma$ are varied.

* name: The chemical formula.
* gamma: The space-energy mixing parameter $0 \leq \gamma \leq 1$; if $\gamma = 0$ the sLOSC correction is computed with maximally localized Wannier functions, while as $\gamma \to 1$ we have maximal energy localization instead (while preserving the translational symmetry of Wannier functions).
* k: The sLOSC Coulomb screening parameter $\alpha$.
* homodfa: The DFA valence band maximum, in electronvolts.
* lumodfa: The DFA conduction band minimum, in electronvolts.
* homolosc: The sLOSC valence band maximum, in electronvolts.
* lumolosc: The sLOSC conduction band minimum, in electronvolts.
* edfa: The DFA total energy, in rydbergs.
* delosc: The sLOSC energy correction, in rydbergs.

#### `sc40_ref.csv`:

The experimental band gaps for the SC/40+ dataset.

* name: The chemical formula.
* gap: The experimental band gap, in electronvolts.

#### `mol_data.csv`:

Collected DFA and sLOSC data from the molecular dataset as both the Coulomb screening parameter $\alpha$ and the spatial/energy mixing parameter $\gamma$ are varied.

* type: Whether the system came from the EA or IP subset (see `main_data` above).
* mol: The placeholder number (which is different between EA and IP; that is, EA:Mol10 is not the same as IP:Mol10).
* k: The sLOSC Coulomb screening parameter $\alpha$.
* g: The space-energy mixing parameter $\gamma$ (see above).
* homodfa: The DFA valence band maximum, in electronvolts.
* lumodfa: The DFA conduction band minimum, in electronvolts.
* homolosc: The sLOSC valence band maximum, in electronvolts.
* lumolosc: The sLOSC conduction band minimum, in electronvolts.
* edfa: The DFA total energy, in rydbergs.
* delosc: The sLOSC energy correction, in rydbergs.

#### `mol_ip_ref.csv`:

The reference (CCSD(T)) ionization potentials.

* ipmol: The placeholder name.
* ipref: The reference ionization potential, in electronvolts.

#### `mol_ea_ref.csv`:

The reference (CCSD(T)) electron affinities.

* eamol: The placeholder name.
* earef: The reference electron affinity, in electronvolts.

#### `IP_names.csv`:

The chemical formulas corresponding to the placeholder names of the IP systems.

* name: The placeholder name.
* formula: The chemical formula.

#### `EA_names.csv`:

The chemical formulas corresponding to the placeholder names of the EA systems.

* name: The placeholder name.
* formula: The chemical formula.

#### `matches.csv`:

The IP and EA placeholders that correspond to the same formula. Each row corresponds to the same chemical formula.

* ipmol: The IP placeholder.
* eamol: The EA placeholder.

### Outputs

#### `sc40_gamma_mae.png`:

A 3-D plot of the mean absolute error (MAE), in electronvolts, as $\gamma$ and $\alpha$ are varied.

#### `molerror.txt`:

LaTeX table of the mean relative error (MRE), in percent, of the molecular band gap for some choices of $\alpha, \gamma$ near the sLOSC values chosen for bulk systems.

___

## `sweep_virtuals/sLOSC_virtuals.R`

Error in band gap as a function of Coulomb screening parameter $\alpha$ as the number of virtual orbitals is varied. Calculated on a subset of the SC/40+ set, given below by name (short name, lattice):

* Carbon (C, diamond);
* Gallium antimonide (GaSb, zincblende);
* Germanium (Ge, diamond);
* Indium arsenide (InAs, zincblende);
* Indium nitride (InN, wurtzite);
* Indium antimonide (InSb, zincblende);
* Lithium fluoride (LiF, rocksalt);
* Sodium fluoride (NaF, rocksalt);
* Silicon (Si, diamond);
* Silicon carbide (SiC, wurtzite).

### Inputs

#### `sc40_dfa_nc{1,2,3,4,5}.csv`:

DFA band gap information for the SC/40+ subset with $\{1,2,3,4,5\}$ coordination shells of conduction bands.
* name: The chemical formula.
* homo: The DFA valence band maximum, in electronvolts.
* lumo: The DFA conduction band minimum, in electronvolts.
* gap: The DFA band gap, in electronvolts.

#### `sc40_losc_nc{1,2,3,4,5}.csv`:

sLOSC band gap information for the SC/40+ subset with $\{1,2,3,4,5\}$ coordination shells of conduction bands.

* name: The chemical formula.
* k: The sLOSC Coulomb screening parameter $\alpha$.
* homo: The sLOSC valence band maximum, in electronvolts.
* lumo: The sLOSC conduction band minimum, in electronvolts.
* gap: The sLOSC band gap, in electronvolts.

#### `sc40_ref.csv`:

Reference (experimental) band gap information for the SC/40+ subset.

* name: The chemical formula.
* gap: The experimental band gap, in electronvolts.

### Outputs

#### `m{a,ap,s}e.png`:

The mean $\{\text{absolute}, \text{absolute percentage}, \text{squared}\}$ error as the sLOSC screening parameter $\alpha$ is varied for different numbers of conduction bands (the DFA calculation is insensitive to the number of conduction bands and has no screening parameter, hence is a horizontal line).

#### `gap_{C,GaSb,Ge,InAs,InN,InSb,LiF,NaF,Si,SiC}.png`:

The sLOSC band gap of the specified system as the Coulomb screening parameter $\alpha$ is varied, for different numbers of conduction bands. The experimental gap is shown as a horizontal line. Each of these figures is one of the subfigures (a)--(j) in Fig. S1 of the Supplemental Material.

___

## Running the `R` scripts

To run the R scripts, change the `setwd` line near the beginning to match the current working directory in which the scripts and data are located.

In case of future deprecations, the session information of the R environment under which these data were generated follows below. (Note that the `orca` package, used to save the 3-D plots as `PNG` images, is set to be replaced in the future.)
```
> sessionInfo()
R version 4.1.2 (2021-11-01)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Monterey 12.4

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
 [1] plotly_4.10.0 xtable_1.8-4 latex2exp_0.9.4 egg_0.4.5 gridExtra_2.3 forcats_0.5.1 stringr_1.4.0 dplyr_1.0.8 purrr_0.3.4
[10] readr_2.1.2 tidyr_1.2.0 tibble_3.1.6 ggplot2_3.3.5 tidyverse_1.3.1

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.8 lubridate_1.8.0 ps_1.6.0 assertthat_0.2.1 digest_0.6.29 utf8_1.2.2 R6_2.5.1 cellranger_1.1.0
 [9] backports_1.4.1 reprex_2.0.1 httr_1.4.2 pillar_1.7.0 rlang_1.0.2 lazyeval_0.2.2 readxl_1.3.1 rstudioapi_0.13
[17] data.table_1.14.2 htmlwidgets_1.5.4 bit_4.0.4 munsell_0.5.0 broom_0.7.12 compiler_4.1.2 modelr_0.1.8 pkgconfig_2.0.3
[25] htmltools_0.5.2 tidyselect_1.1.2 fansi_1.0.2 viridisLite_0.4.0 crayon_1.5.0 tzdb_0.2.0 dbplyr_2.1.1 withr_2.5.0
[33] grid_4.1.2 jsonlite_1.8.0 gtable_0.3.0 lifecycle_1.0.1 DBI_1.1.2 magrittr_2.0.2 scales_1.1.1 cli_3.2.0
[41] stringi_1.7.6 vroom_1.5.7 fs_1.5.2 xml2_1.3.3 ellipsis_0.3.2 generics_0.1.2 vctrs_0.3.8 tools_4.1.2
[49] bit64_4.0.5 glue_1.6.2 hms_1.1.1 crosstalk_1.2.0 processx_3.5.3 yaml_2.3.5 parallel_4.1.2 fastmap_1.1.0
[57] colorspace_2.0-3 rvest_1.0.2 haven_2.4.3
```
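
If the R environment above cannot be reproduced, the error metrics plotted in `mae.png`, `mape.png`, and `mse.png` can be cross-checked independently of the scripts. The following is a minimal sketch in Python (standard library only), not a port of `sLOSC.R`. It assumes the textbook definitions of MAE, MAPE, and MSE, and assumes the `.csv` headers match the variable names listed in this README (`name`, `gap`). The helper names `read_rows` and `gap_errors` are hypothetical, and the two-row tables are toy values, not data from this repository.

```python
import csv
import io


def read_rows(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def gap_errors(calc_rows, ref_rows):
    """Return (MAE, MAPE in percent, MSE) of the gap over systems in both tables."""
    ref = {row["name"]: float(row["gap"]) for row in ref_rows}
    diffs = []
    rel = []
    for row in calc_rows:
        if row["name"] not in ref:
            continue  # keep only systems that have a reference gap
        r = ref[row["name"]]
        d = float(row["gap"]) - r
        diffs.append(d)
        rel.append(abs(d) / abs(r))
    n = len(diffs)
    mae = sum(abs(d) for d in diffs) / n
    mape = 100.0 * sum(rel) / n
    mse = sum(d * d for d in diffs) / n
    return mae, mape, mse


# Toy values for illustration only; NOT taken from the dataset.
calc_csv = "name,homo,lumo,gap\nA,0.0,1.0,1.0\nB,0.0,3.0,3.0\n"
ref_csv = "name,gap\nA,2.0\nB,4.0\n"

mae, mape, mse = gap_errors(read_rows(calc_csv), read_rows(ref_csv))
print(mae, mape, mse)
```

To apply this to the real data, one would pass the contents of, say, `sc40_dfa.csv` and `sc40_ref.csv` instead of the toy strings. For `sc40_losc.csv`, the rows would first need to be filtered to a single value of `k`, since that file holds one row per system and screening parameter.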