This readme file was generated on [2024-08-10] by Shira Faigenbaum-Golovin

-------------------
GENERAL INFORMATION
-------------------

Title of Dataset: Data from: Critical biblical studies via word frequency analysis: unveiling text authorship

Description: We address the question of the authorship of biblical texts by applying statistical analysis to word frequencies, using a new method that is particularly sensitive to deviations in the frequencies of a few words out of potentially many. The data below consist of the “discriminating words” that have the largest effect on the value of the Higher Criticism statistic in the analyses of 50 chapters. This repository contains the raw data: 1) the list of all the lemmas used in the study; 2) the resulting indicative words for discriminating between a given chapter and the three reference corpora.

Author Contact Information:

Principal Investigator: Shira Faigenbaum-Golovin
Institution: Duke University
Email: alexandra.golovin@duke.edu
ORCID: 0000-0003-0320-9726

Associate or Co-investigator: Alon Kipnis
Institution: Reichman University
Email: alon.kipnis@runi.ac.il
ORCID: 0000-0003-3798-8035

--------------------
DATA & FILE OVERVIEW
--------------------

File list (filenames, directory structure (for zipped files) and brief description of all data files):

-------------------------
Discriminating words.zip
-------------------------

The zip contains one file for each tested chapter out of the 50 chapters in the dataset:
(a) Deuteronomy: Deut 6; 12–13; 15–16; 18–19; 26; 28;
(b) Deuteronomistic History: Deut 8–11; 27; Josh 1; 5; 6; 12; 23; Judg 2; 6; 2 Sam 7; 1 Kgs 8; 2 Kgs 17:1–21; 22–25;
(c) Priestly: Gen 1; 17; Exod 6; 16; 25–31; 35–40; Lev 1–3; 8–9.

For each file, named after its chapter, a list of the discriminating terms is provided for the rejection of the same-author hypothesis between the current chapter and the corpus listed in the "Comparing with" column.
The words are sorted by p-value within each corpus comparison.

--------------------------------------------------
Chapter size calculation_50_chapters.xlsx
--------------------------------------------------

The file contains all the lemmas for each of the 50 chapters tested.

--------------------------
METHODOLOGICAL INFORMATION
--------------------------

The study is based on words from the "Open Scriptures Hebrew Bible". The statistical authorship attribution analysis follows the method proposed by Kipnis and Donoho [1-2], which measures the resemblance of two word-frequency tables by extending the well-known Higher Criticism (HC) statistic.

[1] Kipnis, A., Higher criticism for discriminating word-frequency tables and authorship attribution. The Annals Appl. Stat. 2022; 16:1236–1252.
[2] Donoho, D. L., Kipnis, A., Higher criticism to compare two large frequency tables, with sensitivity to possible rare and weak differences. The Annals Stat. 2022; 50:1447–1472.

--------------------------
DATA-SPECIFIC INFORMATION
--------------------------

-------------------------
Discriminating words.zip
-------------------------

The zip contains one file for each tested chapter out of the 50 chapters in the dataset:
(a) Deuteronomy: Deut 6; 12–13; 15–16; 18–19; 26; 28;
(b) Deuteronomistic History: Deut 8–11; 27; Josh 1; 5; 6; 12; 23; Judg 2; 6; 2 Sam 7; 1 Kgs 8; 2 Kgs 17:1–21; 22–25;
(c) Priestly: Gen 1; 17; Exod 6; 16; 25–31; 35–40; Lev 1–3; 8–9.

For each file, named after its chapter, a list of the discriminating terms is provided for the rejection of the same-author hypothesis between the current chapter and the corpus listed in the "Comparing with" column. The words are sorted by p-value within each corpus comparison.

- feature - lemma id as defined by the "Open Scriptures Hebrew Bible (OSHB)"
- affinity - (-1, 1). The sign of the score indicates whether the word had a higher frequency (appeared more) in the current corpus or in the one we are comparing against (positive or negative, respectively).
- pval - (0, gentilic nouns by .
- feature (org) - original lemma id as defined by the "Open Scriptures Hebrew Bible (OSHB)", without the replacements.
- morph - morphology of the word
- term - one representative word of the lemma, see above.
- lemma - list of all the lemma ids associated with the current term.
- POW - suffix, main, prefix
- feature-trans - term transliteration as provided by OSHB

The data was derived from:
Open Scriptures Hebrew Bible (OSHB) - Original work of the Open Scriptures Hebrew Bible available at https://github.com/openscriptures/morphhb
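
The column descriptions above suggest a simple way to read one of the per-chapter files: group the rows by the "Comparing with" column, order each group by pval, and use the sign of affinity to tell in which text the word is more frequent. The Python sketch below illustrates this under stated assumptions: it assumes the file has been exported to comma-separated text with the column names listed above (the actual file format inside the zip is not specified here), the helper name top_words is hypothetical, and the sample rows are invented placeholders, not values from the dataset.

```python
import csv
import io
from collections import defaultdict

# Invented placeholder rows for illustration only; NOT values from the dataset.
# Assumes a comma-separated export with the column names given in this readme.
SAMPLE = """feature,affinity,pval,Comparing with
lemma_a,0.8,0.001,Priestly
lemma_b,-0.5,0.02,Priestly
lemma_c,0.3,0.0005,Deuteronomy
lemma_d,-0.9,0.01,Deuteronomy
"""


def top_words(handle, k=10):
    """Return, per comparison corpus, the k rows with the smallest p-values."""
    groups = defaultdict(list)
    for row in csv.DictReader(handle):
        affinity = float(row["affinity"])
        # Positive affinity: more frequent in the current text;
        # negative affinity: more frequent in the reference corpus.
        if affinity > 0:
            direction = "current"
        elif affinity < 0:
            direction = "reference"
        else:
            direction = "neither"
        groups[row["Comparing with"]].append({
            "feature": row["feature"],
            "pval": float(row["pval"]),
            "more_frequent_in": direction,
        })
    # Sort each group by ascending p-value and keep the first k rows.
    return {
        corpus: sorted(rows, key=lambda r: r["pval"])[:k]
        for corpus, rows in groups.items()
    }


result = top_words(io.StringIO(SAMPLE), k=1)
for corpus, rows in result.items():
    for r in rows:
        print(corpus, r["feature"], r["pval"], r["more_frequent_in"])
```

To apply the same function to a real file, pass an open file handle instead of the in-memory sample, after checking that the delimiter and column names match the export being read.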