This readme file was generated on 2022-10-18 by Samuel Brudner

GENERAL INFORMATION

Title of Dataset: Juvenile zebra finch syllables for data-driven analysis of development

Author/Principal Investigator Information
Name: Richard Mooney
ORCID: 0000-0002-3308-1367
Institution: Dept. of Neurobiology, Duke University
Address: 301C Bryan Research Building, 311 Research Drive, Durham, NC 27710-4432
Email: mooney@neuro.duke.edu

Author/Associate or Co-investigator Information
Name: John Pearson
ORCID: 0000-0002-9876-7837
Institution: Dept. of Biostatistics and Bioinformatics, Duke University
Address: B255 Levine Science Research Center, Durham, NC 27710
Email: john.pearson@duke.edu

Dates of data collection: Oct 2019 - Jan 2021

Geographic location of data collection: Duke University, Durham, NC

Information about funding sources that supported the collection of the data: NIH 5R01NS099288, NIH 1RF1NS118424

SHARING/ACCESS INFORMATION

Licenses/restrictions placed on the data:

Links to publications that cite or use the data:

Links to other publicly accessible locations of the data:

Links/relationships to ancillary data sets:

Was data derived from another source?
No
If yes, list source(s):

Recommended citation for this dataset:

DATA & FILE OVERVIEW

File List:
{birdId}_hatchdate.mat
{birdId}_specs.zip
{birdId}_segs.zip
{birdId}_proj.zip
{birdId}_raw_wav.zip
{birdId}_table.mat
trained_vae/{birdId}_vae.tar
predicted_age_models/ffnn_{birdId}_table.mat
predicted_age_models_fewerLayers/ffnn_{birdId}_table.mat
gauss_models/{birdId}_{syllable}_checkpoint_neg_loss=-{loss_value}.pt
gauss_models_64/{birdId}_{syllable}_checkpoint_neg_loss=-{loss_value}.pt
gauss_models_layers/{birdId}_{syllable}_checkpoint_neg_loss=-{loss_value}.pt

Relationship between files, if important: For a given birdId, {birdId}_raw_wav.zip and {birdId}_segs.zip have matched directory structures and file names, following SAP convention and, in the case of the segmentation files, as written by AVA (Goffinet et al., 2021). For a given birdId, {birdId}_specs.zip and {birdId}_proj.zip have matched directory structures and file names. For matched file names, information refers to the same syllables in the same sequence, as written by AVA (Goffinet et al., 2021). Files for bird grn394, syllable B, are used as examples in the submission for publication.

Additional related data collected that was not included in the current data package: Small amounts of directed song were collected for grn475 and sil469, but the associated raw wav files were not included.

Are there multiple versions of the dataset? No
If yes, name of file(s) that was updated:
Why was the file updated?
When was the file updated?

METHODOLOGICAL INFORMATION

Description of methods used for collection/generation of data: We raised 5 juvenile male zebra finches in their home cages, where their father was the sole adult male and song model. To begin collecting song data, we isolated each juvenile in a sound box equipped with a microphone. We collected sound-triggered audio data at 44100 Hz with SAP (3 birds; Tchernichovski, 2000) or at 32000 Hz with EvTAF (2 birds; Tumer, 2007).
We recorded until birds reached at least 94 dph.

Methods for processing the data: To extract the acoustic features of song syllables, we used a variational autoencoder (VAE), following the general procedures described in Goffinet et al. (2021). We performed this procedure separately for each bird in our dataset. In brief, we hand-tuned amplitude-based segmentation parameters to extract all individual sounds from the entire developmental audio record of each animal. We saved a spectrogram corresponding to each sound, manually tuning spectrogram floor and ceiling values to a range that captured variation in vocal sound intensity but excluded quiet background noise. These clipped syllable spectrograms were rescaled so that all values fell in the interval [0, 1]. For each bird, we designated 30,000 random sound spectrograms as the bird-specific VAE training dataset. For the three birds recorded with SAP (Tchernichovski, 2000), we trained the autoencoder for 500 epochs; for the two birds recorded with EvTAF (Tumer, 2007), we trained the autoencoder for 100 epochs after observing qualitatively successful reconstruction accuracy at that reduced training duration. Finally, the resulting trained VAEs were used to calculate a latent representation of every sound in each animal's dataset, by taking the mean of the latent variational posterior given by the trained VAE encoder. After calculating a latent representation for every sound in the dataset, the latents were embedded in a 2-dimensional UMAP to visualize clusters. By investigating the underlying spectrograms of renditions in each cluster, we were able to assign meaningful category labels to different clusters. Some categories (such as cage noise and call types) were discarded. We retained for further analysis only clear clusters corresponding to syllable types represented in the animal's crystallized endpoint song.
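The spectrogram clipping and rescaling step described above can be sketched as follows. This is a minimal numpy illustration, not the actual pipeline code (preprocessing was done with AVA); the floor and ceiling values shown are hypothetical stand-ins for the hand-tuned, per-bird values.

```python
import numpy as np

def clip_and_rescale(spec, floor_db, ceil_db):
    """Clip a spectrogram (in dB) to [floor_db, ceil_db], then rescale to [0, 1].

    floor_db and ceil_db stand in for the hand-tuned values chosen per bird
    so that vocal sounds span the range while quiet background noise falls
    at or below the floor.
    """
    clipped = np.clip(spec, floor_db, ceil_db)
    return (clipped - floor_db) / (ceil_db - floor_db)

# Hypothetical toy 2x3 "spectrogram" in dB
spec = np.array([[-80.0, -45.0, -20.0],
                 [-60.0, -30.0, -10.0]])
scaled = clip_and_rescale(spec, floor_db=-60.0, ceil_db=-20.0)
# All values now lie in [0, 1]; background at or below the floor maps to 0.
```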
Finally, we performed within-syllable principal components analysis to find the primary axes of variation exhibited by syllables over the course of development.

For each animal, we partitioned the entire collected repertoire of song syllables into a training set and an analysis set. We trained predicted-age networks, using the 32-dimensional latent vector describing each observed spectrogram as network input. The network minimized an error consisting of the squared prediction error and a regularization term consisting of the sum of squares of the network weights. These terms were combined with weights given by Bayesian regularization (Foresee, 1997), by training the network in MATLAB with the training function parameter set to "trainbr". We included regularization in order to "smooth" the predicted-age function in acoustic space, to improve generalizability for a different set of experiments in which probe days of data were systematically withheld from training. The training data input to this procedure were automatically partitioned by MATLAB into a 70/15/15 split corresponding to training, test, and validation sets, respectively. Training iterated until performance on the test set failed to improve for three consecutive epochs, or until 30 minutes had elapsed. The validation subdivision was unused, as we used the previously withheld analysis partition for all subsequent analyses that used our trained models.

Instrument- or software-specific information needed to interpret the data: MATLAB is needed to open .mat files. Python (PyTorch) is needed to open .tar and .pt files, which store trained PyTorch models. The .tar VAE models are trained VAEs using the architecture in Goffinet et al. (2021).
The Gaussian models instantiate the 'CHOLESKY' class defined at https://github.com/SamuelBrudner/juvenile_syllable_analysis

People involved with sample collection, processing, analysis and/or submission: Samuel Brudner, John Pearson, Richard Mooney

DATA-SPECIFIC INFORMATION FOR: {birdID}_hatchdate.mat

Each file contains a MATLAB datetime variable "hatchdate" storing the hatch date of the bird named in the filename.

DATA-SPECIFIC INFORMATION FOR: {birdID}_raw_wav.zip

This unzips to a directory structure containing the raw audio wav files from throughout development for each bird. The files are organized into subdirectories with numeric names, which give the animal's age (in whole days post hatch) at the time the enclosed files were recorded. This directory structure, and the wav file naming convention, follow the SAP (Tchernichovski, 2000) data collection convention.

DATA-SPECIFIC INFORMATION FOR: {birdID}_specs.zip

This unzips to a directory structure containing all segmented spectrograms. The files are organized into subdirectories indicating days post hatch, like the raw wav file directory structure. The files are HDF5 format, with 20 sounds saved per file. The files were generated from wav recordings using AVA (Goffinet et al., 2021).

DATA-SPECIFIC INFORMATION FOR: {birdID}_segs.zip

This unzips to a directory structure containing all detected segment onsets and offsets. The file directory structure and naming convention are linked to the raw wav files for each bird (i.e., sound segments in wav file 75/filename.wav are stored in 75/filename.txt). The files were generated from wav recordings using AVA (Goffinet et al., 2021).

DATA-SPECIFIC INFORMATION FOR: {birdID}_proj.zip

This unzips to a directory structure containing all VAE latent space representations of spectrograms.
The file directory structure and naming convention are linked to the spectrogram files for each bird (i.e., spectrogram 3 in file 75/filename.hdf in the specs.zip folder is represented by the third entry in 75/filename.hdf in the proj.zip folder). The files were generated from wav recordings using AVA (Goffinet et al., 2021).

DATA-SPECIFIC INFORMATION FOR: grn394_table.mat

Number of variables: 16
Number of cases/rows: 429379

Variable List:
bird: the ID assigned to the bird
hdf5specName, hdf5_index: the filename and index to data representing the syllable's associated spectrogram
age: the age (in days post hatch) at which the syllable was produced
dph: the integer age in days (floor of age)
type: the syllable type label of the rendition
datetime: the rendition time
file: the name of the audio file that includes this syllable rendition
duration: the duration of the syllable
latent: the 32-D latent space mean of the VAE encoder representation of the syllable
embed: UMAP coordinates of the syllable
pca: the coordinates of the rendition in latent space principal components (calculated on a per syllable type basis)
pca_tsquared: Hotelling's t-squared statistic from the syllable-type pca
partition: partition label for training vs. analyzing the predicted age network and Gaussian distribution models. Note that the "laser" label is a dummy label generated for use in another context on 10 randomly selected datapoints.
ffnn_predicted_age: the production age predicted by a feedforward neural net on the basis of the syllable's latent space coordinates

DATA-SPECIFIC INFORMATION FOR: grn395_table.mat

Number of variables: 16
Number of cases/rows: 215872

Variable List:
bird: the ID assigned to the bird
hdf5specName, hdf5_index: the filename and index to data representing the syllable's associated spectrogram
age: the age (in days post hatch) at which the syllable was produced
dph: the integer age in days (floor of age)
type: the syllable type label of the rendition
datetime: the rendition time
file: the name of the audio file that includes this syllable rendition
duration: the duration of the syllable
latent: the 32-D latent space mean of the VAE encoder representation of the syllable
embed: UMAP coordinates of the syllable
pca: the coordinates of the rendition in latent space principal components (calculated on a per syllable type basis)
pca_tsquared: Hotelling's t-squared statistic from the syllable-type pca
partition: partition label for training vs. analyzing the predicted age network and Gaussian distribution models. Note that the "laser" label is a dummy label generated for use in another context on 10 randomly selected datapoints.
ffnn_predicted_age: the production age predicted by a feedforward neural net on the basis of the syllable's latent space coordinates

DATA-SPECIFIC INFORMATION FOR: grn397_table.mat

Number of variables: 16
Number of cases/rows: 257212

Variable List:
bird: the ID assigned to the bird
hdf5specName, hdf5_index: the filename and index to data representing the syllable's associated spectrogram
age: the age (in days post hatch) at which the syllable was produced
dph: the integer age in days (floor of age)
type: the syllable type label of the rendition
datetime: the rendition time
file: the name of the audio file that includes this syllable rendition
duration: the duration of the syllable
latent: the 32-D latent space mean of the VAE encoder representation of the syllable
embed: UMAP coordinates of the syllable
pca: the coordinates of the rendition in latent space principal components (calculated on a per syllable type basis)
pca_tsquared: Hotelling's t-squared statistic from the syllable-type pca
partition: partition label for training vs. analyzing the predicted age network and Gaussian distribution models. Note that the "laser" label is a dummy label generated for use in another context on 10 randomly selected datapoints.
ffnn_predicted_age: the production age predicted by a feedforward neural net on the basis of the syllable's latent space coordinates

DATA-SPECIFIC INFORMATION FOR: grn475_table.mat

Number of variables: 16
Number of cases/rows: 97740

Variable List:
bird: the ID assigned to the bird
hdf5specName, hdf5_index: the filename and index to data representing the syllable's associated spectrogram
age: the age (in days post hatch) at which the syllable was produced
dph: the integer age in days (floor of age)
type: the syllable type label of the rendition
datetime: the rendition time
file: the name of the audio file that includes this syllable rendition
duration: the duration of the syllable
latent: the 32-D latent space mean of the VAE encoder representation of the syllable
embed: UMAP coordinates of the syllable
pca: the coordinates of the rendition in latent space principal components (calculated on a per syllable type basis)
pca_tsquared: Hotelling's t-squared statistic from the syllable-type pca
partition: partition label for training vs. analyzing the predicted age network and Gaussian distribution models. Note that the "laser" label is a dummy label generated for use in another context on 50 randomly selected datapoints.
ffnn_predicted_age: the production age predicted by a feedforward neural net on the basis of the syllable's latent space coordinates

DATA-SPECIFIC INFORMATION FOR: sil469_table.mat

Number of variables: 16
Number of cases/rows: 97740

Variable List:
bird: the ID assigned to the bird
hdf5specName, hdf5_index: the filename and index to data representing the syllable's associated spectrogram
age: the age (in days post hatch) at which the syllable was produced
dph: the integer age in days (floor of age)
type: the syllable type label of the rendition
datetime: the rendition time
file: the name of the audio file that includes this syllable rendition
duration: the duration of the syllable
latent: the 32-D latent space mean of the VAE encoder representation of the syllable
embed: UMAP coordinates of the syllable
pca: the coordinates of the rendition in latent space principal components (calculated on a per syllable type basis)
pca_tsquared: Hotelling's t-squared statistic from the syllable-type pca
partition: partition label for training vs. analyzing the predicted age network and Gaussian distribution models.
ffnn_predicted_age: the production age predicted by a feedforward neural net on the basis of the syllable's latent space coordinates

DATA-SPECIFIC INFORMATION FOR: trained_vaes/grn394_vae.tar, grn395_vae.tar, grn397_vae.tar, grn475_vae.tar, sil469_vae.tar

Bird-specific trained variational autoencoders that convert 258x258 spectrogram image representations of syllables into posterior distributions in 32-dimensional latent space, following Goffinet et al. (2021).

DATA-SPECIFIC INFORMATION FOR: gauss_models/{birdID}_{syllType}_checkpoint_neg_loss=-{val}.pt

Syllable-specific trained neural network that takes normalized age as input and returns Gaussian distributions in latent space corresponding to the likely locations of syllable renditions at that age.
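The pca and pca_tsquared table variables can be related as follows. This is a minimal numpy sketch, assuming the statistic follows the MATLAB pca convention, in which each rendition's T-squared is the sum of its squared component scores divided by the component variances; the scores and variances below are toy values, not dataset values.

```python
import numpy as np

def hotelling_t2(scores, component_variances):
    """Hotelling's T-squared for each row of PCA scores.

    Assumes MATLAB-style pca output: `scores` holds per-rendition
    coordinates in principal component space, and `component_variances`
    holds the variance (eigenvalue) of each component.
    """
    return np.sum(scores**2 / component_variances, axis=1)

# Toy example: 4 renditions in a 2-component space
rng = np.random.default_rng(0)
scores = rng.standard_normal((4, 2))
variances = np.array([2.0, 0.5])
t2 = hotelling_t2(scores, variances)  # one nonnegative statistic per rendition
```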
DATA-SPECIFIC INFORMATION FOR: gauss_models_64/{birdID}_{syllType}_checkpoint_neg_loss=-{val}.pt

Syllable-specific trained neural network that takes normalized age as input and returns Gaussian distributions in latent space corresponding to the likely locations of syllable renditions at that age. Uses fewer nodes per layer than the models in the gauss_models directory.

DATA-SPECIFIC INFORMATION FOR: gauss_models_layers/{birdID}_{syllType}_checkpoint_neg_loss=-{val}.pt

Syllable-specific trained neural network that takes normalized age as input and returns Gaussian distributions in latent space corresponding to the likely locations of syllable renditions at that age. Shallower networks than the models in the gauss_models directory.

DATA-SPECIFIC INFORMATION FOR: predicted_age_models/{birdID}_ffnn_table.mat

Variable List:
bird: the ID assigned to the bird
type: the syllable type being modeled
ffnn_l2a: a neural network that predicts age from location in latent space
ffnn_info_l2a: a struct containing information about the training protocol and about model improvement during training

DATA-SPECIFIC INFORMATION FOR: predicted_age_models_fewerLayers/{birdID}_ffnn_table.mat

Variable List:
bird: the ID assigned to the bird
type: the syllable type being modeled
ffnn_l2a: a neural network that predicts age from location in latent space, using fewer network layers than the models in 'predicted_age_models'
ffnn_info_l2a: a struct containing information about the training protocol and about model improvement during training
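When iterating over checkpoint files, the bird ID, syllable type, and loss value can be recovered from the filename pattern {birdID}_{syllType}_checkpoint_neg_loss=-{val}.pt. A minimal stdlib sketch follows; the example filename is hypothetical, and the pattern assumes bird IDs and syllable labels contain no underscores.

```python
import re

# Matches {birdID}_{syllType}_checkpoint_neg_loss=-{val}.pt, assuming the
# bird ID and syllable label themselves contain no underscores.
CHECKPOINT_RE = re.compile(
    r"(?P<bird>[^_]+)_(?P<syll>[^_]+)_checkpoint_neg_loss=-(?P<loss>[0-9.]+)\.pt$"
)

def parse_checkpoint_name(filename):
    """Return (birdID, syllable type, loss value) parsed from a checkpoint filename."""
    m = CHECKPOINT_RE.search(filename)
    if m is None:
        raise ValueError(f"Unrecognized checkpoint name: {filename}")
    return m.group("bird"), m.group("syll"), float(m.group("loss"))

# Hypothetical example filename
bird, syll, loss = parse_checkpoint_name("grn394_B_checkpoint_neg_loss=-12.34.pt")
```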