Data from: Statistical significance for maximally persistent topological features via the Gumbel distribution

Public

  • Topological data analysis (TDA) is gaining traction as a novel way to discover and quantify structure in data. While descriptive characterizations have been very successful, a rigorous statistical framework for the field is still in development. Here we look at a commonly used metric, the length of the maximally persistent feature in a point cloud, and develop a framework for hypothesis testing. Because the distribution of persistence lengths in Poisson spatial point clouds is well-aligned with the probabilistic theory of extreme values, we argue that critical values of the Gumbel distribution should be used when assessing statistical significance. For one-dimensional topological features (holes) in two-dimensional point clouds, we use the theory to predict an asymptotic rescaling of maximally persistent features that results in convergence in distribution to an approximately standard Gumbel random variable. We then propose a model for critical values as a function of point density and demonstrate its effectiveness on some standard TDA challenges. ...
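The hypothesis test described in the abstract can be illustrated with a minimal sketch. It assumes the maximal persistence length has already been rescaled to a statistic `z` that is approximately standard Gumbel under the null hypothesis. The rescaling and the density-dependent critical-value model are not reproduced in this record, so `z` is a hypothetical input, the function names are illustrative only, and the exact standard Gumbel distribution stands in for the authors' model.

```python
import math


def gumbel_critical_value(alpha: float) -> float:
    """Upper critical value of the standard Gumbel distribution.

    Solves exp(-exp(-x)) = 1 - alpha for x.
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    return -math.log(-math.log(1.0 - alpha))


def gumbel_p_value(z: float) -> float:
    """Upper-tail probability P(G > z) for a standard Gumbel variable G."""
    try:
        t = math.exp(-z)
    except OverflowError:
        # exp(-z) overflows for very negative z; the tail probability is 1.
        return 1.0
    # 1 - exp(-t), computed with expm1 for accuracy when t is tiny.
    return -math.expm1(-t)


def is_significant(z: float, alpha: float = 0.05) -> bool:
    """Reject the null when the rescaled statistic exceeds the critical value.

    Assumption: z is a hypothetical, already-rescaled maximal persistence
    length; the rescaling itself is not part of this sketch.
    """
    return z > gumbel_critical_value(alpha)
```

At a significance level of 0.05, `gumbel_critical_value(0.05)` is approximately 2.97, so under these assumptions a rescaled statistic of 3.5 would be flagged as significant and a value of 1.0 would not.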

Total Size
3 files (1.01 GB)
Data Citation
  • Ciocanel, M. V. & McKinley, S. (2022). Data from: Statistical significance for maximally persistent topological features via the Gumbel distribution. Duke Research Data Repository. https://doi.org/10.7924/r48k7ft9j
DOI
  • 10.7924/r48k7ft9j
ARK
  • ark:/87924/r48k7ft9j