7–11 Oct 2024
University of Nova Gorica, Lanthieri mansion, Vipava, Slovenia
Europe/Ljubljana timezone

Automatic feature selection and weighting: tree biodiversity estimators explained by other variables

8 Oct 2024, 11:30
15m
University of Nova Gorica, Lanthieri mansion, Vipava, Slovenia

Oral presentation Contributing talks

Speaker

Romina Wild (Scuola Internazionale Superiore di Studi Avanzati (SISSA) Trieste)

Description

In any large database, most of the features that define a data point are redundant, irrelevant, or dominated by noise, and have to be discarded. Doing so requires answering three questions: What dimensionality of the reduced feature space retains the most information? How can one correct for different units of measure? What is the optimal relative weighting of the features? We use a statistical measure, the Information Imbalance, to select the most informative feature sets among many candidates. In an example from the Amazon rainforest, we identify sets of biotic and abiotic features that predict tree biodiversity and species richness, and we compare common biodiversity estimators by their information content. The differentiable version of this statistic can automatically weight features relative to each other, accounting for both units of measure and importance. Other use cases include variable selection in molecular dynamics simulations, in clinical data sets, and for neural network potentials.
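
As a rough illustration of the idea, the sketch below computes a rank-based Information Imbalance between two feature spaces on synthetic data. It is not the speakers' implementation. It assumes the commonly cited form of the statistic, 2/N times the average rank in space B of each point's nearest neighbour in space A, together with Euclidean distances. The function names `neighbour_ranks` and `information_imbalance` and the toy data are hypothetical.

```python
import numpy as np


def neighbour_ranks(X):
    """Return R with R[i, j] = rank of point j among the neighbours of point i.

    The nearest neighbour has rank 1. Euclidean distance is an assumption
    of this sketch; the point itself is pushed to the last rank.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # a point is never its own neighbour
    order = np.argsort(dist, axis=1)
    ranks = np.empty((n, n), dtype=int)
    ranks[np.arange(n)[:, None], order] = np.arange(1, n + 1)
    return ranks


def information_imbalance(A, B):
    """Estimate the imbalance from space A to space B (assumed definition).

    Values near 0 suggest that A predicts neighbourhoods in B well;
    values near 1 suggest that A carries little information about B.
    """
    ranks_a = neighbour_ranks(A)
    ranks_b = neighbour_ranks(B)
    n = ranks_a.shape[0]
    nearest_in_a = np.argmin(ranks_a, axis=1)  # nearest neighbour in A
    return 2.0 / n * ranks_b[np.arange(n), nearest_in_a].mean()


# Toy data (hypothetical): one informative feature, one pure-noise feature.
rng = np.random.default_rng(0)
n = 300
informative = rng.normal(size=(n, 1))
noise = rng.normal(size=(n, 1))
target = 3.0 * informative + 0.01 * rng.normal(size=(n, 1))

imb_informative = information_imbalance(informative, target)
imb_noise = information_imbalance(noise, target)
print(f"informative -> target: {imb_informative:.3f}")
print(f"noise       -> target: {imb_noise:.3f}")
```

In this toy setting the informative feature should give an imbalance close to 0 and the noise feature one close to 1. Ranking candidate feature sets by this value is one simple way to picture the selection step described above. The differentiable variant that learns feature weights is not shown here.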

Primary authors

Romina Wild (Scuola Internazionale Superiore di Studi Avanzati (SISSA) Trieste)

Prof. Alessandro Laio (Scuola Internazionale Superiore di Studi Avanzati (SISSA) Trieste)
