Preprocessed and high-dimensional data sets in discriminant analysis and classification

Prof. Dr. Angelika Rohde
Sophie Langer
Ass. Prof. Lukas Steinberger

Preprocessed and high-dimensional data sets in discriminant analysis and classification
Angelika Rohde, Sophie Langer and Lukas Steinberger

This project studies how two prevalent challenges in modern data science, data preprocessing and high-dimensional feature vectors, affect classification, regression and linear discriminant analysis. Privacy-preserving preprocessing can weaken the influence of features on outcomes, while high-dimensional data often includes many weak predictors. We develop statistical methods suited to such settings, with an emphasis on computationally efficient learning algorithms with provable convergence rates. Specifically, we consider semiparametric binary regression and analyze (stochastic) gradient descent for (penalized) empirical risk minimization. Our analysis progresses from foundational high-dimensional linear and additive models to more complex neural network architectures.
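
As a rough illustration of stochastic gradient descent for penalized empirical risk minimization in a binary regression setting, the following Python sketch may help. It is not the project's method: the logistic link, the ridge penalty, the 1/sqrt(t) step-size schedule, the synthetic data with many weak predictors, and every name in it (sgd_penalized_logistic, lam, step, w_true) are assumptions chosen purely for illustration. In a genuinely semiparametric model the link function would be unknown.

```python
import numpy as np


def sgd_penalized_logistic(X, y, lam=0.01, step=0.1, epochs=20, seed=0):
    """SGD on a ridge-penalized logistic empirical risk (illustrative only).

    Objective: (1/n) * sum_i log(1 + exp(-y_i * <x_i, w>)) + (lam / 2) * ||w||^2,
    with labels y_i in {-1, +1}. The logistic link is an assumption.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    t = 0
    for _ in range(epochs):
        # One pass over the data in a fresh random order.
        for i in rng.permutation(n):
            t += 1
            margin = y[i] * (X[i] @ w)
            # sigmoid(-margin), written via tanh for numerical stability.
            weight = 0.5 * (1.0 - np.tanh(0.5 * margin))
            # Stochastic gradient: loss term for sample i plus ridge penalty.
            grad = -y[i] * weight * X[i] + lam * w
            # Decaying step size (an assumed schedule).
            w -= step / np.sqrt(t) * grad
    return w


# Synthetic data with many weak predictors (hypothetical setup).
rng = np.random.default_rng(1)
n, d = 2000, 50
w_true = np.full(d, 0.3)
X = rng.standard_normal((n, d))
prob = 1.0 / (1.0 + np.exp(-(X @ w_true)))
y = np.where(rng.random(n) < prob, 1.0, -1.0)

w_hat = sgd_penalized_logistic(X, y)
accuracy = np.mean(np.sign(X @ w_hat) == y)
cosine = (w_hat @ w_true) / (np.linalg.norm(w_hat) * np.linalg.norm(w_true))
print(f"training accuracy: {accuracy:.3f}, cosine with w_true: {cosine:.3f}")
```

Each of the 50 coefficients is small on its own, but together they carry a clear signal, which is the "many weak predictors" situation the abstract describes. The penalty weight lam trades shrinkage against fit and would need to be tuned in any real application.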