Classification – Preprocessed and high-dimensional data sets
Angelika Rohde and Lukas Steinberger
The effects of data preprocessing and of high-dimensional features on classification are studied. First, a comprehensive theory is developed to analyze and adjust for the adverse effects of class imbalance on classification, revealing a new approach to data-reduction preprocessing within the majority class. Next, optimal methods for releasing anonymized data that protect the privacy of individual data providers are designed and studied, building on the notion of differential privacy. Finally, the performance of classifiers is investigated when features are high-dimensional, explicitly accounting for the numerical approximations needed to implement those classifiers in practice.