In recent years, the data situation has changed fundamentally: massive amounts of data are routinely collected and stored. At the same time, storage or privacy constraints require even clean raw data to be preprocessed, and the sheer volume of the data typically renders classical, statistically efficient methodology computationally intractable. Preprocessed data usually no longer share the distributional properties of the raw data. Moreover, statistically efficient preprocessing typically depends on the particular task of the subsequent statistical inference, so the two processing steps are inseparably linked. New concepts must be developed that guarantee validity and efficiency on potentially preprocessed data sets while remaining computationally tractable for massive data. Starting with the following five projects, the aim is to provide exactly this conjoint development.
- Computationally tractable bootstrap for high-dimensional data (Holger Dette and Angelika Rohde)
- Optimal actions and stopping in sequential learning (Alexandra Carpentier and Markus Reiß)
- Supersmooth functional data analysis and PCA-preprocessing (Moritz Jirak and Alexander Meister)
- Sublinear time methods with statistical guarantees (Holger Dette and Axel Munk)
- Classification – Preprocessed and high-dimensional data sets (Angelika Rohde and Lukas Steinberger)