Projects

In recent years, the data situation has changed fundamentally: massive amounts of data are now routinely collected and stored. At the same time, storage and privacy constraints require even clean raw data to be preprocessed, and the sheer volume of data typically renders classical, statistically efficient methodology computationally intractable when applied subsequently. Usually, preprocessed data no longer share the distributional properties of the raw data. Moreover, statistically efficient data preprocessing typically depends on the specific task of the subsequent statistical inference; hence, both processing steps are inseparably linked. New concepts have to be developed that guarantee validity and efficiency on potentially preprocessed data sets while remaining computationally tractable for massive data. Starting with the following five projects, the aim is to provide exactly this conjoint development.