Computationally tractable bootstrap for high-dimensional data

Angelika Rohde

Holger Dette and Angelika Rohde

Bootstrapping is the classical approach to approximating the distribution of estimators and test statistics when the asymptotic distribution contains unknown quantities or provides a poor approximation. For the analysis of massive data, however, the bootstrap in its basic sampling-with-replacement version is computationally intractable. Moreover, it is not even valid in some important high-dimensional applications. This project proposes a new, computationally tractable bootstrap algorithm designed for high-dimensional massive data sets, which combines subsampling of observations with a suitable selection of their coordinates. Its performance is studied for statistics of high-dimensional sample covariance matrices, namely linear spectral statistics and PCA-preprocessed statistics, for which the common sampling-with-replacement bootstrap fails.
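
To make the idea of subsampling observations and selecting coordinates concrete, the following Python sketch draws replicates of a linear spectral statistic from random row and column subsets of a data matrix. It is only an illustration under assumptions and not the project's algorithm: the abstract does not specify the coordinate selection rule, the subsample sizes, or any centering and rescaling of the replicates, so uniform random selection and the function names, the parameters m, q, n_rep, and the default test function f are all hypothetical choices made here.

```python
import numpy as np


def linear_spectral_statistic(X, f=np.square):
    """Sum of f over the eigenvalues of the sample covariance matrix of X.

    Rows of X are observations, columns are coordinates.
    """
    S = np.atleast_2d(np.cov(X, rowvar=False))
    eigenvalues = np.linalg.eigvalsh(S)  # S is symmetric
    return float(np.sum(f(eigenvalues)))


def subsample_replicates(X, m, q, n_rep, f=np.square, seed=0):
    """Replicates of the statistic on m-by-q submatrices of X.

    Assumption: rows and columns are both drawn uniformly without
    replacement; the abstract only says the coordinates are
    'suitably selected', so this rule is a placeholder.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    replicates = np.empty(n_rep)
    for b in range(n_rep):
        rows = rng.choice(n, size=m, replace=False)  # subsample observations
        cols = rng.choice(p, size=q, replace=False)  # select coordinates
        replicates[b] = linear_spectral_statistic(X[np.ix_(rows, cols)], f)
    return replicates
```

Each replicate requires the eigenvalues of a q-by-q matrix rather than a p-by-p one, which is where the computational saving over resampling the full data set would come from. How the replicates must be centered and scaled to approximate the distribution of the full-sample statistic is not covered by this sketch.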