Optimal actions and stopping in sequential learning
Alexandra Carpentier and Markus Reiß
One of the central models in mathematical statistics is the Gaussian linear model, in which the least squares estimator (LSE) is efficient. For the massive data sets common today, however, computing the LSE is expensive. The seemingly attractive alternative of stopping an iterative numerical algorithm for its computation as soon as the approximation quality reaches the level of statistical resolution fails because neither of these two quantities is known. In the general framework of sequential learning, this project develops methodology that covers this problem as a prominent special case. Sequential learning copes with data or information that is not available all at once but arrives over time. Two widespread situations of this kind are studied:
(a) iterative numerical algorithms that provide a stream of estimators, where stopping the algorithm early achieves both optimal regularization and feasible computational cost, and
(b) incoming data whose flow may be actively steered in order to learn or estimate the unknowns near-optimally.
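The early-stopping idea in (a) can be sketched on a toy Gaussian linear model: gradient descent (Landweber iteration) for the least-squares problem is halted by a discrepancy-principle rule once the residual drops to the noise level. This is a minimal illustration, not the project's method; all names are assumed, and the noise level sigma is taken as known here, whereas the missing knowledge of approximation quality and statistical resolution is precisely the difficulty described above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy Gaussian linear model Y = X beta + noise (all names assumed).
n, p, sigma = 200, 50, 1.0
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:5] = 2.0                      # a few strong coefficients
Y = X @ beta_true + sigma * rng.standard_normal(n)

# Landweber / gradient-descent iteration for the least-squares problem.
step = 1.0 / np.linalg.norm(X, 2) ** 2   # step size ensuring convergence
beta = np.zeros(p)

# Discrepancy-principle stopping: halt once the squared residual norm
# reaches the noise level n * sigma^2 (sigma assumed known for this toy).
threshold = n * sigma ** 2
tau = None
for m in range(1, 10_001):
    residual = Y - X @ beta
    if residual @ residual <= threshold:
        tau = m
        break
    beta += step * (X.T @ residual)

print("stopped at iteration:", tau)
print("estimation error:", np.linalg.norm(beta - beta_true))
```

Running the full iteration to convergence would interpolate more of the noise; stopping at tau acts as implicit regularization while saving the remaining iterations.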
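The active steering in (b) can likewise be illustrated by a minimal sequential-allocation sketch (the setting and all names are assumptions for illustration, not taken from the project): a fixed sampling budget is routed, step by step, towards whichever noisy source currently has the most uncertain mean estimate.

```python
import numpy as np

rng = np.random.default_rng(1)

# Assumed setting: K noisy sources with unknown means and unequal noise
# levels; the goal is to estimate every mean well under a fixed budget.
means = np.array([0.0, 1.0, -0.5])
sigmas = np.array([0.5, 2.0, 1.0])       # source 1 is much noisier
K, budget = 3, 600

counts = np.zeros(K, dtype=int)
sums = np.zeros(K)
sums_sq = np.zeros(K)

def draw(k):
    # Sample source k once and update its running statistics.
    x = means[k] + sigmas[k] * rng.standard_normal()
    counts[k] += 1
    sums[k] += x
    sums_sq[k] += x * x

# Initialise with two samples per source, then steer the remaining budget
# towards the source whose mean estimate is currently most uncertain
# (estimated variance divided by its sample count).
for k in range(K):
    draw(k)
    draw(k)
while counts.sum() < budget:
    var_hat = sums_sq / counts - (sums / counts) ** 2
    k = int(np.argmax(var_hat / counts))
    draw(k)

print("samples per source:", counts)     # noisier sources get sampled more
print("mean estimates:", sums / counts)
```

A uniform (non-adaptive) allocation would spend the same number of samples on every source; the adaptive rule instead lets the data flow concentrate on the hard-to-estimate sources, which is the kind of near-optimal steering situation (b) refers to.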