01.12.2023, 15:00 (3:00 pm) CET
Jaouad Mourtada: Finite-sample performance of the maximum likelihood estimator in logistic regression
The logistic model is a classical linear model to describe the probabilistic dependence of binary responses on multivariate features. We consider the predictive performance of the maximum likelihood estimator (MLE) for logistic regression, assessed in terms of the logistic loss of its probabilistic forecasts. We consider two questions: first, that of the existence of the MLE (which occurs when the data is not linearly separated), and second, that of its accuracy when it exists. These properties depend on both the dimension of the covariates and the signal strength.
In the case of Gaussian covariates and a well-specified logistic model, we obtain sharp non-asymptotic guarantees for the existence and excess prediction risk of the MLE. This complements asymptotic results of Sur and Candès, and refines non-asymptotic upper bounds of Ostrovskii and Bach and of Chinot, Lecué and Lerasle. It also complements independent recent results by Kuchelmeister and van de Geer. We then extend these results in two directions: first, to non-Gaussian covariates satisfying a certain regularity condition, and second, to the case of a misspecified logistic model.
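The link between linear separation and non-existence of the MLE can be illustrated with a toy sketch. It is not taken from the talk: it assumes one-dimensional features, no intercept and plain gradient ascent, and the function name `fit_logistic_1d` and the data are hypothetical. On separated data the log-likelihood has no maximiser, so the iterate keeps growing with the number of steps; on overlapping data it settles at a finite value.

```python
import math


def sigmoid(t: float) -> float:
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def fit_logistic_1d(xs, ys, steps, lr=0.1):
    """Gradient ascent on the log-likelihood of P(y=1 | x) = sigmoid(theta * x)."""
    theta = 0.0
    for _ in range(steps):
        # Gradient of the log-likelihood with respect to theta.
        grad = sum((y - sigmoid(theta * x)) * x for x, y in zip(xs, ys))
        theta += lr * grad
    return theta


xs = [-2.0, -1.0, 1.0, 2.0]

# Linearly separated labels: the sign of x determines y, so no MLE exists.
separated = [0, 0, 1, 1]
sep_short = fit_logistic_1d(xs, separated, steps=200)
sep_long = fit_logistic_1d(xs, separated, steps=2000)

# Overlapping labels: the MLE exists and the iterates converge to it.
overlapping = [0, 1, 0, 1]
ov_short = fit_logistic_1d(xs, overlapping, steps=200)
ov_long = fit_logistic_1d(xs, overlapping, steps=2000)

print("separated:  ", sep_short, sep_long)
print("overlapping:", ov_short, ov_long)
```

In the separated case the estimate after 2000 steps is strictly larger than after 200 steps, while in the overlapping case the two values agree to numerical precision.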
Meeting ID: 641 5666 0227
12.01.2023, 15:00 (3:00 pm) CET
Tudor Manole: [To be announced]
[Date to be determined]
Kengo Kato: [To be announced]
03.11.2023, 15:00 (3:00 pm) CET
Sven Wang: On polynomial-time mixing for high-dimensional MCMC in non-linear regression models
We consider the problem of generating random samples from high-dimensional posterior distributions. We will discuss both (i) conditions under which diffusion-based MCMC algorithms mix in polynomial time (based on https://arxiv.org/pdf/2009.05298.pdf) and (ii) situations in which 'cold-start' MCMC suffers from an exponentially long mixing time (based on https://arxiv.org/pdf/2209.02001.pdf). We will focus on the setting of non-linear inverse regression models. Our positive results on polynomial-time mixing are derived under local 'gradient stability' assumptions on the forward map, which can be verified for a range of well-known non-linear inverse problems involving elliptic PDE, as well as under the assumption that a good initializer is available. Our negative results on exponentially long mixing times hold for 'cold-start' MCMC. We show that there exist non-linear regression models in which the posterior distribution is unimodal, but there exists a so-called 'free entropy barrier', which local Markov chains take an exponentially long time to traverse.
14.07.2023, 15:00 (3:00 pm) CET
Iain Johnstone: Expectation propagation in mixed models
Matt Wand and colleagues have recently adapted the machine learning technique of expectation propagation (EP) to yield state-of-the-art estimation of parameters in generalized linear mixed models. We review this work before asking: are the EP estimators asymptotically efficient? The problem becomes one of defining an appropriate objective function that captures the EP iteration and approximates maximum likelihood well enough to inherit its efficiency. This is joint work with a group including the late Peter Hall, Matt Wand, and Song Mei.
09.06.2023, 15:00 (3:00 pm) CET
Stanislav Volgushev: Structure learning for extremal graphical models
Extremal graphical models are sparse statistical models for multivariate extreme events. The underlying graph encodes conditional independencies and enables a visual interpretation of the complex extremal dependence structure. In this talk we present a data-driven methodology to learn the underlying graph. For tree models and general extreme-value distributions, we show that the tree can be learned in a completely non-parametric fashion. For the specific class of Hüsler-Reiss distributions, we discuss methodologies for estimating general graphs. Conditions that ensure consistent graph recovery in growing dimensions are provided.
05.05.2023, 15:00 (3:00 pm) CET
Nina Dörnemann: Linear spectral statistics of sequential sample covariance matrices
In this talk, we revisit the investigation of linear eigenvalue statistics of sample covariance matrices in high dimensions. Such statistics are frequently used to construct tests for various hypotheses on large covariance matrices. In their by now classical work, Bai and Silverstein (2004) establish a central limit theorem for the linear spectral statistics of sample covariance matrices, which has been generalized in various follow-up works.
In contrast to previous results, we will take a different point of view on linear spectral statistics and study these objects from a sequential perspective. More precisely, we will introduce the sequential sample covariance matrix, which admits a process of eigenvalue statistics. Our interest in such objects is partially motivated by change-point problems in statistics. In our work, we establish the weak convergence of this process of spectral statistics towards a non-standard Gaussian process.
In the final part of this talk, we will discuss a procedure to monitor the sphericity assumption on high dimensional covariance matrices.
This talk is based on joint work with Holger Dette.
12.01.2023, 16:00 (4:00 pm) CET
Randolf Altmeyer: Polynomial time guarantees for sampling based posterior inference
The Bayesian approach provides a flexible and popular framework for a wide range of nonparametric inference problems. It relies crucially on computing functionals with respect to the posterior distribution. Examples are the posterior mean or posterior quantiles for uncertainty quantification. In practice, this requires sampling from the posterior distribution using numerical algorithms, e.g., Markov chain Monte Carlo (MCMC) methods. The runtime of these algorithms to achieve a given target precision will typically, at least without additional structural assumptions, scale exponentially in the model dimension and the sample size. In contrast, in this talk we show that sampling-based posterior inference in a general high-dimensional setup is indeed feasible. Given a sufficiently good initialiser, we present polynomial-time convergence guarantees for a widely used gradient-based MCMC sampling scheme. The proof exploits the local curvature induced by the Fisher information of the statistical model near the underlying truth, and relies on results from the non-linear inverse problem literature. We will discuss applications to logistic and Gaussian regression, as well as to density estimation.
01.12.2022, 16:00 (4:00 pm) CET
Edgar Dobriban: T-Cal: An optimal test for the calibration of predictive models
The prediction accuracy of machine learning methods is steadily increasing, but the calibration of their uncertainty predictions poses a significant challenge. Numerous works focus on obtaining well-calibrated predictive models, but less is known about reliably assessing model calibration. This limits our ability to know when algorithms for improving calibration have a real effect, and when their improvements are merely artifacts due to random noise in finite datasets. In this work, we consider detecting mis-calibration of predictive models using a finite validation dataset as a hypothesis testing problem. The null hypothesis is that the predictive model is calibrated, while the alternative hypothesis is that the deviation from calibration is sufficiently large.
We find that detecting mis-calibration is only possible when the conditional probabilities of the classes are sufficiently smooth functions of the predictions. When the conditional class probabilities are Hölder continuous, we propose T-Cal, a minimax optimal test for calibration based on a debiased plug-in estimator of the ℓ₂-Expected Calibration Error (ECE). We further propose Adaptive T-Cal, a version that is adaptive to unknown smoothness. We verify our theoretical findings with a broad range of experiments, including with several popular deep neural net architectures and several standard post-hoc calibration methods. T-Cal is a practical general-purpose tool, which, combined with classical tests for discrete-valued predictors, can be used to test the calibration of essentially any probabilistic classification method.
03.11.2022, 16:00 (4:00 pm) CET
Mona Azadkia: A Fast Non-parametric Approach for Local Causal Structure Learning
Abstract: In this talk, we introduce a non-parametric approach to the problem of causal structure learning with essentially no assumptions on functional relationships and noise. We develop DAG-FOCI, a computationally fast algorithm for this setting that is based on the FOCI variable selection algorithm of Azadkia and Chatterjee (2021). DAG-FOCI outputs the set of parents of a response variable of interest. We provide theoretical guarantees for our procedure when the underlying graph does not contain any (undirected) cycle containing the response variable of interest. Furthermore, in the absence of this assumption, we give a conservative guarantee against false positive causal claims when the set of parents is identifiable.
07.07.2022, 16:00 (4:00 pm) CET
Mathias Drton: Identification and Estimation of Graphical Continuous Lyapunov Models
Abstract: Graphical continuous Lyapunov models offer a new perspective on modeling causally interpretable dependence structure in multivariate data by treating each independent observation as a one-time cross-sectional snapshot of a temporal process. Specifically, the models consider multivariate Ornstein-Uhlenbeck processes in equilibrium. This setup leads to Gaussian models in which the covariance matrix is determined by the continuous Lyapunov equation. In this setting, each graphical model assumes a sparse drift matrix with support determined by a directed graph. The talk will discuss identifiability of such sparse drift matrices as well as their regularized estimation.
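For orientation, the continuous Lyapunov equation referred to in this abstract is usually written as below. The notation is an assumption made here, not taken from the talk: M is the drift matrix of the Ornstein-Uhlenbeck process dX_t = M X_t dt + D dW_t, C = D D^T is the positive semidefinite volatility matrix, and Σ is the covariance matrix of the process in equilibrium (which requires M to be stable).

```latex
M\Sigma + \Sigma M^{\top} + C = 0
```

In this notation, the sparsity assumption of a graphical model corresponds to restricting the support of M to the edges of the directed graph.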
02.06.2022, 16:00 (4:00 pm) CET
Arnak Dalalyan: Estimating the matching map between two sets of high-dimensional, noisy and corrupted features
Abstract: In this talk, I will present some recent results on finding the matching map between subsets of two sets of size n consisting of d-dimensional noisy feature vectors. The main result shows that, if the signal-to-noise ratio of the feature vectors is of order at least d^(1/4), then it is possible to recover the true matching map exactly with high probability. A notable feature of this result is that it does not assume knowledge of the number of feature vectors in the first set that have their pairs in the second set. We also show that the rate d^(1/4) cannot be improved by any other procedure. When the number k of matching pairs is known, this rate is achieved by the minimizer of the sum of squares of distances between matched pairs of feature vectors. We show how this estimator can be extended to the setting of unknown k. In addition, we show that the resulting optimization problem can be formulated as a minimum-cost flow problem, and thus solved efficiently, with complexity O(k^(1/2) n^2).
Finally, we will report the result of numerical experiments illustrating our theoretical findings.
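The least-sum-of-squares estimator mentioned in the abstract can be sketched on a toy example. The sketch makes simplifying assumptions that are not part of the talk: every point has a match (k = n), both sets have the same size, and the minimiser is found by brute force over all permutations rather than through the minimum-cost flow formulation. The names `squared_distance` and `best_matching` and the data are hypothetical.

```python
from itertools import permutations


def squared_distance(u, v):
    """Squared Euclidean distance between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def best_matching(xs, ys):
    """Return the permutation pi minimising sum_i ||xs[i] - ys[pi[i]]||^2.

    Brute force over all n! permutations, so only sensible for tiny n.
    """
    n = len(xs)
    return min(
        permutations(range(n)),
        key=lambda pi: sum(squared_distance(xs[i], ys[pi[i]]) for i in range(n)),
    )


# Three well-separated points and a noisy, shuffled copy of them.
xs = [(0.0, 0.0), (5.0, 5.0), (10.0, 0.0)]
ys = [(9.8, 0.3), (0.1, -0.2), (5.2, 4.9)]

matching = best_matching(xs, ys)
print(matching)  # matching[i] is the index in ys paired with xs[i]
```

Brute force is used only to keep the example dependency-free; for anything beyond a handful of points, an assignment or flow solver would be the natural choice.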
Download slides: [here]
05.05.2022, 16:00 (4:00 pm) CET
Nicolas Verzelen: Some recent results on graph and point clustering
Abstract: In this presentation, we consider two prototypical unsupervised learning problems: (i) clustering nodes from a graph sampled from a Stochastic Block Model and (ii) clustering points sampled from a Gaussian Mixture Model. In these two models, the statistician aims at estimating a hidden partition (of nodes or points) from the data. I will first introduce suitable notions of distance between the groups in each model. Then, I will survey recent results on the minimal separation distance between the clusters needed for a procedure to recover the partition with high probability. This will be mostly done through the prism of the K-means criterion and its convex relaxations. Interestingly, these clustering problems seem to exhibit a computational-statistical trade-off: known polynomial-time procedures are shown to recover the hidden partitions under stronger separation conditions than minimax (but exponential-time) ones, at least when the number of groups is large. Partial computational lower bounds that support the existence of this gap will be discussed at the end of the talk.
Download slides: [here]