SCALABLE BAYESIAN INFERENCE FOR LARGE CROSSED MIXED EFFECTS MODELS

Ms. Zhang Xinyu, Department of Statistics and Data Science, NUS

Date: 8 February 2024, Thursday

Location: S16-07-107

Time: 10-11 am, Singapore

Large crossed mixed effects models with imbalanced structures and missing data pose major computational challenges for standard Bayesian posterior sampling algorithms, as the computational complexity is usually superlinear in the number of observations. We propose a class of efficient subset-based stochastic gradient MCMC algorithms for such crossed mixed effects models, which facilitate scalable inference on both the variance components and the regression coefficients.
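For readers unfamiliar with the model class, a common two-factor form can be written as follows. This is a sketch under assumptions: the two-factor structure, the index set $\Omega$ of observed cells, and the symbols $\sigma_a^2$, $\sigma_b^2$, $\sigma_e^2$ are illustrative notation, not taken from the talk.

```latex
% Two-factor crossed mixed effects model with identity link (illustrative notation).
% Only the cells (i, j) in the observed index set \Omega contribute data.
\begin{align*}
  y_{ij} &= x_{ij}^{\top}\beta + a_i + b_j + e_{ij}, \qquad (i,j) \in \Omega, \\
  a_i &\sim \mathcal{N}(0, \sigma_a^2), \qquad
  b_j \sim \mathcal{N}(0, \sigma_b^2), \qquad
  e_{ij} \sim \mathcal{N}(0, \sigma_e^2).
\end{align*}
```

Here $\beta$ holds the regression coefficients and $(\sigma_a^2, \sigma_b^2, \sigma_e^2)$ are the variance components. Because every row effect $a_i$ can be paired with every column effect $b_j$, the responses are correlated across both factors, which is what makes full-data posterior sampling expensive.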

Our first contribution is to devise novel algorithms for crossed mixed effects models with the identity link for continuous response variables. The first algorithm is developed for balanced designs without missing observations, where we leverage the closed-form expression of the precision matrix for the full data matrix. The second algorithm, which we call pigeonhole stochastic gradient Langevin dynamics (PSGLD), is developed for both balanced and unbalanced designs with a potentially large proportion of missing observations. At each MCMC iteration, our PSGLD algorithm imputes the latent crossed random effects by running short Markov chains and then samples the model parameters (the variance components and the regression coefficients). We provide theoretical guarantees by showing that the output distribution of the proposed algorithm converges to the target non-log-concave posterior distribution.
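To illustrate the stochastic gradient Langevin building block only, here is a minimal sketch of plain SGLD on a toy problem: the posterior of a Gaussian mean. It is not the PSGLD algorithm from the talk, since it has no crossed random effects and no imputation step. The function name `run_sgld`, the toy model, and all default values are assumptions chosen for illustration.

```python
import numpy as np


def run_sgld(y, n_iter=6000, batch_size=50, step=1e-4, prior_sd=10.0, seed=0):
    """Plain SGLD for the mean mu of y_i ~ N(mu, 1) with prior mu ~ N(0, prior_sd^2).

    Hypothetical toy example: each iteration uses a random subset of the data
    to form an unbiased estimate of the gradient of the log posterior.
    """
    rng = np.random.default_rng(seed)
    n_obs = y.shape[0]
    mu = 0.0
    samples = np.empty(n_iter)
    for t in range(n_iter):
        # Draw a minibatch without replacement.
        batch = y[rng.choice(n_obs, size=batch_size, replace=False)]
        # Gradient of the log prior and rescaled minibatch gradient of the log likelihood.
        grad_prior = -mu / prior_sd**2
        grad_lik = (n_obs / batch_size) * np.sum(batch - mu)
        # Langevin update: half-step along the gradient plus Gaussian noise of variance `step`.
        mu = mu + 0.5 * step * (grad_prior + grad_lik) + np.sqrt(step) * rng.normal()
        samples[t] = mu
    return samples


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    y = rng.normal(loc=2.0, scale=1.0, size=1000)
    draws = run_sgld(y)[1000:]  # discard the first 1000 iterations as burn-in
    print("SGLD posterior mean estimate:", draws.mean())
    print("Sample mean of the data:", y.mean())
```

Each iteration touches only `batch_size` observations, which is the source of the computational savings. With a fixed step size the chain targets the posterior only approximately, so in practice the step size trades accuracy against speed.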

Our second main contribution is to extend the PSGLD algorithms to generalized crossed mixed effects models with probit and logistic links for binary and categorical data. For crossed mixed effects models with probit links, we use the data augmentation technique from the Gibbs sampler and treat the subset-based random effects and auxiliary variables as the latent variables of PSGLD. For logistic regression models with crossed mixed effects, we appeal to the Pólya-Gamma sampler to generate the random effects and the Pólya-Gamma auxiliary variables, but treat only the random effects as latent variables. A variety of numerical experiments on both synthetic and real data demonstrate that the proposed algorithms can significantly reduce the computational cost relative to standard MCMC algorithms and better balance approximation accuracy against computational efficiency.
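The probit data augmentation step is presumably of the standard Albert and Chib type, in which each binary response gets a latent Gaussian variable truncated according to its sign. The sketch below shows only that single draw, assuming a given linear predictor `eta` (fixed effects plus random effects); the function name `sample_probit_latent` is hypothetical and the code is not the speaker's implementation.

```python
from statistics import NormalDist

import numpy as np

_STD_NORMAL = NormalDist()  # standard normal, used for cdf and inverse cdf


def sample_probit_latent(y, eta, rng):
    """Draw z_i ~ N(eta_i, 1) truncated to z_i > 0 if y_i == 1, else z_i <= 0.

    Uses the inverse-CDF method: sample a uniform on the part of (0, 1)
    that corresponds to the allowed half-line, then map it back.
    """
    z = np.empty(len(y))
    for i, (yi, ei) in enumerate(zip(y, eta)):
        ei = float(ei)
        p0 = _STD_NORMAL.cdf(-ei)  # P(z_i <= 0) when z_i ~ N(eta_i, 1)
        lo, hi = (p0, 1.0) if yi == 1 else (0.0, p0)
        u = float(rng.uniform(lo, hi))
        u = min(max(u, 1e-12), 1.0 - 1e-12)  # keep the inv_cdf argument strictly in (0, 1)
        z[i] = ei + _STD_NORMAL.inv_cdf(u)
    return z


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    y = np.array([1, 0, 1, 0, 1])
    eta = np.array([0.5, -0.3, -1.0, 1.2, 0.0])
    print(sample_probit_latent(y, eta, rng))
```

Conditional on the latent `z`, the model is Gaussian in the regression coefficients and random effects, which is why this augmentation lets the identity-link machinery be reused for binary data.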