Provides model-based treatment of missing data in regression models with missing values in covariates or the dependent variable, using maximum likelihood or Bayesian estimation (Ibrahim et al., 2005, <doi:10.1198/016214504000001844>; Luedtke, Robitzsch, & West, 2020a, 2020b, <doi:10.1080/00273171.2019.1640104>, <doi:10.1037/met0000233>). The regression model can be nonlinear (e.g., interaction effects, quadratic effects or B-spline functions). Multilevel models with missing data in predictors are available for Bayesian estimation. Substantive-model-compatible multiple imputation can also be conducted.


  • Maximum likelihood estimation of regression models with missing values in covariates is implemented in frm_em. Available regression models are linear regression, logistic regression, ordinal probit regression, and models with Box-Cox or Yeo-Johnson transformed normally distributed outcomes. The factorization-based regression model also allows the inclusion of latent variables and covariates measured with error.

  • Bayesian estimation and multiple imputation for regression models with missing values in covariates are implemented in frm_fb. The same regression models as in frm_em can be specified. In addition, multilevel models can be specified with Bayesian estimation. The function frm_fb allows substantive-model-compatible multiple imputation.
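
The factorization underlying both functions can be sketched for the simplest case: one outcome, one covariate with missing values, and one fully observed covariate. The notation below (Y, X, Z, theta, phi, and the index sets "obs" and "mis") is an assumption introduced for illustration and is not taken from the package documentation; the interaction model is only one example of a nonlinear substantive model.

```latex
% Assumed notation: Y outcome, X covariate with missing values,
% Z fully observed covariate, \theta and \phi model parameters.

% Example substantive model with an interaction effect:
Y = \beta_0 + \beta_1 X + \beta_2 Z + \beta_3 X Z + \varepsilon,
\qquad \varepsilon \sim N(0, \sigma^2)

% Factorization of the joint distribution of (Y, X) given Z:
f(Y, X \mid Z; \theta, \phi) = f(Y \mid X, Z; \theta)\, f(X \mid Z; \phi)

% Observed-data likelihood: cases with X observed contribute the product,
% cases with X missing contribute the integral over the unobserved value.
L(\theta, \phi) =
  \prod_{i \in \mathrm{obs}} f(y_i \mid x_i, z_i; \theta)\, f(x_i \mid z_i; \phi)
  \;\times\;
  \prod_{i \in \mathrm{mis}} \int f(y_i \mid x, z_i; \theta)\, f(x \mid z_i; \phi)\, dx
```

Because the covariate model f(X | Z) is combined with the substantive model f(Y | X, Z) rather than specified separately from it, the treatment of missing X stays compatible with nonlinear terms such as the product XZ.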


Ibrahim, J. G., Chen, M. H., Lipsitz, S. R., & Herring, A. H. (2005). Missing-data methods for generalized linear models: A comparative review. Journal of the American Statistical Association, 100, 332-346. doi: 10.1198/016214504000001844

Luedtke, O., Robitzsch, A., & West, S. (2020a). Analysis of interactions and nonlinear effects with missing data: A factored regression modeling approach using maximum likelihood estimation. Multivariate Behavioral Research, 55(3), 361-381. doi: 10.1080/00273171.2019.1640104

Luedtke, O., Robitzsch, A., & West, S. (2020b). Regression models involving nonlinear effects with missing data: A sequential modeling approach using Bayesian estimation. Psychological Methods, 25(2), 157-181. doi: 10.1037/met0000233

See also

The EM algorithm for the multivariate normal model is implemented in norm2::emNorm in the norm2 package. A corresponding MCMC algorithm can be found in the norm2::mcmcNorm function.

See the lavaan, OpenMx or sem packages for full information maximum likelihood approaches to handling missing data in multivariate normal distributions, linear regression models, and, more generally, structural equation models.

Structural equation models with missing data can also be estimated with a two-stage procedure. In the first stage, a mean vector and a covariance matrix are estimated (possibly with auxiliary variables); in the second stage, the structural equation model is fitted to the previously obtained mean vector and covariance matrix. The procedure is implemented in the semTools::twostage function in the semTools package.


  ##  |\  /||~~\ |\  /||~~\
  ##  | \/ ||   || \/ ||--<
  ##  |    ||__/ |    ||__/

  ##  > library(mdmb)
  ##  * mdmb 0.0-13 (2017-01-15)