Multilevel Imputation Using lme4

This function is a general imputation function based on the linear mixed effects model as implemented in lme4::lmer. The imputation model can be hierarchical or non-hierarchical and can be written in the general form \(\bold{y}=\bold{X} \bold{\beta} + \sum_{v=1}^V \bold{Z}_v \bold{u}_v\) for \(V\) multivariate random effects. Predictors are selected through the rows of the predictor matrix in mice::mice (i.e., by modifying type), the levels of the random effects are specified with levels_id, and random slopes are selected with random_slopes.
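To make the roles of levels_id and random_slopes more concrete, the following sketch translates the specification used for x1 in Example 1 into an lme4-style formula. The helper build_lmer_formula and the list of fixed predictors are hypothetical and only for illustration; they are not part of miceadds, and the formula the package builds internally may differ (for example, it can include aggregated predictors).

```r
# Hypothetical helper (NOT part of miceadds): translate levels_id and
# random_slopes for one variable into an lme4-style model formula.
build_lmer_formula <- function(outcome, fixed, levels_id, random_slopes = NULL) {
  re_terms <- vapply(levels_id, function(id) {
    # random intercept plus any random slopes requested for this level
    slopes <- random_slopes[[id]]
    inner <- paste(c("1", slopes), collapse = " + ")
    paste0("(", inner, " | ", id, ")")
  }, character(1))
  rhs <- paste(c(fixed, re_terms), collapse = " + ")
  stats::as.formula(paste(outcome, "~", rhs))
}

# Specification for x1 as in Example 1; the fixed predictors are an assumption
f <- build_lmer_formula(
  outcome = "x1",
  fixed = c("x2", "y1", "y2", "z1", "z2"),
  levels_id = c("id2", "id3"),
  random_slopes = list(id2 = "x2", id3 = "y1")
)
print(f)
```

Each entry of levels_id contributes one random-effects term (one \(\bold{Z}_v \bold{u}_v\) in the general form), and leaving random_slopes empty for a level reduces its term to a random intercept.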

The function mice.impute.ml.lmer allows the imputation of variables at arbitrary levels. The corresponding level can be specified with levels_id. All predictor variables are aggregated to the corresponding level of the variable to be imputed.

Several strategies for the specification of the design matrix \(\bold{X}\) are accommodated. By default, predictors at a lower level are automatically aggregated to the higher level and included as further predictors to maintain the multilevel structure in the data (Grund, Luedtke & Robitzsch, 2018; Enders, Mistler & Keller, 2016; argument aggregate_automatically=TRUE). Further, interactions and quadratic effects can be defined with the arguments interactions and quadratics, respectively. The dimension of the matrix of predictors can be reduced by applying partial least squares regression (see mice.impute.pls).
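As an illustration of what aggregating a lower-level predictor to a higher level means, the following base R sketch computes cluster means on toy data. The data and the column name M_x2 are assumptions made for this example; the package may compute and name its aggregated predictors differently.

```r
# Toy two-level data: level-1 predictor x2 nested in clusters id2
dat <- data.frame(
  id2 = c(1, 1, 1, 2, 2, 3, 3, 3),
  x2  = c(2, 4, 6, 1, 3, 5, 5, 8)
)

# Cluster mean of x2, repeated on every level-1 row of the cluster
# (the name M_x2 is hypothetical)
dat$M_x2 <- stats::ave(dat$x2, dat$id2, FUN = mean)

# The same aggregate with one row per cluster, as needed when the
# variable to be imputed is itself measured at level 2
agg <- stats::aggregate(x2 ~ id2, data = dat, FUN = mean)
print(agg)
```

Including the cluster mean next to the original predictor lets the imputation model distinguish within-cluster from between-cluster associations.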

The function currently supports only continuous data (model="continuous"), ordinal data (model="pmm"), and binary data (model="pmm" or model="binary"). Nominal variables with missing values cannot yet be handled.

mice.impute.ml.lmer(y, ry, x, type, levels_id, variables_levels=NULL,
    random_slopes=NULL, aggregate_automatically=TRUE, intercept=TRUE,
    groupcenter.slope=FALSE, draw.fixed=TRUE, random.effects.shrinkage=1e-06,
    glmer.warnings=TRUE, model="continuous", donors=3, match_sampled_pars=FALSE,
    blme_use=FALSE, blme_args=NULL, pls.facs=0, interactions=NULL,
    quadratics=NULL, min.int.cor=0, min.all.cor=0, pls.print.progress=FALSE,
    group_index=NULL, iter_re=0, ...)

Arguments

y: Incomplete data vector of length n
ry: Vector of missing data pattern (FALSE -- missing, TRUE -- observed)
x: Matrix (n \(\times\) p) of complete predictors.
type: Predictor variables associated with fixed effects.
levels_id: Specification of the level identifiers (see Examples)
variables_levels: Specification of the level of variables (see Examples)
random_slopes: Specification of random slopes (see Examples)
aggregate_automatically: Logical indicating whether aggregated effects at higher levels are automatically included.
intercept: Optional logical indicating whether the intercept should be included.
groupcenter.slope: Optional logical indicating whether covariates should be centered around group means
draw.fixed: Optional logical indicating whether fixed effects parameters should be randomly drawn
random.effects.shrinkage: Shrinkage parameter for stabilizing the covariance matrix of random effects
glmer.warnings: Optional logical indicating whether warnings from glmer should be displayed
model: Type of model. Can be "continuous" for normally distributed data, "binary" for dichotomous data specifying a logistic mixed effects model and "pmm" for predictive mean matching.
donors: Number of donors used for predictive mean matching
match_sampled_pars: Logical indicating whether values of nearest neighbors should also be sampled in pmm imputation.
blme_use: Logical indicating whether the blme package should be used.
blme_args: (Prior) Arguments for blme, see blme::blmer and blme::bmerDist-class.
pls.facs: Number of factors used in PLS dimension reduction
interactions: Specification of predictors with interaction effects
quadratics: Specification of predictors with quadratic effects
min.int.cor: Minimum absolute value of correlation with outcome for interaction effects to be retained
min.all.cor: Minimum absolute value of correlation with outcome for predictors to be retained
pls.print.progress: Logical indicating whether progress of algorithm should be displayed
group_index: Optional vector for group identifiers (internally used in mice.impute.bygroup)
iter_re: Number of iterations for sampling random effects in random intercept models for continuous outcomes. Using iter_re>0 is necessary for cross-classified models with not fully balanced designs.
...: Further arguments to be passed
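
The role of the donors argument in predictive mean matching can be sketched in base R as follows. This is a simplified illustration on made-up numbers, not the implementation used by this function: the predicted values are assumed to be given, whereas in the actual procedure they come from the fitted mixed effects model.

```r
set.seed(1)

# Assumed predicted means and observed values for five observed cases
yhat_obs <- c(1.0, 2.0, 3.0, 4.0, 5.0)
y_obs    <- c(10, 20, 30, 40, 50)

# Assumed predicted mean for one case with a missing value
yhat_mis <- 2.9
donors   <- 3

# The `donors` observed cases whose predictions are closest to yhat_mis
dist      <- abs(yhat_obs - yhat_mis)
donor_idx <- order(dist)[seq_len(donors)]

# The imputed value is the observed value of one randomly chosen donor
imputed <- y_obs[donor_idx[sample.int(donors, 1)]]
print(donor_idx)
print(imputed)
```

Because the imputed value is always an observed value, predictive mean matching keeps imputations within the range and scale of the data, which is why it can be used for ordinal and binary variables.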

Value

Vector of imputed values

References

Enders, C. K., Mistler, S. A., & Keller, B. T. (2016). Multilevel multiple imputation: A review and evaluation of joint modeling and chained equations imputation. Psychological Methods, 21(2), 222-240. doi:10.1037/met0000063

Grund, S., Luedtke, O., & Robitzsch, A. (2018). Multiple imputation of multilevel data in organizational research. Organizational Research Methods, 21(1), 111-149. doi:10.1177/1094428117703686

Examples

if (FALSE) {
#############################################################################
# EXAMPLE 1: Imputation of three-level data with normally distributed residuals
#############################################################################

data(data.ma07, package="miceadds")
dat <- data.ma07

# variables at level 1 (identifier id1): x1 (some missings), x2 (complete)
# variables at level 2 (identifier id2): y1 (some missings), y2 (complete)
# variables at level 3 (identifier id3): z1 (some missings), z2 (complete)

#****************************************************************************
# Imputation model 1

#----- specify levels of variables (only relevant for variables
#      with missing values)
variables_levels <- miceadds:::mice_imputation_create_type_vector( colnames(dat), value="")
 # leave variables at lowest level blank (i.e., "")
variables_levels[ c("y1","y2") ] <- "id2"
variables_levels[ c("z1","z2") ] <- "id3"

#----- specify predictor matrix
predmat <- mice::make.predictorMatrix(data=dat)
predmat[, c("id2", "id3") ] <- 0
# set -2 for cluster identifier for level 3 variable z1
# because "2lonly" function is used
predmat[ "z1", "id3" ] <- -2

#----- specify imputation methods
method <- mice::make.method(data=dat)
method[c("x1","y1")] <- "ml.lmer"
method[c("z1")] <- "2lonly.norm"

#----- specify hierarchical structure of imputation models
levels_id <- list()
#** hierarchical structure for variable x1
levels_id[["x1"]] <- c("id2", "id3")
#** hierarchical structure for variable y1
levels_id[["y1"]] <- c("id3")

#----- specify random slopes
random_slopes <- list()
#** random slopes for variable x1
random_slopes[["x1"]] <- list( "id2"=c("x2"), "id3"=c("y1") )
# if no random slopes should be specified, the corresponding entry can be left empty
# and only a random intercept is used in the imputation model

#----- imputation in mice
imp1 <- mice::mice( dat, maxit=10, m=5, method=method,
            predictorMatrix=predmat, levels_id=levels_id,  random_slopes=random_slopes,
            variables_levels=variables_levels )
summary(imp1)

#****************************************************************************
# Imputation model 2

#----- impute x1 with predictive mean matching and y1 with normally distributed residuals
model <- list(x1="pmm", y1="continuous")

#----- assume only random intercepts
random_slopes <- NULL

#---- create interactions with z2 for all predictors in imputation models for x1 and y1
interactions <- list("x1"="z2", "y1"="z2")

#----- imputation in mice
imp2 <- mice::mice( dat, method=method, predictorMatrix=predmat,
                levels_id=levels_id, random_slopes=random_slopes,
                variables_levels=variables_levels, model=model, interactions=interactions)
summary(imp2)
}
