Structured Latent Class Analysis (SLCA)

This function implements a structured latent class model for polytomous item responses (Formann, 1985, 1992). Lasso estimation for the item parameters is included (Chen, Liu, Xu & Ying, 2015; Chen, Li, Liu & Ying, 2017; Sun, Chen, Liu, Ying & Xin, 2016).

slca(data, group=NULL, weights=rep(1, nrow(data)), Xdes,
  Xlambda.init=NULL, Xlambda.fixed=NULL, Xlambda.constr.V=NULL,
  Xlambda.constr.c=NULL,  delta.designmatrix=NULL,
  delta.init=NULL, delta.fixed=NULL, delta.linkfct="log",
  Xlambda_positive=NULL, regular_type="lasso", regular_lam=0, regular_w=NULL,
  regular_n=nrow(data), maxiter=1000, conv=1e-5, globconv=1e-5, msteps=10,
  convM=5e-04, decrease.increments=FALSE, oldfac=0, dampening_factor=1.01,
  seed=NULL, progress=TRUE, PEM=TRUE, PEM_itermax=maxiter, ...)

# S3 method for slca
summary(object, file=NULL, ...)

# S3 method for slca
print(x, ...)

# S3 method for slca
plot(x, group=1, ... )

Arguments

data: Matrix of polytomous item responses
group: Optional vector of group identifiers. For plot.slca it is a single integer group identifier.
weights: Optional vector of sample weights
Xdes: Design matrix for $x_{ihj}$ with entries $q_{ihjv}$. It must therefore be an array with four dimensions referring to items ($i$), categories ($h$), latent classes ($j$) and $\lambda$ parameters ($v$).
Xlambda.init: Initial $\lambda_x$ parameters
Xlambda.fixed: Fixed $\lambda_x$ parameters. These must be provided as a matrix with two columns: 1st column: Parameter index, 2nd column: Fixed value.
Xlambda.constr.V: A design matrix for linear restrictions of the form $V_x \lambda_x=c_x$ for the $\lambda_x$ parameter.
Xlambda.constr.c: A vector for the linear restriction $V_x \lambda_x=c_x$ of the $\lambda_x$ parameter.
delta.designmatrix: Design matrix for delta parameters $\delta$ parameterizing the latent class distribution by log-linear smoothing (Xu & von Davier, 2008)
delta.init: Initial $\delta$ parameters
delta.fixed: Fixed $\delta$ parameters. This must be a matrix with three columns: 1st column: Parameter index, 2nd column: Group index, 3rd column: Fixed value
delta.linkfct: Link function for skill space reduction. This can be the log-linear link (log) or the logistic link function (logit).
Xlambda_positive: Optional logical vector indicating which elements of $\boldsymbol{\lambda}_x$ should be constrained to be positive.
regular_type: Regularization method which can be lasso, scad or mcp. See gdina for more information and references.
regular_lam: Numeric. Regularization parameter
regular_w: Vector for weighting the regularization penalty
regular_n: Vector of regularization factors. This will typically be the sample size.
maxiter: Maximum number of iterations
conv: Convergence criterion for item parameters and distribution parameters
globconv: Global deviance convergence criterion
msteps: Maximum number of M steps in estimating the item parameters. The default is 10 M steps.
convM: Convergence criterion in M step
decrease.increments: Should the increments of the item parameters in the M step decrease over iterations? The default is FALSE. If the deviance increases during estimation, setting decrease.increments to TRUE is recommended.
oldfac: Factor $f$ between 0 and 1 to control convergence behavior. If $x_t$ denotes the estimated parameter in iteration $t$, then the regularized estimate $x_t^{\ast}$ is obtained by $x_t^{\ast}=f x_{t-1} + (1-f) x_t$. Therefore, values of oldfac close to one only allow for small changes in estimated parameters in succeeding iterations.
dampening_factor: Factor larger than one defining how strongly the increments are decreased over iterations.
seed: Simulation seed for initial parameters. The default of NULL corresponds to a random seed.
progress: An optional logical indicating whether the function should print the progress of iteration in the estimation process.
PEM: Logical indicating whether the P-EM acceleration should be applied (Berlinet & Roland, 2012).
PEM_itermax: Number of iterations in which the P-EM method should be applied.
object: A required object of class slca
file: Optional name of a file to which the summary output should be written (via sink).
x: A required object of class slca
...: Optional arguments to be passed to or from other methods. These are ignored.
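
The damping rule given for the oldfac argument, $x_t^{\ast}=f x_{t-1} + (1-f) x_t$, can be illustrated with a few lines of base R. This is only a sketch of the formula; the helper damp_update and the parameter values are hypothetical and are not part of the package.

```r
# Hypothetical helper illustrating x*_t = f * x_{t-1} + (1 - f) * x_t
damp_update <- function(x_prev, x_new, oldfac) {
  stopifnot(oldfac >= 0, oldfac <= 1)
  oldfac * x_prev + (1 - oldfac) * x_new
}

x_prev <- c(0.2, -1.0)   # assumed estimates from iteration t-1
x_new  <- c(0.6, -0.4)   # assumed undamped estimates from iteration t

plain  <- damp_update(x_prev, x_new, oldfac = 0)    # no damping: equals x_new
half   <- damp_update(x_prev, x_new, oldfac = 0.5)  # midpoint of old and new
strong <- damp_update(x_prev, x_new, oldfac = 0.9)  # stays close to x_prev
```

With oldfac=0 (the default) the new estimate is used unchanged, while values close to one keep the update near the previous iterate.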

Details

The structured latent class model allows for general constraints on the response probabilities of items $i$ in categories $h$ and classes $j$. The item response model is $$P( X_{i}=h | j )=\frac{ \exp( x_{ihj} ) }{ \sum_l \exp( x_{ilj} ) }$$ where the sum in the denominator runs over all categories $l$ of item $i$, and the class-specific logits are linear functions of the $\lambda_x$ parameters: $$ x_{ihj}=\sum_v q_{ihjv} \lambda_{xv} $$
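
The following base R sketch shows how a four-dimensional design array and a $\lambda_x$ vector translate into class-specific response probabilities under the two formulas above. The toy dimensions, the parameter values and all object names are assumptions chosen for illustration; the array has the layout described for Xdes (items, categories, classes, parameters), but the code does not call slca itself.

```r
# Assumed toy setup: 2 items, 2 categories, 2 latent classes
n_items <- 2; n_cats <- 2; n_classes <- 2
n_lam <- n_items * n_classes

# q[i, h, j, v]: the first category is the reference (all zeros), and the
# second category of item i in class j loads on its own parameter, which
# corresponds to an unrestricted latent class model for binary items.
Xdes <- array(0, dim = c(n_items, n_cats, n_classes, n_lam))
v <- 0
for (i in seq_len(n_items)) {
  for (j in seq_len(n_classes)) {
    v <- v + 1
    Xdes[i, 2, j, v] <- 1
  }
}

# Assumed parameter values, ordered (item 1, class 1), (item 1, class 2),
# (item 2, class 1), (item 2, class 2)
lambda <- c(-1, 1, 0, 2)

# x[i, h, j] = sum_v q[i, h, j, v] * lambda[v]
x <- apply(Xdes, c(1, 2, 3), function(q) sum(q * lambda))

# P(X_i = h | j) = exp(x[i, h, j]) / sum_l exp(x[i, l, j])
expx  <- exp(x)
denom <- apply(expx, c(1, 3), sum)            # items x classes
probs <- sweep(expx, c(1, 3), denom, "/")     # items x categories x classes
```

In this design each probability of the second category reduces to the logistic function of a single parameter, e.g. probs[1, 2, 1] equals plogis(-1).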

Linear restrictions on the $\lambda_x$ parameter can be specified by a matrix equation $V_x \lambda_x=c_x$ (see Xlambda.constr.V and Xlambda.constr.c; Neuhaus, 1996).
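
As a small worked example of such a restriction (an assumed case with three parameters, not taken from the package), the equality constraint $\lambda_{x1}=\lambda_{x2}$ can be written as $$ V_x \lambda_x = \begin{pmatrix} 1 & -1 & 0 \end{pmatrix} \begin{pmatrix} \lambda_{x1} \\ \lambda_{x2} \\ \lambda_{x3} \end{pmatrix} = \lambda_{x1} - \lambda_{x2} = 0 = c_x $$ so that $V_x$ has one row per restriction in this notation and $c_x=0$. How $V_x$ must be oriented when it is passed as Xlambda.constr.V is not stated here and should be checked against the package examples.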

The latent class distribution can be smoothed by a log-linear link function (Xu & von Davier, 2008) or a logistic link function (Formann, 1992). For class $j$ in group $g$ employing a link function $h$, it holds that $$ h [ P( j| g) ] \propto \sum_w r_{jw} \delta_{gw} $$ where group-specific distributions are allowed. The values $r_{jw}$ are specified in the design matrix delta.designmatrix.
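
For the log link, the relation above means that the class probabilities in each group are proportional to $\exp( \sum_w r_{jw} \delta_{gw} )$ and are normalized to sum to one. The base R sketch below illustrates this; the class locations, the design matrix and the $\delta$ values are assumptions for illustration and do not reproduce the package's internal computation.

```r
# Assumed design: 3 latent classes located at theta = -1, 0, 1 with a
# linear and a quadratic smoothing term (columns r_{j1} and r_{j2})
theta <- c(-1, 0, 1)
R <- cbind(linear = theta, quadratic = theta^2)

# Assumed delta parameters; one column per group
delta <- cbind(g1 = c(0.5, -1), g2 = c(-0.5, -1))

# log P(j | g) is proportional to sum_w r_{jw} * delta_{gw}
eta <- R %*% delta                                   # classes x groups
class_probs <- exp(eta)
class_probs <- sweep(class_probs, 2, colSums(class_probs), "/")
```

Each column of class_probs is a group-specific class distribution. Because the two groups differ only in the sign of the linear term, their distributions are mirror images of each other.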

This model contains classical uni- and multidimensional latent trait models, latent class analysis, located latent class analysis, cognitive diagnostic models, the general diagnostic model and mixture item response models as special cases (see Formann & Kohlmann, 1998; Formann, 2007).

The function also allows for regularization of $\lambda_{xv}$ parameters using the lasso approach (Sun et al., 2016). More formally, the penalty function can be written as $$pen( \boldsymbol{\lambda}_x )=p_\lambda \sum_v n_v w_v | \lambda_{xv} | $$ where $p_\lambda$ can be specified with regular_lam, $w_v$ can be specified with regular_w, and $n_v$ can be specified with regular_n.
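
The penalty formula can be evaluated directly in base R. The function lasso_penalty below is a hypothetical helper that mirrors the formula and reuses the argument names regular_lam, regular_w and regular_n; the parameter values are assumed, and the code is not the package's implementation.

```r
# Hypothetical helper: pen(lambda) = p_lambda * sum_v n_v * w_v * |lambda_v|
lasso_penalty <- function(lambda, regular_lam,
                          regular_w = rep(1, length(lambda)),
                          regular_n = rep(1, length(lambda))) {
  regular_lam * sum(regular_n * regular_w * abs(lambda))
}

lambda <- c(0.8, -0.3, 0, 1.5)   # assumed parameter values

# A weight of zero leaves the first parameter unpenalized;
# regular_n = 100 plays the role of the sample size.
pen <- lasso_penalty(lambda, regular_lam = 0.1,
                     regular_w = c(0, 1, 1, 1), regular_n = 100)
```

Here the penalty is 0.1 * 100 * (0.3 + 0 + 1.5) = 18. Setting regular_lam=0, the default, removes the penalty entirely.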

Value

An object of class slca. The list contains the following entries:

item: Data frame with conditional item probabilities
deviance: Deviance
ic: Information criteria, number of estimated parameters
Xlambda: Estimated $\lambda_x$ parameters
se.Xlambda: Standard error of $\lambda_x$ parameters
pi.k: Trait distribution
pjk: Item response probabilities evaluated for all classes
n.ik: An array of expected counts $n_{cikg}$ of ability class $c$ at item $i$ at category $k$ in group $g$
G: Number of groups
I: Number of items
N: Number of persons
delta: Parameter estimates for skillspace representation
covdelta: Covariance matrix of parameter estimates for skillspace representation
MLE.class: Classified skills for each student (MLE)
MAP.class: Classified skills for each student (MAP)
data: Original data frame
group.stat: Group statistics (sample sizes, group labels)
p.xi.aj: Individual likelihood
posterior: Individual posterior distribution
K.item: Maximal category per item
time: Info about computation time
skillspace: Used skillspace parametrization
iter: Number of iterations
seed.used: Used simulation seed
Xlambda.init: Used initial lambda parameters
delta.init: Used initial delta parameters
converged: Logical indicating whether convergence was achieved.

References

Berlinet, A. F., & Roland, C. (2012). Acceleration of the EM algorithm: P-EM versus epsilon algorithm. Computational Statistics & Data Analysis, 56(12), 4122-4137.

Chen, Y., Liu, J., Xu, G., & Ying, Z. (2015). Statistical analysis of Q-matrix based diagnostic classification models. Journal of the American Statistical Association, 110, 850-866.

Chen, Y., Li, X., Liu, J., & Ying, Z. (2017). Regularized latent class analysis with application in cognitive diagnosis. Psychometrika, 82, 660-692.

Formann, A. K. (1985). Constrained latent class models: Theory and applications. British Journal of Mathematical and Statistical Psychology, 38, 87-111.

Formann, A. K. (1992). Linear logistic latent class analysis for polytomous data. Journal of the American Statistical Association, 87, 476-486.

Formann, A. K. (2007). (Almost) Equivalence between conditional and mixture maximum likelihood estimates for some models of the Rasch type. In M. von Davier & C. H. Carstensen (Eds.), Multivariate and mixture distribution Rasch models (pp. 177-189). New York: Springer.

Formann, A. K., & Kohlmann, T. (1998). Structural latent class models. Sociological Methods & Research, 26, 530-565.

Neuhaus, W. (1996). Optimal estimation under linear constraints. Astin Bulletin, 26, 233-245.

Sun, J., Chen, Y., Liu, J., Ying, Z., & Xin, T. (2016). Latent variable selection for multidimensional item response theory models via $L_1$ regularization. Psychometrika, 81(4), 921-939.

Xu, X., & von Davier, M. (2008). Fitting the structured general diagnostic model to NAEP data. ETS Research Report ETS RR-08-27. Princeton, ETS.

Note

If items differ in their number of categories, the class probabilities of non-existing categories can effectively be set to zero by loading the item, for all skill classes, on a $\lambda_x$ parameter that is fixed at a large negative value, e.g. -999.

The implementation of the model builds on pieces of work by Anton Formann. See http://www.antonformann.at/ for more information.
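
The effect of such a fixed value can be checked numerically with the item response model from the Details section. The logits below are assumed values for one item in one class, in a model that allows three categories although the item only has two.

```r
# Assumed logits x_{ihj} for one item and one class; the third category
# does not exist for this item and loads on a parameter fixed at -999
x <- c(0, 0.7, -999)

# P(X_i = h | j) = exp(x_h) / sum_l exp(x_l)
p <- exp(x) / sum(exp(x))
```

The probability of the third category underflows to zero in double precision, so the remaining two categories share the full probability mass.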