Assessing Model Fit and Local Dependence by Comparing Observed and Expected Item Pair Correlations

This function computes several measures of absolute model fit and local dependence indices for dichotomous item responses which are based on comparing observed and expected frequencies of item pairs (Chen, de la Torre & Zhang, 2013; see Details).

modelfit.cor(data, posterior, probs)
modelfit.cor2(data, posterior, probs)

modelfit.cor.din( dinobj, jkunits=0 )

# S3 method for modelfit.cor.din
summary(object, ...)

Arguments

data: An $N \times I$ data frame of dichotomous item responses
posterior: A matrix containing the posterior distribution (e.g. obtained as an output of the din function).
probs: An array of dimension [items,categories,attribute classes] containing probabilities
dinobj: An object of class din, gdina or gdm (only for dichotomous item responses)
object: An object of class modelfit.cor.din, i.e. the output of modelfit.cor.din
jkunits: Number of jackknife units. The default of 0 means that no jackknifing is used. If jackknife estimation should be employed, use (say) at least 20 jackknife units. The input jkunits can also be a vector of jackknife unit identifiers.
...: Further arguments to be passed

Details

The fit statistics are based on predictions of the pairwise table $(X_i, X_j)$ of item responses. The $\chi^2$ statistic X2 for the item pair $(i,j)$ is defined as $$ \chi^2_{ij}=\sum_{k=0}^1 \sum_{l=0}^1 \frac{ (n_{ij,kl}-e_{ij,kl}) ^2 }{ e_{ij,kl} }$$ where $n_{ij,kl}$ is the observed absolute frequency of $\{ X_{i}=k,X_j=l\}$ and $e_{ij,kl}$ is the expected frequency under the estimated model. Note that individual posterior distributions are evaluated for calculating $e_{ij,kl}$. The $\chi^2_{ij}$ statistic is chi-square distributed with one degree of freedom and can be used for testing whether items $i$ and $j$ are locally dependent. To control for multiple comparisons, p values are adjusted according to the Holm and FDR methods (see stats::p.adjust).
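As a minimal sketch of the formula above, the following base R code computes X2 for a single item pair and adjusts a set of p values. The observed and expected frequencies and the additional p values are made up for illustration; in the package, the expected frequencies come from the fitted model.

```r
# Hypothetical observed 2x2 table for items i and j
# (rows: X_i = 0, 1; columns: X_j = 0, 1)
n_obs <- matrix(c(40, 20,
                  15, 25), nrow = 2, byrow = TRUE)
# Hypothetical expected frequencies for the same four cells
e_exp <- matrix(c(35, 25,
                  20, 20), nrow = 2, byrow = TRUE)

# X2: sum over the four cells of (n - e)^2 / e
X2 <- sum((n_obs - e_exp)^2 / e_exp)

# p value from the chi-square distribution with one degree of freedom
X2_p <- stats::pchisq(X2, df = 1, lower.tail = FALSE)

# Holm and FDR adjustment across several item pairs
# (the last three p values are illustrative placeholders)
p_all  <- c(X2_p, 0.104, 0.138, 0.229)
p_holm <- stats::p.adjust(p_all, method = "holm")
p_fdr  <- stats::p.adjust(p_all, method = "fdr")
```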

The residual covariance RESIDCOV of item pairs $(i,j)$ is calculated as $$ RESIDCOV_{ij}= \frac{ n_{ij,11} n_{ij,00} - n_{ij,10} n_{ij,01} }{n^2 } - \frac{ e_{ij,11} e_{ij,00} - e_{ij,10} e_{ij,01} }{n^2 } $$ where $n$ is the total sample size. The statistic MADRESIDCOV is the average of the absolute values of all RESIDCOV statistics.
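The residual covariance can be traced by hand for one item pair. The sketch below uses made-up cell frequencies (not package output) and also checks that the first term equals the usual covariance of two binary variables, $p_{11} - p_{1\cdot} p_{\cdot 1}$.

```r
n <- 100  # total sample size (hypothetical)

# Hypothetical observed cell frequencies n_{ij,kl}
n00 <- 40; n01 <- 20; n10 <- 15; n11 <- 25
# Hypothetical expected cell frequencies e_{ij,kl}
e00 <- 35; e01 <- 25; e10 <- 20; e11 <- 20

# Observed and model-implied covariances of the item pair
cov_obs <- (n11 * n00 - n10 * n01) / n^2
cov_exp <- (e11 * e00 - e10 * e01) / n^2

# Residual covariance of the item pair
residcov <- cov_obs - cov_exp

# Cross-check: covariance of two binary variables, p11 - p1. * p.1
cov_check <- n11 / n - ((n10 + n11) / n) * ((n01 + n11) / n)
```

With several item pairs, the mean of abs(residcov) over all pairs would give MADRESIDCOV; the summary output reports it multiplied by 100.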

The statistic MADcor denotes the average absolute deviation between observed correlations $r_{ij}$ and model predicted correlations $\hat{r}_{ij}$ of item pairs $(i,j)$: $$ MADcor=\frac{1}{ J(J-1)/2 } \sum_{i < j} | r_{ij} - \hat{r}_{ij} |$$ where $J$ denotes the number of items, so that $J(J-1)/2$ is the number of item pairs.

The SRMSR (standardized mean square root of squared residuals; Maydeu-Olivares, 2013) is also based on comparing these correlations: $$ SRMSR=\sqrt{ \frac{1}{ J(J-1)/2 } \sum_{i < j} ( r_{ij} - \hat{r}_{ij} )^2 } $$
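Both correlation-based statistics reduce to simple summaries of the deviations over all item pairs with $i < j$. The following sketch assumes that observed and model predicted correlation matrices are already available; the two matrices here are hypothetical values for three items.

```r
# Hypothetical observed and model predicted correlation matrices (J = 3 items)
r_obs <- matrix(c(1.00, 0.30, 0.20,
                  0.30, 1.00, 0.25,
                  0.20, 0.25, 1.00), nrow = 3, byrow = TRUE)
r_exp <- matrix(c(1.00, 0.28, 0.26,
                  0.28, 1.00, 0.20,
                  0.26, 0.20, 1.00), nrow = 3, byrow = TRUE)

# Select each item pair once (i < j)
upper <- upper.tri(r_obs)
dev <- r_obs[upper] - r_exp[upper]

# Mean absolute deviation of correlations
MADcor <- mean(abs(dev))
# Square root of the mean squared deviation
SRMSR <- sqrt(mean(dev^2))
```

Because the root mean square of a set of values is never smaller than their mean absolute value, SRMSR is always at least as large as MADcor, which agrees with the example output in this help page.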

For calculating MADQ3 and MADaQ3, residuals $\varepsilon_{ni}=X_{ni} - e_{ni}$ between observed and expected responses are constructed for respondents $n$ and items $i$. The pairwise correlations of these residuals are the $Q_3$ statistics. MADQ3 is the average of the absolute values of these pairwise correlations. For MADaQ3, the $Q_3$ statistics are first centered by subtracting their average, and the average of the absolute values of the centered statistics is then calculated.
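The construction of the $Q_3$-based statistics can be sketched in a few lines of base R. The expected responses e below are simulated placeholders, since how the package derives $e_{ni}$ from the fitted model is not spelled out here; only the residual and averaging steps follow the description above.

```r
set.seed(1)
N <- 200  # respondents (hypothetical)
I <- 4    # items (hypothetical)

# Placeholder expected responses e_ni (success probabilities)
e <- matrix(stats::runif(N * I, min = 0.2, max = 0.8), nrow = N, ncol = I)
# Dichotomous responses X_ni simulated from these probabilities
X <- matrix(stats::rbinom(N * I, size = 1, prob = e), nrow = N, ncol = I)

# Residuals between observed and expected responses
resid <- X - e

# Q3 statistics: pairwise correlations of residuals, each pair taken once
Q3 <- stats::cor(resid)
q3 <- Q3[upper.tri(Q3)]

# Mean absolute Q3 and mean absolute centered Q3
MADQ3  <- mean(abs(q3))
MADaQ3 <- mean(abs(q3 - mean(q3)))
```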

The difference of Fisher transformed observed and predicted correlations (Chen et al., 2013) is also computed and used for statistical inference.
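A rough sketch of such a test for one item pair is given below. The correlations and sample size are hypothetical, and the standard error $1/\sqrt{N-3}$ is the textbook value for a single Fisher transformed correlation; whether the package uses exactly this standard error is an assumption.

```r
r_obs <- 0.30  # hypothetical observed correlation of an item pair
r_exp <- 0.18  # hypothetical model predicted correlation
N <- 150       # hypothetical sample size

# Fisher transformation is atanh(r) = 0.5 * log((1 + r) / (1 - r))
fcor <- abs(atanh(r_obs) - atanh(r_exp))

# Assumed standard error of a Fisher transformed correlation
se <- 1 / sqrt(N - 3)

# Two-sided p value from the standard normal distribution
z <- fcor / se
fcor_p <- 2 * stats::pnorm(z, lower.tail = FALSE)
```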

For each of the fit statistics MADcor, MADacor, SRMSR, MX2, 100*MADRESIDCOV and MADQ3, smaller values (values close to zero) indicate better fit.

Standard errors and confidence intervals of fit statistics are obtained by Jackknife estimation.
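The idea of a grouped jackknife can be illustrated with a generic sketch. It is not the package's internal code: the statistic (a simple correlation), the data, and the standard error formula are assumptions chosen for illustration. It also shows how a vector of jackknife unit identifiers can be constructed.

```r
set.seed(42)
N <- 120  # persons (hypothetical)
x <- stats::rnorm(N)
y <- 0.5 * x + stats::rnorm(N)

# Assign each person at random to one of U jackknife units
U <- 20
jk_id <- sample(rep(seq_len(U), length.out = N))

# Illustrative statistic computed on a subset of persons
stat <- function(idx) stats::cor(x[idx], y[idx])

# Estimate from the full sample
est <- stat(seq_len(N))

# Re-estimate with each jackknife unit left out in turn
est_u <- vapply(seq_len(U), function(u) stat(which(jk_id != u)), numeric(1))

# Generic delete-one-group jackknife standard error
jk_se <- sqrt((U - 1) / U * sum((est_u - mean(est_u))^2))
```

A vector such as jk_id, with one identifier per row of the data, is the kind of input that the jkunits argument accepts in place of a single number.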

Value

A list with the following entries

modelfit.stat

Model fit statistics:

MADcor: mean of absolute deviations in observed and expected correlations (DiBello, Roussos & Stout, 2007)

SRMSR: standardized mean square root of squared residuals (Maydeu-Olivares, 2013; Maydeu-Olivares & Joe, 2014)

MADRESIDCOV: Mean of absolute deviations of residual covariances (McDonald & Mok, 1995)

MADQ3: Mean of absolute values of $Q_3$ statistic (Yen, 1984)

MADaQ3: Mean of absolute values of centered $Q_3$ statistic

modelfit.test

Test of global absolute model fit using test statistics of all item pairs. The statistic max(X2) is the maximum of all $\chi^2_{ij}$ statistics, accompanied by a p value obtained by the Holm procedure. A similar statistic abs(fcor) is created as the absolute value of the deviations of Fisher transformed correlations as used in Chen et al. (2013).

itempairs

Fit statistics of item pairs which can be used for inspecting local dependence. The $\chi^2_{ij}$ statistic is denoted by X2 (Chen & Thissen, 1997), and the statistic based on absolute deviations of observed and predicted correlations is denoted by fcor (Chen et al., 2013).
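The link between the item pair statistics and the global max(X2) test can be checked against the output of Example 1 in the Examples section. The sketch below assumes that the data contain 9 items, and hence 36 item pairs; this is inferred from the item labels in the output rather than stated in this help page.

```r
# Largest X2 value, copied from the itempairs output of Example 1
# (item pair Item5, Item8)
X2_max <- 8.728248

# Assumption: 9 items, hence choose(9, 2) = 36 item pairs
n_pairs <- choose(9, 2)

# Unadjusted p value with one degree of freedom
p_raw <- stats::pchisq(X2_max, df = 1, lower.tail = FALSE)

# The Holm adjustment multiplies the smallest p value by the number of tests
p_holm_max <- min(1, n_pairs * p_raw)
```

Under this assumption, p_holm_max reproduces the X2_p.holm entry of the first item pair, which is also the p value reported for max(X2) in the summary.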

References

Chen, J., de la Torre, J., & Zhang, Z. (2013). Relative and absolute fit evaluation in cognitive diagnosis modeling. Journal of Educational Measurement, 50, 123-140.

Chen, W., & Thissen, D. (1997). Local dependence indexes for item pairs using item response theory. Journal of Educational and Behavioral Statistics, 22, 265-289.

DiBello, L. V., Roussos, L. A., & Stout, W. F. (2007). Review of cognitively diagnostic assessment and a summary of psychometric models. In C. R. Rao and S. Sinharay (Eds.), Handbook of Statistics, Vol. 26 (pp. 979-1030). Amsterdam: Elsevier.

Maydeu-Olivares, A. (2013). Goodness-of-fit assessment of item response theory models (with discussion). Measurement: Interdisciplinary Research and Perspectives, 11, 71-137.

Maydeu-Olivares, A., & Joe, H. (2014). Assessing approximate fit in categorical data analysis. Multivariate Behavioral Research, 49, 305-328.

McDonald, R. P., & Mok, M. M.-C. (1995). Goodness of fit in item response models. Multivariate Behavioral Research, 30, 23-40.

Yen, W. M. (1984). Effects of local item dependence on the fit and equating performance of the three-parameter logistic model. Applied Psychological Measurement, 8, 125-145.

Note

The function does not handle sample weights properly.

The function modelfit.cor2 has the same functionality as modelfit.cor but it is much faster because it is based on Rcpp code.

Examples

if (FALSE) {
#############################################################################
# EXAMPLE 1: Model fit for sim.dina
#############################################################################

data(sim.dina, package="CDM")
data(sim.qmatrix, package="CDM")
dat <- sim.dina
q.matrix <- sim.qmatrix

#*** Model 1: DINA model for DINA simulated data
mod1 <- CDM::din(dat, q.matrix=q.matrix, rule="DINA" )
fmod1 <- CDM::modelfit.cor.din(mod1, jkunits=10)
summary(fmod1)
  ##   Test of Global Model Fit
  ##          type value     p
  ##   1   max(X2) 8.728 0.113
  ##   2 abs(fcor) 0.143 0.080
  ##
  ##   Fit Statistics
  ##                     est jkunits jk_est jk_se est_low est_upp
  ##   MADcor          0.030      10  0.020 0.005   0.010   0.030
  ##   SRMSR           0.040      10  0.023 0.006   0.011   0.035
  ##   100*MADRESIDCOV 0.671      10  0.445 0.125   0.200   0.690
  ##   MADQ3           0.062      10  0.037 0.008   0.021   0.052
  ##   MADaQ3          0.059      10  0.034 0.008   0.019   0.050

# look at first five item pairs with highest degree of local dependence
itempairs <- fmod1$itempairs
itempairs <- itempairs[ order( itempairs$X2, decreasing=TRUE ), ]
itempairs[ 1:5, c("item1","item2", "X2", "X2_p", "X2_p.holm", "Q3") ]
  ##      item1 item2       X2        X2_p X2_p.holm          Q3
  ##   29 Item5 Item8 8.728248 0.003133174 0.1127943 -0.26616414
  ##   32 Item6 Item8 2.644912 0.103881881 1.0000000  0.04873154
  ##   21 Item3 Item9 2.195011 0.138458201 1.0000000  0.05948456
  ##   10 Item2 Item4 1.449106 0.228671389 1.0000000 -0.08036216
  ##   30 Item5 Item9 1.393583 0.237800911 1.0000000 -0.01934420

#*** Model 2: DINO model for DINA simulated data
mod2 <- CDM::din(dat, q.matrix=q.matrix, rule="DINO" )
fmod2 <- CDM::modelfit.cor.din(mod2, jkunits=10 )   # 10 jackknife units
summary(fmod2)
  ##   Test of Global Model Fit
  ##          type  value     p
  ##   1   max(X2) 13.139 0.010
  ##   2 abs(fcor)  0.199 0.001
  ##
  ##   Fit Statistics
  ##                     est jkunits jk_est jk_se est_low est_upp
  ##   MADcor          0.056      10  0.041 0.007   0.026   0.055
  ##   SRMSR           0.072      10  0.045 0.019   0.007   0.083
  ##   100*MADRESIDCOV 1.225      10  0.878 0.183   0.519   1.236
  ##   MADQ3           0.073      10  0.055 0.012   0.031   0.080
  ##   MADaQ3          0.073      10  0.066 0.012   0.042   0.089

#*** Model 3: estimate DINA model with gdina function
mod3 <- CDM::gdina( dat, q.matrix=q.matrix, rule="DINA" )
fmod3 <- CDM::modelfit.cor.din( mod3, jkunits=0 )  # no Jackknife estimation
summary(fmod3)
  ##   Test of Global Model Fit
  ##          type value     p
  ##   1   max(X2) 8.756 0.111
  ##   2 abs(fcor) 0.143 0.078
  ##
  ##   Fit Statistics
  ##                     est
  ##   MADcor          0.030
  ##   SRMSR           0.040
  ##   MX2             0.719
  ##   100*MADRESIDCOV 0.668
  ##   MADQ3           0.062
  ##   MADaQ3          0.059

#############################################################################
# EXAMPLE 2: Simulated Example DINA model
#############################################################################

set.seed(9765)
# specify Q-matrix
Q <- matrix( c(1,0, 0,1, 1,1 ), nrow=3, ncol=2, byrow=TRUE )
q.matrix <- Q[ rep(1:3,4), ]
I <- nrow(q.matrix)

# simulate data
guess <- stats::runif(I, 0, .3 )
slip <- stats::runif( I, 0, .4 )
N <- 150   # number of persons
dat <- CDM::sim.din( N=N, q.matrix=q.matrix, slip=slip, guess=guess )$dat

#*** estimate DINA model
mod1 <- CDM::din( dat, q.matrix=q.matrix, rule="DINA" )
fmod1 <- CDM::modelfit.cor.din(mod1, jkunits=10)
summary(fmod1)
  ##  Test of Global Model Fit
  ##         type  value     p
  ##  1   max(X2) 10.697 0.071
  ##  2 abs(fcor)  0.277 0.026
  ##
  ##  Fit Statistics
  ##                    est jkunits jk_est jk_se est_low est_upp
  ##  MADcor          0.052      10  0.026 0.010   0.006   0.045
  ##  SRMSR           0.074      10  0.048 0.013   0.022   0.074
  ##  100*MADRESIDCOV 1.259      10  0.646 0.213   0.228   1.063
  ##  MADQ3           0.080      10  0.047 0.010   0.027   0.068
  ##  MADaQ3          0.079      10  0.046 0.010   0.027   0.065
}