modelfit.cor.Rd
This function computes several measures of absolute model fit and local dependence indices for dichotomous item responses which are based on comparing observed and expected frequencies of item pairs (Chen, de la Torre & Zhang, 2013; see Details).
modelfit.cor(data, posterior, probs)
modelfit.cor2(data, posterior, probs)
modelfit.cor.din( dinobj, jkunits=0 )
# S3 method for modelfit.cor.din
summary(object, ...)
An \(N \times I\) data frame of dichotomous item responses
A matrix containing the posterior distribution (e.g. obtained as output of the din function).
An array of dimension [items,categories,attribute classes] containing probabilities
An object of class din, gdina or gdm (only for dichotomous item responses)
An object of class din, gdina or gdm (only for dichotomous item responses)
Number of jackknife units. The default is 0 units (no jackknifing).
If jackknife estimation is to be employed, use (say) at least 20 jackknife units.
The input jkunits can also be a vector of jackknife unit identifiers.
Further arguments to be passed
The fit statistics are based on predictions of the pairwise table
\((X_i, X_j)\) of item responses. The \(\chi^2\) statistic X2 for
item pairs \(i\) and \(j\) is defined as
$$ \chi^2_{ij}=\sum_{k=0}^1 \sum_{l=0}^1 \frac{ (n_{ij,kl}-e_{ij,kl}) ^2 }{ e_{ij,kl} }$$
where \(n_{ij,kl}\) is the absolute frequency of \(\{ X_{i}=k,X_j=l\}\)
and \(e_{ij,kl}\) is the expected frequency using the estimated model.
Note that individual posterior distributions are evaluated when calculating
\(e_{ij,kl}\). The \(\chi^2_{ij}\) statistic is chi-square distributed with one
degree of freedom and can be used for testing whether items \(i\) and
\(j\) are locally dependent. To control for multiple comparisons,
p values are adjusted according to the Holm and FDR methods
(see stats::p.adjust).
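As a rough illustration of the formula above, the following sketch computes the statistic for a single item pair and adjusts a small set of p values. The two 2x2 tables and the extra p values are made-up numbers, and the object names (`n_obs`, `e_exp`, `p_all`) are hypothetical. In the function itself the expected frequencies come from the fitted model.

```r
# Hypothetical observed and expected 2x2 tables for one item pair (i, j).
# Rows index X_i = 0/1, columns index X_j = 0/1. The counts are invented.
n_obs <- matrix(c(40, 20,
                  15, 25), nrow = 2, byrow = TRUE)
e_exp <- matrix(c(35, 25,
                  20, 20), nrow = 2, byrow = TRUE)

# Pairwise chi-square statistic: sum over the four cells (k, l)
X2 <- sum((n_obs - e_exp)^2 / e_exp)

# p value from a chi-square distribution with one degree of freedom
X2_p <- 1 - stats::pchisq(X2, df = 1)

# Holm and FDR adjustments across several item pairs
# (the second and third p values are illustrative placeholders)
p_all <- c(X2_p, 0.10, 0.24)
p_holm <- stats::p.adjust(p_all, method = "holm")
p_fdr <- stats::p.adjust(p_all, method = "fdr")
```

Adjusted p values are never smaller than the unadjusted ones, so an item pair flagged after adjustment is stronger evidence of local dependence.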
The residual covariance RESIDCOV of item pairs \((i,j)\) is calculated as
$$ RESIDCOV_{ij}=
\frac{ n_{ij,11} n_{ij,00} - n_{ij,10} n_{ij,01} }{n^2 } -
\frac{ e_{ij,11} e_{ij,00} - e_{ij,10} e_{ij,01} }{n^2 } $$
where \(n\) is the total sample size. MADRESIDCOV is the average of the
absolute values of all RESIDCOV statistics.
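The residual covariance formula can be sketched for one item pair as follows. The tables are invented for illustration and the object names are hypothetical. The first term equals the covariance of two binary variables computed with divisor \(n\), which the last lines check.

```r
# Hypothetical observed and expected 2x2 tables for one item pair.
# Cell [k + 1, l + 1] holds the frequency of {X_i = k, X_j = l}.
n_obs <- matrix(c(40, 20,
                  15, 25), nrow = 2, byrow = TRUE)
e_exp <- matrix(c(35, 25,
                  20, 20), nrow = 2, byrow = TRUE)
n <- sum(n_obs)  # total sample size

# Covariance implied by a 2x2 table: (n11 * n00 - n10 * n01) / n^2
cov_2x2 <- function(tab, n) {
  (tab[2, 2] * tab[1, 1] - tab[2, 1] * tab[1, 2]) / n^2
}

cov_obs <- cov_2x2(n_obs, n)
cov_exp <- cov_2x2(e_exp, n)
residcov <- cov_obs - cov_exp

# Cross-check of the observed part: P(1,1) - P(X_i = 1) * P(X_j = 1)
p11 <- n_obs[2, 2] / n
cov_check <- p11 - (sum(n_obs[2, ]) / n) * (sum(n_obs[, 2]) / n)
```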
The statistic MADcor denotes the average absolute deviation between
observed correlations \(r_{ij}\) and model predicted correlations
\(\hat{r}_{ij}\) of item pairs \((i,j)\):
$$ MADcor=\frac{1}{ J(J-1)/2 } \sum_{i < j} | r_{ij} - \hat{r}_{ij} |$$
The SRMSR (standardized root mean squared residual, Maydeu-Olivares, 2013) is also based on comparing these correlations:
$$ SRMSR=\sqrt{ \frac{1}{ J(J-1)/2 } \sum_{i < j} ( r_{ij} - \hat{r}_{ij} )^2 } $$
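Both correlation-based statistics can be sketched from an observed and a model-predicted correlation matrix. The two 3x3 matrices below are invented for illustration, and `r_obs` and `r_hat` are hypothetical names rather than objects returned by the function.

```r
# Hypothetical observed and model-predicted item correlation matrices (J = 3)
r_obs <- matrix(c(1.00, 0.30, 0.20,
                  0.30, 1.00, 0.25,
                  0.20, 0.25, 1.00), nrow = 3, byrow = TRUE)
r_hat <- matrix(c(1.00, 0.28, 0.26,
                  0.28, 1.00, 0.20,
                  0.26, 0.20, 1.00), nrow = 3, byrow = TRUE)

# Use each item pair once: the J(J-1)/2 entries above the diagonal
idx <- upper.tri(r_obs)
d <- r_obs[idx] - r_hat[idx]

MADcor <- mean(abs(d))   # mean absolute deviation of correlations
SRMSR <- sqrt(mean(d^2)) # root of the mean squared deviation
```

Because the root mean square is never smaller than the mean absolute value, SRMSR is at least as large as MADcor and reacts more strongly to a few large deviations.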
For calculating MADQ3 and MADaQ3,
residuals \(\varepsilon_{ni}=X_{ni} - e_{ni}\) of
observed and expected responses for respondents \(n\) and items \(i\) are
constructed. Then, the average of the absolute values of pairwise correlations
of these residuals is computed for MADQ3. For MADaQ3, the average
of the absolute centered pairwise values (i.e. after subtracting the average Q3 statistic)
is calculated.
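A minimal sketch of the two Q3-based statistics, assuming a matrix of expected responses is available. Here the expected responses `e` are drawn at random purely for illustration, whereas the function derives them from the fitted model. All object names are hypothetical.

```r
set.seed(1)
N <- 200  # respondents
I <- 4    # items

# Hypothetical expected responses e_ni (success probabilities)
e <- matrix(stats::runif(N * I, 0.2, 0.8), nrow = N, ncol = I)
# Simulated dichotomous responses X_ni consistent with e
X <- matrix(stats::rbinom(N * I, size = 1, prob = e), nrow = N, ncol = I)

# Residuals of observed and expected responses
eps <- X - e

# Q3: pairwise correlations of the residuals, one value per item pair
Q3 <- stats::cor(eps)
q3 <- Q3[upper.tri(Q3)]

MADQ3 <- mean(abs(q3))              # mean absolute Q3
MADaQ3 <- mean(abs(q3 - mean(q3)))  # mean absolute centered Q3
```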
The difference of Fisher transformed correlations (Chen et al., 2013) is also computed and used for statistical inference.
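The idea behind the Fisher transformed difference can be sketched for one item pair. The standard error \(1/\sqrt{n-3}\) is the textbook approximation for a Fisher transformed correlation and is an assumption here. The function's exact computation is not spelled out in this text, and all values below are invented.

```r
# Hypothetical observed and model-predicted correlation for one item pair
r_obs <- 0.30
r_hat <- 0.20
n <- 150  # sample size

# Fisher z transformation is atanh(); take the absolute difference
fcor <- abs(atanh(r_obs) - atanh(r_hat))

# Assumed textbook standard error of a Fisher transformed correlation
se <- 1 / sqrt(n - 3)

# Two-sided p value from a normal approximation
z <- fcor / se
p <- 2 * (1 - stats::pnorm(z))
```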
For each of the fit statistics MADcor, MADacor, SRMSR, MX2,
100*MADRESIDCOV and MADQ3, smaller values
(values close to zero) indicate better fit.
Standard errors and confidence intervals of fit statistics are obtained by Jackknife estimation.
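The following sketch shows a generic delete-one-group jackknife for an arbitrary statistic. It is not taken from the package source: the bias-corrected estimate, the placeholder statistic `stat` and all object names are assumptions made for illustration. The interval of the form estimate plus or minus 1.96 standard errors matches the pattern of the `jk_est`, `jk_se`, `est_low` and `est_upp` columns in the example output.

```r
set.seed(42)
x <- stats::rnorm(100)  # placeholder data
G <- 10                 # number of jackknife units

# Assign each observation to one of G jackknife units
units <- rep(seq_len(G), length.out = length(x))

# Placeholder statistic standing in for a fit statistic
stat <- function(v) mean(abs(v))

est <- stat(x)  # estimate from the full sample

# Re-estimate the statistic with each unit left out in turn
est_g <- sapply(seq_len(G), function(g) stat(x[units != g]))

# Bias-corrected jackknife estimate (assumed form)
jk_est <- G * est - (G - 1) * mean(est_g)

# Jackknife standard error
jk_se <- sqrt((G - 1) / G * sum((est_g - mean(est_g))^2))

# Normal-theory 95% confidence interval
ci <- jk_est + c(-1, 1) * stats::qnorm(0.975) * jk_se
```

With only a few units the standard error itself is noisy, which is consistent with the recommendation of at least 20 jackknife units.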
A list with the following entries
Model fit statistics:
MADcor: mean of absolute deviations in observed and expected correlations (DiBello, Roussos & Stout, 2007)
SRMSR: standardized root mean squared residual (Maydeu-Olivares, 2013; Maydeu-Olivares & Joe, 2014)
MADRESIDCOV: mean of absolute deviations of residual covariances (McDonald & Mok, 1995)
MADQ3: mean of absolute values of \(Q_3\) statistic (Yen, 1984)
MADaQ3: mean of absolute values of centered \(Q_3\) statistic
Test of global absolute model fit using test
statistics of all item pairs. The statistic max(X2) is the
maximum of all \(\chi^2_{ij}\) statistics accompanied by a p value
obtained by the Holm procedure. A similar statistic abs(fcor)
is created as the absolute value of the deviations of Fisher
transformed correlations as used in Chen et al. (2013).
Fit of item pairs, which can be used for inspecting local
dependence. The \(\chi^2_{ij}\) statistic is denoted by X2
(Chen & Thissen, 1997); the statistic based on absolute
deviations of observed and predicted correlations \(r_{ij}\) is denoted by fcor
(Chen et al., 2013).
Chen, J., de la Torre, J., & Zhang, Z. (2013). Relative and absolute fit evaluation in cognitive diagnosis modeling. Journal of Educational Measurement, 50, 123-140.
Chen, W., & Thissen, D. (1997). Local dependence indexes for item pairs using item response theory. Journal of Educational and Behavioral Statistics, 22, 265-289.
DiBello, L. V., Roussos, L. A., & Stout, W. F. (2007). Review of cognitively diagnostic assessment and a summary of psychometric models. In C. R. Rao and S. Sinharay (Eds.), Handbook of Statistics, Vol. 26 (pp. 979--1030). Amsterdam: Elsevier.
Maydeu-Olivares, A. (2013). Goodness-of-fit assessment of item response theory models (with discussion). Measurement: Interdisciplinary Research and Perspectives, 11, 71-137.
Maydeu-Olivares, A., & Joe, H. (2014). Assessing approximate fit in categorical data analysis. Multivariate Behavioral Research, 49, 305-328.
McDonald, R. P., & Mok, M. M.-C. (1995). Goodness of fit in item response models. Multivariate Behavioral Research, 30, 23-40.
Yen, W. M. (1984). Effects of local item dependence on the fit and equating performance of the three-parameter logistic model. Applied Psychological Measurement, 8, 125-145.
The function does not handle sample weights properly.
The function modelfit.cor2 has the same functionality as
modelfit.cor but it is much faster because it is based on
Rcpp code.
if (FALSE) {
#############################################################################
# EXAMPLE 1: Model fit for sim.dina
#############################################################################
data(sim.dina, package="CDM")
data(sim.qmatrix, package="CDM")
dat <- sim.dina
q.matrix <- sim.qmatrix
#*** Model 1: DINA model for DINA simulated data
mod1 <- CDM::din(dat, q.matrix=q.matrix, rule="DINA" )
fmod1 <- CDM::modelfit.cor.din(mod1, jkunits=10)
summary(fmod1)
## Test of Global Model Fit
## type value p
## 1 max(X2) 8.728 0.113
## 2 abs(fcor) 0.143 0.080
##
## Fit Statistics
## est jkunits jk_est jk_se est_low est_upp
## MADcor 0.030 10 0.020 0.005 0.010 0.030
## SRMSR 0.040 10 0.023 0.006 0.011 0.035
## 100*MADRESIDCOV 0.671 10 0.445 0.125 0.200 0.690
## MADQ3 0.062 10 0.037 0.008 0.021 0.052
## MADaQ3 0.059 10 0.034 0.008 0.019 0.050
# look at first five item pairs with highest degree of local dependence
itempairs <- fmod1$itempairs
itempairs <- itempairs[ order( itempairs$X2, decreasing=TRUE ), ]
itempairs[ 1:5, c("item1","item2", "X2", "X2_p", "X2_p.holm", "Q3") ]
## item1 item2 X2 X2_p X2_p.holm Q3
## 29 Item5 Item8 8.728248 0.003133174 0.1127943 -0.26616414
## 32 Item6 Item8 2.644912 0.103881881 1.0000000 0.04873154
## 21 Item3 Item9 2.195011 0.138458201 1.0000000 0.05948456
## 10 Item2 Item4 1.449106 0.228671389 1.0000000 -0.08036216
## 30 Item5 Item9 1.393583 0.237800911 1.0000000 -0.01934420
#*** Model 2: DINO model for DINA simulated data
mod2 <- CDM::din(dat, q.matrix=q.matrix, rule="DINO" )
fmod2 <- CDM::modelfit.cor.din(mod2, jkunits=10 ) # 10 jackknife units
summary(fmod2)
## Test of Global Model Fit
## type value p
## 1 max(X2) 13.139 0.010
## 2 abs(fcor) 0.199 0.001
##
## Fit Statistics
## est jkunits jk_est jk_se est_low est_upp
## MADcor 0.056 10 0.041 0.007 0.026 0.055
## SRMSR 0.072 10 0.045 0.019 0.007 0.083
## 100*MADRESIDCOV 1.225 10 0.878 0.183 0.519 1.236
## MADQ3 0.073 10 0.055 0.012 0.031 0.080
## MADaQ3 0.073 10 0.066 0.012 0.042 0.089
#*** Model 3: estimate DINA model with gdina function
mod3 <- CDM::gdina( dat, q.matrix=q.matrix, rule="DINA" )
fmod3 <- CDM::modelfit.cor.din( mod3, jkunits=0 ) # no Jackknife estimation
summary(fmod3)
## Test of Global Model Fit
## type value p
## 1 max(X2) 8.756 0.111
## 2 abs(fcor) 0.143 0.078
##
## Fit Statistics
## est
## MADcor 0.030
## SRMSR 0.040
## MX2 0.719
## 100*MADRESIDCOV 0.668
## MADQ3 0.062
## MADaQ3 0.059
#############################################################################
# EXAMPLE 2: Simulated Example DINA model
#############################################################################
set.seed(9765)
# specify Q-matrix
Q <- matrix( c(1,0, 0,1, 1,1 ), nrow=3, ncol=2, byrow=TRUE )
q.matrix <- Q[ rep(1:3,4), ]
I <- nrow(q.matrix)
# simulate data
guess <- stats::runif(I, 0, .3 )
slip <- stats::runif( I, 0, .4 )
N <- 150 # number of persons
dat <- CDM::sim.din( N=N, q.matrix=q.matrix, slip=slip, guess=guess )$dat
#*** estimate DINA model
mod1 <- CDM::din( dat, q.matrix=q.matrix, rule="DINA" )
fmod1 <- CDM::modelfit.cor.din(mod1, jkunits=10)
summary(fmod1)
## Test of Global Model Fit
## type value p
## 1 max(X2) 10.697 0.071
## 2 abs(fcor) 0.277 0.026
##
## Fit Statistics
## est jkunits jk_est jk_se est_low est_upp
## MADcor 0.052 10 0.026 0.010 0.006 0.045
## SRMSR 0.074 10 0.048 0.013 0.022 0.074
## 100*MADRESIDCOV 1.259 10 0.646 0.213 0.228 1.063
## MADQ3 0.080 10 0.047 0.010 0.027 0.068
## MADaQ3 0.079 10 0.046 0.010 0.027 0.065
}