Several datasets for the CDM package

data(data.cdm01)
data(data.cdm02)
data(data.cdm03)
data(data.cdm04)
data(data.cdm05)
data(data.cdm06)
data(data.cdm07)
data(data.cdm08)
data(data.cdm09)
data(data.cdm10)

Format

  • Dataset data.cdm01:

    This is a multiple-choice dataset used with the mcdina function. The format is:

    List of 3
    $ data :'data.frame':
    ..$ I1 : int [1:5003] 3 3 4 1 1 1 1 1 1 1 ...
    ..$ I2 : int [1:5003] 1 1 3 1 1 2 1 1 2 1 ...
    ..$ I3 : int [1:5003] 4 3 2 3 2 2 2 2 1 2 ...
    ..$ I4 : int [1:5003] 3 3 3 2 2 2 2 3 3 1 ...
    ..$ I5 : int [1:5003] 2 2 2 3 1 1 2 3 2 1 ...
    ..$ I6 : int [1:5003] 3 1 1 1 1 2 1 1 1 1 ...
    ..$ I7 : int [1:5003] 1 1 2 2 1 3 1 1 1 3 ...
    ..$ I8 : int [1:5003] 1 1 1 1 1 2 1 4 3 3 ...
    ..$ I9 : int [1:5003] 3 2 1 1 1 1 3 3 1 3 ...
    ..$ I10: int [1:5003] 2 1 2 1 1 2 2 2 2 1 ...
    ..$ I11: int [1:5003] 2 2 2 2 1 2 1 2 1 1 ...
    ..$ I12: int [1:5003] 1 2 1 1 2 1 1 1 1 2 ...
    ..$ I13: int [1:5003] 2 1 1 1 2 1 2 2 1 1 ...
    ..$ I14: int [1:5003] 1 1 1 1 1 2 1 1 2 1 ...
    ..$ I15: int [1:5003] 1 2 1 1 1 1 1 1 1 1 ...
    ..$ I16: int [1:5003] 1 2 2 1 2 2 2 1 1 1 ...
    ..$ I17: int [1:5003] 1 1 1 1 1 1 1 1 1 1 ...
    $ group : int [1:5003] 1 1 1 1 1 1 1 1 1 1 ...
    $ q.matrix:'data.frame':
    ..$ item : int [1:52] 1 1 1 1 2 2 2 2 3 3 ...
    ..$ categ: int [1:52] 1 2 3 4 1 2 3 4 1 2 ...
    ..$ A1 : int [1:52] 0 1 0 1 0 1 1 1 0 0 ...
    ..$ A2 : int [1:52] 0 0 1 1 0 0 0 1 0 0 ...
    ..$ A3 : int [1:52] 0 0 0 0 0 0 0 0 0 0 ...
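
    Each row of q.matrix refers to one response category (option) of one item: the columns item and categ identify the option, and A1 to A3 give the attribute pattern coded for it. The base-R sketch below is only an illustration. It rebuilds the first eight rows shown by str() above (items 1 and 2) as a toy excerpt, not the full Q-matrix, and reshapes them per item.

    ```r
    # Toy excerpt: first eight rows of data.cdm01$q.matrix as printed by str()
    # (items 1 and 2, categories 1 to 4); not the full 52-row Q-matrix
    qm <- data.frame(
      item  = c(1, 1, 1, 1, 2, 2, 2, 2),
      categ = c(1, 2, 3, 4, 1, 2, 3, 4),
      A1    = c(0, 1, 0, 1, 0, 1, 1, 1),
      A2    = c(0, 0, 1, 1, 0, 0, 0, 1),
      A3    = c(0, 0, 0, 0, 0, 0, 0, 0)
    )
    attr_cols <- c("A1", "A2", "A3")

    # Number of response categories listed per item
    n_categ <- table(qm$item)

    # Number of attributes in the pattern coded for each item-category row
    qm$n_attr <- rowSums(qm[, attr_cols])

    # One small attribute matrix per item (rows = categories)
    q_by_item <- split(qm[, c("categ", attr_cols)], qm$item)
    print(q_by_item[["1"]])
    ```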

  • Dataset data.cdm02:

    Multiple-choice dataset with a Q-matrix designed for polytomous attributes.

    List of 2
    $ data :'data.frame':
    ..$ I1 : int [1:3000] 3 3 4 1 1 1 1 1 1 1 ...
    ..$ I2 : int [1:3000] 1 1 3 1 1 2 1 1 2 1 ...
    ..$ I3 : int [1:3000] 4 3 2 3 2 2 2 2 1 2 ...
    [...]
    ..$ B17: num [1:3000] 1 1 1 1 1 1 1 1 1 1 ...
    ..$ B18: num [1:3000] 1 1 1 1 2 2 2 2 2 2 ...
    $ q.matrix:'data.frame':
    ..$ item : int [1:100] 1 1 1 1 2 2 2 2 3 3 ...
    ..$ categ: int [1:100] 1 2 3 4 1 2 3 4 1 2 ...
    ..$ A1 : num [1:100] 0 1 0 1 0 1 1 1 0 0 ...
    ..$ A2 : num [1:100] 0 0 1 1 0 0 0 1 0 0 ...
    ..$ A3 : num [1:100] 0 0 0 0 0 0 0 0 0 0 ...
    ..$ B1 : num [1:100] 0 0 0 0 0 0 0 0 0 0 ...

  • Dataset data.cdm03:

    This is a resimulated dataset from Chiu, Koehn and Wu (2016), in which the data-generating model is the reduced RUM. See Example 1.

    List of 2
    $ data : num [1:725, 1:16] 0 1 1 1 1 1 1 1 1 1 ...
    ..- attr(*, "dimnames")=List of 2
    .. ..$ : NULL
    .. ..$ : chr [1:16] "I01" "I02" "I03" "I04" ...
    $ qmatrix:'data.frame': 16 obs. of 6 variables:
    ..$ item: Factor w/ 16 levels "I01","I02","I03",..: 1 2 3 4 5 6 7 8 9 10 ...
    ..$ A1 : int [1:16] 1 0 0 0 0 0 0 0 1 1 ...
    ..$ A2 : int [1:16] 0 1 0 0 1 1 0 0 0 0 ...
    ..$ A3 : int [1:16] 0 0 1 1 1 1 0 0 0 0 ...
    ..$ A4 : int [1:16] 0 0 0 0 0 0 1 1 1 1 ...
    ..$ A5 : int [1:16] 0 0 0 0 0 0 0 0 0 0 ...
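
    The reduced RUM is commonly written as P(X_j = 1 | alpha) = pi*_j * prod_k r*_jk^(q_jk * (1 - alpha_k)): every attribute that item j requires but the examinee has not mastered multiplies in a penalty r*_jk between 0 and 1. The sketch below evaluates this formula in base R for the Q-matrix row of item I09 shown above (A1 and A4 required). The function name rrum_prob() and the values of pi* and r* are hypothetical; they are not the generating parameters of data.cdm03.

    ```r
    # Reduced RUM item response function for a single item
    # alpha:   0/1 mastery vector of one examinee
    # q:       0/1 Q-matrix row of the item
    # pi_star: success probability when all required attributes are mastered
    # r_star:  penalty (0 < r < 1) for each required but non-mastered attribute
    rrum_prob <- function(alpha, q, pi_star, r_star) {
      pi_star * prod(r_star^(q * (1 - alpha)))
    }

    q_I09   <- c(A1 = 1, A2 = 0, A3 = 0, A4 = 1, A5 = 0)  # row of item I09
    r_star  <- c(0.5, 0.7, 0.7, 0.4, 0.6)                  # hypothetical values
    pi_star <- 0.9                                         # hypothetical value

    p_all  <- rrum_prob(c(1, 1, 1, 1, 1), q_I09, pi_star, r_star)  # all mastered
    p_A1   <- rrum_prob(c(1, 0, 0, 0, 0), q_I09, pi_star, r_star)  # A4 missing
    p_none <- rrum_prob(c(0, 0, 0, 0, 0), q_I09, pi_star, r_star)  # A1, A4 missing
    print(c(p_all, p_A1, p_none))
    ```

    Missing A2, A3 or A5 leaves the probability unchanged for this item because its Q-matrix entries are zero there.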

  • Dataset data.cdm04:

    Simulated dataset for the sequential DINA model (as described in Ma & de la Torre, 2016). The dataset contains 1000 persons and 12 items which measure 2 skills.

    List of 3
    $ data : num [1:1000, 1:12] 0 0 0 1 1 0 0 0 0 0 ...
    ..- attr(*, "dimnames")=List of 2
    .. ..$ : NULL
    .. ..$ : chr [1:12] "I1" "I2" "I3" "I4" ...
    $ q.matrix1:'data.frame': 18 obs. of 4 variables:
    ..$ Item: chr [1:18] "I1" "I2" "I3" "I4" ...
    ..$ Cat : int [1:18] 1 1 1 1 1 1 1 2 1 2 ...
    ..$ A1 : int [1:18] 1 1 1 0 0 0 1 1 1 1 ...
    ..$ A2 : int [1:18] 0 0 0 1 1 1 0 0 0 0 ...
    $ q.matrix2:'data.frame': 18 obs. of 4 variables:
    ..$ Item: chr [1:18] "I1" "I2" "I3" "I4" ...
    ..$ Cat : int [1:18] 1 1 1 1 1 1 1 2 1 2 ...
    ..$ A1 : num [1:18] 1 1 1 0 0 0 1 1 1 1 ...
    ..$ A2 : num [1:18] 0 0 0 1 1 1 0 0 0 0 ...
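
    In q.matrix1 and q.matrix2, an item with several score categories occupies one row per nonzero category (column Cat), so the 12 items yield 18 rows. In the sequential model of Ma and de la Torre (2016), each such row defines a processing step, and the probability of score h is the probability of passing steps 1 to h times the probability of failing step h + 1. A minimal base-R sketch of this composition, assuming DINA-type steps with hypothetical guessing and slipping values; dina_step_prob() and seq_category_probs() are illustrative helpers, not functions of the CDM package:

    ```r
    # DINA-type step: succeed with probability 1 - slip if all attributes
    # required by this step are mastered, otherwise with probability guess
    dina_step_prob <- function(alpha, q, guess, slip) {
      if (all(alpha[q == 1] == 1)) 1 - slip else guess
    }

    # Sequential model: P(X = h) = S_1 * ... * S_h * (1 - S_{h+1}),
    # with an empty product for h = 0 and S_{H+1} = 0 for the top score
    seq_category_probs <- function(step_probs) {
      H <- length(step_probs)
      reach     <- c(1, cumprod(step_probs))  # P(passing steps 1..h), h = 0..H
      fail_next <- c(1 - step_probs, 1)       # P(failing step h + 1)
      probs <- reach * fail_next
      names(probs) <- paste0("X=", 0:H)
      probs
    }

    # Hypothetical two-step item: step 1 requires A1, step 2 requires A2
    q_steps <- rbind(c(1, 0), c(0, 1))
    alpha   <- c(1, 0)  # examinee masters A1 only
    steps <- apply(q_steps, 1, function(q)
      dina_step_prob(alpha, q, guess = 0.2, slip = 0.1))
    p <- seq_category_probs(steps)
    print(p)
    ```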

  • Dataset data.cdm05:

    Example dataset used in Philipp, Strobl, de la Torre and Zeileis (2018). It is a subset of the probability dataset in the pks package (Heller & Wickelmaier, 2013).

    List of 3
    $ data :'data.frame': 504 obs. of 12 variables:
    ..$ b101: num [1:504] 1 1 1 1 1 1 1 1 1 1 ...
    ..$ b102: num [1:504] 1 1 1 1 1 1 1 1 1 1 ...
    ..$ b103: num [1:504] 1 1 1 1 1 1 1 1 1 1 ...
    ..$ b104: num [1:504] 1 1 1 1 0 1 0 0 0 1 ...
    ..$ b105: num [1:504] 1 0 1 1 1 1 0 1 1 1 ...
    ..$ b106: num [1:504] 1 1 1 1 1 1 1 1 1 1 ...
    ..$ b107: num [1:504] 1 1 1 1 1 1 1 1 1 1 ...
    ..$ b108: num [1:504] 1 1 1 1 1 1 0 1 1 1 ...
    ..$ b109: num [1:504] 1 1 0 1 1 0 0 1 1 0 ...
    ..$ b110: num [1:504] 0 0 0 1 0 0 0 0 0 1 ...
    ..$ b111: num [1:504] 0 1 0 0 0 1 0 0 0 0 ...
    ..$ b112: num [1:504] 1 1 0 1 0 1 0 1 0 0 ...
    $ q.matrix:'data.frame': 12 obs. of 4 variables:
    ..$ pb: num [1:12] 1 0 0 0 1 1 1 1 1 0 ...
    ..$ cp: num [1:12] 0 1 0 0 1 1 0 0 0 1 ...
    ..$ un: num [1:12] 0 0 1 0 0 0 1 1 0 0 ...
    ..$ id: num [1:12] 0 0 0 1 0 0 0 0 1 1 ...
    $ skills : Named chr [1:4] "how to calculate the classic probability "
    ..- attr(*, "names")=chr [1:4] "pb" "cp" "un" "id"

  • Dataset data.cdm06:

    Resimulated example dataset from Chen and Chen (2017).

    List of 3
    $ data :'data.frame': 2733 obs. of 15 variables:
    ..$ I01: num [1:2733] 1 0 0 1 0 0 0 1 1 1 ...
    ..$ I02: num [1:2733] 1 0 0 1 1 0 1 0 0 1 ...
    ..$ I03: num [1:2733] 0 0 0 1 1 0 1 0 1 0 ...
    ..$ I04: num [1:2733] 1 1 0 0 0 0 1 1 1 0 ...
    ..$ I05: num [1:2733] 1 0 1 1 0 1 1 1 1 1 ...
    ..$ I06: num [1:2733] 0 0 0 1 1 0 0 0 1 1 ...
    ..$ I07: num [1:2733] 1 1 1 0 0 1 1 0 1 1 ...
    ..$ I08: num [1:2733] 0 0 0 0 0 0 0 0 1 1 ...
    ..$ I09: num [1:2733] 1 0 0 1 1 1 0 1 0 1 ...
    ..$ I10: num [1:2733] 0 0 0 1 0 1 1 0 1 1 ...
    ..$ I11: num [1:2733] 0 1 0 1 1 1 1 0 1 1 ...
    ..$ I12: num [1:2733] 0 1 0 1 0 0 0 1 1 1 ...
    ..$ I13: num [1:2733] 0 0 1 1 0 1 0 0 0 1 ...
    ..$ I14: num [1:2733] 0 0 0 1 1 0 1 1 0 0 ...
    ..$ I15: num [1:2733] 0 0 0 1 0 0 1 0 1 1 ...
    $ q.matrix:'data.frame': 15 obs. of 5 variables:
    ..$ RI: num [1:15] 1 1 1 0 1 1 1 1 0 0 ...
    ..$ JS: num [1:15] 1 0 0 1 0 0 0 0 0 1 ...
    ..$ GI: num [1:15] 0 1 0 1 0 0 1 1 1 1 ...
    ..$ II: num [1:15] 0 1 1 0 1 0 1 0 0 0 ...
    ..$ MI: num [1:15] 0 0 1 0 0 0 0 0 1 0 ...
    $ skills : chr [1:5, 1:2] "Retrieving explicit information " ...
    ..- attr(*, "dimnames")=List of 2
    .. ..$ : chr [1:5] "RI" "JS" "GI" "II" ...
    .. ..$ : chr [1:2] "skill" "description"

  • Dataset data.cdm07:

    This is a resimulated version of the social anxiety disorder data on social phobia, which consist of 13 dichotomous questions (Fang, Liu & Ying, 2017). The simulation was based on a latent class model with five classes. The dataset was also used in Chen, Li, Liu and Ying (2017).

    List of 3
    $ data : num [1:863, 1:13] 1 0 1 1 1 1 1 1 1 1 ...
    ..- attr(*, "dimnames")=List of 2
    .. ..$ : NULL
    .. ..$ : chr [1:13] "I1" "I2" "I3" "I4" ...
    $ q.matrix: num [1:13, 1:3] 1 1 1 1 0 0 0 0 0 0 ...
    ..- attr(*, "dimnames")=List of 2
    .. ..$ : chr [1:13] "I1" "I2" "I3" "I4" ...
    .. ..$ : chr [1:3] "A1" "A2" "A3"
    $ items : atomic [1:13] 1 speaking in front of other people? ...
    ..- attr(*, "stem")=chr "Have you ever had a strong fear or avoidance of ..."

  • Dataset data.cdm08:

    This is a simulated dataset for the model for simultaneously identifying skills and misconceptions (SISM; Kuo, Chen & de la Torre, 2018), involving four skills and three misconceptions. The Q-matrix follows the specification in their simulation study.

    List of 2
    $ data :'data.frame': 1300 obs. of 20 variables:
    ..$ I01: num [1:1300] 1 0 0 1 1 1 1 1 1 1 ...
    ..$ I02: num [1:1300] 0 0 0 0 1 1 1 1 1 1 ...
    ..$ I03: num [1:1300] 0 0 0 0 1 1 1 1 1 1 ...
    ..$ I04: num [1:1300] 1 1 0 1 0 1 1 0 1 1 ...
    ..$ I05: num [1:1300] 1 1 1 0 1 1 0 1 1 1 ...
    [...]
    ..$ I18: num [1:1300] 0 1 0 0 0 0 0 0 0 1 ...
    ..$ I19: num [1:1300] 1 1 0 0 0 0 0 1 1 1 ...
    ..$ I20: num [1:1300] 1 1 0 0 0 1 0 1 0 1 ...
    $ q.matrix:'data.frame': 20 obs. of 7 variables:
    ..$ S1: num [1:20] 1 0 0 0 0 0 0 1 0 0 ...
    ..$ S2: num [1:20] 0 1 0 0 0 0 0 0 1 0 ...
    ..$ S3: num [1:20] 0 0 1 0 0 0 0 0 0 1 ...
    ..$ S4: num [1:20] 0 0 0 1 0 0 0 0 0 0 ...
    ..$ B1: num [1:20] 0 0 0 0 1 0 0 1 1 0 ...
    ..$ B2: num [1:20] 0 0 0 0 0 1 0 0 0 0 ...
    ..$ B3: num [1:20] 0 0 0 0 0 0 1 0 0 1 ...
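
    The Q-matrix stores skill columns (S1 to S4) and misconception columns (B1 to B3) side by side. As a purely descriptive base-R sketch, the first ten rows shown by str() above can be split by column prefix to see which items measure only skills, only misconceptions, or both. This does not fit the SISM model; it only inspects the Q-matrix layout.

    ```r
    # First ten rows (items I01 to I10) of data.cdm08$q.matrix as shown by str()
    qm <- data.frame(
      S1 = c(1, 0, 0, 0, 0, 0, 0, 1, 0, 0),
      S2 = c(0, 1, 0, 0, 0, 0, 0, 0, 1, 0),
      S3 = c(0, 0, 1, 0, 0, 0, 0, 0, 0, 1),
      S4 = c(0, 0, 0, 1, 0, 0, 0, 0, 0, 0),
      B1 = c(0, 0, 0, 0, 1, 0, 0, 1, 1, 0),
      B2 = c(0, 0, 0, 0, 0, 1, 0, 0, 0, 0),
      B3 = c(0, 0, 0, 0, 0, 0, 1, 0, 0, 1),
      row.names = sprintf("I%02d", 1:10)
    )

    # Split columns by prefix: S = skills, B = misconceptions (bugs)
    skill_cols <- grep("^S", names(qm), value = TRUE)
    bug_cols   <- grep("^B", names(qm), value = TRUE)

    has_skill <- rowSums(qm[, skill_cols]) > 0
    has_bug   <- rowSums(qm[, bug_cols]) > 0

    # Classify each item by what it measures
    item_type <- ifelse(has_skill & has_bug, "skills and misconceptions",
                 ifelse(has_skill, "skills only", "misconceptions only"))
    print(table(item_type))
    ```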

  • Dataset data.cdm09:

    This is a simulated dataset with polytomous skills, adapted from the empirical example (proportional reasoning data) of Chen and de la Torre (2013).

    List of 2
    $ data : num [1:500, 1:15] 1 0 1 1 0 1 1 1 1 1 ...
    ..- attr(*, "dimnames")=List of 2
    .. ..$ : NULL
    .. ..$ : chr [1:15] "I1" "I2" "I3" "I4" ...
    $ q.matrix:'data.frame': 15 obs. of 4 variables:
    ..$ A1: int [1:15] 0 0 0 0 2 0 0 2 1 1 ...
    ..$ A2: int [1:15] 1 0 2 0 0 1 2 0 1 1 ...
    ..$ A3: int [1:15] 0 0 0 1 0 0 0 0 0 0 ...
    ..$ A4: int [1:15] 0 1 1 0 0 0 0 0 0 0 ...
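
    Entries of 2 in this Q-matrix indicate that an item requires a higher level of an attribute. Assuming, as in the polytomous-attribute formulation of Chen and de la Torre (2013), that a positive entry q_jk gives the minimum level of attribute k needed for item j, a polytomous skill profile reduces to a binary mastery pattern per item. The sketch uses the rows of items 1, 5 and 9 shown by str() above and a hypothetical skill profile.

    ```r
    # Rows of data.cdm09$q.matrix for items 1, 5 and 9 as shown by str()
    Q <- rbind(
      I1 = c(A1 = 0, A2 = 1, A3 = 0, A4 = 0),
      I5 = c(A1 = 2, A2 = 0, A3 = 0, A4 = 0),
      I9 = c(A1 = 1, A2 = 1, A3 = 0, A4 = 0)
    )

    # Hypothetical skill profile: attribute levels of one examinee
    alpha <- c(A1 = 1, A2 = 2, A3 = 0, A4 = 1)

    # Attribute k counts as mastered for item j if alpha_k >= q_jk
    # (zero entries are always met because levels are nonnegative)
    meets <- t(apply(Q, 1, function(q) alpha >= q))

    # DINA-type ideal response: every required level is reached
    ideal <- apply(meets, 1, all)
    print(ideal)
    ```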

  • Dataset data.cdm10:

    This is a simulated dataset with a hierarchical skill structure: skill A has four levels, skill B has two levels and skill C has three levels.

    List of 2
    $ data : num [1:1500, 1:15] 1 1 0 0 0 1 1 0 0 1 ...
    ..- attr(*, "dimnames")=List of 2
    .. ..$ : NULL
    .. ..$ : chr [1:15] "I1" "I2" "I3" "I4" ...
    $ q.matrix: num [1:15, 1:6] 1 1 1 1 1 1 0 0 0 0 ...
    ..- attr(*, "dimnames")=List of 2
    .. ..$ : chr [1:15] "I1" "I2" "I3" "I4" ...
    .. ..$ : chr [1:6] "A1" "A2" "A3" "B1" ...
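
    A skill with several ordered levels can be represented by binary attributes linked by a linear hierarchy. Assuming (as the column names suggest, although the listing above does not state it) that the four-level skill A is coded by A1, A2 and A3, with A3 requiring A2 and A2 requiring A1, only four of the 2^3 binary patterns are permissible, one per level. A base-R sketch of that assumption:

    ```r
    # All 2^3 patterns of the three binary attributes A1, A2, A3
    patterns <- expand.grid(A1 = 0:1, A2 = 0:1, A3 = 0:1)

    # Assumed linear hierarchy: A2 requires A1, A3 requires A2
    permissible <- with(patterns, A1 >= A2 & A2 >= A3)
    levels_A <- patterns[permissible, ]

    # Under this coding, the level of skill A is the number of mastered attributes
    levels_A$level <- rowSums(levels_A)
    print(levels_A)
    ```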

References

Chen, H., & Chen, J. (2017). Cognitive diagnostic research on Chinese students' English listening skills and implications on skill training. English Language Teaching, 10(12), 107-115. http://dx.doi.org/10.5539/elt.v10n12p107

Chen, J., & de la Torre, J. (2013). A general cognitive diagnosis model for expert-defined polytomous attributes. Applied Psychological Measurement, 37, 419-437. http://dx.doi.org/10.1177/0146621613479818

Chen, Y., Li, X., Liu, J., & Ying, Z. (2017). Regularized latent class analysis with application in cognitive diagnosis. Psychometrika, 82, 660-692. http://dx.doi.org/10.1007/s11336-016-9545-6

Chiu, C.-Y., Koehn, H.-F., & Wu, H.-M. (2016). Fitting the reduced RUM with Mplus: A tutorial. International Journal of Testing, 16(4), 331-351. http://dx.doi.org/10.1080/15305058.2016.1148038

Fang, G., Liu, J., & Ying, Z. (2017). On the identifiability of diagnostic classification models. arXiv, 1706.01240. https://arxiv.org/abs/1706.01240

Heller, J., & Wickelmaier, F. (2013). Minimum discrepancy estimation in probabilistic knowledge structures. Electronic Notes in Discrete Mathematics, 42, 49-56. http://dx.doi.org/10.1016/j.endm.2013.05.145

Kuo, B.-C., Chen, C.-H., & de la Torre, J. (2018). A cognitive diagnosis model for identifying coexisting skills and misconceptions. Applied Psychological Measurement, 42(3), 179-191. http://dx.doi.org/10.1177/0146621617722791

Ma, W., & de la Torre, J. (2016). A sequential cognitive diagnosis model for polytomous responses. British Journal of Mathematical and Statistical Psychology, 69(3), 253-275. https://doi.org/10.1111/bmsp.12070

Philipp, M., Strobl, C., de la Torre, J., & Zeileis, A. (2018). On the estimation of standard errors in cognitive diagnosis models. Journal of Educational and Behavioral Statistics, 43(1), 88-115. http://dx.doi.org/10.3102/1076998617719728

Examples

if (FALSE) {
#############################################################################
# EXAMPLE 1: Reduced RUM model, Chiu et al. (2016)
#############################################################################

data(data.cdm03, package="CDM")
dat <- data.cdm03$data
qmatrix <- data.cdm03$qmatrix

#*** Model 1: Reduced RUM
mod1 <- CDM::gdina( dat, q.matrix=qmatrix[,-1], rule="RRUM" )
summary(mod1)

#*** Model 2: Additive model with identity link function
mod2 <- CDM::gdina( dat, q.matrix=qmatrix[,-1], rule="ACDM" )
summary(mod2)

#*** Model 3: Additive model with logit link function
mod3 <- CDM::gdina( dat, q.matrix=qmatrix[,-1], rule="ACDM", linkfct="logit")
summary(mod3)

#############################################################################
# EXAMPLE 2: GDINA model - probability dataset from the pks package
#############################################################################

data(data.cdm05, package="CDM")
dat <- data.cdm05$data
Q <- data.cdm05$q.matrix

#* estimate model
mod1 <- CDM::gdina( dat, q.matrix=Q )
summary(mod1)
}