Several Datasets for the CDM Package

Several datasets for the CDM package

data(data.cdm01)
data(data.cdm02)
data(data.cdm03)
data(data.cdm04)
data(data.cdm05)
data(data.cdm06)
data(data.cdm07)
data(data.cdm08)
data(data.cdm09)
data(data.cdm10)
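Each call makes a list object available. As a quick orientation, the sketch below loads all ten datasets into a scratch environment and collects the component names of each list. It assumes only that the CDM package is installed; the names `dataset_names`, `env` and `components` are illustrative and not part of the package.

```r
# Load all ten datasets into a scratch environment (keeps the workspace clean)
dataset_names <- sprintf("data.cdm%02d", 1:10)
env <- new.env()
data(list = dataset_names, package = "CDM", envir = env)

# Collect the names of the list components of every dataset
components <- lapply(mget(dataset_names, envir = env), names)
print(components)
```

Note that the Q-matrix component is not named uniformly: it is `qmatrix` in data.cdm03, `q.matrix1` and `q.matrix2` in data.cdm04, and `q.matrix` in the other datasets.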

Format

Dataset data.cdm01:

This is a multiple-choice dataset intended for use with the mcdina function. The format is:

List of 3
$ data :'data.frame':
..$ I1 : int [1:5003] 3 3 4 1 1 1 1 1 1 1 ...
..$ I2 : int [1:5003] 1 1 3 1 1 2 1 1 2 1 ...
..$ I3 : int [1:5003] 4 3 2 3 2 2 2 2 1 2 ...
..$ I4 : int [1:5003] 3 3 3 2 2 2 2 3 3 1 ...
..$ I5 : int [1:5003] 2 2 2 3 1 1 2 3 2 1 ...
..$ I6 : int [1:5003] 3 1 1 1 1 2 1 1 1 1 ...
..$ I7 : int [1:5003] 1 1 2 2 1 3 1 1 1 3 ...
..$ I8 : int [1:5003] 1 1 1 1 1 2 1 4 3 3 ...
..$ I9 : int [1:5003] 3 2 1 1 1 1 3 3 1 3 ...
..$ I10: int [1:5003] 2 1 2 1 1 2 2 2 2 1 ...
..$ I11: int [1:5003] 2 2 2 2 1 2 1 2 1 1 ...
..$ I12: int [1:5003] 1 2 1 1 2 1 1 1 1 2 ...
..$ I13: int [1:5003] 2 1 1 1 2 1 2 2 1 1 ...
..$ I14: int [1:5003] 1 1 1 1 1 2 1 1 2 1 ...
..$ I15: int [1:5003] 1 2 1 1 1 1 1 1 1 1 ...
..$ I16: int [1:5003] 1 2 2 1 2 2 2 1 1 1 ...
..$ I17: int [1:5003] 1 1 1 1 1 1 1 1 1 1 ...
$ group : int [1:5003] 1 1 1 1 1 1 1 1 1 1 ...
$ q.matrix:'data.frame':
..$ item : int [1:52] 1 1 1 1 2 2 2 2 3 3 ...
..$ categ: int [1:52] 1 2 3 4 1 2 3 4 1 2 ...
..$ A1 : int [1:52] 0 1 0 1 0 1 1 1 0 0 ...
..$ A2 : int [1:52] 0 0 1 1 0 0 0 1 0 0 ...
..$ A3 : int [1:52] 0 0 0 0 0 0 0 0 0 0 ...
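A minimal sketch of how this list might be passed to mcdina, assuming the CDM package is installed. The Q-matrix is in long format, with one row per item-category combination (columns `item` and `categ`) followed by the attribute columns. The object names `dat`, `Q` and `mod` are illustrative.

```r
data(data.cdm01, package = "CDM")
dat <- data.cdm01$data          # 5003 persons x 17 multiple-choice items
Q   <- data.cdm01$q.matrix      # long format: item, categ, attribute columns

# Number of response categories listed in the Q-matrix for each item
print(table(Q$item))

# Fit the multiple-choice DINA model; data.cdm01$group could be supplied
# through the 'group' argument for a multiple-group analysis
mod <- CDM::mcdina(data = dat, q.matrix = Q)
summary(mod)
```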

Dataset data.cdm02:

Multiple choice dataset with a Q-matrix designed for polytomous attributes.

List of 2
$ data :'data.frame':
..$ I1 : int [1:3000] 3 3 4 1 1 1 1 1 1 1 ...
..$ I2 : int [1:3000] 1 1 3 1 1 2 1 1 2 1 ...
..$ I3 : int [1:3000] 4 3 2 3 2 2 2 2 1 2 ...
[...]
..$ B17: num [1:3000] 1 1 1 1 1 1 1 1 1 1 ...
..$ B18: num [1:3000] 1 1 1 1 2 2 2 2 2 2 ...
$ q.matrix:'data.frame':
..$ item : int [1:100] 1 1 1 1 2 2 2 2 3 3 ...
..$ categ: int [1:100] 1 2 3 4 1 2 3 4 1 2 ...
..$ A1 : num [1:100] 0 1 0 1 0 1 1 1 0 0 ...
..$ A2 : num [1:100] 0 0 1 1 0 0 0 1 0 0 ...
..$ A3 : num [1:100] 0 0 0 0 0 0 0 0 0 0 ...
..$ B1 : num [1:100] 0 0 0 0 0 0 0 0 0 0 ...

Dataset data.cdm03:

This dataset is resimulated from Chiu, Koehn and Wu (2016), where the data-generating model is the reduced RUM (reduced reparameterized unified model). See Example 1.

List of 2
$ data : num [1:725, 1:16] 0 1 1 1 1 1 1 1 1 1 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : NULL
.. ..$ : chr [1:16] "I01" "I02" "I03" "I04" ...
$ qmatrix:'data.frame': 16 obs. of 6 variables:
..$ item: Factor w/ 16 levels "I01","I02","I03",..: 1 2 3 4 5 6 7 8 9 10 ...
..$ A1 : int [1:16] 1 0 0 0 0 0 0 0 1 1 ...
..$ A2 : int [1:16] 0 1 0 0 1 1 0 0 0 0 ...
..$ A3 : int [1:16] 0 0 1 1 1 1 0 0 0 0 ...
..$ A4 : int [1:16] 0 0 0 0 0 0 1 1 1 1 ...
..$ A5 : int [1:16] 0 0 0 0 0 0 0 0 0 0 ...

Dataset data.cdm04:

Simulated dataset for the sequential DINA model (as described in Ma & de la Torre, 2016). The dataset contains 1000 persons and 12 items which measure 2 skills.

List of 3
$ data : num [1:1000, 1:12] 0 0 0 1 1 0 0 0 0 0 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : NULL
.. ..$ : chr [1:12] "I1" "I2" "I3" "I4" ...
$ q.matrix1:'data.frame': 18 obs. of 4 variables:
..$ Item: chr [1:18] "I1" "I2" "I3" "I4" ...
..$ Cat : int [1:18] 1 1 1 1 1 1 1 2 1 2 ...
..$ A1 : int [1:18] 1 1 1 0 0 0 1 1 1 1 ...
..$ A2 : int [1:18] 0 0 0 1 1 1 0 0 0 0 ...
$ q.matrix2:'data.frame': 18 obs. of 4 variables:
..$ Item: chr [1:18] "I1" "I2" "I3" "I4" ...
..$ Cat : int [1:18] 1 1 1 1 1 1 1 2 1 2 ...
..$ A1 : num [1:18] 1 1 1 0 0 0 1 1 1 1 ...
..$ A2 : num [1:18] 0 0 0 1 1 1 0 0 0 0 ...

Dataset data.cdm05:

Example dataset used in Philipp, Strobl, de la Torre and Zeileis (2018). This dataset is a sub-dataset of the probability dataset in the pks package (Heller & Wickelmaier, 2013).

List of 3
$ data :'data.frame': 504 obs. of 12 variables:
..$ b101: num [1:504] 1 1 1 1 1 1 1 1 1 1 ...
..$ b102: num [1:504] 1 1 1 1 1 1 1 1 1 1 ...
..$ b103: num [1:504] 1 1 1 1 1 1 1 1 1 1 ...
..$ b104: num [1:504] 1 1 1 1 0 1 0 0 0 1 ...
..$ b105: num [1:504] 1 0 1 1 1 1 0 1 1 1 ...
..$ b106: num [1:504] 1 1 1 1 1 1 1 1 1 1 ...
..$ b107: num [1:504] 1 1 1 1 1 1 1 1 1 1 ...
..$ b108: num [1:504] 1 1 1 1 1 1 0 1 1 1 ...
..$ b109: num [1:504] 1 1 0 1 1 0 0 1 1 0 ...
..$ b110: num [1:504] 0 0 0 1 0 0 0 0 0 1 ...
..$ b111: num [1:504] 0 1 0 0 0 1 0 0 0 0 ...
..$ b112: num [1:504] 1 1 0 1 0 1 0 1 0 0 ...
$ q.matrix:'data.frame': 12 obs. of 4 variables:
..$ pb: num [1:12] 1 0 0 0 1 1 1 1 1 0 ...
..$ cp: num [1:12] 0 1 0 0 1 1 0 0 0 1 ...
..$ un: num [1:12] 0 0 1 0 0 0 1 1 0 0 ...
..$ id: num [1:12] 0 0 0 1 0 0 0 0 1 1 ...
$ skills : Named chr [1:4] "how to calculate the classic probability "
..- attr(*, "names")=chr [1:4] "pb" "cp" "un" "id"

Dataset data.cdm06:

Resimulated example dataset from Chen and Chen (2017).

List of 3
$ data :'data.frame': 2733 obs. of 15 variables:
..$ I01: num [1:2733] 1 0 0 1 0 0 0 1 1 1 ...
..$ I02: num [1:2733] 1 0 0 1 1 0 1 0 0 1 ...
..$ I03: num [1:2733] 0 0 0 1 1 0 1 0 1 0 ...
..$ I04: num [1:2733] 1 1 0 0 0 0 1 1 1 0 ...
..$ I05: num [1:2733] 1 0 1 1 0 1 1 1 1 1 ...
..$ I06: num [1:2733] 0 0 0 1 1 0 0 0 1 1 ...
..$ I07: num [1:2733] 1 1 1 0 0 1 1 0 1 1 ...
..$ I08: num [1:2733] 0 0 0 0 0 0 0 0 1 1 ...
..$ I09: num [1:2733] 1 0 0 1 1 1 0 1 0 1 ...
..$ I10: num [1:2733] 0 0 0 1 0 1 1 0 1 1 ...
..$ I11: num [1:2733] 0 1 0 1 1 1 1 0 1 1 ...
..$ I12: num [1:2733] 0 1 0 1 0 0 0 1 1 1 ...
..$ I13: num [1:2733] 0 0 1 1 0 1 0 0 0 1 ...
..$ I14: num [1:2733] 0 0 0 1 1 0 1 1 0 0 ...
..$ I15: num [1:2733] 0 0 0 1 0 0 1 0 1 1 ...
$ q.matrix:'data.frame': 15 obs. of 5 variables:
..$ RI: num [1:15] 1 1 1 0 1 1 1 1 0 0 ...
..$ JS: num [1:15] 1 0 0 1 0 0 0 0 0 1 ...
..$ GI: num [1:15] 0 1 0 1 0 0 1 1 1 1 ...
..$ II: num [1:15] 0 1 1 0 1 0 1 0 0 0 ...
..$ MI: num [1:15] 0 0 1 0 0 0 0 0 1 0 ...
$ skills : chr [1:5, 1:2] "Retrieving explicit information " ...
..- attr(*, "dimnames")=List of 2
.. ..$ : chr [1:5] "RI" "JS" "GI" "II" ...
.. ..$ : chr [1:2] "skill" "description"

Dataset data.cdm07:

This dataset is resimulated from social anxiety disorder data on social phobia, which involve 13 dichotomous questions (Fang, Liu & Ying, 2017). The simulation was based on a latent class model with five classes. The dataset was also used in Chen, Li, Liu and Ying (2017).

List of 3
$ data : num [1:863, 1:13] 1 0 1 1 1 1 1 1 1 1 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : NULL
.. ..$ : chr [1:13] "I1" "I2" "I3" "I4" ...
$ q.matrix: num [1:13, 1:3] 1 1 1 1 0 0 0 0 0 0 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : chr [1:13] "I1" "I2" "I3" "I4" ...
.. ..$ : chr [1:3] "A1" "A2" "A3"
$ items : atomic [1:13] 1 speaking in front of other people? ...
..- attr(*, "stem")=chr "Have you ever had a strong fear or avoidance of ..."

Dataset data.cdm08:

This is a simulated dataset with four skills and three misconceptions for the SISM, the model for simultaneously identifying skills and misconceptions (Kuo, Chen & de la Torre, 2018). The Q-matrix follows the specification in their simulation study.

List of 2
$ data :'data.frame': 1300 obs. of 20 variables:
..$ I01: num [1:1300] 1 0 0 1 1 1 1 1 1 1 ...
..$ I02: num [1:1300] 0 0 0 0 1 1 1 1 1 1 ...
..$ I03: num [1:1300] 0 0 0 0 1 1 1 1 1 1 ...
..$ I04: num [1:1300] 1 1 0 1 0 1 1 0 1 1 ...
..$ I05: num [1:1300] 1 1 1 0 1 1 0 1 1 1 ...
..[...]
..$ I18: num [1:1300] 0 1 0 0 0 0 0 0 0 1 ...
..$ I19: num [1:1300] 1 1 0 0 0 0 0 1 1 1 ...
..$ I20: num [1:1300] 1 1 0 0 0 1 0 1 0 1 ...
$ q.matrix:'data.frame': 20 obs. of 7 variables:
..$ S1: num [1:20] 1 0 0 0 0 0 0 1 0 0 ...
..$ S2: num [1:20] 0 1 0 0 0 0 0 0 1 0 ...
..$ S3: num [1:20] 0 0 1 0 0 0 0 0 0 1 ...
..$ S4: num [1:20] 0 0 0 1 0 0 0 0 0 0 ...
..$ B1: num [1:20] 0 0 0 0 1 0 0 1 1 0 ...
..$ B2: num [1:20] 0 0 0 0 0 1 0 0 0 0 ...
..$ B3: num [1:20] 0 0 0 0 0 0 1 0 0 1 ...
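In this Q-matrix the columns S1 to S4 are the skills and B1 to B3 the misconceptions. The following is a sketch of how the SISM might be fitted, assuming the CDM package is installed and that the installed version of gdina supports `rule="SISM"` together with the `bugs` argument naming the misconception columns. The object names are illustrative.

```r
data(data.cdm08, package = "CDM")
dat <- data.cdm08$data
Q   <- data.cdm08$q.matrix

# Misconception ("bug") columns of the Q-matrix: B1, B2, B3
bug_cols <- grep("^B", colnames(Q), value = TRUE)

# SISM: skills and misconceptions are modelled simultaneously
mod <- CDM::gdina(dat, q.matrix = Q, rule = "SISM", bugs = bug_cols)
summary(mod)
```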

Dataset data.cdm09:

This is a simulated dataset involving polytomous skills which is adapted from the empirical example (proportional reasoning data) of Chen and de la Torre (2013).

List of 2
$ data : num [1:500, 1:15] 1 0 1 1 0 1 1 1 1 1 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : NULL
.. ..$ : chr [1:15] "I1" "I2" "I3" "I4" ...
$ q.matrix:'data.frame': 15 obs. of 4 variables:
..$ A1: int [1:15] 0 0 0 0 2 0 0 2 1 1 ...
..$ A2: int [1:15] 1 0 2 0 0 1 2 0 1 1 ...
..$ A3: int [1:15] 0 0 0 1 0 0 0 0 0 0 ...
..$ A4: int [1:15] 0 1 1 0 0 0 0 0 0 0 ...
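Entries greater than 1 in this Q-matrix give the attribute level an item requires, following the polytomous-attribute approach of Chen and de la Torre (2013). The sketch below tabulates the highest level required per attribute and then fits a GDINA model. It assumes that the CDM package is installed and that gdina accepts an integer-valued Q-matrix of this kind; the object names are illustrative.

```r
data(data.cdm09, package = "CDM")
dat <- data.cdm09$data
Q   <- data.cdm09$q.matrix

# Highest level of each attribute that any item requires
max_level <- sapply(Q, max)
print(max_level)

# GDINA model with polytomous attributes
mod <- CDM::gdina(dat, q.matrix = Q, rule = "GDINA")
summary(mod)
```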

Dataset data.cdm10:

This is a simulated dataset involving a hierarchical skill structure. Skill A has four levels, skill B possesses two levels and skill C has three levels.

List of 2
$ data : num [1:1500, 1:15] 1 1 0 0 0 1 1 0 0 1 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : NULL
.. ..$ : chr [1:15] "I1" "I2" "I3" "I4" ...
$ q.matrix: num [1:15, 1:6] 1 1 1 1 1 1 0 0 0 0 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : chr [1:15] "I1" "I2" "I3" "I4" ...
.. ..$ : chr [1:6] "A1" "A2" "A3" "B1" ...

References

Chen, H., & Chen, J. (2017). Cognitive diagnostic research on Chinese students' English listening skills and implications on skill training. English Language Teaching, 10(12), 107-115. http://dx.doi.org/10.5539/elt.v10n12p107

Chen, J., & de la Torre, J. (2013). A general cognitive diagnosis model for expert-defined polytomous attributes. Applied Psychological Measurement, 37, 419-437. http://dx.doi.org/10.1177/0146621613479818

Chen, Y., Li, X., Liu, J., & Ying, Z. (2017). Regularized latent class analysis with application in cognitive diagnosis. Psychometrika, 82, 660-692. http://dx.doi.org/10.1007/s11336-016-9545-6

Chiu, C.-Y., Koehn, H.-F., & Wu, H.-M. (2016). Fitting the reduced RUM with Mplus: A tutorial. International Journal of Testing, 16(4), 331-351. http://dx.doi.org/10.1080/15305058.2016.1148038

Fang, G., Liu, J., & Ying, Z. (2017). On the identifiability of diagnostic classification models. arXiv, 1706.01240. https://arxiv.org/abs/1706.01240

Heller, J., & Wickelmaier, F. (2013). Minimum discrepancy estimation in probabilistic knowledge structures. Electronic Notes in Discrete Mathematics, 42, 49-56. http://dx.doi.org/10.1016/j.endm.2013.05.145

Kuo, B.-C., Chen, C.-H., & de la Torre, J. (2018). A cognitive diagnosis model for identifying coexisting skills and misconceptions. Applied Psychological Measurement, 42(3), 179-191. http://dx.doi.org/10.1177/0146621617722791

Ma, W., & de la Torre, J. (2016). A sequential cognitive diagnosis model for polytomous responses. British Journal of Mathematical and Statistical Psychology, 69(3), 253-275. https://doi.org/10.1111/bmsp.12070

Philipp, M., Strobl, C., de la Torre, J., & Zeileis, A. (2018). On the estimation of standard errors in cognitive diagnosis models. Journal of Educational and Behavioral Statistics, 43(1), 88-115. http://dx.doi.org/10.3102/1076998617719728

Examples

if (FALSE) {
#############################################################################
# EXAMPLE 1: Reduced RUM model, Chiu et al. (2016)
#############################################################################

data(data.cdm03, package="CDM")
dat <- data.cdm03$data
qmatrix <- data.cdm03$qmatrix

#*** Model 1: Reduced RUM
mod1 <- CDM::gdina( dat, q.matrix=qmatrix[,-1], rule="RRUM" )
summary(mod1)

#*** Model 2: Additive model with identity link function
mod2 <- CDM::gdina( dat, q.matrix=qmatrix[,-1], rule="ACDM" )
summary(mod2)

#*** Model 3: Additive model with logit link function
mod3 <- CDM::gdina( dat, q.matrix=qmatrix[,-1], rule="ACDM", linkfct="logit")
summary(mod3)

#############################################################################
# EXAMPLE 2: GDINA model - probability dataset from the pks package
#############################################################################

data(data.cdm05, package="CDM")
dat <- data.cdm05$data
Q <- data.cdm05$q.matrix

#* estimate model
mod1 <- CDM::gdina( dat, q.matrix=Q )
summary(mod1)
}
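Example 1 fits three models to the same data, so a natural follow-up is to compare them. The sketch below refits the models quietly and compares them with IRT.compareModels, assuming the CDM package is installed and that this function returns a list with an `IC` component in the installed version. The object names are illustrative.

```r
data(data.cdm03, package = "CDM")
dat <- data.cdm03$data
Q   <- data.cdm03$qmatrix[, -1]   # drop the 'item' label column

# Refit the three models of Example 1 without iteration output
mod_rrum  <- CDM::gdina(dat, q.matrix = Q, rule = "RRUM", progress = FALSE)
mod_acdm  <- CDM::gdina(dat, q.matrix = Q, rule = "ACDM", progress = FALSE)
mod_logit <- CDM::gdina(dat, q.matrix = Q, rule = "ACDM", linkfct = "logit",
                        progress = FALSE)

# Information criteria for the three models side by side
cmp <- CDM::IRT.compareModels(mod_rrum, mod_acdm, mod_logit)
print(cmp$IC)
```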