Hierarchical Rater Model Based on Signal Detection Theory (HRM-SDT)

This function estimates a version of the hierarchical rater model (HRM) based on signal detection theory (HRM-SDT; DeCarlo, 2005; DeCarlo, Kim & Johnson, 2011; Robitzsch & Steinfeld, 2018). The model is estimated by means of an EM algorithm adapted from multilevel latent class analysis (Vermunt, 2008).

Usage

rm.sdt(dat, pid, rater, Qmatrix=NULL, theta.k=seq(-9, 9, len=30),
    est.a.item=FALSE, est.c.rater="n", est.d.rater="n", est.mean=FALSE, est.sigma=TRUE,
    skillspace="normal", tau.item.fixed=NULL, a.item.fixed=NULL,
    d.min=0.5, d.max=100, d.start=3, c.start=NULL, tau.start=NULL, sd.start=1,
    d.prior=c(3,100), c.prior=c(3,100), tau.prior=c(0,1000), a.prior=c(1,100),
    link_item="GPCM", max.increment=1, numdiff.parm=0.00001, maxdevchange=0.1,
    globconv=.001, maxiter=1000, msteps=4, mstepconv=0.001, optimizer="nlminb" )

# S3 method for rm.sdt
summary(object, file=NULL, ...)

# S3 method for rm.sdt
plot(x, ask=TRUE, ...)

# S3 method for rm.sdt
anova(object,...)

# S3 method for rm.sdt
logLik(object,...)

# S3 method for rm.sdt
IRT.factor.scores(object, type="EAP", ...)

# S3 method for rm.sdt
IRT.irfprob(object,...)

# S3 method for rm.sdt
IRT.likelihood(object,...)

# S3 method for rm.sdt
IRT.posterior(object,...)

# S3 method for rm.sdt
IRT.modelfit(object,...)

# S3 method for IRT.modelfit.rm.sdt
summary(object,...)

Arguments

dat: Original data frame. Ratings on variables must be in rows, i.e. every row corresponds to a person-rater combination.
pid: Person identifier.
rater: Rater identifier.
Qmatrix: An optional Q-matrix. If this matrix is not provided, then by default the ordinary scoring of categories (from 0 to the maximum score of $K$) is used.
theta.k: A grid of theta values for the ability distribution.
est.a.item: Should item parameters $a_i$ be estimated?
est.c.rater: Type of estimation for item-rater parameters $c_{ir}$ in the signal detection model. Options are 'n' (no estimation), 'e' (set all parameters equal to each other), 'i' (itemwise estimation), 'r' (rater wise estimation) and 'a' (all parameters are estimated independently from each other).
est.d.rater: Type of estimation of $d$ parameters. Options are the same as in est.c.rater.
est.mean: Optional logical indicating whether the mean of the trait distribution should be estimated.
est.sigma: Optional logical indicating whether the standard deviation of the trait distribution should be estimated.
skillspace: Specified $\theta$ distribution type. It can be "normal" or "discrete". In the latter case, all probabilities of the distribution are separately estimated.
tau.item.fixed: Optional matrix with three columns specifying fixed $\tau$ parameters. The first two columns denote item and category indices, the third the fixed value. See Example 3.
a.item.fixed: Optional matrix with two columns specifying fixed $a$ parameters. First column: Item index. Second column: Fixed $a$ parameter.
d.min: Minimal $d$ parameter to be estimated
d.max: Maximal $d$ parameter to be estimated
d.start: Starting value(s) of $d$ parameters
c.start: Starting values of $c$ parameters
tau.start: Starting values of $\tau$ parameters
sd.start: Starting value for trait standard deviation
d.prior: Normal prior $N(M,S^2)$ for $d$ parameters
c.prior: Normal prior for $c$ parameters. The prior mean for parameter $c_{irk}$ is defined as $M \cdot ( k - 0.5)$, where $M$ is c.prior[1].
tau.prior: Normal prior for $\tau$ parameters
a.prior: Normal prior for $a$ parameters
link_item: Type of item response function for latent responses. Can be "GPCM" for the generalized partial credit model or "GRM" for the graded response model.
max.increment: Maximum increment of item parameters during estimation
numdiff.parm: Numerical differentiation step width
maxdevchange: Maximum relative deviance change as a convergence criterion
globconv: Maximum parameter change
maxiter: Maximum number of iterations
msteps: Maximum number of iterations during an M step
mstepconv: Convergence criterion in an M step
optimizer: Choice of optimization function in M-step for item parameters. Options are "nlminb" for stats::nlminb and "optim" for stats::optim.
object: Object of class rm.sdt
file: Optional file name in which summary should be written.
x: Object of class rm.sdt
ask: Optional logical indicating whether the user should be prompted before each new plot is drawn.
type: Factor score estimation method. Currently, only type="EAP" is supported.
...: Further arguments to be passed
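
The input layout described for dat, tau.item.fixed and a.item.fixed can be sketched in base R as follows. This is a minimal illustration only: the toy values are invented, and the column names idstud, rater, k1 and k2 are borrowed from the Examples section rather than required by the function.

```r
# Hypothetical toy data: 3 persons, each rated by 2 raters on 2 items.
# Every row is one person-rater combination, as required for `dat`.
dat <- data.frame(
  idstud = rep(1:3, each = 2),
  rater  = rep(c("R1", "R2"), times = 3),
  k1     = c(0, 1, 2, 2, 3, 2),
  k2     = c(1, 1, 2, 3, 3, 3)
)

# Number of distinct person-rater combinations; it should equal nrow(dat)
n_combinations <- nrow(unique(dat[, c("idstud", "rater")]))

# tau.item.fixed: three columns (item index, category index, fixed value).
# The fixed values here are illustrative assumptions.
tau_fixed <- cbind(item = 1, category = 1:3, value = c(-1, 0, 1))

# a.item.fixed: two columns (item index, fixed slope)
a_fixed <- cbind(item = 1, a = 1)
```

The rating columns (here k1 and k2) would be passed as the first argument, with dat$idstud and dat$rater supplied as pid and rater.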

Details

The specification of the model follows DeCarlo et al. (2011). The second level models the ideal rating (latent response) $\eta=0, \ldots, K$ of person $p$ on item $i$. The option link_item='GPCM' follows the generalized partial credit model $$ P( \eta_{pi}=\eta | \theta_p ) \propto \exp( a_{i} q_{i \eta } \theta_p - \tau_{i \eta } ) $$ The option link_item='GRM' employs the graded response model $$ P( \eta_{pi}=\eta | \theta_p )= \Psi( \tau_{i,\eta + 1} - a_i \theta_p ) - \Psi( \tau_{i,\eta} - a_i \theta_p ) $$
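
The two link functions for the latent response can be sketched in base R. This is an illustration of the formulas, not the package's internal code. It assumes that $\tau_{i0}=0$ in the GPCM, that $\Psi$ is the logistic distribution function, and that the outer GRM thresholds are $-\infty$ and $+\infty$. The function names and parameter values are hypothetical.

```r
# GPCM: P(eta | theta) proportional to exp(a * q_eta * theta - tau_eta)
# tau has length K + 1; tau[1] = 0 for category 0 is an assumed constraint.
gpcm_probs <- function(theta, a, tau, q = seq_along(tau) - 1) {
  z <- a * q * theta - tau
  z <- z - max(z)            # stabilise the exponentials
  exp(z) / sum(exp(z))
}

# GRM: P(eta | theta) = Psi(tau_{eta+1} - a*theta) - Psi(tau_eta - a*theta)
# tau holds the K interior thresholds in increasing order; Psi is assumed
# to be logistic, with boundary thresholds -Inf and +Inf.
grm_probs <- function(theta, a, tau) {
  cum <- stats::plogis(c(-Inf, tau, Inf) - a * theta)
  diff(cum)
}

# Illustrative values for an item with categories 0..3
p_gpcm <- gpcm_probs(theta = 0.5, a = 1, tau = c(0, -0.5, 0.3, 1.2))
p_grm  <- grm_probs(theta = 0.5, a = 1, tau = c(-1, 0, 1))
```

Both functions return one probability per latent category, and each vector sums to one.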

At the first level, the ratings $X_{pir}$ for person $p$ on item $i$ and rater $r$ are modeled as a signal detection model $$ P( X_{pir} \le k | \eta_{pi} )= G( c_{irk} - d_{ir} \eta_{pi} )$$ where $G$ is the logistic distribution function and the categories are $k=1,\ldots, K+1$. Because $1 - G(x)=G(-x)$, the model can equivalently be written as $$ P( X_{pir} > k | \eta_{pi} )= G( d_{ir} \eta_{pi} - c_{irk})$$
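
The rater level can be sketched the same way: cumulative probabilities $G( c_{irk} - d_{ir} \eta_{pi} )$ are differenced to obtain the probability of each observed rating category. The sketch below is for a single item-rater pair and assumes that $K$ ordered thresholds separate $K+1$ rating categories. The function name and the values $d=3$ and $c_k=3(k-0.5)$ are illustrative choices, not estimates.

```r
# Rating category probabilities for one item-rater pair, given latent response eta
sdt_rating_probs <- function(eta, d, c_thresh) {
  # P(X <= k | eta) = G(c_k - d * eta), with G the logistic cdf;
  # the last category closes the distribution at 1
  cum <- c(stats::plogis(c_thresh - d * eta), 1)
  diff(c(0, cum))
}

d        <- 3
c_thresh <- 3 * (1:3 - 0.5)   # thresholds halfway between adjacent latent categories

# Columns: latent response eta = 0..3; rows: observed rating categories 1..4
probs <- sapply(0:3, function(eta) sdt_rating_probs(eta, d, c_thresh))

# The complementary probability 1 - G(c - d*eta) equals G(d*eta - c)
upper_a <- 1 - stats::plogis(c_thresh - d * 1)
upper_b <- stats::plogis(d * 1 - c_thresh)
```

With these values the most probable observed category tracks the latent response, which is what a precise rater (large $d_{ir}$) with well-placed thresholds would produce.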

The thresholds $c_{irk}$ can be further restricted to $c_{irk}=c_{k}$ (est.c.rater='e'), $c_{irk}=c_{ik}$ (est.c.rater='i') or $c_{irk}=c_{rk}$ (est.c.rater='r'). The same restrictions are available for the rater precision parameters $d_{ir}$ via est.d.rater.

Value

A list with the following entries:

deviance: Deviance
ic: Information criteria and number of parameters
item: Data frame with item parameters. The columns N and M denote the number of observed ratings and the observed mean of all ratings, respectively.
In addition to item parameters $\tau_{ik}$ and $a_i$, the mean of the latent response (latM) is computed as $E( \eta_i )=\sum_p \sum_k P( \theta_p ) q_{ik} P( \eta_i=k | \theta_p ) $, which provides an item parameter on the original metric of the ratings. The latent standard deviation (latSD) is computed in the same manner.
rater: Data frame with rater parameters. Transformed $c$ parameters (c_x.trans) are computed as $c_{irk} / ( d_{ir} )$.
person: Data frame with person parameters: EAP and corresponding standard errors
EAP.rel: EAP reliability
mu: Mean of the trait distribution
sigma: Standard deviation of the trait distribution
tau.item: Item parameters $\tau_{ik}$
se.tau.item: Standard error of item parameters $\tau_{ik}$
a.item: Item slopes $a_i$
se.a.item: Standard error of item slopes $a_i$
c.rater: Rater parameters $c_{irk}$
se.c.rater: Standard error of rater severity parameter $c_{irk}$
d.rater: Rater slope parameter $d_{ir}$
se.d.rater: Standard error of rater slope parameter $d_{ir}$
f.yi.qk: Individual likelihood
f.qk.yi: Individual posterior distribution
probs: Item probabilities at grid theta.k. Note that these probabilities are calculated on the pseudo items $i \times r$, i.e. the interaction of item and rater.
prob.item: Probabilities $P( \eta_i=\eta | \theta )$ of latent item responses evaluated at theta grid $\theta_p$.
n.ik: Expected counts
pi.k: Estimated trait distribution $P(\theta_p)$.
maxK: Maximum number of categories
procdata: Processed data
iter: Number of iterations
...: Further values
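
The latent mean latM reported in the item entry can be reproduced by hand on a small scale. The base-R sketch below assumes a discretised standard normal trait distribution on the default theta.k grid, a GPCM item with illustrative parameters, and the usual moment formula for the standard deviation. All variable names are hypothetical.

```r
# Theta grid and normalised weights P(theta_p) (assumed standard normal)
theta_k <- seq(-9, 9, len = 30)
pi_k    <- stats::dnorm(theta_k)
pi_k    <- pi_k / sum(pi_k)

# Illustrative GPCM item with categories 0..3
a   <- 1
tau <- c(0, -0.5, 0.3, 1.2)
q   <- 0:3

# Rows: latent categories; columns: grid points
prob_item <- sapply(theta_k, function(th) {
  z <- a * q * th - tau
  z <- z - max(z)
  exp(z) / sum(exp(z))
})

# E(eta_i) = sum_p sum_k P(theta_p) * q_ik * P(eta_i = k | theta_p)
lat_M  <- sum(pi_k * colSums(q * prob_item))
# Standard deviation from the second moment (an assumed reading of "same manner")
lat_SD <- sqrt(sum(pi_k * colSums(q^2 * prob_item)) - lat_M^2)
```

Because the scores q run from 0 to 3, lat_M lies on the rating metric and can be compared directly with the observed mean M in the item data frame.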

References

DeCarlo, L. T. (2005). A model of rater behavior in essay grading based on signal detection theory. Journal of Educational Measurement, 42, 53-76.

DeCarlo, L. T. (2010). Studies of a latent-class signal-detection model for constructed response scoring II: Incomplete and hierarchical designs. ETS Research Report ETS RR-10-08. Princeton NJ: ETS.

DeCarlo, L. T., Kim, Y., & Johnson, M. S. (2011). A hierarchical rater model for constructed responses, with a signal detection rater model. Journal of Educational Measurement, 48, 333-356.

Robitzsch, A., & Steinfeld, J. (2018). Item response models for human ratings: Overview, estimation methods, and implementation in R. Psychological Test and Assessment Modeling, 60(1), 101-139.

Vermunt, J. K. (2008). Latent class and finite mixture models for multilevel data sets. Statistical Methods in Medical Research, 17, 33-51.

Examples

#############################################################################
# EXAMPLE 1: Hierarchical rater model (HRM-SDT) data.ratings1
#############################################################################
data(data.ratings1, package="sirt")
dat <- data.ratings1

if (FALSE) {
# Model 1: Partial Credit Model: no rater effects
mod1 <- sirt::rm.sdt( dat[, paste0( "k",1:5) ], rater=dat$rater,
            pid=dat$idstud, est.c.rater="n", d.start=100,  est.d.rater="n" )
summary(mod1)

# Model 2: Generalized Partial Credit Model: no rater effects
mod2 <- sirt::rm.sdt( dat[, paste0( "k",1:5) ], rater=dat$rater,
            pid=dat$idstud, est.c.rater="n", est.d.rater="n",
            est.a.item=TRUE, d.start=100)
summary(mod2)

# Model 3: Equal effects in SDT
mod3 <- sirt::rm.sdt( dat[, paste0( "k",1:5) ], rater=dat$rater,
            pid=dat$idstud, est.c.rater="e", est.d.rater="e")
summary(mod3)

# Model 4: Rater effects in SDT
mod4 <- sirt::rm.sdt( dat[, paste0( "k",1:5) ], rater=dat$rater,
            pid=dat$idstud, est.c.rater="r", est.d.rater="r")
summary(mod4)

#############################################################################
# EXAMPLE 2: HRM-SDT data.ratings3
#############################################################################

data(data.ratings3)
dat <- data.ratings3
dat <- dat[ dat$rater < 814, ]
psych::describe(dat)

# Model 1: item- and rater-specific effects
mod1 <- sirt::rm.sdt( dat[, paste0( "crit",c(2:4)) ], rater=dat$rater,
            pid=dat$idstud, est.c.rater="a", est.d.rater="a" )
summary(mod1)
plot(mod1)

# Model 2: Differing number of categories per variable
mod2 <- sirt::rm.sdt( dat[, paste0( "crit",c(2:4,6)) ], rater=dat$rater,
            pid=dat$idstud, est.c.rater="a", est.d.rater="a")
summary(mod2)
plot(mod2)

#############################################################################
# EXAMPLE 3: Hierarchical rater model with discrete skill spaces
#############################################################################

data(data.ratings3)
dat <- data.ratings3
dat <- dat[ dat$rater < 814, ]
psych::describe(dat)

# Model 1: Discrete theta skill space with values of 0,1,2 and 3
mod1 <- sirt::rm.sdt( dat[, paste0( "crit",c(2:4)) ], theta.k=0:3, rater=dat$rater,
            pid=dat$idstud, est.c.rater="a", est.d.rater="a", skillspace="discrete" )
summary(mod1)
plot(mod1)

# Model 2: Modelling of one item by using a discrete skill space and
#          fixed item parameters

# fixed tau and a parameters
tau.item.fixed <- cbind( 1, 1:3,  100*cumsum( c( 0.5, 1.5, 2.5)) )
a.item.fixed <- cbind( 1, 100 )
# fit HRM-SDT
mod2 <- sirt::rm.sdt( dat[, "crit2", drop=FALSE], theta.k=0:3, rater=dat$rater,
            tau.item.fixed=tau.item.fixed,a.item.fixed=a.item.fixed, pid=dat$idstud,
            est.c.rater="a", est.d.rater="a", skillspace="discrete" )
summary(mod2)
plot(mod2)
}