Several regression functions which allow for sampling weights and prior distributions.

The function yjt_regression performs a linear regression in which the response variable is transformed according to the Yeo-Johnson transformation (Yeo & Johnson, 2000; see yjt_dist) and the residuals follow a scaled \(t\) distribution. The degrees of freedom of the \(t\) distribution can be fixed or estimated (est_df=TRUE). The function bct_regression provides the same functionality as the Yeo-Johnson variant but employs a Box-Cox transformation of the outcome variable.

The Yeo-Johnson transformation can be extended by a probit transformation (probit=TRUE) to cover the case of bounded variables on \([0,1]\).
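The probit-extended case is not covered in the Examples below; the following sketch (with hypothetical simulated data) illustrates how probit=TRUE could be used for an outcome bounded on \([0,1]\):

```r
# Sketch: Yeo-Johnson scaled t regression with an additional probit
# transformation for a bounded outcome, assuming probit=TRUE handles
# the [0,1] bounds as described above.
set.seed(123)
N <- 300
x <- stats::rnorm(N)
# bounded outcome in (0,1), e.g. a proportion
y <- stats::pnorm( -0.3 + 0.8*x + stats::rnorm(N, sd=0.8) )
dat <- data.frame( y=y, x=x )
mod <- mdmb::yjt_regression( y ~ x, data=dat, probit=TRUE )
summary(mod)
```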

The function logistic_regression performs logistic regression for dichotomous data.

The function oprobit_regression performs ordinal probit regression for ordinal polytomous data.

#---- linear regression with Yeo-Johnson transformed scaled t distribution
yjt_regression(formula, data, weights=NULL, beta_init=NULL, beta_prior=NULL,
        df=Inf, lambda_fixed=NULL, probit=FALSE, est_df=FALSE, df_min=0.5, df_max=100,
        use_grad=2, h=1e-5, optimizer="optim", maxiter=300, control=NULL)

# S3 method for yjt_regression
coef(object, ...)
# S3 method for yjt_regression
logLik(object, ...)
# S3 method for yjt_regression
predict(object, newdata=NULL, trafo=TRUE, ...)
# S3 method for yjt_regression
summary(object, digits=4, file=NULL, ...)
# S3 method for yjt_regression
vcov(object, ...)

#---- linear regression with Box-Cox transformed scaled t distribution
bct_regression(formula, data, weights=NULL, beta_init=NULL, beta_prior=NULL,
        df=Inf, lambda_fixed=NULL, est_df=FALSE, use_grad=2, h=1e-5,
        optimizer="optim", maxiter=300, control=NULL)

# S3 method for bct_regression
coef(object, ...)
# S3 method for bct_regression
logLik(object, ...)
# S3 method for bct_regression
predict(object, newdata=NULL, trafo=TRUE, ...)
# S3 method for bct_regression
summary(object, digits=4, file=NULL, ...)
# S3 method for bct_regression
vcov(object, ...)

#---- logistic regression
logistic_regression(formula, data, weights=NULL, beta_init=NULL,
         beta_prior=NULL, use_grad=2, h=1e-5, optimizer="optim", maxiter=300,
         control=NULL)

# S3 method for logistic_regression
coef(object, ...)
# S3 method for logistic_regression
logLik(object, ...)
# S3 method for logistic_regression
predict(object, newdata=NULL, ...)
# S3 method for logistic_regression
summary(object, digits=4, file=NULL, ...)
# S3 method for logistic_regression
vcov(object, ...)

#---- ordinal probit regression
oprobit_regression(formula, data, weights=NULL, beta_init=NULL,
        use_grad=2, h=1e-5, optimizer="optim", maxiter=300,
        control=NULL, control_optim_fct=NULL)

# S3 method for oprobit_regression
coef(object, ...)
# S3 method for oprobit_regression
logLik(object, ...)
# S3 method for oprobit_regression
predict(object, newdata=NULL, ...)
# S3 method for oprobit_regression
summary(object, digits=4, file=NULL, ...)
# S3 method for oprobit_regression
vcov(object, ...)

Arguments

formula

Formula

data

Data frame. For logistic_regression, the dependent variable must be coded as 0 and 1.

weights

Optional vector of sampling weights

beta_init

Optional vector of initial regression coefficients

beta_prior

Optional list containing priors for all parameters (see Examples for the definition of this list).
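For quick reference, a beta_prior list contains one entry per regression coefficient, each itself a list of a density name and its parameter list; this sketch mirrors the prior specification shown in Example 1:

```r
# Sketch of the beta_prior structure: one entry per coefficient,
# in the order of the coefficients in the model.
b0 <- list( "dnorm", list(mean=0, sd=100) )           # prior for intercept
b1 <- list( "dcauchy", list(location=0, scale=2.5) )  # prior for slope
beta_priors <- list( b0, b1 )
```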

df

Fixed degrees of freedom for scaled \(t\) distribution

lambda_fixed

Optional fixed value of the transformation parameter \(\lambda\) in the scaled \(t\) distribution with Yeo-Johnson (or Box-Cox) transformation

probit

Logical indicating whether a probit transformation should be employed for a bounded outcome in yjt_regression

est_df

Logical indicating whether degrees of freedom in \(t\) distribution should be estimated.

df_min

Minimum value for estimated degrees of freedom

df_max

Maximum value for estimated degrees of freedom

use_grad

Computation method for gradients in stats::optim. The value 0 uses the internal numerical approximation of stats::optim and reproduces the behavior of mdmb versions \(\le\) 0.3. The specification use_grad=1 computes the gradient via CDM::numerical_Hessian. The value 2 is usually the most efficient gradient computation.

h

Numerical differentiation parameter.

optimizer

Optimizer to be used. Options are "nlminb" (stats::nlminb) and the default "optim" (stats::optim)

maxiter

Maximum number of iterations

control

Optional list of control arguments passed to the optimization function stats::nlminb or stats::optim

control_optim_fct

Optional control argument for gradient in optimization

object

Fitted object of class yjt_regression, bct_regression, logistic_regression, or oprobit_regression

newdata

Design matrix for predict function

trafo

Logical indicating whether fitted values should be on the transformed metric (trafo=TRUE) or the original metric (trafo=FALSE)
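The predict methods are not demonstrated in the Examples below; this sketch (with hypothetical simulated data, reusing the setup of Example 2) shows the trafo argument controlling the metric of the fitted values:

```r
# Sketch: fitted values on the transformed vs. the original metric
# of the outcome in a Yeo-Johnson t regression.
set.seed(9865)
n <- 500
x <- stats::rnorm(n)
y <- mdmb::yj_antitrafo( 0.5 + x + stats::rnorm(n), lambda=0.5 )
dat <- data.frame( y=y, x=x )
mod <- mdmb::yjt_regression( y ~ x, data=dat )
fit_trans <- predict(mod)               # transformed metric (trafo=TRUE)
fit_orig  <- predict(mod, trafo=FALSE)  # original metric of y
```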

digits

Number of digits for rounding

file

Optional file name into which the summary output is written.

...

Further arguments to be passed.

Value

List with the following entries

coefficients

Estimated regression coefficients

vcov

Estimated covariance matrix

partable

Parameter table

y

Vector of values of dependent variable

Xdes

Design matrix

weights

Sampling weights

fitted.values

Fitted values in metric of probabilities

linear.predictor

Fitted values in metric of logits

loglike

Log likelihood value

logprior

Log prior value

logpost

Log posterior value

deviance

Deviance

loglike_case

Case-wise log likelihood

ic

Information criteria

R2

Pseudo R-squared value according to McKelvey and Zavoina (1975)
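For orientation, the McKelvey-Zavoina pseudo R-squared is defined in terms of the variance of the estimated linear predictor \(\hat{y}^{\ast}\) (a sketch of the general formula; for the probit model the residual variance \(\hat{\sigma}^2_{\varepsilon}\) equals 1, and the exact computation in mdmb may differ in details):

```latex
R^2_{\mathrm{MZ}} = \frac{\widehat{\operatorname{Var}}(\hat{y}^{\ast})}
  {\widehat{\operatorname{Var}}(\hat{y}^{\ast}) + \hat{\sigma}^2_{\varepsilon}}
```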

References

McKelvey, R., & Zavoina, W. (1975). A statistical model for the analysis of ordinal level dependent variables. Journal of Mathematical Sociology, 4(1), 103-120. doi:10.1080/0022250X.1975.9989847

Yeo, I.-K., & Johnson, R. A. (2000). A new family of power transformations to improve normality or symmetry. Biometrika, 87(4), 954-959. doi:10.1093/biomet/87.4.954

Author

Alexander Robitzsch

See also

See yjt_dist or car::yjPower for functions for the Yeo-Johnson transformation.

See stats::lm and stats::glm for linear and logistic regression models.

Examples

#############################################################################
# EXAMPLE 1: Simulated example logistic regression
#############################################################################

#--- simulate dataset
set.seed(986)
N <- 500
x <- stats::rnorm(N)
y <- 1*( stats::runif(N) < stats::plogis( -0.8 + 1.2 * x ) )
data <- data.frame( x=x, y=y )

#--- estimate logistic regression with mdmb::logistic_regression
mod1 <- mdmb::logistic_regression( y ~ x, data=data )
summary(mod1)

if (FALSE) {
#--- estimate logistic regression with stats::glm
mod1b <- stats::glm( y ~ x, data=data, family="binomial")
summary(mod1b)

#--- estimate logistic regression with prior distributions
b0 <- list( "dnorm", list(mean=0, sd=100) )  # first parameter
b1 <- list( "dcauchy", list(location=0, scale=2.5) )   # second parameter
beta_priors <- list( b0, b1 )  # order in list defines priors for parameters
#* estimation
mod2 <- mdmb::logistic_regression( y ~ x, data=data, beta_prior=beta_priors )
summary(mod2)

#############################################################################
# EXAMPLE 2: Yeo-Johnson transformed scaled t regression
#############################################################################

#*** create simulated data
set.seed(9865)
n <- 1000
x <- stats::rnorm(n)
y <- .5 + 1*x + .7*stats::rt(n, df=8 )
y <- mdmb::yj_antitrafo( y, lambda=.5 )
dat <- data.frame( y=y, x=x )
# display data
graphics::hist(y)

#--- Model 1: fit regression model with transformed normal distribution (df=Inf)
mod1 <- mdmb::yjt_regression( y ~ x, data=dat )
summary(mod1)

#--- Model 2: fit regression model with transformed scaled t distribution (df=10)
mod2 <- mdmb::yjt_regression( y ~ x, data=dat, df=10)
summary(mod2)

#--- Model 3: fit regression model with transformed normal distribution (df=Inf)
#             and fixed transformation parameter lambda of .5
mod3 <- mdmb::yjt_regression( y ~ x, data=dat, lambda_fixed=.5)
summary(mod3)

#--- Model 4: fit regression model with transformed normal distribution (df=Inf)
#             and fixed transformation parameter lambda of 1
#             -> This model corresponds to least squares regression
mod4 <- mdmb::yjt_regression( y ~ x, data=dat, lambda_fixed=1)
summary(mod4)

# fit with lm function
mod4b <- stats::lm( y ~ x, data=dat )
summary(mod4b)

#--- Model 5: fit regression model with estimated degrees of freedom
mod5 <- mdmb::yjt_regression( y ~ x, data=dat, est_df=TRUE)
summary(mod5)

#** compare log-likelihood values
logLik(mod1)
logLik(mod2)
logLik(mod3)
logLik(mod4)
logLik(mod4b)
logLik(mod5)

#############################################################################
# EXAMPLE 3: Regression with Box-Cox and Yeo-Johnson transformations
#############################################################################

#*** simulate data
set.seed(985)
n <- 1000
x <- stats::rnorm(n)
y <- .5 + 1*x + stats::rnorm(n, sd=.7 )
y <- mdmb::bc_antitrafo( y, lambda=.5 )
dat <- data.frame( y=y, x=x )

#--- Model 1: fit regression model with Box-Cox transformation
mod1 <- mdmb::bct_regression( y ~ x, data=dat )
summary(mod1)
#--- Model 2: fit regression model with Yeo-Johnson transformation
mod2 <- mdmb::yjt_regression( y ~ x, data=dat )
summary(mod2)
#--- compare fit
logLik(mod1)
logLik(mod2)

#############################################################################
# EXAMPLE 4: Ordinal probit regression
#############################################################################

#--- simulate data
set.seed(987)
N <- 1500
x <- stats::rnorm(N)
z <- stats::rnorm(N)
# regression coefficients
b0 <- -.5 ; b1 <- .6 ; b2 <- .1
# vector of thresholds
thresh <- c(-1, -.3, 1)
yast <- b0 + b1 * x + b2*z + stats::rnorm(N)
y <- as.numeric( cut( yast, c(-Inf,thresh,Inf) ) ) - 1
dat <- data.frame( x=x, y=y, z=z )

#--- probit regression
mod <- mdmb::oprobit_regression( formula=y ~ x + z + I(x*z), data=dat)
summary(mod)
}