mdmb_regression.Rd
Several regression functions that allow for sampling weights and prior distributions.
The function yjt_regression
performs a linear regression in which the
response variable is transformed according to the Yeo-Johnson transformation
(Yeo & Johnson, 2000; see yjt_dist) and the residuals follow a
scaled \(t\) distribution. The degrees of freedom
of the \(t\) distribution can be fixed or estimated (est_df=TRUE).
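For reference, the transformation itself can be sketched in base R. This is a minimal illustration of the formula from Yeo & Johnson (2000); the name yj_trafo_sketch is made up here and is not a function of the mdmb package (see yjt_dist for the package's own implementation):

```r
# Minimal base-R sketch of the Yeo-Johnson transformation
# (Yeo & Johnson, 2000); yj_trafo_sketch is an illustrative name,
# not a function of the mdmb package
yj_trafo_sketch <- function(y, lambda) {
  out <- numeric(length(y))
  pos <- y >= 0
  if (abs(lambda) > 1e-10) {
    out[pos] <- ((y[pos] + 1)^lambda - 1) / lambda
  } else {
    out[pos] <- log(y[pos] + 1)
  }
  if (abs(lambda - 2) > 1e-10) {
    out[!pos] <- -((-y[!pos] + 1)^(2 - lambda) - 1) / (2 - lambda)
  } else {
    out[!pos] <- -log(-y[!pos] + 1)
  }
  out
}
yj_trafo_sketch(c(-2, 0, 3), lambda = 1)  # lambda=1 is the identity: -2 0 3
```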
The function bct_regression
offers the same functionality as
yjt_regression but employs a Box-Cox transformation
of the outcome variable.
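Unlike the Yeo-Johnson transformation, the Box-Cox transformation is only defined for a positive outcome. A base-R sketch of the formula (bc_trafo_sketch is an illustrative name, not a package function):

```r
# Minimal base-R sketch of the Box-Cox transformation; it requires y > 0,
# which is one reason the Yeo-Johnson transformation is more general
bc_trafo_sketch <- function(y, lambda) {
  stopifnot(all(y > 0))
  if (abs(lambda) > 1e-10) (y^lambda - 1) / lambda else log(y)
}
bc_trafo_sketch(4, lambda = 1)  # (4^1 - 1)/1 = 3
```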
The Yeo-Johnson transformation can be extended by a probit transformation
(probit=TRUE
) to cover the case of bounded variables on \([0,1]\).
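For a bounded outcome, a call might look like the following sketch (simulated data; the mdmb call is guarded with requireNamespace so the snippet also runs when the package is not installed):

```r
set.seed(123)
n <- 300
x <- stats::rnorm(n)
# bounded outcome on (0,1), generated through the normal CDF (inverse probit)
y <- stats::pnorm(-0.3 + 0.8 * x + stats::rnorm(n, sd = 0.6))
dat <- data.frame(y = y, x = x)
if (requireNamespace("mdmb", quietly = TRUE)) {
  mod <- mdmb::yjt_regression(y ~ x, data = dat, probit = TRUE)
  summary(mod)
}
```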
The function logistic_regression
performs logistic regression
for dichotomous data.
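The objective that such a logistic regression with sampling weights and priors maximizes can be sketched in base R; with unit weights and weak priors, the posterior mode is close to the stats::glm maximum likelihood estimate. The prior choices below mirror the Examples and are illustrative, not the package's defaults:

```r
# Sketch of the weighted log-posterior of a logistic regression
# with sampling weights and prior distributions (base R only)
set.seed(1)
N <- 400
x <- stats::rnorm(N)
y <- 1 * (stats::runif(N) < stats::plogis(-0.8 + 1.2 * x))
X <- cbind(1, x)          # design matrix
w <- rep(1, N)            # sampling weights
neg_logpost <- function(beta) {
  p <- stats::plogis(X %*% beta)
  loglik <- sum(w * stats::dbinom(y, size = 1, prob = p, log = TRUE))
  logprior <- stats::dnorm(beta[1], mean = 0, sd = 100, log = TRUE) +
    stats::dcauchy(beta[2], location = 0, scale = 2.5, log = TRUE)
  -(loglik + logprior)
}
fit <- stats::optim(c(0, 0), neg_logpost, method = "BFGS")
fit$par  # close to coef(glm(y ~ x, family = "binomial")) under these weak priors
```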
The function oprobit_regression
performs ordinal probit regression
for ordinal polytomous data.
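In an ordinal probit model, the category probabilities are differences of normal CDF values evaluated at the thresholds; a minimal sketch (oprobit_probs is an illustrative name, not a package function):

```r
# P(y = k) = pnorm(tau_k - eta) - pnorm(tau_{k-1} - eta),
# with tau_0 = -Inf and tau_K = Inf; eta is the linear predictor
oprobit_probs <- function(eta, thresh) {
  tau <- c(-Inf, thresh, Inf)
  diff(stats::pnorm(tau - eta))
}
oprobit_probs(eta = 0.4, thresh = c(-1, -0.3, 1))  # probabilities for 4 categories
```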
#---- linear regression with Yeo-Johnson transformed scaled t distribution
yjt_regression(formula, data, weights=NULL, beta_init=NULL, beta_prior=NULL,
df=Inf, lambda_fixed=NULL, probit=FALSE, est_df=FALSE, df_min=0.5, df_max=100,
use_grad=2, h=1e-5, optimizer="optim", maxiter=300, control=NULL)
# S3 method for yjt_regression
coef(object, ...)
# S3 method for yjt_regression
logLik(object, ...)
# S3 method for yjt_regression
predict(object, newdata=NULL, trafo=TRUE, ...)
# S3 method for yjt_regression
summary(object, digits=4, file=NULL, ...)
# S3 method for yjt_regression
vcov(object, ...)
#---- linear regression with Box-Cox transformed scaled t distribution
bct_regression(formula, data, weights=NULL, beta_init=NULL, beta_prior=NULL,
df=Inf, lambda_fixed=NULL, est_df=FALSE, use_grad=2, h=1e-5,
optimizer="optim", maxiter=300, control=NULL)
# S3 method for bct_regression
coef(object, ...)
# S3 method for bct_regression
logLik(object, ...)
# S3 method for bct_regression
predict(object, newdata=NULL, trafo=TRUE, ...)
# S3 method for bct_regression
summary(object, digits=4, file=NULL, ...)
# S3 method for bct_regression
vcov(object, ...)
#---- logistic regression
logistic_regression(formula, data, weights=NULL, beta_init=NULL,
beta_prior=NULL, use_grad=2, h=1e-5, optimizer="optim", maxiter=300,
control=NULL)
# S3 method for logistic_regression
coef(object, ...)
# S3 method for logistic_regression
logLik(object, ...)
# S3 method for logistic_regression
predict(object, newdata=NULL, ...)
# S3 method for logistic_regression
summary(object, digits=4, file=NULL, ...)
# S3 method for logistic_regression
vcov(object, ...)
#---- ordinal probit regression
oprobit_regression(formula, data, weights=NULL, beta_init=NULL,
use_grad=2, h=1e-5, optimizer="optim", maxiter=300,
control=NULL, control_optim_fct=NULL)
# S3 method for oprobit_regression
coef(object, ...)
# S3 method for oprobit_regression
logLik(object, ...)
# S3 method for oprobit_regression
predict(object, newdata=NULL, ...)
# S3 method for oprobit_regression
summary(object, digits=4, file=NULL, ...)
# S3 method for oprobit_regression
vcov(object, ...)
formula: Formula
data: Data frame. The dependent variable must be coded as 0 and 1.
weights: Optional vector of sampling weights
beta_init: Optional vector of initial regression coefficients
beta_prior: Optional list containing priors for all parameters (see Examples for the definition of this list).
df: Fixed degrees of freedom for the scaled \(t\) distribution
lambda_fixed: Optional fixed value of \(\lambda\) for the scaled \(t\) distribution with Yeo-Johnson transformation
probit: Logical indicating whether the probit transformation should be employed for a bounded outcome in yjt_regression
est_df: Logical indicating whether the degrees of freedom of the \(t\) distribution should be estimated.
df_min: Minimum value for estimated degrees of freedom
df_max: Maximum value for estimated degrees of freedom
use_grad: Computation method for gradients in stats::optim. The value 0 uses the internal approximation of stats::optim and applies the settings of mdmb (\(\le\) 0.3). The specification use_grad=1 uses the gradient computed by CDM::numerical_Hessian. The value 2 is usually the most efficient calculation of the gradient.
h: Numerical differentiation parameter
optimizer: Type of optimizer. Options are "nlminb" (stats::nlminb) and the default "optim" (stats::optim)
maxiter: Maximum number of iterations
control: Optional arguments passed to the optimization function stats::nlminb or stats::optim
control_optim_fct: Optional control argument for the gradient in the optimization
object: Object of class logistic_regression (or yjt_regression, bct_regression, oprobit_regression, respectively)
newdata: Design matrix for the predict function
trafo: Logical indicating whether fitted values should be on the transformed metric (trafo=TRUE) or the original metric (trafo=FALSE)
digits: Number of digits for rounding
file: File name if the summary output should be sunk into a file
...: Further arguments to be passed
A list containing the following values:
Estimated regression coefficients
Estimated covariance matrix
Parameter table
Vector of values of dependent variable
Design matrix
Sampling weights
Fitted values in metric of probabilities
Fitted values in metric of logits
Log likelihood value
Log prior value
Log posterior value
Deviance
Case-wise likelihood
Information criteria
Pseudo R-square value according to McKelvey and Zavoina
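As a sketch, this pseudo R-square can be computed by hand for a logistic model fitted with stats::glm: the variance of the linear predictor divided by that variance plus \(\pi^2/3\), where \(\pi^2/3\) is the variance of the standard logistic distribution (for a probit model, the error variance 1 would be used instead):

```r
# Sketch of the McKelvey-Zavoina pseudo R-square for a logistic model
set.seed(2)
N <- 500
x <- stats::rnorm(N)
y <- 1 * (stats::runif(N) < stats::plogis(-0.8 + 1.2 * x))
mod <- stats::glm(y ~ x, family = "binomial")
eta <- stats::predict(mod, type = "link")   # linear predictor
R2_MZ <- stats::var(eta) / (stats::var(eta) + pi^2 / 3)
R2_MZ
```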
McKelvey, R., & Zavoina, W. (1975). A statistical model for the analysis of ordinal level dependent variables. Journal of Mathematical Sociology, 4(1), 103-120. doi:10.1080/0022250X.1975.9989847
Yeo, I.-K., & Johnson, R. A. (2000). A new family of power transformations to improve normality or symmetry. Biometrika, 87(4), 954-959. doi:10.1093/biomet/87.4.954
See yjt_dist or car::yjPower for the Yeo-Johnson transformation.
See stats::lm and stats::glm for linear and logistic regression models.
#############################################################################
# EXAMPLE 1: Simulated example logistic regression
#############################################################################
#--- simulate dataset
set.seed(986)
N <- 500
x <- stats::rnorm(N)
y <- 1*( stats::runif(N) < stats::plogis( -0.8 + 1.2 * x ) )
data <- data.frame( x=x, y=y )
#--- estimate logistic regression with mdmb::logistic_regression
mod1 <- mdmb::logistic_regression( y ~ x, data=data )
summary(mod1)
if (FALSE) {
#--- estimate logistic regression with stats::glm
mod1b <- stats::glm( y ~ x, data=data, family="binomial")
summary(mod1b)
#--- estimate logistic regression with prior distributions
b0 <- list( "dnorm", list(mean=0, sd=100) ) # first parameter
b1 <- list( "dcauchy", list(location=0, scale=2.5) ) # second parameter
beta_priors <- list( b0, b1 ) # order in list defines priors for parameters
#* estimation
mod2 <- mdmb::logistic_regression( y ~ x, data=data, beta_prior=beta_priors )
summary(mod2)
#############################################################################
# EXAMPLE 2: Yeo-Johnson transformed scaled t regression
#############################################################################
#*** create simulated data
set.seed(9865)
n <- 1000
x <- stats::rnorm(n)
y <- .5 + 1*x + .7*stats::rt(n, df=8 )
y <- mdmb::yj_antitrafo( y, lambda=.5 )
dat <- data.frame( y=y, x=x )
# display data
graphics::hist(y)
#--- Model 1: fit regression model with transformed normal distribution (df=Inf)
mod1 <- mdmb::yjt_regression( y ~ x, data=dat )
summary(mod1)
#--- Model 2: fit regression model with transformed scaled t distribution (df=10)
mod2 <- mdmb::yjt_regression( y ~ x, data=dat, df=10)
summary(mod2)
#--- Model 3: fit regression model with transformed normal distribution (df=Inf)
# and fixed transformation parameter lambda of .5
mod3 <- mdmb::yjt_regression( y ~ x, data=dat, lambda_fixed=.5)
summary(mod3)
#--- Model 4: fit regression model with transformed normal distribution (df=Inf)
# and fixed transformation parameter lambda of 1
# -> This model corresponds to least squares regression
mod4 <- mdmb::yjt_regression( y ~ x, data=dat, lambda_fixed=1)
summary(mod4)
# fit with lm function
mod4b <- stats::lm( y ~ x, data=dat )
summary(mod4b)
#--- Model 5: fit regression model with estimated degrees of freedom
mod5 <- mdmb::yjt_regression( y ~ x, data=dat, est_df=TRUE)
summary(mod5)
#** compare log-likelihood values
logLik(mod1)
logLik(mod2)
logLik(mod3)
logLik(mod4)
logLik(mod4b)
logLik(mod5)
#############################################################################
# EXAMPLE 3: Regression with Box-Cox and Yeo-Johnson transformations
#############################################################################
#*** simulate data
set.seed(985)
n <- 1000
x <- stats::rnorm(n)
y <- .5 + 1*x + stats::rnorm(n, sd=.7 )
y <- mdmb::bc_antitrafo( y, lambda=.5 )
dat <- data.frame( y=y, x=x )
#--- Model 1: fit regression model with Box-Cox transformation
mod1 <- mdmb::bct_regression( y ~ x, data=dat )
summary(mod1)
#--- Model 2: fit regression model with Yeo-Johnson transformation
mod2 <- mdmb::yjt_regression( y ~ x, data=dat )
summary(mod2)
#--- compare fit
logLik(mod1)
logLik(mod2)
#############################################################################
# EXAMPLE 4: Ordinal probit regression
#############################################################################
#--- simulate data
set.seed(987)
N <- 1500
x <- stats::rnorm(N)
z <- stats::rnorm(N)
# regression coefficients
b0 <- -.5 ; b1 <- .6 ; b2 <- .1
# vector of thresholds
thresh <- c(-1, -.3, 1)
yast <- b0 + b1 * x + b2*z + stats::rnorm(N)
y <- as.numeric( cut( yast, c(-Inf,thresh,Inf) ) ) - 1
dat <- data.frame( x=x, y=y, z=z )
#--- probit regression
mod <- mdmb::oprobit_regression( formula=y ~ x + z + I(x*z), data=dat)
summary(mod)
}