mdmb_regression.Rd
Several regression functions that allow for sampling weights and prior distributions.
The function yjt_regression
performs a linear regression in which the
response variable is transformed according to the Yeo-Johnson transformation
(Yeo & Johnson, 2000; see yjt_dist) and the residuals follow a
scaled \(t\) distribution. The degrees of freedom
of the \(t\) distribution can be fixed or estimated (est_df=TRUE).
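For reference, the transformation itself can be sketched in base R. This is a minimal illustration of the formula from Yeo & Johnson (2000); the name yj_trafo_sketch is made up here and is not a function of the mdmb package (see yjt_dist for the package's own implementation):

```r
# Minimal base-R sketch of the Yeo-Johnson transformation
# (Yeo & Johnson, 2000); yj_trafo_sketch is an illustrative name,
# not a function of the mdmb package
yj_trafo_sketch <- function(y, lambda) {
  out <- numeric(length(y))
  pos <- y >= 0
  if (abs(lambda) > 1e-10) {
    out[pos] <- ((y[pos] + 1)^lambda - 1) / lambda
  } else {
    out[pos] <- log(y[pos] + 1)
  }
  if (abs(lambda - 2) > 1e-10) {
    out[!pos] <- -((-y[!pos] + 1)^(2 - lambda) - 1) / (2 - lambda)
  } else {
    out[!pos] <- -log(-y[!pos] + 1)
  }
  out
}
yj_trafo_sketch(c(-2, 0, 3), lambda = 1)  # lambda=1 is the identity: -2 0 3
```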
The function bct_regression
offers the same functionality as
yjt_regression but employs a Box-Cox transformation
of the outcome variable.
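Unlike the Yeo-Johnson transformation, the Box-Cox transformation is only defined for a positive outcome. A base-R sketch of the formula (bc_trafo_sketch is an illustrative name, not a package function):

```r
# Minimal base-R sketch of the Box-Cox transformation; it requires y > 0,
# which is one reason the Yeo-Johnson transformation is more general
bc_trafo_sketch <- function(y, lambda) {
  stopifnot(all(y > 0))
  if (abs(lambda) > 1e-10) (y^lambda - 1) / lambda else log(y)
}
bc_trafo_sketch(4, lambda = 1)  # (4^1 - 1)/1 = 3
```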
The Yeo-Johnson transformation can be extended by a probit transformation
(probit=TRUE
) to cover the case of bounded variables on \([0,1]\).
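For a bounded outcome, a call might look like the following sketch (simulated data; the mdmb call is guarded with requireNamespace so the snippet also runs when the package is not installed):

```r
set.seed(123)
n <- 300
x <- stats::rnorm(n)
# bounded outcome on (0,1), generated through the normal CDF (inverse probit)
y <- stats::pnorm(-0.3 + 0.8 * x + stats::rnorm(n, sd = 0.6))
dat <- data.frame(y = y, x = x)
if (requireNamespace("mdmb", quietly = TRUE)) {
  mod <- mdmb::yjt_regression(y ~ x, data = dat, probit = TRUE)
  summary(mod)
}
```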
The function logistic_regression
performs logistic regression
for dichotomous data.
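The objective that such a logistic regression with sampling weights and priors maximizes can be sketched in base R; with unit weights and weak priors, the posterior mode is close to the stats::glm maximum likelihood estimate. The prior choices below mirror the Examples and are illustrative, not the package's defaults:

```r
# Sketch of the weighted log-posterior of a logistic regression
# with sampling weights and prior distributions (base R only)
set.seed(1)
N <- 400
x <- stats::rnorm(N)
y <- 1 * (stats::runif(N) < stats::plogis(-0.8 + 1.2 * x))
X <- cbind(1, x)          # design matrix
w <- rep(1, N)            # sampling weights
neg_logpost <- function(beta) {
  p <- stats::plogis(X %*% beta)
  loglik <- sum(w * stats::dbinom(y, size = 1, prob = p, log = TRUE))
  logprior <- stats::dnorm(beta[1], mean = 0, sd = 100, log = TRUE) +
    stats::dcauchy(beta[2], location = 0, scale = 2.5, log = TRUE)
  -(loglik + logprior)
}
fit <- stats::optim(c(0, 0), neg_logpost, method = "BFGS")
fit$par  # close to coef(glm(y ~ x, family = "binomial")) under these weak priors
```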
The function oprobit_regression
performs ordinal probit regression
for ordinal polytomous data.
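In an ordinal probit model, the category probabilities are differences of normal CDF values evaluated at the thresholds; a minimal sketch (oprobit_probs is an illustrative name, not a package function):

```r
# P(y = k) = pnorm(tau_k - eta) - pnorm(tau_{k-1} - eta),
# with tau_0 = -Inf and tau_K = Inf; eta is the linear predictor
oprobit_probs <- function(eta, thresh) {
  tau <- c(-Inf, thresh, Inf)
  diff(stats::pnorm(tau - eta))
}
oprobit_probs(eta = 0.4, thresh = c(-1, -0.3, 1))  # probabilities for 4 categories
```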
#---- linear regression with Yeo-Johnson transformed scaled t distribution
yjt_regression(formula, data, weights=NULL, beta_init=NULL, beta_prior=NULL,
df=Inf, lambda_fixed=NULL, probit=FALSE, est_df=FALSE, df_min=0.5, df_max=100,
use_grad=2, h=1e-5, optimizer="optim", maxiter=300, control=NULL)
# S3 method for yjt_regression
coef(object, ...)
# S3 method for yjt_regression
logLik(object, ...)
# S3 method for yjt_regression
predict(object, newdata=NULL, trafo=TRUE, ...)
# S3 method for yjt_regression
summary(object, digits=4, file=NULL, ...)
# S3 method for yjt_regression
vcov(object, ...)
#---- linear regression with Box-Cox transformed scaled t distribution
bct_regression(formula, data, weights=NULL, beta_init=NULL, beta_prior=NULL,
df=Inf, lambda_fixed=NULL, est_df=FALSE, use_grad=2, h=1e-5,
optimizer="optim", maxiter=300, control=NULL)
# S3 method for bct_regression
coef(object, ...)
# S3 method for bct_regression
logLik(object, ...)
# S3 method for bct_regression
predict(object, newdata=NULL, trafo=TRUE, ...)
# S3 method for bct_regression
summary(object, digits=4, file=NULL, ...)
# S3 method for bct_regression
vcov(object, ...)
#---- logistic regression
logistic_regression(formula, data, weights=NULL, beta_init=NULL,
beta_prior=NULL, use_grad=2, h=1e-5, optimizer="optim", maxiter=300,
control=NULL)
# S3 method for logistic_regression
coef(object, ...)
# S3 method for logistic_regression
logLik(object, ...)
# S3 method for logistic_regression
predict(object, newdata=NULL, ...)
# S3 method for logistic_regression
summary(object, digits=4, file=NULL, ...)
# S3 method for logistic_regression
vcov(object, ...)
#---- ordinal probit regression
oprobit_regression(formula, data, weights=NULL, beta_init=NULL,
use_grad=2, h=1e-5, optimizer="optim", maxiter=300,
control=NULL, control_optim_fct=NULL)
# S3 method for oprobit_regression
coef(object, ...)
# S3 method for oprobit_regression
logLik(object, ...)
# S3 method for oprobit_regression
predict(object, newdata=NULL, ...)
# S3 method for oprobit_regression
summary(object, digits=4, file=NULL, ...)
# S3 method for oprobit_regression
vcov(object, ...)
formula: Formula
data: Data frame. The dependent variable must be coded as 0 and 1.
weights: Optional vector of sampling weights
beta_init: Optional vector of initial regression coefficients
beta_prior: Optional list containing priors for all parameters (see Examples for the definition of this list).
df: Fixed degrees of freedom for the scaled \(t\) distribution
lambda_fixed: Optional fixed value of \(\lambda\) for the scaled \(t\) distribution with Yeo-Johnson transformation
probit: Logical indicating whether the probit transformation should be employed for a bounded outcome in yjt_regression
est_df: Logical indicating whether the degrees of freedom of the \(t\) distribution should be estimated.
df_min: Minimum value for estimated degrees of freedom
df_max: Maximum value for estimated degrees of freedom
use_grad: Computation method for gradients in stats::optim. The value 0 uses the internal approximation of stats::optim and applies the settings of mdmb (\(\le\) 0.3). The specification use_grad=1 uses the gradient computed by CDM::numerical_Hessian. The value 2 is usually the most efficient calculation of the gradient.
h: Numerical differentiation parameter
optimizer: Type of optimizer. Options are "nlminb" (stats::nlminb) and the default "optim" (stats::optim)
maxiter: Maximum number of iterations
control: Optional arguments passed to the optimization function stats::nlminb or stats::optim
control_optim_fct: Optional control argument for the gradient in the optimization
object: Object of class logistic_regression (or yjt_regression, bct_regression, oprobit_regression, respectively)
newdata: Design matrix for the predict function
trafo: Logical indicating whether fitted values should be on the transformed metric (trafo=TRUE) or the original metric (trafo=FALSE)
digits: Number of digits for rounding
file: File name if the summary output should be sunk into a file
...: Further arguments to be passed
A list containing the following values:
Estimated regression coefficients
Estimated covariance matrix
Parameter table
Vector of values of dependent variable
Design matrix
Sampling weights
Fitted values in metric of probabilities
Fitted values in metric of logits
Log likelihood value
Log prior value
Log posterior value
Deviance
Case-wise likelihood
Information criteria
Pseudo R-square value according to McKelvey and Zavoina
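As a sketch, this pseudo R-square can be computed by hand for a logistic model fitted with stats::glm: the variance of the linear predictor divided by that variance plus \(\pi^2/3\), where \(\pi^2/3\) is the variance of the standard logistic distribution (for a probit model, the error variance 1 would be used instead):

```r
# Sketch of the McKelvey-Zavoina pseudo R-square for a logistic model
set.seed(2)
N <- 500
x <- stats::rnorm(N)
y <- 1 * (stats::runif(N) < stats::plogis(-0.8 + 1.2 * x))
mod <- stats::glm(y ~ x, family = "binomial")
eta <- stats::predict(mod, type = "link")   # linear predictor
R2_MZ <- stats::var(eta) / (stats::var(eta) + pi^2 / 3)
R2_MZ
```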
McKelvey, R., & Zavoina, W. (1975). A statistical model for the analysis of ordinal level dependent variables. Journal of Mathematical Sociology, 4(1), 103-120. doi:10.1080/0022250X.1975.9989847
Yeo, I.-K., & Johnson, R. A. (2000). A new family of power transformations to improve normality or symmetry. Biometrika, 87(4), 954-959. doi:10.1093/biomet/87.4.954
See yjt_dist or car::yjPower for the Yeo-Johnson transformation.
See stats::lm and stats::glm for linear and logistic regression models.
#############################################################################
# EXAMPLE 1: Simulated example logistic regression
#############################################################################
#--- simulate dataset
set.seed(986)
N <- 500
x <- stats::rnorm(N)
y <- 1*( stats::runif(N) < stats::plogis( -0.8 + 1.2 * x ) )
data <- data.frame( x=x, y=y )
#--- estimate logistic regression with mdmb::logistic_regression
mod1 <- mdmb::logistic_regression( y ~ x, data=data )
summary(mod1)
if (FALSE) {
#--- estimate logistic regression with stats::glm
mod1b <- stats::glm( y ~ x, data=data, family="binomial")
summary(mod1b)
#--- estimate logistic regression with prior distributions
b0 <- list( "dnorm", list(mean=0, sd=100) ) # first parameter
b1 <- list( "dcauchy", list(location=0, scale=2.5) ) # second parameter
beta_priors <- list( b0, b1 ) # order in list defines priors for parameters
#* estimation
mod2 <- mdmb::logistic_regression( y ~ x, data=data, beta_prior=beta_priors )
summary(mod2)
#############################################################################
# EXAMPLE 2: Yeo-Johnson transformed scaled t regression
#############################################################################
#*** create simulated data
set.seed(9865)
n <- 1000
x <- stats::rnorm(n)
y <- .5 + 1*x + .7*stats::rt(n, df=8 )
y <- mdmb::yj_antitrafo( y, lambda=.5 )
dat <- data.frame( y=y, x=x )
# display data
graphics::hist(y)
#--- Model 1: fit regression model with transformed normal distribution (df=Inf)
mod1 <- mdmb::yjt_regression( y ~ x, data=dat )
summary(mod1)
#--- Model 2: fit regression model with transformed scaled t distribution (df=10)
mod2 <- mdmb::yjt_regression( y ~ x, data=dat, df=10)
summary(mod2)
#--- Model 3: fit regression model with transformed normal distribution (df=Inf)
# and fixed transformation parameter lambda of .5
mod3 <- mdmb::yjt_regression( y ~ x, data=dat, lambda_fixed=.5)
summary(mod3)
#--- Model 4: fit regression model with transformed normal distribution (df=Inf)
# and fixed transformation parameter lambda of 1
# -> This model corresponds to least squares regression
mod4 <- mdmb::yjt_regression( y ~ x, data=dat, lambda_fixed=1)
summary(mod4)
# fit with lm function
mod4b <- stats::lm( y ~ x, data=dat )
summary(mod4b)
#--- Model 5: fit regression model with estimated degrees of freedom
mod5 <- mdmb::yjt_regression( y ~ x, data=dat, est_df=TRUE)
summary(mod5)
#** compare log-likelihood values
logLik(mod1)
logLik(mod2)
logLik(mod3)
logLik(mod4)
logLik(mod4b)
logLik(mod5)
#############################################################################
# EXAMPLE 3: Regression with Box-Cox and Yeo-Johnson transformations
#############################################################################
#*** simulate data
set.seed(985)
n <- 1000
x <- stats::rnorm(n)
y <- .5 + 1*x + stats::rnorm(n, sd=.7 )
y <- mdmb::bc_antitrafo( y, lambda=.5 )
dat <- data.frame( y=y, x=x )
#--- Model 1: fit regression model with Box-Cox transformation
mod1 <- mdmb::bct_regression( y ~ x, data=dat )
summary(mod1)
#--- Model 2: fit regression model with Yeo-Johnson transformation
mod2 <- mdmb::yjt_regression( y ~ x, data=dat )
summary(mod2)
#--- compare fit
logLik(mod1)
logLik(mod2)
#############################################################################
# EXAMPLE 4: Ordinal probit regression
#############################################################################
#--- simulate data
set.seed(987)
N <- 1500
x <- stats::rnorm(N)
z <- stats::rnorm(N)
# regression coefficients
b0 <- -.5 ; b1 <- .6 ; b2 <- .1
# vector of thresholds
thresh <- c(-1, -.3, 1)
yast <- b0 + b1 * x + b2*z + stats::rnorm(N)
y <- as.numeric( cut( yast, c(-Inf,thresh,Inf) ) ) - 1
dat <- data.frame( x=x, y=y, z=z )
#--- probit regression
mod <- mdmb::oprobit_regression( formula=y ~ x + z + I(x*z), data=dat)
summary(mod)
}