Scaled \(t\) Distribution with Yeo-Johnson and Box-Cox Transformations

Collection of functions for the Yeo-Johnson transformation (Yeo & Johnson, 2000) and the corresponding distribution family of scaled \(t\) distribution with and without Yeo-Johnson transformation (see Details). The Yeo-Johnson transformation can also be applied for bounded variables on \((0,1)\) which uses a probit transformation (see Details; argument probit).

The Box-Cox transformation (bc; Sakia, 1992) can be applied for variables with positive values.

# Yeo-Johnson transformation and its inverse transformation
yj_trafo(y, lambda, use_rcpp=TRUE, probit=FALSE)
yj_antitrafo(y, lambda, probit=FALSE)

#---- scaled t distribution with Yeo-Johnson transformation
dyjt_scaled(x, location=0, shape=1, lambda=1, df=Inf, log=FALSE, probit=FALSE)
ryjt_scaled(n, location=0, shape=1, lambda=1, df=Inf, probit=FALSE)

fit_yjt_scaled(x, df=Inf, par_init=NULL, lambda_fixed=NULL, weights=NULL, probit=FALSE)
# S3 method for fit_yjt_scaled
coef(object, ...)
# S3 method for fit_yjt_scaled
logLik(object, ...)
# S3 method for fit_yjt_scaled
summary(object, digits=4, file=NULL, ...)
# S3 method for fit_yjt_scaled
vcov(object, ...)

# Box-Cox transformation and its inverse transformation
bc_trafo(y, lambda)
bc_antitrafo(y, lambda)

#---- scaled t distribution with Box-Cox transformation
dbct_scaled(x, location=0, shape=1, lambda=1, df=Inf, log=FALSE, check_zero=TRUE)
rbct_scaled(n, location=0, shape=1, lambda=1, df=Inf)

fit_bct_scaled(x, df=Inf, par_init=NULL, lambda_fixed=NULL, weights=NULL)
# S3 method for fit_bct_scaled
coef(object, ...)
# S3 method for fit_bct_scaled
logLik(object, ...)
# S3 method for fit_bct_scaled
summary(object, digits=4, file=NULL, ...)
# S3 method for fit_bct_scaled
vcov(object, ...)

#---- scaled t distribution
dt_scaled(x, location=0, shape=1, df=Inf, log=FALSE)
rt_scaled(n, location=0, shape=1, df=Inf)

fit_t_scaled(x, df=Inf, par_init=NULL, weights=NULL)
# S3 method for fit_t_scaled
coef(object, ...)
# S3 method for fit_t_scaled
logLik(object, ...)
# S3 method for fit_t_scaled
summary(object, digits=4, file=NULL, ...)
# S3 method for fit_t_scaled
vcov(object, ...)

Arguments

y: Numeric vector
lambda: Transformation parameter \(\lambda\) for Yeo-Johnson transformation
use_rcpp: Logical indicating whether Rcpp package should be used
probit: Logical indicating whether probit transformation should be applied for bounded variables on \((0,1)\)
x: Numeric vector
location: Location parameter of (transformed) scaled \(t\) distribution
shape: Shape parameter of (transformed) scaled \(t\) distribution
df: Degrees of freedom of (transformed) scaled \(t\) distribution
log: Logical indicating whether logarithm of the density should be computed
check_zero: Logical indicating whether check for inadmissible values should be conducted
n: Number of observations to be simulated
par_init: Optional vector of initial parameters
lambda_fixed: Optional value for fixed \(\lambda\) parameter
weights: Optional vector of sampling weights
object: Object of class fit_yjt_scaled or fit_t_scaled
digits: Number of digits used for rounding in summary
file: File name for the summary to be sunk into
...: Further arguments to be passed

Details

Let \(g_\lambda\) be the Yeo-Johnson transformation. A random variable \(X\) is distribution as Scaled \(t\) with Yeo-Johnson transformation with location \(\mu\), scale \(\sigma\) and transformation parameter \(\lambda\) iff \(X=g_\lambda ( \mu + \sigma Z ) \) and \(Z\) is \(t\) distributed with df degrees of freedom.

For a bounded variable \(X\) on \((0,1)\), the probit transformation \(\Phi\) is applied such that \(X=\Phi( g_\lambda ( \mu + \sigma Z ) ) \) with a \(t\) distributed variable \(Z\).

For a Yeo-Johnson normally distributed variable, a normally distributed variable results in case of \(\lambda=1\). For a Box-Cox normally distributed variable, a normally distributed variable results for \(\lambda=1\).

Value

Vector or an object of fitted distribution depending on the called function

References

Sakia, S. M. (1992). The Box-Cox transformation technique: A review. The Statistician, 41(2), 169-178. doi:10.2307/2348250

Yeo, I.-K., & Johnson, R. (2000). A new family of power transformations to improve normality or symmetry. Biometrika, 87(4), 954-959. doi:10.1093/biomet/87.4.954

Examples

#############################################################################
# EXAMPLE 1: Transforming values according to Yeo-Johnson transformation
#############################################################################

# vector of y values
y <- seq(-3,3, len=100)

# non-negative lambda values
plot( y, mdmb::yj_trafo( y, lambda=1 ), type="l", ylim=8*c(-1,1),
           ylab=expression( g[lambda] (y) ) )
lines( y, mdmb::yj_trafo( y, lambda=2 ), lty=2 )
lines( y, mdmb::yj_trafo( y, lambda=.5 ), lty=3 )
lines( y, mdmb::yj_trafo( y, lambda=0 ), lty=4 )

# non-positive lambda values
plot( y, mdmb::yj_trafo( y, lambda=-1 ), type="l", ylim=8*c(-1,1),
           ylab=expression(g[lambda] (y) ) )
lines( y, mdmb::yj_trafo( y, lambda=-2 ), lty=2 )
lines( y, mdmb::yj_trafo( y, lambda=-.5 ), lty=3 )
lines( y, mdmb::yj_trafo( y, lambda=0 ), lty=4 )

if (FALSE) {
#############################################################################
# EXAMPLE 2: Density of scaled t distribution
#############################################################################

# define location and scale parameter
m0 <- 0.3
sig <- 1.5
#-- compare density of scaled t distribution with large degrees of freedom
#   with normal distribution
y1 <- mdmb::dt_scaled( y, location=m0, shape=sig, df=100 )
y2 <- stats::dnorm( y, mean=m0, sd=sig )
max(abs(y1-y2))

#############################################################################
# EXAMPLE 3: Simulating and fitting the scaled t distribution
#############################################################################

#-- simulate data with 10 degrees of freedom
set.seed(987)
df0 <- 10    # define degrees of freedom
x <- mdmb::rt_scaled( n=1E4, location=m0, shape=sig, df=df0 )
#** fit data with df=10 degrees of freedom
fit1 <- mdmb::fit_t_scaled(x=x, df=df0 )
#** compare with fit from normal distribution
fit2 <- mdmb::fit_t_scaled(x=x, df=Inf )  # df=Inf is the default

#-- some comparisons
coef(fit1)
summary(fit1)
logLik(fit1)
AIC(fit1)
AIC(fit2)

#############################################################################
# EXAMPLE 4: Simulation and fitting of scaled t distribution with
#            Yeo-Johnson transformation
#############################################################################

# define parameters of transformed scaled t distribution
m0 <- .5
sig <- 1.5
lam <- .5

# evaluate density
x <- seq( -5, 5, len=100 )
y <- mdmb::dyjt_scaled( x, location=m0, shape=sig, lambda=lam )
graphics::plot( x, y, type="l")

# transform original values
mdmb::yj_trafo( y=x, lambda=lam )

#** simulate data
set.seed(987)
x <- mdmb::ryjt_scaled(n=3000, location=m0, shape=sig, lambda=lam )
graphics::hist(x, breaks=30)

#*** Model 1: Fit data with lambda to be estimated
fit1 <- mdmb::fit_yjt_scaled(x=x)
summary(fit1)
coef(fit1)

#*** Model 2: Fit data with lambda fixed to simulated lambda
fit2 <- mdmb::fit_yjt_scaled(x=x, lambda_fixed=lam)
summary(fit2)
coef(fit2)

#*** Model 3: Fit data with lambda fixed to 1
fit3 <- mdmb::fit_yjt_scaled(x=x, lambda_fixed=1)

#-- compare log-likelihood values
logLik(fit1)
logLik(fit2)
logLik(fit3)

#############################################################################
# EXAMPLE 5: Approximating the chi square distribution
#            with yjt and bct distribution
#############################################################################

#-- simulate data
set.seed(987)
n <- 3000
df0 <- 5
x <- stats::rchisq( n=n, df=df0 )

#-- plot data
graphics::hist(x, breaks=30)

#-- fit data with yjt distribution
fit1 <- mdmb::fit_yjt_scaled(x)
summary(fit1)
c1 <- coef(fit1)

#-- fit data with bct distribution
fit2 <- mdmb::fit_bct_scaled(x)
summary(fit2)
c2 <- coef(fit2)
# compare log-likelihood values
logLik(fit1)
logLik(fit2)

#-- plot chi square distribution and approximating yjt distribution
y <- seq( .01, 3*df0, len=100 )
dy <- stats::dchisq( y, df=df0 )
graphics::plot( y, dy, type="l", ylim=c(0, max(dy) )*1.1 )
# approximation with scaled t distribution and Yeo-Johnson transformation
graphics::lines( y, mdmb::dyjt_scaled(y, location=c1[1], shape=c1[2], lambda=c1[3]),
                     lty=2)
# approximation with scaled t distribution and Box-Cox transformation
graphocs::lines( y, mdmb::dbct_scaled(y, location=c2[1], shape=c2[2], lambda=c2[3]),
                     lty=3)
# appoximating normal distribution
graphics::lines( y, stats::dnorm( y, mean=df0, sd=sqrt(2*df0) ), lty=4)
graphics::legend( .6*max(y), .9*max(dy), c("chi square", "yjt", "bct", "norm"),
                     lty=1:4)

#############################################################################
# EXAMPLE 6: Bounded variable on (0,1) with Probit Yeo-Johnson transformation
#############################################################################

set.seed(876)
n <- 1000
x <- stats::rnorm(n)
y <- stats::pnorm( 1*x + stats::rnorm(n, sd=sqrt(.5) ) )
dat <- data.frame( y=y, x=x )

#*** fit Probit Yeo-Johnson distribution
mod1 <- mdmb::fit_yjt_scaled(x=y, probit=TRUE)
summary(mod1)

#*** estimation using regression model
mod2 <- mdmb::yjt_regression( y ~ x, data=dat, probit=TRUE )
summary(mod2)
}