Collection of functions for the Yeo-Johnson transformation (Yeo & Johnson, 2000) and for the scaled \(t\) distribution with and without Yeo-Johnson transformation (see Details). The Yeo-Johnson transformation can also be applied to variables bounded on \((0,1)\), in which case a probit transformation is used (see Details; argument probit).

The Box-Cox transformation (bc; Sakia, 1992) can be applied to variables with positive values.

# Yeo-Johnson transformation and its inverse transformation
yj_trafo(y, lambda, use_rcpp=TRUE, probit=FALSE)
yj_antitrafo(y, lambda, probit=FALSE)

#---- scaled t distribution with Yeo-Johnson transformation
dyjt_scaled(x, location=0, shape=1, lambda=1, df=Inf, log=FALSE, probit=FALSE)
ryjt_scaled(n, location=0, shape=1, lambda=1, df=Inf, probit=FALSE)

fit_yjt_scaled(x, df=Inf, par_init=NULL, lambda_fixed=NULL, weights=NULL, probit=FALSE)
# S3 method for fit_yjt_scaled
coef(object, ...)
# S3 method for fit_yjt_scaled
logLik(object, ...)
# S3 method for fit_yjt_scaled
summary(object, digits=4, file=NULL, ...)
# S3 method for fit_yjt_scaled
vcov(object, ...)

# Box-Cox transformation and its inverse transformation
bc_trafo(y, lambda)
bc_antitrafo(y, lambda)

#---- scaled t distribution with Box-Cox transformation
dbct_scaled(x, location=0, shape=1, lambda=1, df=Inf, log=FALSE, check_zero=TRUE)
rbct_scaled(n, location=0, shape=1, lambda=1, df=Inf)

fit_bct_scaled(x, df=Inf, par_init=NULL, lambda_fixed=NULL, weights=NULL)
# S3 method for fit_bct_scaled
coef(object, ...)
# S3 method for fit_bct_scaled
logLik(object, ...)
# S3 method for fit_bct_scaled
summary(object, digits=4, file=NULL, ...)
# S3 method for fit_bct_scaled
vcov(object, ...)

#---- scaled t distribution
dt_scaled(x, location=0, shape=1, df=Inf, log=FALSE)
rt_scaled(n, location=0, shape=1, df=Inf)

fit_t_scaled(x, df=Inf, par_init=NULL, weights=NULL)
# S3 method for fit_t_scaled
coef(object, ...)
# S3 method for fit_t_scaled
logLik(object, ...)
# S3 method for fit_t_scaled
summary(object, digits=4, file=NULL, ...)
# S3 method for fit_t_scaled
vcov(object, ...)

Arguments

y

Numeric vector

lambda

Transformation parameter \(\lambda\) for the Yeo-Johnson or Box-Cox transformation

use_rcpp

Logical indicating whether Rcpp package should be used

probit

Logical indicating whether probit transformation should be applied for bounded variables on \((0,1)\)

x

Numeric vector

location

Location parameter of (transformed) scaled \(t\) distribution

shape

Shape parameter of (transformed) scaled \(t\) distribution (the scale \(\sigma\) in Details)

df

Degrees of freedom of (transformed) scaled \(t\) distribution

log

Logical indicating whether logarithm of the density should be computed

check_zero

Logical indicating whether check for inadmissible values should be conducted

n

Number of observations to be simulated

par_init

Optional vector of initial parameters

lambda_fixed

Optional value for fixed \(\lambda\) parameter

weights

Optional vector of sampling weights

object

Object of class fit_yjt_scaled, fit_bct_scaled or fit_t_scaled

digits

Number of digits used for rounding in summary

file

File name for the summary to be sunk into

...

Further arguments to be passed
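
The roles of location, shape and df can be illustrated with a minimal base-R sketch of a scaled \(t\) density. It assumes the usual location-scale definition (a standard \(t\) density evaluated at the standardized value and divided by the scale). The helper name dt_scaled_sketch is hypothetical and is not the mdmb implementation.

```r
# Sketch of a scaled t density under the usual location-scale definition.
# dt_scaled_sketch is a hypothetical helper, not mdmb::dt_scaled.
dt_scaled_sketch <- function(x, location = 0, shape = 1, df = Inf, log = FALSE) {
  z <- (x - location) / shape
  # log density of standard t at z, corrected by the Jacobian 1/shape
  logdens <- stats::dt(z, df = df, log = TRUE) - base::log(shape)
  if (log) logdens else exp(logdens)
}

y <- seq(-3, 3, length.out = 100)
# df=Inf reduces to the normal density with mean=location and sd=shape
d_inf <- dt_scaled_sketch(y, location = 0.3, shape = 1.5, df = Inf)
d_norm <- stats::dnorm(y, mean = 0.3, sd = 1.5)
max(abs(d_inf - d_norm))

# finite df: heavier tails, but the density still integrates to one
stats::integrate(dt_scaled_sketch, -Inf, Inf, location = 0.3, shape = 1.5, df = 10)
```

With df=Inf, stats::dt returns the standard normal density, which is why the sketch coincides with stats::dnorm in that case.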

Details

Let \(g_\lambda\) be the Yeo-Johnson transformation. A random variable \(X\) is distributed as scaled \(t\) with Yeo-Johnson transformation with location \(\mu\), scale \(\sigma\) and transformation parameter \(\lambda\) iff \(X=g_\lambda ( \mu + \sigma Z ) \) and \(Z\) is \(t\) distributed with df degrees of freedom.
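
For orientation, the transformation \(g_\lambda\) and its inverse can be sketched in base R following the piecewise definition in Yeo and Johnson (2000). The helper names yj_sketch and yj_inv_sketch are hypothetical. They are not the mdmb functions yj_trafo and yj_antitrafo, and they ignore the use_rcpp and probit arguments.

```r
# Sketch of the Yeo-Johnson transformation (hypothetical helper, not mdmb code)
yj_sketch <- function(y, lambda) {
  z <- numeric(length(y))
  pos <- y >= 0
  # branch for y >= 0: ((y+1)^lambda - 1)/lambda, or log(y+1) if lambda=0
  if (abs(lambda) > 1e-10) {
    z[pos] <- ((y[pos] + 1)^lambda - 1) / lambda
  } else {
    z[pos] <- log(y[pos] + 1)
  }
  # branch for y < 0: -((1-y)^(2-lambda) - 1)/(2-lambda), or -log(1-y) if lambda=2
  if (abs(lambda - 2) > 1e-10) {
    z[!pos] <- -((1 - y[!pos])^(2 - lambda) - 1) / (2 - lambda)
  } else {
    z[!pos] <- -log(1 - y[!pos])
  }
  z
}

# Sketch of the inverse transformation (the sign of z equals the sign of y)
yj_inv_sketch <- function(z, lambda) {
  y <- numeric(length(z))
  pos <- z >= 0
  if (abs(lambda) > 1e-10) {
    y[pos] <- (lambda * z[pos] + 1)^(1 / lambda) - 1
  } else {
    y[pos] <- exp(z[pos]) - 1
  }
  if (abs(lambda - 2) > 1e-10) {
    y[!pos] <- 1 - (1 - (2 - lambda) * z[!pos])^(1 / (2 - lambda))
  } else {
    y[!pos] <- 1 - exp(-z[!pos])
  }
  y
}

# round trip on a grid for several lambda values
y <- seq(-3, 3, length.out = 13)
for (lam in c(-1, 0, 0.5, 1, 2)) {
  z <- yj_sketch(y, lambda = lam)
  stopifnot(isTRUE(all.equal(yj_inv_sketch(z, lambda = lam), y)))
}
```

The transformation preserves the sign of its argument, which is why the inverse can select its branch from the sign of the transformed value.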

For a bounded variable \(X\) on \((0,1)\), the probit transformation \(\Phi\) is applied such that \(X=\Phi( g_\lambda ( \mu + \sigma Z ) ) \) with a \(t\) distributed variable \(Z\).
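
As a minimal sketch, the formula above can be read literally in base R to simulate a bounded variable. The helper yj_sketch is a hypothetical stand-in for the Yeo-Johnson transformation and the parameter values are arbitrary. How mdmb composes the probit step internally when probit=TRUE is not verified here.

```r
# Hypothetical stand-in for the Yeo-Johnson transformation (not mdmb code)
yj_sketch <- function(y, lambda) {
  pos <- y >= 0
  z <- y
  z[pos] <- if (abs(lambda) > 1e-10) {
    ((y[pos] + 1)^lambda - 1) / lambda
  } else {
    log1p(y[pos])
  }
  z[!pos] <- if (abs(lambda - 2) > 1e-10) {
    -((1 - y[!pos])^(2 - lambda) - 1) / (2 - lambda)
  } else {
    -log1p(-y[!pos])
  }
  z
}

# arbitrary illustration values for mu, sigma, lambda and df
mu <- 0.3
sigma <- 1.5
lambda <- 0.5
df <- 10

set.seed(1)
z <- stats::rt(1000, df = df)                  # t distributed variable Z
u <- yj_sketch(mu + sigma * z, lambda = lambda)  # g_lambda( mu + sigma*Z )
x <- stats::pnorm(u)                           # Phi maps the real line to (0,1)

range(x)
graphics::hist(x, breaks = 30)
```

Because \(\Phi\) is the standard normal distribution function, the simulated values lie in the unit interval regardless of the chosen parameters.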

For \(\lambda=1\), the Yeo-Johnson transformation is the identity, so a Yeo-Johnson normally distributed variable is normally distributed. For a Box-Cox normally distributed variable, a normally distributed variable likewise results for \(\lambda=1\).
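
The case \(\lambda=1\) can be checked numerically with a short base-R sketch. The helpers yj_sketch and bc_sketch are hypothetical and follow the standard definitions of the two transformations. They are not the mdmb functions yj_trafo and bc_trafo.

```r
# Hypothetical helpers following the standard definitions (not mdmb code)
yj_sketch <- function(y, lambda) {
  pos <- y >= 0
  z <- y
  z[pos] <- if (abs(lambda) > 1e-10) {
    ((y[pos] + 1)^lambda - 1) / lambda
  } else {
    log1p(y[pos])
  }
  z[!pos] <- if (abs(lambda - 2) > 1e-10) {
    -((1 - y[!pos])^(2 - lambda) - 1) / (2 - lambda)
  } else {
    -log1p(-y[!pos])
  }
  z
}

# Box-Cox transformation for positive y: (y^lambda - 1)/lambda, log(y) if lambda=0
bc_sketch <- function(y, lambda) {
  if (abs(lambda) > 1e-10) (y^lambda - 1) / lambda else log(y)
}

y <- seq(-3, 3, length.out = 7)
y_pos <- seq(0.5, 3, length.out = 6)

# Yeo-Johnson with lambda=1 leaves the values unchanged (up to rounding)
max(abs(yj_sketch(y, lambda = 1) - y))

# Box-Cox with lambda=1 shifts the values by one (up to rounding)
max(abs(bc_sketch(y_pos, lambda = 1) - (y_pos - 1)))
```

Both differences are at the level of floating-point rounding. A shift by a constant does not change the distributional family, so normality is preserved in the Box-Cox case as well.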

Value

Vector or an object of fitted distribution depending on the called function

References

Sakia, S. M. (1992). The Box-Cox transformation technique: A review. The Statistician, 41(2), 169-178. doi: 10.2307/2348250

Yeo, I.-K., & Johnson, R. (2000). A new family of power transformations to improve normality or symmetry. Biometrika, 87(4), 954-959. doi: 10.1093/biomet/87.4.954

See also

See yjt_regression for fitting a regression model in which the response variable is distributed according to the scaled \(t\) distribution with Yeo-Johnson transformation.

See car::yjPower for fitting the Yeo-Johnson transformation in the car package. See car::bcPower for the Box-Cox transformation.

The scaled \(t\) distribution can also be found in metRology::dt.scaled (metRology package).

See stats::dt for the \(t\) distribution.

See the fitdistrplus package or the general stats4::mle function for fitting several distributions in R.

Examples

#############################################################################
# EXAMPLE 1: Transforming values according to Yeo-Johnson transformation
#############################################################################

# vector of y values
y <- seq(-3,3, len=100)

# non-negative lambda values
plot( y, mdmb::yj_trafo( y, lambda=1 ), type="l", ylim=8*c(-1,1),
           ylab=expression( g[lambda] (y) ) )
lines( y, mdmb::yj_trafo( y, lambda=2 ), lty=2 )
lines( y, mdmb::yj_trafo( y, lambda=.5 ), lty=3 )
lines( y, mdmb::yj_trafo( y, lambda=0 ), lty=4 )

# non-positive lambda values
plot( y, mdmb::yj_trafo( y, lambda=-1 ), type="l", ylim=8*c(-1,1),
           ylab=expression(g[lambda] (y) ) )
lines( y, mdmb::yj_trafo( y, lambda=-2 ), lty=2 )
lines( y, mdmb::yj_trafo( y, lambda=-.5 ), lty=3 )
lines( y, mdmb::yj_trafo( y, lambda=0 ), lty=4 )

if (FALSE) {
#############################################################################
# EXAMPLE 2: Density of scaled t distribution
#############################################################################

# define location and scale parameters
m0 <- 0.3
sig <- 1.5
#-- compare density of scaled t distribution with large degrees of freedom
#   with normal distribution
y1 <- mdmb::dt_scaled( y, location=m0, shape=sig, df=100 )
y2 <- stats::dnorm( y, mean=m0, sd=sig )
max(abs(y1-y2))

#############################################################################
# EXAMPLE 3: Simulating and fitting the scaled t distribution
#############################################################################

#-- simulate data with 10 degrees of freedom
set.seed(987)
df0 <- 10    # define degrees of freedom
x <- mdmb::rt_scaled( n=1E4, location=m0, shape=sig, df=df0 )
#** fit data with df=10 degrees of freedom
fit1 <- mdmb::fit_t_scaled(x=x, df=df0 )
#** compare with fit from normal distribution
fit2 <- mdmb::fit_t_scaled(x=x, df=Inf )  # df=Inf is the default

#-- some comparisons
coef(fit1)
summary(fit1)
logLik(fit1)
AIC(fit1)
AIC(fit2)

#############################################################################
# EXAMPLE 4: Simulation and fitting of scaled t distribution with
#            Yeo-Johnson transformation
#############################################################################

# define parameters of transformed scaled t distribution
m0 <- .5
sig <- 1.5
lam <- .5

# evaluate density
x <- seq( -5, 5, len=100 )
y <- mdmb::dyjt_scaled( x, location=m0, shape=sig, lambda=lam )
graphics::plot( x, y, type="l")

# transform original values
mdmb::yj_trafo( y=x, lambda=lam )

#** simulate data
set.seed(987)
x <- mdmb::ryjt_scaled(n=3000, location=m0, shape=sig, lambda=lam )
graphics::hist(x, breaks=30)

#*** Model 1: Fit data with lambda to be estimated
fit1 <- mdmb::fit_yjt_scaled(x=x)
summary(fit1)
coef(fit1)

#*** Model 2: Fit data with lambda fixed to simulated lambda
fit2 <- mdmb::fit_yjt_scaled(x=x, lambda_fixed=lam)
summary(fit2)
coef(fit2)

#*** Model 3: Fit data with lambda fixed to 1
fit3 <- mdmb::fit_yjt_scaled(x=x, lambda_fixed=1)

#-- compare log-likelihood values
logLik(fit1)
logLik(fit2)
logLik(fit3)

#############################################################################
# EXAMPLE 5: Approximating the chi square distribution
#            with yjt and bct distribution
#############################################################################

#-- simulate data
set.seed(987)
n <- 3000
df0 <- 5
x <- stats::rchisq( n=n, df=df0 )

#-- plot data
graphics::hist(x, breaks=30)

#-- fit data with yjt distribution
fit1 <- mdmb::fit_yjt_scaled(x)
summary(fit1)
c1 <- coef(fit1)

#-- fit data with bct distribution
fit2 <- mdmb::fit_bct_scaled(x)
summary(fit2)
c2 <- coef(fit2)
# compare log-likelihood values
logLik(fit1)
logLik(fit2)

#-- plot chi square distribution and approximating yjt distribution
y <- seq( .01, 3*df0, len=100 )
dy <- stats::dchisq( y, df=df0 )
graphics::plot( y, dy, type="l", ylim=c(0, max(dy) )*1.1 )
# approximation with scaled t distribution and Yeo-Johnson transformation
graphics::lines( y, mdmb::dyjt_scaled(y, location=c1[1], shape=c1[2], lambda=c1[3]),
                     lty=2)
# approximation with scaled t distribution and Box-Cox transformation
graphics::lines( y, mdmb::dbct_scaled(y, location=c2[1], shape=c2[2], lambda=c2[3]),
                     lty=3)
# approximating normal distribution
graphics::lines( y, stats::dnorm( y, mean=df0, sd=sqrt(2*df0) ), lty=4)
graphics::legend( .6*max(y), .9*max(dy), c("chi square", "yjt", "bct", "norm"),
                     lty=1:4)

#############################################################################
# EXAMPLE 6: Bounded variable on (0,1) with Probit Yeo-Johnson transformation
#############################################################################

set.seed(876)
n <- 1000
x <- stats::rnorm(n)
y <- stats::pnorm( 1*x + stats::rnorm(n, sd=sqrt(.5) ) )
dat <- data.frame( y=y, x=x )

#*** fit Probit Yeo-Johnson distribution
mod1 <- mdmb::fit_yjt_scaled(x=y, probit=TRUE)
summary(mod1)

#*** estimation using regression model
mod2 <- mdmb::yjt_regression( y ~ x, data=dat, probit=TRUE )
summary(mod2)
}