yjt_dist.Rd
Collection of functions for the Yeo-Johnson transformation
(Yeo & Johnson, 2000) and the corresponding family of scaled
\(t\) distributions with and without Yeo-Johnson transformation
(see Details). The Yeo-Johnson transformation can also be applied to variables bounded
on \((0,1)\) by means of a probit transformation (see Details; argument probit).
The Box-Cox transformation (bc; Sakia, 1992)
can be applied to variables with positive values.
# Yeo-Johnson transformation and its inverse transformation
yj_trafo(y, lambda, use_rcpp=TRUE, probit=FALSE)
yj_antitrafo(y, lambda, probit=FALSE)

#---- scaled t distribution with Yeo-Johnson transformation
dyjt_scaled(x, location=0, shape=1, lambda=1, df=Inf, log=FALSE, probit=FALSE)
ryjt_scaled(n, location=0, shape=1, lambda=1, df=Inf, probit=FALSE)

fit_yjt_scaled(x, df=Inf, par_init=NULL, lambda_fixed=NULL, weights=NULL,
    probit=FALSE)

# S3 method for fit_yjt_scaled
coef(object, ...)
# S3 method for fit_yjt_scaled
logLik(object, ...)
# S3 method for fit_yjt_scaled
summary(object, digits=4, file=NULL, ...)
# S3 method for fit_yjt_scaled
vcov(object, ...)

# Box-Cox transformation and its inverse transformation
bc_trafo(y, lambda)
bc_antitrafo(y, lambda)

#---- scaled t distribution with Box-Cox transformation
dbct_scaled(x, location=0, shape=1, lambda=1, df=Inf, log=FALSE, check_zero=TRUE)
rbct_scaled(n, location=0, shape=1, lambda=1, df=Inf)

fit_bct_scaled(x, df=Inf, par_init=NULL, lambda_fixed=NULL, weights=NULL)

# S3 method for fit_bct_scaled
coef(object, ...)
# S3 method for fit_bct_scaled
logLik(object, ...)
# S3 method for fit_bct_scaled
summary(object, digits=4, file=NULL, ...)
# S3 method for fit_bct_scaled
vcov(object, ...)

#---- scaled t distribution
dt_scaled(x, location=0, shape=1, df=Inf, log=FALSE)
rt_scaled(n, location=0, shape=1, df=Inf)

fit_t_scaled(x, df=Inf, par_init=NULL, weights=NULL)

# S3 method for fit_t_scaled
coef(object, ...)
# S3 method for fit_t_scaled
logLik(object, ...)
# S3 method for fit_t_scaled
summary(object, digits=4, file=NULL, ...)
# S3 method for fit_t_scaled
vcov(object, ...)
Argument | Description |
---|---|
y | Numeric vector |
lambda | Transformation parameter \(\lambda\) for Yeo-Johnson or Box-Cox transformation |
use_rcpp | Logical indicating whether Rcpp package should be used |
probit | Logical indicating whether probit transformation should be applied for bounded variables on \((0,1)\) |
x | Numeric vector |
location | Location parameter of (transformed) scaled \(t\) distribution |
shape | Shape parameter of (transformed) scaled \(t\) distribution |
df | Degrees of freedom of (transformed) scaled \(t\) distribution |
log | Logical indicating whether logarithm of the density should be computed |
check_zero | Logical indicating whether check for inadmissible values should be conducted |
n | Number of observations to be simulated |
par_init | Optional vector of initial parameters |
lambda_fixed | Optional value for fixed \(\lambda\) parameter |
weights | Optional vector of sampling weights |
object | Object of class fit_yjt_scaled, fit_bct_scaled or fit_t_scaled |
digits | Number of digits used for rounding in summary |
file | File name for the summary output |
... | Further arguments to be passed |
Let \(g_\lambda\) be the Yeo-Johnson transformation. A random variable \(X\)
is distributed as scaled \(t\) with Yeo-Johnson transformation with location
\(\mu\), scale \(\sigma\) (argument shape) and transformation parameter \(\lambda\)
iff \(X=g_\lambda ( \mu + \sigma Z ) \) and \(Z\) is \(t\) distributed
with df degrees of freedom.
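The piecewise form of \(g_\lambda\) is not spelled out here. The following base-R sketch writes it down as defined in Yeo and Johnson (2000), together with its inverse. The names yj_sketch and yj_inv_sketch are hypothetical helpers introduced only for illustration; they are not the package functions yj_trafo and yj_antitrafo, and the sketch makes no claim about how those are implemented.

```r
# Sketch of the Yeo-Johnson transformation g_lambda (Yeo & Johnson, 2000).
# yj_sketch() and yj_inv_sketch() are hypothetical helper names (assumption).
yj_sketch <- function(y, lambda, eps = 1e-10) {
  z <- numeric(length(y))
  pos <- y >= 0
  # non-negative values: power transformation of y + 1 (log for lambda = 0)
  z[pos] <- if (abs(lambda) > eps) {
    ((y[pos] + 1)^lambda - 1) / lambda
  } else {
    log(y[pos] + 1)
  }
  # negative values: mirrored power transformation with exponent 2 - lambda
  z[!pos] <- if (abs(lambda - 2) > eps) {
    -((1 - y[!pos])^(2 - lambda) - 1) / (2 - lambda)
  } else {
    -log(1 - y[!pos])
  }
  z
}

# Inverse transformation; the sign of z decides which branch applies
yj_inv_sketch <- function(z, lambda, eps = 1e-10) {
  y <- numeric(length(z))
  pos <- z >= 0
  y[pos] <- if (abs(lambda) > eps) {
    (lambda * z[pos] + 1)^(1 / lambda) - 1
  } else {
    exp(z[pos]) - 1
  }
  y[!pos] <- if (abs(lambda - 2) > eps) {
    1 - (1 - (2 - lambda) * z[!pos])^(1 / (2 - lambda))
  } else {
    1 - exp(-z[!pos])
  }
  y
}

y <- seq(-3, 3, length.out = 13)
z <- yj_sketch(y, lambda = 0.5)
# round-trip error of transformation followed by inverse transformation
max(abs(yj_inv_sketch(z, lambda = 0.5) - y))
```

The transformation preserves the sign of its argument, which is why the inverse can select its branch from the sign of the transformed value.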
For a bounded variable \(X\) on \((0,1)\), the probit transformation \(\Phi\) is applied such that \(X=\Phi( g_\lambda ( \mu + \sigma Z ) ) \) with a \(t\) distributed variable \(Z\).
For \(\lambda=1\), the Yeo-Johnson transformation is the identity, so a Yeo-Johnson normally distributed variable is normally distributed. A Box-Cox normally distributed variable is likewise normally distributed for \(\lambda=1\).
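The normal special case can be checked numerically. The sketch below assumes that the scaled \(t\) density is the standard \(t\) density evaluated at the standardized value and divided by the scale, and that the Box-Cox transformation has its classical form; both dt_scaled_sketch and bc_sketch are hypothetical helpers, not the package functions dt_scaled and bc_trafo.

```r
# Scaled t density under the assumed location-scale form;
# dt_scaled_sketch() is a hypothetical helper (assumption).
dt_scaled_sketch <- function(x, location = 0, shape = 1, df = Inf) {
  stats::dt((x - location) / shape, df = df) / shape
}

# Classical Box-Cox transformation for positive y;
# bc_sketch() is a hypothetical helper (assumption).
bc_sketch <- function(y, lambda) {
  if (abs(lambda) > 1e-10) (y^lambda - 1) / lambda else log(y)
}

x <- seq(-3, 3, length.out = 50)
# df = Inf: the scaled t density is compared with the normal density
d_t <- dt_scaled_sketch(x, location = 0.3, shape = 1.5, df = Inf)
d_n <- stats::dnorm(x, mean = 0.3, sd = 1.5)
max(abs(d_t - d_n))

# lambda = 1: the Box-Cox transformation is compared with a shift by one
y <- c(0.5, 1, 2, 4)
max(abs(bc_sketch(y, lambda = 1) - (y - 1)))
```

Under these assumptions, a linear transformation at \(\lambda=1\) combined with df=Inf leaves a normal variable normal.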
A vector or a fitted distribution object, depending on the function called.
Sakia, S. M. (1992). The Box-Cox transformation technique: A review. The Statistician, 41(2), 169-178. doi: 10.2307/2348250
Yeo, I.-K., & Johnson, R. (2000). A new family of power transformations to improve normality or symmetry. Biometrika, 87(4), 954-959. doi: 10.1093/biomet/87.4.954
See yjt_regression for fitting a regression model in which
the response variable is distributed according to the scaled \(t\)
distribution with Yeo-Johnson transformation.
See car::yjPower for the Yeo-Johnson
transformation in the car package. See car::bcPower for the
Box-Cox transformation.
The scaled \(t\) distribution can also be found in
metRology::dt.scaled (metRology package).
See stats::dt for the \(t\) distribution.
See the fitdistrplus package or the general stats4::mle function
for fitting several distributions in R.
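As a rough illustration of the general stats4::mle route, the sketch below fits a location-scale \(t\) distribution with fixed degrees of freedom by maximum likelihood. It assumes the scaled \(t\) density is stats::dt at the standardized value divided by the scale; the function nll, the log-scale parameterization and the simulated data are choices made for this sketch and are not taken from fit_t_scaled.

```r
library(stats4)

# simulate location-scale t data (values chosen only for this sketch)
set.seed(1)
df0 <- 10
x <- 0.3 + 1.5 * stats::rt(5000, df = df0)

# negative log-likelihood; the scale is estimated on the log scale
# so that the optimizer works on an unconstrained parameter
nll <- function(location = 0, log_shape = 0) {
  shape <- exp(log_shape)
  -sum(stats::dt((x - location) / shape, df = df0, log = TRUE) - log_shape)
}

fit <- stats4::mle(nll,
    start = list(location = mean(x), log_shape = log(stats::sd(x))))
est <- coef(fit)
est["location"]        # estimated location
exp(est["log_shape"])  # estimated scale
```

Estimating the logarithm of the scale avoids inadmissible negative values during optimization; the scale itself is recovered by exponentiating.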
#############################################################################
# EXAMPLE 1: Transforming values according to Yeo-Johnson transformation
#############################################################################

# vector of y values
y <- seq(-3,3, len=100)

# non-negative lambda values
plot( y, mdmb::yj_trafo( y, lambda=1 ), type="l", ylim=8*c(-1,1),
           ylab=expression( g[lambda] (y) ) )
lines( y, mdmb::yj_trafo( y, lambda=2 ), lty=2 )
lines( y, mdmb::yj_trafo( y, lambda=.5 ), lty=3 )
lines( y, mdmb::yj_trafo( y, lambda=0 ), lty=4 )

# non-positive lambda values
plot( y, mdmb::yj_trafo( y, lambda=-1 ), type="l", ylim=8*c(-1,1),
           ylab=expression(g[lambda] (y) ) )
lines( y, mdmb::yj_trafo( y, lambda=-2 ), lty=2 )
lines( y, mdmb::yj_trafo( y, lambda=-.5 ), lty=3 )
lines( y, mdmb::yj_trafo( y, lambda=0 ), lty=4 )

if (FALSE) {
#############################################################################
# EXAMPLE 2: Density of scaled t distribution
#############################################################################

# define location and scale parameter
m0 <- 0.3
sig <- 1.5
#-- compare density of scaled t distribution with large degrees of freedom
#   with normal distribution
y1 <- mdmb::dt_scaled( y, location=m0, shape=sig, df=100 )
y2 <- stats::dnorm( y, mean=m0, sd=sig )
max(abs(y1-y2))

#############################################################################
# EXAMPLE 3: Simulating and fitting the scaled t distribution
#############################################################################

#-- simulate data with 10 degrees of freedom
set.seed(987)
df0 <- 10    # define degrees of freedom
x <- mdmb::rt_scaled( n=1E4, location=m0, shape=sig, df=df0 )
#** fit data with df=10 degrees of freedom
fit1 <- mdmb::fit_t_scaled(x=x, df=df0 )
#** compare with fit from normal distribution
fit2 <- mdmb::fit_t_scaled(x=x, df=Inf )  # df=Inf is the default

#-- some comparisons
coef(fit1)
summary(fit1)
logLik(fit1)
AIC(fit1)
AIC(fit2)

#############################################################################
# EXAMPLE 4: Simulation and fitting of scaled t distribution with
#            Yeo-Johnson transformation
#############################################################################

# define parameters of transformed scaled t distribution
m0 <- .5
sig <- 1.5
lam <- .5

# evaluate density
x <- seq( -5, 5, len=100 )
y <- mdmb::dyjt_scaled( x, location=m0, shape=sig, lambda=lam )
graphics::plot( x, y, type="l")

# transform original values
mdmb::yj_trafo( y=x, lambda=lam )

#** simulate data
set.seed(987)
x <- mdmb::ryjt_scaled(n=3000, location=m0, shape=sig, lambda=lam )
graphics::hist(x, breaks=30)

#*** Model 1: Fit data with lambda to be estimated
fit1 <- mdmb::fit_yjt_scaled(x=x)
summary(fit1)
coef(fit1)

#*** Model 2: Fit data with lambda fixed to simulated lambda
fit2 <- mdmb::fit_yjt_scaled(x=x, lambda_fixed=lam)
summary(fit2)
coef(fit2)

#*** Model 3: Fit data with lambda fixed to 1
fit3 <- mdmb::fit_yjt_scaled(x=x, lambda_fixed=1)

#-- compare log-likelihood values
logLik(fit1)
logLik(fit2)
logLik(fit3)

#############################################################################
# EXAMPLE 5: Approximating the chi square distribution
#            with yjt and bct distribution
#############################################################################

#-- simulate data
set.seed(987)
n <- 3000
df0 <- 5
x <- stats::rchisq( n=n, df=df0 )

#-- plot data
graphics::hist(x, breaks=30)

#-- fit data with yjt distribution
fit1 <- mdmb::fit_yjt_scaled(x)
summary(fit1)
c1 <- coef(fit1)

#-- fit data with bct distribution
fit2 <- mdmb::fit_bct_scaled(x)
summary(fit2)
c2 <- coef(fit2)
# compare log-likelihood values
logLik(fit1)
logLik(fit2)

#-- plot chi square distribution and approximating yjt distribution
y <- seq( .01, 3*df0, len=100 )
dy <- stats::dchisq( y, df=df0 )
graphics::plot( y, dy, type="l", ylim=c(0, max(dy) )*1.1 )
# approximation with scaled t distribution and Yeo-Johnson transformation
graphics::lines( y, mdmb::dyjt_scaled(y, location=c1[1], shape=c1[2], lambda=c1[3]),
                    lty=2)
# approximation with scaled t distribution and Box-Cox transformation
graphics::lines( y, mdmb::dbct_scaled(y, location=c2[1], shape=c2[2], lambda=c2[3]),
                    lty=3)
# approximating normal distribution
graphics::lines( y, stats::dnorm( y, mean=df0, sd=sqrt(2*df0) ), lty=4)
graphics::legend( .6*max(y), .9*max(dy), c("chi square", "yjt", "bct", "norm"),
                    lty=1:4)

#############################################################################
# EXAMPLE 6: Bounded variable on (0,1) with Probit Yeo-Johnson transformation
#############################################################################

set.seed(876)
n <- 1000
x <- stats::rnorm(n)
y <- stats::pnorm( 1*x + stats::rnorm(n, sd=sqrt(.5) ) )
dat <- data.frame( y=y, x=x )

#*** fit Probit Yeo-Johnson distribution
mod1 <- mdmb::fit_yjt_scaled(x=y, probit=TRUE)
summary(mod1)

#*** estimation using regression model
mod2 <- mdmb::yjt_regression( y ~ x, data=dat, probit=TRUE )
summary(mod2)
}