syn_da.Rd
This function generates synthetic data using data augmentation
(Jiang et al., 2022; Grund et al., 2022). Continuous
and ordinal variables are supported. The order in which variables are synthesized
can be set with the argument syn_vars.
syn_da(dat, syn_vars=NULL, fix_vars=NULL, ord_vars=NULL, da_noise=0.5,
    use_pls=TRUE, ncomp=20, exact_regression=TRUE, exact_marginal=TRUE,
    imp_maxit=5)
Argument | Description |
---|---|
dat | Original dataset |
syn_vars | Vector with variable names that should be synthesized |
fix_vars | Vector with variable names that are held fixed in the synthesis |
ord_vars | Vector with ordinal variables that are treated as factors when modeled as predictors in the regression model |
da_noise | Proportion of variance (i.e., unreliability) that is added as noise in data augmentation. Can be a single number applied to all variables or a vector with variable-specific values. |
use_pls | Logical indicating whether partial least squares (PLS) should be used for dimension reduction |
ncomp | Number of PLS factors |
exact_regression | Logical indicating whether residuals are forced to be uncorrelated with predictors in the synthesis model |
exact_marginal | Logical indicating whether marginal distributions of the variables should be preserved |
imp_maxit | Number of iterations in the imputation if the original dataset contains missing values |
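The arguments da_noise, exact_regression and exact_marginal can be illustrated with a small base R sketch on simulated data. This is not the code used inside syn_da. It assumes that da_noise is the share of the noisy variable's variance that comes from the added noise, that exact_regression corresponds to residualizing simulated residuals on the predictors, and that exact_marginal corresponds to rank matching against the original values. All object names (x, y, y_noisy, e_adj, y_syn_marg) are hypothetical.

```r
set.seed(1)
n <- 500
x <- rnorm(n)                       # predictor that is held fixed
y <- 0.6 * x + rnorm(n, sd = 0.8)   # variable to be synthesized

# (1) da_noise (assumed interpretation): the added noise accounts for a
#     share `da_noise` of the variance of the noisy variable
da_noise <- 0.5
noise_sd <- sqrt(da_noise / (1 - da_noise) * var(y))
y_noisy  <- y + rnorm(n, sd = noise_sd)

# (2) exact_regression (assumed mechanism): draw residuals, then remove
#     their sample correlation with the predictor by residualizing on it
fit   <- lm(y ~ x)
e_raw <- rnorm(n, sd = summary(fit)$sigma)
e_adj <- resid(lm(e_raw ~ x))       # orthogonal to x in the sample
y_syn <- fitted(fit) + e_adj

# (3) exact_marginal (assumed mechanism): replace each synthetic value by
#     the original value with the same rank, so the set of values is unchanged
y_syn_marg <- sort(y)[rank(y_syn, ties.method = "first")]
```

In this sketch, cor(e_adj, x) is zero up to numerical precision, and y_syn_marg contains exactly the values of y in a different order. The three steps are shown independently of each other; applying the rank matching in step (3) changes the values produced in step (2).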
A list with entries:
dat_syn: generated synthetic data
a data frame containing original and synthetic data
further entries
Grund, S., Luedtke, O., & Robitzsch, A. (2022). Using synthetic data to improve the reproducibility of statistical results in psychological research. Psychological Methods. Epub ahead of print. doi: 10.1037/met0000526
Jiang, B., Raftery, A. E., Steele, R. J., & Wang, N. (2022). Balancing inferential integrity and disclosure risk via model targeted masking and multiple imputation. Journal of the American Statistical Association, 117(537), 52-66. doi: 10.1080/01621459.2021.1909597
if (FALSE) {
#############################################################################
# EXAMPLE 1: Generate synthetic data with item responses and covariates
#############################################################################

data(data.ma09, package="miceadds")
dat <- data.ma09

# item response variables
# (assumption: `items` is not defined in the original example; here the item
#  columns are taken to be those whose names start with "M" followed by a digit)
items <- grep("^M[0-9]", colnames(dat), value=TRUE)

# fixed variables in synthesis
fix_vars <- c("PV1MATH", "SEX", "AGE")
# ordinal variables in synthesis
ord_vars <- c("FISCED", "MISCED", items)
# variables that should be synthesized
syn_vars <- c("HISEI", "FISCED", "MISCED", items)

#-- synthesize data (the original call passed the undefined object dat0)
mod <- miceadds::syn_da( dat=dat, syn_vars=syn_vars, fix_vars=fix_vars,
            ord_vars=ord_vars, da_noise=0.5, imp_maxit=2, use_pls=TRUE, ncomp=20,
            exact_regression=TRUE, exact_marginal=TRUE)

#- extract synthetic dataset
mod$dat_syn
}