Title: | Generate and Modify Synthetic Datasets |
---|---|
Description: | Set of functions to create datasets using a correlation matrix. |
Authors: | Francis Huang <[email protected]> |
Maintainer: | Francis Huang <[email protected]> |
License: | GPL-3 |
Version: | 1.2.0 |
Built: | 2025-03-07 05:19:02 UTC |
Source: | https://github.com/cran/gendata |
Create synthetic datasets based on a correlation table. Additional functions can be used to rescale, transform, and reverse code variables.
Package: | gendata |
Type: | Package |
Version: | 1.1 |
Date: | 2012-02-27 |
License: | GPL-3 |
Additional functions are for modifying the dataset.
genmvnorm : for creating the dataset (generates a multivariate normal dataset)
recalib : for rescaling the dataset
dtrans : for giving a variable a new mean and standard deviation
revcode : for reverse coding a variable
Francis Huang
Maintainer: Francis Huang <[email protected]>
Fan, X., Felsovalyi, A., Sivo, S., & Keenan, S. (2002). SAS for Monte Carlo studies: A guide for quantitative researchers. SAS Institute.
genmvnorm revcode dtrans recalib
Transforms variables in a dataset with a specified mean and standard deviation.
dtrans(data, m, sd, rnd = FALSE)
data | The dataset to transform. |
m | A vector of desired means. |
sd | A vector of desired standard deviations. |
rnd | Indicates whether to round the numbers (no decimals). |
Francis Huang
sdata <- genmvnorm(cor = c(.7, .2, .3), k = 3, n = 500, seed = 12345)
cor(sdata)
summary(sdata) # note: data are in z scores
s2 <- dtrans(sdata, c(0, 100, 50), c(1, 15, 10), rnd = FALSE)
summary(s2)
sd(s2[,2])
sd(s2[,3])
# note: variables X2 and X3 are now rescaled with the appropriate means and standard deviations.
head(s2)
s2 <- dtrans(sdata, c(0, 100, 50), c(1, 15, 10), rnd = TRUE)
# at times, you may want a dataset to not have decimals. use rnd = TRUE.
head(s2)
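The idea behind giving a variable a new mean and standard deviation can be sketched in base R. This is an illustration only, not the gendata source: `rescale_mean_sd` is a hypothetical helper, and it standardizes the input first, which dtrans may or may not do internally.

```r
# Hypothetical helper (not part of gendata): linearly rescale a numeric
# vector so that it has mean `m` and standard deviation `s`.
rescale_mean_sd <- function(x, m, s) {
  z <- (x - mean(x)) / sd(x)  # standardize to mean 0, sd 1
  z * s + m                   # stretch to sd s, then shift to mean m
}

set.seed(1)
x <- rnorm(200)
y <- rescale_mean_sd(x, m = 100, s = 15)

c(mean = mean(y), sd = sd(y))  # mean 100 and sd 15, up to floating-point error
cor(x, y)                      # a linear rescaling leaves correlations intact
head(round(y))                 # whole numbers, analogous to rnd = TRUE
```

Because the transformation is linear with a positive slope, the correlation between the rescaled variable and any other variable is unchanged.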
Generates a multivariate normal dataset based on a specified correlation matrix.
genmvnorm(cor, k, n, seed = FALSE)
cor | A correlation matrix (e.g., data <- cor(xyz)) or the lower half of a correlation matrix (e.g., for a 3-variable dataset, data <- c(.7, .3, .2)). The lower-half form is useful for creating datasets without having to specify both halves of the correlation matrix. |
k | The number of variables in your dataset. |
n | The number of observations in your new synthetic dataset. |
seed | For reproducibility of results, set a specific seed number. |
For creating synthetic datasets. Based on the SAS chapter by Fan et al. (2002).
Francis Huang
Based on:
Fan, X., Felsovalyi, A., Sivo, S., & Keenan, S. (2002). SAS for Monte Carlo studies: A guide for quantitative researchers. SAS Institute.
sdata <- genmvnorm(cor = c(.7, .2, .3), k = 3, n = 500, seed = 12345)
cor(sdata)
# dataset above uses the lower half of a correlation table
# 1  .7 .2
# .7 1  .3
# .2 .3 1
# Can also use a correlation table
data(iris)
dat <- cor(iris[,1:3])
dat
sdata <- genmvnorm(cor = dat, k = 3, n = 100, seed = 123)
cor(sdata)
# example above uses the IRIS dataset.
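To make the lower-half input concrete, the following base R sketch expands the vector into a full correlation matrix and then draws correlated normal data with a Cholesky factor. It is a minimal illustration under stated assumptions, not the gendata implementation: `lower_to_cor` is a hypothetical helper, and the Cholesky approach is one standard way to generate such data rather than a confirmed description of genmvnorm's internals.

```r
# Hypothetical helper (not part of gendata): expand a lower-half vector
# into a full symmetric correlation matrix with 1s on the diagonal.
lower_to_cor <- function(lower, k) {
  stopifnot(length(lower) == k * (k - 1) / 2)
  m <- diag(k)
  m[lower.tri(m)] <- lower               # filled column by column
  m[upper.tri(m)] <- t(m)[upper.tri(m)]  # mirror into the upper half
  m
}

R <- lower_to_cor(c(.7, .2, .3), k = 3)
R
#      [,1] [,2] [,3]
# [1,]  1.0  0.7  0.2
# [2,]  0.7  1.0  0.3
# [3,]  0.2  0.3  1.0

set.seed(12345)
n <- 500
z <- matrix(rnorm(n * 3), nrow = n)  # independent standard normal columns
x <- z %*% chol(R)                   # chol(R) is upper triangular U with t(U) %*% U == R
round(cor(x), 2)                     # close to R, up to sampling error
```

The sample correlations only approximate the target matrix; the gap shrinks as `n` grows.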
Rescale variables (one at a time) to have a new minimum and maximum value.
recalib(data, var, low, high)
data | The dataset to use. |
var | The variable number (or variable name). |
low | The new minimum value. |
high | The new maximum value. |
Specify the rescaling of variables one at a time.
Francis Huang
sdata <- genmvnorm(cor = c(.7, .2, .3), k = 3, n = 500, seed = 12345)
cor(sdata)
summary(sdata[,1]) # note the min and max of variable X1
# changes variable one to have a minimum of 10 and a maximum of 50
# correlations remain the same
s2 <- recalib(sdata, 1, 10, 50)
cor(s2)
summary(s2[,1]) # note revised values of variable X1
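Rescaling to a new minimum and maximum is a linear (min-max) transformation, which is why the correlations stay the same. The base R sketch below illustrates this; `rescale_range` is a hypothetical helper written for this example, and the formula is the usual min-max one, assumed rather than taken from the recalib source.

```r
# Hypothetical helper (not part of gendata): map a numeric vector onto
# the interval [low, high] with a min-max transformation.
rescale_range <- function(x, low, high) {
  (x - min(x)) / (max(x) - min(x)) * (high - low) + low
}

set.seed(2)
x <- rnorm(300)
w <- 0.6 * x + rnorm(300)  # a second variable correlated with x

x2 <- rescale_range(x, low = 10, high = 50)

range(x2)                          # 10 and 50
all.equal(cor(x, w), cor(x2, w))   # TRUE: the correlation is unchanged
```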
Reverse codes variables.
revcode(data, vars)
data | The dataset to use. |
vars | The variable number or name to reverse code. |
Francis Huang
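Reverse coding flips a variable so that high values become low and vice versa. A common convention, assumed here and not confirmed against the revcode source, is to subtract each value from the sum of the observed minimum and maximum. The helper `reverse_code` is hypothetical and exists only for this sketch.

```r
# Hypothetical helper (not part of gendata): reverse code a numeric
# vector around its observed minimum and maximum.
reverse_code <- function(x) {
  (max(x) + min(x)) - x
}

item <- c(1, 2, 2, 4, 5)  # e.g., responses on a 1-to-5 scale
reverse_code(item)
# 5 4 4 2 1
```

Note that this sketch uses the observed range. If the responses do not span the full scale (for example, nobody chose 1), reverse coding against the scale endpoints would give different values.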