Synthesise rating-scale data with given first and second moments and a predefined correlation matrix

makeScales() generates a dataframe of random discrete values so the data replicate a rating scale, and are correlated close to a predefined correlation matrix.

makeScales() is wrapper function for:

lfast(), generates a dataframe that best fits the desired moments, and
lcor(), which rearranges values in each column of the dataframe so they closely match the desired correlation matrix.

Usage

makeScales(n, means, sds, lowerbound = 1, upperbound = 5, items = 1, cormatrix)

Arguments

n: (positive, int) sample-size - number of observations
means: (real) target means: a vector of length k of mean values for each scale item
sds: (positive, real) target standard deviations: a vector of length k of standard deviation values for each scale item
lowerbound: (positive, int) a vector of length k (same as rows & columns of correlation matrix) of values for lower bound of each scale item (e.g. '1' for a 1-5 rating scale). Default = 1.
upperbound: (positive, int) a vector of length k (same as rows & columns of correlation matrix) of values for upper bound of each scale item (e.g. '5' for a 1-5 rating scale). Default = 5.
items: (positive, int) a vector of length k of number of items in each scale. Default = 1.
cormatrix: (real, matrix) the target correlation matrix: a square symmetric positive-semi-definite matrix of values ranging between -1 and +1, and '1' in the diagonal.

Value

a dataframe of rating-scale values

Examples


## Example 1: four correlated items (questions)

### define parameters

n <- 16
dfMeans <- c(2.5, 3.0, 3.0, 3.5)
dfSds <- c(1.0, 1.0, 1.5, 0.75)
lowerbound <- rep(1, 4)
upperbound <- rep(5, 4)

corMat <- matrix(
  c(
    1.00, 0.30, 0.40, 0.60,
    0.30, 1.00, 0.50, 0.70,
    0.40, 0.50, 1.00, 0.80,
    0.60, 0.70, 0.80, 1.00
  ),
  nrow = 4, ncol = 4
)

scale_names <- c("Q1", "Q2", "Q3", "Q4")
rownames(corMat) <- scale_names
colnames(corMat) <- scale_names

### apply function

df1 <- makeScales(
  n = n, means = dfMeans, sds = dfSds,
  lowerbound = lowerbound, upperbound = upperbound, cormatrix = corMat
)
#> Variable  1 :  Q1  - 
#> reached maximum of 1024 iterations
#> Variable  2 :  Q2  - 
#> reached maximum of 1024 iterations
#> Variable  3 :  Q3  - 
#> reached maximum of 1024 iterations
#> Variable  4 :  Q4  - 
#> reached maximum of 1024 iterations
#> 
#> Arranging data to match correlations
#> 
#> Successfully generated correlated variables
#> 

### test function

str(df1)
#> 'data.frame':	16 obs. of  4 variables:
#>  $ Q1: num  4 3 3 2 1 1 1 2 4 2 ...
#>  $ Q2: num  4 3 4 2 2 4 2 1 3 3 ...
#>  $ Q3: num  2 5 5 4 1 3 1 2 3 2 ...
#>  $ Q4: num  4 4 4 3 2 4 3 3 4 3 ...

#### means
apply(df1, 2, mean) |> round(3)
#>  Q1  Q2  Q3  Q4 
#> 2.5 3.0 3.0 3.5 

#### standard deviations
apply(df1, 2, sd) |> round(3)
#>    Q1    Q2    Q3    Q4 
#> 1.033 1.033 1.506 0.730 

#### correlations
cor(df1) |> round(3)
#>       Q1    Q2    Q3    Q4
#> Q1 1.000 0.313 0.386 0.619
#> Q2 0.313 1.000 0.514 0.707
#> Q3 0.386 0.514 1.000 0.728
#> Q4 0.619 0.707 0.728 1.000



## Example 2: five correlated Likert scales

### a study on employee engagement and organizational climate:
# Job Satisfaction (JS)
# Organizational Commitment (OC)
# Perceived Supervisor Support (PSS)
# Work Engagement (WE)
# Turnover Intention (TI) (reverse-related to others).

### define parameters

n <- 128
dfMeans <- c(3.8, 3.6, 3.7, 3.9, 2.2)
dfSds <- c(0.7, 0.8, 0.7, 0.6, 0.9)
lowerbound <- rep(1, 5)
upperbound <- rep(5, 5)
items <- c(4, 4, 3, 3, 3)

corMat <- matrix(
  c(
    1.00, 0.72, 0.58, 0.65, -0.55,
    0.72, 1.00, 0.54, 0.60, -0.60,
    0.58, 0.54, 1.00, 0.57, -0.45,
    0.65, 0.60, 0.57, 1.00, -0.50,
    -0.55, -0.60, -0.45, -0.50, 1.00
  ),
  nrow = 5, ncol = 5
)

scale_names <- c("JS", "OC", "PSS", "WE", "TI")
rownames(corMat) <- scale_names
colnames(corMat) <- scale_names

### apply function

df2 <- makeScales(
  n = n, means = dfMeans, sds = dfSds,
  lowerbound = lowerbound, upperbound = upperbound,
  items = items, cormatrix = corMat
)
#> Variable  1 :  JS  - 
#> best solution in 88 iterations
#> Variable  2 :  OC  - 
#> best solution in 1334 iterations
#> Variable  3 :  PSS  - 
#> best solution in 1443 iterations
#> Variable  4 :  WE  - 
#> best solution in 623 iterations
#> Variable  5 :  TI  - 
#> best solution in 703 iterations
#> 
#> Arranging data to match correlations
#> 
#> Successfully generated correlated variables
#> 

### test function

str(df2)
#> 'data.frame':	128 obs. of  5 variables:
#>  $ JS : num  2.75 4.25 3.5 2.75 3.5 3 3.75 4.5 4.75 4 ...
#>  $ OC : num  2.25 4 2 3.5 3.5 3.25 3.25 4 4.5 4 ...
#>  $ PSS: num  3.67 3 4.33 3.33 2.33 ...
#>  $ WE : num  3.67 4.33 3 2.67 2.33 ...
#>  $ TI : num  3 2.33 4 2.33 2 ...

#### means
apply(df2, 2, mean) |> round(3)
#>    JS    OC   PSS    WE    TI 
#> 3.799 3.602 3.701 3.901 2.201 

#### standard deviations
apply(df2, 2, sd) |> round(3)
#>    JS    OC   PSS    WE    TI 
#> 0.699 0.800 0.701 0.599 0.898 

#### correlations
cor(df2) |> round(3)
#>        JS     OC   PSS     WE     TI
#> JS   1.00  0.720  0.58  0.650 -0.550
#> OC   0.72  1.000  0.54  0.600 -0.599
#> PSS  0.58  0.540  1.00  0.570 -0.450
#> WE   0.65  0.600  0.57  1.000 -0.499
#> TI  -0.55 -0.599 -0.45 -0.499  1.000