
Generates synthetic rating-scale data that replicates reported regression results. This function is useful for reproducing analyses from published research where only summary statistics (standardised regression coefficients and R-squared) are reported.

Usage

makeScalesRegression(
  n,
  beta_std,
  r_squared,
  iv_cormatrix = NULL,
  iv_cor_mean = 0.3,
  iv_cor_variance = 0.01,
  iv_cor_range = c(-0.7, 0.7),
  iv_means,
  iv_sds,
  dv_mean,
  dv_sd,
  lowerbound_iv,
  upperbound_iv,
  lowerbound_dv,
  upperbound_dv,
  items_iv = 1,
  items_dv = 1,
  var_names = NULL,
  tolerance = 0.005
)

Arguments

n

Integer. Sample size

beta_std

Numeric vector of standardised regression coefficients (length k)

r_squared

Numeric. R-squared from the reported regression (0 to 1)

iv_cormatrix

k x k correlation matrix of the independent variables. If NULL (the default), a plausible matrix is found by optimisation.

iv_cor_mean

Numeric. Mean correlation among IVs when optimising (ignored if iv_cormatrix is provided). Default = 0.3

iv_cor_variance

Numeric. Variance of correlations among IVs when optimising (ignored if iv_cormatrix is provided). Default = 0.01

iv_cor_range

Numeric vector of length 2. Minimum and maximum allowed IV correlations when optimising. Default = c(-0.7, 0.7)

iv_means

Numeric vector of means for IVs (length k)

iv_sds

Numeric vector of standard deviations for IVs (length k)

dv_mean

Numeric. Mean of dependent variable

dv_sd

Numeric. Standard deviation of dependent variable

lowerbound_iv

Numeric vector of lower bounds for each IV scale (or single value for all)

upperbound_iv

Numeric vector of upper bounds for each IV scale (or single value for all)

lowerbound_dv

Numeric. Lower bound for DV scale

upperbound_dv

Numeric. Upper bound for DV scale

items_iv

Integer vector of number of items per IV scale (or single value for all). Default = 1

items_dv

Integer. Number of items in DV scale. Default = 1

var_names

Character vector of variable names (length k+1: IVs then DV)

tolerance

Numeric. Acceptable deviation from target R-squared (default 0.005)
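With multi-item scales, generated scores can only take a limited set of values. Assuming each scale score is the mean of its items (which matches the quarter-point values in Example 1's output, where items_iv = items_dv = 4), the possible values form a grid with step 1/items between the lower and upper bounds. The helper scale_grid() below is hypothetical and not part of LikertMakeR; it is a minimal base-R sketch of that grid.

```r
# Hypothetical helper (not part of LikertMakeR): the values a scale score can
# take, assuming the score is the mean of `items` integer responses that lie
# between `lower` and `upper`.
scale_grid <- function(lower, upper, items = 1) {
  lower + seq(0, items * (upper - lower)) / items
}

scale_grid(1, 5, items = 4)  # 1.00, 1.25, ..., 5.00 (17 values)
scale_grid(1, 5, items = 5)  # 1.0, 1.2, ..., 5.0 (21 values)
```

So with items_iv = 4 on a 1 to 5 scale, IV scores move in quarter-point steps, and increasing the number of items gives a finer grid of attainable values.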

Value

A list containing:

data

Generated data frame with k IVs and 1 DV

target_stats

List of target statistics provided

achieved_stats

List of achieved statistics from generated data

diagnostics

Comparison of target and achieved statistics

iv_dv_cors

IV-DV correlations implied by the standardised regression coefficients

full_cormatrix

The complete (k+1) x (k+1) correlation matrix used

optimisation_info

If IV correlations were optimised, details about the optimisation

Details

Generate regression data from summary statistics

The function can operate in two modes:

Mode 1: With IV correlation matrix provided

When iv_cormatrix is provided, the function uses the given correlation structure among independent variables and calculates the implied IV-DV correlations from the regression coefficients.
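For standardised coefficients, the standard identities are r_xy = R_xx %*% beta (implied IV-DV correlations) and R-squared = t(beta) %*% r_xy. The base-R sketch below applies them to the inputs of Example 1; it illustrates the algebra and is not LikertMakeR's internal code. It also assembles the (k+1) x (k+1) matrix (IVs first, DV last), checks that it is positive definite, and compares the implied R-squared with the target using the tolerance rule.

```r
# Sketch of the identities linking standardised betas to correlations:
#   r_xy = R_xx %*% beta      (implied IV-DV correlations)
#   R^2  = t(beta) %*% r_xy   (implied R-squared)
# Inputs are those of Example 1; this is not the package's internal code.
beta_std  <- c(0.4, 0.3)
iv_corr   <- matrix(c(1.0, 0.3, 0.3, 1.0), nrow = 2)
r_squared <- 0.35
tolerance <- 0.005

r_xy <- drop(iv_corr %*% beta_std)    # 0.49 0.42
predicted_r2 <- sum(beta_std * r_xy)  # 0.322

# Full (k+1) x (k+1) correlation matrix: IVs first, DV last
full_cor <- rbind(cbind(iv_corr, r_xy), c(r_xy, 1))
dimnames(full_cor) <- NULL

# A valid correlation matrix must be positive definite
stopifnot(min(eigen(full_cor, symmetric = TRUE, only.values = TRUE)$values) > 0)

# Flag inputs whose implied R-squared is too far from the reported one
if (abs(predicted_r2 - r_squared) > tolerance) {
  warning(sprintf("Predicted R-squared (%.4f) differs from target (%.4f)",
                  predicted_r2, r_squared))
}
```

The implied value of 0.322 is the "Predicted R-squared (0.3220)" reported in the warning from Example 1: with an IV correlation of 0.3, betas of 0.4 and 0.3 cannot produce an R-squared of 0.35.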

Mode 2: With optimisation (IV correlation matrix not provided)

When iv_cormatrix = NULL, the function searches for a plausible correlation structure among the independent variables that is consistent with the reported regression statistics. Initial correlations are sampled on Fisher's z scale and back-transformed, so the sampled values are well-behaved correlations within (-1, 1); they are then iteratively adjusted to match the target R-squared.
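The sketch below illustrates the general idea in base R: sample correlations on Fisher's z scale with atanh()/tanh(), clip them to a range, then adjust them until t(beta) %*% R %*% beta hits the target R-squared. It is not LikertMakeR's algorithm. The helper names are hypothetical, treating iv_cor_variance as a variance on the z scale is an assumption, and the adjustment step (one common shift in z, solved with stats::uniroot()) is a simple stand-in chosen for illustration.

```r
# Hypothetical sketch, not LikertMakeR's algorithm. Assumptions:
# - cor_variance is treated as a variance on Fisher's z scale;
# - a single common shift in z is used to reach the target R-squared.
fill_cor <- function(r, k) {
  R <- diag(k)
  R[lower.tri(R)] <- r
  R[upper.tri(R)] <- t(R)[upper.tri(R)]  # mirror to keep R symmetric
  R
}

sample_iv_cors <- function(k, cor_mean, cor_variance, cor_range) {
  n_pairs <- k * (k - 1) / 2
  z <- rnorm(n_pairs, mean = atanh(cor_mean), sd = sqrt(cor_variance))
  # tanh() maps back into (-1, 1); then apply the range constraint
  fill_cor(pmin(pmax(tanh(z), cor_range[1]), cor_range[2]), k)
}

predicted_r2 <- function(R, beta) drop(t(beta) %*% R %*% beta)

adjust_to_r2 <- function(R, beta, target, cor_range) {
  k <- nrow(R)
  z <- atanh(R[lower.tri(R)])
  shifted <- function(delta) {
    fill_cor(pmin(pmax(tanh(z + delta), cor_range[1]), cor_range[2]), k)
  }
  # uniroot() stops with an error if the target cannot be bracketed,
  # i.e. if no shift within the range constraints reaches it
  delta <- uniroot(function(d) predicted_r2(shifted(d), beta) - target,
                   interval = c(-3, 3), tol = 1e-10)$root
  shifted(delta)
}

set.seed(456)
beta <- c(0.3, 0.25, 0.2)
R0 <- sample_iv_cors(3, cor_mean = 0.3, cor_variance = 0.02,
                     cor_range = c(-0.7, 0.7))
R1 <- adjust_to_r2(R0, beta, target = 0.40, cor_range = c(-0.7, 0.7))
predicted_r2(R1, beta)
```

Clipping and shifting do not by themselves guarantee a positive definite matrix, so a result like R1 would still need checking (for example, that its smallest eigenvalue is positive) before being used to generate data.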

The function generates scale-level Likert data (scale scores, not individual item responses) by using lfast() to create each variable with the specified moments, then rearranging values with lcor() to achieve the target correlations. The generated data are verified by running a regression and comparing the achieved statistics with the targets.
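A verification step of this kind can be sketched in base R: standardise every column with scale(), fit lm(), and read off the standardised coefficients and R-squared. The helper regression_check() is hypothetical and not part of LikertMakeR. To keep the sketch self-contained, it is run on simulated continuous data with the Example 1 correlation structure (generated via chol()) rather than on package output.

```r
# Hypothetical helper (not part of LikertMakeR): standardised coefficients and
# R-squared for a data frame whose last column is the DV (IVs first, DV last,
# matching the var_names ordering).
regression_check <- function(df) {
  z <- as.data.frame(scale(df))
  dv <- names(z)[ncol(z)]
  fit <- lm(reformulate(names(z)[-ncol(z)], response = dv), data = z)
  list(beta_std = coef(fit)[-1], r_squared = summary(fit)$r.squared)
}

# Simulated continuous data with the Example 1 correlation structure
set.seed(1)
R_full <- matrix(c(1.00, 0.30, 0.49,
                   0.30, 1.00, 0.42,
                   0.49, 0.42, 1.00), nrow = 3)
X <- matrix(rnorm(500 * 3), ncol = 3) %*% chol(R_full)
df <- setNames(as.data.frame(X), c("Attitude", "Intention", "Behaviour"))

check <- regression_check(df)
check$beta_std   # near c(0.4, 0.3), up to sampling error
check$r_squared  # near 0.322, up to sampling error
```

The same helper could be applied to result1$data from Example 1; assuming the package standardises the variables in the same way, its figures should be comparable to the Achieved values printed by print(result1).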

See also

lfast for generating individual rating-scale vectors with exact moments.

lcor for rearranging values to achieve target correlations.

makeCorrAlpha for generating correlation matrices from Cronbach's Alpha.

Examples


# Example 1: With provided IV correlation matrix
set.seed(123)
iv_corr <- matrix(c(1.0, 0.3, 0.3, 1.0), nrow = 2)

result1 <- makeScalesRegression(
  n = 64,
  beta_std = c(0.4, 0.3),
  r_squared = 0.35,
  iv_cormatrix = iv_corr,
  iv_means = c(3.0, 3.5),
  iv_sds = c(1.0, 0.9),
  dv_mean = 3.8,
  dv_sd = 1.1,
  lowerbound_iv = 1,
  upperbound_iv = 5,
  lowerbound_dv = 1,
  upperbound_dv = 5,
  items_iv = 4,
  items_dv = 4,
  var_names = c("Attitude", "Intention", "Behaviour")
)
#> Warning: Predicted R-squared (0.3220) differs from target (0.3500) by 0.0280,
#>         which exceeds tolerance (0.0050).
#>         
#> Input statistics may be inconsistent.
#> best solution in 397 iterations
#> best solution in 1507 iterations
#> best solution in 162 iterations

print(result1)
#> Regression Data Generation Results
#> ===================================
#> 
#> Sample size: 64 
#> Number of IVs: 2 
#> 
#> IV Correlation Matrix: PROVIDED
#> 
#> Key Statistics:
#> ---------------
#> Target R-squared:   0.3500
#> Achieved R-squared: 0.3217
#> Difference:         -0.0283
#> 
#> Regression Coefficients (Standardised):
#>   Variable Target Achieved   Diff
#>   Attitude    0.4   0.4000  0e+00
#>  Intention    0.3   0.2995 -5e-04
#> 
#> For full diagnostics, see $diagnostics
#> For generated data, see $data
head(result1$data)
#>   Attitude Intention Behaviour
#> 1     2.75      1.75      1.50
#> 2     3.25      3.25      2.75
#> 3     2.50      2.25      3.25
#> 4     2.25      3.25      4.50
#> 5     2.25      5.00      4.25
#> 6     3.00      3.50      3.75


# Example 2: With optimisation (no IV correlation matrix)
set.seed(456)
result2 <- makeScalesRegression(
  n = 128,
  beta_std = c(0.3, 0.25, 0.2),
  r_squared = 0.40,
  iv_cormatrix = NULL, # Will be optimised
  iv_cor_mean = 0.3,
  iv_cor_variance = 0.02,
  iv_means = c(3.0, 3.2, 2.8),
  iv_sds = c(1.0, 0.9, 1.1),
  dv_mean = 3.5,
  dv_sd = 1.0,
  lowerbound_iv = 1,
  upperbound_iv = 5,
  lowerbound_dv = 1,
  upperbound_dv = 5,
  items_iv = 4,
  items_dv = 5
)
#> IV correlation matrix not provided.
#>             
#> Optimising to find plausible structure...
#> Optimisation converged after 7 iterations
#>       
#> (R-sq target: 0.4000, achieved in optimisation: 0.4020)
#> best solution in 5083 iterations
#> best solution in 669 iterations
#> best solution in 1675 iterations
#> best solution in 423 iterations

# View optimised correlation matrix
print(result2$target_stats$iv_cormatrix)
#>           [,1]      [,2]      [,3]
#> [1,] 1.0000000 0.4117389 0.6615179
#> [2,] 0.4117389 1.0000000 0.6832584
#> [3,] 0.6615179 0.6832584 1.0000000
print(result2$optimisation_info)
#> $converged
#> [1] TRUE
#> 
#> $iterations
#> [1] 7
#> 
#> $achieved_r_squared_in_optimisation
#> [1] 0.4019688
#> 
#> $iv_cor_mean_used
#> [1] 0.3
#> 
#> $iv_cor_variance_used
#> [1] 0.02
#> 
#> $iv_cor_range_used
#> [1] -0.7  0.7
#>
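As a cross-check on Example 2, the optimised matrix printed above can be plugged into the identity R-squared = t(beta) %*% R_xx %*% beta. The base-R sketch below uses the printed (rounded) values; it reproduces the achieved_r_squared_in_optimisation value, confirms that the off-diagonal correlations respect iv_cor_range, checks positive definiteness, and computes the implied IV-DV correlations.

```r
# Values copied from the Example 2 output above
beta_std <- c(0.3, 0.25, 0.2)
iv_cor_opt <- matrix(c(1.0000000, 0.4117389, 0.6615179,
                       0.4117389, 1.0000000, 0.6832584,
                       0.6615179, 0.6832584, 1.0000000), nrow = 3)

r2_opt <- drop(t(beta_std) %*% iv_cor_opt %*% beta_std)
round(r2_opt, 7)  # 0.4019688, as in $achieved_r_squared_in_optimisation

# Off-diagonal correlations lie within iv_cor_range = c(-0.7, 0.7)
off_diag <- iv_cor_opt[lower.tri(iv_cor_opt)]
all(off_diag >= -0.7 & off_diag <= 0.7)

# Positive definite: smallest eigenvalue above zero
min(eigen(iv_cor_opt, symmetric = TRUE, only.values = TRUE)$values) > 0

# Implied IV-DV correlations, approx. 0.535 0.510 0.569
iv_dv <- drop(iv_cor_opt %*% beta_std)
iv_dv
```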