Reproduce Repeated-Measures Data from ANOVA Summary Statistics

Reconstructs a synthetic dataset and inter-timepoint correlation matrix from a repeated-measures ANOVA result, based on reported means, standard deviations, and an F-statistic. This is useful when only summary statistics are available from published studies.

Usage

makeRepeated(
  n,
  k,
  means,
  sds,
  f_stat,
  df_between = k - 1,
  df_within = (n - 1) * (k - 1),
  structure = c("cs", "ar1", "toeplitz"),
  names = paste0("time_", 1:k),
  items = 1,
  lowerbound = 1,
  upperbound = 5,
  return_corr_only = FALSE,
  diagnostics = FALSE,
  ...
)

Arguments

n: Integer. Sample size used in the original study.
k: Integer. Number of repeated measures (timepoints).
means: Numeric vector of length k. Mean values reported for each timepoint.
sds: Numeric vector of length k. Standard deviations reported for each timepoint.
f_stat: Numeric. The reported repeated-measures ANOVA F-statistic for the within-subjects factor.
df_between: Degrees of freedom between conditions (default: k - 1).
df_within: Degrees of freedom within-subjects (default: (n - 1) * (k - 1)).
structure: Character. Correlation structure to assume: "cs" (listed first in Usage, so the default under R's match.arg() convention), "ar1", or "toeplitz".
names: Character vector of length k. Variable names for each timepoint (default: "time_1" to "time_k").
items: Integer. Number of items used to generate each scale score (passed to lfast).
lowerbound: Integer. Lower bound of the Likert-type response scale (default: 1).
upperbound: Integer. Upper bound of the Likert-type response scale (default: 5).
return_corr_only: Logical. If TRUE, return only the estimated correlation matrix.
diagnostics: Logical. If TRUE, include diagnostic summaries such as feasible F-statistic range and effect sizes.
...: Reserved for future use.

Value

A named list with:

data: A data frame of simulated repeated-measures responses (unless return_corr_only = TRUE).
correlation_matrix: The estimated inter-timepoint correlation matrix.
structure: The correlation structure assumed.
achieved_f: The F-statistic produced by the estimated rho value (if diagnostics = TRUE).
feasible_f_range: Minimum and maximum achievable F-values under the assumed structure (if diagnostics = TRUE).
recommended_f: Conservative, moderate, and strong F-statistic suggestions for similar designs.
effect_size_raw: Unstandardized effect size across timepoints.
effect_size_standardised: Effect size standardized by average variance.
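
The two effect-size entries can be illustrated with a short base-R sketch. The helper name effect_sizes is hypothetical, and the formulas are inferred from the diagnostic values printed in the examples rather than taken from the package source: the raw effect size is treated as the mean squared deviation of the timepoint means from their grand mean, and the standardised version divides that by the average variance.

```r
# Hypothetical helper, not part of the package: effect sizes from reported summaries.
# Assumption: raw = mean squared deviation of the timepoint means from the grand mean;
#             standardised = raw divided by the average of the reported variances.
effect_sizes <- function(means, sds) {
  raw <- mean((means - mean(means))^2)
  standardised <- raw / mean(sds^2)
  c(raw = raw, standardised = standardised)
}

# Means and SDs from the second example (n = 32, k = 4)
es <- effect_sizes(
  means = c(2.75, 3.5, 4.0, 4.4),
  sds   = c(0.8, 1.0, 1.2, 1.0)
)
print(es)  # raw is about 0.379, standardised about 0.372
```

Under these assumptions the values agree with effect_size_raw and effect_size_standardised shown in the second example's output.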

Details

This function estimates the average correlation between repeated measures by matching the reported F-statistic, under one of three assumed correlation structures:

"cs" (Compound Symmetry): Assumes all timepoints are equally correlated. Common in standard RM-ANOVA settings.
"ar1" (First-Order Autoregressive): Assumes correlations decay exponentially with time lag — higher correlation for closer timepoints.
"toeplitz" (Linearly Decreasing): Assumes correlation declines linearly with time lag, offering a middle ground between "cs" and "ar1".

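The three structures can be sketched in base R as functions of a single parameter rho and the lag between timepoints. The helper build_corr is hypothetical and is not the package's internal code. In particular, the "toeplitz" formula (a linear decline that reaches zero at the largest lag) is an assumption; it is consistent with the 1, 0.66, 0.33, 0 pattern in the third example's output.

```r
# Hypothetical helper for illustration; not the package's internal code.
build_corr <- function(k, rho, structure = c("cs", "ar1", "toeplitz")) {
  structure <- match.arg(structure)
  lag <- abs(outer(seq_len(k), seq_len(k), "-"))  # |i - j| for every pair of timepoints
  R <- switch(structure,
    cs       = matrix(rho, k, k),          # same correlation at every lag
    ar1      = rho^lag,                    # exponential decay with lag
    toeplitz = rho * (1 - lag / (k - 1))   # assumed: linear decline to 0 at the largest lag
  )
  diag(R) <- 1
  R
}

# AR(1) matrix using the rho estimated in the second example
R_ar1 <- build_corr(4, 0.3910032, "ar1")
print(round(R_ar1, 4))

# A usable correlation matrix must be positive definite. For compound
# symmetry this requires rho > -1 / (k - 1), i.e. rho > -0.5 when k = 3.
is_pd <- min(eigen(build_corr(3, -0.49, "cs"), symmetric = TRUE)$values) > 0
print(is_pd)
```

The positive-definiteness check shows why the compound-symmetry estimate in the first example (about -0.49 with k = 3) sits close to the edge of the admissible range.
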
The function then generates a data frame of synthetic item-scale ratings using lfast, and adjusts them to match the estimated correlation structure using lcor.

Set return_corr_only = TRUE to extract only the estimated correlation matrix.
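
One possible use of the correlation matrix on its own is to combine it with the reported standard deviations to obtain a covariance matrix. The sketch below uses base R only; the matrix R is rebuilt from the AR(1) estimate printed in the second example and stands in for the correlation_matrix element of the returned list, so the package itself is not called.

```r
# Reported SDs and the AR(1) estimate from the second example's output
sds <- c(0.8, 1.0, 1.2, 1.0)
rho <- 0.3910032

# Stand-in for the estimated correlation matrix (rho^|i - j|)
lag <- abs(outer(1:4, 1:4, "-"))
R <- rho^lag

# Covariance matrix: Sigma = D R D, with D the diagonal matrix of SDs
Sigma <- diag(sds) %*% R %*% diag(sds)
print(round(Sigma, 4))

# Converting back recovers the correlation matrix
stopifnot(isTRUE(all.equal(cov2cor(Sigma), R)))
```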

Examples


set.seed(123)
out1 <- makeRepeated(
  n = 128,
  k = 3,
  means = c(3.1, 3.5, 3.9),
  sds = c(1.0, 1.1, 1.0),
  items = 4,
  f_stat = 4.87,
  structure = "cs",
  diagnostics = FALSE
)
#> Warning: Optimization may not have converged. Check results carefully.
#> best solution in 759 iterations
#> best solution in 13245 iterations
#> best solution in 276 iterations

head(out1$data)
#>   time_1 time_2 time_3
#> 1   3.75   4.50   2.00
#> 2   3.75   2.00   4.75
#> 3   4.75   4.25   1.75
#> 4   3.75   1.50   5.00
#> 5   2.25   4.75   3.75
#> 6   3.50   2.00   5.00
out1$correlation_matrix
#>            time_1     time_2     time_3
#> time_1  1.0000000 -0.4899454 -0.4899454
#> time_2 -0.4899454  1.0000000 -0.4899454
#> time_3 -0.4899454 -0.4899454  1.0000000


out2 <- makeRepeated(
  n = 32, k = 4,
  means = c(2.75, 3.5, 4.0, 4.4),
  sds = c(0.8, 1.0, 1.2, 1.0),
  f_stat = 16,
  structure = "ar1",
  items = 5,
  lowerbound = 1, upperbound = 7,
  return_corr_only = FALSE,
  diagnostics = TRUE
)
#> reached maximum of 1024 iterations
#> reached maximum of 1024 iterations
#> reached maximum of 1024 iterations
#> reached maximum of 1024 iterations

print(out2)
#> $data
#>    time_1 time_2 time_3 time_4
#> 1     1.6    3.0    4.6    4.0
#> 2     1.8    4.4    5.2    5.8
#> 3     2.2    2.2    3.8    2.8
#> 4     2.6    4.6    6.4    3.6
#> 5     2.4    2.0    2.2    3.4
#> 6     3.4    4.8    3.4    3.4
#> 7     3.4    4.4    3.0    4.2
#> 8     2.4    2.8    2.2    4.4
#> 9     1.8    3.2    5.0    4.8
#> 10    3.0    4.0    5.2    6.2
#> 11    2.6    2.2    3.6    4.6
#> 12    1.8    2.4    2.4    4.2
#> 13    2.6    2.4    5.2    5.2
#> 14    3.6    3.2    3.4    3.4
#> 15    3.2    4.4    5.0    3.6
#> 16    2.6    3.0    4.8    3.2
#> 17    2.8    5.6    4.2    3.6
#> 18    3.4    5.2    5.2    4.8
#> 19    4.6    3.6    5.2    5.6
#> 20    1.6    3.4    2.4    5.2
#> 21    2.6    4.6    5.4    5.2
#> 22    5.0    3.6    3.4    4.2
#> 23    3.0    3.6    4.0    4.4
#> 24    2.4    2.4    3.2    2.6
#> 25    4.0    4.8    4.0    5.2
#> 26    3.0    2.4    4.4    5.8
#> 27    2.2    3.2    3.0    5.0
#> 28    2.0    2.2    2.4    4.0
#> 29    2.4    3.2    6.0    6.6
#> 30    2.2    2.8    2.8    3.0
#> 31    2.6    4.2    2.4    4.6
#> 32    3.2    4.2    4.4    4.2
#> 
#> $correlation_matrix
#>            time_1    time_2    time_3     time_4
#> time_1 1.00000000 0.3910032 0.1528835 0.05977794
#> time_2 0.39100319 1.0000000 0.3910032 0.15288350
#> time_3 0.15288350 0.3910032 1.0000000 0.39100319
#> time_4 0.05977794 0.1528835 0.3910032 1.00000000
#> 
#> $structure
#> [1] "ar1"
#> 
#> $feasible_f_range
#>       min       max 
#>  9.353034 39.481390 
#> 
#> $recommended_f
#> $recommended_f$conservative
#> [1] 10.21
#> 
#> $recommended_f$moderate
#> [1] 11.91
#> 
#> $recommended_f$strong
#> [1] 30.29
#> 
#> 
#> $achieved_f
#> [1] 15.99983
#> 
#> $effect_size_raw
#> [1] 0.3792188
#> 
#> $effect_size_standardised
#> [1] 0.3717831
#> 


out3 <- makeRepeated(
  n = 32, k = 4,
  means = c(2.0, 2.5, 3.0, 2.8),
  sds = c(0.8, 0.9, 1.0, 0.9),
  items = 4,
  f_stat = 24,
  structure = "toeplitz",
  diagnostics = TRUE
)
#> Warning: Optimization may not have converged. Check results carefully.
#> best solution in 750 iterations
#> reached maximum of 1024 iterations
#> reached maximum of 1024 iterations
#> reached maximum of 1024 iterations

str(out3)
#> List of 8
#>  $ data                    :'data.frame':	32 obs. of  4 variables:
#>   ..$ time_1: num [1:32] 2.5 1.75 1.75 1.25 3.25 1.5 1.5 1.25 3 1.5 ...
#>   ..$ time_2: num [1:32] 4 2.5 3 1.5 3.25 1.5 2.5 2.25 3.25 1.5 ...
#>   ..$ time_3: num [1:32] 4 3 4 1.75 3.75 2.5 3.5 4 2.25 1.25 ...
#>   ..$ time_4: num [1:32] 3.5 2.75 3.75 2.5 3.5 3.25 3.75 4.5 2.5 1.25 ...
#>  $ correlation_matrix      : num [1:4, 1:4] 1 0.66 0.33 0 0.66 ...
#>   ..- attr(*, "dimnames")=List of 2
#>   .. ..$ : chr [1:4] "time_1" "time_2" "time_3" "time_4"
#>   .. ..$ : chr [1:4] "time_1" "time_2" "time_3" "time_4"
#>  $ structure               : chr "toeplitz"
#>  $ feasible_f_range        : Named num [1:2] 5.57 8.64
#>   ..- attr(*, "names")= chr [1:2] "min" "max"
#>  $ recommended_f           :List of 3
#>   ..$ conservative: num 5.59
#>   ..$ moderate    : num 5.62
#>   ..$ strong      : num 7.64
#>  $ achieved_f              : num 9.95
#>  $ effect_size_raw         : num 0.142
#>  $ effect_size_standardised: num 0.174