| Title: | R2 Statistic | 
| Version: | 1.0.18 | 
| Description: | R2 statistic for significance tests. The variance and covariance of R2 values are used to assess the 95% CI and p-value of the R2 difference. | 
| License: | GPL (≥ 3) | 
| URL: | https://github.com/mommy003/r2redux | 
| Encoding: | UTF-8 | 
| RoxygenNote: | 7.1.2 | 
| NeedsCompilation: | no | 
| Depends: | R (≥ 2.10) | 
| LazyData: | true | 
| Suggests: | testthat (≥ 3.0.0) | 
| Config/testthat/edition: | 3 | 
| Packaged: | 2025-02-15 05:10:42 UTC; alh-admmdm | 
| Author: | Hong Lee [aut, cph], Moksedul Momin [aut, cre, cph] | 
| Maintainer: | Moksedul Momin <cvasu.momin@gmail.com> | 
| Repository: | CRAN | 
| Date/Publication: | 2025-02-15 05:30:02 UTC | 
cc_trf function
Description
This function transforms the predictive ability (R2) and its standard error (se) between the observed scale and the liability scale.
Usage
cc_trf(R2, se, K, P)
Arguments
| R2 | R2 or coefficient of determination on the observed or liability scale | 
| se | Standard error of R2 | 
| K | Population prevalence | 
| P | The proportion of cases in the study sample | 
Value
This function transforms R2 and its s.e. between the observed scale and the liability scale. Output from the command is a list of outcomes.
| R2l | Transformed R2 on the liability scale | 
| sel | Transformed se on the liability scale | 
| R2O | Transformed R2 on the observed scale | 
| seO | Transformed se on the observed scale | 
References
Lee, S. H., Goddard, M. E., Wray, N. R., and Visscher, P. M. A better coefficient of determination for genetic profile analysis. Genetic Epidemiology (2012). 36(3): p. 214-224.
Examples
#To get the transformed R2
output=cc_trf(0.06, 0.002, 0.05, 0.05)
output
#output$R2l (transformed R2 on the liability scale)
#0.2679337
#output$sel (transformed se on the liability scale)
#0.008931123
#output$R2O (transformed R2 on the observed scale)
#0.01343616
#output$seO (transformed se on the observed scale)
#0.000447872
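#For reference, a minimal sketch of the observed-to-liability transformation
#described in Lee et al. (2012), on which cc_trf is based. The helper name
#obs_to_liab and the se scaling are illustrative assumptions, not the
#package internals.
obs_to_liab=function(R2, se, K, P) {
  thd=qnorm(1-K)       #liability threshold
  zv=dnorm(thd)        #normal density at the threshold
  mv=zv/K              #mean liability of cases
  cv=K^2*(1-K)^2/(zv^2*P*(1-P))
  theta=mv*(P-K)/(1-K)*(mv*(P-K)/(1-K)-thd)
  list(R2l=cv*R2/(1+cv*theta*R2),
       sel=cv*se/(1+cv*theta*R2))   #assumed delta-method scaling for se
}
obs_to_liab(0.06, 0.002, 0.05, 0.05)
#With K = P the theta term vanishes, giving R2l ~ 0.268 and sel ~ 0.0089 as
#above; in that case the liability-to-observed direction is division by cv.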
Phenotypes and 10 sets of PGSs
Description
A dataset containing phenotypes and multiple PGSs estimated from 10 sets of SNPs according to GWAS p-value thresholds
Usage
dat1
Format
A data frame with 1000 rows and 11 variables:
- V1: Phenotype
- V2: PGS1, for p value threshold <=1
- V3: PGS2, for p value threshold <=0.5
- V4: PGS3, for p value threshold <=0.4
- V5: PGS4, for p value threshold <=0.3
- V6: PGS5, for p value threshold <=0.2
- V7: PGS6, for p value threshold <=0.1
- V8: PGS7, for p value threshold <=0.05
- V9: PGS8, for p value threshold <=0.01
- V10: PGS9, for p value threshold <=0.001
- V11: PGS10, for p value threshold <=0.0001
Phenotypes and 2 sets of PGSs
Description
A dataset containing phenotypes and 2 sets of PGSs estimated from 2 sets of SNPs from regulatory and non-regulatory genomic regions
Usage
dat2
Format
A data frame with 1000 rows and 3 variables:
- V1: Phenotype
- V2: PGS1, regulatory region
- V3: PGS2, non-regulatory region
olkin12_1 function
Description
olkin12_1 function
Usage
olkin12_1(omat, nv)
Arguments
| omat | 3 by 3 matrix of the correlation coefficients between y, x1 and x2, i.e. omat=cor(dat), where dat is an N by 3 matrix having variables in the order of cbind(y,x1,x2) | 
| nv | Sample size | 
Value
This function is used internally as source code by other functions in the package.
olkin12_13 function
Description
olkin12_13 function
Usage
olkin12_13(omat, nv)
Arguments
| omat | 3 by 3 matrix of the correlation coefficients between y, x1 and x2, i.e. omat=cor(dat), where dat is an N by 3 matrix having variables in the order of cbind(y,x1,x2) | 
| nv | Sample size | 
Value
This function is used internally as source code by other functions in the package.
olkin12_3 function
Description
olkin12_3 function
Usage
olkin12_3(omat, nv)
Arguments
| omat | 3 by 3 matrix of the correlation coefficients between y, x1 and x2, i.e. omat=cor(dat), where dat is an N by 3 matrix having variables in the order of cbind(y,x1,x2) | 
| nv | Sample size | 
Value
This function is used internally as source code by other functions in the package.
olkin12_34 function
Description
olkin12_34 function
Usage
olkin12_34(omat, nv)
Arguments
| omat | 3 by 3 matrix of the correlation coefficients between y, x1 and x2, i.e. omat=cor(dat), where dat is an N by 3 matrix having variables in the order of cbind(y,x1,x2) | 
| nv | Sample size | 
Value
This function is used internally as source code by other functions in the package.
olkin1_2 function
Description
olkin1_2 function
Usage
olkin1_2(omat, nv)
Arguments
| omat | 3 by 3 matrix of the correlation coefficients between y, x1 and x2, i.e. omat=cor(dat), where dat is an N by 3 matrix having variables in the order of cbind(y,x1,x2) | 
| nv | Sample size | 
Value
This function is used internally as source code by other functions in the package.
olkin_beta1_2 function
Description
This function derives the information matrix for beta1^2 and beta2^2, where beta1 and beta2 are regression coefficients from a multiple regression model, i.e. y = x1 * beta1 + x2 * beta2 + e, where y, x1 and x2 are column-standardised (i.e. in the context of correlation coefficients; see Olkin and Finn 1995).
Usage
olkin_beta1_2(omat, nv)
Arguments
| omat | 3 by 3 matrix of the correlation coefficients between y, x1 and x2, i.e. omat=cor(dat), where dat is an N by 3 matrix having variables in the order of cbind(y,x1,x2) | 
| nv | Sample size | 
Value
This function returns the information (variance-covariance) matrix of beta1^2 and beta2^2, where beta1 and beta2 are regression coefficients from a multiple regression model. The outputs are listed as follows.
| info | 2x2 information (variance-covariance) matrix | 
| var1 | Variance of beta1^2 | 
| var2 | Variance of beta2^2 | 
| var1_2 | Variance of the difference between beta1^2 and beta2^2 | 
References
Olkin, I. and Finn, J.D. Correlations redux. Psychological Bulletin, 1995. 118(1): p. 155.
Examples
#To get the information (variance-covariance) matrix of beta1^2 and beta2^2 where 
#beta1 and beta2 are regression coefficients from a multiple regression model.
dat=dat1
omat=cor(dat)[1:3,1:3]
#omat
#1.0000000 0.1958636 0.1970060
#0.1958636 1.0000000 0.9981003
#0.1970060 0.9981003 1.0000000
nv=length(dat$V1)
output=olkin_beta1_2(omat,nv) 
output
#output$info (2x2 information (variance-covariance) matrix)
#0.04146276 0.08158261
#0.08158261 0.16111124             
#output$var1 (variance of beta1^2)
#0.04146276
#output$var2 (variance of beta2^2)
#0.1611112
#output$var1_2 (variance of the difference between beta1^2 and beta2^2)
#0.03940878
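#The covariance structure can be checked by hand: var1_2 is the variance of a
#difference, var(beta1^2 - beta2^2) = var1 + var2 - 2*cov, assuming output$info
#is returned as a 2 x 2 matrix:
output$var1+output$var2-2*output$info[1,2]
#0.04146276 + 0.16111124 - 2*0.08158261 = 0.03940878, i.e. output$var1_2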
olkin_beta_inf function
Description
This function derives the information matrix for beta1 and beta2, where beta1 and beta2 are regression coefficients from a multiple regression model, i.e. y = x1 * beta1 + x2 * beta2 + e, where y, x1 and x2 are column-standardised (see Olkin and Finn 1995).
Usage
olkin_beta_inf(omat, nv)
Arguments
| omat | 3 by 3 matrix of the correlation coefficients between y, x1 and x2, i.e. omat=cor(dat), where dat is an N by 3 matrix having variables in the order of cbind(y,x1,x2) | 
| nv | Sample size | 
Value
This function generates the information (variance-covariance) matrix of beta1 and beta2. The outputs are listed as follows.
| info | 2x2 information (variance-covariance) matrix | 
| var1 | Variance of beta1 | 
| var2 | Variance of beta2 | 
| var1_2 | Variance of difference between beta1 and beta2 | 
References
Olkin, I. and Finn, J.D. Correlations redux. Psychological Bulletin, 1995. 118(1): p. 155.
Examples
#To get the information (variance-covariance) matrix of beta1 and beta2 where 
#beta1 and beta2 are regression coefficients from a multiple regression model.
dat=dat1
omat=cor(dat)[1:3,1:3]
#omat
#1.0000000 0.1958636 0.1970060
#0.1958636 1.0000000 0.9981003
#0.1970060 0.9981003 1.0000000
nv=length(dat$V1)
output=olkin_beta_inf(omat,nv)
output 
#output$info (2x2 information (variance-covariance) matrix)
#0.2531406 -0.2526212
#-0.2526212  0.2530269            
#output$var1 (variance of beta1)
#0.2531406
#output$var2 (variance of beta2)
#0.2530269
#output$var1_2 (variance of difference between beta1 and beta2)
#1.01141  
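#A hypothetical follow-up (not a package function): a Wald-type test for
#beta1 - beta2, combining var1_2 with the standardised coefficients recovered
#from omat; a normal approximation is assumed.
b=solve(omat[2:3,2:3], omat[2:3,1])   #beta1 and beta2 from the correlations
z=(b[1]-b[2])/sqrt(output$var1_2)     #Wald statistic for the difference
2*pnorm(-abs(z))                      #two-tailed p-value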
olkin_beta_ratio function
Description
This function derives the variance of beta1^2 / R^2, where beta1 and beta2 are regression coefficients from a multiple regression model, i.e. y = x1 * beta1 + x2 * beta2 + e, where y, x1 and x2 are column-standardised (see Olkin and Finn 1995).
Usage
olkin_beta_ratio(omat, nv)
Arguments
| omat | 3 by 3 matrix of the correlation coefficients between y, x1 and x2, i.e. omat=cor(dat), where dat is an N by 3 matrix having variables in the order of cbind(y,x1,x2) | 
| nv | Sample size | 
Value
This function generates the variance of the proportion beta1^2/R^2. The outputs are listed as follows.
| ratio_var | Variance of ratio | 
References
Olkin, I. and Finn, J.D. Correlations redux. Psychological Bulletin, 1995. 118(1): p. 155.
Examples
#To get the variance of the ratio beta1^2/R^2 where beta1 and beta2 are 
#regression coefficients from a multiple regression model.
dat=dat2
omat=cor(dat)[1:3,1:3]
#omat
#1.0000000 0.1497007 0.136431
#0.1497007 1.0000000 0.622790
#0.1364310 0.6227900 1.000000
nv=length(dat$V1)
output=olkin_beta_ratio(omat,nv)
output 
#r2redux output
#output$ratio_var (Variance of ratio)
#0.08042288
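#Assuming a normal approximation, ratio_var converts to a 95% CI for the
#proportion beta1^2/R^2. The point estimate (~0.4392572) is taken from the
#r2_enrich_beta example on the same data (dat2) later in this manual:
ratio1=0.4392572
ratio1+c(-1,1)*1.96*sqrt(output$ratio_var)
#approximately -0.117 to 0.995, matching lower_ratio1 and upper_ratio1 there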
r2_beta_var
Description
This function estimates var(beta1^2) and var(beta2^2), where beta1 and beta2 are regression coefficients from a multiple regression model, i.e. y = x1 * beta1 + x2 * beta2 + e, and y, x1 and x2 are column-standardised (see Olkin and Finn 1995). y is an N by 1 matrix having the dependent variable, x1 is an N by 1 matrix having the ith explanatory variable, and x2 is an N by 1 matrix having the jth explanatory variable. v1 and v2 indicate the ith and jth columns in the data (v1 or v2 should be a single integer between 1 and M, see Arguments below).
Usage
r2_beta_var(dat, v1, v2, nv)
Arguments
| dat | N by (M+1) matrix having variables in the order of cbind(y,x) | 
| v1 | This can be set as v1=1, v1=2, v1=3 or any value between 1 and M based on combination | 
| v2 | This can be set as v2=1, v2=2, v2=3 or any value between 1 and M based on combination | 
| nv | Sample size | 
Value
This function estimates the variance of beta1^2 and beta2^2, and the covariance between beta1^2 and beta2^2, i.e. the information matrix of squared regression coefficients. beta1 and beta2 are regression coefficients from a multiple regression model, i.e. y = x1 * beta1 + x2 * beta2 + e, where y, x1 and x2 are column-standardised. The outputs are listed as follows.
| beta1_sq | beta1^2, the squared regression coefficient for the first predictor | 
| beta2_sq | beta2^2, the squared regression coefficient for the second predictor | 
| var1 | Variance of beta1_sq | 
| var2 | Variance of beta2_sq | 
| var1_2 | Variance of difference between beta1_sq and beta2_sq | 
| cov | Covariance between beta1_sq and beta2_sq | 
| upper_beta1_sq | upper limit of 95% CI for beta1_sq | 
| lower_beta1_sq | lower limit of 95% CI for beta1_sq | 
| upper_beta2_sq | upper limit of 95% CI for beta2_sq | 
| lower_beta2_sq | lower limit of 95% CI for beta2_sq | 
References
Olkin, I. and Finn, J.D. Correlations redux. Psychological Bulletin, 1995. 118(1): p. 155.
Examples
#To get the 95% CI of beta1_sq and beta2_sq
#beta1 and beta2 are regression coefficients from a multiple regression model,
#i.e. y = x1 * beta1 + x2 * beta2 +e, where y, x1 and x2 are column-standardised.
dat=dat2
nv=length(dat$V1)
v1=c(1)
v2=c(2)
output=r2_beta_var(dat,v1,v2,nv)
output
#r2redux output
#output$beta1_sq (beta1_sq)
#0.01118301
#output$beta2_sq (beta2_sq)
#0.004980285
#output$var1 (variance of beta1_sq)
#7.072931e-05
#output$var2 (variance of beta2_sq)
#3.161929e-05
#output$var1_2 (variance of difference between beta1_sq and beta2_sq)
#0.000162113
#output$cov (covariance between beta1_sq and beta2_sq)
#-2.988221e-05
#output$upper_beta1_sq (upper limit of 95% CI for beta1_sq)
#0.03037793
#output$lower_beta1_sq (lower limit of 95% CI for beta1_sq)
#-0.00123582
#output$upper_beta2_sq (upper limit of 95% CI for beta2_sq)
#0.02490076
#output$lower_beta2_sq (lower limit of 95% CI for beta2_sq)
#-0.005127546
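#These pieces combine in the usual ways: var1_2 equals var1 + var2 - 2*cov
#(7.072931e-05 + 3.161929e-05 + 2*2.988221e-05 = 0.000162113), and a Wald-type
#test of beta1_sq against beta2_sq can be sketched as follows (normal
#approximation assumed; not a package function):
z=(output$beta1_sq-output$beta2_sq)/sqrt(output$var1_2)
2*pnorm(-abs(z))   #two-tailed p-value, ~0.63 in this example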
r2_diff function
Description
This function estimates var(R2(y~x[,v1]) - R2(y~x[,v2])), where R2 is the R squared value of the model, y is an N by 1 matrix having the dependent variable, and x is an N by M matrix having M explanatory variables. v1 or v2 indicates the ith column in the x matrix (v1 or v2 can be multiple values between 1 and M, see Arguments below).
Usage
r2_diff(dat, v1, v2, nv)
Arguments
| dat | N by (M+1) matrix having variables in the order of cbind(y,x) | 
| v1 | This can be set as v1=c(1) or v1=c(1,2) | 
| v2 | This can be set as v2=c(2), v2=c(3), v2=c(1,3) or v2=c(3,4) | 
| nv | Sample size | 
Value
This function tests the significance of the difference between two PGS R2 values (whether dependent or independent, joint or single), i.e. the test statistics for the difference between R2(y~x[,v1]) and R2(y~x[,v2]) (here we define R2_1=R2(y~x[,v1]) and R2_2=R2(y~x[,v2])). The outputs are listed as follows.
| rsq1 | R2_1 | 
| rsq2 | R2_2 | 
| var1 | Variance of R2_1 | 
| var2 | Variance of R2_2 | 
| var_diff | Variance of the difference between R2_1 and R2_2 | 
| r2_based_p | Two-tailed p-value for the difference between R2_1 and R2_2 | 
| r2_based_p_one_tail | One-tailed p-value for the difference | 
| mean_diff | Differences between R2_1 and R2_2 | 
| upper_diff | Upper limit of 95% CI for the difference | 
| lower_diff | Lower limit of 95% CI for the difference | 
Examples
#To get the test statistics for the difference between R2(y~x[,1]) and 
#R2(y~x[,2]) (here we define R2_1=R2(y~x[,1]) and R2_2=R2(y~x[,2]))
dat=dat1
nv=length(dat$V1)
v1=c(1)
v2=c(2)
output=r2_diff(dat,v1,v2,nv)
output
#r2redux output
#output$rsq1 (R2_1)
#0.03836254
#output$rsq2 (R2_2)
#0.03881135
#output$var1 (variance of R2_1)
#0.0001436128
#output$var2 (variance of R2_2)
#0.0001451358
#output$var_diff (variance of difference between R2_1 and R2_2)
#5.678517e-07
#output$r2_based_p (two tailed p-value for significant difference)
#0.5514562
#output$r2_based_p_one_tail(one tailed p-value for significant difference)
#0.2757281
#output$mean_diff (differences between R2_1 and R2_2)
#-0.0004488044
#output$upper_diff (upper limit of 95% CI for the difference)
#0.001028172
#output$lower_diff (lower limit of 95% CI for the difference)
#-0.001925781
#output$p$nested
#1
#output$p$nonnested
#0.5514562
#output$p$LRT
#1
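#For intuition (a check under a normal approximation, not the package's
#internal code path), r2_based_p and the 95% CI can be recovered from
#mean_diff and var_diff:
chi=output$mean_diff^2/output$var_diff
pchisq(chi,1,lower.tail=FALSE)
#~0.5515, i.e. output$r2_based_p
output$mean_diff+c(-1,1)*1.96*sqrt(output$var_diff)
#approximately -0.0019258 and 0.0010282, i.e. lower_diff and upper_diff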
#To get the test statistics for the difference between R2(y~x[,1]+x[,2]) and 
#R2(y~x[,1]). (here R2_1=R2(y~x[,1]+x[,2]) and R2_2=R2(y~x[,1]))
dat=dat1
nv=length(dat$V1)
v1=c(1,2)
v2=c(1)
output=r2_diff(dat,v1,v2,nv)
#r2redux output
#output$rsq1 (R2_1)
#0.03896678
#output$rsq2 (R2_2)
#0.03836254
#output$var1 (variance of R2_1)
#0.0001473686
#output$var2 (variance of R2_2)
#0.0001436128
#output$var_diff (variance of difference between R2_1 and R2_2)
#2.321425e-06
#output$r2_based_p (p-value for significant difference between R2_1 and R2_2)
#0.4366883
#output$mean_diff (differences between R2_1 and R2_2)
#0.0006042383
#output$upper_diff (upper limit of 95% CI for the difference)
#0.00488788
#output$lower_diff (lower limit of 95% CI for the difference)
#-0.0005576171
#Note: If the directions are not consistent, for instance, if one correlation 
#is positive (R_1) and another is negative (R_2), or vice versa, it is crucial
#to approach the interpretation of the comparative test with caution. 
#It's important to note that R^2 alone does not provide information about the 
#direction or sign of the relationships between predictors and the response variable. 
#When faced with multiple predictors common between two models, for example, 
#y = any_cov1 + any_cov2 + ... + any_covN  + e   vs.
#y = PRS + any_cov1 + any_cov2 +...+ any_covN + e
#A more streamlined approach can be adopted by consolidating the various
#predictors into a single predictor (see R code below).
#R
#dat=dat1
#here let's assume, we wanted to test one PRS (dat$V2) 
#with 5 covariates (dat$V7 to dat$V11)
#mod1 <- lm(dat$V1~dat$V2 + dat$V7+ dat$V8+ dat$V9+ dat$V10+ dat$V11)
#merged_predictor1 <- mod1$fitted.values
#mod2 <- lm(dat$V1~ dat$V7+ dat$V8+ dat$V9+ dat$V10+ dat$V11)
#merged_predictor2 <- mod2$fitted.values
#dat=data.frame(dat$V1,merged_predictor1,merged_predictor2)
#the comparison can be equivalently expressed as:
#y = merged_predictor1 + e   vs. 
#y = merged_predictor2 + e
  
#This comparison can be simply achieved using the r2_diff function, e.g.
#To get the test statistics for the difference between R2(y~x[,1]) and
#R2(y~x[,2]). (here x[,1]= merged_predictor1 (from the full model), 
#and x[,2]= merged_predictor2 (from the reduced model))
#v1=c(1)
#v2=c(2)
#output=r2_diff(dat,v1,v2,nv)
#note that the merged predictor from the full model (v1) should be the first.
#str(output)
#List of 11
#$ rsq1               : num 0.0428
#$ rsq2               : num 0.042
#$ var1               : num 0.000158
#$ var2               : num 0.000156
#$ var_diff           : num 2.87e-06
#$ r2_based_p         : num 0.658
#$ r2_based_p_one_tail: num 0.329
#$ mean_diff          : num 0.000751
#$ upper_diff         : num 0.00407
#$ lower_diff         : num -0.00257
#$ p                  :List of 3
#..$ nested   : num 0.386
#..$ nonnested: num 0.658
#..$ LRT      : num 0.376
#Importantly note that in this case, merged_predictor1 is nested within
#merged_predictor2 (see mod1 vs. mod2 above). Therefore, this is 
#nested model comparison. So, output$p$nested (0.386) should be used 
#instead of output$p$nonnested (0.658). 
#Note that r2_based_p is the same as output$p$nonnested (0.658) here. 
  
#For this scenario, alternatively, the outcome variable (y) can be preadjusted
#with covariate(s), following the procedure in R:
#mod <- lm(y ~ any_cov1 + any_cov2 + ... + any_covN)
#y_adj=scale(mod$residuals)
#then, the comparative significance test can be approximated by using
#the following model y_adj = PRS (r2_var(dat, v1, nv)) 
#R
#dat=dat1
#mod <- lm(dat$V1~dat$V7+ dat$V8+ dat$V9+ dat$V10+ dat$V11)
#y_adj=scale(mod$residuals)
#dat=data.frame(y_adj,dat$V2)
#v1=c(1)
#output=r2_var(dat, v1, nv)
#str(output)
#$ var       : num 2e-06
#$ LRT_p     :Class 'logLik' : 0.98 (df=2)
#$ r2_based_p: num 0.977
#$ rsq       : num 8.21e-07
#$ upper_r2  : num 0.00403
#$ lower_r2  : num -0.000999
#In another scenario where the same covariates but different
#PRS1 and PRS2 are compared,
#y = PRS1 + any_cov1 + any_cov2 + ... + any_covN  + e   vs.
#y = PRS2 + any_cov1 + any_cov2 + ... + any_covN + e
     
#the following approach can be employed (see R code below).
   
#R
#dat=dat1
#here let's assume dat$V2 as PRS1, dat$V3 as PRS2 and dat$V7 to dat$V11 as covariates
#mod1 <- lm(dat$V1~dat$V2 + dat$V7+ dat$V8+ dat$V9+ dat$V10+ dat$V11)
#merged_predictor1 <- mod1$fitted.values
#mod2 <- lm(dat$V1~dat$V3 + dat$V7+ dat$V8+ dat$V9+ dat$V10+ dat$V11)
#merged_predictor2 <- mod2$fitted.values
#dat=data.frame(dat$V1,merged_predictor2,merged_predictor1)
  
#the comparison can be equivalently expressed as:
#y = merged_predictor1 + e   vs. 
#y = merged_predictor2 + e
#This comparison can be simply achieved using the r2_diff function, e.g.
#To get the test statistics for the difference between R2(y~x[,1]) and
#R2(y~x[,2]). (here x[,1]= merged_predictor2, and x[,2]= merged_predictor1)
#v1=c(1)
#v2=c(2)
#output=r2_diff(dat,v1,v2,nv)
#str(output)
#List of 11
#$ rsq1               : num 0.043
#$ rsq2               : num 0.0428
#$ var1               : num 0.000159
#$ var2               : num 0.000158
#$ var_diff           : num 2.6e-07
#$ r2_based_p         : num 0.657
#$ r2_based_p_one_tail: num 0.328
#$ mean_diff          : num 0.000227
#$ upper_diff         : num 0.00123
#$ lower_diff         : num -0.000773
#$ p                  :List of 3
#..$ nested   : num 0.634
#..$ nonnested: num 0.657
#..$ LRT      : num 0.627
#Importantly note that in this case, merged_predictor1 and merged_predictor2 
#are not nested within each other (see mod1 vs. mod2 above). 
#Therefore, this is nonnested model comparison. 
#So, output$p$nonnested (0.657) should be used instead of 
#output$p$nested (0.634). Note that r2_based_p is the same 
#as output$p$nonnested (0.657) here. 
#For the above non-nested scenario, alternatively, the outcome variable (y) 
#can be preadjusted with covariate(s), following the procedure in R:
#mod <- lm(y ~ any_cov1 + any_cov2 + ... + any_covN)
#y_adj=scale(mod$residuals)
#R
#dat=dat1
#mod <- lm(dat$V1~dat$V7+ dat$V8+ dat$V9+ dat$V10+ dat$V11)
#y_adj=scale(mod$residuals)
#dat=data.frame(y_adj,dat$V3,dat$V2)
#the comparison can be equivalently expressed as:
#y_adj = PRS1 + e   vs. 
#y_adj = PRS2 + e
#then, the comparative significance test can be approximated by using r2_diff function
#To get the test statistics for the difference between R2(y~x[,1]) and
#R2(y~x[,2]). (here x[,1]= PRS1 and x[,2]= PRS2)
#v1=c(1)
#v2=c(2)
#output=r2_diff(dat,v1,v2,nv)
#str(output)
#List of 11
#$ rsq1               : num 5.16e-05
#$ rsq2               : num 4.63e-05
#$ var1               : num 2.21e-06
#$ var2               : num 2.18e-06
#$ var_diff           : num 1.31e-09
#$ r2_based_p         : num 0.884
#$ r2_based_p_one_tail: num 0.442
#$ mean_diff          : num 5.28e-06
#$ upper_diff         : num 7.63e-05
#$ lower_diff         : num -6.57e-05
#$ p                  :List of 3
#..$ nested   : num 0.942
#..$ nonnested: num 0.884
#..$ LRT      : num 0.942
r2_enrich_beta
Description
This function estimates var(beta1^2/R^2), where beta1 and R^2 are the regression coefficient and the coefficient of determination from a multiple regression model, i.e. y = x1 * beta1 + x2 * beta2 + e, where y, x1 and x2 are column-standardised (see Olkin and Finn 1995). y is an N by 1 matrix having the dependent variable, x1 is an N by 1 matrix having the ith explanatory variable, and x2 is an N by 1 matrix having the jth explanatory variable. v1 and v2 indicate the ith and jth columns in the data (v1 or v2 should be a single integer between 1 and M, see Arguments below).
Usage
r2_enrich_beta(dat, v1, v2, nv, exp1)
Arguments
| dat | N by (M+1) matrix having variables in the order of cbind(y,x) | 
| v1 | This can be set as v1=1, v1=2, v1=3 or any value between 1 and M based on combination | 
| v2 | This can be set as v2=1, v2=2, v2=3 or any value between 1 and M based on combination | 
| nv | Sample size | 
| exp1 | The expectation of the ratio (e.g. the ratio of the number of SNPs in genomic partitioning) | 
Value
This function will estimate var(beta1^2/R^2), where beta1 and R^2 are the regression coefficient and the coefficient of determination from a multiple regression model, i.e. y = x1 * beta1 + x2 * beta2 + e, where y, x1 and x2 are column-standardised. The outputs are listed as follows.
| beta1_sq | beta1^2, the squared regression coefficient for the first predictor | 
| beta2_sq | beta2^2, the squared regression coefficient for the second predictor | 
| ratio1 | beta1_sq/R^2 | 
| ratio2 | beta2_sq/R^2 | 
| ratio_var1 | Variance of ratio 1 | 
| ratio_var2 | Variance of ratio 2 | 
| upper_ratio1 | upper limit of 95% CI for ratio 1 | 
| lower_ratio1 | lower limit of 95% CI for ratio 1 | 
| upper_ratio2 | upper limit of 95% CI for ratio 2 | 
| lower_ratio2 | lower limit of 95% CI for ratio 2 | 
| enrich_p1 | Two-tailed p-value for whether beta1_sq/R^2 differs significantly from exp1 | 
| enrich_p1_one_tail | One-tailed p-value for whether beta1_sq/R^2 differs significantly from exp1 | 
| enrich_p2 | Two-tailed p-value for whether beta2_sq/R^2 differs significantly from (1-exp1) | 
| enrich_p2_one_tail | One-tailed p-value for whether beta2_sq/R^2 differs significantly from (1-exp1) | 
References
Olkin, I. and Finn, J.D. Correlations redux. Psychological Bulletin, 1995. 118(1): p. 155.
Examples
#To get the test statistic for the ratio which is significantly
#different from the expectation, this function estimates 
#var(beta1^2/R^2), where 
#beta1 and R^2 are the regression coefficient and the 
#coefficient of determination from a multiple regression model,
#i.e. y = x1 * beta1 + x2 * beta2 + e, where y, x1 and x2 are 
#column-standardised.
dat=dat2
nv=length(dat$V1)
v1=c(1)
v2=c(2)
expected_ratio=0.04
output=r2_enrich_beta(dat,v1,v2,nv,expected_ratio)
output
#r2redux output
#output$beta1_sq (beta1_sq)
#0.01118301
#output$beta2_sq (beta2_sq)
#0.004980285
#output$ratio1 (beta1_sq/R^2)
#0.4392572
#output$ratio2 (beta2_sq/R^2)
#0.1956205
#output$ratio_var1 (variance of ratio 1)
#0.08042288
#output$ratio_var2 (variance of ratio 2)
#0.0431134
#output$upper_ratio1 (upper limit of 95% CI for ratio 1)
#0.9950922
#output$lower_ratio1 (lower limit of 95% CI for ratio 1)
#-0.1165778
#output$upper_ratio2 (upper limit of 95% CI for ratio 2)
#0.6025904
#output$lower_ratio2 (lower limit of 95% CI for ratio 2)
#-0.2113493
#output$enrich_p1 (two tailed P-value for beta1_sq/R^2 is 
#significantly different from exp1)
#0.1591692
#output$enrich_p1_one_tail (one tailed P-value for beta1_sq/R^2 
#is significantly different from exp1)
#0.07958459
#output$enrich_p2 (two tailed P-value for beta2_sq/R2 is 
#significantly different from (1-exp1))
#0.000232035
#output$enrich_p2_one_tail (one tailed P-value for beta2_sq/R2  
#is significantly different from (1-exp1))
#0.0001160175
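#The enrichment p-values follow from Wald tests against the expectation
#(an illustrative check assuming a normal approximation):
z1=(output$ratio1-expected_ratio)/sqrt(output$ratio_var1)
2*pnorm(-abs(z1))   #~0.159, i.e. output$enrich_p1
z2=(output$ratio2-(1-expected_ratio))/sqrt(output$ratio_var2)
2*pnorm(-abs(z2))   #~0.000232, i.e. output$enrich_p2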
r2_var function
Description
This function estimates var(R2(y~x[,v1])), where R2 is the R squared value of the model, y is an N by 1 matrix having the dependent variable, and x is an N by M matrix having M explanatory variables. v1 indicates the ith column in the x matrix (v1 can be multiple values between 1 and M, see Arguments below).
Usage
r2_var(dat, v1, nv)
Arguments
| dat | N by (M+1) matrix having variables in the order of cbind(y,x) | 
| v1 | This can be set as v1=c(1), v1=c(1,2) or possibly with more values | 
| nv | Sample size | 
Value
This function tests the null hypothesis for R2, i.e. it provides the test statistics for R2(y~x[,v1]). The outputs are listed as follows.
| rsq | R2 | 
| var | Variance of R2 | 
| r2_based_p | P-value under the null hypothesis, i.e. R2=0 | 
| upper_r2 | Upper limit of 95% CI for R2 | 
| lower_r2 | Lower limit of 95% CI for R2 | 
Examples
#To get the test statistics for R2(y~x[,1])
dat=dat1
nv=length(dat$V1)
v1=c(1)
output=r2_var(dat,v1,nv)
output
#r2redux output
#output$rsq (R2)
#0.03836254
#output$var (variance of R2) 
#0.0001436128
#output$r2_based_p (P-value under the null hypothesis, i.e. R2=0)
#1.188162e-10
#output$upper_r2 (upper limit of 95% CI for R2)
#0.06433782
#output$lower_r2 (lower limit of 95% CI for R2)
#0.01764252
#To get the test statistic for R2(y~x[,1]+x[,2]+x[,3])
dat=dat1
nv=length(dat$V1)
v1=c(1,2,3) 
r2_var(dat,v1,nv)
#r2redux output
#output$rsq (R2)
#0.03836254
#output$var (variance of R2)
#0.0001436128
#output$r2_based_p (R2 based P-value)
#1.188162e-10
#output$upper_r2 (upper limit of 95% CI for R2)
#0.06433782
#output$lower_r2 (lower limit of 95% CI for R2)
#0.0176425
#When comparing two independent sets of PGSs 
#Let's assume dat1 and dat2 are independent samples for this example
#(e.g. male PGS vs. female PGS)
nv=length(dat1$V1)
v1=c(1)
output1=r2_var(dat1,v1,nv)
nv=length(dat2$V1)
v1=c(1)
output2=r2_var(dat2,v1,nv)
#To get the difference between two independent sets of PGSs 
r2_diff_independent=abs(output1$rsq-output2$rsq)
#To get the variance of the difference between two independent sets of PGSs 
var_r2_diff_independent= output1$var+output2$var
sd_r2_diff_independent=sqrt(var_r2_diff_independent)
#To get p-value (following eq. 15 in the paper)
chi=r2_diff_independent^2/var_r2_diff_independent
p_value=pchisq(chi,1,lower.tail=FALSE)
#to get 95% CI (following eq. 15 in the paper)
uci=r2_diff_independent+1.96*sd_r2_diff_independent
lci=r2_diff_independent-1.96*sd_r2_diff_independent
r_diff function
Description
This function estimates var(R(y~x[,v1]) - R(y~x[,v2])), where R is the correlation between y and x, y is an N by 1 matrix having the dependent variable, and x is an N by M matrix having M explanatory variables. v1 or v2 indicates the ith column in the x matrix (v1 or v2 can be multiple values between 1 and M, see Arguments below).
Usage
r_diff(dat, v1, v2, nv)
Arguments
| dat | N by (M+1) matrix having variables in the order of cbind(y,x) | 
| v1 | This can be set as v1=c(1) or v1=c(1,2) | 
| v2 | This can be set as v2=c(2), v2=c(3), v2=c(1,3) or v2=c(3,4) | 
| nv | Sample size | 
Value
This function tests the significance of the difference between two PGS correlations (whether dependent or independent, joint or single), i.e. the test statistics for the difference between R(y~x[,v1]) and R(y~x[,v2]) (here we define R_1=R(y~x[,v1]) and R_2=R(y~x[,v2])). The outputs are listed as follows.
| r1 | R_1 | 
| r2 | R_2 | 
| var1 | Variance of R_1 | 
| var2 | Variance of R_2 | 
| var_diff | Variance of the difference between R_1 and R_2 | 
| r_based_p | Two-tailed p-value for the difference between R_1 and R_2 | 
| r_based_p_one_tail | One-tailed p-value for the difference between R_1 and R_2 | 
| mean_diff | Differences between R_1 and R_2 | 
| upper_diff | Upper limit of 95% CI for the difference | 
| lower_diff | Lower limit of 95% CI for the difference | 
Examples
#To get the test statistics for the difference between R(y~x[,1]) and 
#R(y~x[,2]) (here we define R_1=R(y~x[,1]) and R_2=R(y~x[,2]))
dat=dat1
nv=length(dat$V1)
v1=c(1)
v2=c(2)
output=r_diff(dat,v1,v2,nv)
output
#r2redux output
#output$r1 (R_1)
#0.1958636
#output$r2 (R_2)
#0.197006
#output$var1 (variance of R_1)
#0.0009247466
#output$var2 (variance of R_2)
#0.0001451358
#output$var_diff (variance of difference between R_1 and R_2)
#3.65286e-06
#output$r_based_p (two tailed p-value for significant difference between R_1 and R_2)
#0.5500319
#output$r_based_p_one_tail (one tailed p-value)
#0.2750159
#output$mean_diff (difference between R_1 and R_2)
#-0.001142375
#output$upper_diff (upper limit of 95% CI for the difference)
#0.002603666
#output$lower_diff (lower limit of 95% CI for the difference)
#-0.004888417
#To get the test statistics for the difference between R(y~x[,1]+x[,2]) and 
#R(y~x[,2]). (here R_1=R(y~x[,1]+x[,2]) and R_2=R(y~x[,2]))
nv=length(dat$V1)
v1=c(1,2)
v2=c(2)
output=r_diff(dat,v1,v2,nv)
output
#output$r1
#0.1974001
#output$r2
#0.197006
#output$var1
#0.0009235848
#output$var2
#0.0009238836
#output$var_diff
#3.837451e-06
#output$r_based_p
#0.8405593
#output$mean_diff
#0.0003940961
#output$upper_diff
#0.004233621
#output$lower_diff
#-0.003445429
#Note: If the directions are not consistent, for instance, if one correlation
#is positive (R_1) and another is negative (R_2), or vice versa, it is 
#crucial to approach the interpretation of the comparative test with caution. 
#This caution is especially important when applying r_diff() 
#in a nested model comparison involving a joint model.
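#A simple safeguard is to inspect the direction of each correlation before
#interpreting the comparison, e.g. for the example above:
sign(cor(dat$V1,dat$V2))   #direction of the first predictor
sign(cor(dat$V1,dat$V3))   #direction of the second predictor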