| Title: | R2 Statistic | 
| Version: | 1.0.18 | 
| Description: | R2 statistic for significance tests. The variance and covariance of R2 values are used to assess the 95% CI and p-value of the R2 difference. | 
| License: | GPL (≥ 3) | 
| URL: | https://github.com/mommy003/r2redux | 
| Encoding: | UTF-8 | 
| RoxygenNote: | 7.1.2 | 
| NeedsCompilation: | no | 
| Depends: | R (≥ 2.10) | 
| LazyData: | true | 
| Suggests: | testthat (≥ 3.0.0) | 
| Config/testthat/edition: | 3 | 
| Packaged: | 2025-02-15 05:10:42 UTC; alh-admmdm | 
| Author: | Hong Lee [aut, cph], Moksedul Momin [aut, cre, cph] | 
| Maintainer: | Moksedul Momin <cvasu.momin@gmail.com> | 
| Repository: | CRAN | 
| Date/Publication: | 2025-02-15 05:30:02 UTC | 
cc_trf function
Description
This function transforms the predictive ability (R2) and its standard error (se) between the observed scale and the liability scale.
Usage
cc_trf(R2, se, K, P)
Arguments
| R2 | R2 or coefficient of determination on the observed or liability scale | 
| se | Standard error of R2 | 
| K | Population prevalence | 
| P | The proportion of cases in the study sample | 
Value
This function transforms R2 and its s.e. between the observed scale and the liability scale. Output from the command is a list of outcomes.
| R2l | Transformed R2 on the liability scale | 
| sel | Transformed se on the liability scale | 
| R2O | Transformed R2 on the observed scale | 
| seO | Transformed se on the observed scale | 
References
Lee, S. H., Goddard, M. E., Wray, N. R., and Visscher, P. M. A better coefficient of determination for genetic profile analysis. Genetic Epidemiology (2012). 36(3): p. 214-224.
Examples
#To get the transformed R2
output=cc_trf(0.06, 0.002, 0.05, 0.05)
output
#output$R2l (transformed R2 on the liability scale)
#0.2679337
#output$sel (transformed se on the liability scale)
#0.008931123
#output$R2O (transformed R2 on the observed scale)
#0.01343616
#output$seO (transformed se on the observed scale)
#0.000447872
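#For reference, a minimal sketch of the observed-to-liability transformation
#described in Lee et al. (2012), on which cc_trf is based. The helper name
#obs_to_liab and the se scaling are illustrative assumptions, not the
#package internals.
obs_to_liab=function(R2, se, K, P) {
  thd=qnorm(1-K)       #liability threshold
  zv=dnorm(thd)        #normal density at the threshold
  mv=zv/K              #mean liability of cases
  cv=K^2*(1-K)^2/(zv^2*P*(1-P))
  theta=mv*(P-K)/(1-K)*(mv*(P-K)/(1-K)-thd)
  list(R2l=cv*R2/(1+cv*theta*R2),
       sel=cv*se/(1+cv*theta*R2))   #assumed delta-method scaling for se
}
obs_to_liab(0.06, 0.002, 0.05, 0.05)
#With K = P the theta term vanishes, giving R2l ~ 0.268 and sel ~ 0.0089 as
#above; in that case the liability-to-observed direction is division by cv.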
Phenotypes and 10 sets of PGSs
Description
A dataset containing phenotypes and multiple PGSs estimated from 10 sets of SNPs according to GWAS p-value thresholds
Usage
dat1
Format
A data frame with 1000 rows and 11 variables:
- V1: Phenotype
- V2: PGS1, for p value threshold <=1
- V3: PGS2, for p value threshold <=0.5
- V4: PGS3, for p value threshold <=0.4
- V5: PGS4, for p value threshold <=0.3
- V6: PGS5, for p value threshold <=0.2
- V7: PGS6, for p value threshold <=0.1
- V8: PGS7, for p value threshold <=0.05
- V9: PGS8, for p value threshold <=0.01
- V10: PGS9, for p value threshold <=0.001
- V11: PGS10, for p value threshold <=0.0001
Phenotypes and 2 sets of PGSs
Description
A dataset containing phenotypes and 2 sets of PGSs estimated from 2 sets of SNPs from regulatory and non-regulatory genomic regions
Usage
dat2
Format
A data frame with 1000 rows and 3 variables:
- V1: Phenotype
- V2: PGS1, regulatory region
- V3: PGS2, non-regulatory region
olkin12_1 function
Description
olkin12_1 function
Usage
olkin12_1(omat, nv)
Arguments
| omat | 3 by 3 matrix of the correlation coefficients between y, x1 and x2, i.e. omat=cor(dat), where dat is an N by 3 matrix having variables in the order of cbind(y,x1,x2) | 
| nv | Sample size | 
Value
This function is used internally as source code by other functions in the package.
olkin12_13 function
Description
olkin12_13 function
Usage
olkin12_13(omat, nv)
Arguments
| omat | 3 by 3 matrix of the correlation coefficients between y, x1 and x2, i.e. omat=cor(dat), where dat is an N by 3 matrix having variables in the order of cbind(y,x1,x2) | 
| nv | Sample size | 
Value
This function is used internally as source code by other functions in the package.
olkin12_3 function
Description
olkin12_3 function
Usage
olkin12_3(omat, nv)
Arguments
| omat | 3 by 3 matrix of the correlation coefficients between y, x1 and x2, i.e. omat=cor(dat), where dat is an N by 3 matrix having variables in the order of cbind(y,x1,x2) | 
| nv | Sample size | 
Value
This function is used internally as source code by other functions in the package.
olkin12_34 function
Description
olkin12_34 function
Usage
olkin12_34(omat, nv)
Arguments
| omat | 3 by 3 matrix of the correlation coefficients between y, x1 and x2, i.e. omat=cor(dat), where dat is an N by 3 matrix having variables in the order of cbind(y,x1,x2) | 
| nv | Sample size | 
Value
This function is used internally as source code by other functions in the package.
olkin1_2 function
Description
olkin1_2 function
Usage
olkin1_2(omat, nv)
Arguments
| omat | 3 by 3 matrix of the correlation coefficients between y, x1 and x2, i.e. omat=cor(dat), where dat is an N by 3 matrix having variables in the order of cbind(y,x1,x2) | 
| nv | Sample size | 
Value
This function is used internally as source code by other functions in the package.
olkin_beta1_2 function
Description
This function derives the information matrix for beta1^2 and beta2^2, where beta1 and beta2 are regression coefficients from a multiple regression model, i.e. y = x1 * beta1 + x2 * beta2 + e, where y, x1 and x2 are column-standardised (i.e. in the context of correlation coefficients; see Olkin and Finn 1995).
Usage
olkin_beta1_2(omat, nv)
Arguments
| omat | 3 by 3 matrix of the correlation coefficients between y, x1 and x2, i.e. omat=cor(dat), where dat is an N by 3 matrix having variables in the order of cbind(y,x1,x2) | 
| nv | Sample size | 
Value
This function returns the information (variance-covariance) matrix of beta1^2 and beta2^2, where beta1 and beta2 are regression coefficients from a multiple regression model. The outputs are listed as follows.
| info | 2x2 information (variance-covariance) matrix | 
| var1 | Variance of beta1^2 | 
| var2 | Variance of beta2^2 | 
| var1_2 | Variance of the difference between beta1^2 and beta2^2 | 
References
Olkin, I. and Finn, J.D. Correlations redux. Psychological Bulletin, 1995. 118(1): p. 155.
Examples
#To get the information (variance-covariance) matrix of beta1^2 and beta2^2 where 
#beta1 and beta2 are regression coefficients from a multiple regression model.
dat=dat1
omat=cor(dat)[1:3,1:3]
#omat
#1.0000000 0.1958636 0.1970060
#0.1958636 1.0000000 0.9981003
#0.1970060 0.9981003 1.0000000
nv=length(dat$V1)
output=olkin_beta1_2(omat,nv) 
output
#output$info (2x2 information (variance-covariance) matrix)
#0.04146276 0.08158261
#0.08158261 0.16111124             
#output$var1 (variance of beta1^2)
#0.04146276
#output$var2 (variance of beta2^2)
#0.1611112
#output$var1_2 (variance of the difference between beta1^2 and beta2^2)
#0.03940878
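#The covariance structure can be checked by hand: var1_2 is the variance of a
#difference, var(beta1^2 - beta2^2) = var1 + var2 - 2*cov, assuming output$info
#is returned as a 2 x 2 matrix:
output$var1+output$var2-2*output$info[1,2]
#0.04146276 + 0.16111124 - 2*0.08158261 = 0.03940878, i.e. output$var1_2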
olkin_beta_inf function
Description
This function derives the information matrix for beta1 and beta2, where beta1 and beta2 are regression coefficients from a multiple regression model, i.e. y = x1 * beta1 + x2 * beta2 + e, where y, x1 and x2 are column-standardised (see Olkin and Finn 1995).
Usage
olkin_beta_inf(omat, nv)
Arguments
| omat | 3 by 3 matrix of the correlation coefficients between y, x1 and x2, i.e. omat=cor(dat), where dat is an N by 3 matrix having variables in the order of cbind(y,x1,x2) | 
| nv | Sample size | 
Value
This function generates the information (variance-covariance) matrix of beta1 and beta2. The outputs are listed as follows.
| info | 2x2 information (variance-covariance) matrix | 
| var1 | Variance of beta1 | 
| var2 | Variance of beta2 | 
| var1_2 | Variance of difference between beta1 and beta2 | 
References
Olkin, I. and Finn, J.D. Correlations redux. Psychological Bulletin, 1995. 118(1): p. 155.
Examples
#To get the information (variance-covariance) matrix of beta1 and beta2 where 
#beta1 and beta2 are regression coefficients from a multiple regression model.
dat=dat1
omat=cor(dat)[1:3,1:3]
#omat
#1.0000000 0.1958636 0.1970060
#0.1958636 1.0000000 0.9981003
#0.1970060 0.9981003 1.0000000
nv=length(dat$V1)
output=olkin_beta_inf(omat,nv)
output 
#output$info (2x2 information (variance-covariance) matrix)
#0.2531406 -0.2526212
#-0.2526212  0.2530269            
#output$var1 (variance of beta1)
#0.2531406
#output$var2 (variance of beta2)
#0.2530269
#output$var1_2 (variance of difference between beta1 and beta2)
#1.01141  
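#A hypothetical follow-up (not a package function): a Wald-type test for
#beta1 - beta2, combining var1_2 with the standardised coefficients recovered
#from omat; a normal approximation is assumed.
b=solve(omat[2:3,2:3], omat[2:3,1])   #beta1 and beta2 from the correlations
z=(b[1]-b[2])/sqrt(output$var1_2)     #Wald statistic for the difference
2*pnorm(-abs(z))                      #two-tailed p-value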
olkin_beta_ratio function
Description
This function derives the variance of beta1^2 / R^2, where beta1 and beta2 are regression coefficients from a multiple regression model, i.e. y = x1 * beta1 + x2 * beta2 + e, where y, x1 and x2 are column-standardised (see Olkin and Finn 1995).
Usage
olkin_beta_ratio(omat, nv)
Arguments
| omat | 3 by 3 matrix of the correlation coefficients between y, x1 and x2, i.e. omat=cor(dat), where dat is an N by 3 matrix having variables in the order of cbind(y,x1,x2) | 
| nv | Sample size | 
Value
This function generates the variance of the proportion beta1^2/R^2. The outputs are listed as follows.
| ratio_var | Variance of ratio | 
References
Olkin, I. and Finn, J.D. Correlations redux. Psychological Bulletin, 1995. 118(1): p. 155.
Examples
#To get the variance of the ratio beta1^2/R^2 where beta1 and beta2 are 
#regression coefficients from a multiple regression model.
dat=dat2
omat=cor(dat)[1:3,1:3]
#omat
#1.0000000 0.1497007 0.136431
#0.1497007 1.0000000 0.622790
#0.1364310 0.6227900 1.000000
nv=length(dat$V1)
output=olkin_beta_ratio(omat,nv)
output 
#r2redux output
#output$ratio_var (Variance of ratio)
#0.08042288
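#Assuming a normal approximation, ratio_var converts to a 95% CI for the
#proportion beta1^2/R^2. The point estimate (~0.4392572) is taken from the
#r2_enrich_beta example on the same data (dat2) later in this manual:
ratio1=0.4392572
ratio1+c(-1,1)*1.96*sqrt(output$ratio_var)
#approximately -0.117 to 0.995, matching lower_ratio1 and upper_ratio1 there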
r2_beta_var
Description
This function estimates var(beta1^2) and var(beta2^2), where beta1 and beta2 are regression coefficients from a multiple regression model, i.e. y = x1 * beta1 + x2 * beta2 + e, and y, x1 and x2 are column-standardised (see Olkin and Finn 1995). y is an N by 1 matrix having the dependent variable, x1 is an N by 1 matrix having the ith explanatory variable, and x2 is an N by 1 matrix having the jth explanatory variable. v1 and v2 indicate the ith and jth columns in the data (v1 or v2 should be a single integer between 1 and M, see Arguments below).
Usage
r2_beta_var(dat, v1, v2, nv)
Arguments
| dat | N by (M+1) matrix having variables in the order of cbind(y,x) | 
| v1 | This can be set as v1=1, v1=2, v1=3 or any value between 1 and M based on combination | 
| v2 | This can be set as v2=1, v2=2, v2=3 or any value between 1 and M based on combination | 
| nv | Sample size | 
Value
This function estimates the variance of beta1^2 and beta2^2, and the covariance between beta1^2 and beta2^2, i.e. the information matrix of squared regression coefficients. beta1 and beta2 are regression coefficients from a multiple regression model, i.e. y = x1 * beta1 + x2 * beta2 + e, where y, x1 and x2 are column-standardised. The outputs are listed as follows.
| beta1_sq | beta1^2, the squared regression coefficient for the first predictor | 
| beta2_sq | beta2^2, the squared regression coefficient for the second predictor | 
| var1 | Variance of beta1_sq | 
| var2 | Variance of beta2_sq | 
| var1_2 | Variance of difference between beta1_sq and beta2_sq | 
| cov | Covariance between beta1_sq and beta2_sq | 
| upper_beta1_sq | upper limit of 95% CI for beta1_sq | 
| lower_beta1_sq | lower limit of 95% CI for beta1_sq | 
| upper_beta2_sq | upper limit of 95% CI for beta2_sq | 
| lower_beta2_sq | lower limit of 95% CI for beta2_sq | 
References
Olkin, I. and Finn, J.D. Correlations redux. Psychological Bulletin, 1995. 118(1): p. 155.
Examples
#To get the 95% CI of beta1_sq and beta2_sq
#beta1 and beta2 are regression coefficients from a multiple regression model,
#i.e. y = x1 * beta1 + x2 * beta2 +e, where y, x1 and x2 are column-standardised.
dat=dat2
nv=length(dat$V1)
v1=c(1)
v2=c(2)
output=r2_beta_var(dat,v1,v2,nv)
output
#r2redux output
#output$beta1_sq (beta1_sq)
#0.01118301
#output$beta2_sq (beta2_sq)
#0.004980285
#output$var1 (variance of beta1_sq)
#7.072931e-05
#output$var2 (variance of beta2_sq)
#3.161929e-05
#output$var1_2 (variance of difference between beta1_sq and beta2_sq)
#0.000162113
#output$cov (covariance between beta1_sq and beta2_sq)
#-2.988221e-05
#output$upper_beta1_sq (upper limit of 95% CI for beta1_sq)
#0.03037793
#output$lower_beta1_sq (lower limit of 95% CI for beta1_sq)
#-0.00123582
#output$upper_beta2_sq (upper limit of 95% CI for beta2_sq)
#0.02490076
#output$lower_beta2_sq (lower limit of 95% CI for beta2_sq)
#-0.005127546
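#These pieces combine in the usual ways: var1_2 equals var1 + var2 - 2*cov
#(7.072931e-05 + 3.161929e-05 + 2*2.988221e-05 = 0.000162113), and a Wald-type
#test of beta1_sq against beta2_sq can be sketched as follows (normal
#approximation assumed; not a package function):
z=(output$beta1_sq-output$beta2_sq)/sqrt(output$var1_2)
2*pnorm(-abs(z))   #two-tailed p-value, ~0.63 in this example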
r2_diff function
Description
This function estimates var(R2(y~x[,v1]) - R2(y~x[,v2])), where R2 is the R squared value of the model, y is an N by 1 matrix having the dependent variable, and x is an N by M matrix having M explanatory variables. v1 or v2 indicates the ith column in the x matrix (v1 or v2 can be multiple values between 1 and M, see Arguments below).
Usage
r2_diff(dat, v1, v2, nv)
Arguments
| dat | N by (M+1) matrix having variables in the order of cbind(y,x) | 
| v1 | This can be set as v1=c(1) or v1=c(1,2) | 
| v2 | This can be set as v2=c(2), v2=c(3), v2=c(1,3) or v2=c(3,4) | 
| nv | Sample size | 
Value
This function tests the significance of the difference between two PGS R2 values (whether dependent or independent, joint or single), i.e. the test statistics for the difference between R2(y~x[,v1]) and R2(y~x[,v2]) (here we define R2_1=R2(y~x[,v1]) and R2_2=R2(y~x[,v2])). The outputs are listed as follows.
| rsq1 | R2_1 | 
| rsq2 | R2_2 | 
| var1 | Variance of R2_1 | 
| var2 | Variance of R2_2 | 
| var_diff | Variance of the difference between R2_1 and R2_2 | 
| r2_based_p | Two-tailed p-value for the difference between R2_1 and R2_2 | 
| r2_based_p_one_tail | One-tailed p-value for the difference | 
| mean_diff | Differences between R2_1 and R2_2 | 
| upper_diff | Upper limit of 95% CI for the difference | 
| lower_diff | Lower limit of 95% CI for the difference | 
Examples
#To get the test statistics for the difference between R2(y~x[,1]) and 
#R2(y~x[,2]) (here we define R2_1=R2(y~x[,1]) and R2_2=R2(y~x[,2]))
dat=dat1
nv=length(dat$V1)
v1=c(1)
v2=c(2)
output=r2_diff(dat,v1,v2,nv)
output
#r2redux output
#output$rsq1 (R2_1)
#0.03836254
#output$rsq2 (R2_2)
#0.03881135
#output$var1 (variance of R2_1)
#0.0001436128
#output$var2 (variance of R2_2)
#0.0001451358
#output$var_diff (variance of difference between R2_1 and R2_2)
#5.678517e-07
#output$r2_based_p (two tailed p-value for significant difference)
#0.5514562
#output$r2_based_p_one_tail(one tailed p-value for significant difference)
#0.2757281
#output$mean_diff (differences between R2_1 and R2_2)
#-0.0004488044
#output$upper_diff (upper limit of 95% CI for the difference)
#0.001028172
#output$lower_diff (lower limit of 95% CI for the difference)
#-0.001925781
#output$p$nested
#1
#output$p$nonnested
#0.5514562
#output$p$LRT
#1
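#For intuition (a check under a normal approximation, not the package's
#internal code path), r2_based_p and the 95% CI can be recovered from
#mean_diff and var_diff:
chi=output$mean_diff^2/output$var_diff
pchisq(chi,1,lower.tail=FALSE)
#~0.5515, i.e. output$r2_based_p
output$mean_diff+c(-1,1)*1.96*sqrt(output$var_diff)
#approximately -0.0019258 and 0.0010282, i.e. lower_diff and upper_diff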
#To get the test statistics for the difference between R2(y~x[,1]+x[,2]) and 
#R2(y~x[,1]). (here R2_1=R2(y~x[,1]+x[,2]) and R2_2=R2(y~x[,1]))
dat=dat1
nv=length(dat$V1)
v1=c(1,2)
v2=c(1)
output=r2_diff(dat,v1,v2,nv)
#r2redux output
#output$rsq1 (R2_1)
#0.03896678
#output$rsq2 (R2_2)
#0.03836254
#output$var1 (variance of R2_1)
#0.0001473686
#output$var2 (variance of R2_2)
#0.0001436128
#output$var_diff (variance of difference between R2_1 and R2_2)
#2.321425e-06
#output$r2_based_p (p-value for significant difference between R2_1 and R2_2)
#0.4366883
#output$mean_diff (differences between R2_1 and R2_2)
#0.0006042383
#output$upper_diff (upper limit of 95% CI for the difference)
#0.00488788
#output$lower_diff (lower limit of 95% CI for the difference)
#-0.0005576171
#Note: If the directions are not consistent, for instance, if one correlation 
#is positive (R_1) and another is negative (R_2), or vice versa, it is crucial
#to approach the interpretation of the comparative test with caution. 
#It's important to note that R^2 alone does not provide information about the 
#direction or sign of the relationships between predictors and the response variable. 
#When faced with multiple predictors common between two models, for example, 
#y = any_cov1 + any_cov2 + ... + any_covN  + e   vs.
#y = PRS + any_cov1 + any_cov2 +...+ any_covN + e
#A more streamlined approach can be adopted by consolidating the various
#predictors into a single predictor (see R code below).
#R
#dat=dat1
#here let's assume, we wanted to test one PRS (dat$V2) 
#with 5 covariates (dat$V7 to dat$V11)
#mod1 <- lm(dat$V1~dat$V2 + dat$V7+ dat$V8+ dat$V9+ dat$V10+ dat$V11)
#merged_predictor1 <- mod1$fitted.values
#mod2 <- lm(dat$V1~ dat$V7+ dat$V8+ dat$V9+ dat$V10+ dat$V11)
#merged_predictor2 <- mod2$fitted.values
#dat=data.frame(dat$V1,merged_predictor1,merged_predictor2)
#the comparison can be equivalently expressed as:
#y = merged_predictor1 + e   vs. 
#y = merged_predictor2 + e
  
#This comparison can be simply achieved using the r2_diff function, e.g.
#To get the test statistics for the difference between R2(y~x[,1]) and
#R2(y~x[,2]). (here x[,1]= merged_predictor1 (from the full model), 
#and x[,2]= merged_predictor2 (from the reduced model))
#v1=c(1)
#v2=c(2)
#output=r2_diff(dat,v1,v2,nv)
#note that the merged predictor from the full model (v1) should be the first.
#str(output)
#List of 11
#$ rsq1               : num 0.0428
#$ rsq2               : num 0.042
#$ var1               : num 0.000158
#$ var2               : num 0.000156
#$ var_diff           : num 2.87e-06
#$ r2_based_p         : num 0.658
#$ r2_based_p_one_tail: num 0.329
#$ mean_diff          : num 0.000751
#$ upper_diff         : num 0.00407
#$ lower_diff         : num -0.00257
#$ p                  :List of 3
#..$ nested   : num 0.386
#..$ nonnested: num 0.658
#..$ LRT      : num 0.376
#Importantly note that in this case, merged_predictor1 is nested within
#merged_predictor2 (see mod1 vs. mod2 above). Therefore, this is 
#nested model comparison. So, output$p$nested (0.386) should be used 
#instead of output$p$nonnested (0.658). 
#Note that r2_based_p is the same as output$p$nonnested (0.658) here. 
  
#For this scenario, alternatively, the outcome variable (y) can be preadjusted
#with covariate(s), following the procedure in R:
#mod <- lm(y ~ any_cov1 + any_cov2 + ... + any_covN)
#y_adj=scale(mod$residuals)
#then, the comparative significance test can be approximated by using
#the following model y_adj = PRS (r2_var(dat, v1, nv)) 
#R
#dat=dat1
#mod <- lm(dat$V1~dat$V7+ dat$V8+ dat$V9+ dat$V10+ dat$V11)
#y_adj=scale(mod$residuals)
#dat=data.frame(y_adj,dat$V2)
#v1=c(1)
#output=r2_var(dat, v1, nv)
#str(output)
#$ var       : num 2e-06
#$ LRT_p     :Class 'logLik' : 0.98 (df=2)
#$ r2_based_p: num 0.977
#$ rsq       : num 8.21e-07
#$ upper_r2  : num 0.00403
#$ lower_r2  : num -0.000999
#In another scenario where the same covariates but different
#PRS1 and PRS2 are compared,
#y = PRS1 + any_cov1 + any_cov2 + ... + any_covN  + e   vs.
#y = PRS2 + any_cov1 + any_cov2 + ... + any_covN + e
     
#the following approach can be employed (see R code below).
   
#R
#dat=dat1
#here let's assume dat$V2 as PRS1, dat$V3 as PRS2 and dat$V7 to dat$V11 as covariates
#mod1 <- lm(dat$V1~dat$V2 + dat$V7+ dat$V8+ dat$V9+ dat$V10+ dat$V11)
#merged_predictor1 <- mod1$fitted.values
#mod2 <- lm(dat$V1~dat$V3 + dat$V7+ dat$V8+ dat$V9+ dat$V10+ dat$V11)
#merged_predictor2 <- mod2$fitted.values
#dat=data.frame(dat$V1,merged_predictor2,merged_predictor1)
  
#the comparison can be equivalently expressed as:
#y = merged_predictor1 + e   vs. 
#y = merged_predictor2 + e
#This comparison can be simply achieved using the r2_diff function, e.g.
#To get the test statistics for the difference between R2(y~x[,1]) and
#R2(y~x[,2]). (here x[,1]= merged_predictor2, and x[,2]= merged_predictor1)
#v1=c(1)
#v2=c(2)
#output=r2_diff(dat,v1,v2,nv)
#str(output)
#List of 11
#$ rsq1               : num 0.043
#$ rsq2               : num 0.0428
#$ var1               : num 0.000159
#$ var2               : num 0.000158
#$ var_diff           : num 2.6e-07
#$ r2_based_p         : num 0.657
#$ r2_based_p_one_tail: num 0.328
#$ mean_diff          : num 0.000227
#$ upper_diff         : num 0.00123
#$ lower_diff         : num -0.000773
#$ p                  :List of 3
#..$ nested   : num 0.634
#..$ nonnested: num 0.657
#..$ LRT      : num 0.627
#Importantly note that in this case, merged_predictor1 and merged_predictor2 
#are not nested within each other (see mod1 vs. mod2 above). 
#Therefore, this is nonnested model comparison. 
#So, output$p$nonnested (0.657) should be used instead of 
#output$p$nested (0.634). Note that r2_based_p is the same 
#as output$p$nonnested (0.657) here. 
#For the above non-nested scenario, alternatively, the outcome variable (y) 
#can be preadjusted with covariate(s), following the procedure in R:
#mod <- lm(y ~ any_cov1 + any_cov2 + ... + any_covN)
#y_adj=scale(mod$residuals)
#R
#dat=dat1
#mod <- lm(dat$V1~dat$V7+ dat$V8+ dat$V9+ dat$V10+ dat$V11)
#y_adj=scale(mod$residuals)
#dat=data.frame(y_adj,dat$V3,dat$V2)
#the comparison can be equivalently expressed as:
#y_adj = PRS1 + e   vs. 
#y_adj = PRS2 + e
#then, the comparative significance test can be approximated by using r2_diff function
#To get the test statistics for the difference between R2(y~x[,1]) and
#R2(y~x[,2]). (here x[,1]= PRS1 and x[,2]= PRS2)
#v1=c(1)
#v2=c(2)
#output=r2_diff(dat,v1,v2,nv)
#str(output)
#List of 11
#$ rsq1               : num 5.16e-05
#$ rsq2               : num 4.63e-05
#$ var1               : num 2.21e-06
#$ var2               : num 2.18e-06
#$ var_diff           : num 1.31e-09
#$ r2_based_p         : num 0.884
#$ r2_based_p_one_tail: num 0.442
#$ mean_diff          : num 5.28e-06
#$ upper_diff         : num 7.63e-05
#$ lower_diff         : num -6.57e-05
#$ p                  :List of 3
#..$ nested   : num 0.942
#..$ nonnested: num 0.884
#..$ LRT      : num 0.942
r2_enrich_beta
Description
This function estimates var(beta1^2/R^2), where beta1 and R^2 are the regression coefficient and the coefficient of determination from a multiple regression model, i.e. y = x1 * beta1 + x2 * beta2 + e, where y, x1 and x2 are column-standardised (see Olkin and Finn 1995). y is an N by 1 matrix having the dependent variable, x1 is an N by 1 matrix having the ith explanatory variable, and x2 is an N by 1 matrix having the jth explanatory variable. v1 and v2 indicate the ith and jth columns in the data (v1 or v2 should be a single integer between 1 and M, see Arguments below).
Usage
r2_enrich_beta(dat, v1, v2, nv, exp1)
Arguments
| dat | N by (M+1) matrix having variables in the order of cbind(y,x) | 
| v1 | This can be set as v1=1, v1=2, v1=3 or any value between 1 and M based on combination | 
| v2 | This can be set as v2=1, v2=2, v2=3 or any value between 1 and M based on combination | 
| nv | Sample size | 
| exp1 | The expectation of the ratio (e.g. the ratio of the number of SNPs in genomic partitioning) | 
Value
This function will estimate var(beta1^2/R^2), where beta1 and R^2 are the regression coefficient and the coefficient of determination from a multiple regression model, i.e. y = x1 * beta1 + x2 * beta2 + e, where y, x1 and x2 are column-standardised. The outputs are listed as follows.
| beta1_sq | beta1^2, the squared regression coefficient for the first predictor | 
| beta2_sq | beta2^2, the squared regression coefficient for the second predictor | 
| ratio1 | beta1_sq/R^2 | 
| ratio2 | beta2_sq/R^2 | 
| ratio_var1 | Variance of ratio 1 | 
| ratio_var2 | Variance of ratio 2 | 
| upper_ratio1 | upper limit of 95% CI for ratio 1 | 
| lower_ratio1 | lower limit of 95% CI for ratio 1 | 
| upper_ratio2 | upper limit of 95% CI for ratio 2 | 
| lower_ratio2 | lower limit of 95% CI for ratio 2 | 
| enrich_p1 | Two-tailed p-value for whether beta1_sq/R^2 differs significantly from exp1 | 
| enrich_p1_one_tail | One-tailed p-value for whether beta1_sq/R^2 differs significantly from exp1 | 
| enrich_p2 | Two-tailed p-value for whether beta2_sq/R^2 differs significantly from (1-exp1) | 
| enrich_p2_one_tail | One-tailed p-value for whether beta2_sq/R^2 differs significantly from (1-exp1) | 
References
Olkin, I. and Finn, J.D. Correlations redux. Psychological Bulletin, 1995. 118(1): p. 155.
Examples
#To get the test statistic for the ratio which is significantly
#different from the expectation, this function estimates 
#var(beta1^2/R^2), where 
#beta1 and R^2 are the regression coefficient and the 
#coefficient of determination from a multiple regression model,
#i.e. y = x1 * beta1 + x2 * beta2 + e, where y, x1 and x2 are 
#column-standardised.
dat=dat2
nv=length(dat$V1)
v1=c(1)
v2=c(2)
expected_ratio=0.04
output=r2_enrich_beta(dat,v1,v2,nv,expected_ratio)
output
#r2redux output
#output$beta1_sq (beta1_sq)
#0.01118301
#output$beta2_sq (beta2_sq)
#0.004980285
#output$ratio1 (beta1_sq/R^2)
#0.4392572
#output$ratio2 (beta2_sq/R^2)
#0.1956205
#output$ratio_var1 (variance of ratio 1)
#0.08042288
#output$ratio_var2 (variance of ratio 2)
#0.0431134
#output$upper_ratio1 (upper limit of 95% CI for ratio 1)
#0.9950922
#output$lower_ratio1 (lower limit of 95% CI for ratio 1)
#-0.1165778
#output$upper_ratio2 (upper limit of 95% CI for ratio 2)
#0.6025904
#output$lower_ratio2 (lower limit of 95% CI for ratio 2)
#-0.2113493
#output$enrich_p1 (two tailed P-value for beta1_sq/R^2 is 
#significantly different from exp1)
#0.1591692
#output$enrich_p1_one_tail (one tailed P-value for beta1_sq/R^2 
#is significantly different from exp1)
#0.07958459
#output$enrich_p2 (two tailed P-value for beta2_sq/R2 is 
#significantly different from (1-exp1))
#0.000232035
#output$enrich_p2_one_tail (one tailed P-value for beta2_sq/R2  
#is significantly different from (1-exp1))
#0.0001160175
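#The enrichment p-values follow from Wald tests against the expectation
#(an illustrative check assuming a normal approximation):
z1=(output$ratio1-expected_ratio)/sqrt(output$ratio_var1)
2*pnorm(-abs(z1))   #~0.159, i.e. output$enrich_p1
z2=(output$ratio2-(1-expected_ratio))/sqrt(output$ratio_var2)
2*pnorm(-abs(z2))   #~0.000232, i.e. output$enrich_p2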
r2_var function
Description
This function estimates var(R2(y~x[,v1])), where R2 is the R squared value of the model, y is an N by 1 matrix having the dependent variable, and x is an N by M matrix having M explanatory variables. v1 indicates the ith column in the x matrix (v1 can be multiple values between 1 and M, see Arguments below).
Usage
r2_var(dat, v1, nv)
Arguments
| dat | N by (M+1) matrix having variables in the order of cbind(y,x) | 
| v1 | This can be set as v1=c(1), v1=c(1,2) or possibly with more values | 
| nv | Sample size | 
Value
This function tests the null hypothesis for R2, i.e. it provides the test statistics for R2(y~x[,v1]). The outputs are listed as follows.
| rsq | R2 | 
| var | Variance of R2 | 
| r2_based_p | P-value under the null hypothesis, i.e. R2=0 | 
| upper_r2 | Upper limit of 95% CI for R2 | 
| lower_r2 | Lower limit of 95% CI for R2 | 
Examples
#To get the test statistics for R2(y~x[,1])
dat=dat1
nv=length(dat$V1)
v1=c(1)
output=r2_var(dat,v1,nv)
output
#r2redux output
#output$rsq (R2)
#0.03836254
#output$var (variance of R2) 
#0.0001436128
#output$r2_based_p (P-value under the null hypothesis, i.e. R2=0)
#1.188162e-10
#output$upper_r2 (upper limit of 95% CI for R2)
#0.06433782
#output$lower_r2 (lower limit of 95% CI for R2)
#0.01764252
#To get the test statistic for R2(y~x[,1]+x[,2]+x[,3])
dat=dat1
nv=length(dat$V1)
v1=c(1,2,3) 
r2_var(dat,v1,nv)
#r2redux output
#output$rsq (R2)
#0.03836254
#output$var (variance of R2)
#0.0001436128
#output$r2_based_p (R2 based P-value)
#1.188162e-10
#output$upper_r2 (upper limit of 95% CI for R2)
#0.06433782
#output$lower_r2 (lower limit of 95% CI for R2)
#0.0176425
#When comparing two independent sets of PGSs 
#Let's assume dat1 and dat2 are independent samples for this example
#(e.g. male PGS vs. female PGS)
nv=length(dat1$V1)
v1=c(1)
output1=r2_var(dat1,v1,nv)
nv=length(dat2$V1)
v1=c(1)
output2=r2_var(dat2,v1,nv)
#To get the difference between two independent sets of PGSs 
r2_diff_independent=abs(output1$rsq-output2$rsq)
#To get the variance of the difference between two independent sets of PGSs 
var_r2_diff_independent= output1$var+output2$var
sd_r2_diff_independent=sqrt(var_r2_diff_independent)
#To get p-value (following eq. 15 in the paper)
chi=r2_diff_independent^2/var_r2_diff_independent
p_value=pchisq(chi,1,lower.tail=FALSE)
#to get 95% CI (following eq. 15 in the paper)
uci=r2_diff_independent+1.96*sd_r2_diff_independent
lci=r2_diff_independent-1.96*sd_r2_diff_independent
r_diff function
Description
This function estimates var(R(y~x[,v1]) - R(y~x[,v2])), where R is the correlation between y and x, y is an N by 1 matrix having the dependent variable, and x is an N by M matrix having M explanatory variables. v1 or v2 indicates the ith column in the x matrix (v1 or v2 can be multiple values between 1 and M, see Arguments below).
Usage
r_diff(dat, v1, v2, nv)
Arguments
| dat | N by (M+1) matrix having variables in the order of cbind(y,x) | 
| v1 | This can be set as v1=c(1) or v1=c(1,2) | 
| v2 | This can be set as v2=c(2), v2=c(3), v2=c(1,3) or v2=c(3,4) | 
| nv | Sample size | 
Value
This function tests the significance of the difference between two PGS correlations (whether dependent or independent, joint or single), i.e. the test statistics for the difference between R(y~x[,v1]) and R(y~x[,v2]) (here we define R_1=R(y~x[,v1]) and R_2=R(y~x[,v2])). The outputs are listed as follows.
| r1 | R_1 | 
| r2 | R_2 | 
| var1 | Variance of R_1 | 
| var2 | Variance of R_2 | 
| var_diff | Variance of the difference between R_1 and R_2 | 
| r_based_p | Two-tailed p-value for the difference between R_1 and R_2 | 
| r_based_p_one_tail | One-tailed p-value for the difference between R_1 and R_2 | 
| mean_diff | Differences between R_1 and R_2 | 
| upper_diff | Upper limit of 95% CI for the difference | 
| lower_diff | Lower limit of 95% CI for the difference | 
Examples
#To get the test statistics for the difference between R(y~x[,1]) and 
#R(y~x[,2]) (here we define R_1=R(y~x[,1]) and R_2=R(y~x[,2]))
dat=dat1
nv=length(dat$V1)
v1=c(1)
v2=c(2)
output=r_diff(dat,v1,v2,nv)
output
#r2redux output
#output$r1 (R_1)
#0.1958636
#output$r2 (R_2)
#0.197006
#output$var1 (variance of R_1)
#0.0009247466
#output$var2 (variance of R_2)
#0.0001451358
#output$var_diff (variance of difference between R_1 and R_2)
#3.65286e-06
#output$r_based_p (two tailed p-value for significant difference between R_1 and R_2)
#0.5500319
#output$r_based_p_one_tail (one tailed p-value)
#0.2750159
#output$mean_diff (difference between R_1 and R_2)
#-0.001142375
#output$upper_diff (upper limit of 95% CI for the difference)
#0.002603666
#output$lower_diff (lower limit of 95% CI for the difference)
#-0.004888417
#To get the test statistics for the difference between R(y~x[,1]+x[,2]) and 
#R(y~x[,2]). (here R_1=R(y~x[,1]+x[,2]) and R_2=R(y~x[,2]))
nv=length(dat$V1)
v1=c(1,2)
v2=c(2)
output=r_diff(dat,v1,v2,nv)
output
#output$r1
#0.1974001
#output$r2
#0.197006
#output$var1
#0.0009235848
#output$var2
#0.0009238836
#output$var_diff
#3.837451e-06
#output$r_based_p
#0.8405593
#output$mean_diff
#0.0003940961
#output$upper_diff
#0.004233621
#output$lower_diff
#-0.003445429
#Note: If the directions are not consistent, for instance, if one correlation
#is positive (R_1) and another is negative (R_2), or vice versa, it is 
#crucial to approach the interpretation of the comparative test with caution. 
#This caution is especially important when applying r_diff() 
#in a nested model comparison involving a joint model.
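#A simple safeguard is to inspect the direction of each correlation before
#interpreting the comparison, e.g. for the example above:
sign(cor(dat$V1,dat$V2))   #direction of the first predictor
sign(cor(dat$V1,dat$V3))   #direction of the second predictor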