GxEScanR is designed to efficiently run genome-wide association study (GWAS) and genome-wide by environment interaction study (GWEIS) scans using imputed genotypes stored in the BinaryDosage format. The phenotype to be analyzed can either be a continuous or binary trait. The GWEIS scan performs multiple tests that can be used in two-step methods.
There are five models that can be fit using GxEScanR when running a GWEIS, plus one that must be fit before running the GWEIS.
\(Y\) represents the outcome or phenotype
\(X\) represents the confounding covariates
\(E\) represents the covariate that will have its genetic interaction tested
\(G\) represents the genetic data
\(\beta\) represents the effect estimate that will be fitted in the model
This model contains the phenotype, confounding covariates, and the interaction covariate. It must be fit prior to running the GWEIS using the glm() routine.
\[Y = \beta_X X + \beta_E E\]
This model contains all the values from the environment-only model and adds the genetic data.
\[Y = \beta_X X + \beta_E E + \beta_G G\]
\(\beta_G\) is the only term tested
in this model and is represented by bg_ge in the
output.
This model contains all the values from the gene-only model and adds the gene-environment interaction term.
\[Y = \beta_X X + \beta_E E + \beta_G G + \beta_{GxE} GE\]
\(\beta_G\) and \(\beta_{GxE}\) are both tested for this
model as well as a joint test for both terms. They are represented by
bg_gxe, bgxe, and joint,
respectively, in the output.
This model tests for a relationship between the environmental value of interest and the gene. If \(E\) is coded 0,1, a logistic model is fitted; otherwise a linear model is fitted.
\[E = \beta_X X + \beta_G G\]
\(\beta_G\) is tested in this model
and is represented by bg_eg in the output.
This model is the same as model 3 except it is run on cases only.
\(\beta_G\) is tested and is
represented by bg_case in the output. This model can only
be run on a dichotomous phenotype.
This model is the same as model 3 except it is run on controls only.
\(\beta_G\) is tested and is
represented by bg_ctrl in the output. This model can only
be run on a dichotomous phenotype.
There are five steps to running a GWEIS using GxEScanR:
GxEScanR uses data stored in the BinaryDosage format. The
BinaryDosage package converts files from the VCF format to the
BinaryDosage format using the vcftobd() routine. The
getbdinfo() routine is then used to load the information
about the BinaryDosage file needed to run the GWEIS.
The following example shows how to convert a VCF file to BinaryDosage
format. Note that vcftobd() requires the
vcfppR package.
vcffile <- system.file("extdata", "gendata.vcf.gz", package = "GxEScanR")
exampledir <- tempdir()
bdosefile <- file.path(exampledir, "gendata.bdose")
BinaryDosage::vcftobd(vcffile = vcffile, bdose_file = bdosefile)
bdinfo <- BinaryDosage::getbdinfo(bdosefile)This vignette uses a pre-converted BinaryDosage file included with the package.
The subject data consists of subject IDs used to link the genetic data, and the phenotype and covariates used in the GWEIS. The data included with GxEScanR has two phenotypes (one continuous, one dichotomous) and two covariates (x1 and x2).
Important: All values for phenotypes and covariates must be numeric; factors are not allowed.
Important: All subject IDs must be character values.
Important: When using a dichotomous phenotype, it must be coded 0,1.
subjectfile <- system.file("extdata", "subdata.rds", package = "GxEScanR")
subjectdata <- readRDS(subjectfile)
head(subjectdata)
#> subid y_linear y_logistic x1 x2
#> 1 subject1 7.939120 1 0 0
#> 2 subject2 5.965786 0 1 0
#> 3 subject3 3.047969 0 1 0
#> 4 subject4 1.173026 0 0 0
#> 5 subject5 7.691592 1 1 0
#> 6 subject6 3.931234 0 1 0The base model is fit using glm() using data that contains only the subject IDs, phenotype, and covariates of interest — no genetic data.
It is important to:
bdinfo$samples).The formula passed to glm() must list the covariate whose interaction
with the genetic data is being tested last. In this
example we test the interaction with x1, so the formula is
y ~ x2 + x1.
# Remove y_logistic from the subject data
lineardata <- subjectdata[, c(1, 2, 4, 5)]
# Keep only subjects with complete data
lineardata <- lineardata[complete.cases(lineardata), ]
# Keep only subjects with genetic data
lineardata <- lineardata[lineardata$subid %in% bdinfo$samples$sid, ]
# Fit the linear model
linearmodel <- glm(y_linear ~ x2 + x1, data = lineardata)# Remove y_linear from the subject data
logisticdata <- subjectdata[, c(1, 3, 4, 5)]
# Keep only subjects with complete data
logisticdata <- logisticdata[complete.cases(logisticdata), ]
# Keep only subjects with genetic data
logisticdata <- logisticdata[logisticdata$subid %in% bdinfo$samples$sid, ]
# Fit the logistic model
logisticmodel <- glm(y_logistic ~ x2 + x1, family = binomial, data = logisticdata)Memory must be allocated before running the GWEIS using
gweis.mem(). This pre-computation makes the scan
substantially faster. Pass the fitted base model, the subject IDs from
that model’s data, and the list of tests to perform.
Typical test sets:
"bg_ge",
"bg_gxe", "bgxe", "joint""bg_ge",
"bg_gxe", "bgxe", "joint",
"bg_eg", "bg_case",
"bg_ctrl"# Continuous outcome
linearmem <- gweis.mem(gemdl = linearmodel,
subids = lineardata$subid,
tests = c("bg_ge", "bg_gxe", "bgxe", "joint"))
# Dichotomous outcome
logisticmem <- gweis.mem(gemdl = logisticmodel,
subids = logisticdata$subid,
tests = c("bg_ge", "bg_gxe", "bgxe", "joint",
"bg_eg", "bg_case", "bg_ctrl"))Pass the memory object from gweis.mem(), the
bdinfo object, a vector of SNP indices or IDs to test, and
an output file path to rungweis(). Results are written as a
tab-delimited text file.
snpindex <- 1:nrow(bdinfo$snps)
# Continuous outcome
linearresults <- file.path(exampledir, "linear.txt")
rungweis(gweismem = linearmem,
bdinfo = bdinfo,
snps = snpindex,
outfilename = linearresults)
# Dichotomous outcome
logisticresults <- file.path(exampledir, "logistic.txt")
rungweis(gweismem = logisticmem,
bdinfo = bdinfo,
snps = snpindex,
outfilename = logisticresults)Read the output file with read.table(). The following
describes the output columns, illustrated with the first three SNPs.
lineardf <- read.table(linearresults, header = TRUE, sep = "\t")
logisticdf <- read.table(logisticresults, header = TRUE, sep = "\t")Every row contains SNP ID (snpid), chromosome
(chr), position (loc), reference allele
(ref), alternate allele (alt), and alternate
allele frequency (aaf). For a dichotomous outcome,
aaf_case and aaf_ctrl are also included.
| snpid | chr | loc | ref | alt | aaf |
|---|---|---|---|---|---|
| chr22:10527916:T:G | chr22 | 10527916 | T | G | 0.6494005 |
| chr22:10648779:G:T | chr22 | 10648779 | G | T | 0.0854975 |
| chr22:10648794:G:A | chr22 | 10648794 | G | A | 0.0536727 |
| snpid | chr | loc | ref | alt | aaf | aaf_case | aaf_ctrl |
|---|---|---|---|---|---|---|---|
| chr22:10527916:T:G | chr22 | 10527916 | T | G | 0.6494005 | 0.6264272 | 0.6540861 |
| chr22:10648779:G:T | chr22 | 10648779 | G | T | 0.0854975 | 0.0882864 | 0.0849287 |
| chr22:10648794:G:A | chr22 | 10648794 | G | A | 0.0536727 | 0.0457573 | 0.0552871 |
bg_gebg_ge is the \(\beta_G\) estimate from model 1.
bg_ge_lrt is the LRT 1 df chi-squared statistic for \(\beta_G = 0\). Output when
"bg_ge" is in tests.
| bg_ge | bg_ge_lrt |
|---|---|
| -0.8900621 | 5.2558824 |
| -0.4766408 | 0.6737040 |
| -0.6481754 | 0.8070534 |
bg_gxe, bgxe,
jointbg_gxe / bg_gxe_lrt: \(\beta_G\) estimate and LRT from model 2.
Output when "bg_gxe" is in tests.bgxe / bgxe_lrt: \(\beta_{GxE}\) estimate and LRT. Output when
"bgxe" is in tests.joint_lrt: 2 df LRT for \(\beta_G = 0\) and \(\beta_{GxE} = 0\) jointly. Output when
"joint" is in tests.| bg_gxe | bg_gxe_lrt | bgxe | bgxe_lrt | joint_lrt |
|---|---|---|---|---|
| -1.4864649 | 8.4363261 | 1.4016489 | 3.2015268 | 8.4574092 |
| -0.4591966 | 0.3719000 | -0.0430555 | 0.0013243 | 0.6750284 |
| -0.6509083 | 0.5512644 | 0.0084781 | 0.0000301 | 0.8070835 |
bg_eg, bg_case,
bg_ctrlEach test produces a \(\beta_G\)
estimate and a 1 df LRT statistic. Output when "bg_eg",
"bg_case", or "bg_ctrl" are in
tests, respectively.
| bg_eg | bg_eg_lrt | bg_case | bg_case_lrt | bg_ctrl | bg_ctrl_lrt |
|---|---|---|---|---|---|
| -0.3460221 | 0.7862570 | 0.8264053 | 0.8541469 | -0.4673483 | 1.1215720 |
| -0.0867130 | 0.0219140 | -0.7545895 | 0.4064202 | 0.0815513 | 0.0144966 |
| -0.2255839 | 0.0949399 | -0.4952261 | 0.0361789 | -0.0220965 | 0.0008210 |