This vignette demonstrates how to compute diagFDR
diagnostics from a MaxQuant msms.txt
file.
MaxQuant reports a PSM-level score (Andromeda Score) and
a decoy indicator (Reverse). diagFDR can
reconstruct target–decoy counting (TDC) q-values from
the score and then compute stability and calibration diagnostics on the
competed PSM universe.
The typical workflow is:
msms.txt from MaxQuant.msms.txt into a dfdr_tbl (PSM-level)
using read_dfdr_maxquant_msms().dfdr_run_all() and inspect
headline/stability/calibration diagnostics.We start with a simulated dataset that is similar to the
dfdr_tbl returned by
read_dfdr_maxquant_msms(). Any software producing outputs
that can be mapped to columns id, is_decoy,
q, pep, run, and
score can similarly be used.
library(diagFDR)
set.seed(10)
n <- 8000
toy <- data.frame(
id = as.character(seq_len(n)),
is_decoy = sample(c(FALSE, TRUE), n, replace = TRUE, prob = c(0.98, 0.02)),
run = sample(paste0("run", 1:3), n, replace = TRUE),
score = c(rnorm(n * 0.98, mean = 7, sd = 1), rnorm(n * 0.02, mean = 5, sd = 1))[seq_len(n)],
pep = NA_real_
)
# Reconstruct q-values by simple TDC from score (for toy data only):
# Here we mimic a typical "higher score is better" setting.
toy <- toy[order(toy$score, decreasing = TRUE), ]
toy$D_cum <- cumsum(toy$is_decoy)
toy$T_cum <- cumsum(!toy$is_decoy)
toy$FDR_hat <- (toy$D_cum + 1) / pmax(toy$T_cum, 1)
toy$q <- rev(cummin(rev(toy$FDR_hat)))
toy <- toy[, c("id","is_decoy","q","pep","run","score")]
x_toy <- as_dfdr_tbl(
toy,
unit = "psm",
scope = "global",
q_source = "toy TDC from score",
q_max_export = 1,
provenance = list(tool = "toy")
)
diag <- dfdr_run_all(xs = list(MaxQuant_PSM = x_toy), alpha_main = 0.01)diag$tables$headline
#> # A tibble: 1 × 24
#> alpha T_alpha D_alpha FDR_hat CV_hat FDR_minus1 FDR_plus1 FDR_minusK FDR_plusK
#> <dbl> <int> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 0.01 0 0 NA Inf NA NA NA NA
#> # ℹ 15 more variables: k2sqrtD <int>, FDR_minus2sqrtD <dbl>,
#> # FDR_plus2sqrtD <dbl>, list <chr>, D_alpha_win <int>, effect_abs <dbl>,
#> # IPE <dbl>, flag_Dalpha <chr>, flag_CV <chr>, flag_Dwin <chr>,
#> # flag_IPE <chr>, flag_FDR <chr>, flag_equalchance <chr>, status <chr>,
#> # interpretation <chr>diag$plots$elasticity
#> Warning: Removed 6 rows containing missing values or values outside the scale range
#> (`geom_line()`).
#> Warning: Removed 6 rows containing missing values or values outside the scale range
#> (`geom_point()`).diag$tables$equal_chance_pooled
#> # A tibble: 1 × 12
#> qmax_export low_lo low_hi N_test N_D_test pi_D_hat effect_abs ci95_lo ci95_hi
#> <dbl> <dbl> <dbl> <int> <int> <dbl> <dbl> <dbl> <dbl>
#> 1 1 0.2 0.5 0 0 NA NA NA NA
#> # ℹ 3 more variables: p_value_binom <dbl>, pass_minN <lgl>, list <chr>
diag$plots$equal_chance__MaxQuant_PSMThe code below shows how to run diagFDR directly
from a MaxQuant msms.txt file.
library(diagFDR)
mq_path <- "path/to/msms.txt"
# Read msms.txt and reconstruct TDC q-values using MaxQuant Score and Reverse indicator.
# - Reverse == "+" is treated as a decoy indicator.
# - Score is assumed "higher is better".
# - q-values are computed using FDR(i) = (D(i)+add_decoy)/T(i) and q(i)=min_{j>=i} FDR(j).
x_mq <- read_dfdr_maxquant_msms(
path = mq_path,
pep_mode = "sanitize", # or "drop" if PEP contains values >1
exclude_contaminants = TRUE,
add_decoy = 1L,
unit = "psm",
scope = "global",
provenance = list(tool = "MaxQuant", file = basename(mq_path))
)
# Run diagnostics
diag <- dfdr_run_all(
xs = list(MaxQuant_PSM = x_mq),
alpha_main = 0.01,
alphas = c(1e-4, 5e-4, 1e-3, 2e-3, 5e-3, 1e-2, 2e-2, 5e-2, 1e-1, 2e-1),
eps = 0.2,
win_rel = 0.2,
truncation = "warn_drop",
low_conf = c(0.2, 0.5)
)
# Inspect headline diagnostics
diag$tables$headline
# Export tables/plots + human-readable summary
dfdr_write_report(
diag,
out_dir = "diagFDR_maxquant_out",
formats = c("csv", "png", "manifest", "readme", "summary")
)
# Optional: render a single HTML report (requires rmarkdown in Suggests)
dfdr_render_report(diag, out_dir = "diagFDR_maxquant_out")
library(diagFDR)
mq_path <- "path/to/msms.txt"
# Read msms.txt and reconstruct TDC q-values
x_mq <- read_dfdr_maxquant_msms(
path = mq_path,
pep_mode = "sanitize",
exclude_contaminants = TRUE,
add_decoy = 1L,
unit = "psm",
scope = "global",
provenance = list(tool = "MaxQuant", file = basename(mq_path))
)
# Run diagnostics with automatic pseudo p-value computation
# This will use score_to_pvalue(method="decoy_ecdf") internally
diag <- dfdr_run_all(
xs = list(MaxQuant_PSM = x_mq),
alpha_main = 0.01,
compute_pseudo_pvalues = TRUE, # generates pseudo p-values from scores
pseudo_pvalue_method = "decoy_ecdf", # Most defensible method for arbitrary scores
p_stratify = "run" # Optional: stratify p-value diagnostics by run
)
# diag will contain:
# - All standard target-decoy diagnostics
# - PLUS p-value calibration plots and tables
# - PLUS Storey pi0 diagnostics
# - PLUS BH comparison diagnostics
# Inspect headline + p-value calibration
diag$tables$headline
diag$tables$p_calibration_summary
# Export everything
dfdr_write_report(
diag,
out_dir = "diagFDR_maxquant_out",
formats = c("csv", "png", "manifest", "readme", "summary")
)
# Render HTML report (will include p-value diagnostics section)
dfdr_render_report(diag, out_dir = "diagFDR_maxquant_out")##Interpretation of pseudo p-value diagnostics
When compute_pseudo_pvalues = TRUE with method = “decoy_ecdf”:
Pseudo p-values are computed as right-tail probabilities under the decoy score distribution
π₀ estimate: Estimates the proportion of true nulls; should be stable across λ
BH comparison: Compares BH-FDR to TDA-FDR; agreement supports consistent FDR control
Important: These are diagnostic pseudo p-values, not guaranteed null p-values. They provide:
✓ Additional calibration perspectives
✓ Cross-validation between BH and TDA procedures
✓ Stratified stability checks (e.g., by run)
Caveat: Pseudo p-value calibration cannot detect scoring pathologies that affect both targets and decoys equally.
MaxQuant Score is not a p-value. However, you can
construct pseudo-p-values to run p-value-based
calibration and BH-stability diagnostics. The most defensible option for
arbitrary scores is to use the empirical decoy null tail
(method = "decoy_ecdf"), which maps scores to right-tail
probabilities under the decoy distribution.
# Create pseudo-p-values from the decoy score tail and rerun diagnostics with p-value checks.
x_mq$p <- score_to_pvalue(
score = x_mq$score,
method = "decoy_ecdf",
is_decoy = x_mq$is_decoy
)
attr(x_mq, "meta")$p_source <- "score_to_pvalue(method='decoy_ecdf' on MaxQuant Score)"
diag_p <- dfdr_run_all(
xs = list(MaxQuant_PSM = x_mq),
alpha_main = 0.01,
p_stratify = "run" # optional stratification if run column is meaningful
)
dfdr_write_report(diag_p, out_dir = "diagFDR_maxquant_out_with_p", formats = c("csv","png","manifest","readme","summary"))
dfdr_render_report(diag_p, out_dir = "diagFDR_maxquant_out_with_p")Universe / scope:
read_dfdr_maxquant_msms() produces a PSM-level universe;
q-values are reconstructed by score-based TDC. Protein-level claims
require additional inference.
Stability: D_alpha,
CV_hat, and D_alpha_win quantify whether the
operating cutoff is well supported by decoys. Very strict cutoffs can
yield sparse decoy tails.
Calibration: equal-chance diagnostics are internal plausibility checks under the target–decoy model. Agreement with expectations supports (but does not prove) correct FDR control.
Pseudo-p-values derived from the decoy tail can be used for additional calibration/stability diagnostics, but should be interpreted as diagnostic tools unless null calibration is theoretically guaranteed.