Introduction to TwoCutoff

Overview

This vignette introduces the TwoCutoff package, a software tool used in Alzheimer’s disease research to determine biomarker cutoff values. The package uses a two-stage process: first, it identifies biomarker threshold values that separate diseased and non-diseased individuals; second, it evaluates their diagnostic performance. Instead of using a single cutoff, the method defines two cutoffs, creating three groups: non-diseased individuals, diseased individuals, and an intermediate ‘gray zone’ for cases that cannot be classified with enough certainty. The package includes functions for:

adjust_score() – This function implements a two-stage disease risk modeling framework for biomarker analysis. First, it uses a Gaussian finite mixture model to estimate the probability that each subject belongs to the diseased or non-diseased population based on biomarker values. Then, it applies an XGBoost model to adjust this probability using clinical confounders, producing a final confounder-adjusted disease risk score.
derive_cutoffs_percentile() – This function derives lower and upper biomarker cutoffs using percentile-based thresholds from confounder-adjusted disease risk groups. The lower cutoff is estimated from the upper percentile of low-risk subjects to define a rule-out threshold, while the upper cutoff is estimated from the lower percentile of high-risk subjects to define a rule-in threshold, allowing classification into non-diseased, diseased, and indeterminate (gray-zone) groups.
derive_cutoffs_sensspec() – This function is a Receiver Operating Characteristic (ROC)-based cutoff function derives lower and upper biomarker cutoffs using ROC analysis of the confounder-adjusted risk scores. The lower cutoff is chosen to achieve high sensitivity for ruling out disease, while the upper cutoff is chosen to achieve high specificity for ruling in disease, allowing classification into non-diseased, diseased, and indeterminate (gray-zone) groups.
evaluate_performance() – This function evaluates the diagnostic performance of the derived biomarker cutoffs by comparing predicted classifications against the true disease status. The function computes key performance metrics including sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and the percentage of subjects classified in the indeterminate (gray-zone) group.
compare_performance() – This function compares percentile-based and ROC-based cutoff methods by summarizing their derived biomarker thresholds and diagnostic performance metrics. The function evaluates and contrasts sensitivity, specificity, PPV, NPV, and the proportion of indeterminate (gray-zone) subjects for each approach.
plot_two_cutoff() – This function visualizes the two-cutoff classification framework by displaying biomarker values together with the lower (rule-out) and upper (rule-in) thresholds. Subjects are colored according to their predicted classification as non-diseased, diseased, or indeterminate (gray-zone), allowing visual assessment of cutoff performance and group separation. The function can also display a confusion matrix summarizing the agreement between predicted classifications and true disease status.
dca_analysis() – This function evaluates the clinical usefulness of the confounder-adjusted risk model using Decision Curve Analysis (DCA). The function measures net benefit across a range of decision thresholds and compares the predictive model against default clinical strategies such as treating all subjects or treating none. This helps determine whether the model provides improved clinical decision-making by balancing true positive detections against unnecessary false positive classifications.

Workflow Overview

The diagram below shows the overall pipeline implemented in our package:

C:9hmIP5cac4c4e460b_to_TwoCutoff.R

We demonstrate the workflow using simulated data representing an Alzheimer’s disease biomarker study. The binary outcome variable (DISEASE) indicates disease status, while the phosphorylated tau (PTAU) biomarke is generated with higher mean values in diseased subjects than in non-diseased subjects. This setup mimics real clinical biomarker behavior, where elevated biomarker concentrations are associated with increased probability of disease. Additional variables including AGE, SEX, body mass index (BMI), hypertension (HTN), and diabetes mellitus (DM) are included as potential confounders in the risk-adjustment model.


 n <- 1000

 # Set seed for reproducibility
 set.seed(123)

 # Outcome
 DISEASE <- rbinom(n, 1, 0.35)

 # Biomarker depends on disease
 PTAU <- rnorm(n, mean = 25 + 10 * DISEASE, sd = 8)

 # Confounders (optional)
 AGE <- round(rnorm(n, 35, 7))
 SEX <- sample(c(0, 1), n, replace = TRUE)
 BMI <- rnorm(n, 25, 4)
 HTN <- rbinom(n, 1, 0.3)
 DM <- rbinom(n, 1, 0.2)

 df <- data.frame(PTAU, AGE, SEX, BMI, HTN, DM, DISEASE)

C:9hmIP5cac4c4e460b_to_TwoCutoff.R

Adjusted Risk Model

The adjust_score() function implements a two-stage statistical and machine learning framework for estimating disease risk from biomarker data while accounting for clinical confounders.

The method is designed for situations where:

A biomarker shows overlap between diseased and non-diseased individuals,
Additional variables such as AGE, SEX, BMI, HTN, or DM may influence disease prediction.

Stage 1 — Finite Mixture Model (FMM)

In the first stage, the biomarker distribution is modeled using a two-component Gaussian mixture model.

The assumption is that the observed biomarker values arise from two latent populations:

A non-diseased group,
A diseased group.

The model estimates:

The mean biomarker level for each group,
The variability within each group,
The proportion of subjects belonging to each population.

The Expectation-Maximization (EM) algorithm is used to iteratively estimate these parameters.

After estimating the mixture model parameters for every subject, the model calculates a posterior probability of disease:

\[P(\text{Disease} \mid x)=\hat{p}(x)=\dfrac{\hat{\pi}\cdot N(x\mid \hat{\mu_2}, \hat{\sigma_2})}{\hat{\pi}\cdot N(x\mid \hat{\mu_2}, \hat{\sigma_2}) + (1-\hat{\pi})\cdot N(x\mid \hat{\mu_1}, \hat{\sigma_1})},\] where:

\(x\) is observed biomarker value,
\(N(x\mid \mu, \sigma)\) is Gaussian density evaluated at \(x\),
Component 1 corresponds to the non-diseased group,
Component 2 corresponds to the diseased group,
\(\pi\) is the mixing proportion for the diseased component.

Output:

This probability is stored as p_disease.

Interpretation:

Values near 0 indicate subjects likely to be non-diseased,
Values near 1 indicate subjects likely to be diseased,
Intermediate values represent uncertain or “gray-zone” subjects.

Thus, the first stage transforms the raw biomarker into a probabilistic disease score.

Stage 2 — Confounder-Adjusted Risk Model

In the second stage, the function builds a supervised binary classification model using XGBoost to estimate a confounder-adjusted probability of disease. The binary outcome variable (0 = non-diseased, 1 = diseased) is modeled using the biomarker-derived posterior probability (p_disease) from Stage 1 together with user-specified confounders such as AGE, SEX, BMI, HTN, or DM.

This stage considers other clinical factors besides the biomarker when estimating disease risk.

The function supports two training strategies:

Using the entire dataset for model fitting (validate = FALSE),
Performing a train/test split (validate = TRUE) to evaluate predictive performance on held-out data.

When validation is enabled:

The dataset is partitioned into training and testing subsets,
The XGBoost model is trained on the training data,
Performance is evaluated on unseen test data using AUC.

Model complexity is optimized through cross-validation with early stopping.

By default, the function uses predefined XGBoost hyperparameters including:

max_depth = 3
eta = 0.05
subsample = 0.8
colsample_bytree = 0.8

However, users may also provide their own hyperparameter settings through the xgb_params argument to customize the learning process and tune model performance.

The final output of this stage is the adjusted_risk score, representing the confounder-adjusted probability of disease for each subject, with values ranging from 0 to 1.

Function Usage

adjust_score(
  data_set,
  biomarker,
  outcome,
  confounders = NULL,
  k = 2,
  maxit = 1000,
  epsilon = 1e-08,
  CompCase = FALSE,
  xgb_params = NULL,
  validate = FALSE,
  test_size = 0.2,
  seed = 123
)

Key arguments include:

data_set: Input dataset containing biomarker, outcome, and confounders.
biomarker: Name of the biomarker column.
outcome: Binary disease outcome (0 = non-diseased, 1 = diseased).
confounders: Optional vector of confounder variable names.
validate: Whether to perform train/test validation.
xgb_params: Optional user-defined XGBoost hyperparameters.

We now apply the adjust_score() function to estimate biomarker-based disease probabilities and confounder-adjusted risk scores.

res <- adjust_score(
  df,
  biomarker = "PTAU", outcome = "DISEASE",
  confounders = c("AGE","SEX","BMI","HTN","DM"))
#> WARNING! NOT CONVERGENT! 
#> number of iterations= 1000
#> Setting levels: control = 0, case = 1
#> Setting direction: controls < cases
head(res$data[,c("PTAU","p_disease","adjusted_risk")])
#>       PTAU p_disease adjusted_risk
#> 1 20.18486 0.1915690     0.2355698
#> 2 27.05041 0.4768597     0.3722855
#> 3 33.21428 0.6628730     0.4702801
#> 4 41.00849 0.9143770     0.5774739
#> 5 22.92667 0.2647666     0.3495702
#> 6 24.23882 0.3673505     0.2907061

C:9hmIP5cac4c4e460b_to_TwoCutoff.R

Percentile-Based Cutoff Derivation

The derive_cutoffs_percentile() function derives lower and upper biomarker cutoffs using percentile-based thresholds from the confounder-adjusted disease risk scores generated by adjust_score().

The method creates a three-zone diagnostic framework:

A rule-out zone for subjects unlikely to have disease,
A rule-in zone for subjects likely to have disease,
An intermediate indeterminate (gray-zone) for uncertain cases.

Method Overview

The function first separates subjects into low-risk and high-risk groups using the adjusted_risk scores.

Low-Risk Group and Rule-Out Cutoff

Subjects with adjusted_risk < low_thresh are assigned to the low-risk group. By default low_thresh = 0.05, meaning subjects with estimated disease probability below 5% are considered unlikely to have disease. The lower biomarker cutoff (c_L) is then defined as the 80th percentile of biomarker values within this low-risk group.

High-Risk Group and Rule-In Cutoff

Subjects with adjusted_risk >= high_thresh are assigned to the high-risk group. By default high_thresh = 0.70, meaning subjects with estimated disease probability of at least 70% are considered likely to have disease. The upper biomarker cutoff (c_H) is defined as the 20th percentile of biomarker values within this high-risk group. This cutoff is intended to maximize specificity for ruling in disease.

Fallback Behavior for Small Datasets

When fewer than five subjects are available in either group, the function automatically uses fallback cutoffs based on the overall biomarker distribution:

5th percentile for the lower cutoff (c_L), 95th percentile for the upper cutoff (c_H).

In small datasets, the number of subjects below low_thresh = 0.05 or above high_thresh = 0.70 may be too small to estimate reliable cutoff values. In such cases, users may consider selecting different low_thresh or high_thresh values depending on dataset size.

Classification Rules

After estimating the two cutoffs:

Biomarker values below c_L are classified as "Negative",
Biomarker values above c_H are classified as "Positive",
Values between the two cutoffs are classified as "Indeterminate".

Function Usage

derive_cutoffs_percentile(
  adjust_result,
  biomarker,
  low_thresh = 0.05,
  high_thresh = 0.70
)

Key arguments include:

adjust_result: Output object from adjust_score().
biomarker: Name of biomarker column.
low_thresh: Threshold defining low-risk subjects.
high_thresh: Threshold defining high-risk subjects.

We now derive percentile-based biomarker cutoffs using the adjusted disease risk scores.

cutoffs_p <- derive_cutoffs_percentile(
  res,
  biomarker = "PTAU"
)
#> Number of cases below low_thresh = 0.05 is too small, so fallback to 5th percentile of biomarker was used. Consider using another low_thresh value if your dataset is small.
#> Number of cases above high_thresh = 0.7 is too small, so fallback to 95th percentile of biomarker was used. Consider using another high_thresh value if your dataset is small.

cutoffs_p$c_L
#> [1] 13.43517
cutoffs_p$c_H
#> [1] 43.82378

table(cutoffs_p$class)
#> 
#> Indeterminate      Negative      Positive 
#>           900            50            50

C:9hmIP5cac4c4e460b_to_TwoCutoff.R

ROC-Based Cutoff Derivation

The derive_cutoffs_sensspec() function derives biomarker cutoffs using ROC analysis of the confounder-adjusted generated by adjust_score().

Unlike the percentile-based method, this approach selects cutoffs according to target diagnostic performance measures:

High sensitivity for ruling out disease,
High specificity for ruling in disease.

The method again creates three diagnostic regions:

A rule-out zone,
A rule-in zone,
An indeterminate (gray-zone) region between the two cutoffs.

ROC Curve Construction

The function first computes a ROC curve using:

adjusted_risk as the predictor,
The observed disease status (0 = non-diseased, 1 = diseased) as the reference outcome.

The ROC curve evaluates the ability of these risk scores to discriminate between diseased and non-diseased subjects across a range of possible classification thresholds.

For each possible risk threshold:

Subjects with adjusted_risk greater than or equal to the threshold are classified as diseased,
Subjects below the threshold are classified as non-diseased.

At every threshold, the function calculates:

Sensitivity (True Positive Rate): The proportion of diseased subjects correctly identified.
Specificity (True Negative Rate): The proportion of non-diseased subjects correctly identified.

The ROC curve summarizes the trade-off between sensitivity and specificity across all thresholds.

Selection of Rule-Out and Rule-In Thresholds

After constructing the ROC curve, the function identifies two clinically meaningful thresholds on the adjusted_risk scale:

Rule-Out Threshold (`T_L`)

The lower threshold is selected to achieve a target sensitivity (sens_target).

By default:

sens_target = 0.80

meaning the threshold is chosen so that at least 80% of diseased subjects are correctly identified. This threshold prioritizes minimizing false negatives and is therefore used for ruling out disease.

Rule-In Threshold (`T_H`)

The upper threshold is selected to achieve a target specificity (spec_target). By default:

spec_target = 0.80

meaning the threshold is chosen so that at least 80% of non-diseased subjects are correctly identified. This threshold prioritizes minimizing false positives and is therefore used for ruling in disease.

Mapping Risk Thresholds Back to Biomarker Values

The ROC analysis produces thresholds on the adjusted_risk probability scale rather than the original biomarker scale. To obtain clinically interpretable biomarker cutoffs, the function maps these thresholds back to biomarker values. When the relationship between biomarker values and adjusted risk is approximately monotone, the function fits a LOESS regression model:

adjusted_risk ~ biomarker

and uses interpolation to estimate biomarker values corresponding to the selected ROC thresholds. If a non-monotone relationship is detected between biomarker values and adjusted risk scores, the function automatically switches to isotonic regression to enforce a monotone relationship and improve stability of the cutoff estimation process.

Classification Rules

After estimating the lower and upper cutoffs:

Biomarker values below c_L are classified as "Negative",
Biomarker values above c_H are classified as "Positive",
Values between the two cutoffs are classified as "Indeterminate".

Function Usage

derive_cutoffs_sensspec(
  adjust_result,
  biomarker,
  outcome,
  sens_target = 0.8,
  spec_target = 0.8
)

Key arguments include:

adjust_result: Output object from adjust_score().
biomarker: Name of biomarker column.
outcome: Binary disease outcome variable.
sens_target: Desired sensitivity for the rule-out cutoff.
spec_target: Desired specificity for the rule-in cutoff.

We now derive ROC-based biomarker cutoffs using the adjusted disease risk scores.

cutoffs_s <- derive_cutoffs_sensspec(
  res,
  biomarker = "PTAU",
  outcome = "DISEASE",
  sens_target = 0.80,
  spec_target = 0.80
)
#> Warning in derive_cutoffs_sensspec(res, biomarker = "PTAU", outcome =
#> "DISEASE", : Sensitivity target not achievable -> using 5th percentile of
#> adjusted risk as fallback threshold.
#> Warning in derive_cutoffs_sensspec(res, biomarker = "PTAU", outcome =
#> "DISEASE", : Specificity target not achievable -> using 95th percentile of
#> adjusted risk as fallback threshold.

cutoffs_s$c_L
#> [1] 13.05203
cutoffs_s$c_H
#> [1] 45.36341

table(cutoffs_s$class)
#> 
#> Indeterminate      Negative      Positive 
#>           921            46            33

C:9hmIP5cac4c4e460b_to_TwoCutoff.R

Performance Evaluation

The evaluate_performance() function summarizes the diagnostic performance of the derived two-cutoff classification system.

The function compares the predicted classification results against the true disease labels and computes commonly used diagnostic accuracy measures.

Performance Metrics

The following metrics are calculated:

Sensitivity
Specificity
Positive Predictive Value (PPV): The proportion of predicted positive subjects who truly have disease.
Negative Predictive Value (NPV): The proportion of predicted negative subjects who are truly non-diseased.
Percent Indeterminate: The percentage of subjects classified into the gray-zone between the lower and upper cutoffs.

Interpretation

The two-cutoff approach aims to classify subjects only when the prediction is more certain, because subjects with biomarker values between the lower and upper cutoffs are placed in the indeterminate group instead of being forced into positive or negative classifications. As a result:

Diagnostic accuracy may improve,
PPV and NPV may become more reliable, but more subjects may be placed in the indeterminate (gray-zone) group.

This means the method prefers more reliable classifications, even if some subjects remain unclassified.

Function Usage

evaluate_performance(
  cutoff_result,
  outcome
)

Key arguments include:

cutoff_result: Output object from derive_cutoffs_percentile() or derive_cutoffs_sensspec().
outcome: Binary disease outcome variable.

We now evaluate the diagnostic performance of the percentile-based cutoff method.

perf_p <- evaluate_performance(
  cutoffs_p,
  outcome = "DISEASE"
)

C:9hmIP5cac4c4e460b_to_TwoCutoff.R

#> Warning: 'xfun::attr()' is deprecated.
#> Use 'xfun::attr2()' instead.
#> See help("Deprecated")

#> Warning: 'xfun::attr()' is deprecated.
#> Use 'xfun::attr2()' instead.
#> See help("Deprecated")

Sensitivity	Specificity	PPV	NPV	Percent_Indeterminate
1	0.806	0.76	1	90

C:9hmIP5cac4c4e460b_to_TwoCutoff.R We can similarly evaluate the ROC-based cutoff method.

perf_s <- evaluate_performance(
  cutoffs_s,
  outcome = "DISEASE"
)

C:9hmIP5cac4c4e460b_to_TwoCutoff.R

#> Warning: 'xfun::attr()' is deprecated.
#> Use 'xfun::attr2()' instead.
#> See help("Deprecated")

#> Warning: 'xfun::attr()' is deprecated.
#> Use 'xfun::attr2()' instead.
#> See help("Deprecated")

Sensitivity	Specificity	PPV	NPV	Percent_Indeterminate
1	0.885	0.818	1	92.1

C:9hmIP5cac4c4e460b_to_TwoCutoff.R

Comparison of Cutoff Methods

The compare_performance() function summarizes and compares the diagnostic performance of the percentile-based and ROC-based cutoff approaches.

The function combines:

Lower and upper biomarker cutoffs,
Sensitivity,
Specificity,
PPV,
NPV,
Percentage of indeterminate subjects,

into a single summary table for easier comparison between methods.

Interpretation

Different cutoff derivation methods may produce different trade-offs between diagnostic accuracy and the proportion of indeterminate subjects.

For example:

A method with higher sensitivity may classify fewer diseased subjects as negative,
A method with higher specificity may reduce false positives,
A larger indeterminate zone may improve classification reliability but leave more subjects unclassified.

Function Usage

compare_performance(
  percentile_result,
  sensspec_result,
  outcome
)

Key arguments include:

percentile_result: Output object from derive_cutoffs_percentile().
sensspec_result: Output object from derive_cutoffs_sensspec().
outcome: Binary disease outcome variable.

We now compare the percentile-based and ROC-based cutoff methods.

comparison <- compare_performance(
  cutoffs_p,
  cutoffs_s,
  outcome = "DISEASE"
)

C:9hmIP5cac4c4e460b_to_TwoCutoff.R

#> Warning: 'xfun::attr()' is deprecated.
#> Use 'xfun::attr2()' instead.
#> See help("Deprecated")

#> Warning: 'xfun::attr()' is deprecated.
#> Use 'xfun::attr2()' instead.
#> See help("Deprecated")

Method	c_L	c_H	Sensitivity	Specificity	PPV	NPV	Percent_Indeterminate
Percentile-based	13.435	43.824	1	0.806	0.760	1	90.0
ROC-based	13.052	45.363	1	0.885	0.818	1	92.1

C:9hmIP5cac4c4e460b_to_TwoCutoff.R ## Visualization of Two-Cutoff Classification

The plot_two_cutoff() function provides a graphical visualization of the two-cutoff biomarker classification framework.

The function displays:

Individual biomarker values,
The lower rule-out cutoff (c_L),
The upper rule-in cutoff (c_H),
Predicted subject classifications.

The visualization helps assess how effectively the derived biomarker cutoffs separate diseased and non-diseased subjects.

Classification Regions

Subjects are classified into three diagnostic groups according to their biomarker values:

Biomarker values below the lower cutoff (c_L) are classified as "Negative",
Biomarker values above the upper cutoff (c_H) are classified as "Positive",
Biomarker values between the two cutoffs are classified as "Indeterminate" (gray-zone).

This framework avoids forcing uncertain subjects into positive or negative categories when biomarker values fall within overlapping regions.

Plot Structure

The function creates a scatter-style visualization using jittered biomarker values:

The x-axis represents the true disease groups,
The y-axis represents biomarker values,
Horizontal dashed lines indicate the lower and upper cutoff thresholds,
Points are colored according to predicted classification.

By default:

Red points represent "Negative" classifications,
Blue points represent "Positive" classifications,
Gray points represent "Indeterminate" classifications.

Confusion Matrix Visualization

When add_table = TRUE, the function additionally displays a confusion matrix table below the plot.

The confusion table summarizes agreement between:

The true disease status,
The predicted two-cutoff classification groups.

This allows quick visual assessment of:

Correct positive classifications,
Correct negative classifications,
Indeterminate classifications,
Potential false positive and false negative assignments.

Confusion tables are currently supported only for single-panel visualizations.

Multi-Panel Visualization

The function also supports side-by-side comparison of multiple cutoff strategies by allowing multiple pairs of lower and upper cutoffs.

This is useful for comparing:

Percentile-based cutoffs,
ROC-based cutoffs,
Alternative threshold selection strategies.

Each panel displays a separate cutoff configuration while sharing the same biomarker dataset.

Function Usage

plot_two_cutoff(
  cutoff_result,
  biomarker,
  outcome,
  panel_title = NULL,
  negative_label = "Control",
  positive_label = "Disease",
  add_table = NULL
)

Key arguments include:

cutoff_result: Output object from derive_cutoffs_percentile() or derive_cutoffs_sensspec().
biomarker: Name of biomarker column.
outcome: Binary disease outcome variable.
panel_title: Optional titles for multi-panel comparisons.
negative_label: Label for non-diseased subjects.
positive_label: Label for diseased subjects.
add_table: Whether to display the confusion matrix table.

Single-Method Visualization

We first visualize the percentile-based cutoff classification.

plot_two_cutoff(
  cutoffs_p,
  biomarker = "PTAU",
  outcome = "DISEASE"
)

C:9hmIP5cac4c4e460b_to_TwoCutoff.R

Comparison of Multiple Cutoff Methods

The function can also compare multiple cutoff strategies side-by-side.

Below, we compare percentile-based and ROC-based cutoff methods using the same biomarker dataset.

# Create combined plotting object
cut_plot <- cutoffs_p

cut_plot$c_L <- c(cutoffs_p$c_L, cutoffs_s$c_L)
cut_plot$c_H <- c(cutoffs_p$c_H, cutoffs_s$c_H)

plot_two_cutoff(
  cut_plot,
  biomarker = "PTAU",
  outcome = "DISEASE",
  panel_title = c("Percentile-based", "ROC-based")
)

C:9hmIP5cac4c4e460b_to_TwoCutoff.R

Interpretation

The visualization allows direct comparison of how different cutoff derivation methods partition subjects into:

Non-diseased,
Diseased,
Indeterminate (gray-zone)

groups.

Differences between panels may reflect alternative trade-offs between sensitivity, specificity, and gray-zone size.

Decision Curve Analysis

The dca_analysis() function evaluates the clinical usefulness of the confounder-adjusted risk model using Decision Curve Analysis (DCA).

Unlike traditional performance metrics such as sensitivity or specificity, DCA evaluates whether using a prediction model improves clinical decision-making across different risk thresholds.

Net Benefit

Decision Curve Analysis is based on the concept of net benefit (NB), which balances:

Correctly identified diseased subjects (true positives),
Incorrectly identified non-diseased subjects (false positives).

The net benefit at threshold \(t\) is defined as:

\[ NB(t)=\frac{TP(t)}{N}-\frac{FP(t)}{N}\cdot\frac{t}{1-t} \]

where:

\(TP(t)\) is the number of true positives at threshold \(t\),
\(FP(t)\) is the number of false positives at threshold \(t\),
\(N\) is the total number of subjects.

Higher net benefit values indicate better clinical utility.

Strategies Compared

The function compares three clinical decision strategies:

Model — decisions based on the confounder-adjusted risk scores,
Treat All — assume all subjects are diseased,
Treat None — assume no subjects are diseased.

The prediction model is considered clinically useful when its net benefit curve lies above both the "Treat All" and "Treat None" strategies.

Function Usage

dca_analysis(
  adjust_result,
  outcome,
  thresholds = seq(0.01, 0.60, by = 0.01)
)

Key arguments include:

adjust_result: Output object from adjust_score().
outcome: Binary disease outcome variable.
thresholds: Vector of clinical risk thresholds used for DCA evaluation.

Decision Curve Analysis Example

dca_out <- dca_analysis(
  res,
  outcome = "DISEASE"
)

head(dca_out$data)
#>   Threshold  NB_model    NB_all NB_none
#> 1      0.01 0.3414141 0.3414141       0
#> 2      0.02 0.3346939 0.3346939       0
#> 3      0.03 0.3278351 0.3278351       0
#> 4      0.04 0.3208333 0.3208333       0
#> 5      0.05 0.3136842 0.3136842       0
#> 6      0.06 0.3063830 0.3063830       0

print(dca_out$plot)

C:9hmIP5cac4c4e460b_to_TwoCutoff.R

Interpretation

The blue "Model" curve represents the net benefit obtained when clinical decisions are based on the confounder-adjusted risk model.

The green "Treat All" curve represents the strategy of treating all subjects as diseased, while the red "Treat None" curve represents the strategy of treating no subjects.

The model is clinically useful at thresholds where the "Model" curve lies above both "Treat All" and "Treat None".

In this example, the model maintains a consistently higher net benefit across a broad range of risk thresholds, indicating improved clinical decision-making compared with the default strategies.

At lower thresholds, the "Treat All" strategy performs similarly to the model because most subjects would be classified as positive regardless of predicted risk. However, as the threshold increases, the net benefit of "Treat All" decreases substantially due to increasing false positive classifications.

In contrast, the prediction model maintains positive net benefit across a wide range of risk thresholds, indicating better identification of diseased and non-diseased subjects compared with default clinical strategies.

Introduction to TwoCutoff

2026-06-20

Overview

Workflow Overview

Adjusted Risk Model

Stage 1 — Finite Mixture Model (FMM)

Stage 2 — Confounder-Adjusted Risk Model

Function Usage

Percentile-Based Cutoff Derivation

Method Overview

Low-Risk Group and Rule-Out Cutoff

High-Risk Group and Rule-In Cutoff

Fallback Behavior for Small Datasets

Classification Rules

Function Usage

ROC-Based Cutoff Derivation

ROC Curve Construction

Selection of Rule-Out and Rule-In Thresholds

Rule-Out Threshold (T_L)

Rule-In Threshold (T_H)

Mapping Risk Thresholds Back to Biomarker Values

Classification Rules

Function Usage

Performance Evaluation

Performance Metrics

Interpretation

Function Usage

Comparison of Cutoff Methods

Interpretation

Function Usage

Classification Regions

Plot Structure

Confusion Matrix Visualization

Multi-Panel Visualization

Function Usage

Single-Method Visualization

Comparison of Multiple Cutoff Methods

Interpretation

Decision Curve Analysis

Net Benefit

Strategies Compared

Function Usage

Decision Curve Analysis Example

Interpretation

Rule-Out Threshold (`T_L`)

Rule-In Threshold (`T_H`)