The PLUCR package provides tools for estimating
treatment probabilities that maximize a primary clinical outcome while
controlling for the occurrence of adverse events. This dual-objective
framework is particularly relevant in medical or decision-making
settings where benefits must be balanced against potential harms.
Many existing methods derive treatment policies from the sign of the Conditional Average Treatment Effect (CATE). In this setup, individuals with a positive CATE are assigned treatment, as it is expected to be beneficial; those with a non-positive CATE are not treated. While effective in simple settings, this strategy becomes inadequate when multiple outcomes must be considered, in particular when adverse events must be taken into account.
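As a minimal illustration of this baseline (with a hypothetical vector of CATE estimates, cate_hat), the sign-based rule is simply:

```r
# Hypothetical CATE estimates for five individuals
cate_hat <- c(0.3, -0.1, 0.0, 0.8, -0.5)

# Sign-based rule: treat exactly those with a strictly positive CATE
sign_policy <- as.integer(cate_hat > 0)
print(sign_policy)  # 1 0 0 1 0
```

This rule optimizes the primary outcome only; it has no mechanism to account for adverse events, which is the gap PLUCR addresses.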
PLUCR addresses this limitation by framing the policy
learning task as a constrained optimization problem. The goal is to
learn a treatment probability function that not only improves the primary outcome but also
respects pre-specified constraints on adverse effects. The method
implemented in PLUCR estimates treatment probabilities by
adapting the Conditional Average Treatment Effect (CATE) to ensure
compliance with a predefined constraint on adverse events. This is
achieved through an iterative procedure that alternates between
optimizing the policy and correcting its estimator, ultimately leading
to a constrained-optimal policy.
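Schematically, writing \(\Delta\mu(x)\) for the conditional effect of treatment on the primary outcome and \(\Delta\nu(x)\) for its effect on the adverse event, the problem can be sketched (in simplified form, not the package's exact objective) as

\[
\max_{\pi}\; \mathbb{E}\big[\pi(X)\,\Delta\mu(X)\big]
\quad \text{subject to} \quad
\mathbb{E}\big[\pi(X)\,\Delta\nu(X)\big] \le \alpha,
\]

where \(\pi(X)\in[0,1]\) is the treatment probability and \(\alpha\) is the tolerated increase in the adverse event rate.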
This section demonstrates a basic example of how to use the
PLUCR package. We begin by generating a synthetic data set
and applying the main function, main_algorithm(), to learn
an optimal policy under adverse event constraints.
You can install the development version from your local source:
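For example (a sketch assuming the package sources sit in a local directory; the path below is a placeholder to adapt):

```r
# install.packages("devtools")  # if devtools is not yet available
devtools::install_local("path/to/PLUCR")  # placeholder path to the package sources
```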
We generate synthetic observational data using the built-in
generate_data() function. This function also produces the
complete data structure (which is typically unavailable in real-world
applications and used here for evaluation purposes only) as well as the
oracular treatment effect functions delta_mu() and
delta_nu().
n <- 3000 # Sample size
ncov <- 10L # Number of covariates
scenario_mu <- "Linear"
scenario_nu <- "Linear"
# Generate synthetic scenario
exp <- PLUCR::generate_data(n,
ncov,
scenario_mu = scenario_mu,
scenario_nu = scenario_nu,
seed=2025)
# Observational data (to be used for training)
df_obs <- exp[[2]]

In this step, we extract the necessary variables from the observational data set and define the parameters required by the main_algorithm() function. This includes specifying the covariates, treatment indicator, outcomes, and adverse events.
# Extract normalized numerical covariates as a matrix
# ---------------------------------------------------------------
# Reminder:
# - Ensure categorical variables are one-hot encoded
# (dummy variables) prior to inclusion.
# ---------------------------------------------------------------
X <- df_obs %>%
  dplyr::select(starts_with("X.")) %>%
  as.matrix()
# ---------------------------------------------------------------
# Optional: Normalize covariates
# Applying phi(X) rescales each column of the covariate matrix X
# (e.g., centering and scaling) to improve numerical stability and
# ensure comparability across features.
# ---------------------------------------------------------------
# X <- phi(X)
# Binary treatment assignment (0 = control, 1 = treated)
A <- df_obs$Treatment
# Primary outcome, assumed to be scaled to [0, 1]
Y <- df_obs$Y
# Note: For a normalization of the primary outcome, you can use:
# Y <- (df_obs$Y - min(df_obs$Y))/(max(df_obs$Y)-min(df_obs$Y))
# Binary indicator of adverse events
# (0 = No adverse event, 1 = adverse event)
Xi <- df_obs$Xi

We now define the parameters to be passed to
main_algorithm(). These include the learners used in
SuperLearner estimation, the directory for saving results,
and tuning parameters that control constraint enforcement and
optimization precision.
# SuperLearner library used for estimating nuisance functions
# You can customize this list. The default includes:
# - SL.mean: simple mean predictor
# - SL.glm: generalized linear model
# - SL.ranger: random forest
# - SL.grf: generalized random forest
# - SL.xgboost: gradient boosting
SL.library <- c("SL.mean", "SL.glm", "SL.ranger", "SL.grf", "SL.xgboost")
# Insert a root directory where output files, results and images will be saved
root.path <- "results"
# An increasing sequence of non-negative numeric scalars controlling
# the penalty for violating the constraint (seq(1,8, by=1) by default).
Lambda_candidates <- seq(1, 10, by=1)
# A vector of non-negative scalars controlling the sharpness of the
# treatment probability function (c(0, 0.05, 0.1, 0.25, 0.5) by default)
Beta_candidates <- c(0, 0.05, 0.1, 0.25, 0.5)
# Constraint tolerance (alpha): maximum allowed increase in adverse event risk
# Example: 0.2 means the policy may increase the adverse event probability by up to 20 percentage points
alpha.tolerance <- 0.2
# Optimization precision: Controls the granularity of the policy search.
# Smaller values lead to finer approximations but require more computation (default is 0.025).
# For a quick toy example, you may prefer using 0.05 instead.
optimization.precision <- 0.025
# Parameter indicating whether to center the treatment policy to ensure that
# probability of treatment assignment for null treatment effect is 0.5.
centered <- FALSE

With all variables and parameters defined, we now apply the
main_algorithm() from the PLUCR package. This
function executes the full estimation pipeline, consisting of the
following steps:
1- Estimation of nuisance components using the
SuperLearner ensemble framework
2- Learning a treatment policy that maximizes the estimated policy value while enforcing the estimated constraint on the probability of adverse events. This is accomplished through an alternating optimization procedure that corrects the baseline estimator of the objective function.
The output of this procedure includes:
- A constrained-optimal policy, which achieves optimality while satisfying the pre-specified constraint on the expected rate of adverse events.
# ===============================================================
# Run the Main Algorithm to Learn the Optimal Treatment Rule
# ===============================================================
output <- PLUCR::main_algorithm(X=X, A=A, Y=Y, Xi=Xi,
Lambdas = Lambda_candidates,
alpha=alpha.tolerance,
B= Beta_candidates,
precision=optimization.precision,
centered=centered,
SL.library = SL.library,
root.path=root.path)
# ---------------------------------------------------------------
# Extract learned parameters from output
# Depending on configuration, the algorithm may return:
# - A list with (1) initial estimate theta_0 and (2) final estimate theta_final
# - A single theta object if constraint satisfied at lambda=0
# ---------------------------------------------------------------
if(is.list(output)){
theta_0 <- output[[1]]
theta_final <- output[[2]]
}else{
theta_final <- output
}

The main function returns either a list of matrices (theta_0 and theta_final) or a single theta object. These matrices are used to construct the optimal treatment rule in two steps, as follows:
# ===============================================================
# Derive the Optimal Treatment Rule from Final Parameters
# ===============================================================
# ---------------------------------------------------------------
# Step 1. Construct the decision function `psi` using `make_psi`.
# It maps covariates X to [-1,1] scores for treatment assignment.
# ---------------------------------------------------------------
psi_final <- PLUCR::make_psi(theta_final)
# ---------------------------------------------------------------
# Step 2. Translate `psi(X)` into treatment probabilities
# Apply the treatment probability function `sigma_beta`, using the
# optimal `beta` stored as an attribute of `theta_final`.
# ---------------------------------------------------------------
optimal_treatment_probability <- PLUCR::sigma_beta(psi_final(X), beta=attr(theta_final, "beta"))

# ===============================================================
# Evaluate the Optimal Treatment Rule using normalized data
# ===============================================================
# ---------------------------------------------------------------
# Step 1. Load pre-required objects (algorithm outputs)
# ---------------------------------------------------------------
folds <- readRDS(file.path(root.path,"Folds","folds.rds")) # Folds for splitting data
mu.hat.nj <- readRDS(file.path(root.path,"Mu.hat","mu.hat.nj.rds")) # Primary outcome model
nu.hat.nj <- readRDS(file.path(root.path,"Nu.hat","nu.hat.nj.rds")) # Adverse event model
ps.hat.nj <- readRDS(file.path(root.path,"PS.hat","ps.hat.nj.rds")) # Propensity score model
# ---------------------------------------------------------------
# Step 2. Extract the test set for policy evaluation
# (thereby ensuring unbiased estimates of treatment performance)
# ---------------------------------------------------------------
X_test <- X[folds[[3]],]
A_test <- A[folds[[3]]]
Y_test <- Y[folds[[3]]]
Xi_test <- Xi[folds[[3]]]
# ---------------------------------------------------------------
# Step 3. Extract test nuisance functions
# ---------------------------------------------------------------
mu0_test <- function(a,x){mu.hat.nj(a,x,3)}
nu0_test <- function(a,x){nu.hat.nj(a,x,3)}
prop_score_test <- function(a,x){ps.hat.nj(a,x,3)}
# ---------------------------------------------------------------
# Step 4. Evaluate the learned optimal treatment rule
# ---------------------------------------------------------------
eval_plucr <- process_results(theta=theta_final, X=X_test, A=A_test, Y=Y_test,
Xi=Xi_test, mu0=mu0_test, nu0=nu0_test,
prop_score=prop_score_test,
lambda=attr(theta_final, "lambda"),
alpha=alpha.tolerance,
beta=attr(theta_final, "beta"),
centered=centered)
# ---------------------------------------------------------------
# Optional: Oracular evaluation (synthetic scenarios only)
# Uses the known data-generating process.
# ---------------------------------------------------------------
# Extract oracular features (only available in synthetic settings)
df_complete <- exp[[1]] # Complete data including counterfactuals c(Y(0),Y(1), xi(0), xi(1))
delta_mu <- exp[[3]] # Oracular delta_mu function (treatment effect over Y conditioned on X)
delta_nu <- exp[[4]] # Oracular delta_nu function (treatment effect over xi conditioned on X)
# Oracular evaluation
eval_plucr_oracular <- PLUCR::oracular_process_results(theta=theta_final, ncov=ncov,
scenario_mu = scenario_mu,
scenario_nu = scenario_nu,
lambda = attr(theta_final, "lambda"),
alpha = alpha.tolerance,
beta = attr(theta_final, "beta"))

We can further learn a threshold that translates the treatment probabilities into binary choices.
# ===============================================================
# Learn Decision Threshold and Derive Treatment Assignments
# ===============================================================
# ---------------------------------------------------------------
# Step 1. Learn the decision threshold t
# ---------------------------------------------------------------
t <- learn_threshold(theta_final, X, A, Y, Xi, folds, alpha.tolerance)
# ---------------------------------------------------------------
# Step 2. Convert treatment probabilities into binary decisions
# Assign treatment if the individualized probability exceeds t.
# ---------------------------------------------------------------
optimal_treatment_decision <- ifelse(optimal_treatment_probability > t, 1, 0)

For non-normalized data, we can also provide an evaluation on the original data scale, for both the probabilistic policy and its binary counterpart.
# ===============================================================
# Evaluate the Optimal Treatment Rule using non-normalized data
# ===============================================================
# ---------------------------------------------------------------
# Step 1. Train pre-required objects (algorithm outputs)
# ---------------------------------------------------------------
Y_non_normalized <- # Insert the non-normalized primary outcome
covariates <- # Insert X prior to `phi(X)`
mu.hat.nj <- estimate_real_valued_mu(Y= Y_non_normalized, A= A, X=covariates, folds, SL.library = SL.library, V=2L)
nu.hat.nj <- estimate_nu(Xi= Xi, A= A, X=covariates, folds, SL.library = SL.library, V=2L, threshold = 0.01)
# Nuisances trained on testing subset for unbiased evaluation
mu0_test <- function(a,x){mu.hat.nj(a,x,ff=3)}
nu0_test <- function(a,x){nu.hat.nj(a,x,ff=3)}
# Data testing subset
covariates_test <- covariates[folds[[3]],]
Y_non_test <- Y_non_normalized[folds[[3]]]
proba_pi <- optimal_treatment_probability[folds[[3]]]
pi_decision <- optimal_treatment_decision[folds[[3]]]
# ---------------------------------------------------------------
# Step 2. Evaluate probabilistic policy and derived decision
# ---------------------------------------------------------------
# Evaluation of optimal treatment probabilities
policy_value_proba <- V_Pn(proba_pi,
y1 = mu0_test(a = rep(1, nrow(covariates_test)), covariates_test),
y0 = mu0_test(a = rep(0, nrow(covariates_test)), covariates_test))
constraint_value_proba <- mean(proba_pi*(nu0_test(a = rep(1, nrow(covariates_test)), covariates_test) - nu0_test(a = rep(0, nrow(covariates_test)), covariates_test))) - alpha.tolerance
print(c(policy_value_proba, constraint_value_proba))
# Evaluation of optimal treatment decisions derived from probabilities
policy_value_decision <- V_Pn(pi_decision,
y1 = mu0_test(a = rep(1, nrow(covariates_test)), covariates_test),
y0 = mu0_test(a = rep(0, nrow(covariates_test)), covariates_test))
constraint_value_decision <- mean(pi_decision*(nu0_test(a = rep(1, nrow(covariates_test)), covariates_test) - nu0_test(a = rep(0, nrow(covariates_test)), covariates_test))) - alpha.tolerance
print(c(policy_value_decision, constraint_value_decision))

As an alternative, we can apply the
naive_approach_algorithm, which relies on baseline nuisance
parameter estimators and omits the correction step implemented in
main_algorithm(). This function assumes that the nuisance
components (specifically the outcome regression, the adverse event
regression, and the propensity score model) have already been estimated
and saved in their respective folders (Mu.hat,
Nu.hat, PS.hat).
Using these pre-computed components, the algorithm directly learns a treatment policy that maximizes the estimated policy value and enforces the estimated constraint on the probability of adverse events, using the baseline nuisance estimates.
The output includes:
An unconstrained policy, which maximizes the primary outcome without regard to adverse events.
A constrained-optimal policy, which satisfies the specified constraint on adverse event probability using the uncorrected nuisance estimators.
# ===============================================================
# Run the Naive Algorithm to Learn the Optimal Treatment Rule
# ===============================================================
# Note: mu0_train, nu0_train, prop_score_train (and their *_test counterparts)
# are the fold-specific nuisance functions defined in the training/testing setup below.
naive_output <- naive_approach_algorithm(X=X, A=A, Y=Y, Xi=Xi,
folds= folds,
mu0_train = mu0_train,
mu0_test = mu0_test,
nu0_train = nu0_train,
nu0_test = nu0_test,
prop_score_train = prop_score_train,
prop_score_test = prop_score_test,
alpha = alpha.tolerance,
centered=centered,
precision=optimization.precision,
root.path = root.path)
# ---------------------------------------------------------------
# Extract learned parameters from output
# Depending on configuration, the algorithm may return:
# - A list with (1) initial estimate theta_0 and (2) final estimate theta_final
# - A single theta object if constraint satisfied at lambda=0
# ---------------------------------------------------------------
if(is.list(naive_output)){
theta_naive <- naive_output[[2]]
}else{
theta_naive <- naive_output[[1]]
}
# =================================================================
# Evaluate the Optimal Treatment Rule according to Naive Approach
# =================================================================
eval_naive <- process_results(theta=theta_naive, X=X_test, A=A_test, Y=Y_test,
Xi=Xi_test, mu0=mu0_test, nu0=nu0_test,
prop_score=prop_score_test,
lambda=attr(theta_naive, "lambda"),
alpha=alpha.tolerance,
beta=attr(theta_naive, "beta"),
centered=centered)
# ---------------------------------------------------------------
# Optional: Oracular evaluation (synthetic scenarios only)
# Uses the known data-generating process.
# ---------------------------------------------------------------
eval_naive_oracular <- PLUCR::oracular_process_results(theta=theta_naive, ncov=ncov,
scenario_mu = scenario_mu,
scenario_nu = scenario_nu,
lambda = attr(theta_naive, "lambda"),
alpha = alpha.tolerance,
beta = attr(theta_naive, "beta"))

In this illustrative setting, we operate under a controlled,
simulated environment in which the true data-generating mechanisms,
denoted by delta_mu (for the primary outcome) and
delta_nu (for the adverse event process), are fully known.
This allows us to implement an oracular benchmark using the
oracular_approach_algorithm.
Unlike real-world applications, where these nuisance components must be estimated from data, the oracular approach bypasses this step and leverages the known functions directly. As such, it provides a useful reference point for evaluating the performance of data-driven methods. The algorithm computes a treatment policy that maximizes the expected primary outcome, and satisfies a constraint on the expected rate of adverse events, based on the true underlying mechanisms.
The output includes:
An unconstrained optimal policy, derived solely from the objective of maximizing the primary outcome.
A constrained-optimal policy, which enforces the specified constraint on adverse event probability using the known functions.
This approach serves as a theoretical upper bound on performance and offers a meaningful benchmark for assessing the quality of both the naïve and corrected estimators presented earlier.
# ===============================================================
# Run the Oracular Algorithm to Learn the Optimal Treatment Rule
# ===============================================================
oracular_output <- oracular_approach_algorithm(X=X, A=A, Y=Y, Xi=Xi,
folds=folds, ncov=ncov,
delta_Mu=delta_mu, delta_Nu= delta_nu,
alpha = alpha.tolerance,
centered=centered,
precision = optimization.precision,
scenario_mu=scenario_mu,
scenario_nu=scenario_nu,
root.path = root.path)
# ---------------------------------------------------------------
# Extract learned parameters from output
# Depending on configuration, the algorithm may return:
# - A list with (1) initial estimate theta_0 and (2) final estimate theta_final
# - A single theta object if constraint satisfied at lambda=0
# ---------------------------------------------------------------
if(is.list(oracular_output)){
theta_oracular <- oracular_output[[2]]
}else{
theta_oracular <- oracular_output[[1]]
}
# =================================================================
# Evaluate oracularly the Optimal Treatment Rule
# =================================================================
eval_oracular <- oracular_process_results(theta=theta_oracular, ncov=ncov,
scenario_mu = scenario_mu,
scenario_nu = scenario_nu,
lambda = attr(theta_oracular, "lambda"),
alpha = alpha.tolerance,
beta = attr(theta_oracular, "beta"))

The synthetic_data_plot() function provides an initial visualization of the treatment effects for both the primary outcome and the adverse event across the covariate space.
# ---------------------------------------------------------------
# Plot treatment effect functions:
# - delta_Mu: for primary outcome function
# - delta_Nu: for adverse event function
# The plot helps to interpret how treatment impacts outcomes
# under the chosen scenario.
# ---------------------------------------------------------------
fig <- PLUCR::synthetic_data_plot(delta_Mu=delta_mu,
delta_Nu=delta_nu,
root.path=root.path,
name=paste0(scenario_mu,"-", scenario_nu))
print(fig)

In addition, we can visualize the learned treatment policies by plotting treatment probabilities against selected covariates that allow for a meaningful two-dimensional representation, using the function visual_treatment_plot().
# ===============================================================
# Visualize Learned Treatment Policies:
# Plot individualized treatment probabilities as a function of
# two selected covariates (X.1 and X.2) to enable 2-D visualization
# of the treatment probabilities.
# ===============================================================
if(is.list(output)){
PLUCR::visual_treatment_plot(make_psi(theta_0)(X),
lambda = attr(theta_0, "lambda"),
beta = attr(theta_0, "beta"),
centered = centered,
Var_X_axis = df_obs$X.1, Var_Y_axis = df_obs$X.2,
root.path = root.path, name = "Initial")
}
PLUCR::visual_treatment_plot(make_psi(theta_naive)(X),
lambda = attr(theta_naive, "lambda"),
beta = attr(theta_naive, "beta"),
centered = centered,
Var_X_axis = df_obs$X.1, Var_Y_axis = df_obs$X.2,
root.path = root.path, name = "Naive")
PLUCR::visual_treatment_plot(make_psi(theta_final)(X),
lambda = attr(theta_final, "lambda"),
beta = attr(theta_final, "beta"),
centered = centered,
Var_X_axis = df_obs$X.1, Var_Y_axis = df_obs$X.2,
root.path = root.path, name = "Final")
PLUCR::visual_treatment_plot(make_psi(theta_oracular)(X),
lambda = attr(theta_oracular, "lambda"),
beta = attr(theta_oracular, "beta"),
centered = centered,
Var_X_axis = df_obs$X.1, Var_Y_axis = df_obs$X.2,
root.path = root.path, name = "Oracular")

Finally, we compare treatment recommendations across approaches by evaluating their estimated policy values and associated constraint violations.
# ===============================================================
# Visualize metrics for different outcomes
# ===============================================================
if(is.list(output)){
  # theta_0 only exists when the algorithm returned a list, so evaluate it here
  eval_0 <- process_results(theta_0, X_test, A_test, Y_test, Xi_test,
                            mu0_test, nu0_test, prop_score_test,
                            lambda=attr(theta_0, "lambda"), alpha.tolerance,
                            beta=attr(theta_0, "beta"), centered)[[1]]
data <- tibble(
method = c("Theta.0", "Theta.naive.PLUCR", "Theta.PLUCR", "Theta.Oracle.PLUCR"),
policy_value = c(eval_0$policy_value,
eval_naive_oracular$policy_value,
eval_plucr_oracular$policy_value,
eval_oracular$policy_value),
constraint = c(eval_0$constraint,
eval_naive_oracular$constraint,
eval_plucr_oracular$constraint,
eval_oracular$constraint))
}else{
data <- tibble(
method = c("Theta_naive", "Theta_final", "Theta_oracular"),
policy_value = c(eval_naive_oracular$policy_value,
eval_plucr_oracular$policy_value,
eval_oracular$policy_value),
constraint = c(eval_naive_oracular$constraint,
eval_plucr_oracular$constraint,
eval_oracular$constraint))
}
# Call the function
plot_metric_comparison(data, metrics = select(data, -"method") %>% colnames(),
techniques = data$method, root.path = root.path)

In a simulated setting where the true data-generating mechanisms are
known, we can evaluate the oracle performance of each parameter value,
theta_naive, theta_0, and
theta_final, by computing their corresponding outcomes
using the oracular_process_results() function, as
demonstrated for theta_oracular.
Several binary treatment decisions can be derived from a single treatment probability recommendation. For instance, one may draw individual treatment assignments from a Bernoulli distribution with conditional mean given by the learned treatment probabilities. Alternatively, a decision threshold can be introduced, above which patients are assigned treatment and below which they are not.
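Both options can be sketched as follows (with a toy probability vector; `t` stands for a learned threshold as above):

```r
set.seed(1)
pi_hat <- c(0.9, 0.2, 0.6, 0.4)  # toy individualized treatment probabilities
t <- 0.5                         # decision threshold

# Stochastic assignment: Bernoulli draw with conditional mean pi_hat
a_random <- rbinom(length(pi_hat), size = 1, prob = pi_hat)

# Deterministic assignment: treat when pi_hat exceeds the threshold
a_threshold <- as.integer(pi_hat > t)
print(a_threshold)  # 1 0 1 0
```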
The main_algorithm() function is designed to abstract
away the technical details while still allowing flexibility through its
parameters. Internally, it carries out a constrained policy learning
procedure, broken down into three main stages, each performed on a
different fold.
Before estimating nuisance functions and optimizing the policy, the algorithm performs three preliminary tasks:
1- Creates the folders where results will be saved.
2- Verifies that the input variables (X, A, Y and Xi) meet the required format and assumptions, such as outcome ranges and binary treatment/adverse event values.
3- Creates 3 cross-validation folds to enable cross-fitting.
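A minimal sketch of the last step (assuming three folds of roughly equal size; the package's own fold construction may differ):

```r
n <- 12
set.seed(2025)
# Randomly partition the indices 1..n into 3 cross-fitting folds
folds <- split(sample(seq_len(n)), rep(1:3, length.out = n))
sapply(folds, length)  # 4 4 4
```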
Once the variables have been validated and the cross-fitting folds are defined, the algorithm proceeds to estimate the nuisance parameters. These models are each trained using the pre-defined folds to enable cross-fitting, which improves robustness and reduces overfitting.
The key nuisance parameters estimated are the primary outcome regression E[Y | A, X], the adverse event regression E[Xi | A, X], and the propensity score P[A | X].
Each model is fit separately and the estimates are saved for later use in policy learning.
# Estimate primary outcome model E[Y | A, X]
mu.hat.nj <- PLUCR::estimate_mu(
Y = Y, A = A, X = X, folds = folds,
SL.library = SL.library, V = 2L
)
saveRDS(mu.hat.nj, file = file.path(root.path, "Mu.hat", "mu.hat.nj.rds"))
# Estimate adverse event model E[Xi | A, X]
nu.hat.nj <- PLUCR::estimate_nu(
Xi = Xi, A = A, X = X, folds = folds,
SL.library = SL.library, V = 2L
)
saveRDS(nu.hat.nj, file = file.path(root.path, "Nu.hat", "nu.hat.nj.rds"))
# Estimate propensity score model P[A | X]
ps.hat.nj <- PLUCR::estimate_ps(
A = A, X = X, folds = folds,
SL.library = SL.library, V = 2L
)
saveRDS(ps.hat.nj, file = file.path(root.path, "PS.hat", "ps.hat.nj.rds"))

After estimating the nuisance parameters, the next step is to organize the data into training and testing sets. This allows us to learn the policy on one part of the data and evaluate it on a held-out part, yielding unbiased estimates of its performance.
# Training functions & data (first and second folds)
# functions (first fold)
mu0_train <- function(a,x){mu.hat.nj(a=a,x=x,ff=1)}
nu0_train <- function(a,x){nu.hat.nj(a=a,x=x,ff=1)}
prop_score_train <- function(a,x){ps.hat.nj(a,x,1)}
# data (second fold)
X_train <- X[folds[[2]],]
A_train <- A[folds[[2]]]
Y_train <- Y[folds[[2]]]
Xi_train <- Xi[folds[[2]]]
# Testing functions & data (third fold)
# functions
mu0_test <- function(a,x){mu.hat.nj(a,x,3)}
nu0_test <- function(a,x){nu.hat.nj(a,x,3)}
prop_score_test <- function(a,x){ps.hat.nj(a,x,3)}
# data (third fold)
X_test <- X[folds[[3]],]
A_test <- A[folds[[3]]]
Y_test <- Y[folds[[3]]]
Xi_test <- Xi[folds[[3]]]

In this step, we run the core algorithm of the PLUCR
package — the alternating optimization procedure
implemented in the Optimization_Estimation() function. This
procedure learns an optimal treatment policy that maximizes the expected
primary outcome while controlling the probability of adverse events.
The procedure is executed over a grid of \((\lambda, \beta)\) values, where \(\lambda\) controls the penalty for violating the adverse event constraint and \(\beta\) controls the sharpness of the treatment probability function.
Before evaluating the entire grid, the algorithm first solves the problem with \(\lambda = 0\) in order to check whether the unconstrained optimal policy already satisfies the constraint. If it does, no further optimization over the grid is necessary.
During the search over \(\lambda\), once the constraint is satisfied for the first time, the procedure saves the corresponding results at each \(\beta\) value. The search over \(\lambda\) then stops when the upper bound of the confidence interval for the constraint value drops below zero.
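The stopping check can be illustrated as follows (a sketch assuming a normal-approximation confidence bound; not the package's exact estimator):

```r
# Upper 95% confidence bound for the mean of per-individual constraint
# contributions g; the search over lambda stops once this bound is below zero
constraint_upper_bound <- function(g) {
  mean(g) + qnorm(0.975) * sd(g) / sqrt(length(g))
}

set.seed(2025)
g <- rnorm(400, mean = -0.1, sd = 0.05)  # toy contributions
constraint_upper_bound(g) < 0            # TRUE: stop the search
```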
Alternating Optimization Procedure
Whether with \(\lambda = 0\) or \(\lambda > 0\), the procedure iterates as follows:
1- Initial estimation of the plug-in objective function \(L(\psi)\), using \(\Delta\mu_n^k\) and \(\Delta\nu_n^k\) derived from the primary outcome model (mu0_train) and the adverse event model (nu0_train).
2- Policy optimization: a Frank-Wolfe algorithm minimizes \(L^k_n(\psi)\) to obtain a new candidate policy \(\psi_k\).
3- Update of the nuisance parameters (update_mu_XA, update_nu_XA).
4- Repeat until convergence or until the maximum number of iterations is reached.
# Start by testing the unconstrained case with lambda = 0
saved <- FALSE # Flag to track whether a valid result has already been saved
combinations <- NULL # Matrix to store (beta, lambda) pairs that satisfy the constraint
beta_0 <- NULL # Best beta found so far (none yet)
max_policy_value <- -Inf # Best lower confidence bound on the policy value so far
min_constraint_lambda0 <- Inf # Smallest constraint value observed at lambda = 0
lambda <- 0
# Run optimization with lambda = 0 (no constraint)
out <- PLUCR::Optimization_Estimation(mu0=mu0_train, nu0=nu0_train,
prop_score=prop_score_train,
X=X_train, A=A_train, Y=Y_train, Xi=Xi_train,
lambda=0, alpha=alpha, precision=precision, beta=0.05,
centered=centered, tol=tol, max_iter=max_iter)
# Extract the final policy from the iteration
theta_0 <- out$theta_collection[[length(out$theta_collection)]]
attr(theta_0, "lambda") <- 0 # Save the lambda as an attribute
# Evaluate the policy on the test data for all possible betas
for (beta in B){
res_0 <- process_results(theta_0, X_test, A_test, Y_test, Xi_test,
mu0_test, nu0_test, prop_score_test, lambda=0,
alpha, beta, centered)
# Loop to check constraint satisfaction
if (res_0[[1]]$constraint < 0) {
min_constraint_lambda0 <- res_0[[1]]$constraint
if (res_0[[1]]$lwr_bound_policy_value > max_policy_value) {
beta_0 <- beta
max_policy_value <- res_0[[1]]$lwr_bound_policy_value
attr(theta_0, "lambda") <- lambda
attr(theta_0, "beta") <- beta
# Set the attributes before saving so the stored object carries them
saveRDS(theta_0, file = file.path(root.path, "Theta_opt", paste0(beta, "_", 0, ".rds")))
saveRDS(out, file=file.path(root.path,"Intermediate",paste0(beta,"_",0,".rds")))
saveRDS(res_0[[1]], file=file.path(root.path,"Evaluation",paste0(beta,"_",0,".rds")))
combinations <- rbind(combinations, c(beta_0, 0))
saved <- TRUE}
}else{
if(res_0[[1]]$constraint<min_constraint_lambda0){
min_constraint_lambda0 <- res_0[[1]]$constraint
beta_0 <- beta
}
}
if(res_0[[2]]<0){
warning("The constraint was already satisfied for lambda=0.")
attr(theta_0, "beta") <- beta_0
return(theta_0)
}
}
attr(theta_0, "beta") <- beta_0 # Save the beta as an attribute
# Begin training over the grid: find for each beta value the optimal lambda
for (beta in B){
saved <- FALSE
for (lambda in Lambdas){
# Run alternated procedure for each beta and lambda combination
out <- PLUCR::Optimization_Estimation(mu0=mu0_train, nu0=nu0_train,
prop_score=prop_score_train,
X=X_train, A=A_train, Y=Y_train, Xi=Xi_train,
lambda=lambda, alpha=alpha, precision=precision,
beta=beta, centered=centered,
tol=tol, max_iter=max_iter)
# Extract final policy
theta_opt <- out$theta_collection[[length(out$theta_collection)]]
### Evaluating
res <- process_results(theta_opt, X_test, A_test, Y_test, Xi_test,
mu0_test, nu0_test, prop_score_test, lambda,
alpha, beta, centered)
if (!saved && res[[1]]$constraint < 0) {
saveRDS(res[[1]], file=file.path(root.path,"Evaluation",
paste0(beta, "_", lambda,".rds")))
saveRDS(out, file=file.path(root.path,"Intermediate",
paste0(beta,"_",lambda,".rds")))
attr(theta_opt, "lambda") <- lambda
attr(theta_opt, "beta") <- beta
# Set the attributes before saving so the stored object carries them
saveRDS(theta_opt, file = file.path(root.path, "Theta_opt",
                                    paste0(beta, "_", lambda, ".rds")))
saved <- TRUE
combinations <- rbind(combinations, c(beta, lambda))
}
# Stop search if constraint's upper CI bound is negative
if(res[[2]]<0){
break
}
}
}
# Remove intermediate files that are no longer needed
files_del <- file.path(root.path,"Intermediate")
unlink(files_del, recursive = TRUE)
# Select the optimal combination (beta, lambda)
optimal_combination <- get_opt_beta_lambda(combinations,root.path)
beta <- optimal_combination$beta
lambda <- optimal_combination$lambda
theta_keep <- paste0(beta, "_", lambda, ".rds")
# Delete unwanted files in Theta_opt
theta_files <- list.files(file.path(root.path, "Theta_opt"))
theta_to_delete <- theta_files[basename(theta_files) != theta_keep]
file.remove(file.path(root.path,"Theta_opt",theta_to_delete))
# Delete unwanted files in Evaluation
eval_files <- list.files(file.path(root.path, "Evaluation"))
eval_to_delete <- eval_files[basename(eval_files) != theta_keep]
file.remove(file.path(root.path,"Evaluation", eval_to_delete))
# Load the corresponding theta for the optimal (beta, lambda) combination
theta_final <- readRDS(file = file.path(root.path, "Theta_opt",
paste0(optimal_combination$beta, "_",
optimal_combination$lambda, ".rds")))
# Save the optimal combination (lambda, beta) as an attribute
attr(theta_final, "lambda") <- optimal_combination$lambda
attr(theta_final, "beta") <- optimal_combination$beta
# Compute the optimal treatment probabilities with sigma_beta
psi_values <- make_psi(theta_final)(X_test)
optimal_treatment_rule <- sigma_beta(psi_values, beta = attr(theta_final, "beta"))
# Calculate proportion over 0.5 and warn the user if the treatment probabilities are too low.
prop_over_0.50 <- mean(optimal_treatment_rule > 0.5)
if(prop_over_0.50<0.1){
warning(sprintf(
paste("Only %.1f%% of the test set has an optimal treatment probability above 0.5.",
"This may indicate that your tolerance for adverse events (alpha) is too strict.",
"Consider relaxing it if treatment is being under-assigned."), 100 * prop_over_0.50))
}
print(list(theta_0, theta_final))