Getting Started with PFCI

Overview

The PFCI package implements Penalized Fast Causal Inference, a two-stage procedure for learning causal structure in high-dimensional settings with potential latent variables and selection bias. The method combines graphical lasso screening with the FCI algorithm to produce a Partial Ancestral Graph (PAG), and it runs substantially faster than standard FCI/RFCI while maintaining accuracy under sparsity.
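
To make the screening idea concrete, here is a minimal base-R sketch. It is not the PFCI implementation: it substitutes thresholded partial correlations for the graphical lasso, and the toy data, the threshold value, and all variable names are assumptions chosen for illustration. In the chain x1 -> x2 -> x3, the pair (x1, x3) is marginally correlated but has a partial correlation near zero, so a screening step can drop it before any conditional independence tests are run.

```r
# Illustrative stand-in for the screening stage (NOT the PFCI implementation):
# threshold partial correlations instead of running the graphical lasso.
set.seed(1)
n <- 500

# Toy data: chain x1 -> x2 -> x3, plus an unrelated x4
x1 <- rnorm(n)
x2 <- 0.8 * x1 + rnorm(n)
x3 <- 0.8 * x2 + rnorm(n)
x4 <- rnorm(n)
X <- cbind(x1, x2, x3, x4)

# Partial correlations from the inverse sample covariance (precision) matrix
omega <- solve(cov(X))
pcor <- -cov2cor(omega)
diag(pcor) <- 1

# Hypothetical cutoff; the graphical lasso would use an L1 penalty instead
threshold <- 0.2
keep <- abs(pcor) > threshold
diag(keep) <- FALSE

# Pairs with keep == FALSE are the ones a second stage would not need to test
print(keep)
```

A mask like this could, for example, be handed to a constraint-based search through the fixedGaps argument of pcalg::fci. Whether PFCI passes its screening result this way internally is not stated here.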

Installation

PFCI is available on CRAN. Its core functionality requires pcalg, which is on CRAN, and graph, which is on Bioconductor. The commands below also install RBGL and Rgraphviz from Bioconductor:

install.packages("PFCI")

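# Required dependencies: pcalg is on CRAN; graph, RBGL, and Rgraphviz are on
# Bioconductor. BiocManager::install() can install from both.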
install.packages("BiocManager")
BiocManager::install(c("pcalg", "graph", "RBGL", "Rgraphviz"))
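
After installation, it can be useful to confirm that the dependencies are actually loadable. The following is a small sketch using only base R. The package list is copied from the install command above, and the variable names are illustrative.

```r
# Packages named in the installation step above
deps <- c("pcalg", "graph", "RBGL", "Rgraphviz")

# requireNamespace() returns TRUE when a package can be loaded,
# without attaching it to the search path
available <- vapply(deps, requireNamespace, logical(1), quietly = TRUE)

missing_deps <- deps[!available]
if (length(missing_deps) > 0) {
  message("Missing packages: ", paste(missing_deps, collapse = ", "))
} else {
  message("All dependencies are installed.")
}
```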

Basic workflow

The standard three-step workflow is simulate, fit, evaluate:

library(PFCI)

# Step 1: simulate a sparse DAG with p = 100 nodes
sim <- simulate_pfci_toy(p = 100, n = 100, edge_prob = 0.02, seed = 1)

# Step 2: fit PFCI
fit <- pfci_fit(sim$X, alpha = 0.05)
print(fit)

# Step 3: evaluate against ground truth
met <- pfci_metrics(sim, fit)
met

The print(fit) call reports runtime and tuning parameters. The met list contains the structural Hamming distance (SHD), F1 score, Matthews correlation coefficient (MCC), Precision, Recall, and Time.
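
For readers unfamiliar with these metrics, the sketch below computes skeleton-level versions of them from two symmetric 0/1 adjacency matrices. It is a simplified illustration, not the code behind pfci_metrics: the helper skeleton_metrics is hypothetical, and the SHD here counts only missing and extra adjacencies, whereas an SHD defined on PAGs would typically also count differences in edge marks.

```r
# Hypothetical helper, not part of PFCI: skeleton-level metrics from two
# symmetric 0/1 adjacency matrices (truth and estimate).
skeleton_metrics <- function(truth, est) {
  stopifnot(all(dim(truth) == dim(est)))
  ut <- upper.tri(truth)            # count each unordered pair once
  t_edge <- truth[ut] != 0
  e_edge <- est[ut] != 0

  tp <- sum(t_edge & e_edge)        # adjacency present in both
  fp <- sum(!t_edge & e_edge)       # extra adjacency in the estimate
  fn <- sum(t_edge & !e_edge)       # adjacency missed by the estimate
  tn <- sum(!t_edge & !e_edge)      # absent in both

  precision <- if (tp + fp > 0) tp / (tp + fp) else NA_real_
  recall    <- if (tp + fn > 0) tp / (tp + fn) else NA_real_
  f1  <- if (2 * tp + fp + fn > 0) 2 * tp / (2 * tp + fp + fn) else NA_real_
  den <- sqrt(as.numeric(tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
  mcc <- if (den > 0) (tp * tn - fp * fn) / den else NA_real_

  c(SHD = fp + fn, Precision = precision, Recall = recall, F1 = f1, MCC = mcc)
}

# Toy example on 4 nodes: truth has edges 1-2 and 2-3; estimate has 1-2 and 3-4
truth <- matrix(0, 4, 4)
truth[1, 2] <- truth[2, 1] <- 1
truth[2, 3] <- truth[3, 2] <- 1
est <- matrix(0, 4, 4)
est[1, 2] <- est[2, 1] <- 1
est[3, 4] <- est[4, 3] <- 1

res <- skeleton_metrics(truth, est)
print(res)
```

In the toy example there is one true positive, one false positive, one false negative, and three true negatives among the six node pairs.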

Plotting the PAG

plot_pag(fit)

Latent confounders

To simulate and evaluate under latent confounding, use the simulate_with_latent and metrics_with_latent functions:

sim_lat <- simulate_with_latent(p_obs = 100, gamma = 0.05, n = 100,
                                seed_graph = 1, seed_data = 2)
fit_lat <- pfci_fit(sim_lat$X, alpha = 0.05)
metrics_with_latent(sim_lat, fit_lat)
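
The following base-R sketch illustrates why latent confounding matters. It does not use simulate_with_latent. The toy model, coefficients, and variable names are assumptions for illustration. Two observed variables share an unobserved common cause L and have no direct effect on each other, yet they are clearly correlated when only the observed data are available. A PAG can represent this kind of dependence with a bidirected edge.

```r
set.seed(2)
n <- 2000

# Hypothetical toy model: latent L drives both observed variables,
# and there is no direct effect between x1 and x2
L  <- rnorm(n)
x1 <- 0.9 * L + rnorm(n)
x2 <- 0.9 * L + rnorm(n)

# Correlation seen by an analyst who observes only x1 and x2
r_marginal <- cor(x1, x2)

# Correlation after regressing out L; not available in practice,
# because L is unobserved
r_given_L <- cor(resid(lm(x1 ~ L)), resid(lm(x2 ~ L)))

print(c(marginal = r_marginal, given_L = r_given_L))
```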

Scaling behaviour

In the simulations reported by Pal, Ghosh, and Yang (2025), PFCI is approximately 3x faster than RFCI at p = 1000 while achieving equal or better F1 and MCC. Table 1 of that paper gives full simulation results for p = 100 to p = 1000.

Reference

Pal, S., Ghosh, D., and Yang, S. (2025). Penalized FCI for Causal Structure Learning in a Sparse DAG for Biomarker Discovery in Parkinson’s Disease. Annals of Applied Statistics. doi:10.48550/arXiv.2507.00173