Getting Started with smsncut

Renato de Paula, Helena Mouriño, Tiago Dias Domingues

2026-05-14

Overview

smsncut implements the parametric decision-theoretic framework for optimal diagnostic cutoff selection under scale mixtures of skew-normal (SMSN) distributions, described in:

de Paula, R., Mouriño, H., and Dias Domingues, T. (2026). Parametric ROC analysis and optimal cutoff selection under scale mixtures of skew-normal distributions: a decision-theoretic framework with asymptotic inference. arXiv:2605.07829. https://doi.org/10.48550/arXiv.2605.07829.

The package supports the skew-normal (SN) and skew-t (ST) families and provides tools for model fitting, ROC analysis, optimal cutoff estimation, asymptotic inference, and Monte Carlo validation.


1. Unconstrained parametrisation

All functions use the unconstrained vector:

Family theta Natural parameters
SN c(xi, log(omega), alpha) location, scale, skewness
ST c(xi, log(omega), alpha, log(nu-2)) + degrees of freedom
theta_sn <- c(0, log(1), 1.5)
smsn_density(c(-1, 0, 1, 2), theta_sn, "SN")
#> [1] 0.03233077 0.39894228 0.45161068 0.10783617

theta_st <- c(0, log(1), 1.5, log(8 - 2))
smsn_density(c(-1, 0, 1, 2), theta_st, "ST")
#> [1] 0.03820408 0.38669902 0.41701108 0.12293816

2. Model fitting

smsn_fit uses BFGS for the MLE and Richardson extrapolation (via numDeriv::hessian) for the observed Fisher information matrix, which is critical for accurate variance estimation (de Paula et al., 2026, Section 6.2).

set.seed(42)
x0 <- sn::rst(300, xi = 0, omega = 1, alpha = 1, nu = 8)
x1 <- sn::rsn(300, xi = 2, omega = 1, alpha = 1.5)

fit0 <- smsn_fit(x0, family = "ST")
fit1 <- smsn_fit(x1, family = "SN")
cat("Group 0 (ST) MLE:", round(fit0$par, 3), "\n")
#> Group 0 (ST) MLE: -0.224 0.114 1.619 2.648
cat("Group 1 (SN) MLE:", round(fit1$par, 3), "\n")
#> Group 1 (SN) MLE: 2.085 -0.079 1.198

3. Admissible interval and boundary conditions

ab <- cutoff_admissible_interval(
  fit0$par, fit1$par, family0 = "ST", family1 = "SN"
)
cat("Interval: [", round(ab["a"], 3), ",", round(ab["b"], 3), "]\n")
#> Interval: [ -1.457 , 4.678 ]

bc <- cutoff_boundary_check(
  ab["a"], ab["b"], fit0$par, fit1$par, "ST", "SN",
  lambda0 = 1, lambda1 = 3, pi0 = 0.9, pi1 = 0.1
)
cat("Boundary conditions satisfied:", bc$satisfied, "\n")
#> Boundary conditions satisfied: TRUE

4. Optimal cutoff

res <- cutoff_optimal(
  fit0$par, fit1$par, "ST", "SN",
  lambda0 = 1, lambda1 = 3, pi0 = 0.9, pi1 = 0.1,
  interval = ab
)
cat("Optimal cutoff c*:", round(res$cutoff, 4), "\n")
#> Optimal cutoff c*: 1.9301
cat("Risk at c*:       ", round(res$risk,   4), "\n")
#> Risk at c*:        0.1131
cat("Youden cutoff:    ", round(cutoff_youden(fit0$par, fit1$par, "ST", "SN",
                                               interval = ab), 4), "\n")
#> Youden cutoff:     1.5845

5. Local identifiability diagnostic

diag_val <- cutoff_identifiability(
  res$cutoff, fit0$par, fit1$par, "ST", "SN",
  lambda0 = 1, lambda1 = 3, pi0 = 0.9, pi1 = 0.1
)
cat("|d phi/dc| at c* =", round(diag_val, 4), "\n")
#> |d phi/dc| at c* = 0.3068

6. Asymptotic variance and Wald confidence interval

n0 <- length(x0); n1 <- length(x1)

V_hat <- cutoff_variance(
  res$cutoff, fit0$par, fit1$par, "ST", "SN",
  lambda0 = 1, lambda1 = 3, pi0 = 0.9, pi1 = 0.1,
  hess0 = fit0$hessian, hess1 = fit1$hessian, n0 = n0, n1 = n1
)

ci <- cutoff_ci(res$cutoff, V_hat, n = n0 + n1)
cat("95% Wald CI: [", round(ci["lower"], 4), ",", round(ci["upper"], 4), "]\n")
#> 95% Wald CI: [ 1.829 , 2.0312 ]

7. ROC curve and AUC

auc_val <- smsn_auc(fit0$par, fit1$par, "ST", "SN")
cat("Estimated AUC:", round(auc_val, 4), "\n")
#> Estimated AUC: 0.9594

r_grid <- seq(0.01, 0.99, by = 0.01)
tpr    <- smsn_roc(r_grid, fit0$par, fit1$par, "ST", "SN")
plot(r_grid, tpr, type = "l", lwd = 2,
     xlab = "False positive rate", ylab = "True positive rate",
     main = paste0("Parametric ROC (AUC = ", round(auc_val, 3), ")"))
abline(0, 1, lty = 2, col = "grey60")


8. Reproducing Table 3 (Scenario SN1, n = 200)

set.seed(123)
result <- mc_simulate_scenario(
  theta0_true = c(0, log(1), 1),  theta1_true = c(2, log(1), 1.5),
  family0 = "SN", family1 = "SN", c_true = 1.6101,
  lambda0 = 1, lambda1 = 1, pi0 = 0.5, pi1 = 0.5,
  n0 = 100L, n1 = 100L, B = 2000L  # use B=2000 for paper results
)

cat("Success rate:", result$success_rate, "\n")
cat("Mean c_hat:  ", round(result$mean_c_hat, 4), "\n")
cat("Bias:        ", round(result$bias,       4), "\n")
cat("RMSE:        ", round(result$rmse,       4), "\n")
cat("Coverage:    ", round(result$coverage,   3), "\n")  
cat("Ratio:       ", round(result$ratio,      3), "\n")  
cat("Bracket rate:", round(result$bracket_rate, 3), "\n")