INCVCommunityDetection implements Inductive
Node-Splitting Cross-Validation (INCV) for selecting the number
of communities in Stochastic Block Models (SBM). The package also
provides competing methods — CROISSANT, Edge
Cross-Validation (ECV), and Node Cross-Validation
(NCV) — for comprehensive model selection in network
analysis.
We start by generating a network from a planted-partition SBM with 3 communities, 150 nodes, within-community connection probability 0.5, and between-community probability 0.05.
library(INCVCommunityDetection)
set.seed(42)
net <- community.sim(k = 3, n = 150, n1 = 50, p = 0.5, q = 0.05)
table(net$membership)
#>
#>  1  2  3
#> 50 50 50
The adjacency matrix is a 150 × 150 binary symmetric matrix:
dim(net$adjacency)
#> [1] 150 150
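To make the planted-partition model concrete, here is a minimal base-R sketch that samples such a network without the package. The helper name `sim_planted_partition` and its arguments are hypothetical and are not part of INCVCommunityDetection; `community.sim()` may differ in its details.

```r
# Hypothetical helper (not part of the package): sample a planted-partition SBM
# with k equally sized communities of n_per nodes each.
sim_planted_partition <- function(k, n_per, p, q) {
  n <- k * n_per
  membership <- rep(seq_len(k), each = n_per)

  # Edge probability: p within a community, q between communities
  same <- outer(membership, membership, "==")
  prob <- ifelse(same, p, q)

  # Draw the upper triangle only, then mirror it so the matrix is symmetric
  A <- matrix(0L, n, n)
  upper <- upper.tri(A)
  A[upper] <- rbinom(sum(upper), size = 1, prob = prob[upper])
  A <- A + t(A)

  list(adjacency = A, membership = membership)
}

set.seed(1)
toy <- sim_planted_partition(k = 3, n_per = 50, p = 0.5, q = 0.05)
dim(toy$adjacency)
```

Sampling only the upper triangle and mirroring it guarantees a symmetric matrix with a zero diagonal, that is, an undirected graph without self-loops.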
ord <- order(net$membership)
image(net$adjacency[ord, ord],
      main = "Adjacency matrix (3-community SBM, reordered)",
      xlab = "Node", ylab = "Node")
The main function nscv.f.fold() partitions nodes into
f folds and uses spectral clustering on the training
subgraph. Held-out nodes are assigned to communities based on their
connections to training nodes, and the held-out negative log-likelihood
and MSE are computed.
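The steps described here can be sketched in base R for a single train/held-out split and a single candidate K. This is an illustration under stated assumptions, not the package's implementation: the function name `incv_split_sketch` is hypothetical, held-out nodes are assigned by maximising a Bernoulli log-likelihood of their edges to training nodes, and the loss is evaluated on pairs of held-out nodes. The package may make different choices on each of these points.

```r
# Illustrative sketch only; not the package's implementation.
# Scores one candidate K on one train/held-out node split.
incv_split_sketch <- function(A, K, test_idx, eps = 1e-6) {
  train_idx <- setdiff(seq_len(nrow(A)), test_idx)
  A_train <- A[train_idx, train_idx]

  # 1. Spectral clustering on the training subgraph:
  #    k-means on the K eigenvectors with the largest |eigenvalue|
  eig <- eigen(A_train, symmetric = TRUE)
  lead <- order(abs(eig$values), decreasing = TRUE)[seq_len(K)]
  z_train <- kmeans(eig$vectors[, lead, drop = FALSE],
                    centers = K, nstart = 10)$cluster

  # 2. Block probabilities from training pairs (diagonal excluded)
  Z <- diag(K)[z_train, , drop = FALSE]               # n_train x K indicators
  sizes <- colSums(Z)
  pairs <- tcrossprod(sizes) - diag(sizes, nrow = K)  # ordered pairs per block
  B <- (t(Z) %*% A_train %*% Z) / pmax(pairs, 1)
  B <- pmin(pmax(B, eps), 1 - eps)                    # keep the logs finite

  # 3. Assign each held-out node to the community that maximises the
  #    Bernoulli log-likelihood of its edges to the training nodes
  counts <- A[test_idx, train_idx, drop = FALSE] %*% Z
  non_edges <- matrix(sizes, nrow(counts), K, byrow = TRUE) - counts
  loglik <- counts %*% t(log(B)) + non_edges %*% t(log(1 - B))
  z_test <- max.col(loglik, ties.method = "first")

  # 4. Held-out loss on pairs of held-out nodes
  P <- B[z_test, z_test, drop = FALSE]
  up <- upper.tri(P)
  y <- A[test_idx, test_idx][up]
  p <- P[up]
  c(nll = -sum(y * log(p) + (1 - y) * log(1 - p)),
    mse = mean((y - p)^2))
}

# Toy 3-community network, generated here so the block is self-contained
set.seed(1)
z_true <- rep(1:3, each = 50)
P_true <- ifelse(outer(z_true, z_true, "=="), 0.5, 0.05)
A <- matrix(0, 150, 150)
up <- upper.tri(A)
A[up] <- rbinom(sum(up), 1, P_true[up])
A <- A + t(A)

test_idx <- sample(150, 30)
loss_by_K <- sapply(2:6, function(K) incv_split_sketch(A, K, test_idx))
colnames(loss_by_K) <- paste0("K=", 2:6)
loss_by_K
```

In an f-fold scheme this computation would be repeated with each fold as `test_idx` and the losses averaged per K; the selected K is the one with the smallest average loss.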
result <- nscv.f.fold(net$adjacency, k.vec = 2:6, f = 5)
result$k.loss # K selected by neg-log-likelihood
#> [1] 3
result$k.mse # K selected by MSE
#> [1] 3
We can inspect the full CV loss curve:
plot(2:6, result$cv.loss, type = "b", pch = 19,
     xlab = "Number of communities (K)",
     ylab = "CV Negative Log-Likelihood",
     main = "INCV f-fold: CV loss by K")
abline(v = result$k.loss, lty = 2, col = "red")
An alternative is to use repeated random node splits instead of fixed folds:
result2 <- nscv.random.split(net$adjacency, k.vec = 2:6,
                             split = 0.66, ite = 20)
result2$k.chosen
#> [1] 3
plot(2:6, result2$cv.loss, type = "b", pch = 19,
     xlab = "Number of communities (K)",
     ylab = "CV Negative Log-Likelihood",
     main = "INCV random-split: CV loss by K")
abline(v = result2$k.chosen, lty = 2, col = "red")
ECV holds out random edges and evaluates the predictive fit of a blockmodel reconstruction. It selects the number of communities K and the model type (SBM versus DCBM) jointly.
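The edge-holdout idea can be sketched in base R as follows. This is an illustration only and not the code behind ECV.for.blockmodel(): the function name `ecv_split_sketch`, the `holdout_p` argument, and the use of a rescaled truncated eigendecomposition as the reconstruction step are all assumptions. The sketch fits an SBM only and does not cover the SBM versus DCBM comparison.

```r
# Illustrative sketch only; not the package's ECV code.
# Scores one candidate K on one random holdout of node pairs.
ecv_split_sketch <- function(A, K, holdout_p = 0.1) {
  n <- nrow(A)
  up <- which(upper.tri(A))
  held <- sample(up, size = floor(holdout_p * length(up)))  # held-out pairs

  # Indicator of observed pairs (symmetric, diagonal excluded)
  obs <- matrix(1, n, n)
  obs[held] <- 0
  obs <- obs * t(obs)
  diag(obs) <- 0

  # Zero out held-out pairs, rescale, and cluster on the K leading eigenvectors
  A_obs <- A * obs / (1 - holdout_p)
  eig <- eigen(A_obs, symmetric = TRUE)
  lead <- order(abs(eig$values), decreasing = TRUE)[seq_len(K)]
  z <- kmeans(eig$vectors[, lead, drop = FALSE],
              centers = K, nstart = 10)$cluster

  # Block probabilities estimated from observed pairs only
  Z <- diag(K)[z, , drop = FALSE]
  B <- (t(Z) %*% (A * obs) %*% Z) / pmax(t(Z) %*% obs %*% Z, 1)

  # Squared prediction error on the held-out pairs
  P <- B[z, z, drop = FALSE]
  mean((A[held] - P[held])^2)
}

# Toy 3-community network, generated here so the block is self-contained
set.seed(1)
z_true <- rep(1:3, each = 50)
P_true <- ifelse(outer(z_true, z_true, "=="), 0.5, 0.05)
A <- matrix(0, 150, 150)
up <- upper.tri(A)
A[up] <- rbinom(sum(up), 1, P_true[up])
A <- A + t(A)

ecv_mse <- sapply(2:6, function(K) ecv_split_sketch(A, K))
names(ecv_mse) <- paste0("K=", 2:6)
ecv_mse
```

Unlike node splitting, every node stays in the fitting set here; only individual node pairs are hidden, so no held-out node needs to be assigned to a community afterwards.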
| Method | Function | Splits | Selects K | Selects model type |
|---|---|---|---|---|
| INCV f-fold | nscv.f.fold() | Nodes into f folds | Yes | No (SBM only) |
| INCV random | nscv.random.split() | Random node split | Yes | No (SBM only) |
| ECV | ECV.for.blockmodel() | Random edge holdout | Yes | Yes (SBM vs DCBM) |
| NCV | NCV.for.blockmodel() | Node folds | Yes | Yes (SBM vs DCBM) |
| CROISSANT | croissant.blockmodel() | Overlapping subsamples | Yes | Yes (SBM vs DCBM) |
The building blocks are also available directly:
For more realistic simulations, community.sim.sbm()
generates networks where block probabilities decay with community
distance:
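As an illustration of the idea only, the following base-R sketch builds a block probability matrix that decays with the distance |a - b| between community labels and samples a network from it. The geometric decay form and the names `p_within` and `decay` are assumptions made for this example; they are not the parametrisation or the arguments of community.sim.sbm().

```r
# Illustrative sketch only; parameter names and decay form are hypothetical.
K <- 4
n_per <- 30
p_within <- 0.5   # connection probability inside a community
decay <- 0.3      # multiplicative drop per unit of community distance

# B[a, b] = p_within * decay^|a - b|
B_decay <- p_within * decay^abs(outer(seq_len(K), seq_len(K), "-"))

# Sample a symmetric adjacency matrix from the block probabilities
set.seed(1)
z_decay <- rep(seq_len(K), each = n_per)
P_decay <- B_decay[z_decay, z_decay]
A_decay <- matrix(0L, length(z_decay), length(z_decay))
up <- upper.tri(A_decay)
A_decay[up] <- rbinom(sum(up), size = 1, prob = P_decay[up])
A_decay <- A_decay + t(A_decay)

round(B_decay, 3)
```

Compared with the planted-partition model, neighbouring communities are harder to separate than distant ones, which makes the choice of K less clear-cut.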
sessionInfo()
#> R version 4.5.2 (2025-10-31)
#> Platform: x86_64-apple-darwin20
#> Running under: macOS Sonoma 14.6.1
#>
#> Matrix products: default
#> BLAS: /Library/Frameworks/R.framework/Versions/4.5-x86_64/Resources/lib/libRblas.0.dylib
#> LAPACK: /Library/Frameworks/R.framework/Versions/4.5-x86_64/Resources/lib/libRlapack.dylib; LAPACK version 3.12.1
#>
#> locale:
#> [1] C/en_US/en_US/C/en_US/en_US
#>
#> time zone: America/Los_Angeles
#> tzcode source: internal
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] INCVCommunityDetection_0.1.0
#>
#> loaded via a namespace (and not attached):
#> [1] Matrix_1.7-4 mvnfast_0.2.8 gtable_0.3.6
#> [4] jsonlite_2.0.0 compiler_4.5.2 Rcpp_1.1.1
#> [7] slam_0.1-55 parallel_4.5.2 cluster_2.1.8.2
#> [10] jquerylib_0.1.4 scales_1.4.0 yaml_2.3.12
#> [13] fastmap_1.2.0 lattice_0.22-7 ggplot2_4.0.2
#> [16] R6_2.6.1 knitr_1.51 zigg_0.0.2
#> [19] bslib_0.10.0 RColorBrewer_1.1-3 rlang_1.1.7
#> [22] cachem_1.1.0 ClusterR_1.3.6 xfun_0.56
#> [25] sass_0.4.10 S7_0.2.1 RcppParallel_5.1.11-2
#> [28] otel_0.2.0 viridisLite_0.4.3 cli_3.6.5
#> [31] digest_0.6.39 grid_4.5.2 irlba_2.3.7
#> [34] gmp_0.7-5.1 mclust_6.1.2 lifecycle_1.0.5
#> [37] vctrs_0.7.1 Rfast_2.1.5.2 data.table_1.18.2.1
#> [40] IMIFA_2.2.0 RSpectra_0.16-2 evaluate_1.0.5
#> [43] glue_1.8.0 farver_2.1.2 rmarkdown_2.30
#> [46] matrixStats_1.5.0 tools_4.5.2 htmltools_0.5.9