| Title: | Logistic Biplot Estimation Using Machine Learning Algorithms |
| Version: | 1.1.1 |
| Date: | 2026-04-30 |
| Description: | Implements methods for fitting logistic biplot models to multivariate binary data. The logistic biplot represents individuals as points and binary variables as directed vectors in a low-dimensional subspace; the orthogonal projection of each individual onto a variable vector approximates the expected probability that the corresponding characteristic is present. Available fitting methods include conjugate gradient algorithms, a coordinate descent Majorization-Minimization (MM) algorithm, and a block coordinate descent algorithm based on data projection that supports matrices with missing values and allows new individuals to be projected as supplementary rows without refitting the model. A cross-validation procedure is provided to select the number of latent dimensions k. References: Babativa-Marquez and Vicente-Villardon (2021) <doi:10.3390/math9162015>; Vicente-Villardon and Galindo (2006, ISBN:9780470973196). |
| License: | MIT + file LICENSE |
| Encoding: | UTF-8 |
| LazyData: | true |
| RoxygenNote: | 7.3.3 |
| Depends: | R (≥ 4.1.0) |
| Imports: | optimx, RSpectra |
| Suggests: | testthat (≥ 3.0.0), knitr, rmarkdown, dplyr (≥ 1.0.0), tidyr (≥ 1.1.0), ggplot2 (≥ 3.3.2), ggrepel, pracma, mvtnorm |
| URL: | https://github.com/jgbabativam/BiplotML |
| BugReports: | https://github.com/jgbabativam/BiplotML/issues |
| NeedsCompilation: | no |
| Packaged: | 2026-05-03 14:09:14 UTC; giova |
| Author: | Jose Giovany Babativa-Marquez |
| Maintainer: | Jose Giovany Babativa-Marquez <jgbabativam@unal.edu.co> |
| Repository: | CRAN |
| Date/Publication: | 2026-05-08 15:42:17 UTC |
Fit a Binary Logistic Biplot
Description
Estimates the intercept vector \mu, the row-marker matrix A, and the
column-marker matrix B of a logistic biplot model using the optimization
algorithm selected by the user.
Usage
LogBip(
x,
k = 5,
method = "MM",
type = NULL,
plot = TRUE,
maxit = NULL,
endsegm = 0.9,
label.ind = FALSE,
col.ind = NULL,
draw = c("biplot", "ind", "var"),
random_start = FALSE,
L = 0,
cv_LogBip = FALSE
)
Arguments
x |
A binary matrix (or a matrix with |
k |
Number of dimensions. Default is |
method |
Fitting algorithm. One of |
type |
Update formula for the conjugate gradient method: |
plot |
Logical; if |
maxit |
Maximum number of iterations. Defaults to |
endsegm |
End point of the variable segment on the probability scale.
The segment starts at 0.5 and ends at this value. Default is |
label.ind |
Logical; if |
col.ind |
Color for the row markers. Passed to |
draw |
Which graph to draw: |
random_start |
Logical; if |
L |
Ridge penalization parameter. Default is |
cv_LogBip |
Logical; indicates whether the function is being called
internally by |
Details
The following fitting methods are available:
Conjugate gradient (CG): Set method = "CG" and choose the update
formula via type:
- type = 1: Fletcher–Reeves
- type = 2: Polak–Ribiere
- type = 3: Hestenes–Stiefel
- type = 4: Dai–Yuan
Coordinate descent MM: Set method = "MM" to use the iterative
coordinate descent Majorization-Minimization algorithm.
Projection-based algorithm (PDLB): Set method = "PDLB" when the
binary matrix contains missing values, or when the row coordinates of new
(supplementary) individuals need to be estimated without refitting the model.
See Babativa-Marquez & Vicente-Villardon (2026) for details.
BFGS: Set method = "BFGS" to use the Broyden–Fletcher–Goldfarb–Shanno
quasi-Newton method.
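As an illustration of the conjugate gradient option, the following sketch minimises the negative Bernoulli log-likelihood of a logistic biplot with base R's optim(). It is not the package's implementation: the toy matrix X and the helpers unpack(), theta(), loss() and grad() are assumptions made for this example. Note also that optim() offers only three update formulas (1 = Fletcher–Reeves, 2 = Polak–Ribiere, 3 = Beale–Sorenson), so its type codes do not match the four listed above.

```r
# Sketch only: base R optim(), not the BiplotML internals.
set.seed(1)
n <- 30; p <- 6; k <- 2
X <- matrix(rbinom(n * p, 1, 0.5), n, p)   # toy binary matrix

# Split one parameter vector into mu (p), A (n x k) and B (p x k).
unpack <- function(par) {
  list(mu = par[1:p],
       A  = matrix(par[p + 1:(n * k)], n, k),
       B  = matrix(par[p + n * k + 1:(p * k)], p, k))
}
# Log-odds matrix: mu_j + a_i' b_j
theta <- function(par) {
  q <- unpack(par)
  sweep(q$A %*% t(q$B), 2, q$mu, "+")
}
# Negative log-likelihood, using a numerically stable log(1 + exp(t))
loss <- function(par) {
  th <- theta(par)
  sum(pmax(th, 0) + log1p(exp(-abs(th))) - X * th)
}
# Analytic gradient, in the same order as the parameter vector
grad <- function(par) {
  q <- unpack(par)
  R <- plogis(theta(par)) - X              # Pi - X
  c(colSums(R), R %*% q$B, t(R) %*% q$A)
}

par0 <- rnorm(p + n * k + p * k, sd = 0.1)
fits <- lapply(1:3, function(type)
  optim(par0, loss, grad, method = "CG",
        control = list(type = type, maxit = 500)))
sapply(fits, function(f) f$value)          # final loss for each update formula
```

Comparing the three final loss values on the same starting point gives a rough feel for how much the update formula matters on a given data set.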
Value
An object of class BiplotML (a named list) containing:
Ahat: Data frame of row-marker coordinates.
Bhat: Data frame of column-marker coordinates, including the intercept column bb0.
method: Character string identifying the fitting method used.
loss_function: Vector of loss-function values at each iteration (MM and PDLB methods only).
iterations: Number of iterations performed (MM and PDLB methods only).
impute_x: Imputed binary matrix (PDLB method only).
Author(s)
Giovany Babativa <jgbabativam@unal.edu.co>
References
Babativa-Marquez, J. G., & Vicente-Villardon, J. L. (2026). Logistic biplot with missing data. In process.
Babativa-Marquez, J. G., & Vicente-Villardon, J. L. (2021). Logistic biplot by conjugate gradient algorithms and iterated SVD. Mathematics, 9(16), 2015. doi:10.3390/math9162015
Nash, J. C. (2011). Unifying optimization algorithms to aid software system users: optimx for R. Journal of Statistical Software, 43(9), 1–14.
Nash, J. C. (2014). On best practice optimization methods in R. Journal of Statistical Software, 60(2), 1–14.
Nocedal, J., & Wright, S. (2006). Numerical Optimization (2nd ed.). Springer.
Vicente-Villardon, J. L., & Galindo, M. P. (2006). Logistic biplots. In M. Greenacre & J. Blasius (Eds.), Multiple Correspondence Analysis and Related Methods (pp. 503–521). Chapman & Hall.
Examples
data("Methylation")
# Fit using the coordinate descent MM algorithm
res_MM <- LogBip(x = Methylation, method = "MM", maxit = 1000)
# Fit using the PDLB algorithm with simulated missing data
set.seed(12345)
n <- nrow(Methylation); p <- ncol(Methylation)
miss <- matrix(rbinom(n * p, 1, 0.2), n, p)
miss <- ifelse(miss == 1, NA, miss)
x_miss <- Methylation + miss
res_PDLB <- LogBip(x = x_miss, method = "PDLB", maxit = 1000)
DNA Methylation Binary Data
Description
A binary matrix of DNA methylation measurements for a sample of individuals. Each row represents an individual and each column a CpG site; a value of 1 indicates methylation and 0 indicates no methylation.
Usage
Methylation
Format
A binary matrix with 50 rows (individuals) and 13 columns (CpG sites).
Source
Publicly available methylation data used for illustrative purposes.
Examples
data("Methylation")
dim(Methylation)
Cross-Validation for Logistic Biplot
Description
Performs k-fold cross-validation for a logistic biplot model across a range of dimensions, enabling selection of the optimal number of latent dimensions.
Usage
cv_LogBip(
data,
k = 0:5,
K = 7,
method = "MM",
type = NULL,
plot = TRUE,
maxit = NULL
)
Arguments
data |
A binary matrix. |
k |
Integer vector of dimensions to evaluate. Default is |
K |
Number of folds. Default is |
method |
Fitting algorithm: |
type |
Update formula for the CG method (see |
plot |
Logical; if |
maxit |
Maximum number of iterations. Defaults to |
Value
A data frame with columns k, cv-error (mean
cross-validation error, in percent), and train-error (mean training
error, in percent).
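Because the column names cv-error and train-error contain a hyphen, they must be quoted when accessed in R. The sketch below shows one way to pick the dimension with the smallest cross-validation error. The data frame toy_cv holds made-up numbers that only mimic the layout described above; it is not output from cv_LogBip.

```r
# Made-up values, used only to show how the hyphenated columns are accessed.
toy_cv <- data.frame(k = 0:3,
                     `cv-error`    = c(30, 22, 18, 19),
                     `train-error` = c(29, 20, 15, 12),
                     check.names = FALSE)   # keep the hyphens in the names

# Dimension with the smallest mean cross-validation error
best_k <- toy_cv$k[which.min(toy_cv[["cv-error"]])]
best_k
```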
Author(s)
Giovany Babativa <jgbabativam@unal.edu.co>
References
Bro, R., Kjeldahl, K., & Smilde, A. K. (2008). Cross-validation of component models: a critical look at current methods. Analytical and Bioanalytical Chemistry, 390(5), 1241–1251.
Wold, S. (1978). Cross-validatory estimation of the number of components in factor and principal components models. Technometrics, 20(4), 397–405.
See Also
LogBip, pred_LB, fitted_LB, simBin
Examples
set.seed(1234)
x <- simBin(n = 100, p = 50, k = 3, D = 0.5, C = 20)
# Cross-validation using the MM algorithm
cv_MM <- cv_LogBip(data = x$X, k = 0:5, method = "MM", maxit = 1000)
# Cross-validation using the PDLB algorithm
cv_PB <- cv_LogBip(data = x$X, k = 0:5, method = "PDLB", maxit = 1000)
Fitted Values for a Logistic Biplot
Description
Computes the fitted (predicted) matrix for a logistic biplot model on either the logit (log-odds) scale or the probability scale.
Usage
fitted_LB(object, type = c("link", "response"))
Arguments
object |
An object of class |
type |
Scale of the fitted values: |
Value
A numeric matrix of fitted values with the same dimensions as the original binary matrix.
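The relation between the two scales can be sketched in base R. The values of mu, A and B below are arbitrary stand-ins for fitted parameters (they are assumptions for illustration, not extracted from a BiplotML object): the link scale is the matrix of log-odds, and the response scale is its inverse-logit transform.

```r
# Sketch with arbitrary parameter values standing in for a fitted model.
set.seed(1)
n <- 5; p <- 3; k <- 2
mu <- c(-0.5, 0, 0.5)                       # intercepts, one per variable
A  <- matrix(rnorm(n * k), n, k)            # row markers
B  <- matrix(rnorm(p * k), p, k)            # column markers

Theta <- sweep(A %*% t(B), 2, mu, "+")      # "link": log-odds, mu_j + a_i' b_j
Pi    <- plogis(Theta)                      # "response": probabilities in (0, 1)

all.equal(qlogis(Pi), Theta)                # the logit maps back to the link scale
```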
Author(s)
Giovany Babativa <jgbabativam@unal.edu.co>
Examples
data("Methylation")
LB <- LogBip(Methylation, plot = FALSE)
Theta <- fitted_LB(LB, type = "link") # log-odds scale
Pi <- fitted_LB(LB, type = "response") # probability scale
Fit a Binary Logistic Biplot via Gradient Descent
Description
Estimates the row-marker matrix A and the column-marker matrix
B of a binary logistic biplot using a simple (batch) gradient
descent algorithm. This function is mainly provided for pedagogical purposes
and benchmarking; the MM and CG methods in LogBip are
generally faster and more reliable.
Usage
gradientDesc(
x,
k = 2,
rate = 0.001,
converg = 0.001,
max_iter,
plot = FALSE,
...
)
Arguments
x |
A binary matrix. |
k |
Number of dimensions. Default is |
rate |
Learning rate |
converg |
Convergence tolerance: the algorithm stops when the relative
change in the loss function is below this value. Default is |
max_iter |
Maximum number of iterations. |
plot |
Logical; if |
... |
Additional arguments (currently unused). |
Details
The model is
\mathrm{logit}(\pi_{ij}) =
\log\!\left(\frac{\pi_{ij}}{1-\pi_{ij}}\right) =
\mu_j + \sum_{s=1}^k b_{js}\,a_{is} = \mu_j + \mathbf{a}_i^\top \mathbf{b}_j.
The gradient of the loss \ell (the negative log-likelihood) with respect to
the full parameter vector is
\nabla\ell =
\left(\frac{\partial\ell}{\partial\boldsymbol{\mu}},\,
\frac{\partial\ell}{\partial\mathbf{A}},\,
\frac{\partial\ell}{\partial\mathbf{B}}\right) =
\left((\boldsymbol{\Pi}-\mathbf{X})^\top\mathbf{1}_n,\;
(\boldsymbol{\Pi}-\mathbf{X})\mathbf{B},\;
(\boldsymbol{\Pi}-\mathbf{X})^\top\mathbf{A}\right),
where \boldsymbol{\Pi} is the matrix of fitted probabilities \pi_{ij} and
\mathbf{1}_n is a vector of n ones.
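A batch gradient descent loop built on these expressions might look as follows. This is a minimal sketch, not the code of gradientDesc: the toy data, the starting values, and the stopping rule are assumptions, and the intercept gradient is taken as the column sums of Pi - X. The names rate, converg and max_iter mirror the arguments above.

```r
# Sketch only: plain batch gradient descent on toy data.
set.seed(1)
n <- 40; p <- 8; k <- 2
X <- matrix(rbinom(n * p, 1, 0.5), n, p)
rate <- 0.01; converg <- 1e-6; max_iter <- 500

mu <- rep(0, p)
A  <- matrix(rnorm(n * k, sd = 0.1), n, k)
B  <- matrix(rnorm(p * k, sd = 0.1), p, k)

# Negative log-likelihood with a stable log(1 + exp(t))
loss_fun <- function(mu, A, B) {
  Th <- sweep(A %*% t(B), 2, mu, "+")
  sum(pmax(Th, 0) + log1p(exp(-abs(Th))) - X * Th)
}

loss <- loss_fun(mu, A, B)
for (it in seq_len(max_iter)) {
  R    <- plogis(sweep(A %*% t(B), 2, mu, "+")) - X   # Pi - X
  g_mu <- colSums(R)                                  # gradient for intercepts
  g_A  <- R %*% B                                     # gradient for row markers
  g_B  <- t(R) %*% A                                  # gradient for column markers
  mu <- mu - rate * g_mu
  A  <- A  - rate * g_A
  B  <- B  - rate * g_B
  loss <- c(loss, loss_fun(mu, A, B))
  m <- length(loss)
  # stop when the relative change in the loss is below the tolerance
  if (abs(loss[m - 1] - loss[m]) / loss[m - 1] < converg) break
}
c(start = loss[1], end = loss[length(loss)])
```

All three blocks are updated from the same residual matrix, which is what makes this a simultaneous (batch) update rather than a coordinate-wise one.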
Value
An object of class BiplotML (a named list) containing:
Ahat: Estimated row-marker matrix.
Bhat: Estimated column-marker matrix (including intercepts).
method: Character string "Gradient Descent".
Author(s)
Giovany Babativa <jgbabativam@unal.edu.co>
References
Vicente-Villardon, J. L., & Galindo, M. P. (2006). Logistic biplots. In M. Greenacre & J. Blasius (Eds.), Multiple Correspondence Analysis and Related Methods (pp. 503–521). Chapman & Hall.
Examples
data("Methylation")
set.seed(02052020)
outGD <- gradientDesc(x = Methylation, k = 2, max_iter = 10000, plot = TRUE)
Compare Optimization Algorithms for Binary Logistic Biplot Estimation
Description
Fits the binary logistic biplot model using multiple optimization algorithms and returns a summary of their computation time, convergence status, and number of function evaluations, facilitating algorithm selection.
Usage
performanceBLB(xi, k = 2, L = 0, method = NULL, maxit = NULL)
Arguments
xi |
A binary matrix. |
k |
Number of dimensions. Default is |
L |
Ridge penalization parameter. Default is |
method |
Algorithm group to compare: |
maxit |
Maximum number of iterations per algorithm. |
Details
The following algorithm groups are available via the method argument:
- 1: Derivative-free methods (Nelder-Mead, UOBYQA, NEWUOA).
- 2: Gradient methods, the default (CG, Rcgmin).
- 3: Quasi-Newton methods (BFGS, L-BFGS-B, nlm, nlminb).
- 4: All of the above.
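The kind of comparison this function reports can be approximated with base R alone. The sketch below runs four optimizers available in optim() on the same logistic biplot loss and collects value, convergence code, function evaluations and elapsed time. It is an assumption-laden stand-in: the package relies on optimx, and the toy data and the helpers theta(), loss() and grad() are hypothetical.

```r
# Sketch only: compares a few base R optimizers on a toy problem.
set.seed(1)
n <- 30; p <- 6; k <- 2
X <- matrix(rbinom(n * p, 1, 0.5), n, p)

theta <- function(par) {                    # mu_j + a_i' b_j from one vector
  A <- matrix(par[p + 1:(n * k)], n, k)
  B <- matrix(par[p + n * k + 1:(p * k)], p, k)
  sweep(A %*% t(B), 2, par[1:p], "+")
}
loss <- function(par) {
  th <- theta(par)
  sum(pmax(th, 0) + log1p(exp(-abs(th))) - X * th)
}
grad <- function(par) {
  A <- matrix(par[p + 1:(n * k)], n, k)
  B <- matrix(par[p + n * k + 1:(p * k)], p, k)
  R <- plogis(theta(par)) - X
  c(colSums(R), R %*% B, t(R) %*% A)
}

par0    <- rnorm(p + n * k + p * k, sd = 0.1)
methods <- c("Nelder-Mead", "CG", "BFGS", "L-BFGS-B")
res <- do.call(rbind, lapply(methods, function(m) {
  t0  <- proc.time()[["elapsed"]]
  # Nelder-Mead ignores the gradient; the other methods use it
  fit <- optim(par0, loss, grad, method = m, control = list(maxit = 200))
  data.frame(method = m,
             value = fit$value,
             convergence = fit$convergence,   # 0 means converged
             fevals = fit$counts[["function"]],
             time = proc.time()[["elapsed"]] - t0)
}))
res
```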
Value
A data frame with one row per algorithm and columns:
method: Algorithm name.
evaluat: Final value of the objective function.
convergence: Convergence status.
fevals: Number of function evaluations.
time: Elapsed computation time.
Author(s)
Giovany Babativa <jgbabativam@unal.edu.co>
References
Nash, J. C. (2011). Unifying optimization algorithms to aid software system users: optimx for R. Journal of Statistical Software, 43(9), 1–14.
Nash, J. C. (2014). On best practice optimization methods in R. Journal of Statistical Software, 60(2), 1–14.
Vicente-Villardon, J. L., & Galindo, M. P. (2006). Logistic biplots. In M. Greenacre & J. Blasius (Eds.), Multiple Correspondence Analysis and Related Methods (pp. 503–521). Chapman & Hall.
Examples
data("Methylation")
set.seed(123456)
# Gradient methods (default)
performanceBLB(xi = Methylation)
performanceBLB(xi = Methylation, maxit = 150)
# Derivative-free methods
performanceBLB(xi = Methylation, method = 1)
performanceBLB(xi = Methylation, method = 1, maxit = 100)
# Quasi-Newton methods
performanceBLB(xi = Methylation, method = 3)
performanceBLB(xi = Methylation, method = 3, maxit = 100)
# All methods
performanceBLB(xi = Methylation, method = 4)
Plot a Binary Logistic Biplot
Description
Produces a ggplot2-based logistic biplot from a BiplotML
object fitted with LogBip. Supports coloring and
shaping of row markers by a categorical variable, filled arrowheads,
dashed reference lines that span the full plot area, and flexible axis-limit control via
xylim, xlim, and ylim.
Usage
plotBLB(
x,
dim = c(1, 2),
col.ind = NULL,
col.var = "#0E185F",
label.ind = FALSE,
draw = c("biplot", "ind", "var"),
titles = NULL,
ellipses = FALSE,
endsegm = 0.75,
repel = FALSE,
xylim = NULL,
xlim = NULL,
ylim = NULL,
escala = NULL
)
Arguments
x |
An object of class |
dim |
Integer vector of length 2 specifying which dimensions to plot.
Default is |
col.ind |
Optional vector of the same length as the number of rows in
the original data, used to color and shape the row markers by a
categorical variable (e.g., |
col.var |
Color for the variable arrows. Default is |
label.ind |
Logical; if |
draw |
Which graph to draw. One of |
titles |
Main title for the plot. If |
ellipses |
Logical; if |
endsegm |
End point of the variable arrow on the probability scale.
The arrow starts at |
repel |
Logical; if |
xylim |
Numeric vector of length 2 specifying a symmetric range
applied to both axes, e.g., |
xlim |
Numeric vector of length 2 specifying the range of the
x-axis independently, e.g., |
ylim |
Numeric vector of length 2 specifying the range of the
y-axis independently, e.g., |
escala |
Positive numeric scalar. Multiplicative factor applied to
the row marker coordinates ( |
Details
Variable vectors are drawn as arrows from the point where the predicted
probability equals 0.5 to the point where it equals endsegm.
Short arrows indicate a rapid increase in the probability of the
corresponding characteristic. The orthogonal projection of a row marker
onto a variable's arrow approximates the probability that the
characteristic is present for that individual.
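The geometry can be made concrete with a short calculation. Assuming the arrow for variable j lies along the direction of its column marker b_j, the point on that direction with predicted probability q is (logit(q) - mu_j) b_j / ||b_j||^2. The helper segment_ends() and the parameter values below are hypothetical and serve only to illustrate this formula; they are not taken from plotBLB.

```r
# Sketch: end points of one variable arrow, under the assumption stated above.
segment_ends <- function(mu_j, b_j, endsegm = 0.75) {
  # point along b_j whose predicted probability equals `prob`
  pt <- function(prob) (qlogis(prob) - mu_j) * b_j / sum(b_j^2)
  rbind(start = pt(0.5), end = pt(endsegm))
}

mu_j <- -0.4
b_j  <- c(1.2, 0.5)
ends <- segment_ends(mu_j, b_j)

# Predicted probabilities at the two end points: 0.5 and endsegm
probs <- as.vector(plogis(mu_j + ends %*% b_j))
probs

# Arrow length depends only on the norm of b_j: a larger norm gives a shorter arrow
arrow_len <- sqrt(sum((ends["end", ] - ends["start", ])^2))
all.equal(arrow_len, qlogis(0.75) / sqrt(sum(b_j^2)))
```

Under this assumption the length is logit(endsegm) / ||b_j||, which is why a short arrow corresponds to a probability that rises quickly along that direction.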
The three arguments that control axis limits are evaluated in the following order of priority:
- xlim and ylim (independent limits for each axis).
- xylim (symmetric limits applied to both axes).
- Automatic limits derived from all plotted elements.
The escala argument multiplies the row marker coordinates before
plotting so that they are visually comparable to the variable arrows,
which are expressed in the original parameter units. It only affects the
display, not the stored coordinates.
Value
A ggplot2 object that can be further customised with
standard ggplot2 functions (e.g., theme(), labs()).
Author(s)
Giovany Babativa <jgbabativam@unal.edu.co>
References
Meulman, J. J., & Heiser, W. J. (1983). The Display of Bootstrap Solutions in Multidimensional Scaling (Technical memorandum). Bell Laboratories.
Vicente-Villardon, J. L., & Galindo, M. P. (2006). Logistic biplots. In M. Greenacre & J. Blasius (Eds.), Multiple Correspondence Analysis and Related Methods (pp. 503–521). Chapman & Hall.
Examples
data("Methylation")
set.seed(123456)
res <- LogBip(x = Methylation, method = "MM", maxit = 1000, plot = FALSE)
Predict Binary Responses from a Logistic Biplot
Description
Predicts the binary response matrix from a fitted logistic biplot and computes the optimal classification threshold for each variable by minimising the Balanced Error Rate (BER).
Usage
pred_LB(object, x, ncuts = 100)
Arguments
object |
An object of class |
x |
The original binary matrix used to fit the model. |
ncuts |
Number of equally spaced threshold candidates in |
Details
The optimal threshold for variable j is the value \alpha_j \in [0,1]
that minimises the Balanced Error Rate:
BER_j = 1 - \frac{1}{2}
\left(\frac{TP_j}{TP_j + FN_j} + \frac{TN_j}{TN_j + FP_j}\right),
where TP, TN, FP, and FN denote true positives,
true negatives, false positives, and false negatives, respectively.
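The threshold search for a single variable can be sketched as a grid search over ncuts candidates. This is an illustration under stated assumptions, not the code of pred_LB: the helpers ber() and best_threshold() are hypothetical, a probability is classified as 1 when it is greater than or equal to the threshold, and the data are simulated.

```r
# Sketch: BER and a grid search for the threshold of one variable.
ber <- function(x, pred) {
  tp <- sum(x == 1 & pred == 1); fn <- sum(x == 1 & pred == 0)
  tn <- sum(x == 0 & pred == 0); fp <- sum(x == 0 & pred == 1)
  1 - 0.5 * (tp / (tp + fn) + tn / (tn + fp))
}

best_threshold <- function(x, prob, ncuts = 100) {
  cuts <- seq(0, 1, length.out = ncuts)       # equally spaced candidates
  errs <- vapply(cuts, function(a) ber(x, as.integer(prob >= a)), numeric(1))
  c(threshold = cuts[which.min(errs)], BER = min(errs))
}

set.seed(1)
prob <- runif(200)               # stand-in for fitted probabilities
x    <- rbinom(200, 1, prob)     # observed binary variable
out  <- best_threshold(x, prob)
out
```

The BER weighs sensitivity and specificity equally, so the chosen threshold is not pulled toward the majority class when a variable is unbalanced.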
Value
A named list of class BiplotML with components:
thresholds: Data frame with the optimal threshold and minimum BER for each variable.
predictX: Predicted binary matrix.
fitted: Confusion matrix (sensitivity, specificity, global accuracy) for each variable.
BER: Overall Balanced Error Rate (in percent).
Author(s)
Giovany Babativa <jgbabativam@unal.edu.co>
Examples
data("Methylation")
LB <- LogBip(Methylation, plot = FALSE)
out <- pred_LB(LB, Methylation)
Fit a Binary Logistic Biplot with Missing Data via Block Coordinate Descent
Description
Estimates the intercept vector \mu, the row-marker matrix A,
and the column-marker matrix B using a data-projection model with a
block coordinate descent algorithm. Missing values in the binary matrix are
imputed iteratively during model fitting. This function also allows new
individuals to be projected as supplementary rows without refitting the model,
since the row markers are derived directly from the estimated column markers.
This is the low-level function called by LogBip when
method = "PDLB".
Usage
proj_LogBip(x, k = 5, max_iters = 1000, random_start = FALSE, epsilon = 1e-05)
Arguments
x |
A binary matrix, possibly containing |
k |
Number of dimensions. Default is |
max_iters |
Maximum number of iterations. Default is |
random_start |
Logical; if |
epsilon |
Convergence tolerance for the relative decrease in the loss
function. Default is |
Value
A named list with components:
mu: Estimated intercept vector of length p.
A: Estimated row-marker matrix (n \times k).
B: Estimated column-marker matrix (p \times k).
x_est: Imputed binary matrix (missing entries replaced by fitted values).
iter: Number of iterations performed.
loss_funct: Vector of normalised loss-function values at each iteration.
Author(s)
Giovany Babativa <jgbabativam@unal.edu.co>
References
Babativa-Marquez, J. G., & Vicente-Villardon, J. L. (2026). Logistic biplot with missing data. In process.
Babativa-Marquez, J. G., & Vicente-Villardon, J. L. (2021). Logistic biplot by conjugate gradient algorithms and iterated SVD. Mathematics, 9(16), 2015. doi:10.3390/math9162015
Vicente-Villardon, J. L., & Galindo, M. P. (2006). Logistic biplots. In M. Greenacre & J. Blasius (Eds.), Multiple Correspondence Analysis and Related Methods (pp. 503–521). Chapman & Hall.
Examples
data("Methylation")
set.seed(12345)
n <- nrow(Methylation); p <- ncol(Methylation)
miss <- matrix(rbinom(n * p, 1, 0.2), n, p)
miss <- ifelse(miss == 1, NA, miss)
x_miss <- Methylation + miss
out <- proj_LogBip(x = x_miss, k = 2, max_iters = 1000)
Fit a Binary Logistic Biplot via Coordinate Descent MM Algorithm
Description
Estimates the intercept vector \mu, the row-marker matrix A,
and the column-marker matrix B using an iterative coordinate descent
Majorization-Minimization (MM) algorithm. This is the low-level function
called by LogBip when method = "MM".
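To give an idea of how an MM scheme of this kind can work, the sketch below uses a common majorizer for the logistic loss: because the second derivative of the loss is at most 1/4, each iteration reduces to a least-squares fit to a working matrix Z = Theta + 4 (X - Pi), solved by updating the intercepts and then taking a rank-k SVD. This is an assumption about the general technique, not a transcription of sdv_MM; the toy data and all variable names are hypothetical.

```r
# Sketch only: MM iterations with a quadratic bound and a truncated SVD.
set.seed(1)
n <- 40; p <- 8; k <- 2
X <- matrix(rbinom(n * p, 1, 0.5), n, p)

softplus <- function(t) pmax(t, 0) + log1p(exp(-abs(t)))   # stable log(1 + exp(t))
loss_fun <- function(Th) sum(softplus(Th) - X * Th)        # negative log-likelihood

mu <- rep(0, p)
LR <- matrix(0, n, p)                  # low-rank part A %*% t(B)
loss <- loss_fun(sweep(LR, 2, mu, "+"))

for (it in 1:200) {
  Th <- sweep(LR, 2, mu, "+")
  Z  <- Th + 4 * (X - plogis(Th))      # working matrix from the quadratic bound
  mu <- colMeans(Z - LR)               # intercept update with the low-rank part fixed
  s  <- svd(sweep(Z, 2, mu, "-"), nu = k, nv = k)
  A  <- s$u %*% diag(s$d[1:k], k)      # row markers
  B  <- s$v                            # column markers (orthonormal columns)
  LR <- A %*% t(B)                     # best rank-k fit given mu
  loss <- c(loss, loss_fun(sweep(LR, 2, mu, "+")))
}
range(diff(loss))                      # no positive values: the loss never increases
```

Each of the two updates lowers the majorizing function, so the loss is non-increasing from one iteration to the next, which is the defining property of an MM algorithm.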
Usage
sdv_MM(
x,
k = 5,
iterations = 1000,
truncated = TRUE,
random = FALSE,
epsilon = 1e-04
)
Arguments
x |
A binary matrix with no missing values. |
k |
Number of dimensions. Default is |
iterations |
Maximum number of iterations. Default is |
truncated |
Logical; if |
random |
Logical; if |
epsilon |
Convergence tolerance. The algorithm stops when the relative
decrease in the loss function is below this value. Default is |
Value
A named list with components:
mu: Estimated intercept vector of length p.
A: Estimated row-marker matrix (n \times k).
B: Estimated column-marker matrix (p \times k).
iterations: Number of iterations performed.
loss_func: Vector of normalised loss-function values at each iteration.
Author(s)
Giovany Babativa <jgbabativam@unal.edu.co>
References
Babativa-Marquez, J. G., & Vicente-Villardon, J. L. (2021). Logistic biplot by conjugate gradient algorithms and iterated SVD. Mathematics, 9(16), 2015. doi:10.3390/math9162015
Vicente-Villardon, J. L., & Galindo, M. P. (2006). Logistic biplots. In M. Greenacre & J. Blasius (Eds.), Multiple Correspondence Analysis and Related Methods (pp. 503–521). Chapman & Hall.
Examples
data("Methylation")
out <- sdv_MM(x = Methylation)
Simulate a Multivariate Binary Matrix
Description
Simulates a binary data matrix from a logistic biplot latent variable model with known parameters, useful for benchmarking and cross-validation studies.
Usage
simBin(n, p, k, D, C = 1)
Arguments
n |
Number of rows (individuals). |
p |
Number of columns (variables). |
k |
Number of underlying latent dimensions. |
D |
Sparsity control: the marginal probability of a 1 in the population. A value close to 0 or 1 yields a sparse or dense matrix, respectively. |
C |
Variance scaling factor for the row scores. Default is |
Value
A named list with components:
X: Simulated binary matrix (n \times p).
P: Matrix of true Bernoulli probabilities (n \times p).
Theta: Matrix of true log-odds (natural parameters).
A: True row-marker matrix (n \times k).
B: True column-marker matrix (p \times k), orthonormal.
mu: True intercept vector of length p.
D: Observed proportion of ones in X.
n: Number of rows.
p: Number of columns.
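One way such a simulation could be organised is sketched below. It is a guess at the general recipe, not the code of simBin: the function sim_sketch() is hypothetical, the intercepts are all set to logit(D), the row scores are drawn from a normal distribution with variance C, and B is made orthonormal with a QR decomposition.

```r
# Sketch only: a hypothetical generator with the same list layout as above.
sim_sketch <- function(n, p, k, D, C = 1) {
  A  <- matrix(rnorm(n * k, sd = sqrt(C)), n, k)    # row scores with variance C
  B  <- qr.Q(qr(matrix(rnorm(p * k), p, k)))        # p x k, orthonormal columns
  mu <- rep(qlogis(D), p)                           # intercepts from the target proportion
  Theta <- sweep(A %*% t(B), 2, mu, "+")            # true log-odds
  P <- plogis(Theta)                                # true Bernoulli probabilities
  X <- matrix(rbinom(n * p, 1, P), n, p)            # one Bernoulli draw per cell
  list(X = X, P = P, Theta = Theta, A = A, B = B, mu = mu,
       D = mean(X), n = n, p = p)
}

set.seed(1)
sim <- sim_sketch(n = 100, p = 50, k = 3, D = 0.5)
sim$D                                   # observed proportion of ones
all.equal(crossprod(sim$B), diag(3))    # columns of B are orthonormal
```

With intercepts fixed at logit(D), the observed proportion of ones matches D only approximately unless D = 0.5, since the row scores spread the log-odds around the intercept.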
Author(s)
Giovany Babativa <jgbabativam@unal.edu.co>
Examples
x <- simBin(n = 100, p = 50, k = 3, D = 0.5)