Help for package cABCanalysis

Type:

Package

Title:

Computed ABC Analysis

Version:

1.0

Description:

Identify the most relevant data points by dividing a numeric data set into three classes A, B, and C, where class A items are the "important few", class C items are the "trivial many", and class B items are something in between, resembling the idea of the Pareto principle. This ABC classification is done using an ABC curve, which plots cumulative "Yield" against "Effort", similar to a Lorenz curve. Class borders are then precisely mathematically defined on that curve, aiding in interpretation. Based on: Ultsch A, Lotsch J (2015) "Computed ABC Analysis for rational Selection of most informative Variables in multivariate Data". PLoS ONE 10(6): e0129767. <doi:10.1371/journal.pone.0129767>.

Depends:

R (≥ 2.10.0)

Imports:

ggplot2, plotrix, grDevices, graphics, stats, utils

LazyData:

true

Suggests:

datasets, testthat (≥ 3.0.0)

License:

GPL-3

URL:

https://github.com/AndreHDev/cABC_Analysis

Encoding:

UTF-8

Date:

2026-04-20

RoxygenNote:

7.3.3

NeedsCompilation:

Config/testthat/edition:

Packaged:

2026-04-24 14:03:32 UTC; andre

Author:

Jorn Lotsch [aut], André Himmelspach [aut, cre]

Maintainer:

André Himmelspach <himmelspach@med.uni-frankfurt.de>

Repository:

CRAN

Date/Publication:

2026-04-28 19:00:33 UTC

SwissInhabitants in 1900

Description

Number of inhabitants in the 2896 communes (cities and villages) of Switzerland in the year 1900.

Usage

data("SwissInhabitants")

Format

A numeric vector of length 2896 containing the population counts.

Details

This data set consists of the number of inhabitants in the 2896 communes (cities and villages) in Switzerland in 1900. The data is unordered for anonymity reasons.

Source

Schuler, M., Ullmann, D. (2002). Eidgenossische Volkszahlung: Bevoelkerungsentwicklung der Gemeinden. Bundesamt fur Statistik, Neuchatel, Switzerland.

References

Behnisch, M., Ultsch, A. (2010). Population Patterns in Switzerland 1850-2000. In: Gaul, W. et al. (Eds.), Advances in Data Analysis, Data Handling and Business Intelligence, Springer, Heidelberg, pp. 163-173.

Examples

data(SwissInhabitants)
summary(SwissInhabitants)

ABC Classification

Description

Divides a numeric dataset into three classes (A, B, and C) using ABC analysis. The classification is based on geometric properties of the ABC curve and identifies regions of high, balanced, and low efficiency. Class interpretation:

A:	Low effort, high yield (Pareto items)
B:	Balanced effort and yield
C:	High effort, low yield (submarginal items)

Usage

cABC_analysis(Data, PlotIt = FALSE, useGGPlot = TRUE)

Arguments

Data

Numeric vector of positive values that are not uniformly distributed. If a matrix or data frame is supplied, only the first column is used.

PlotIt

Logical. If TRUE, an ABC plot is generated.

useGGPlot

Logical, default TRUE. If TRUE a ggplot2 plot is produced; if FALSE a base-R plot is produced. Only relevant when PlotIt = TRUE.

Details

The boundaries are calculated on the ABC curve (see cABC_curve) using three points:

Pareto Point:	The point with minimal distance to (0,1) -> A|B Boundary
Breakeven Point:	The point where the slope of the curve equals 1
Juren Point:	The point with minimal distance to (BreakevenPoint_x,1), listed as the Submarginal point under Value -> B|C Boundary

For more calculation details see: Ultsch A, Lotsch J (2015) "Computed ABC Analysis for rational Selection of most informative Variables in multivariate Data". PLoS ONE 10(6): e0129767. <doi:10.1371/journal.pone.0129767>.
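
The three points can be located on a densely sampled curve with a few lines of base R. The following is a minimal sketch under stated assumptions, not the package's implementation: the helper name abc_points_sketch is hypothetical, the toy curve y = 2p - p^2 is used only because its slope is known in closed form, and the swap reported in ABexchanged is not handled.

```r
# Hypothetical helper: locate the three points on a sampled ABC curve.
# p, yield and slope are numeric vectors of equal length.
abc_points_sketch <- function(p, yield, slope) {
  pareto_i    <- which.min(p^2 + (1 - yield)^2)       # closest to (0, 1)
  breakeven_i <- which.min(abs(slope - 1))            # slope closest to 1
  juren_i     <- which.min((p - p[breakeven_i])^2 + (1 - yield)^2)
  list(
    A = c(p[pareto_i],    yield[pareto_i]),           # Pareto point
    B = c(p[breakeven_i], yield[breakeven_i]),        # Breakeven point
    C = c(p[juren_i],     yield[juren_i])             # Juren point
  )
}

# Toy curve (assumption for illustration): y = 2p - p^2 with slope 2 - 2p
p   <- seq(0, 1, length.out = 1001)
pts <- abc_points_sketch(p, 2 * p - p^2, 2 - 2 * p)
print(pts)
```

On this toy curve the Breakeven point falls at p = 0.5, where the slope 2 - 2p equals 1, with the Pareto point to its left and the Juren point to its right.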

Data cleaning: Before classification, non-numeric values, NAs, and negative values are set to 0. A warning is issued when items are converted. If a matrix or data frame is supplied, only the first column is used.
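
The cleaning rules can be pictured with a small base R helper. This is a sketch only: the function name clean_abc_input_sketch and the warning text are assumptions, and the package's own cleaning code may differ in detail.

```r
# Hypothetical helper mirroring the cleaning rules described above.
clean_abc_input_sketch <- function(Data) {
  # Only the first column of a data frame or matrix is used
  if (is.data.frame(Data)) {
    Data <- Data[[1]]
  } else if (is.matrix(Data)) {
    Data <- Data[, 1]
  }
  # Non-numeric entries become NA when coerced
  x <- if (is.numeric(Data)) {
    as.numeric(Data)
  } else {
    suppressWarnings(as.numeric(as.character(Data)))
  }
  bad <- is.na(x) | x < 0          # NAs, non-numeric and negative values
  if (any(bad)) {
    warning(sum(bad), " item(s) were set to 0 during cleaning")
    x[bad] <- 0
  }
  x
}

# suppressWarnings() only keeps the demo output quiet
cleaned <- suppressWarnings(clean_abc_input_sketch(c("5", "abc", NA, "-2", "10")))
print(cleaned)
```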

Degenerate inputs (single point, all-identical values, very small datasets) are caught before curve fitting; see cABC_handle_specials for the full behavior. Boundary duplicate values that span two classes after classification are resolved by cABC_postprocess_classes. In both cases a warning is issued when a special case is triggered.

Value

A list containing:

Aind, Bind, Cind: Integer vectors of indices (into the original Data) for items assigned to classes A, B, and C respectively. In special-case returns (single point or all-identical), only Aind is populated; Bind and Cind are integer(0).
ABexchanged: Logical; TRUE if the Pareto point and Break-even point were swapped to maintain coordinate logic (i.e. the Break-even point was to the left of the Pareto point on the curve).
A, B, C: c(x, y) coordinates for the Pareto point (A), the Break-even point (B), and the Submarginal point (C). NULL in special-case returns.
smallestAData: Cumulative yield at the boundary of Class A. NULL in special-case returns.
smallestBData: Cumulative yield at the boundary of Class B. NULL in special-case returns.
AlimitIndInInterpolation: Index of the A|B boundary in the interpolated [p, ABC] curve. NULL in special-case returns.
BlimitIndInInterpolation: Index of the B|C boundary in the interpolated [p, ABC] curve. NULL in special-case returns.
p: Numeric vector of effort values (x-axis) of the interpolation curve. NULL in special-case returns.
ABC: Numeric vector of yield values (y-axis) of the interpolation curve. NULL in special-case returns.
ABLimit: Data value closest to the threshold separating Class A from Class B. NULL in special-case returns.
BCLimit: Data value closest to the threshold separating Class B from Class C. NULL in special-case returns.

Author(s)

André Himmelspach (01/2026)

Examples

data("SwissInhabitants")
abc <- cABC_analysis(SwissInhabitants, PlotIt = TRUE)

# Extract the data belonging to each class
A <- abc$Aind; B <- abc$Bind; C <- abc$Cind
Agroup <- SwissInhabitants[A]
Bgroup <- SwissInhabitants[B]
Cgroup <- SwissInhabitants[C]

cABC Curve Computation

Description

Computes the cumulative percentage of the largest data points (effort) and the cumulative percentage of the sum of the largest data (yield). Monotone Hyman spline interpolation is used to generate the points in between.

Usage

cABC_curve(Data, p)

Arguments

Data

Numeric vector/matrix. First column used if matrix. Only positive values used.

p

Optional x-values for spline interpolation. Default: finer grid for large datasets.

Value

A list containing:

Curve: Data frame with Effort (x) and Yield (y) of the interpolated curve.
Slope: Data frame with p (x-values) and cABC (first derivative).
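
The effort and yield computation can be sketched with base R alone. The helper name abc_curve_sketch and the 101-point default grid are illustrative assumptions; cABC_curve itself may choose its grid and handle edge cases differently. The column names follow the Value description above.

```r
# Hypothetical helper: effort/yield curve with monotone Hyman interpolation.
abc_curve_sketch <- function(Data, p = seq(0, 1, length.out = 101)) {
  x <- as.numeric(Data)
  x <- sort(x[is.finite(x) & x > 0], decreasing = TRUE)  # positive values only
  n <- length(x)
  effort <- c(0, seq_len(n) / n)       # fraction of items, largest first
  yield  <- c(0, cumsum(x) / sum(x))   # fraction of the total sum covered
  # method = "hyman" yields a monotone cubic spline through the points
  f <- stats::splinefun(effort, yield, method = "hyman")
  list(
    Curve = data.frame(Effort = p, Yield = f(p)),
    Slope = data.frame(p = p, cABC = f(p, deriv = 1))
  )
}

res <- abc_curve_sketch(c(100, 50, 20, 10, 5, 5, 3, 3, 2, 2))
print(head(res$Curve))
```

Prepending the point (0, 0) anchors the curve at the origin, and because the positive values are sorted in decreasing order the cumulative yield is strictly increasing, which is what the Hyman method requires.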

Handle Special Cases Before ABC Classification

Description

Checks for degenerate input conditions that would make a standard ABC analysis undefined or unreliable, and returns an early result or warning where appropriate. This function is called by cABC_analysis before the ABC curve is computed.

Usage

cABC_handle_specials(Data)

Arguments

Data

Named numeric vector of positive values (already cleaned by cABC_analysis: no NAs, no non-positives, names preserved).

Details

The following special cases are handled:

Single data point: If only one positive value remains after cleaning, it is assigned to Class A and a warning is issued. The returned list has Aind = 1 and all other fields empty/NULL.
All values identical: If every data point has the same value, all items are considered equally important. They are all assigned to Class A (not Class B as the warning text historically stated) and a warning is issued. The returned list has Aind set to all indices and all other fields empty/NULL.
Very small dataset: If three or fewer positive values remain after cleaning, a warning is issued that the ABC classification may be unstable, but processing continues normally and NULL is returned so that cABC_analysis proceeds with the standard algorithm.
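
The three cases above amount to a short chain of early exits. The sketch below is an assumption-laden illustration, not the package code: the function name handle_specials_sketch and the warning texts are hypothetical, and only the index fields of the result are shown.

```r
# Hypothetical sketch of the early-exit logic for degenerate inputs.
handle_specials_sketch <- function(Data) {
  n <- length(Data)
  early <- function(a_idx) {
    # Curve-related fields are omitted here; the package returns them as NULL
    list(Aind = a_idx, Bind = integer(0), Cind = integer(0))
  }
  if (n == 1) {
    warning("Only one data point: assigned to class A")
    return(early(1L))
  }
  if (length(unique(Data)) == 1) {
    warning("All values identical: all items assigned to class A")
    return(early(seq_len(n)))
  }
  if (n <= 3) {
    warning("Three or fewer values: ABC classification may be unstable")
  }
  NULL  # no early result; the caller continues with the standard algorithm
}

# suppressWarnings() only keeps the demo output quiet
print(suppressWarnings(handle_specials_sketch(c(4, 4, 4, 4))))
```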

Value

NULL if no special case applies (normal processing should continue). Otherwise a named list with the same structure as the return value of cABC_analysis, where only Aind (and optionally Bind, Cind) are populated and all curve-related fields are NULL or empty.

cABC Plot

Description

Draws an ABC curve together with identity and optional uniform reference curves, ABC set boundaries (A-B and B-C), labels, and point counts.

Usage

cABC_plot(
  CurveData,
  CleanData,
  Boundaries,
  Set_counts,
  x_vals,
  y_vals,
  LineType = 0,
  LineWidth = 3,
  ShowUniform = TRUE,
  Plot_title = "ABC plot",
  defaultAxes = FALSE
)

Arguments

CurveData

Data about the ABC curve as returned by cABC_curve.

CleanData

Clean original input data.

Boundaries

A list with numeric vectors A, B, and C, each of length 2, giving the x/y coordinates of the ABC boundaries.

Set_counts

A list with elements nA, nB, nC giving the number of observations in sets A, B, and C.

x_vals

Numeric vector of x coordinates of original data points.

y_vals

Numeric vector of y coordinates of original data points.

LineType

Integer. If 0 (default), the ABC curve is drawn as a line.

LineWidth

Numeric. Line width for the ABC curve. Default is 3.

ShowUniform

Logical. If TRUE (default), the uniform reference curve is drawn in addition to the identity and ABC curves.

Plot_title

Character string. Title of the plot. Default is "ABC plot".

defaultAxes

Logical, default FALSE. If TRUE, base R axes are drawn by plot(). If FALSE, custom axes with ticks from 0 to 1 in steps of 0.1 are drawn.

Details

The plot always uses a square coordinate system with both axes ranging from 0 to 1. The diagonal y = 1 - x (equilibrium line) and the identity line y = x are drawn as references. ABC set boundaries (A|B and B|C) are visualized with stars and orthogonal boundary lines.
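
The reference geometry can be reproduced with plain base graphics. This sketch draws only the reference lines, not the ABC curve or the class boundaries. It assumes the uniform reference is the ABC curve of a uniform distribution on [0, b], which works out to y = 2p - p^2; the line types and colours are arbitrary choices, not those used by cABC_plot.

```r
# Hypothetical sketch of the reference lines on a unit square.
p <- seq(0, 1, length.out = 201)
uniform_yield <- 2 * p - p^2     # assumed uniform reference curve

plot(0:1, 0:1, type = "n", asp = 1, axes = FALSE,
     xlab = "Effort", ylab = "Yield", main = "ABC plot (reference lines only)")
axis(1, at = seq(0, 1, by = 0.1))        # custom ticks in steps of 0.1
axis(2, at = seq(0, 1, by = 0.1))
lines(p, p, lty = 2)                     # identity line y = x
lines(p, 1 - p, lty = 3)                 # equilibrium line y = 1 - x
lines(p, uniform_yield, col = "grey40")  # uniform reference curve
```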

Value

Base R Plot

cABC_plotGG

Description

ggplot2 version matching base R cABC_plot

Usage

cABC_plotGG(
  CurveData,
  CleanData,
  Boundaries,
  Set_counts,
  x_vals,
  y_vals,
  LineWidth = 1.25,
  ShowUniform = TRUE,
  Plot_title = "ABC plot"
)

Arguments

CurveData

Data about the ABC curve as returned by cABC_curve.

CleanData

Clean original input data.

Boundaries

A list with numeric vectors A, B, and C, each of length 2, giving the x/y coordinates of the ABC boundaries.

Set_counts

A list with elements nA, nB, nC giving the number of observations in sets A, B, and C.

x_vals

Numeric vector of x coordinates of original data points.

y_vals

Numeric vector of y coordinates of original data points.

LineWidth

Numeric. Line width for the ABC curve. Default is 1.25.

ShowUniform

Logical. If TRUE (default), the uniform reference curve is drawn in addition to the identity and ABC curves.

Plot_title

Character string. Title of the plot. Default is "ABC plot".

Value

ggplot2 object

Post-process ABC Classes to Resolve Boundary Duplicates

Description

After the initial class assignment in cABC_analysis, it is possible for data points with the same value to be split across two or even all three classes (A, B, C) because the geometric boundary cuts through a run of identical values. This function detects such duplicates and consolidates all occurrences of an ambiguous value into a single class using a deterministic tie-breaking strategy.

Usage

cABC_postprocess_classes(Aind, Bind, Cind, Data, sorted_data, ABLimit, BCLimit)

Arguments

Aind

Integer vector of indices currently assigned to Class A.

Bind

Integer vector of indices currently assigned to Class B.

Cind

Integer vector of indices currently assigned to Class C.

Data

Named numeric vector of the (unsorted) input data, as cleaned by cABC_analysis.

sorted_data

Numeric vector; Data sorted in decreasing order (used internally for boundary reference).

ABLimit

Numeric scalar; the data value closest to the A/B boundary threshold (as computed in cABC_analysis).

BCLimit

Numeric scalar; the data value closest to the B/C boundary threshold (as computed in cABC_analysis).

Details

Tie-breaking rules:

The class that contains the most occurrences of the duplicate value wins outright.
If all three classes are tied, the duplicate value is compared to both boundary limits. It is assigned to whichever boundary (ABLimit or BCLimit) it is closest to, then placed in the class above that boundary (i.e. closer to AB → A if dup_val >= ABLimit, else B; closer to BC → B if dup_val >= BCLimit, else C). If equidistant from both boundaries it is assigned to B.
If exactly two classes are tied, the pair determines the rule:
- A vs B: compare to ABLimit; >= ABLimit → A, otherwise → B.
- B vs C: compare to BCLimit; >= BCLimit → B, otherwise → C.
- A vs C: always assign to A, since the value was already deemed important enough to appear in the top class.

A warning is issued whenever at least one duplicate boundary value is found, prompting the user to inspect the data and the ABC plot.
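
The tie-breaking rules can be written as a single decision function for one duplicated value. This is a sketch of the rules as stated above, not the package's code: the name resolve_tie_sketch and the counts argument (occurrences of the value per class, named A, B, C) are assumptions made for illustration.

```r
# Hypothetical helper: choose the class for one duplicated boundary value.
# counts: named vector c(A = , B = , C = ) of occurrences per class.
resolve_tie_sketch <- function(counts, dup_val, ABLimit, BCLimit) {
  winners <- names(counts)[counts == max(counts)]
  if (length(winners) == 1) return(winners)   # clear majority wins outright

  if (length(winners) == 3) {                 # three-way tie
    d_ab <- abs(dup_val - ABLimit)
    d_bc <- abs(dup_val - BCLimit)
    if (d_ab < d_bc) return(if (dup_val >= ABLimit) "A" else "B")
    if (d_bc < d_ab) return(if (dup_val >= BCLimit) "B" else "C")
    return("B")                               # equidistant from both limits
  }

  pair <- paste(sort(winners), collapse = "") # two-way tie
  switch(pair,
    AB = if (dup_val >= ABLimit) "A" else "B",
    BC = if (dup_val >= BCLimit) "B" else "C",
    AC = "A"
  )
}

# A value of 9 tied between A and B, with ABLimit = 10 and BCLimit = 4
print(resolve_tie_sketch(c(A = 1, B = 1, C = 0), 9, ABLimit = 10, BCLimit = 4))
```

Applying the returned class to every index that holds the duplicated value gives the consolidated Aind, Bind, and Cind vectors.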

Value

A named list with three elements:

Aind: Sorted integer vector of indices for Class A after deduplication.
Bind: Sorted integer vector of indices for Class B after deduplication.
Cind: Sorted integer vector of indices for Class C after deduplication.