| Type: | Package |
| Title: | Computed ABC Analysis |
| Version: | 1.0 |
| Description: | Identify the most relevant data points by dividing a numeric data set into three classes A, B, and C, where class A items are the "important few", class C items are the "trivial many", and class B items are something in between, resembling the idea of the Pareto principle. This ABC classification is done using an ABC curve, which plots cumulative "Yield" against "Effort", similar to a Lorenz curve. Class borders are then precisely mathematically defined on that curve, aiding in interpretation. Based on: Ultsch A, Lotsch J (2015) "Computed ABC Analysis for rational Selection of most informative Variables in multivariate Data". PLoS ONE 10(6): e0129767. <doi:10.1371/journal.pone.0129767>. |
| Depends: | R (≥ 2.10.0) |
| Imports: | ggplot2, plotrix, grDevices, graphics, stats, utils |
| LazyData: | true |
| Suggests: | datasets, testthat (≥ 3.0.0) |
| License: | GPL-3 |
| URL: | https://github.com/AndreHDev/cABC_Analysis |
| Encoding: | UTF-8 |
| Date: | 2026-04-20 |
| RoxygenNote: | 7.3.3 |
| NeedsCompilation: | no |
| Config/testthat/edition: | 3 |
| Packaged: | 2026-04-24 14:03:32 UTC; andre |
| Author: | Jorn Lotsch |
| Maintainer: | André Himmelspach <himmelspach@med.uni-frankfurt.de> |
| Repository: | CRAN |
| Date/Publication: | 2026-04-28 19:00:33 UTC |
SwissInhabitants in 1900
Description
Number of inhabitants in the 2896 communes (cities and villages) of Switzerland in the year 1900.
Usage
data("SwissInhabitants")
Format
A numeric vector of length 2896 containing the population counts.
Details
This data set consists of the number of inhabitants in the 2896 communes (cities and villages) in Switzerland in 1900. The data is unordered for anonymity reasons.
Source
Schuler, M., Ullmann, D. (2002). Eidgenossische Volkszahlung: Bevoelkerungsentwicklung der Gemeinden. Bundesamt fur Statistik, Neuchatel, Switzerland.
References
Behnisch, M., Ultsch, A. (2010). Population Patterns in Switzerland 1850-2000. In: Gaul, W. et al (Eds), Advances in Data Analysis, Data Handling and Business Intelligence, Springer, Heidelberg, pp. 163-173.
Examples
data(SwissInhabitants)
summary(SwissInhabitants)
ABC Classification
Description
Divides a numeric dataset into three classes (A, B, and C) using ABC analysis. The classification is based on geometric properties of the ABC curve and identifies regions of high, balanced, and low efficiency. Class interpretation:
| A: | Low effort, high yield (Pareto items) |
| B: | Balanced effort and yield |
| C: | High effort, low yield (submarginal items) |
Usage
cABC_analysis(Data, PlotIt = FALSE, useGGPlot = TRUE)
Arguments
Data |
Positive numeric vector that is not uniformly distributed. If a matrix or data frame is supplied, only the first column is used. |
PlotIt |
Logical. If |
useGGPlot |
Logical, default |
Details
The class boundaries are calculated on the ABC curve
(see cABC_curve) using the following points:
| Pareto Point: | The point with minimal distance to (0,1) -> A|B Boundary |
| Breakeven Point: | The point where the slope equals 1 |
| Juren Point: | The point with minimal distance to (BreakevenPoint_x,1) -> B|C Boundary |
For more calculation details see: Ultsch A, Lotsch J (2015) "Computed ABC Analysis for rational Selection of most informative Variables in multivariate Data". PLoS ONE 10(6): e0129767. <doi:10.1371/journal.pone.0129767>.
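The three points listed above can be located numerically on a gridded curve. The following is a minimal base R sketch, not the package's implementation: `abc_points_sketch` is a hypothetical helper, and the input curve `yield = 1 - (1 - p)^3` is an arbitrary concave curve chosen only because its slope is known in closed form.

```r
# Illustrative concave curve (an assumption, not package output):
# yield = 1 - (1 - p)^3, whose slope is 3 * (1 - p)^2.
p     <- seq(0, 1, by = 0.001)
yield <- 1 - (1 - p)^3
slope <- 3 * (1 - p)^2

# Hypothetical helper: locate the three characteristic points on a curve grid.
abc_points_sketch <- function(effort, yield, slope) {
  # Pareto point: curve point closest to the ideal corner (0, 1)
  pareto_i <- which.min(effort^2 + (1 - yield)^2)
  # Break-even point: curve point whose slope is closest to 1
  breakeven_i <- which.min(abs(slope - 1))
  # Juren point: curve point closest to (break-even x, 1)
  juren_i <- which.min((effort - effort[breakeven_i])^2 + (1 - yield)^2)
  idx <- c(Pareto = pareto_i, BreakEven = breakeven_i, Juren = juren_i)
  data.frame(Effort = effort[idx], Yield = yield[idx], row.names = names(idx))
}

pts <- abc_points_sketch(p, yield, slope)
print(pts)
```

On this curve the Pareto point lies to the left of the break-even point. With real data the order can be reversed, which is the situation the `ABexchanged` flag in the return value reports.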
Data cleaning: Before classification, non-numeric values and
NAs are coerced to 0, and negative values are set to 0.
A warning is issued when items are converted. If a matrix or data frame is
supplied, only the first column is used.
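The cleaning rules described here can be illustrated with a short base R sketch. `clean_abc_input` is a hypothetical helper written for this illustration; the package's internal cleaning code and its warning text may differ.

```r
# Hypothetical helper illustrating the documented cleaning rules;
# not the package's internal implementation.
clean_abc_input <- function(Data) {
  # Matrices and data frames: only the first column is used
  if (is.matrix(Data) || is.data.frame(Data)) {
    Data <- Data[, 1]
  }
  # Non-numeric entries become NA under coercion
  x <- if (is.numeric(Data)) {
    as.numeric(Data)
  } else {
    suppressWarnings(as.numeric(as.character(Data)))
  }
  # NA (including former non-numeric entries) and negative values are set to 0
  bad <- is.na(x) | x < 0
  if (any(bad)) {
    warning(sum(bad), " item(s) were non-numeric, NA, or negative and were set to 0")
    x[bad] <- 0
  }
  x
}

cleaned <- suppressWarnings(clean_abc_input(c("5", "abc", NA, "-2", "10")))
print(cleaned)
```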
Degenerate inputs (single point, all-identical values, very small datasets)
are caught before curve fitting; see cABC_handle_specials for
the full behavior. Duplicate values at a boundary that span more than one class after
classification are resolved by cABC_postprocess_classes.
In both cases a warning is issued when a special case is triggered.
Value
A list containing:
- Aind, Bind, Cind
Integer vectors of indices (into the original Data) for items assigned to classes A, B, and C respectively. In special-case returns (single point or all-identical), only Aind is populated; Bind and Cind are integer(0).
- ABexchanged
Logical; TRUE if the Pareto point and Break-even point were swapped to maintain coordinate logic (i.e. the Break-even point was to the left of the Pareto point on the curve).
- A, B, C
c(x, y) coordinates for the Pareto point (A), the Break-even point (B), and the Submarginal point (C). NULL in special-case returns.
- smallestAData
Cumulative yield at the boundary of Class A. NULL in special-case returns.
- smallestBData
Cumulative yield at the boundary of Class B. NULL in special-case returns.
- AlimitIndInInterpolation
Index of the A|B boundary in the interpolated [p, ABC] curve. NULL in special-case returns.
- BlimitIndInInterpolation
Index of the B|C boundary in the interpolated [p, ABC] curve. NULL in special-case returns.
- p
Numeric vector of effort values (x-axis) of the interpolation curve. NULL in special-case returns.
- ABC
Numeric vector of yield values (y-axis) of the interpolation curve. NULL in special-case returns.
- ABLimit
Data value closest to the threshold separating Class A from Class B. NULL in special-case returns.
- BCLimit
Data value closest to the threshold separating Class B from Class C. NULL in special-case returns.
Author(s)
André Himmelspach (01/2026)
Examples
data("SwissInhabitants")
abc <- cABC_analysis(SwissInhabitants, PlotIt = TRUE)
# Extract the data belonging to each class
A <- abc$Aind; B <- abc$Bind; C <- abc$Cind
Agroup <- SwissInhabitants[A]
Bgroup <- SwissInhabitants[B]
Cgroup <- SwissInhabitants[C]
cABC Curve Computation
Description
Computes the cumulative percentage of the largest data (effort) and the cumulative percentage of the sum of the largest data (yield). Monotone Hyman spline interpolation is used to generate the in-between points.
Usage
cABC_curve(Data, p)
Arguments
Data |
Numeric vector/matrix. First column used if matrix. Only positive values used. |
p |
Optional x-values for spline interpolation. Default: finer grid for large datasets. |
Value
List containing:
- Curve
Data frame with Effort (x) and Yield (y) of the interpolated curve.
- Slope
Data frame with p (x-values) and cABC (first derivative).
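As a rough illustration of the computation described for this function, the sketch below builds effort and yield from sorted data and interpolates them with `stats::splinefun(method = "hyman")`. `abc_curve_sketch` is a hypothetical name, and the fixed grid of 101 points is an assumption; the package chooses its own default grid.

```r
# Hypothetical sketch of an ABC curve computation; not the package's code.
abc_curve_sketch <- function(Data, p = seq(0, 1, by = 0.01)) {
  # Keep positive finite values, largest first
  x <- sort(Data[is.finite(Data) & Data > 0], decreasing = TRUE)
  n <- length(x)
  effort <- c(0, seq_len(n) / n)       # cumulative fraction of items
  yield  <- c(0, cumsum(x) / sum(x))   # cumulative fraction of the total
  # Monotone cubic spline through the (effort, yield) points
  f <- stats::splinefun(effort, yield, method = "hyman")
  list(
    Curve = data.frame(Effort = p, Yield = f(p)),
    Slope = data.frame(p = p, cABC = f(p, deriv = 1))
  )
}

vals <- c(100, 50, 20, 10, 5, 5, 3, 3, 2, 2)
res  <- abc_curve_sketch(vals)
head(res$Curve)
```

In this toy input the largest item (10% of the items) carries half of the total, so the curve passes through (0.1, 0.5).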
Handle Special Cases Before ABC Classification
Description
Checks for degenerate input conditions that would make a standard ABC
analysis undefined or unreliable, and returns an early result or warning
where appropriate. This function is called by cABC_analysis before
the ABC curve is computed.
Usage
cABC_handle_specials(Data)
Arguments
Data |
Named numeric vector of positive values (already cleaned by
|
Details
The following special cases are handled:
- Single data point
If only one positive value remains after cleaning, it is assigned to Class A and a warning is issued. The returned list has Aind = 1 and all other fields empty/NULL.
- All values identical
If every data point has the same value, all items are considered equally important. They are all assigned to Class A (not Class B as the warning text historically stated) and a warning is issued. The returned list has Aind set to all indices and all other fields empty/NULL.
- Very small dataset
If three or fewer positive values remain after cleaning, a warning is issued that the ABC classification may be unstable, but processing continues normally and NULL is returned so that cABC_analysis proceeds with the standard algorithm.
Value
NULL if no special case applies (normal processing should
continue). Otherwise a named list with the same structure as the return
value of cABC_analysis, where only Aind (and optionally
Bind, Cind) are populated and all curve-related fields are
NULL or empty.
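The control flow of these checks might look roughly like the following base R sketch. `handle_specials_sketch` is a hypothetical stand-in: it returns only the three index vectors, whereas the documented return value also carries the curve-related fields (set to NULL or empty), and the warning messages here are invented for illustration.

```r
# Hypothetical sketch of the special-case checks; not the package's code.
handle_specials_sketch <- function(Data) {
  n <- length(Data)
  empty <- integer(0)
  if (n == 1) {
    warning("Only one data point: assigned to class A.")
    return(list(Aind = 1L, Bind = empty, Cind = empty))
  }
  if (length(unique(Data)) == 1) {
    warning("All values identical: all items assigned to class A.")
    return(list(Aind = seq_len(n), Bind = empty, Cind = empty))
  }
  if (n <= 3) {
    # Warn only; the caller continues with the standard algorithm
    warning("Three or fewer data points: classification may be unstable.")
  }
  NULL
}

res_single    <- suppressWarnings(handle_specials_sketch(c(a = 42)))
res_identical <- suppressWarnings(handle_specials_sketch(c(7, 7, 7, 7)))
res_small     <- suppressWarnings(handle_specials_sketch(c(1, 2, 3)))
res_normal    <- handle_specials_sketch(c(10, 5, 3, 2, 1))
```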
cABC Plot
Description
Draws an ABC curve together with identity and optional uniform reference curves, ABC set boundaries (A-B and B-C), labels, and point counts.
Usage
cABC_plot(
CurveData,
CleanData,
Boundaries,
Set_counts,
x_vals,
y_vals,
LineType = 0,
LineWidth = 3,
ShowUniform = TRUE,
Plot_title = "ABC plot",
defaultAxes = FALSE
)
Arguments
CurveData |
Data about the ABC curve as returned by cABC_curve |
CleanData |
Clean original input data. |
Boundaries |
A list with numeric vectors A, B, and C, each of length 2, giving the x/y coordinates of the ABC boundaries. |
Set_counts |
A list with elements nA, nB, nC giving the number of observations in sets A, B, and C. |
x_vals |
Numeric vector of x coordinates of original data points. |
y_vals |
Numeric vector of y coordinates of original data points. |
LineType |
Integer. If 0 (default), the ABC curve is drawn as a line. |
LineWidth |
Numeric. Line width for the ABC curve. Default is 3. |
ShowUniform |
Logical. If TRUE (default), the uniform reference curve is drawn in addition to the identity and ABC curves. |
Plot_title |
Character string. Title of the plot. Default is "ABC plot". |
defaultAxes |
Logical. If TRUE (default FALSE), base R axes are drawn by plot(). If FALSE, custom axes with ticks at 0–1 in steps of 0.1 are drawn. |
Details
The plot always uses a square coordinate system with both axes ranging from 0 to 1. The diagonal y = 1 - x (equilibrium line) and the identity line y = x are drawn as references. ABC set boundaries (A|B and B|C) are visualized with stars and orthogonal boundary lines.
Value
Base R Plot
cABC_plotGG
Description
ggplot2 version of the ABC plot, matching the base R function cABC_plot.
Usage
cABC_plotGG(
CurveData,
CleanData,
Boundaries,
Set_counts,
x_vals,
y_vals,
LineWidth = 1.25,
ShowUniform = TRUE,
Plot_title = "ABC plot"
)
Arguments
CurveData |
Data about the ABC curve as returned by cABC_curve |
CleanData |
Clean original input data. |
Boundaries |
A list with numeric vectors A, B, and C, each of length 2, giving the x/y coordinates of the ABC boundaries. |
Set_counts |
A list with elements nA, nB, nC giving the number of observations in sets A, B, and C. |
x_vals |
Numeric vector of x coordinates of original data points. |
y_vals |
Numeric vector of y coordinates of original data points. |
LineWidth |
Numeric. Line width for the ABC curve. Default is 1.25. |
ShowUniform |
Logical. If TRUE (default), the uniform reference curve is drawn in addition to the identity and ABC curves. |
Plot_title |
Character string. Title of the plot. Default is "ABC plot". |
Details
The plot always uses a square coordinate system with both axes ranging from 0 to 1. The diagonal y = 1 - x (equilibrium line) and the identity line y = x are drawn as references. ABC set boundaries (A|B and B|C) are visualized with stars and orthogonal boundary lines. Individual data points are shown if there are fewer than 20 of them.
Value
ggplot2 object
Post-process ABC Classes to Resolve Boundary Duplicates
Description
After the initial class assignment in cABC_analysis, it is possible
for data points with the same value to be split across two or even all three
classes (A, B, C) because the geometric boundary cuts through a run of
identical values. This function detects such duplicates and consolidates all
occurrences of an ambiguous value into a single class using a deterministic
tie-breaking strategy.
Usage
cABC_postprocess_classes(Aind, Bind, Cind, Data, sorted_data, ABLimit, BCLimit)
Arguments
Aind |
Integer vector of indices currently assigned to Class A. |
Bind |
Integer vector of indices currently assigned to Class B. |
Cind |
Integer vector of indices currently assigned to Class C. |
Data |
Named numeric vector of the (unsorted) input data, as cleaned by
|
sorted_data |
Numeric vector; |
ABLimit |
Numeric scalar; the data value closest to the A/B boundary
threshold (as computed in |
BCLimit |
Numeric scalar; the data value closest to the B/C boundary
threshold (as computed in |
Details
Tie-breaking rules:
- The class that contains the most occurrences of the duplicate value wins outright.
- If all three classes are tied, the duplicate value is compared to both boundary limits. It is assigned to whichever boundary (ABLimit or BCLimit) it is closest to, then placed in the class above that boundary (i.e. closer to AB → A if dup_val >= ABLimit, else B; closer to BC → B if dup_val >= BCLimit, else C). If equidistant from both boundaries, it is assigned to B.
- If exactly two classes are tied, the pair determines the rule:
  - A vs B: compare to ABLimit; >= ABLimit → A, otherwise → B.
  - B vs C: compare to BCLimit; >= BCLimit → B, otherwise → C.
  - A vs C: always assign to A, since the value was already deemed important enough to appear in the top class.

A warning is issued whenever at least one duplicate boundary value is found, prompting the user to inspect the data and the ABC plot.
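The tie-breaking rules can be written as a small decision function. The sketch below is an illustration only: `resolve_duplicate_class` is a hypothetical helper, and it assumes that `counts` is a named vector, in the order A, B, C, giving how often the duplicate value occurs in each class.

```r
# Hypothetical sketch of the documented tie-breaking rules; not the package's code.
# counts: named vector c(A = , B = , C = ) of occurrences of dup_val per class.
resolve_duplicate_class <- function(counts, dup_val, ABLimit, BCLimit) {
  winners <- names(counts)[counts == max(counts)]
  # Rule 1: a single class holds the most occurrences
  if (length(winners) == 1) return(winners)
  # Rule 2: three-way tie, decided by the nearer boundary limit
  if (length(winners) == 3) {
    d_ab <- abs(dup_val - ABLimit)
    d_bc <- abs(dup_val - BCLimit)
    if (d_ab < d_bc) return(if (dup_val >= ABLimit) "A" else "B")
    if (d_bc < d_ab) return(if (dup_val >= BCLimit) "B" else "C")
    return("B")  # equidistant from both limits
  }
  # Rule 3: two-way tie, decided by the pair involved
  switch(paste(winners, collapse = ""),
    AB = if (dup_val >= ABLimit) "A" else "B",
    BC = if (dup_val >= BCLimit) "B" else "C",
    AC = "A"
  )
}

# A value of 9 occurring once in A and once in B, with ABLimit = 10
resolve_duplicate_class(c(A = 1, B = 1, C = 0), dup_val = 9, ABLimit = 10, BCLimit = 4)
```

Because 9 is below `ABLimit`, this call assigns the value to class B.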
Value
A named list with three elements:
- Aind
Sorted integer vector of indices for Class A after deduplication.
- Bind
Sorted integer vector of indices for Class B after deduplication.
- Cind
Sorted integer vector of indices for Class C after deduplication.