Spectral pre-processing recipes

Leonardo Ramirez-Lopez and Claudio Orellano

2026-06-25

1 Overview

Spectral preprocessing is a critical step in near-infrared (NIR) calibration workflows. Raw spectral data often contains systematic variations, noise, and artifacts that can obscure the true relationship between spectra and reference properties. The proximetricsR package provides a flexible, composable system for building and applying preprocessing pipelines.

1.1 Key concepts

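The system rests on three pieces: prep_*() constructors, which specify individual preprocessing steps; preprocess_recipe(), which combines steps into an ordered pipeline; and process(), which executes a recipe on spectral data. This separation of specification and execution enables reproducible, device-aware preprocessing pipelines that can be stored, shared, and applied consistently.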

2 Setup

library("proximetricsR")
data("NIRcannabis")
X <- NIRcannabis$spc

3 Preprocessing constructors

The prep_*() functions create preprocessing step objects. Each constructor validates its parameters and encodes algorithm-specific information. The order in which constructors are passed to preprocess_recipe() defines the execution order.

3.1 Resampling: prep_resample()

Resampling interpolates spectra to a new wavelength or wavenumber grid.

ProxiMate mode (user-defined grid):

prep_resample(grid = c(1001, 1700, 2))
- prep_resample 
    min_wav: 1001; max_wav: 1700; resolution: 2

ProxiScout mode (NeoSpectra standard grid):

prep_resample(grid = "proxiscout")
- prep_resample 

Resampling is often the first step to standardize wavelength grids across different instruments.
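
As a rough illustration of what resampling does, the base R sketch below interpolates each spectrum onto a regular grid built from a (min, max, resolution) triple. The helper resample_linear() is hypothetical and the use of linear interpolation is an assumption made for this example; the interpolation method used by prep_resample() is not stated here.

```r
# Hypothetical helper (not part of proximetricsR): linear interpolation of
# each row of a spectral matrix onto a regular wavelength grid.
resample_linear <- function(X, wav, grid) {
  new_wav <- seq(grid[1], grid[2], by = grid[3])
  out <- t(apply(X, 1, function(x) approx(wav, x, xout = new_wav)$y))
  colnames(out) <- new_wav
  out
}

# Toy data: two linear "spectra" recorded on an irregular wavelength axis
wav <- c(1000, 1003, 1007, 1012)
X_toy <- rbind(wav / 1000, 2 * wav / 1000)

X_res <- resample_linear(X_toy, wav, grid = c(1001, 1011, 2))
dim(X_res)       # 2 spectra on a 6-point grid
colnames(X_res)
```

Because the toy spectra are linear in wavelength, linear interpolation reproduces them exactly on the new grid, which makes the result easy to check by hand.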

3.2 Smoothing: prep_smooth()

Smoothing reduces high-frequency noise while preserving spectral features.

Savitzky-Golay (ProxiScout compatible):

prep_smooth(w = 11, p = 3, algorithm = "savitzky-golay")
- prep_smooth 
    w: 11; p: 3; algorithm: 'savitzky-golay'

Moving average (ProxiMate compatible):

prep_smooth(w = 7, algorithm = "moving-average")
- prep_smooth 
    w: 7; algorithm: 'moving-average'
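
For intuition, a centered moving average can be written in a few lines of base R. This is a minimal sketch, not the package's implementation; in particular, how prep_smooth() treats the edge points is not described here, so the sketch simply leaves them as NA.

```r
# Centered moving average with an odd window w.
# Points without a full window on both sides are returned as NA.
moving_average <- function(x, w) {
  stopifnot(w %% 2 == 1)
  as.numeric(stats::filter(x, rep(1 / w, w), sides = 2))
}

x <- c(1, 2, 9, 4, 5, 6, 7)
moving_average(x, w = 3)
# NA  4  5  6  5  6 NA
```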

3.3 Standard Normal Variate: prep_snv()

SNV (Standard Normal Variate) normalizes each spectrum independently by centering and scaling:

\[SNV_i = \frac{x_i - \bar{x}_i}{s_i}\]

where \(\bar{x}_i\) and \(s_i\) are the mean and standard deviation of the \(i\)-th spectrum.

prep_snv()
- prep_snv 

SNV corrects for additive and multiplicative scatter effects (e.g., baseline offsets and path length variations) and is device-agnostic.
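
The formula above can be sketched directly in base R. The function name snv() is chosen for this example only and is not claimed to be the package's internal code.

```r
# Row-wise SNV: subtract each spectrum's mean, divide by its standard deviation.
# X - rowMeans(X) recycles the row means down the columns, so element [i, j]
# is centered with the mean of row i.
snv <- function(X) {
  (X - rowMeans(X)) / apply(X, 1, sd)
}

# The second toy spectrum is the first one scaled by 10 and shifted by 5
X_toy <- rbind(c(1, 2, 3), 10 * c(1, 2, 3) + 5)
X_snv <- snv(X_toy)
X_snv
```

Both rows come out as -1, 0, 1: the offset and the scaling factor that separated the two toy spectra are removed.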

3.4 Derivatives: prep_derivative()

Derivatives enhance spectral differences and reduce baseline effects.

Savitzky-Golay (ProxiScout):

prep_derivative(m = 1, w = 11, p = 3, algorithm = "savitzky-golay")
- prep_derivative 
    m: 1; w: 11; p: 3; algorithm: 'savitzky-golay'

Gap-Segment (ProxiScout):

prep_derivative(m = 2, w = 9, p = 3, algorithm = "gap-segment")
- prep_derivative 
    m: 2; w: 9; p: 3; algorithm: 'gap-segment'

NIRWise PLUS compatible (ProxiMate):

prep_derivative(m = 1, w = 5, p = 11, algorithm = "nwp")
- prep_derivative 
    m: 1; w: 5; p: 11; algorithm: 'nwp'

Parameters:

- m: derivative order (1 or 2)
- w: window/gap size (positive odd integer)
- p: polynomial order (Savitzky-Golay) or smoothing window (gap-segment, nwp)
- algorithm: derivative method, one of "savitzky-golay", "gap-segment", or "nwp"
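
To make the roles of m, w, and p concrete for the Savitzky-Golay case, the sketch below builds the filter coefficients with the textbook least-squares construction and applies them to one spectrum. It assumes unit spacing between points, and the helper names sg_coef() and sg_apply() are hypothetical; the package may scale or handle edges differently.

```r
# Savitzky-Golay coefficients for the m-th derivative,
# window size w and polynomial order p (unit point spacing assumed).
sg_coef <- function(w, p, m) {
  stopifnot(w %% 2 == 1, p < w, m <= p)
  z <- seq(-(w - 1) / 2, (w - 1) / 2)
  V <- outer(z, 0:p, `^`)          # w x (p + 1) polynomial design matrix
  H <- solve(crossprod(V), t(V))   # least-squares solution operator
  factorial(m) * H[m + 1, ]        # row m + 1 gives the m-th derivative
}

# Apply the coefficients to every full window of a single spectrum
sg_apply <- function(x, coef) {
  w <- length(coef)
  vapply(seq_len(length(x) - w + 1),
         function(i) sum(coef * x[i:(i + w - 1)]),
         numeric(1))
}

x <- (1:10)^2
d1 <- sg_apply(x, sg_coef(w = 5, p = 2, m = 1))
d1   # first derivative of x^2 at positions 3 to 8, i.e. 2 * (3:8)
```

The output has w - 1 fewer points than the input because only full windows are evaluated.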

3.5 Detrending: prep_detrend()

Detrending removes wavelength-dependent baseline effects by fitting and removing a polynomial trend (ProxiScout only):

prep_detrend(p = 2)
- prep_detrend 
    p: 2

For the full Barnes et al. (1989) procedure (SNV + detrending), chain prep_snv() before prep_detrend().
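
The SNV plus detrend combination can be sketched in base R as follows. This is an illustrative version under stated assumptions (a polynomial of order p fitted by ordinary least squares along the wavelength axis); the helper snv_detrend() is hypothetical and not the package's code.

```r
# SNV followed by removal of a fitted polynomial trend of order p
snv_detrend <- function(X, wav, p = 2) {
  X_snv <- (X - rowMeans(X)) / apply(X, 1, sd)
  basis <- poly(wav, degree = p)   # orthogonal polynomial basis
  t(apply(X_snv, 1, function(x) residuals(lm(x ~ basis))))
}

wav <- seq(1000, 1100, by = 10)
X_toy <- rbind(
  0.2 + 1e-3 * wav + 1e-6 * wav^2,                        # pure quadratic baseline
  0.2 + 1e-3 * wav + 0.05 * exp(-((wav - 1050) / 10)^2)   # baseline plus a peak
)
X_dt <- snv_detrend(X_toy, wav, p = 2)
round(X_dt, 3)
```

The first toy spectrum is nothing but a quadratic baseline, so it is removed almost entirely; the second keeps its peak while the sloping baseline is taken out.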

3.6 Reflectance/Absorbance conversion: prep_transform()

Convert between reflectance and apparent absorbance using the standard log transform, which puts spectra on the scale where the Beer-Lambert law applies (ProxiScout only):

\[A = -\log_{10}(R)\]

prep_transform(to = "absorbance")
- prep_transform 
    to: 'absorbance'
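
The transform and its inverse are one-liners in base R. The function names below are chosen for this example only.

```r
# A = -log10(R) and its inverse R = 10^(-A)
reflectance_to_absorbance <- function(R) -log10(R)
absorbance_to_reflectance <- function(A) 10^(-A)

R <- c(1, 0.5, 0.1)
A <- reflectance_to_absorbance(R)
A                              # 100% reflectance maps to 0, 10% maps to 1
absorbance_to_reflectance(A)   # round trip recovers R
```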

3.7 Wavelength trimming: prep_wav_trim()

Retain only a specified wavelength band and/or remove constant-valued edge columns (ProxiScout only):

prep_wav_trim(band = c(1000, 1800), trim_constant_edges = TRUE)
- prep_wav_trim 
    band: 1000, 1800; trim_constant_edges: TRUE
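
A base R sketch of both operations is given below. Two assumptions are made for illustration: band limits are treated as inclusive, and a "constant-valued" column is taken to mean a column in which every sample has the same value. The helpers trim_band() and trim_constant() are hypothetical, and the package's exact rules may differ.

```r
# Keep columns whose wavelength lies inside the band (inclusive bounds assumed)
trim_band <- function(X, band) {
  wav <- as.numeric(colnames(X))
  X[, wav >= band[1] & wav <= band[2], drop = FALSE]
}

# Drop leading and trailing columns in which all samples share one value;
# constant columns in the interior are kept.
trim_constant <- function(X) {
  keep <- which(apply(X, 2, function(col) length(unique(col)) > 1))
  if (length(keep) == 0) return(X[, 0, drop = FALSE])
  X[, seq(min(keep), max(keep)), drop = FALSE]
}

X_toy <- matrix(c(0, 0,  1, 2,  4, 4,  3, 5,  0, 0), nrow = 2,
                dimnames = list(NULL, c("1000", "1100", "1200", "1300", "1400")))
colnames(trim_band(X_toy, c(1100, 1400)))
colnames(trim_constant(X_toy))
```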

4 Building preprocessing recipes

The preprocess_recipe() function combines constructors into an ordered pipeline. Order matters: preprocessing steps are applied in the order specified.

4.1 Device compatibility

Different BUCHI devices support different preprocessing steps:

ProxiMate supports:

- prep_resample() with user-defined grids
- prep_smooth() with the moving-average algorithm
- prep_snv()
- prep_derivative() with the nwp algorithm

ProxiScout supports:

- prep_resample() with the "proxiscout" grid
- prep_smooth() with the savitzky-golay algorithm
- prep_snv()
- prep_derivative() with the savitzky-golay or gap-segment algorithms
- prep_detrend()
- prep_transform()
- prep_wav_trim()

4.2 Building recipes

Single preprocessing step (SNV only):

SNV is device-agnostic, so device is optional:

recipe_snv <- preprocess_recipe(prep_snv())
recipe_snv
Spectral preprocessing recipe (device: "unspecified"): 
 - Step 1: prep_snv

Multiple steps (requires device):

recipe_ps <- preprocess_recipe(
  prep_smooth(w = 7, p = 1, algorithm = "savitzky-golay"),
  prep_snv(),
  prep_derivative(m = 1, w = 5, p = 2, algorithm = "savitzky-golay"),
  device = "proxiscout"
)
recipe_ps
Spectral preprocessing recipe (device: "proxiscout"): 
 - Step 1: prep_smooth
    w: 7; p: 1; algorithm: 'savitzky-golay'
 - Step 2: prep_snv
 - Step 3: prep_derivative
    m: 1; w: 5; p: 2; algorithm: 'savitzky-golay'

ProxiMate-specific recipe:

recipe_pm <- preprocess_recipe(
  prep_smooth(w = 7, algorithm = "moving-average"),
  prep_snv(),
  prep_derivative(m = 1, w = 5, p = 11, algorithm = "nwp"),
  device = "proximate"
)
recipe_pm
Spectral preprocessing recipe (device: "proximate"): 
 - Step 1: prep_smooth
    w: 7; algorithm: 'moving-average'
 - Step 2: prep_snv
 - Step 3: prep_derivative
    m: 1; w: 5; p: 11; algorithm: 'nwp'

Recipes validate that all steps are compatible with the specified device and raise informative errors if not.
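
To show the kind of check this implies, the sketch below encodes the device compatibility lists from this document as a lookup table and tests a step against it. It is a hypothetical illustration in base R; the names supported and check_step() are invented for this example and say nothing about how the package performs its validation internally.

```r
# Support table transcribed from the device compatibility lists.
# NA means the step has no algorithm or grid choice to check.
supported <- list(
  proximate = list(
    prep_resample = "user-defined", prep_smooth = "moving-average",
    prep_snv = NA, prep_derivative = "nwp"
  ),
  proxiscout = list(
    prep_resample = "proxiscout", prep_smooth = "savitzky-golay",
    prep_snv = NA, prep_derivative = c("savitzky-golay", "gap-segment"),
    prep_detrend = NA, prep_transform = NA, prep_wav_trim = NA
  )
)

# Hypothetical checker: returns TRUE or stops with a descriptive message
check_step <- function(step, variant = NA, device) {
  allowed <- supported[[device]][[step]]
  if (is.null(allowed)) stop(step, " is not supported on ", device)
  if (!all(is.na(allowed)) && !variant %in% allowed)
    stop(step, " with '", variant, "' is not supported on ", device)
  TRUE
}

check_step("prep_derivative", "nwp", device = "proximate")
res <- try(check_step("prep_detrend", device = "proximate"), silent = TRUE)
inherits(res, "try-error")
```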

5 Applying recipes with process()

The process() function executes a recipe on spectral data:

X_snv <- process(X, recipe_snv)
dim(X_snv)
[1]  80 234
X_ps <- process(X, recipe_ps)
dim(X_ps)
[1]  80 224
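
The drop from 234 to 224 columns is consistent with each windowed step discarding the (w - 1) / 2 points at either edge where a full window does not fit. This reading is inferred from the dimensions shown rather than from a documented rule, and the helper cols_after() below is purely illustrative.

```r
# Columns left after a sequence of windowed steps, assuming each step
# drops (w - 1) / 2 points on both edges, i.e. w - 1 columns in total.
cols_after <- function(n_cols, windows) n_cols - sum(windows - 1)

# Smoothing with w = 7 followed by a derivative with w = 5
cols_after(234, windows = c(7, 5))
# 224
```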

The applied recipe is stored as an attribute and can be retrieved:

applied_recipe <- attr(X_ps, "preprocess_recipe")
applied_recipe
Spectral preprocessing recipe (device: "proxiscout"): 
 - Step 1: prep_smooth
    w: 7; p: 1; algorithm: 'savitzky-golay'
 - Step 2: prep_snv
 - Step 3: prep_derivative
    m: 1; w: 5; p: 2; algorithm: 'savitzky-golay'
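
One caveat applies when relying on attributes: for plain base R matrices, subsetting with `[` drops custom attributes. The toy example below demonstrates this with a stand-in string in place of a real recipe; if process() returns an object with its own subsetting method, the behavior may differ.

```r
X_demo <- matrix(1:6, nrow = 2)
attr(X_demo, "preprocess_recipe") <- "stand-in for a recipe object"

# Attribute is present on the full matrix
has_full <- !is.null(attr(X_demo, "preprocess_recipe"))

# Subsetting a plain matrix keeps dim and dimnames only
X_sub <- X_demo[1, , drop = FALSE]
has_sub <- !is.null(attr(X_sub, "preprocess_recipe"))

c(full = has_full, subset = has_sub)
```

Retrieving the recipe before subsetting, and keeping it in its own variable, avoids losing it.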

6 Practical examples

6.1 Example 1: ProxiMate workflow

A typical ProxiMate workflow for fat/protein prediction:

recipe_pm_fat <- preprocess_recipe(
  prep_smooth(w = 7, algorithm = "moving-average"),
  prep_snv(),
  prep_derivative(m = 1, w = 5, p = 11, algorithm = "nwp"),
  device = "proximate"
)

X_fat_prep <- process(X, recipe_pm_fat)
head(X_fat_prep[, 1:5])
        1025    1028    1031    1034    1037
[1,] -0.0137 -0.0138 -0.0138 -0.0135 -0.0131
[2,] -0.0208 -0.0213 -0.0215 -0.0215 -0.0212
[3,] -0.0184 -0.0186 -0.0187 -0.0185 -0.0181
[4,] -0.0122 -0.0124 -0.0124 -0.0122 -0.0119
[5,] -0.0170 -0.0171 -0.0170 -0.0167 -0.0161
[6,] -0.0134 -0.0135 -0.0135 -0.0132 -0.0128

6.2 Example 2: ProxiScout workflow with detrending

ProxiScout recipes can include steps that ProxiMate does not support, such as resampling to the ProxiScout grid and detrending:

recipe_ps_full <- preprocess_recipe(
  prep_resample(grid = "proxiscout"),
  prep_smooth(w = 7, p = 1, algorithm = "savitzky-golay"),
  prep_snv(),
  prep_detrend(p = 2),
  prep_derivative(m = 1, w = 5, p = 2, algorithm = "savitzky-golay"),
  device = "proxiscout"
)

X_ps_full <- process(X, recipe_ps_full)
dim(X_ps_full)
[1]  80 102

6.3 Example 3: Minimal preprocessing

Sometimes less is more. A minimal recipe with only SNV:

recipe_minimal <- preprocess_recipe(prep_snv())
X_minimal <- process(X, recipe_minimal)

6.4 Example 4: Wavelength band selection

Select a specific wavelength range for ProxiScout:

recipe_band <- preprocess_recipe(
  prep_wav_trim(band = c(1100, 1600)),
  prep_smooth(w = 5, p = 1, algorithm = "savitzky-golay"),
  prep_snv(),
  device = "proxiscout"
)

X_band <- process(X, recipe_band)
colnames(X_band)[c(1, ncol(X_band))]
[1] "1106" "1592"

7 Best practices

7.1 Order matters

Preprocessing steps affect each other. Common orderings:

  1. Smoothing → SNV → Derivative (standard for noise reduction + normalization + enhancement)
  2. Resampling → Smoothing → Derivative → SNV (device-specific resampling first)
  3. SNV → Detrend (full de-trending procedure)
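
That order changes the result can be checked on a single toy spectrum in base R. The sketch uses diff() as a crude stand-in for a first derivative, which is a simplification made for this example.

```r
snv_vec <- function(x) (x - mean(x)) / sd(x)

x <- c(1, 2, 4, 7, 11, 16)

a <- diff(snv_vec(x))   # SNV, then derivative
b <- snv_vec(diff(x))   # derivative, then SNV

round(a, 3)
round(b, 3)
isTRUE(all.equal(a, b))   # FALSE: the two orderings disagree
```

In the first ordering the differences are only rescaled and stay positive; in the second they are re-centered around zero.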

7.2 Device-aware development

Always specify device = "proximate" or device = "proxiscout" when building recipes (except for SNV-only recipes). This lets the recipe be validated against the target device, so that every step it contains is one the device supports.

7.3 Reproducibility

Store recipes alongside calibration models to ensure preprocessing is applied identically during prediction. The process() function attaches the recipe as an attribute for downstream tracking.
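
One simple way to keep a recipe and a model together is to save them in a single list with base R serialization. The sketch below uses placeholder strings where the real recipe and model objects would go; the bundle layout is an assumption for illustration, not a format the package prescribes.

```r
# Placeholders stand in for a real recipe and a fitted calibration model
bundle <- list(
  recipe = "stand-in for a recipe object",
  model  = "stand-in for a fitted calibration model"
)

path <- tempfile(fileext = ".rds")
saveRDS(bundle, path)

# Later, at prediction time
restored <- readRDS(path)
identical(restored, bundle)
```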

7.4 Parameter tuning

Preprocessing parameters such as the window size w, the polynomial order p, and the derivative order m are typically tuned during model development.
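
A candidate grid for such tuning might be enumerated as follows. The candidate values are arbitrary, and the constraints shown are those of Savitzky-Golay fitting (polynomial order below the window size, derivative order no greater than the polynomial order); they do not apply to the gap-segment or nwp algorithms, where p denotes a smoothing window.

```r
# All combinations of candidate Savitzky-Golay settings
grid <- expand.grid(w = c(5, 7, 9, 11), p = 1:3, m = 1:2)

# Keep only combinations that are valid for a Savitzky-Golay derivative
grid <- subset(grid, p < w & m <= p)
nrow(grid)
head(grid)
```

Each remaining row could then be turned into a recipe and compared by cross-validated prediction error.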

8 Summary

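The preprocessing recipe system in proximetricsR provides a structured, reproducible approach to spectral preprocessing: prep_*() constructors specify individual steps, preprocess_recipe() combines them into an ordered pipeline validated against the target device, and process() applies the pipeline and attaches the recipe to the result.

This design enables seamless integration with calibration workflows and ensures preprocessing is applied consistently from model development through deployment on BUCHI devices.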