Preprocessing API

Spectral preprocessing functions for baseline correction, normalization, and noise reduction.

The foodspec.preprocess module provides tools for cleaning and normalizing spectral data before analysis.

Baseline Correction

baseline_als

Asymmetric Least Squares baseline correction.

Asymmetric least squares (ALS) baseline estimation.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| y | ndarray | 1D signal. | required |
| lam | float | Smoothness penalty lambda. | 100000.0 |
| p | float | Asymmetry parameter (0-1). | 0.01 |
| niter | int | Iterations. | 10 |

Returns:

| Type | Description |
| --- | --- |
| ndarray | Estimated baseline, same shape as y. |

Examples:

>>> import numpy as np
>>> y = np.sin(np.linspace(0, 1, 10)) + 0.5
>>> base = baseline_als(y)
>>> base.shape
(10,)
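
As an illustration of what the lam, p, and niter parameters control, here is a minimal dense-matrix sketch of one common ALS formulation. It is not necessarily how baseline_als is implemented; the helper name als_baseline_sketch is hypothetical, and the dense solve is only practical for short signals.

```python
import numpy as np


def als_baseline_sketch(y, lam=1e5, p=0.01, niter=10):
    """Dense ALS baseline estimate (hypothetical helper, short signals only)."""
    y = np.asarray(y, dtype=float)
    n = y.size
    # Second-order difference operator D, shape (n - 2, n).
    D = np.diff(np.eye(n), n=2, axis=0)
    penalty = lam * (D.T @ D)
    w = np.ones(n)
    z = y.copy()
    for _ in range(niter):
        # Solve the weighted, smoothness-penalized least-squares system.
        z = np.linalg.solve(np.diag(w) + penalty, w * y)
        # Points above the current baseline get weight p, the rest 1 - p.
        w = np.where(y > z, p, 1.0 - p)
    return z


x = np.linspace(0, 1, 50)
y = 0.5 + 0.2 * x          # slowly varying background
y[20:25] += 1.0            # a peak sitting on the background
base = als_baseline_sketch(y, lam=1e3)
```

A larger lam gives a stiffer baseline, while a smaller p makes the fit ignore points above the baseline more strongly, so peaks pull it up less.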

baseline_polynomial

Polynomial fitting baseline correction.

Polynomial baseline fit.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| x | ndarray | Wavenumbers. | required |
| y | ndarray | Signal. | required |
| order | int | Polynomial order. | 3 |

Returns:

| Type | Description |
| --- | --- |
| ndarray | Baseline values. |
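
The idea behind a polynomial baseline can be sketched with plain NumPy: fit a low-order polynomial to the signal and evaluate it on the same axis. The helper polynomial_baseline_sketch and the synthetic data are hypothetical; the library function may differ in details such as iterative refitting or point selection.

```python
import numpy as np


def polynomial_baseline_sketch(x, y, order=3):
    """Least-squares polynomial of the given order, evaluated at x."""
    coeffs = np.polyfit(x, y, deg=order)
    return np.polyval(coeffs, x)


x = np.linspace(400.0, 1800.0, 200)              # wavenumber axis
background = 1e-6 * (x - 400.0) ** 2 + 0.1       # smooth quadratic background
peak = np.exp(-0.5 * ((x - 1000.0) / 10.0) ** 2)
y = background + peak

# Subtract the fitted baseline to obtain the corrected signal.
corrected = y - polynomial_baseline_sketch(x, y, order=3)
```

Because the fit uses every point, tall peaks bias a plain least-squares baseline upward; that is a limitation of this sketch, not a statement about baseline_polynomial.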

baseline_rubberband

Rubberband baseline correction.

Rubberband baseline via convex hull interpolation.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| x | ndarray | Wavenumber axis (1D, increasing). | required |
| y | ndarray | Signal values. | required |

Returns:

| Type | Description |
| --- | --- |
| ndarray | Baseline values along x. |
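
To make "convex hull interpolation" concrete, the following sketch builds the lower convex hull of the points with a monotone-chain scan and interpolates linearly between hull vertices. The helper rubberband_baseline_sketch is hypothetical and assumes x is strictly increasing; baseline_rubberband itself may be implemented differently.

```python
import numpy as np


def rubberband_baseline_sketch(x, y):
    """Interpolate the lower convex hull of (x, y); x must be increasing."""
    hull = []  # indices of lower-hull vertices, left to right
    for i in range(len(x)):
        # Drop the last vertex while it lies on or above the chord from
        # hull[-2] to the new point i (cross product <= 0: not a left turn).
        while len(hull) >= 2:
            a, b = hull[-2], hull[-1]
            cross = (x[b] - x[a]) * (y[i] - y[a]) - (y[b] - y[a]) * (x[i] - x[a])
            if cross <= 0:
                hull.pop()
            else:
                break
        hull.append(i)
    return np.interp(x, x[hull], y[hull])


x = np.linspace(0.0, 1.0, 101)
# Convex background plus a narrow peak in the middle.
y = 0.2 + 0.5 * (x - 0.5) ** 2 + np.exp(-0.5 * ((x - 0.5) / 0.03) ** 2)
base = rubberband_baseline_sketch(x, y)
```

The resulting baseline touches the signal from below and never crosses it, which is why the method works best on spectra whose background is convex.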

Normalization

VectorNormalizer

Vector normalization (L1, L2, or max norm; L2 by default).

Bases: BaseEstimator, TransformerMixin

Vector normalization across the spectral axis.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| norm | Literal['l1', 'l2', 'max'] | Normalization type. One of "l1", "l2", or "max". | 'l2' |

Examples:

>>> from foodspec.preprocess import VectorNormalizer
>>> import numpy as np
>>> X = np.array([[1, 2, 3], [4, 5, 6]], dtype=float)
>>> normalizer = VectorNormalizer(norm="l2")
>>> X_norm = normalizer.fit_transform(X)
>>> np.allclose(np.linalg.norm(X_norm, axis=1), 1.0)
True
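
For reference, the three norm options can be written out directly in NumPy. This sketch assumes the usual definitions (sum of absolute values, Euclidean length, maximum absolute value) applied per spectrum; the exact convention used by VectorNormalizer for "max" is an assumption here.

```python
import numpy as np

X = np.array([[1, 2, 3], [4, 5, 6]], dtype=float)

# Per-spectrum (row-wise) scale factor for each norm option.
scales = {
    "l1": np.abs(X).sum(axis=1, keepdims=True),
    "l2": np.sqrt((X ** 2).sum(axis=1, keepdims=True)),
    "max": np.abs(X).max(axis=1, keepdims=True),
}
normalized = {name: X / scale for name, scale in scales.items()}
```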

SNVNormalizer

Standard Normal Variate normalization.

Bases: BaseEstimator, TransformerMixin

Standard Normal Variate (SNV) normalization.

Centers each spectrum to zero mean and scales it to unit variance. Useful for reducing multiplicative scatter and additive baseline effects in NIR/Raman spectra.

Examples:

>>> from foodspec.preprocess import SNVNormalizer
>>> import numpy as np
>>> X = np.array([[1, 2, 3], [4, 5, 6]], dtype=float)
>>> normalizer = SNVNormalizer()
>>> X_norm = normalizer.fit_transform(X)
>>> np.allclose(X_norm.mean(axis=1), 0.0, atol=1e-12)
True
>>> np.allclose(X_norm.std(axis=1), 1.0)
True
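
The transformation itself is a per-spectrum standardization. A minimal NumPy sketch, assuming the population standard deviation (ddof=0), which is what the std check in the example above uses:

```python
import numpy as np

X = np.array([[1, 2, 3], [4, 5, 6]], dtype=float)

mean = X.mean(axis=1, keepdims=True)
std = X.std(axis=1, keepdims=True)   # population std (ddof=0)
X_snv = (X - mean) / std
```

The two rows differ only by a constant offset of 3, so they map to identical SNV values; this is the additive-baseline removal the description refers to.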

AreaNormalizer

Area-under-curve normalization.

Bases: BaseEstimator, TransformerMixin

Normalize spectra to unit area under the curve.

Uses trapezoidal integration to compute area and scales each spectrum so its integral equals 1.

Examples:

>>> from foodspec.preprocess import AreaNormalizer
>>> import numpy as np
>>> X = np.array([[1, 2, 3], [4, 5, 6]], dtype=float)
>>> normalizer = AreaNormalizer()
>>> X_norm = normalizer.fit_transform(X)
>>> np.allclose(np.trapezoid(X_norm[0]), 1.0, atol=0.1)
True
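
The scaling can be sketched by hand with the trapezoidal rule. This version assumes unit spacing between points (no wavenumber axis), which is an assumption of the sketch and not a statement about how AreaNormalizer treats the axis.

```python
import numpy as np

X = np.array([[1, 2, 3], [4, 5, 6]], dtype=float)

# Trapezoidal rule with unit spacing: average adjacent points, then sum.
area = ((X[:, 1:] + X[:, :-1]) / 2.0).sum(axis=1, keepdims=True)
X_area = X / area
```

For the first spectrum the area is 1.5 + 2.5 = 4, so the normalized values are 0.25, 0.5, and 0.75.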

Smoothing

SavitzkyGolaySmoother

Savitzky-Golay filter for smoothing and derivatives.

Bases: BaseEstimator, TransformerMixin

Savitzky-Golay smoothing filter for spectra.

Fits local polynomial models to smooth spectra while preserving peak shapes.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| window_length | int | Window size (must be odd and positive). | 7 |
| polyorder | int | Polynomial order (must be less than window_length). | 3 |

Raises:

| Type | Description |
| --- | --- |
| ValueError | If window_length is even, non-positive, less than or equal to polyorder, or greater than the number of points. |

Examples:

>>> from foodspec.preprocess import SavitzkyGolaySmoother
>>> import numpy as np
>>> X = np.random.randn(10, 100)
>>> smoother = SavitzkyGolaySmoother(window_length=7, polyorder=3)
>>> X_smooth = smoother.fit_transform(X)
>>> X_smooth.shape == X.shape
True
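
The "local polynomial model" can be made explicit: for a fixed window, the least-squares fit reduces to a set of convolution weights. The sketch below derives those weights for the default window_length=7 and polyorder=3 and applies them to interior points only; edge handling is deliberately left out and is where library implementations differ.

```python
import numpy as np

window_length, polyorder = 7, 3
half = window_length // 2
offsets = np.arange(-half, half + 1)

# Vandermonde matrix: column k holds offsets**k for k = 0..polyorder.
A = np.vander(offsets, polyorder + 1, increasing=True)
# Row 0 of the pseudoinverse gives the weights that evaluate the fitted
# polynomial at the window center.
weights = np.linalg.pinv(A)[0]

signal = np.arange(20, dtype=float) ** 2
# np.convolve flips its kernel, so reverse the weights to correlate.
smoothed = np.convolve(signal, weights[::-1], mode="valid")
```

The weights come out as (-2, 3, 6, 7, 6, 3, -2) / 21, and a quadratic signal passes through unchanged because a cubic fit reproduces it exactly.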

MovingAverageSmoother

Simple moving average smoothing.

Bases: BaseEstimator, TransformerMixin

Simple moving average smoothing filter.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| window_size | int | Number of adjacent points to average. | 5 |

Raises:

| Type | Description |
| --- | --- |
| ValueError | If window_size is non-positive or exceeds the spectrum length. |

Examples:

>>> from foodspec.preprocess import MovingAverageSmoother
>>> import numpy as np
>>> X = np.random.randn(5, 50)
>>> smoother = MovingAverageSmoother(window_size=5)
>>> X_smooth = smoother.fit_transform(X)
>>> X_smooth.shape == X.shape
True
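
A moving average is a convolution with a flat kernel. The sketch below uses np.convolve with mode="same", which pads with implicit zeros at the edges; how MovingAverageSmoother treats the edges is not stated here, so that part is an assumption.

```python
import numpy as np

window_size = 5
kernel = np.ones(window_size) / window_size

# Row 0 is a linear ramp, row 1 is constant.
X = np.vstack([np.arange(50, dtype=float), np.ones(50)])

# mode="same" keeps the spectrum length; points near the edges are
# averaged against implicit zeros.
X_smooth = np.array([np.convolve(row, kernel, mode="same") for row in X])
```

Away from the edges, both the ramp and the constant row are unchanged, since a symmetric average preserves linear trends.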

Noise & Artifact Removal

correct_cosmic_rays

Remove cosmic ray spikes from Raman spectra.

Detect and correct cosmic ray spikes in spectra.

Uses robust z-score computed from local median and MAD (median absolute deviation) to detect spikes. Corrects by replacing spike values with local median.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| X | ndarray | Spectral data array (n_samples, n_wavenumbers). | required |
| window | int | Rolling window size for spike detection. | 5 |
| zscore_thresh | float | Robust z-score threshold for spike detection. | 8.0 |

Returns:

| Type | Description |
| --- | --- |
| (ndarray, CosmicRayReport) | A tuple (X_corrected, report), where report is a CosmicRayReport containing the total spike count and per-spectrum spike counts. |

Examples:

>>> import numpy as np
>>> from foodspec.preprocess.spikes import correct_cosmic_rays
>>> X = np.ones((2, 50))
>>> X[0, 25] = 100  # simulate spike
>>> X_corr, report = correct_cosmic_rays(X, window=5, zscore_thresh=5.0)
>>> report.total_spikes > 0
True
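
The detection rule described above (robust z-score from a local median and MAD, then replacement by the local median) can be sketched for a single spectrum as follows. The helper robust_spike_mask, the 1.4826 scaling, and the small constant guarding against a zero MAD are assumptions of this sketch, not details taken from correct_cosmic_rays; an odd window is assumed.

```python
import numpy as np


def robust_spike_mask(y, window=5, zscore_thresh=8.0):
    """Flag points whose robust z-score exceeds the threshold (odd window)."""
    half = window // 2
    padded = np.pad(y, half, mode="edge")
    # Each row of `windows` is the neighbourhood centred on one point.
    windows = np.lib.stride_tricks.sliding_window_view(padded, window)
    med = np.median(windows, axis=1)
    mad = np.median(np.abs(windows - med[:, None]), axis=1)
    # 1.4826 rescales MAD to a standard deviation for Gaussian noise;
    # the small constant avoids division by zero on flat regions.
    z = np.abs(y - med) / (1.4826 * mad + 1e-12)
    return z > zscore_thresh, med


y = np.ones(50)
y[25] = 100.0                          # simulated spike
mask, med = robust_spike_mask(y, window=5, zscore_thresh=5.0)
y_corr = np.where(mask, med, y)        # replace spikes with local median
```

Because the median and MAD are barely affected by a single outlier, the neighbours of the spike are not flagged even though the spike sits inside their windows.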

CosmicRayRemover

Cosmic ray removal transformer for Raman spectroscopy.

Bases: BaseEstimator, TransformerMixin

Basic cosmic ray spike removal for Raman spectra.

Detects spikes as points exceeding the local median by sigma_thresh times the local MAD (median absolute deviation) and replaces them by linear interpolation of neighboring points.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| window | int | Window size for local statistics. | 5 |
| sigma_thresh | float | Z-score threshold for spike detection. | 8.0 |

Examples:

>>> from foodspec.preprocess.raman import CosmicRayRemover
>>> import numpy as np
>>> X = np.ones((3, 50))
>>> X[0, 25] = 100  # spike
>>> remover = CosmicRayRemover(window=5, sigma_thresh=5.0)
>>> X_clean = remover.fit_transform(X)
>>> X_clean[0, 25] < 10
True
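
The replacement step, linear interpolation across flagged points, can be sketched on its own. Here the spike mask is set by hand to keep the example short; in practice it would come from a median/MAD test like the one described above.

```python
import numpy as np

y = np.linspace(0.0, 1.0, 50)       # clean linear signal
y[25] += 100.0                      # simulated spike

spike = np.zeros(y.size, dtype=bool)
spike[25] = True                    # mask of detected spikes (hand-set here)

idx = np.arange(y.size)
y_clean = y.copy()
# Fill flagged points by linear interpolation between unflagged neighbours.
y_clean[spike] = np.interp(idx[spike], idx[~spike], y[~spike])
```

Unlike median replacement, interpolation follows a sloping background, so the repaired point lands back on the underlying line.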

Spectral Cropping

RangeCropper

Crop spectra to wavenumber ranges.

Bases: BaseEstimator

Crop spectra to a specified wavenumber range.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| min_wn | float | Minimum wavenumber (inclusive). | required |
| max_wn | float | Maximum wavenumber (inclusive). | required |

Raises:

| Type | Description |
| --- | --- |
| ValueError | If min_wn >= max_wn or no points fall in the range. |

Examples:

>>> from foodspec.preprocess import RangeCropper
>>> import numpy as np
>>> X = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
>>> wn = np.array([1000, 1100, 1200, 1300])
>>> cropper = RangeCropper(min_wn=1050, max_wn=1250)
>>> X_crop, wn_crop = cropper.transform(X, wn)
>>> wn_crop.tolist()
[1100, 1200]
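
Cropping with inclusive bounds amounts to a boolean mask over the wavenumber axis. A minimal NumPy sketch of that logic, mirroring the example above (not the RangeCropper source):

```python
import numpy as np

X = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
wn = np.array([1000, 1100, 1200, 1300])
min_wn, max_wn = 1050, 1250

# Inclusive on both ends, matching the parameter descriptions.
mask = (wn >= min_wn) & (wn <= max_wn)
if not mask.any():
    raise ValueError("no points fall in the requested range")

# Apply the same mask to the spectra columns and to the axis.
X_crop, wn_crop = X[:, mask], wn[mask]
```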

See Also