Preprocessing API¶
Spectral preprocessing functions for baseline correction, normalization, and noise reduction.
The foodspec.preprocess module provides tools for cleaning and normalizing spectral data before analysis.
Baseline Correction¶
baseline_als¶
Asymmetric least squares (ALS) baseline estimation.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `y` | `ndarray` | 1D signal. | required |
| `lam` | `float` | Smoothness penalty lambda. | `100000.0` |
| `p` | `float` | Asymmetry parameter (0-1). | `0.01` |
| `niter` | `int` | Iterations. | `10` |
Returns:
| Type | Description |
|---|---|
| `ndarray` | Estimated baseline, same shape as `y`. |
Examples:
>>> import numpy as np
>>> y = np.sin(np.linspace(0, 1, 10)) + 0.5
>>> base = baseline_als(y)
>>> base.shape
(10,)
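To make the roles of `lam`, `p`, and `niter` concrete, the following is a minimal dense-matrix sketch of the ALS idea. The function name `als_baseline_sketch` is hypothetical, and the code is an illustration under the stated assumptions, not the foodspec implementation (which may use sparse matrices or a different update rule).

```python
import numpy as np

def als_baseline_sketch(y, lam=1e5, p=0.01, niter=10):
    """Dense ALS baseline sketch (hypothetical helper, not foodspec's code)."""
    y = np.asarray(y, dtype=float)
    n = y.size
    # Second-order difference operator, shape (n - 2, n).
    D = np.diff(np.eye(n), n=2, axis=0)
    penalty = lam * (D.T @ D)
    w = np.ones(n)
    z = y.copy()
    for _ in range(niter):
        # Solve the weighted, smoothness-penalized least-squares system.
        z = np.linalg.solve(np.diag(w) + penalty, w * y)
        # Points above the current baseline (likely peaks) get the small weight p.
        w = np.where(y > z, p, 1.0 - p)
    return z

# Synthetic spectrum: linear background plus one narrow peak.
x = np.linspace(0.0, 1.0, 200)
background = 0.5 + 0.3 * x
peak = np.exp(-((x - 0.5) ** 2) / (2 * 0.01 ** 2))
baseline = als_baseline_sketch(background + peak)
```

A larger `lam` gives a stiffer baseline, and a smaller `p` pushes the baseline further below the peaks. The dense solve is only practical for short signals.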
baseline_polynomial¶
Polynomial baseline fit.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `x` | `ndarray` | Wavenumbers. | required |
| `y` | `ndarray` | Signal. | required |
| `order` | `int` | Polynomial order. | `3` |
Returns:
| Type | Description |
|---|---|
| `ndarray` | Baseline values. |
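As an illustration of what a polynomial baseline fit involves, here is a short sketch that assumes a plain least-squares fit over all points. The name `polynomial_baseline_sketch` and the axis rescaling are assumptions made for this example and may not match the library's internals.

```python
import numpy as np

def polynomial_baseline_sketch(x, y, order=3):
    """Least-squares polynomial baseline (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Rescale x to [-1, 1] so the fit stays well conditioned for large wavenumbers.
    x_scaled = 2.0 * (x - x.min()) / (x.max() - x.min()) - 1.0
    coeffs = np.polyfit(x_scaled, y, deg=order)
    return np.polyval(coeffs, x_scaled)

# A purely quadratic background is reproduced by a cubic fit.
x = np.linspace(400.0, 1800.0, 50)
y = 1e-6 * (x - 400.0) ** 2 + 0.2
baseline = polynomial_baseline_sketch(x, y, order=3)
corrected = y - baseline
```

Subtracting the returned baseline from `y` gives the corrected signal. On real spectra the peaks also pull the fit upward, which is why iterative or asymmetric variants are often preferred.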
baseline_rubberband¶
Rubberband baseline via convex hull interpolation.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `x` | `ndarray` | Wavenumber axis (1D, increasing). | required |
| `y` | `ndarray` | Signal values. | required |
Returns:
| Type | Description |
|---|---|
| `ndarray` | Baseline values along `x`. |
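The convex hull interpolation can be sketched as follows: build the lower convex hull of the points `(x, y)` and interpolate linearly between its vertices. The helper name `rubberband_baseline_sketch` is hypothetical, and the library's own hull construction may differ.

```python
import numpy as np

def rubberband_baseline_sketch(x, y):
    """Lower-convex-hull baseline (hypothetical helper); x must be increasing."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    hull = []  # indices of lower-hull vertices, left to right
    for i in range(x.size):
        # Drop the last vertex while it lies on or above the chord to point i.
        while len(hull) >= 2:
            a, b = hull[-2], hull[-1]
            cross = (x[b] - x[a]) * (y[i] - y[a]) - (y[b] - y[a]) * (x[i] - x[a])
            if cross <= 0:
                hull.pop()
            else:
                break
        hull.append(i)
    # Linear interpolation between hull vertices gives the baseline.
    return np.interp(x, x[hull], y[hull])

# Convex background with a single peak in the middle.
x = np.linspace(0.0, 1.0, 101)
y = 0.5 * (x - 0.5) ** 2 + np.exp(-((x - 0.5) ** 2) / (2 * 0.03 ** 2))
baseline = rubberband_baseline_sketch(x, y)
```

By construction the baseline never exceeds the signal, so `y - baseline` is non-negative.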
Normalization¶
VectorNormalizer¶
L2 vector normalization.
Bases: BaseEstimator, TransformerMixin
Vector normalization across the spectral axis.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `norm` | `Literal['l1', 'l2', 'max']` | Normalization type. One of "l1", "l2", or "max". | `'l2'` |
Examples:
>>> from foodspec.preprocess import VectorNormalizer
>>> import numpy as np
>>> X = np.array([[1, 2, 3], [4, 5, 6]], dtype=float)
>>> normalizer = VectorNormalizer(norm="l2")
>>> X_norm = normalizer.fit_transform(X)
>>> np.allclose(np.linalg.norm(X_norm, axis=1), 1.0)
True
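For reference, the three `norm` options can be sketched in plain NumPy. The function `vector_normalize_sketch` is hypothetical. Treating `"max"` as the maximum absolute value and leaving all-zero spectra unchanged are assumptions of this example, not documented behavior.

```python
import numpy as np

def vector_normalize_sketch(X, norm="l2"):
    """Row-wise vector normalization (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    if norm == "l1":
        scale = np.abs(X).sum(axis=1, keepdims=True)
    elif norm == "l2":
        scale = np.sqrt((X ** 2).sum(axis=1, keepdims=True))
    elif norm == "max":
        # Assumption: "max" means the largest absolute intensity per spectrum.
        scale = np.abs(X).max(axis=1, keepdims=True)
    else:
        raise ValueError("norm must be 'l1', 'l2', or 'max'")
    # Assumption: all-zero spectra are returned unchanged instead of dividing by zero.
    scale[scale == 0] = 1.0
    return X / scale

X = np.array([[1, 2, 3], [4, 5, 6]], dtype=float)
X_l1 = vector_normalize_sketch(X, norm="l1")
X_l2 = vector_normalize_sketch(X, norm="l2")
X_max = vector_normalize_sketch(X, norm="max")
```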
SNVNormalizer¶
Standard Normal Variate normalization.
Bases: BaseEstimator, TransformerMixin
Standard Normal Variate (SNV) normalization.
Centers each spectrum to zero mean and scales it to unit variance. Useful for reducing multiplicative scatter and additive baseline effects in NIR/Raman spectra.
Examples:
>>> from foodspec.preprocess import SNVNormalizer
>>> import numpy as np
>>> X = np.array([[1, 2, 3], [4, 5, 6]], dtype=float)
>>> normalizer = SNVNormalizer()
>>> X_norm = normalizer.fit_transform(X)
>>> np.allclose(X_norm.mean(axis=1), 0.0, atol=1e-12)
True
>>> np.allclose(X_norm.std(axis=1), 1.0)
True
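The transformation itself is short enough to sketch directly. The helper `snv_sketch` below is hypothetical. It uses the population standard deviation (`ddof=0`), which is consistent with the doctest above but is otherwise an assumption about the implementation.

```python
import numpy as np

def snv_sketch(X):
    """Per-spectrum centering and scaling (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=1, keepdims=True)
    std = X.std(axis=1, keepdims=True)  # population std, ddof=0
    return (X - mean) / std

X = np.array([[1, 2, 3], [4, 5, 6]], dtype=float)
X_snv = snv_sketch(X)
# A multiplicative gain and an additive offset leave the SNV result unchanged.
X_snv_scattered = snv_sketch(2.5 * X + 1.0)
```

The last line shows why SNV reduces scatter effects: a positive gain and a constant offset applied to a spectrum cancel out.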
AreaNormalizer¶
Area-under-curve normalization.
Bases: BaseEstimator, TransformerMixin
Normalize spectra to unit area under the curve.
Uses trapezoidal integration to compute area and scales each spectrum so its integral equals 1.
Examples:
>>> from foodspec.preprocess import AreaNormalizer
>>> import numpy as np
>>> X = np.array([[1, 2, 3], [4, 5, 6]], dtype=float)
>>> normalizer = AreaNormalizer()
>>> X_norm = normalizer.fit_transform(X)
>>> np.allclose(np.trapezoid(X_norm[0]), 1.0, atol=0.1)
True
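A minimal sketch of area normalization, assuming unit spacing between points (no wavenumber axis is passed). The name `area_normalize_sketch` is hypothetical, and the trapezoidal rule is written out by hand so the example does not depend on a particular NumPy version.

```python
import numpy as np

def area_normalize_sketch(X):
    """Scale each spectrum to unit trapezoidal area (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    # Trapezoidal rule with unit spacing: average adjacent points, then sum.
    area = ((X[:, 1:] + X[:, :-1]) / 2.0).sum(axis=1, keepdims=True)
    return X / area

X = np.array([[1, 2, 3], [4, 5, 6]], dtype=float)
X_area = area_normalize_sketch(X)
```

For the first row the area is 1.5 + 2.5 = 4, so the normalized spectrum is `[0.25, 0.5, 0.75]`.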
Smoothing¶
SavitzkyGolaySmoother¶
Savitzky-Golay filter for smoothing and derivatives.
Bases: BaseEstimator, TransformerMixin
Savitzky-Golay smoothing filter for spectra.
Fits local polynomial models to smooth spectra while preserving peak shapes.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `window_length` | `int` | Window size (must be odd and positive). | `7` |
| `polyorder` | `int` | Polynomial order (must be less than `window_length`). | `3` |
Raises:
| Type | Description |
|---|---|
| `ValueError` | If … |
Examples:
>>> from foodspec.preprocess import SavitzkyGolaySmoother
>>> import numpy as np
>>> X = np.random.randn(10, 100)
>>> smoother = SavitzkyGolaySmoother(window_length=7, polyorder=3)
>>> X_smooth = smoother.fit_transform(X)
>>> X_smooth.shape == X.shape
True
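The local polynomial fit can be sketched with NumPy alone: the smoothing weights are the first row of the pseudo-inverse of a Vandermonde matrix built on the window offsets. Both helper names are hypothetical, and the edge handling (repeating the end values) is an assumption that may differ from the library.

```python
import numpy as np

def savgol_coefficients_sketch(window_length=7, polyorder=3):
    """Smoothing weights for the window center (hypothetical helper)."""
    half = window_length // 2
    offsets = np.arange(-half, half + 1, dtype=float)
    # Columns are offsets**0, offsets**1, ..., offsets**polyorder.
    A = np.vander(offsets, polyorder + 1, increasing=True)
    # Row 0 of the pseudo-inverse gives the fitted value at offset 0.
    return np.linalg.pinv(A)[0]

def savgol_smooth_sketch(y, window_length=7, polyorder=3):
    """Smooth a 1D signal; edges are padded by repeating the end values."""
    y = np.asarray(y, dtype=float)
    half = window_length // 2
    coeffs = savgol_coefficients_sketch(window_length, polyorder)
    padded = np.pad(y, half, mode="edge")
    # The smoothing weights are symmetric, so convolution equals correlation.
    return np.convolve(padded, coeffs, mode="valid")

t = np.arange(30.0)
y = 0.01 * t ** 3 - 0.2 * t ** 2 + t
smoothed = savgol_smooth_sketch(y, window_length=7, polyorder=3)
coeffs_5_2 = savgol_coefficients_sketch(5, 2)
```

Away from the edges, a polynomial of degree at most `polyorder` passes through the filter unchanged, which is why peak shapes are better preserved than with a plain moving average.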
MovingAverageSmoother¶
Simple moving average smoothing.
Bases: BaseEstimator, TransformerMixin
Simple moving average smoothing filter.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `window_size` | `int` | Number of adjacent points to average. | `5` |
Raises:
| Type | Description |
|---|---|
| `ValueError` | If … |
Examples:
>>> from foodspec.preprocess import MovingAverageSmoother
>>> import numpy as np
>>> X = np.random.randn(5, 50)
>>> smoother = MovingAverageSmoother(window_size=5)
>>> X_smooth = smoother.fit_transform(X)
>>> X_smooth.shape == X.shape
True
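A moving average is a convolution with a flat kernel. The sketch below assumes an odd `window_size` and pads the edges by repeating the end values so the output keeps the input shape. The helper name and the edge handling are assumptions for illustration.

```python
import numpy as np

def moving_average_sketch(X, window_size=5):
    """Row-wise moving average for odd window sizes (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    kernel = np.ones(window_size) / window_size
    pad = window_size // 2
    # Repeat the end values so every output point has a full window.
    padded = np.pad(X, ((0, 0), (pad, pad)), mode="edge")
    return np.apply_along_axis(
        lambda row: np.convolve(row, kernel, mode="valid"), 1, padded
    )

X = np.array([[0.0, 0.0, 3.0, 0.0, 0.0]])
X_smooth = moving_average_sketch(X, window_size=3)
```

The single spike of height 3 is spread over three points of height 1, which shows how a moving average lowers and broadens sharp features.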
Noise & Artifact Removal¶
correct_cosmic_rays¶
Detect and correct cosmic ray spikes in Raman spectra.
Uses a robust z-score computed from the local median and MAD (median absolute deviation) to detect spikes, then replaces each spike value with the local median.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `X` | `ndarray` | Spectral data array (n_samples, n_wavenumbers). | required |
| `window` | `int` | Rolling window size for spike detection. | `5` |
| `zscore_thresh` | `float` | Threshold for spike detection (default 8.0). | `8.0` |
Returns:
| Type | Description |
|---|---|
| `(ndarray, CosmicRayReport)` | A tuple of the corrected spectra and a report containing total spike count and per-spectrum spike counts. |
Examples:
>>> import numpy as np
>>> from foodspec.preprocess.spikes import correct_cosmic_rays
>>> X = np.ones((2, 50))
>>> X[0, 25] = 100 # simulate spike
>>> X_corr, report = correct_cosmic_rays(X, window=5, zscore_thresh=5.0)
>>> report.total_spikes > 0
True
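The detection rule described above can be sketched as a rolling median and MAD. The function `despike_sketch`, the 1.4826 scaling constant, the edge padding, and the small floor on the MAD are assumptions of this example. The sketch returns plain per-spectrum counts rather than a `CosmicRayReport`.

```python
import numpy as np

def despike_sketch(X, window=5, zscore_thresh=8.0):
    """Median/MAD spike detection and replacement (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    half = window // 2
    padded = np.pad(X, ((0, 0), (half, half)), mode="edge")
    # windows[i, j] holds the `window` points centered on X[i, j] (odd window).
    windows = np.lib.stride_tricks.sliding_window_view(padded, window, axis=1)
    local_median = np.median(windows, axis=-1)
    mad = np.median(np.abs(windows - local_median[..., None]), axis=-1)
    # 1.4826 rescales the MAD to a normal-distribution standard deviation;
    # the small floor avoids division by zero on perfectly flat regions.
    robust_z = np.abs(X - local_median) / (1.4826 * mad + 1e-12)
    spikes = robust_z > zscore_thresh
    corrected = np.where(spikes, local_median, X)
    return corrected, spikes.sum(axis=1)

X = np.ones((2, 50))
X[0, 25] = 100.0  # simulated spike
X_corr, counts = despike_sketch(X, window=5, zscore_thresh=5.0)
```

Because the median ignores a single outlier inside the window, the spike itself does not inflate the statistics used to detect it.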
CosmicRayRemover¶
Cosmic ray removal for Raman spectroscopy.
Bases: BaseEstimator, TransformerMixin
Basic cosmic ray spike removal for Raman spectra.
Detects spikes as points exceeding the local median by sigma_thresh times
the local MAD (median absolute deviation) and replaces them by linear
interpolation of neighboring points.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `window` | `int` | Window size for local statistics (default 5). | `5` |
| `sigma_thresh` | `float` | Z-score threshold for spike detection (default 8.0). | `8.0` |
Examples:
>>> from foodspec.preprocess.raman import CosmicRayRemover
>>> import numpy as np
>>> X = np.ones((3, 50))
>>> X[0, 25] = 100 # spike
>>> remover = CosmicRayRemover(window=5, sigma_thresh=5.0)
>>> X_clean = remover.fit_transform(X)
>>> X_clean[0, 25] < 10
True
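The replacement step, linear interpolation across flagged points, can be sketched separately from detection. The helper `interpolate_spikes_sketch` is hypothetical and takes a precomputed boolean mask. How the library builds that mask internally is not shown here.

```python
import numpy as np

def interpolate_spikes_sketch(y, spike_mask):
    """Replace flagged points by interpolating between unflagged neighbors."""
    y = np.asarray(y, dtype=float)
    idx = np.arange(y.size)
    good = ~np.asarray(spike_mask, dtype=bool)
    out = y.copy()
    # Only the flagged positions are overwritten; all other points are kept.
    out[~good] = np.interp(idx[~good], idx[good], y[good])
    return out

y = np.array([1.0, 2.0, 100.0, 4.0, 5.0])
mask = np.array([False, False, True, False, False])
y_clean = interpolate_spikes_sketch(y, mask)
```

Compared with replacing a spike by the local median, interpolation follows the local slope of the spectrum, which matters on the flank of a real peak.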
Spectral Cropping¶
RangeCropper¶
Crop spectra to wavenumber ranges.
Bases: BaseEstimator
Crop spectra to a specified wavenumber range.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `min_wn` | `float` | Minimum wavenumber (inclusive). | required |
| `max_wn` | `float` | Maximum wavenumber (inclusive). | required |
Raises:
| Type | Description |
|---|---|
| `ValueError` | If … |
Examples:
>>> from foodspec.preprocess import RangeCropper
>>> import numpy as np
>>> X = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
>>> wn = np.array([1000, 1100, 1200, 1300])
>>> cropper = RangeCropper(min_wn=1050, max_wn=1250)
>>> X_crop, wn_crop = cropper.transform(X, wn)
>>> wn_crop.tolist()
[1100, 1200]
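Cropping with inclusive bounds amounts to a boolean mask on the wavenumber axis. The function `crop_range_sketch` below is a hypothetical stand-in that reproduces the doctest above. It performs no input validation.

```python
import numpy as np

def crop_range_sketch(X, wavenumbers, min_wn, max_wn):
    """Keep only columns whose wavenumber lies in [min_wn, max_wn]."""
    wn = np.asarray(wavenumbers)
    mask = (wn >= min_wn) & (wn <= max_wn)
    # The same mask is applied to the data columns and to the axis.
    return np.asarray(X)[:, mask], wn[mask]

X = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
wn = np.array([1000, 1100, 1200, 1300])
X_crop, wn_crop = crop_range_sketch(X, wn, min_wn=1050, max_wn=1250)
```

Returning the cropped axis together with the data keeps the two aligned for later steps that need wavenumbers, such as baseline fitting.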
See Also¶
- Preprocessing Methods - Detailed methodology
- Examples - Preprocessing workflows
- Core Module - Dataset structures