
Baseline Correction

Purpose: Remove slowly varying background trends that bias peak-based features and ratios.

When to use: Baseline drift from fluorescence (Raman), ATR contact (FTIR), scattering, or instrument response.

Outcome: Clean spectra with peaks above zero baseline, suitable for feature extraction and modeling.


Why Baseline Drift Occurs

Baselines drift because of:

  • Fluorescence (Raman): Broad background from sample excitation that can dwarf weak Raman peaks
  • Scattering/contact (FTIR ATR): Sloping backgrounds from poor crystal contact or refractive-index mismatches
  • Instrument response: Slowly varying offsets from detector aging or optical path changes
  • Sample matrix: Particulates, emulsions, or path-length variations introduce curvature

For notation and symbols used below, see the Glossary.

When (Not) to Correct

  • Correct if: Baseline dominates dynamic range; ratios/areas are biased; spectra share similar baseline shape.
  • Caution if: Peaks are broad and might be mistaken for baseline; very low SNR; automated correction can remove real signal.
  • Visual check: Always inspect before/after; consider preserving a copy of raw data.

Methods in FoodSpec

Asymmetric Least Squares (ALS)

Concept: Fit a smooth baseline minimizing weighted residuals with asymmetric weights that downweight peaks.

Parameters:
  • lambda_: Smoothness (higher = smoother baseline; typical: 1e4–1e6)
  • p: Asymmetry (typical: 0.001–0.1; lower = points above the baseline, such as peaks, are down-weighted more strongly)
  • max_iter: Iterations (10–100)

When to use: Moderate to strong baseline curvature; mixed peaks/baseline; most common choice.

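Pitfalls: Over-smoothing (the baseline is too stiff to follow real curvature) if lambda_ is too large; peak clipping (the baseline rises into peaks) if p is too large or lambda_ is too small.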
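To make the ALS concept concrete, here is a minimal NumPy/SciPy sketch of the asymmetric least squares iteration in the style of Eilers and Boelens. It is not FoodSpec's implementation: the function name `als_baseline` and its arguments `lam`, `p`, and `n_iter` are illustrative assumptions chosen to mirror `lambda_`, `p`, and `max_iter`.

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import spsolve


def als_baseline(y, lam=1e5, p=0.01, n_iter=10):
    """Asymmetric least squares baseline (illustrative sketch, not FoodSpec's code).

    Each pass solves (W + lam * D^T D) z = W y, where D is the second-difference
    operator and W holds the current weights, then reweights: points above the
    baseline get weight p, points on or below it get 1 - p.
    """
    y = np.asarray(y, dtype=float)
    n = y.size
    # Second-difference matrix D with shape (n - 2, n); penalizes baseline curvature
    D = sparse.diags([1.0, -2.0, 1.0], [0, 1, 2], shape=(n - 2, n))
    penalty = lam * (D.T @ D)
    w = np.ones(n)
    z = y
    for _ in range(n_iter):
        W = sparse.diags(w)
        z = spsolve((W + penalty).tocsc(), w * y)
        # Asymmetric reweighting: with small p, peaks barely pull the baseline up
        w = np.where(y > z, p, 1.0 - p)
    return z


# Usage: one Gaussian peak on a gently curved synthetic background
x = np.linspace(400, 1800, 701)
true_baseline = 1e-6 * (x - 400) ** 2 + 0.1
y = true_baseline + np.exp(-0.5 * ((x - 1000) / 8) ** 2)
corrected = y - als_baseline(y, lam=1e5, p=0.01)
```

Because points above the current baseline receive weight p and points below receive 1 - p, lowering p pushes the fitted baseline toward the lower envelope of the spectrum, while lam sets how stiff it is.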

Rubberband (Convex Hull)

Concept: Compute the lower convex hull of the spectrum and interpolate linearly between its vertices to obtain the baseline.

When to use: Spectra with clear gaps between peaks and background; quick, parameter-free.

Pitfalls: Fails when peaks are dense or when the true baseline rises above the hull (e.g., a concave, hump-shaped background); sensitive to noise spikes.
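A minimal rubberband sketch, assuming the wavenumber axis is sorted in strictly increasing order: Andrew's monotone-chain algorithm finds the lower convex hull, and `np.interp` fills in the baseline between hull vertices. The function name `rubberband_baseline` is hypothetical and not a FoodSpec API.

```python
import numpy as np


def rubberband_baseline(x, y):
    """Rubberband baseline: lower convex hull of (x, y), linearly interpolated.

    Illustrative sketch; assumes x is sorted in strictly increasing order.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    hull = []  # indices of lower-hull vertices, from left to right
    for i in range(x.size):
        # Drop the last vertex while it does not make a counter-clockwise turn,
        # i.e. while it lies on or above the line from hull[-2] to point i
        while len(hull) >= 2:
            i0, i1 = hull[-2], hull[-1]
            cross = (x[i1] - x[i0]) * (y[i] - y[i0]) - (y[i1] - y[i0]) * (x[i] - x[i0])
            if cross > 0:
                break
            hull.pop()
        hull.append(i)
    # Straight-line segments between hull vertices form the baseline
    return np.interp(x, x[hull], y[hull])


# Usage: one peak on a linear background
x = np.linspace(0.0, 10.0, 201)
background = 0.5 * x + 1.0
y = background + np.exp(-0.5 * ((x - 5.0) / 0.3) ** 2)
corrected = y - rubberband_baseline(x, y)
```

Every point of the resulting baseline lies on or below the spectrum, which is also why a single downward noise spike, once it becomes a hull vertex, drags the baseline down locally.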

Polynomial Baseline

Concept: Fit a low-degree polynomial (degree 1–4) to the whole spectrum or to background-only regions.

When to use: Mild curvature; simple backgrounds; fast computation.

Pitfalls: Overfitting with high polynomial orders; unsuitable for complex fluorescence.
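A sketch of a polynomial baseline fitted only to background regions, using NumPy's `Polynomial.fit`. The function name and the `background_mask` argument (a boolean array marking peak-free points) are illustrative assumptions; choosing those regions is left to the analyst.

```python
import numpy as np


def polynomial_baseline(x, y, degree=2, background_mask=None):
    """Least-squares polynomial baseline (illustrative sketch).

    background_mask (hypothetical argument): boolean array, True at peak-free
    points; if None, every point is used, which lets peaks bias the fit.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if background_mask is None:
        background_mask = np.ones(x.shape, dtype=bool)
    # Polynomial.fit maps x onto [-1, 1] internally, keeping the fit well conditioned
    poly = np.polynomial.Polynomial.fit(x[background_mask], y[background_mask], deg=degree)
    return poly(x)  # evaluate on the full axis


# Usage: quadratic background with one peak; exclude +/- 6 peak widths from the fit
x = np.linspace(600.0, 1800.0, 601)
background = 2e-7 * (x - 600.0) ** 2 - 1e-4 * x + 0.5
y = background + 0.8 * np.exp(-0.5 * ((x - 1200.0) / 10.0) ** 2)
peak_free = np.abs(x - 1200.0) > 60.0
corrected = y - polynomial_baseline(x, y, degree=2, background_mask=peak_free)
```

Fitting the whole spectrum instead (no mask) lets the peak pull the polynomial upward, which is one reason this method suits simple backgrounds with clear peak-free regions.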

Practical guidance

  • Order in pipeline: Baseline → smoothing/normalization → features. Avoid applying baseline correction after aggressive normalization.
  • Quality checks: Plot before/after; monitor peak heights/areas; compare across replicates.
  • Parameter tuning: Start with moderate lambda_ (e.g., 1e5) and p (~0.001–0.01) for ALS; adjust visually.
  • If over/under-correction persists, see Common problems & solutions.
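The ordering and consistency advice above can be sketched as a tiny pipeline: baseline, then Savitzky–Golay smoothing, then standard normal variate (SNV) normalization, with one frozen settings object applied to every spectrum. Everything here (`PreprocessConfig`, `preprocess`, and the whole-spectrum polynomial used as a stand-in for ALS or rubberband) is a hypothetical illustration, not FoodSpec's pipeline API.

```python
from dataclasses import dataclass

import numpy as np
from scipy.signal import savgol_filter


@dataclass(frozen=True)
class PreprocessConfig:
    # Hypothetical settings object: chosen once, then reused unchanged for every sample
    baseline_degree: int = 2
    sg_window: int = 11
    sg_polyorder: int = 3


def preprocess(X, x, cfg):
    """Baseline -> smoothing -> SNV, in that order, with identical settings for every row."""
    X = np.asarray(X, dtype=float)
    # 1. Baseline: crude whole-spectrum polynomial fit, a stand-in for ALS or rubberband
    baselines = np.array(
        [np.polynomial.Polynomial.fit(x, row, deg=cfg.baseline_degree)(x) for row in X]
    )
    Xb = X - baselines
    # 2. Smoothing: Savitzky-Golay filter along the wavenumber axis
    Xs = savgol_filter(Xb, cfg.sg_window, cfg.sg_polyorder, axis=1)
    # 3. Normalization: standard normal variate (per-spectrum mean 0, std 1)
    mean = Xs.mean(axis=1, keepdims=True)
    std = Xs.std(axis=1, keepdims=True)
    return (Xs - mean) / std


# Usage: three synthetic spectra, the third a rescaled and offset copy of the first
x = np.linspace(600.0, 1800.0, 301)
row = 1e-6 * (x - 600.0) ** 2 + np.exp(-0.5 * ((x - 1200.0) / 15.0) ** 2)
X = np.vstack([row, row, 2.0 * row + 0.3])
cfg = PreprocessConfig()
Z = preprocess(X, x, cfg)
```

In this sketch the third spectrum ends up identical to the first, because the baseline step absorbs the offset and SNV removes the scale; running SNV first would change what the baseline step sees.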

Example (high level)

from foodspec.preprocess.baseline import ALSBaseline

als = ALSBaseline(lambda_=1e5, p=0.001, max_iter=10)  # smoothness, asymmetry, iterations
X_corr = als.transform(X_raw)  # X_raw: raw spectra to correct

Visuals to include

  • Baseline before/after (synthetic): One spectrum with quadratic drift + noise. Show the raw spectrum (with drift), the true baseline, and the corrected signal (e.g., obtained by subtracting the known baseline). Axes: wavenumber (cm⁻¹) vs intensity. Purpose: illustrate what a successful correction looks like. See docs/examples/visualization/generate_baseline_before_after.py.

Baseline before/after

  • Drift/noise illustration: Overlay ideal spectrum, noisy spectrum, and drifted spectrum to show why correction is needed. Axes: wavenumber vs intensity. Generated via docs/examples/stats/generate_spectral_artifacts_figures.py.

Illustrative spectra with baseline drift and noise

Reproducible figure generation

  • Run python docs/examples/visualization/generate_baseline_before_after.py for baseline before/after.
  • Run python docs/examples/stats/generate_spectral_artifacts_figures.py for drift/noise illustration.
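The synthetic before/after figure can be approximated with a small generator like the one below: a few Gaussian peaks, a known quadratic drift, and white noise, so the true baseline is available for comparison. This is a hypothetical stand-in, not the contents of generate_baseline_before_after.py; the peak positions, widths, and noise level are arbitrary.

```python
import numpy as np


def synthetic_drift_spectrum(seed=0, n=800):
    """Hypothetical test spectrum: Gaussian peaks + known quadratic drift + white noise.

    Returns (wavenumber, raw, drift, peaks) so the true baseline is known exactly.
    """
    rng = np.random.default_rng(seed)
    x = np.linspace(400.0, 1800.0, n)  # wavenumber axis, cm^-1
    # Arbitrary illustrative peaks: (height, center, width)
    peaks = sum(
        h * np.exp(-0.5 * ((x - c) / w) ** 2)
        for h, c, w in [(1.0, 1000.0, 6.0), (0.6, 1450.0, 10.0), (0.8, 1650.0, 12.0)]
    )
    u = (x - x[0]) / (x[-1] - x[0])  # axis rescaled to [0, 1]
    drift = 0.5 + 1.5 * u + 2.0 * u**2  # quadratic baseline, known by construction
    noise = rng.normal(scale=0.02, size=n)
    return x, peaks + drift + noise, drift, peaks


x, raw, drift, peaks = synthetic_drift_spectrum(seed=1)
corrected = raw - drift  # "ideal" correction: subtract the known baseline
```

Plotting x against raw, drift, and corrected gives the three curves the figure description calls for; the same spectra can also serve as test cases when comparing ALS, rubberband, and polynomial settings against the known drift.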

Summary

  • Baseline correction removes broad backgrounds that bias peak-based features.
  • ALS is a flexible default; rubberband is simple; polynomial fits suit mild curvature.
  • Always inspect corrections; avoid removing true broad features.

When to Use

Use baseline correction when:

  • Fluorescence interference: Raman spectra show broad fluorescent backgrounds obscuring weak peaks
  • Sloping baselines: FTIR-ATR spectra have tilted baselines from poor crystal contact
  • Ratio calculations: Peak ratios are biased by uneven backgrounds
  • Comparative analysis: Baseline drift varies between samples, confounding comparisons
  • PCA/classification: Baseline variability dominates spectral differences and masks chemical variation
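A worked example of why uneven backgrounds bias ratios: two peaks of equal height sit on a linearly sloping background, so reading intensities straight off the raw spectrum gives a ratio well below 1. The numbers are synthetic and chosen for easy arithmetic (baseline 0.8 under the first peak, 1.4 under the second).

```python
import numpy as np


def gaussian(x, center, width):
    return np.exp(-0.5 * ((x - center) / width) ** 2)


x = np.arange(400.0, 1802.0, 2.0)  # wavenumber grid, cm^-1
true_signal = gaussian(x, 1000.0, 10.0) + gaussian(x, 1600.0, 10.0)  # equal heights
baseline = 0.2 + 0.001 * (x - 400.0)  # linear slope: 0.8 at 1000, 1.4 at 1600
raw = true_signal + baseline

i1 = np.argmin(np.abs(x - 1000.0))
i2 = np.argmin(np.abs(x - 1600.0))
ratio_raw = raw[i1] / raw[i2]  # biased: 1.8 / 2.4
ratio_corrected = (raw[i1] - baseline[i1]) / (raw[i2] - baseline[i2])
print(f"raw ratio: {ratio_raw:.3f}, corrected ratio: {ratio_corrected:.3f}")
# prints: raw ratio: 0.750, corrected ratio: 1.000
```

If the slope differs between samples, the raw ratio shifts from sample to sample even though the chemistry is identical, which is the comparative-analysis problem listed above.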

When NOT to Use (Common Failure Modes)

Avoid baseline correction or use with extreme caution when:

  • Broad informative features: Sample contains wide absorption bands that could be mistaken for baseline (e.g., water bands, amorphous regions)
  • Very low SNR: Noise-dominated spectra where baseline fitting becomes unstable
  • Dense peak patterns: Closely spaced peaks with no clear baseline regions (rubberband will fail)
  • Already normalized data: Applying baseline correction after standard normal variate (SNV) or other normalizations can remove meaningful variation
  • Negative intensities expected: Some correction methods (aggressive ALS) can produce negative values, invalidating downstream ratio calculations

For Raman spectroscopy (fluorescence backgrounds):

from foodspec.preprocessing import baseline_als
X_corrected = baseline_als(X, lambda_=1e5, p=0.01, max_iter=10)
- lambda_=1e5: Moderate smoothness; handles typical curvature
- p=0.01: Conservative asymmetry; preserves real peaks
- max_iter=10: Usually sufficient for convergence

For FTIR-ATR (mild slopes):

from foodspec.preprocessing import baseline_polynomial
X_corrected = baseline_polynomial(X, degree=2)
- degree=2: Quadratic baseline for gentle curvature

For quick exploratory analysis:

from foodspec.preprocessing import baseline_rubberband
X_corrected = baseline_rubberband(X)
- No parameters needed; fast but less robust



When Results Cannot Be Trusted

โš ๏ธ Red flags for baseline correction validity:

  1. Baseline correction removes true spectral features (e.g., an overly flexible ALS baseline, with lambda_ too small or p too large, absorbs real broad bands)
     • Aggressive baseline subtraction can eliminate informative signal
     • Ratio calculations are biased by the removed features
     • Fix: Inspect baseline-corrected spectra visually; check that broad true features (e.g., water bands) are not removed; use moderate ALS parameters (e.g., lambda_ around 1e5, p around 0.001–0.01) and adjust from there

  2. Baseline-corrected spectra are negative in valid signal regions (negative intensities after ALS)
     • Indicates over-correction: the fitted baseline sits above the true signal, so real signal is removed
     • Invalid for intensity-based ratios or downstream modeling
     • Fix: Lower p so the fitted baseline stays beneath the signal (points above the baseline are then down-weighted more strongly); use gentler smoothing; validate the baseline visually

  3. Different baseline methods applied to different samples (ALS for some samples, polynomial for others)
     • Inconsistent preprocessing produces incomparable spectra
     • Ratios and models fail across samples
     • Fix: Freeze preprocessing parameters before analysis; apply the identical method to all samples

  4. Baseline correction applied to over-smoothed spectra (smoothing removes peaks, then baseline correction is applied)
     • Preprocessing order matters: over-smoothing distorts the peaks and background that the baseline fit relies on
     • Information loss compounds across steps
     • Fix: Apply baseline correction before smoothing; use gentle smoothing after baseline correction

  5. ALS parameters (lambda_, p) chosen by eye ("looks good") without validation
     • Data-dependent parameter choices overfit; new data may need different parameters
     • Reproducibility and generalization are compromised
     • Fix: Pre-specify parameters based on spectral characteristics or previous studies; validate on test samples

  6. Baseline features confused with true sample features (matrix baseline treated as signal)
     • Removing the baseline removes chemistry-independent variation but not true matrix peaks
     • Incomplete baseline removal leaves matrix effects
     • Fix: Distinguish baseline (smooth, slow trend) from true peaks (sharp, chemically meaningful); validate with reference materials

  7. No baseline correction applied to noisy spectra with uneven baselines (baseline variation directly affects ratios)
     • An uneven baseline adds to noise-driven variability in ratios
     • Statistics and reproducibility suffer
     • Fix: Apply baseline correction with a method suited to the spectral type (e.g., ALS or rolling ball); at very low SNR, check that the baseline fit is stable

  8. Spectral regions with true curvature treated as baseline (curvature arising from sample composition is corrected away)
     • Example: long-wavelength drift in FTIR caused by sample absorption is treated as baseline
     • True information is removed
     • Fix: Understand the physical sources of the baseline; preserve physically meaningful curvature; inspect corrected spectra
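Some of these red flags can be screened automatically before visual inspection. The sketch below flags spectra with a non-trivial fraction of clearly negative points after correction, the negative-intensity red flag above. The function name, the thresholds (neg_tol, max_neg_fraction), and the assumption that a scalar noise level noise_sd is known (for example, estimated from a peak-free region) are all hypothetical choices, not FoodSpec defaults.

```python
import numpy as np


def negative_intensity_flags(corrected, noise_sd, neg_tol=3.0, max_neg_fraction=0.01):
    """Flag spectra (rows) whose baseline-corrected values go clearly negative.

    Hypothetical screening rule: a point counts as "clearly negative" when it is
    below -neg_tol * noise_sd (noise_sd is a scalar); a spectrum is flagged as
    possibly over-corrected when more than max_neg_fraction of its points are.
    """
    corrected = np.atleast_2d(np.asarray(corrected, dtype=float))
    clearly_negative = corrected < -neg_tol * noise_sd
    neg_fraction = clearly_negative.mean(axis=1)
    return {
        "negative_fraction": neg_fraction,
        "over_corrected": neg_fraction > max_neg_fraction,
        "min_intensity": corrected.min(axis=1),
    }


# Usage: two non-negative stand-in spectra, the second with a simulated over-corrected stretch
rng = np.random.default_rng(0)
spectra = np.abs(rng.normal(scale=0.01, size=(2, 500)))
spectra[1, 100:200] -= 0.5
flags = negative_intensity_flags(spectra, noise_sd=0.01)
```

A check like this only narrows down which spectra to inspect; it cannot tell whether a broad feature was wrongly removed, so the visual checks above still apply.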

Further reading