Preprocessing: Scatter Correction and Cosmic Ray Removal

Scatter and spike artifacts can mask true chemical signals. This chapter explains scatter-aware corrections (ATR/atmospheric) and spike removal for Raman.

1. Scatter in FTIR/Raman

  • ATR-FTIR: Variable contact and refractive-index mismatch produce sloping baselines and intensity changes.
  • Atmospheric effects: Water/CO₂ bands superimpose on spectra.
  • Raman: Cosmic rays (high-energy particles) striking the detector produce narrow, spike-like artifacts.

2. Corrections in FoodSpec (how they work)

Atmospheric correction (FTIR)

  • Concept: Fit water-vapor and CO₂ basis functions (templates) to each spectrum, then subtract the scaled templates.
  • Use when: Working in open air or with noticeable water/CO₂ bands.
  • Pitfalls: Over-subtraction can distort nearby peaks; validate visually.
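
The fit-and-subtract idea can be sketched in plain NumPy. This is an illustration on synthetic data, not FoodSpec's implementation: the function name `subtract_templates`, the Gaussian "templates", and the band positions are all assumptions made for the example.

```python
import numpy as np


def subtract_templates(spectrum, templates):
    """Fit templates to one spectrum by least squares and subtract the fit.

    spectrum:  array of shape (n_points,)
    templates: array of shape (n_templates, n_points), on the same axis
    Returns (corrected_spectrum, fitted_coefficients).
    """
    A = np.asarray(templates, dtype=float).T      # columns are templates
    y = np.asarray(spectrum, dtype=float)
    coeffs, *_ = np.linalg.lstsq(A, y, rcond=None)
    return y - A @ coeffs, coeffs


# Synthetic demo (assumed shapes): one sample band plus scaled templates.
wn = np.linspace(1000.0, 4000.0, 1501)
sample = np.exp(-((wn - 1740.0) / 30.0) ** 2)
co2 = np.exp(-((wn - 2350.0) / 20.0) ** 2)      # stand-in CO2 template
water = np.exp(-((wn - 3750.0) / 40.0) ** 2)    # stand-in water template
measured = sample + 0.3 * co2 + 0.2 * water

corrected, coeffs = subtract_templates(measured, [co2, water])
```

In this demo the templates do not overlap the sample band, so the fit recovers the scale factors cleanly. When a template overlaps a real peak, the least-squares fit absorbs part of the sample signal, which is the over-subtraction pitfall noted above.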

Simple ATR correction

  • Concept: Heuristic rescaling that compensates for the effective path length (penetration depth) varying with wavelength and incidence angle.
  • Use when: ATR contact is inconsistent; mild correction is sufficient.
  • Pitfalls: Approximate; not a replacement for rigorous optical modeling.
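
To make the wavelength dependence concrete, the sketch below evaluates the standard evanescent-wave penetration depth, d_p = λ / (2π n₁ √(sin²θ − (n₂/n₁)²)), and applies the common first-order correction of scaling intensities by wavenumber. The function names, the refractive indices (2.4 for the crystal, 1.5 for the sample), and the 1000 cm⁻¹ reference are assumptions for illustration; whether SimpleATRCorrector uses this exact form is not stated here.

```python
import numpy as np


def penetration_depth_um(wavenumbers, n_crystal=2.4, n_sample=1.5, angle_deg=45.0):
    """Evanescent-wave penetration depth in micrometres for each wavenumber (cm^-1)."""
    wn = np.asarray(wavenumbers, dtype=float)
    wavelength_um = 1.0e4 / wn                      # cm^-1 -> micrometres
    theta = np.deg2rad(angle_deg)
    root = np.sin(theta) ** 2 - (n_sample / n_crystal) ** 2
    if root <= 0:
        raise ValueError("incidence angle is below the critical angle")
    return wavelength_um / (2.0 * np.pi * n_crystal * np.sqrt(root))


def simple_atr_scale(absorbance, wavenumbers, ref_wavenumber=1000.0):
    """First-order correction: depth scales as 1/wavenumber, so multiply by wn/ref."""
    wn = np.asarray(wavenumbers, dtype=float)
    return np.asarray(absorbance, dtype=float) * wn / ref_wavenumber


depths = penetration_depth_um([1000.0, 2000.0])     # deeper at low wavenumber
scaled = simple_atr_scale(np.ones(2), [1000.0, 2000.0])
```

The scaling ignores refractive-index dispersion near strong bands, which is why it remains an approximation rather than a substitute for optical modeling.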

Scatter-aware normalization (SNV/MSC)
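
As a rough illustration of what standard normal variate (SNV) normalization does, the sketch below centers and scales each spectrum by its own mean and standard deviation. It is plain NumPy; the function name `snv` is hypothetical and is not taken from FoodSpec's API.

```python
import numpy as np


def snv(X):
    """Row-wise SNV: subtract each spectrum's mean, divide by its standard deviation."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=1, keepdims=True)
    std = X.std(axis=1, ddof=1, keepdims=True)
    return (X - mean) / std


# Two copies of one spectrum that differ only by a scale factor and an offset.
axis = np.linspace(0.0, 1.0, 200)
base = np.exp(-((axis - 0.4) / 0.05) ** 2) + 0.5 * np.exp(-((axis - 0.7) / 0.1) ** 2)
X = np.vstack([base, 2.5 * base + 0.7])
X_snv = snv(X)
```

Because SNV uses only per-spectrum statistics, it needs no reference spectrum; pure scale and offset differences between the two rows vanish after the transform.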

Cosmic ray removal (Raman)

  • Concept: Flag points that rise far above a local median or exceed a derivative threshold, then replace them by interpolating from neighboring points.
  • Use when: Narrow, isolated spikes appear in Raman spectra.
  • Pitfalls: Avoid mistaking narrow real peaks for spikes; tune thresholds conservatively.
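
One way to realize the detect-and-interpolate idea is a rolling-median filter with a robust noise estimate. The sketch below is an assumption-laden illustration, not CosmicRayRemover's actual algorithm: the function name `remove_spikes`, the MAD-based noise scale, and the synthetic spectrum are all made up for the example.

```python
import numpy as np


def remove_spikes(spectrum, threshold=5.0, window=5):
    """Flag points far above a rolling median and interpolate over them.

    Returns (cleaned_spectrum, boolean_spike_mask). `window` must be odd.
    """
    if window % 2 == 0:
        raise ValueError("window must be odd")
    y = np.asarray(spectrum, dtype=float)
    half = window // 2
    padded = np.pad(y, half, mode="edge")
    windows = np.lib.stride_tricks.sliding_window_view(padded, window)
    residual = y - np.median(windows, axis=1)

    # Robust noise estimate: scaled median absolute deviation of the residual.
    mad = np.median(np.abs(residual - np.median(residual)))
    sigma = 1.4826 * mad if mad > 0 else np.finfo(float).eps

    spikes = residual > threshold * sigma
    cleaned = y.copy()
    idx = np.arange(y.size)
    if spikes.any() and not spikes.all():
        # Linear interpolation from the nearest unflagged neighbors.
        cleaned[spikes] = np.interp(idx[spikes], idx[~spikes], y[~spikes])
    return cleaned, spikes


# Synthetic demo: one broad band, Gaussian noise, and a single-point spike.
rng = np.random.default_rng(0)
shift = np.linspace(200.0, 1800.0, 801)
raman = np.exp(-((shift - 1000.0) / 30.0) ** 2) + rng.normal(0.0, 0.01, shift.size)
raman[300] += 5.0
cleaned, spikes = remove_spikes(raman, threshold=5.0, window=5)
```

Returning the boolean mask alongside the cleaned spectrum makes it easy to mark which points were interpolated and to overlay them on the raw data when checking that no real narrow peak was removed.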

3. When to use / not to use

  • Use atmospheric/ATR correction for FTIR when environmental or contact effects are visible.
  • Use cosmic-ray removal for spike artifacts; skip if spectra are already spike-free.
  • Combine with baseline correction and normalization, but inspect results.

4. Example (high level)

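from foodspec.preprocess.ftir import AtmosphericCorrector, SimpleATRCorrector
from foodspec.preprocess.raman import CosmicRayRemover

# Assumed inputs, loaded beforehand:
#   X_ft, X_ra: FTIR and Raman spectra as arrays
#   wn: wavenumber axis for the FTIR spectra
atm = AtmosphericCorrector()
atr = SimpleATRCorrector()
cr = CosmicRayRemover()

# FTIR: remove atmospheric bands first, then apply the ATR correction
X_ft = atm.transform(X_ft)
X_ft = atr.transform(X_ft, wavenumbers=wn)

# Raman: remove cosmic-ray spikes
X_ra = cr.transform(X_ra)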

5. Visuals to include

  • FTIR atmospheric correction: Single FTIR spectrum before/after water/CO₂ subtraction; annotate removed bands. Axes: wavenumber vs intensity. Use AtmosphericCorrector + plot_spectra.
  • Raman cosmic ray removal: Raw Raman spectrum with a spike and cleaned version (spike replaced); mark the removed spike. Axes: wavenumber vs intensity. Use CosmicRayRemover + plot_spectra.

Reproducible figure generation

Use a helper such as docs/examples/visualization/generate_scatter_cosmic_figures.py to create figures for this chapter:
  • Build a synthetic Raman spectrum with one or two sharp spikes; apply CosmicRayRemover and overlay before/after (save to docs/assets/cosmic_ray_cleanup.png).
  • Take an FTIR spectrum with broad water/CO₂ bands (example oils FTIR or synthetic); apply AtmosphericCorrector (and optional SimpleATRCorrector) and overlay before/after with annotations (save to docs/assets/ftir_atmospheric_correction.png).
  • Optionally show a short PCA scatter of raw vs corrected FTIR spectra to illustrate reduced variance from atmospheric artefacts.
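
For the optional PCA comparison, a minimal sketch on synthetic data is given below. It does not reproduce the helper script; the function name `pca_scores`, the Gaussian stand-in for a CO₂ band, and the single-template correction are assumptions chosen to keep the example self-contained.

```python
import numpy as np


def pca_scores(X, n_components=2):
    """PCA via SVD of the mean-centered data; returns (scores, component variances)."""
    Xc = X - X.mean(axis=0)
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]
    variances = S ** 2 / (X.shape[0] - 1)
    return scores, variances[:n_components]


# Synthetic FTIR-like set: same sample band, varying amount of a CO2-like band.
rng = np.random.default_rng(0)
wn = np.linspace(1000.0, 4000.0, 601)
sample_band = np.exp(-((wn - 1740.0) / 30.0) ** 2)
co2 = np.exp(-((wn - 2350.0) / 20.0) ** 2)
amounts = rng.uniform(0.0, 0.5, size=20)
raw = sample_band + amounts[:, None] * co2 + rng.normal(0.0, 0.005, (20, wn.size))

# Stand-in correction: per-spectrum least-squares fit of the single template.
coef = raw @ co2 / (co2 @ co2)
corrected = raw - coef[:, None] * co2

scores_raw, var_raw = pca_scores(raw)
scores_corr, var_corr = pca_scores(corrected)
```

Plotting `scores_raw` and `scores_corr` side by side (first component against second) gives the scatter described above; in this synthetic case the leading component of the raw data is driven almost entirely by the varying atmospheric band.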

Summary

  • Scatter and atmospheric effects distort baselines/intensities; use SNV/MSC plus targeted corrections.
  • Cosmic ray spikes in Raman should be removed to avoid biasing normalization/peaks.
  • Always validate corrections visually.

When Results Cannot Be Trusted

⚠️ Red flags for scatter correction and cosmic ray removal:

  1. Cosmic rays not removed from spectra (single-pixel spikes affect normalization and averaging)
       • Cosmic rays (high-intensity noise) inflate normalization factors
       • Averaged spectra have artifacts
       • Fix: Detect cosmic rays (statistical outliers per wavelength); interpolate or remove; validate visually

  2. Scatter correction method not validated on reference (using MSC without checking it reduces scatter)
       • Some scatter-correction methods are ineffective for specific scatter types
       • Scatter may remain post-correction
       • Fix: Apply reference-free QC (compare spectra before/after); validate on samples with known scatter levels

  3. Atmospheric water/CO₂ lines not masked in FTIR (strong H₂O/CO₂ peaks affect ratios)
       • Water/CO₂ lines dominate certain regions; ratios computed in these regions are meaningless
       • Information loss
       • Fix: Mask atmospheric bands before feature extraction; list masked regions; validate that peaks are not in masked regions

  4. Multiplicative scatter correction (MSC) applied assuming linear reference-sample relationship (breaks if sample composition is fundamentally different)
       • MSC assumes sample and reference vary only in scale/offset
       • Non-linear scatter or a complex matrix breaks the assumption
       • Fix: Validate MSC on test samples; check residuals post-MSC; consider Extended MSC or nonlinear alternatives

  5. Spikes/cosmic rays interpolated without validation (interpolated values treated as real data)
       • Interpolation introduces artificial data points
       • May create false peaks or alter ratios
       • Fix: Mark interpolated regions; avoid extracting features from interpolated wavelengths; use robust statistics

  6. Scatter not uniform across samples (one sample highly scattering, another clear, corrected with single MSC)
       • Sample-specific scatter variation not captured by single MSC
       • Residual scatter remains
       • Fix: Apply sample-specific scatter correction; check scatter by visual inspection; group similar scatter types

  7. Removal of high-frequency noise (cosmic rays) also removes true high-frequency information
       • Aggressive spike removal flattens sharp true peaks
       • Information loss for peak-based features
       • Fix: Use conservative spike detection (>5 SD from local mean); visualize removed vs. original; preserve sharp features

  8. No validation of scatter correction on known samples (assuming correction works without testing)
       • Different sample types may respond differently to scatter correction
       • Overcorrection or undercorrection goes undetected
       • Fix: Test scatter correction on reference materials; compare corrected/uncorrected downstream metrics
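
The masking fix for atmospheric bands can be sketched as a guard that refuses to compute a band feature inside a masked window. The helper names and the window limits below are illustrative assumptions (CO₂ near 2350 cm⁻¹, water vapor around 3500 to 3950 cm⁻¹); adjust them to the bands actually seen on your instrument.

```python
import numpy as np

# Illustrative atmospheric windows in cm^-1 (assumed, not FoodSpec defaults).
ATMOSPHERIC_REGIONS = [(2300.0, 2400.0), (3500.0, 3950.0)]


def usable_mask(wavenumbers, regions=ATMOSPHERIC_REGIONS):
    """Boolean mask that is False inside any listed atmospheric window."""
    wn = np.asarray(wavenumbers, dtype=float)
    mask = np.ones(wn.shape, dtype=bool)
    for lo, hi in regions:
        mask &= ~((wn >= lo) & (wn <= hi))
    return mask


def band_mean(spectrum, wavenumbers, band, mask):
    """Mean intensity over a band; raises if the band touches a masked region."""
    wn = np.asarray(wavenumbers, dtype=float)
    lo, hi = band
    in_band = (wn >= lo) & (wn <= hi)
    if not in_band.any():
        raise ValueError("band lies outside the wavenumber axis")
    if not mask[in_band].all():
        raise ValueError("band overlaps a masked atmospheric region")
    return float(np.asarray(spectrum, dtype=float)[in_band].mean())


wn = np.linspace(1000.0, 4000.0, 601)
spectrum = np.ones_like(wn)
mask = usable_mask(wn)
carbonyl = band_mean(spectrum, wn, (1700.0, 1780.0), mask)   # allowed band
```

Raising an error, rather than silently returning a value, forces the masked regions to be listed explicitly wherever features are defined. The same pattern works for a mask of interpolated cosmic-ray points.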

When to Use

Use scatter correction when:

  • Variable ATR contact: FTIR-ATR spectra show baseline slopes varying between samples
  • Particle size effects: Raman spectra from powdered samples with different scattering intensities
  • Instrument variations: Comparing spectra from different instruments or measurement conditions
  • Path length differences: Transmission FTIR with inconsistent sample thickness
  • Atmospheric interference: FTIR spectra contaminated by water vapor or CO₂ absorption bands

Use cosmic ray removal when:

  • Raman spectroscopy: CCD detectors susceptible to cosmic ray strikes during acquisition
  • Isolated spikes visible: Sharp, narrow intensity spikes not present in replicate measurements
  • Long integration times: Extended exposures increase cosmic ray probability
  • Single-acquisition spectra: No averaging to suppress random spikes

When NOT to Use (Common Failure Modes)

Avoid scatter correction when:

  • Absolute intensities critical: Quantitative analysis requiring intensity calibration
  • Non-linear scatter effects: Complex matrices where linear MSC assumptions fail
  • Homogeneous samples: Well-controlled measurements with minimal scatter variation
  • After concentration normalization: Internal standard correction already accounts for path length
  • Very different sample types: MSC reference spectrum not representative of test samples

Avoid cosmic ray removal when:

  • True narrow peaks present: Sharp Raman bands (e.g., 520 cm⁻¹ Si) could be mistaken for spikes
  • Already averaged spectra: Averaging multiple acquisitions already suppresses cosmic rays
  • Low threshold risk: Aggressive spike detection might remove real spectral features
  • FTIR spectroscopy: Cosmic rays not relevant; different artifacts require different methods

For MSC (multiplicative scatter correction):

from foodspec.preprocessing import MSCNormalizer
msc = MSCNormalizer()
msc.fit(X_train)  # Compute reference from training data
X_corrected = msc.transform(X_test)
  • Computes mean reference spectrum from training set
  • Corrects linear scatter and offset effects
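
For readers who want to see the arithmetic behind MSC, here is a minimal NumPy sketch of the textbook form: regress each spectrum on the mean training spectrum, then remove the fitted offset and slope. The class name `SimpleMSC` is hypothetical and the code is not claimed to match MSCNormalizer internally.

```python
import numpy as np


class SimpleMSC:
    """Textbook MSC: model each spectrum as x = a + b * reference, return (x - a) / b."""

    def fit(self, X):
        # Reference is the mean spectrum of the training set.
        self.reference_ = np.asarray(X, dtype=float).mean(axis=0)
        return self

    def transform(self, X):
        X = np.asarray(X, dtype=float)
        ref = self.reference_
        ref_centered = ref - ref.mean()
        X_centered = X - X.mean(axis=1, keepdims=True)
        # Ordinary least-squares slope and intercept per spectrum.
        slopes = X_centered @ ref_centered / (ref_centered @ ref_centered)
        intercepts = X.mean(axis=1) - slopes * ref.mean()
        return (X - intercepts[:, None]) / slopes[:, None]


# Synthetic demo: training spectra are scaled and shifted copies of one shape.
axis = np.linspace(0.0, 1.0, 200)
base = np.exp(-((axis - 0.4) / 0.05) ** 2) + 0.5 * np.exp(-((axis - 0.7) / 0.1) ** 2)
X_train = np.vstack([s * base + o for s, o in [(1.0, 0.0), (1.5, 0.2), (0.8, -0.1)]])
X_test = np.vstack([3.0 * base + 0.4])

msc = SimpleMSC().fit(X_train)
X_corrected = msc.transform(X_test)
```

When a test spectrum is an exact linear function of the reference, the correction maps it onto the reference. Departures from that linear model show up as structured residuals, which is the post-MSC check suggested in the red flags above.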

For atmospheric correction (FTIR):

from foodspec.preprocessing.ftir import AtmosphericCorrector
atm = AtmosphericCorrector()
X_corrected = atm.transform(X_ftir)
  • Removes water vapor and CO₂ absorption bands
  • Use when spectra show characteristic atmospheric peaks

For cosmic ray removal (Raman):

from foodspec.preprocessing.raman import CosmicRayRemover
cr = CosmicRayRemover(threshold=5.0, window=5)
X_cleaned = cr.transform(X_raman)
  • threshold=5.0: Detect spikes >5 SD above local median (conservative)
  • window=5: Local window for spike detection (preserves narrow peaks)

For simple ATR correction:

from foodspec.preprocessing.ftir import SimpleATRCorrector
atr = SimpleATRCorrector(angle=45)
X_corrected = atr.transform(X_ftir, wavenumbers=wn)
  • angle=45: Standard ATR crystal angle
  • Applies wavelength-dependent correction
