Common Problems & Solutions

Purpose: Systematically diagnose and fix issues across all FoodSpec workflow stages.
Audience: Users troubleshooting preprocessing, ML, stats, or reporting steps.
Time to read: 20–30 minutes (reference guide; read sections as needed).
Prerequisites: Basic knowledge of your FoodSpec workflow stage.

Quick Problem Index

Stage	Problem	Symptoms	Quick Fix
Acquisition	Baseline drift	Sloping/curved baseline	ALS baseline correction with lambda ~1e5
Acquisition	Saturation	Flat-topped peaks	Lower laser power or integration time; re-acquire
Acquisition	Wavenumber drift	Peak shifts vs reference	Recalibrate instrument; check `validate_spectrum_set`
Acquisition	Low SNR	Noisy spectra	Longer integration; better optics; smoothing
Metadata	Missing labels	Unknown class IDs	Use `check_missing_metadata`; repair metadata
Metadata	Class imbalance	Poor minority recall	Use F1/PR metrics; resample or weight classes
Metadata	Mislabeled samples	Outlier confusions	Audit via PCA; verify and relabel
Preprocessing	Over-smoothing	Peak loss	Shrink Savitzky–Golay window or raise polynomial order
Preprocessing	Poor baseline removal	Residual slope	Tune ALS lambda; try rubberband baseline
Preprocessing	Scatter not removed	Intensity drift persists	Apply SNV or MSC normalization
ML	Overfitting	High train, low test accuracy	Regularize; simplify; use stratified CV
ML	Data leakage	Unrealistic CV scores	Ensure preprocessing inside Pipeline
ML	Imbalanced predictions	Minority class ignored	Use class_weight or SMOTE; report F1_macro
DL	Diverging loss	NaNs during training	Lower learning rate; add normalization
Stats	Non-normal residuals	Failed assumptions	Use nonparametric tests (Kruskal–Wallis)
Stats	Multiple comparisons	Many marginal p-values	Apply FDR/Tukey correction
Visualization	Unlabeled axes	Ambiguous plots	Label wavenumber (cm⁻¹), intensity (a.u.)
Reporting	Missing configs	Cannot reproduce	Export run_metadata.json; save configs
Workflow	Task/metric mismatch	Metrics irrelevant to the decision	Clarify the goal; consult the Workflow Design Guide

A. Instrument & Acquisition Problems

Baseline drift / fluorescence

Why: sample fluorescence, laser instability, optics heating
Symptoms: sloping/curved baseline; high low-frequency power
Diagnose: overlay raw spectra; inspect the residual baseline after ALS or rubberband correction; estimate SNR with estimate_snr
Fix: apply baseline correction (ALS/rubberband); reduce laser power or integration time; recalibrate the instrument
Re-acquire: if the baseline consumes most of the dynamic range or varies widely from run to run
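A minimal sketch of asymmetric least squares (ALS) baseline estimation in the style of Eilers and Boelens, using NumPy and SciPy sparse matrices. The name `als_baseline` and its defaults (`lam=1e5`, `p=0.01`, `n_iter=10`) are illustrative assumptions, not FoodSpec's API. Larger `lam` gives a stiffer baseline; smaller `p` keeps the baseline further below the peaks.

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import spsolve


def als_baseline(y, lam=1e5, p=0.01, n_iter=10):
    """Asymmetric least squares baseline (Eilers & Boelens style sketch)."""
    y = np.asarray(y, dtype=float)
    n = y.size
    # Second-difference operator D, shape (n - 2, n); lam * D'D penalizes curvature
    D = sparse.diags([1.0, -2.0, 1.0], [0, 1, 2], shape=(n - 2, n))
    penalty = lam * (D.T @ D)
    w = np.ones(n)
    for _ in range(n_iter):
        W = sparse.diags(w)
        # Weighted, penalized least-squares fit of a smooth curve z to y
        z = spsolve((W + penalty).tocsc(), w * y)
        # Points above the fit (peaks) get weight p, points below get 1 - p
        w = np.where(y > z, p, 1.0 - p)
    return z
```

Subtract the result (`y - als_baseline(y)`) and check that the corrected spectrum sits near zero away from the peaks.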

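Saturation / clipping

Why: detector overload from excessive laser power or integration time
Symptoms: flat-topped peaks, abrupt ceiling
Diagnose: inspect raw intensities; plot a histogram of intensities (a spike at the maximum value indicates clipping)
Fix: lower laser power, shorten integration time; re-acquire clipped spectra, since clipped peak shapes cannot be recovered in software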
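Clipping usually appears as several consecutive points stuck at the same maximum. The sketch below counts the longest such run; `longest_run_at_ceiling` is a hypothetical helper, and passing your detector's full-scale value as `ceiling` (if you know it) is more reliable than falling back to the spectrum's own maximum.

```python
import numpy as np


def longest_run_at_ceiling(y, ceiling=None, rel_tol=1e-3):
    """Length of the longest run of consecutive points at (or within rel_tol of) the ceiling."""
    y = np.asarray(y, dtype=float)
    # Without a known detector full-scale value, fall back to the spectrum's own maximum
    ceiling = y.max() if ceiling is None else float(ceiling)
    at_ceiling = y >= ceiling - abs(ceiling) * rel_tol
    longest = current = 0
    for flag in at_ceiling:
        current = current + 1 if flag else 0
        longest = max(longest, current)
    return longest
```

An unclipped peak typically gives a run of 1; runs of several points suggest saturation.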

Wavenumber misalignment

Why: calibration drift, temperature, instrument change
Symptoms: peak shifts relative to reference spectra
Diagnose: compare peak positions of known standards; cross-correlate spectra against a reference
Fix: recalibrate the instrument; apply alignment/cropping consistently; re-acquire if the shift is unstable
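To quantify a shift, cross-correlate each spectrum against a reference measured on the same grid. This sketch (hypothetical `estimate_shift`, whole-point resolution only) returns the lag in points; multiply it by the grid spacing, e.g. `np.median(np.diff(wavenumbers))`, to express it in cm⁻¹.

```python
import numpy as np


def estimate_shift(reference, spectrum, max_lag=50):
    """Integer lag (points) maximizing the cross-correlation of spectrum against reference.

    Positive lag: features in `spectrum` sit at higher indices than in `reference`.
    Both inputs must share the same grid.
    """
    ref = np.asarray(reference, dtype=float)
    spc = np.asarray(spectrum, dtype=float)
    ref = ref - ref.mean()
    spc = spc - spc.mean()
    n = ref.size
    lags = np.arange(-max_lag, max_lag + 1)
    scores = []
    for k in lags:
        # Compare ref[i] with spc[i + k] over the overlapping range
        r = ref[max(0, -k): n - max(0, k)]
        s = spc[max(0, k): n - max(0, -k)]
        scores.append(float(np.dot(r, s)))
    return int(lags[int(np.argmax(scores))])
```

A lag that varies between runs corresponds to the unstable-shift case, where re-acquisition is the safer fix.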

Low SNR

Why: weak scattering/absorption, poor focus, dirty optics
Symptoms: noisy spectra, unstable ratios
Diagnose: run estimate_snr; look for high-frequency noise and poor reproducibility across replicates
Fix: longer integration, more accumulations, better sample prep/optics cleaning; smoothing; re-acquire if SNR remains too low
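For a quick, instrument-independent check, the sketch below estimates the white-noise level from first differences (for independent noise with standard deviation σ, consecutive differences have standard deviation √2·σ) and divides the peak height above the median by it. This heuristic is an assumption for illustration and is not necessarily how `estimate_snr` computes SNR; very sharp bands slightly inflate the noise estimate.

```python
import numpy as np


def rough_snr(y):
    """Heuristic SNR: peak height above the median divided by a noise estimate."""
    y = np.asarray(y, dtype=float)
    # For white noise with std sigma, consecutive differences have std sqrt(2) * sigma
    noise = np.std(np.diff(y)) / np.sqrt(2.0)
    signal = y.max() - np.median(y)
    return float(signal / noise) if noise > 0 else float("inf")
```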

B. Dataset & Metadata Problems

Missing or inconsistent metadata

Why: incomplete logs, manual entry errors
Symptoms: unknown labels, mismatched sample IDs
Diagnose: run check_missing_metadata; cross-check unique ID counts; look for failed joins between spectra and metadata
Fix: repair metadata files; enforce required columns; re-export if gaps persist
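A pandas sketch of these checks: missing required columns, empty values, duplicate sample IDs, and IDs that fail to join between spectra and metadata. The helper name `metadata_report` and the `sample_id` column name are assumptions; `check_missing_metadata` may report things differently.

```python
import pandas as pd


def metadata_report(meta, spectra_ids, required_cols, id_col="sample_id"):
    """Summarize common metadata problems; assumes `meta` has an `id_col` column."""
    missing_cols = [c for c in required_cols if c not in meta.columns]
    present = [c for c in required_cols if c in meta.columns]
    nulls = meta[present].isna().sum()
    duplicate_ids = meta.loc[meta[id_col].duplicated(), id_col].tolist()
    # Outer join with indicator=True labels each ID as both / left_only / right_only
    joined = pd.DataFrame({id_col: list(spectra_ids)}).merge(
        meta[[id_col]].drop_duplicates(), on=id_col, how="outer", indicator=True
    )
    return {
        "missing_columns": missing_cols,
        "null_counts": {c: int(v) for c, v in nulls.items() if v > 0},
        "duplicate_ids": duplicate_ids,
        "spectra_without_metadata": joined.loc[joined["_merge"] == "left_only", id_col].tolist(),
        "metadata_without_spectra": joined.loc[joined["_merge"] == "right_only", id_col].tolist(),
    }
```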

Class imbalance

Why: rare adulteration/spoilage cases
Symptoms: high accuracy, poor minority recall
Diagnose: summarize_class_balance; confusion matrix asymmetry; PR curves
Fix: resample or weight classes; use F1/PR metrics; collect more minority samples
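Class weights are commonly set inversely proportional to class frequency, w_c = n / (k · n_c) for n samples and k classes; this is the formula scikit-learn uses for `class_weight="balanced"`. A standard-library sketch with a hypothetical `class_balance` helper:

```python
from collections import Counter


def class_balance(labels):
    """Counts per class, 'balanced' weights n / (k * n_c), and the max/min imbalance ratio."""
    counts = Counter(labels)
    n, k = sum(counts.values()), len(counts)
    weights = {cls: n / (k * count) for cls, count in counts.items()}
    ratio = max(counts.values()) / min(counts.values())
    return dict(counts), weights, ratio
```

The weights dict can be passed to scikit-learn estimators that accept a `class_weight` mapping.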

Mislabeled samples

Why: data entry or sample mix-up
Symptoms: persistent outliers, impossible confusion errors
Diagnose: PCA score outliers; detect_outliers; high leverage points
Fix: audit sample IDs; remove/relabel after verification; re-acquire if uncertain
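One way to audit labels in PCA space: project spectra onto the first few principal components (via SVD of the mean-centered matrix) and flag samples whose scores lie closer to another class's centroid than to their own. This nearest-centroid check is an illustrative assumption, not what `detect_outliers` implements; flagged indices are candidates for verification, not automatic relabeling.

```python
import numpy as np


def pca_scores(X, n_components=2):
    """PCA scores via SVD of the mean-centered data matrix."""
    Xc = X - X.mean(axis=0)
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :n_components] * S[:n_components]


def suspicious_labels(X, labels, n_components=2):
    """Indices of samples whose scores are nearer another class's centroid than their own."""
    scores = pca_scores(np.asarray(X, dtype=float), n_components)
    labels = np.asarray(labels)
    classes = np.unique(labels)
    centroids = np.array([scores[labels == c].mean(axis=0) for c in classes])
    # Euclidean distance from every sample to every class centroid: (n_samples, n_classes)
    dists = np.linalg.norm(scores[:, None, :] - centroids[None, :, :], axis=2)
    nearest = classes[np.argmin(dists, axis=1)]
    return np.flatnonzero(nearest != labels)
```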

C. Preprocessing & Chemometric Problems

Over-smoothing / under-smoothing

Symptoms: peak loss or excessive noise
Diagnose: compare raw vs smoothed overlays; SNR changes
Fix: for peak loss, shrink the Savitzky–Golay window or raise the polynomial order; for residual noise, widen the window; skip smoothing if it is not needed
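The trade-off can be checked numerically by measuring how much peak height survives smoothing. This sketch uses SciPy's `savgol_filter`; `peak_retention` is a hypothetical helper. For a narrow peak, a wider window lowers retention, while a higher polynomial order at the same window raises it.

```python
import numpy as np
from scipy.signal import savgol_filter


def peak_retention(y, window_length, polyorder):
    """Smoothed maximum divided by raw maximum (1.0 = peak height fully preserved)."""
    y = np.asarray(y, dtype=float)
    smoothed = savgol_filter(y, window_length=window_length, polyorder=polyorder)
    return float(smoothed.max() / y.max())
```

Scanning a few `(window_length, polyorder)` pairs on a representative spectrum shows where peak height starts to drop.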

Baseline not removed / over-corrected

Symptoms: residual slope or negative artifacts
Diagnose: inspect corrected spectra for negative regions; check the mean spectrum for residual drift
Fix: tune ALS lambda/p; try rubberband/polynomial baselines; crop to the region of interest before computing ratios
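A rubberband baseline can be sketched as the lower convex hull of the spectrum, interpolated back onto the wavenumber axis. This version (hypothetical `rubberband_baseline`, built on Andrew's monotone-chain hull) assumes ascending wavenumbers. Because the hull never rises above the data, the corrected spectrum is non-negative by construction; for ALS or polynomial baselines, the fraction of points below zero (`np.mean(corrected < 0)`) is a quick over-correction check.

```python
import numpy as np


def rubberband_baseline(x, y):
    """Lower convex hull of (x, y), linearly interpolated back onto x. Assumes ascending x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    hull = []  # indices of points on the lower hull, left to right
    for i in range(len(x)):
        while len(hull) >= 2:
            i0, i1 = hull[-2], hull[-1]
            # cross <= 0: point i1 lies on or above the chord i0 -> i, so drop it
            cross = (x[i1] - x[i0]) * (y[i] - y[i0]) - (y[i1] - y[i0]) * (x[i] - x[i0])
            if cross <= 0:
                hull.pop()
            else:
                break
        hull.append(i)
    return np.interp(x, x[hull], y[hull])
```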

Scatter/normalization issues

Symptoms: intensity scaling differences remain
Diagnose: compare the spread of per-spectrum norms across samples before and after SNV, MSC, or vector normalization
Fix: use SNV/MSC; ensure consistent application within pipelines (no leakage)
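Minimal NumPy sketches of standard normal variate (SNV: center and scale each spectrum by its own mean and standard deviation) and multiplicative scatter correction (MSC: regress each spectrum on a reference, then remove the fitted offset and slope). The function names are illustrative. SNV uses only the spectrum itself, whereas the MSC reference is estimated from data, so compute it on training spectra only.

```python
import numpy as np


def snv(X):
    """Standard normal variate: center and scale each spectrum (row) by its own mean and std."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=1, keepdims=True)) / X.std(axis=1, ddof=1, keepdims=True)


def msc(X, reference=None):
    """Multiplicative scatter correction against `reference` (default: mean of X)."""
    X = np.asarray(X, dtype=float)
    ref = X.mean(axis=0) if reference is None else np.asarray(reference, dtype=float)
    corrected = np.empty_like(X)
    for i, row in enumerate(X):
        # Fit row ≈ intercept + slope * ref, then invert that linear distortion
        slope, intercept = np.polyfit(ref, row, deg=1)
        corrected[i] = (row - intercept) / slope
    return corrected, ref
```

Typical use: `train_corr, ref = msc(X_train)`, then `test_corr, _ = msc(X_test, reference=ref)`.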

Peak picking / ratios unstable

Symptoms: large variance in peak height/area; missing peaks
Diagnose: visualize peak windows; check wavenumber alignment; inspect window tolerance
Fix: adjust expected peaks/tolerance; ensure ascending wavenumbers; consider smoothing/cropping first
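A sketch of window-based peak height with an explicit ordering check. `peak_height` and `peak_ratio` are hypothetical helpers: they take the maximum intensity within `center ± tolerance` (cm⁻¹) and raise an error on descending or non-monotonic axes instead of silently returning the wrong band.

```python
import numpy as np


def peak_height(wavenumbers, y, center, tolerance):
    """(position, height) of the maximum within center ± tolerance; axis must be ascending."""
    wn = np.asarray(wavenumbers, dtype=float)
    y = np.asarray(y, dtype=float)
    if np.any(np.diff(wn) <= 0):
        raise ValueError("wavenumbers must be strictly ascending; sort them (with intensities) first")
    idx = np.flatnonzero((wn >= center - tolerance) & (wn <= center + tolerance))
    if idx.size == 0:
        raise ValueError(f"no points within {center} ± {tolerance} cm^-1")
    best = idx[np.argmax(y[idx])]
    return float(wn[best]), float(y[best])


def peak_ratio(wavenumbers, y, band_a, band_b, tolerance=5.0):
    """Ratio of peak heights near band_a and band_b."""
    _, height_a = peak_height(wavenumbers, y, band_a, tolerance)
    _, height_b = peak_height(wavenumbers, y, band_b, tolerance)
    return height_a / height_b
```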

D. Machine Learning Problems

Overfitting

Symptoms: high training accuracy, low test/CV accuracy
Diagnose: compare CV metrics with training metrics; plot learning curves
Fix: simplify the model, regularize, add data, improve preprocessing; use stratified CV; compute metrics with compute_classification_metrics
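Comparing training and cross-validated scores makes the gap explicit. A scikit-learn sketch; the helper `train_cv_gap` and the two example models are assumptions. An unconstrained decision tree typically scores perfectly on training folds regardless of signal, so a large gap flags overfitting.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier


def train_cv_gap(model, X, y, n_splits=5, seed=0):
    """Mean training vs. mean held-out accuracy across stratified CV folds."""
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    res = cross_validate(model, X, y, cv=cv, scoring="accuracy", return_train_score=True)
    return float(np.mean(res["train_score"])), float(np.mean(res["test_score"]))


# Example models: an unconstrained tree (prone to memorizing) vs. a regularized linear model
deep_tree = DecisionTreeClassifier(random_state=0)
regularized = make_pipeline(StandardScaler(), LogisticRegression(C=0.1, max_iter=1000))
```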

Data leakage

Symptoms: unrealistically high CV scores
Diagnose: confirm that every data-dependent preprocessing step (scaling, feature selection) sits inside the Pipeline so it is fit only on training folds; check that no label-derived information enters the features
Fix: wrap preprocessing+model in a single pipeline; redo splits; re-evaluate
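The effect of leakage is easy to demonstrate on pure noise. In this scikit-learn sketch, selecting features on the full dataset before cross-validation yields an optimistic score, while the same steps inside a `Pipeline` are re-fit on each training fold and score near chance. The data are random, and the settings (`k=20`, 5 folds) are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 2000))              # pure-noise "spectra"
y = rng.permutation(np.repeat([0, 1], 30))   # labels unrelated to X
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Leaky: features chosen using ALL samples, including those later used as test folds
X_selected = SelectKBest(f_classif, k=20).fit_transform(X, y)
leaky_model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
leaky = cross_val_score(leaky_model, X_selected, y, cv=cv).mean()

# Correct: scaling and selection are re-fit inside each training fold
honest_model = make_pipeline(StandardScaler(), SelectKBest(f_classif, k=20),
                             LogisticRegression(max_iter=1000))
honest = cross_val_score(honest_model, X, y, cv=cv).mean()

print(f"leaky CV accuracy: {leaky:.2f}, pipeline CV accuracy: {honest:.2f}")
```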

Imbalanced performance

Symptoms: minority class misclassified
Diagnose: confusion matrix by class; PR curves; class balance summary
Fix: class weights, resampling, threshold tuning; report F1_macro, balanced accuracy
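Per-class metrics can be computed directly from predictions. This NumPy sketch (hypothetical `per_class_report`) returns precision, recall, F1, and support per class, plus macro-averaged F1 and balanced accuracy (mean per-class recall). A classifier that always predicts the majority class can look accurate while its balanced accuracy stays at chance (0.5 for two classes).

```python
import numpy as np


def per_class_report(y_true, y_pred):
    """Per-class precision/recall/F1/support plus macro-F1 and balanced accuracy."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    report = {}
    for c in np.unique(np.concatenate([y_true, y_pred])):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        report[c] = {"precision": float(precision), "recall": float(recall),
                     "f1": float(f1), "support": int(tp + fn)}
    macro_f1 = float(np.mean([r["f1"] for r in report.values()]))
    # Balanced accuracy = mean recall over classes that actually occur in y_true
    balanced_acc = float(np.mean([r["recall"] for r in report.values() if r["support"] > 0]))
    return report, macro_f1, balanced_acc
```

scikit-learn's `classification_report`, `f1_score(..., average="macro")`, and `balanced_accuracy_score` compute the same quantities.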

E. Deep Learning Problems

Unstable training / divergence

Symptoms: loss oscillations, NaNs
Diagnose: monitor loss/metrics per epoch; check learning rate/batch size
Fix: lower learning rate, use normalization, add early stopping/dropout; ensure sufficient data
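A framework-agnostic early-stopping helper using only the standard library; the `EarlyStopping` class is an illustrative assumption, not a FoodSpec or deep-learning-library API. Call `step(val_loss)` once per epoch: it stops on a non-finite loss, which usually signals divergence, or after `patience` epochs without improvement.

```python
import math


class EarlyStopping:
    """Stop when validation loss is non-finite or has not improved for `patience` epochs."""

    def __init__(self, patience=5, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = math.inf
        self.bad_epochs = 0
        self.reason = None

    def step(self, val_loss):
        """Return True if training should stop after this epoch."""
        if not math.isfinite(val_loss):
            self.reason = "non-finite loss (try a lower learning rate)"
            return True
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        if self.bad_epochs >= self.patience:
            self.reason = f"no improvement for {self.patience} epochs"
            return True
        return False
```

Restoring the weights from the best epoch is left to the training loop.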

Overfitting with small data

Symptoms: train ≫ val performance
Diagnose: validation curves; high variance metrics
Fix: regularize; use data augmentation where appropriate; prefer classical models when data are limited
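If augmentation is appropriate for your spectra, a minimal NumPy sketch might combine additive noise, multiplicative intensity scaling, and small axis shifts. The helper `augment_spectra` is hypothetical and all magnitudes are placeholder assumptions; they should reflect the variation your instrument actually shows. Augment training folds only, after splitting, and repeat labels to match with `np.tile(y, n_copies)`.

```python
import numpy as np


def augment_spectra(X, n_copies=5, noise_sd=0.01, scale_sd=0.05, max_shift=2, seed=0):
    """Return n_copies perturbed versions of X stacked vertically (placeholder magnitudes)."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    copies = []
    for _ in range(n_copies):
        # Small whole-point shifts along the axis; np.roll wraps values around the ends
        shifts = rng.integers(-max_shift, max_shift + 1, size=len(X))
        shifted = np.stack([np.roll(row, s) for row, s in zip(X, shifts)])
        # Multiplicative intensity scaling per spectrum, then additive Gaussian noise
        scale = rng.normal(1.0, scale_sd, size=(len(X), 1))
        copies.append(shifted * scale + rng.normal(0.0, noise_sd, size=X.shape))
    return np.concatenate(copies, axis=0)
```

Because `np.roll` wraps points around the ends, crop a few edge points if that matters for your analysis.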

F. Statistical Problems

Violating test assumptions

Symptoms: non-normal residuals, heteroscedasticity
Diagnose: residual plots, normality tests, Levene's test
Fix: transform data, use nonparametric tests (run_kruskal_wallis, run_mannwhitney_u); report effect sizes
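The `run_*` functions named above are FoodSpec's; the sketch below shows the same checks using `scipy.stats` directly (`shapiro`, `levene`, `kruskal`) and adds the rank-based effect size η²_H = (H − k + 1) / (n − k) for k groups and n total observations. The wrapper name is hypothetical, and residual plots remain a useful complement to the p-value checks.

```python
import numpy as np
from scipy import stats


def kruskal_with_diagnostics(*groups):
    """Assumption checks plus Kruskal–Wallis and the eta-squared(H) effect size."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    k = len(groups)
    n = sum(g.size for g in groups)
    h, p = stats.kruskal(*groups)
    return {
        # Shapiro–Wilk p per group: small values suggest non-normality
        "shapiro_p": [float(stats.shapiro(g).pvalue) for g in groups],
        # Levene p: small values suggest unequal variances
        "levene_p": float(stats.levene(*groups).pvalue),
        "H": float(h),
        "p": float(p),
        "eta2_H": float((h - k + 1) / (n - k)),
    }
```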

Multiple comparisons without correction

Symptoms: many marginal p-values
Diagnose: count the tests performed; check for inconsistent significance across related comparisons
Fix: apply Tukey or FDR correction; emphasize effect sizes; consolidate hypotheses
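Benjamini–Hochberg FDR adjustment is short enough to sketch in NumPy: sort the p-values, scale the i-th smallest by m/i, enforce monotonicity from the largest down, and map back to the original order. `bh_adjust` is a hypothetical name; in practice `statsmodels.stats.multitest.multipletests(p, method="fdr_bh")` provides the same adjustment, and `scipy.stats.tukey_hsd` (SciPy ≥ 1.8) covers Tukey's HSD.

```python
import numpy as np


def bh_adjust(pvalues):
    """Benjamini–Hochberg adjusted p-values (FDR), returned in the input order."""
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    order = np.argsort(p)
    scaled = p[order] * m / np.arange(1, m + 1)  # p_(i) * m / i
    # Running minimum from the largest p-value down keeps adjusted values monotone
    adjusted_sorted = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.minimum(adjusted_sorted, 1.0)
    return adjusted
```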

G. Visualization Problems

Misleading scales / unlabeled axes

Symptoms: hard-to-read plots; ambiguous units
Diagnose: review plots; check legends/units
Fix: label wavenumber (cm⁻¹), intensity (a.u.), class labels, sample counts; use consistent ranges

Overplotting / clutter

Symptoms: unreadable overlays with many samples
Diagnose: high-density overlays
Fix: show mean ± CI, subset samples, use transparency
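A Matplotlib sketch combining the fixes from both visualization problems: one mean spectrum per class with a shaded band of mean ± 1.96·SEM (a normal-approximation 95% CI), a transparent fill, labeled axes, and sample counts in the legend. The function name is illustrative, and each class needs at least two spectra for the band to be defined.

```python
import numpy as np
import matplotlib

matplotlib.use("Agg")  # headless backend for scripts/CI; omit in notebooks
import matplotlib.pyplot as plt


def plot_class_means(wavenumbers, X, labels, ax=None):
    """Mean spectrum per class with a mean ± 1.96·SEM band (normal-approximation 95% CI)."""
    if ax is None:
        _, ax = plt.subplots(figsize=(8, 4))
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    for cls in np.unique(labels):
        Xc = X[labels == cls]
        mean = Xc.mean(axis=0)
        half_width = 1.96 * Xc.std(axis=0, ddof=1) / np.sqrt(len(Xc))
        (line,) = ax.plot(wavenumbers, mean, label=f"{cls} (n={len(Xc)})")
        # Transparent band in the same color as the mean line
        ax.fill_between(wavenumbers, mean - half_width, mean + half_width,
                        color=line.get_color(), alpha=0.25, linewidth=0)
    ax.set_xlabel("Wavenumber (cm$^{-1}$)")
    ax.set_ylabel("Intensity (a.u.)")
    ax.legend(title="Class")
    return ax
```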

H. Reporting & Reproducibility Problems

Missing pipeline/config trace

Symptoms: cannot reproduce metrics or plots later
Diagnose: absent configs, missing run_metadata.json
Fix: use export_run_metadata; record preprocessing, models, metrics, versions
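As an illustration of what such a record can hold, here is a standard-library sketch that captures a timestamp, Python and platform versions, installed package versions, configuration, and metrics. The function and its field names are assumptions and do not describe the schema written by `export_run_metadata`.

```python
import json
import platform
from datetime import datetime, timezone
from importlib import metadata


def write_run_record(path, config, metrics, packages=("numpy", "scipy", "scikit-learn")):
    """Collect run provenance into a dict and, if `path` is given, write it as JSON."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # package not installed in this environment
    record = {
        "timestamp_utc": datetime.now(timezone.utc).isoformat(),
        "python": platform.python_version(),
        "platform": platform.platform(),
        "packages": versions,
        "config": config,
        "metrics": metrics,
    }
    if path is not None:
        with open(path, "w", encoding="utf-8") as fh:
            json.dump(record, fh, indent=2, sort_keys=True)
    return record
```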

Ambiguous metrics

Symptoms: headline accuracy without class counts or CI
Diagnose: incomplete reporting
Fix: include per-class metrics, supports, CIs/bootstraps; link to metrics documentation
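A percentile-bootstrap sketch for a confidence interval on any metric computed from held-out predictions. `bootstrap_ci` is hypothetical, and 2000 resamples with a 95% interval are conventional defaults, not requirements. It resamples test predictions only, so it reflects test-set sampling variability rather than retraining variability; if several spectra come from the same physical sample, resample by sample instead.

```python
import numpy as np


def bootstrap_ci(y_true, y_pred, metric, n_boot=2000, alpha=0.05, seed=0):
    """Point estimate and percentile-bootstrap (1 - alpha) CI for metric(y_true, y_pred)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    rng = np.random.default_rng(seed)
    n = len(y_true)
    boot = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample test cases with replacement
        boot[b] = metric(y_true[idx], y_pred[idx])
    lo, hi = np.quantile(boot, [alpha / 2, 1 - alpha / 2])
    return float(metric(y_true, y_pred)), float(lo), float(hi)


def accuracy(t, p):
    """Example metric: fraction of matching labels."""
    return float(np.mean(t == p))
```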

I. Workflow Design Problems

Unclear question → wrong pipeline

Symptoms: metrics irrelevant to decision (e.g., accuracy on rare event)
Diagnose: revisit scientific goal; map task to metrics/models
Fix: consult Workflow Design Guide; pick appropriate metrics/models

Insufficient replicates / imbalance

Symptoms: unstable metrics across splits
Diagnose: high variance across CV folds; summarize_class_balance
Fix: collect more data; use robust CV; consider effect sizes and uncertainty reporting
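Repeating stratified k-fold with different shuffles shows how much a metric moves between splits. A scikit-learn sketch; the helper `cv_score_spread` and its default scaler plus logistic-regression model are assumptions. Report the spread alongside the mean.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def cv_score_spread(X, y, model=None, n_splits=5, n_repeats=10, scoring="f1_macro", seed=0):
    """Summarize how a CV metric varies across repeated stratified splits."""
    if model is None:
        model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    cv = RepeatedStratifiedKFold(n_splits=n_splits, n_repeats=n_repeats, random_state=seed)
    scores = cross_val_score(model, X, y, cv=cv, scoring=scoring)
    return {
        "mean": float(scores.mean()),
        "sd": float(scores.std(ddof=1)),
        "min": float(scores.min()),
        "max": float(scores.max()),
        "n_folds": int(scores.size),
    }
```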

J. Operational / User Errors

Wrong file format / path

Symptoms: loader failures
Diagnose: check detect_format and file extensions; consult the instrument file formats guide
Fix: convert to supported formats (CSV, JCAMP; SPC/OPUS via optional extras)

Mismatched wavenumber ordering

Symptoms: shape errors, misaligned peaks
Diagnose: ensure ascending wavenumbers; validate with validate_spectrum_set
Fix: sort wavenumbers in ascending order and reorder the intensity columns with the same permutation; re-export if needed
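Sorting must move intensities together with their wavenumbers. A NumPy sketch (hypothetical `sort_by_wavenumber`) that applies one `argsort` permutation to both and rejects duplicate or mismatched axes:

```python
import numpy as np


def sort_by_wavenumber(wavenumbers, X):
    """Return ascending wavenumbers and X (2-D, samples x points) with columns permuted to match."""
    wn = np.asarray(wavenumbers, dtype=float)
    X = np.atleast_2d(np.asarray(X, dtype=float))
    if X.shape[1] != wn.size:
        raise ValueError(f"X has {X.shape[1]} columns but there are {wn.size} wavenumbers")
    order = np.argsort(wn, kind="stable")
    wn_sorted = wn[order]
    if np.any(np.diff(wn_sorted) == 0):
        raise ValueError("duplicate wavenumbers; merge or drop them before sorting")
    # The same permutation is applied to the axis and to every spectrum's columns
    return wn_sorted, X[:, order]
```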

FoodSpec Utilities for Diagnosis

estimate_snr(spectrum): rough SNR estimate
summarize_class_balance(labels): counts per class
detect_outliers(X, method="pca_distance"): simple outlier flagging
check_missing_metadata(df, required_cols): ensure metadata completeness

When to Re-acquire Data

Severe saturation/clipping; unstable baselines consuming dynamic range
Wavenumber calibration drift not correctable in software
Extremely low SNR that preprocessing cannot salvage
Persistent metadata mislabeling that cannot be resolved

See Also

FAQ – Frequently asked questions
Troubleshooting Guide – Step-by-step debugging for errors
Reporting & Reproducibility – Document results and validation
How to Cite – Citation instructions for FoodSpec