Decision Guide: Choosing the Right Approach
Purpose: Navigate FoodSpec's methods, workflows, and APIs based on your research goals.
This guide helps you choose the appropriate analysis path by asking: "What am I trying to do?" Each decision leads to specific methods, working examples, and API references.
Quick Decision Tree
flowchart TD
Start[What is your goal?] --> Goal{Goal Type}
Goal -->|Identify/Classify| Class[Classification]
Goal -->|Quantify| Quant[Quantification]
Goal -->|Monitor Change| Monitor[Temporal Analysis]
Goal -->|Compare Instruments| Harm[Harmonization]
Goal -->|Clean Data| Prep[Preprocessing]
Class --> ClassType{Known Classes?}
ClassType -->|Yes, 2+ groups| Auth[Authentication/Discrimination]
ClassType -->|Unknown patterns| Explore[Exploratory Analysis]
Quant --> QuantType{What to measure?}
QuantType -->|Component %| Mix[Mixture Analysis]
QuantType -->|Continuous property| Reg[Regression]
Monitor --> MonType{What changes?}
MonType -->|Quality degradation| Heat[Heating/Aging]
MonType -->|Batch consistency| QC[Quality Control]
Harm --> HarmType{Data Type}
HarmType -->|Different instruments| Calib[Calibration Transfer]
HarmType -->|Different matrices| Matrix[Matrix Correction]
Goal-Based Navigation
1. Classification & Discrimination
1.1 Authenticate or Detect Adulteration
When: You need to distinguish genuine samples from adulterants or identify product origin.
Decision factors:
- Small dataset (<100 samples): Use PLS-DA with cross-validation
- Large dataset (>1000 samples): Consider deep learning or ensemble methods
- Interpretability required: Use ratio-based features or VIP scores
- Black-box acceptable: Neural networks or random forests
Matrix considerations:
- Pure oils: Standard preprocessing → classification
- Complex matrices (chips, meat): Add scatter correction (e.g., MSC normalization) before classification
→ Method: Classification & Regression
→ Example: Oil Authentication
→ API: ML & Validation
Typical workflow:
from foodspec import FoodSpec
# Load and preprocess
fs = FoodSpec.from_csv("oils.csv", modality="raman")
fs = fs.baseline_als().normalize_snv()
# Classify
result = fs.classify(
    label_column="oil_type",
    model="pls-da",
    cv_folds=5
)
print(f"Accuracy: {result.accuracy:.1%}")
1.2 Exploratory Analysis (Unknown Groupings)
When: You suspect patterns but don't have labels, or want to discover subgroups.
Decision factors:
- Dimensionality reduction first: Always start with PCA
- Cluster hypothesis testing: Use PERMANOVA or ANOSIM
- Outlier detection: Check before clustering
→ Method: PCA & Dimensionality Reduction
→ Example: Exploratory PCA in Examples
→ API: Chemometrics
Typical workflow:
from foodspec.chemometrics import run_pca
# fs is the FoodSpec object loaded and preprocessed as in section 1.1
# Run PCA
pca_result = run_pca(
    X=fs.x,
    n_components=5,
    scale=True
)
# Visualize
pca_result.plot_scores(
    labels=fs.metadata["batch"],
    title="Batch Clustering"
)
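To make the PCA step concrete, the sketch below computes scores, loadings, and explained variance with a plain NumPy SVD on synthetic data. It illustrates the technique only and is not FoodSpec's `run_pca`: the `pca_scores` helper and the two-group synthetic spectra are assumptions made for this example, and the sketch mean-centers the data without the per-channel scaling that `scale=True` suggests.

```python
import numpy as np

def pca_scores(X, n_components=2):
    """Hypothetical helper (not part of FoodSpec): PCA via SVD of centered data."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                          # center each channel
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]  # sample coordinates
    loadings = Vt[:n_components]                     # spectral directions
    explained = (s ** 2) / np.sum(s ** 2)            # variance fraction per PC
    return scores, loadings, explained[:n_components]

# Synthetic data (assumption): two groups offset along one spectral shape
rng = np.random.default_rng(0)
direction = np.sin(np.linspace(0, np.pi, 50))
group = np.repeat([0, 1], 10)
X = np.outer(group * 5.0, direction) + rng.normal(scale=0.1, size=(20, 50))

scores, loadings, explained = pca_scores(X, n_components=2)
print(f"PC1 explains {explained[0]:.1%} of the variance")
```

If the two groups separate along the first score axis, as they do here by construction, that is a hint that labels or clustering may be worth pursuing.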
2. Quantification
2.1 Mixture Analysis (Component Percentages)
When: Estimate % composition of known components in mixtures.
Decision factors:
- Known pure references available: Use MCR-ALS or NNLS
- No pure references: Use PLS regression with calibration set
- 2-3 components: Direct peak ratios may suffice
- 4+ components: Multivariate methods required
→ Method: Mixture Models
→ Example: Mixture Analysis
→ API: Chemometrics - Mixture Analysis
Typical workflow:
from foodspec.chemometrics import mcr_als
# MCR-ALS for 3-component mixture
result = mcr_als(
    X=mixture_spectra,
    n_components=3,
    initial_guess=pure_spectra,
    max_iter=100
)
# Get concentrations
concentrations = result.C  # samples × components
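When pure reference spectra are available, the idea behind linear unmixing can be sketched in a few lines of NumPy. The example below uses ordinary least squares followed by clipping and renormalization, which is a simplified stand-in for NNLS and not an implementation of MCR-ALS. The `unmix_least_squares` helper and the Gaussian "pure spectra" are hypothetical and exist only for illustration.

```python
import numpy as np

def unmix_least_squares(mixtures, pure_spectra):
    """Hypothetical helper: fit each mixture as a linear combination of pure spectra."""
    S = np.asarray(pure_spectra, dtype=float)   # (n_components, n_channels)
    M = np.asarray(mixtures, dtype=float)       # (n_samples, n_channels)
    coeffs, *_ = np.linalg.lstsq(S.T, M.T, rcond=None)  # solve S.T @ c = m
    C = np.clip(coeffs.T, 0.0, None)            # crude non-negativity, not true NNLS
    return C / C.sum(axis=1, keepdims=True)     # each row sums to 1

# Synthetic pure spectra (assumption): three Gaussian bands at different positions
axis = np.linspace(0, 100, 200)
pure = np.vstack([np.exp(-((axis - c) / 8.0) ** 2) for c in (25, 50, 75)])

true_fractions = np.array([[0.7, 0.2, 0.1],
                           [0.1, 0.1, 0.8]])
mixtures = true_fractions @ pure                # noise-free linear mixing

estimated = unmix_least_squares(mixtures, pure)
print(np.round(estimated, 3))
```

With noisy or strongly overlapping spectra, a properly constrained solver is the safer choice, since clipping after an unconstrained fit can bias the fractions.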
2.2 Regression (Continuous Properties)
When: Predict continuous values (moisture %, protein content, peroxide value).
Decision factors:
- Linear relationship expected: PLS regression
- Nonlinear relationships: Random forest, neural networks
- Small calibration set (<50): PLS with careful validation
- Large calibration set (>200): More complex models feasible
→ Method: Classification & Regression
→ Example: Calibration Example
→ API: ML & Validation
Typical workflow:
from foodspec.ml import nested_cross_validate
# PLS regression with nested CV
results = nested_cross_validate(
    X=fs.x,
    y=fs.metadata["moisture_percent"],
    model="pls",
    cv_folds=5,
    n_components_range=[1, 2, 3, 5, 10]
)
print(f"R² = {results['r2']:.3f}, RMSE = {results['rmse']:.2f}")
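For readers who want to check the two reported metrics by hand, here is a minimal NumPy sketch of R² and RMSE on held-out predictions. The `rmse` and `r2` functions and the four example values are illustrative assumptions; FoodSpec may compute or aggregate these differently across folds.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error between reference and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

# Made-up moisture values (%), for illustration only
y_reference = [10.0, 12.0, 14.0, 16.0]
y_predicted = [11.0, 12.0, 13.0, 16.0]

rmse_value = rmse(y_reference, y_predicted)
r2_value = r2(y_reference, y_predicted)
print(f"R² = {r2_value:.3f}, RMSE = {rmse_value:.2f}")
```

RMSE is in the units of the property being predicted, so it is usually the easier of the two to compare against the reference method's own error.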
3. Temporal Analysis & Monitoring
3.1 Heating & Degradation Monitoring
When: Track quality changes over time (oxidation, thermal degradation, shelf life).
Decision factors:
- Known degradation markers: Track specific peak ratios over time
- Unknown mechanisms: Use multivariate time-series analysis
- Predict shelf life: Fit degradation models to ratio trajectories
→ Method: Statistical Analysis
→ Example: Heating Quality Monitoring
→ API: Workflows
Typical workflow:
from foodspec.workflows import analyze_heating_trajectory
# Analyze time series
result = analyze_heating_trajectory(
    spectra=fs,
    time_column="heating_time_min",
    ratio_numerator=1655,   # C=C unsaturation
    ratio_denominator=1440  # CH2 reference
)
# Get shelf life estimate
# Assumes the estimate is returned in the units of time_column (minutes here)
shelf_life = result.estimate_shelf_life(threshold=0.8)
print(f"Estimated shelf life: {shelf_life} min")
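The ratio-trajectory idea can be sketched without FoodSpec: take the intensity ratio at two bands for each time point, fit a simple degradation model, and solve for the time at which the ratio crosses a threshold. The sketch below assumes a linear decline, which real oxidation data may not follow, and uses hypothetical helpers (`band_ratio`, `time_to_threshold`) and synthetic spectra.

```python
import numpy as np

def band_ratio(spectra, wavenumbers, numerator, denominator):
    """Hypothetical helper: intensity ratio at the channels nearest two wavenumbers."""
    wn = np.asarray(wavenumbers, dtype=float)
    spectra = np.asarray(spectra, dtype=float)
    i_num = int(np.argmin(np.abs(wn - numerator)))
    i_den = int(np.argmin(np.abs(wn - denominator)))
    return spectra[:, i_num] / spectra[:, i_den]

def time_to_threshold(times, ratios, threshold):
    """Fit ratio = slope * t + intercept and solve for the threshold crossing."""
    slope, intercept = np.polyfit(times, ratios, deg=1)
    return float((threshold - intercept) / slope)

# Synthetic series (assumption): the 1655 band decays linearly, 1440 stays constant
times_min = np.array([0.0, 10.0, 20.0, 30.0, 40.0])
wavenumbers = np.arange(1400, 1701, 5)
spectra = np.ones((times_min.size, wavenumbers.size))
spectra[:, wavenumbers == 1655] = (1.0 - 0.005 * times_min)[:, None]

ratios = band_ratio(spectra, wavenumbers, numerator=1655, denominator=1440)
t_cross = time_to_threshold(times_min, ratios, threshold=0.8)
print(f"Ratio reaches 0.8 after about {t_cross:.0f} min")
```

The result is in the same units as the time axis, so a time column in minutes yields an estimate in minutes.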
3.2 Batch Quality Control
When: Monitor production batches for consistency and drift detection.
Decision factors:
- Continuous monitoring: Control charts with Hotelling's T²
- Batch comparison: ANOVA or Kruskal-Wallis tests
- Outlier detection: Mahalanobis distance or PCA residuals
- Small batch sizes (<10): Use robust statistics
→ Method: Statistical Study Design
→ Example: Batch QC Workflow
→ API: Statistics
Typical workflow:
from foodspec.qc import check_class_balance, detect_outliers
from foodspec.stats import run_anova
# Check batch consistency
balance = check_class_balance(fs.metadata, "batch_id")
outliers = detect_outliers(
    fs.x,
    method="mahalanobis",
    threshold=3.0
)
# Statistical comparison at one peak
# peak_idx: column index of the peak of interest (define it for your data)
anova_result = run_anova(
    fs.x[:, peak_idx],
    groups=fs.metadata["batch_id"]
)
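As a rough illustration of what a Mahalanobis-based outlier check does, the sketch below computes each sample's distance from the data mean, scaled by the sample covariance, and flags distances above 3.0. The `mahalanobis_distances` helper and the toy two-dimensional data are assumptions for this example and say nothing about how `detect_outliers` is implemented. With real spectra, the distances are usually computed on a few PCA scores, because the covariance of full spectra is singular when channels outnumber samples.

```python
import numpy as np

def mahalanobis_distances(X):
    """Hypothetical helper: distance of each row from the mean, scaled by covariance."""
    X = np.asarray(X, dtype=float)
    diff = X - X.mean(axis=0)
    cov = np.cov(X, rowvar=False)
    inv_cov = np.linalg.pinv(cov)               # pseudo-inverse tolerates near-singularity
    d2 = np.einsum("ij,jk,ik->i", diff, inv_cov, diff)
    return np.sqrt(np.maximum(d2, 0.0))

# Toy data (assumption): 20 points on a unit circle plus one distant sample
angles = np.linspace(0, 2 * np.pi, 20, endpoint=False)
inliers = np.column_stack([np.cos(angles), np.sin(angles)])
data = np.vstack([inliers, [[10.0, 10.0]]])

distances = mahalanobis_distances(data)
flags = distances > 3.0
print("Flagged sample indices:", np.flatnonzero(flags).tolist())
```

Because the outlier itself inflates the mean and covariance, a robust estimator or a leave-one-out scheme is often preferable for small batches.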
4. Harmonization & Instrument Comparability
4.1 Different Instruments (Same Sample Type)
When: Combine data from multiple Raman or FTIR instruments measuring the same samples.
Decision factors:
- Standards available: Piecewise Direct Standardization (PDS)
- No standards, overlapping samples: Direct Standardization (DS)
- Completely different wavelength ranges: May not be harmonizable
→ Method: Harmonization Theory
→ Example: Multi-Instrument Workflow
→ API: Calibration Transfer
Typical workflow:
from foodspec.calibration_transfer import piecewise_direct_standardization
# Transfer from instrument A to B
transfer = piecewise_direct_standardization(
    X_source=spectra_instrument_A,
    X_target=spectra_instrument_B,
    window_size=11
)
# Apply to new measurements
X_harmonized = transfer.transform(X_new_from_A)
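The simpler global variant, Direct Standardization, can be sketched in NumPy as a single least-squares transfer matrix estimated from samples measured on both instruments. This is not the piecewise (windowed) method used above and not FoodSpec's implementation. The `fit_direct_standardization` helper and the synthetic instrument response are assumptions for illustration.

```python
import numpy as np

def fit_direct_standardization(X_source, X_target):
    """Hypothetical helper: least-squares F such that X_source @ F approximates X_target."""
    X_source = np.asarray(X_source, dtype=float)
    X_target = np.asarray(X_target, dtype=float)
    return np.linalg.pinv(X_source) @ X_target

# Synthetic paired measurements (assumption): instrument B attenuates and
# slightly smears each channel of instrument A
rng = np.random.default_rng(1)
n_channels = 6
X_A = rng.normal(size=(30, n_channels))
true_map = 0.9 * np.eye(n_channels) + 0.05 * np.eye(n_channels, k=1)
X_B = X_A @ true_map

F = fit_direct_standardization(X_A, X_B)

# Map new instrument-A spectra into instrument B's response
X_new_A = rng.normal(size=(5, n_channels))
X_new_as_B = X_new_A @ F
```

A full transfer matrix needs at least as many paired samples as channels to be well determined, which is one reason windowed approaches such as PDS are common for real spectra.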
4.2 Different Matrices (Same Measurement Goal)
When: Compare oils in pure form vs. oils extracted from fried chips, or milk vs. cheese.
Decision factors:
- Known matrix effects: Apply matrix-specific corrections first
- Unknown effects: Domain adaptation or transfer learning
- Small target matrix data: Use source matrix model with caution
→ Method: Matrix Effects
→ Example: Matrix Correction
→ API: Workflows - Matrix Correction
Typical workflow:
from foodspec.matrix_correction import apply_matrix_correction
# Correct for matrix effects
corrected = apply_matrix_correction(
    X_target=chips_spectra,
    X_reference=oil_spectra,
    method="msc"
)
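Multiplicative scatter correction (MSC), the method named in the call above, regresses each spectrum on a reference spectrum and removes the fitted offset and slope. The sketch below shows that core step in NumPy. The `msc` helper and the synthetic spectra are assumptions; how `apply_matrix_correction` chooses its reference or handles edge cases is not covered here.

```python
import numpy as np

def msc(spectra, reference=None):
    """Hypothetical helper: regress each spectrum on a reference, then undo offset and slope."""
    X = np.asarray(spectra, dtype=float)
    ref = X.mean(axis=0) if reference is None else np.asarray(reference, dtype=float)
    corrected = np.empty_like(X)
    for i, x in enumerate(X):
        slope, intercept = np.polyfit(ref, x, deg=1)   # x ≈ intercept + slope * ref
        corrected[i] = (x - intercept) / slope
    return corrected

# Synthetic example (assumption): the same band seen with different offset and gain
axis = np.linspace(0, 100, 100)
reference = np.exp(-((axis - 50) / 10.0) ** 2)
distorted = np.vstack([0.2 + 1.5 * reference,
                       -0.1 + 0.7 * reference])

corrected = msc(distorted, reference=reference)
```

After correction both rows coincide with the reference, which is the behavior to expect when the distortion is purely additive and multiplicative.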
5. Preprocessing & Data Cleaning
5.1 Which Preprocessing Steps Do I Need?
Decision factors by symptom:
| Symptom | Solution | Method | API |
|---|---|---|---|
| Curved baselines, fluorescence | Baseline correction | Baseline Correction | baseline_als |
| Different intensities, scaling issues | Normalization | Normalization | normalize_snv |
| Noisy spectra, hard to see peaks | Smoothing | Smoothing | savgol_smooth |
| Cosmic ray spikes (Raman) | Spike removal | Cosmic Rays | CosmicRayRemover |
| Overlapping peaks | Derivatives (1st/2nd) | Derivatives | savgol_smooth |
| Scatter effects, particle size | MSC/SNV | Scatter Correction | MSCNormalizer |
Recommended preprocessing order:
1. Cosmic ray removal (if Raman)
2. Baseline correction (if curved backgrounds)
3. Smoothing (if noisy)
4. Normalization (SNV or MSC)
5. Derivatives (optional, for overlapping peaks)
6. Feature extraction or full-spectrum modeling
→ Full Guide: Preprocessing Methods Overview
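The ordering above can be expressed as a chain of functions applied one after another. The sketch below covers only steps 3 to 5, with deliberately simple stand-ins: a moving average instead of Savitzky-Golay smoothing, SNV for normalization, and a finite-difference first derivative. All function names are hypothetical and none of them are FoodSpec APIs.

```python
import numpy as np

def moving_average(spectra, window=5):
    """Simple smoothing stand-in (assumption: not Savitzky-Golay)."""
    X = np.asarray(spectra, dtype=float)
    kernel = np.ones(window) / window
    return np.apply_along_axis(
        lambda row: np.convolve(row, kernel, mode="same"), 1, X
    )

def snv(spectra):
    """Standard normal variate: center and scale each spectrum individually."""
    X = np.asarray(spectra, dtype=float)
    mean = X.mean(axis=1, keepdims=True)
    std = X.std(axis=1, ddof=1, keepdims=True)
    return (X - mean) / std

def first_derivative(spectra):
    """Finite-difference derivative along the spectral axis."""
    return np.gradient(np.asarray(spectra, dtype=float), axis=1)

def apply_in_order(spectra, steps):
    """Apply each preprocessing step in the order given."""
    for step in steps:
        spectra = step(spectra)
    return spectra

# Synthetic spectra (assumption): same shape, different offset and gain
axis = np.linspace(0, 1, 200)
raw = np.vstack([2 + np.sin(6 * axis), 5 + 3 * np.sin(6 * axis)])

normalized = apply_in_order(raw, [moving_average, snv])
derived = first_derivative(normalized)
```

Keeping the steps in an explicit list makes the order easy to document and to reproduce on new data.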
Dataset Size & Complexity Guide
Small Datasets (<100 samples)
Challenges: Limited statistical power, risk of overfitting.
Recommended approaches:
- Preprocessing: Conservative (avoid over-smoothing)
- Feature selection: Use a priori knowledge (literature-based peaks)
- Validation: Leave-one-out CV or stratified 5-fold CV
- Models: Simple models (PLS-DA, linear regression)
- Avoid: Deep learning, complex ensemble methods
→ Guide: Study Design - Sample Size
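Stratified k-fold validation, recommended above for small datasets, keeps class proportions similar in every fold. The following standard-library sketch deals samples of each class into folds round-robin. The `stratified_folds` helper and the oil labels are illustrative assumptions, and the sketch does no shuffling, which a real validation scheme normally would.

```python
from collections import defaultdict

def stratified_folds(labels, n_folds=5):
    """Hypothetical helper: spread each class evenly across n_folds test folds."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    folds = [[] for _ in range(n_folds)]
    for indices in by_class.values():
        for position, index in enumerate(indices):
            folds[position % n_folds].append(index)   # deal out round-robin
    return folds

# Made-up labels: an imbalanced two-class set of 15 samples
labels = ["olive"] * 10 + ["sunflower"] * 5
folds = stratified_folds(labels, n_folds=5)

for number, test_indices in enumerate(folds):
    classes = [labels[i] for i in test_indices]
    print(number, classes.count("olive"), classes.count("sunflower"))
```

Each fold serves once as the test set while the remaining folds form the training set, so every sample is predicted exactly once.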
Medium Datasets (100-1000 samples)
Opportunities: Moderate statistical power, can test multiple methods.
Recommended approaches:
- Preprocessing: Standard pipelines
- Feature selection: Data-driven + domain knowledge hybrid
- Validation: Nested cross-validation with holdout test set
- Models: PLS, random forests, gradient boosting
- Hyperparameter tuning: Grid search feasible
→ Guide: Cross-Validation Best Practices
Large Datasets (>1000 samples)
Opportunities: High statistical power, can use complex models.
Recommended approaches:
- Preprocessing: Automated pipelines acceptable
- Feature selection: Automatic feature importance ranking
- Validation: Train/validation/test splits
- Models: Neural networks, deep learning, ensembles
- Advanced techniques: Transfer learning, multi-task learning
→ Guide: Advanced Deep Learning
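A train/validation/test split for a large dataset can be as simple as shuffling the sample indices once and slicing them. The sketch below assumes independent samples and uses a hypothetical `three_way_split` helper with 15% validation and 15% test fractions chosen purely for illustration. If several spectra come from the same physical sample or batch, splitting by group instead is usually needed to avoid leakage.

```python
import numpy as np

def three_way_split(n_samples, val_fraction=0.15, test_fraction=0.15, seed=0):
    """Hypothetical helper: shuffle indices once, then slice into three disjoint sets."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    n_test = int(round(n_samples * test_fraction))
    n_val = int(round(n_samples * val_fraction))
    test = order[:n_test]
    val = order[n_test:n_test + n_val]
    train = order[n_test + n_val:]
    return train, val, test

train_idx, val_idx, test_idx = three_way_split(2000)
print(len(train_idx), len(val_idx), len(test_idx))
```

The validation set is used for model selection and tuning, while the test set stays untouched until the final evaluation.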
Sample Matrix Guide
Pure Liquids (Oils, Solvents)
Characteristics: Minimal scatter, good optical contact.
Preprocessing:
- Baseline: Mild (ALS with conservative parameters)
- Normalization: Vector or area normalization
- Scatter correction: Usually not needed
→ Example: Oil Authentication
Powders & Solids (Flour, Spices)
Characteristics: High scatter from particle size variations.
Preprocessing:
- Baseline: Aggressive (ALS or rubberband)
- Normalization: SNV or MSC (critical)
- Scatter correction: Essential
→ Method: Scatter Correction
Emulsions & Suspensions (Milk, Juices)
Characteristics: Complex scatter, heterogeneous.
Preprocessing:
- Baseline: Moderate
- Normalization: MSC with robust mean
- Homogenization: May need sample prep guidance
Tissue & Meat Products
Characteristics: Variable water content, complex matrix.
Preprocessing:
- Baseline: Essential
- Normalization: SNV recommended
- Water bands: May need masking (1640 cm⁻¹, 3200-3600 cm⁻¹)
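Masking water bands amounts to dropping the affected channels before modeling. The sketch below builds a boolean keep-mask over a wavenumber axis. The `mask_regions` helper is hypothetical, and the ±40 cm⁻¹ window around 1640 cm⁻¹ is an assumption made here, since the guide gives only the band position.

```python
import numpy as np

def mask_regions(wavenumbers, regions):
    """Hypothetical helper: True for channels to keep, False inside any (low, high) region."""
    wn = np.asarray(wavenumbers, dtype=float)
    keep = np.ones(wn.shape, dtype=bool)
    for low, high in regions:
        keep &= ~((wn >= low) & (wn <= high))
    return keep

# Example axis (assumption): 400-4000 cm^-1 in 10 cm^-1 steps
wavenumbers = np.arange(400, 4001, 10)
water_regions = [(1600, 1680),    # assumed window around the 1640 cm^-1 band
                 (3200, 3600)]    # O-H stretch region from the guide

keep = mask_regions(wavenumbers, water_regions)
spectra = np.ones((3, wavenumbers.size))   # placeholder spectra
masked_spectra = spectra[:, keep]
masked_wavenumbers = wavenumbers[keep]
```

Applying the same mask to the wavenumber axis keeps spectra and axis aligned for later plotting and peak lookup.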
Cross-Reference Table
| Goal | Method Page | Example | API |
|---|---|---|---|
| Oil authentication | Classification | Oil Example | ML API |
| Heating monitoring | Statistics | Heating Example | Workflows API |
| Mixture quantification | Mixture Models | Mixture Example | Chemometrics API |
| Hyperspectral mapping | Spatial Analysis | HSI Example | Datasets API |
| Baseline correction | Baseline Methods | Recipe Card #2 | Preprocessing API |
| PCA exploration | PCA Guide | PCA Examples | Chemometrics API |
| Batch QC | Study Design | QC Workflow | Statistics API |
| Multi-instrument | Harmonization | Harmonization Workflow | Workflows API |
Still Not Sure?
If you're uncertain which approach to use:
- Start simple: Run PCA on preprocessed data to visualize structure
- Check assumptions: Read Study Design for sample size guidance
- Try examples: Run the closest teaching example with your data
- Ask for help: See FAQ or community discussions
Common pitfalls to avoid:
- Applying complex models to small datasets
- Skipping preprocessing for raw spectra
- Not validating results properly (train/test leakage)
- Ignoring matrix effects in heterogeneous samples
Best practices:
- Start with visualization (PCA, score plots)
- Use domain knowledge for feature selection
- Validate rigorously (nested CV or holdout test)
- Document preprocessing decisions
- Report uncertainty (confidence intervals, p-values)
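One way to report uncertainty for a classification result is a percentile bootstrap confidence interval on held-out accuracy. The sketch below resamples per-sample correctness with replacement. The `bootstrap_accuracy_ci` helper, the number of resamples, and the made-up predictions are assumptions for illustration, and other interval methods may suit small test sets better.

```python
import numpy as np

def bootstrap_accuracy_ci(y_true, y_pred, n_boot=2000, alpha=0.05, seed=0):
    """Hypothetical helper: percentile bootstrap interval for accuracy."""
    correct = (np.asarray(y_true) == np.asarray(y_pred)).astype(float)
    rng = np.random.default_rng(seed)
    idx = rng.integers(0, correct.size, size=(n_boot, correct.size))
    boot_accuracy = correct[idx].mean(axis=1)        # accuracy of each resample
    low, high = np.quantile(boot_accuracy, [alpha / 2, 1 - alpha / 2])
    return float(correct.mean()), float(low), float(high)

# Made-up held-out predictions: 40 samples, 4 misclassified
y_true = np.array([0] * 20 + [1] * 20)
y_pred = y_true.copy()
y_pred[[0, 1, 20, 21]] = 1 - y_pred[[0, 1, 20, 21]]

accuracy, ci_low, ci_high = bootstrap_accuracy_ci(y_true, y_pred)
print(f"Accuracy: {accuracy:.1%} (95% CI {ci_low:.1%} to {ci_high:.1%})")
```

The interval is only meaningful when the predictions come from data the model never saw during training or tuning.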
Further Reading
- For method details: Methods Overview
- For worked examples: Examples Gallery
- For API documentation: API Reference
- For theory: Spectroscopy Basics
- For troubleshooting: Common Problems