Decision Guide: Choosing the Right Approach

Purpose: Navigate FoodSpec's methods, workflows, and APIs based on your research goals.

This guide helps you choose the appropriate analysis path by asking: "What am I trying to do?" Each decision leads to specific methods, working examples, and API references.


🎯 Quick Decision Tree

flowchart TD
    Start[What is your goal?] --> Goal{Goal Type}

    Goal -->|Identify/Classify| Class[Classification]
    Goal -->|Quantify| Quant[Quantification]
    Goal -->|Monitor Change| Monitor[Temporal Analysis]
    Goal -->|Compare Instruments| Harm[Harmonization]
    Goal -->|Clean Data| Prep[Preprocessing]

    Class --> ClassType{Known Classes?}
    ClassType -->|Yes, 2+ groups| Auth[Authentication/Discrimination]
    ClassType -->|Unknown patterns| Explore[Exploratory Analysis]

    Quant --> QuantType{What to measure?}
    QuantType -->|Component %| Mix[Mixture Analysis]
    QuantType -->|Continuous property| Reg[Regression]

    Monitor --> MonType{What changes?}
    MonType -->|Quality degradation| Heat[Heating/Aging]
    MonType -->|Batch consistency| QC[Quality Control]

    Harm --> HarmType{Data Type}
    HarmType -->|Different instruments| Calib[Calibration Transfer]
    HarmType -->|Different matrices| Matrix[Matrix Correction]

๐Ÿ” Goal-Based Navigation

1. Classification & Discrimination

1.1 Authenticate or Detect Adulteration

When: You need to distinguish genuine samples from adulterated ones or identify product origin.

Decision factors:

- Small dataset (<100 samples): Use PLS-DA with cross-validation
- Large dataset (>1000 samples): Consider deep learning or ensemble methods
- Interpretability required: Use ratio-based features or VIP scores
- Black-box acceptable: Neural networks or random forests

Matrix considerations:

- Pure oils: Standard preprocessing → classification
- Complex matrices (chips, meat): Add scatter correction (e.g., MSC) before classification

→ Method: Classification & Regression
→ Example: Oil Authentication
→ API: ML & Validation

Typical workflow:

from foodspec import FoodSpec

# Load and preprocess
fs = FoodSpec.from_csv("oils.csv", modality="raman")
fs = fs.baseline_als().normalize_snv()

# Classify
result = fs.classify(
    label_column="oil_type",
    model="pls-da",
    cv_folds=5
)
print(f"Accuracy: {result.accuracy:.1%}")


1.2 Exploratory Analysis (Unknown Groupings)

When: You suspect patterns but don't have labels, or want to discover subgroups.

Decision factors:

- Dimensionality reduction first: Always start with PCA
- Cluster hypothesis testing: Use PERMANOVA or ANOSIM
- Outlier detection: Check before clustering

→ Method: PCA & Dimensionality Reduction
→ Example: Exploratory PCA in Examples
→ API: Chemometrics

Typical workflow:

from foodspec.chemometrics import run_pca

# Run PCA
pca_result = run_pca(
    X=fs.x,
    n_components=5,
    scale=True
)

# Visualize
pca_result.plot_scores(
    labels=fs.metadata["batch"],
    title="Batch Clustering"
)


2. Quantification

2.1 Mixture Analysis (Component Percentages)

When: You need to estimate the percentage composition of known components in mixtures.

Decision factors:

- Known pure references available: Use MCR-ALS or NNLS (an NNLS sketch follows the workflow below)
- No pure references: Use PLS regression with a calibration set
- 2-3 components: Direct peak ratios may suffice
- 4+ components: Multivariate methods required

→ Method: Mixture Models
→ Example: Mixture Analysis
→ API: Chemometrics - Mixture Analysis

Typical workflow:

from foodspec.chemometrics import mcr_als

# MCR-ALS for 3-component mixture
result = mcr_als(
    X=mixture_spectra,
    n_components=3,
    initial_guess=pure_spectra,
    max_iter=100
)

# Get concentrations
concentrations = result.C  # Sample × component
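
When pure reference spectra are available and you only need a quick estimate for a single mixture, non-negative least squares is a lightweight alternative to MCR-ALS. A minimal sketch using SciPy, with synthetic spectra purely for illustration:

import numpy as np
from scipy.optimize import nnls

# Synthetic pure-component spectra (3 components x 500 wavenumbers) and one mixture
rng = np.random.default_rng(0)
pure = rng.random((3, 500))
true_fractions = np.array([0.5, 0.3, 0.2])
mixture = true_fractions @ pure

# NNLS solves mixture ~= pure.T @ c subject to c >= 0
c, residual = nnls(pure.T, mixture)
print("Estimated fractions:", np.round(c / c.sum(), 3))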


2.2 Regression (Continuous Properties)

When: Predict continuous values (moisture %, protein content, peroxide value).

Decision factors:

- Linear relationship expected: PLS regression
- Nonlinear relationships: Random forest, neural networks
- Small calibration set (<50): PLS with careful validation
- Large calibration set (>200): More complex models feasible

→ Method: Classification & Regression
→ Example: Calibration Example
→ API: ML & Validation

Typical workflow:

from foodspec.ml import nested_cross_validate

# PLS regression with nested CV
results = nested_cross_validate(
    X=fs.x,
    y=fs.metadata["moisture_percent"],
    model="pls",
    cv_folds=5,
    n_components_range=[1, 2, 3, 5, 10]
)
print(f"R² = {results['r2']:.3f}, RMSE = {results['rmse']:.2f}")


3. Temporal Analysis & Monitoring

3.1 Heating & Degradation Monitoring

When: Track quality changes over time (oxidation, thermal degradation, shelf life).

Decision factors:

- Known degradation markers: Track specific peak ratios over time
- Unknown mechanisms: Use multivariate time-series analysis
- Predict shelf life: Fit degradation models to ratio trajectories (see the curve-fit sketch after the workflow below)

→ Method: Statistical Analysis
→ Example: Heating Quality Monitoring
→ API: Workflows

Typical workflow:

from foodspec.workflows import analyze_heating_trajectory

# Analyze time series
result = analyze_heating_trajectory(
    spectra=fs,
    time_column="heating_time_min",
    ratio_numerator=1655,  # C=C unsaturation
    ratio_denominator=1440  # CH2 reference
)

# Get shelf life estimate
shelf_life = result.estimate_shelf_life(threshold=0.8)
print(f"Estimated shelf life: {shelf_life} min")
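
If you want to fit the degradation model yourself, the underlying idea can be sketched with SciPy: fit a first-order decay to the ratio trajectory and solve for the time at which it crosses a quality threshold. The numbers below are made up for illustration; the workflow above is the in-library route.

import numpy as np
from scipy.optimize import curve_fit

# Hypothetical I(1655)/I(1440) ratio measured over heating time (minutes)
t = np.array([0, 30, 60, 120, 240, 480])
ratio = np.array([1.00, 0.93, 0.87, 0.76, 0.60, 0.42])

# First-order decay of the unsaturation marker
def decay(t, r0, k):
    return r0 * np.exp(-k * t)

(r0, k), _ = curve_fit(decay, t, ratio, p0=(1.0, 0.001))

# Time at which the ratio drops below a quality threshold of 0.8
t_limit = np.log(r0 / 0.8) / k
print(f"Ratio falls below 0.8 after ~{t_limit:.0f} min")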


3.2 Batch Quality Control

When: Monitor production batches for consistency and drift detection.

Decision factors:

- Continuous monitoring: Control charts with Hotelling's T² (see the sketch after the workflow below)
- Batch comparison: ANOVA or Kruskal-Wallis tests
- Outlier detection: Mahalanobis distance or PCA residuals
- Small batch sizes (<10): Use robust statistics

→ Method: Statistical Study Design
→ Example: Batch QC Workflow
→ API: Statistics

Typical workflow:

from foodspec.qc import check_class_balance, detect_outliers

# Check batch consistency
balance = check_class_balance(fs.metadata, "batch_id")
outliers = detect_outliers(
    fs.x,
    method="mahalanobis",
    threshold=3.0
)

# Statistical comparison at a single peak position
from foodspec.stats import run_anova

peak_idx = 200  # hypothetical column index of the peak of interest
anova_result = run_anova(
    fs.x[:, peak_idx],
    groups=fs.metadata["batch_id"]
)
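
The control-chart option mentioned above (Hotelling's T²) can be sketched from PCA scores: fit PCA on in-control reference batches, then score new batches against it. Synthetic data is used purely for illustration; check whether your FoodSpec version ships a dedicated helper before rolling your own.

import numpy as np
from sklearn.decomposition import PCA

# Synthetic in-control reference spectra and a new production batch
rng = np.random.default_rng(0)
X_ref = rng.normal(size=(60, 500))
X_new = rng.normal(size=(10, 500))

# Fit PCA on the in-control data, then project the new batch
pca = PCA(n_components=3).fit(X_ref)
scores_new = pca.transform(X_new)

# Hotelling's T²: squared scores weighted by the variance of each component
t2 = np.sum(scores_new**2 / pca.explained_variance_, axis=1)
print("T² per new sample:", np.round(t2, 2))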


4. Harmonization & Instrument Comparability

4.1 Different Instruments (Same Sample Type)

When: Combine data from multiple Raman or FTIR instruments measuring the same samples.

Decision factors:

- Standards available: Piecewise Direct Standardization (PDS)
- No standards, but overlapping samples: Direct Standardization (DS; see the sketch after the workflow below)
- Completely different wavelength ranges: May not be harmonizable

→ Method: Harmonization Theory
→ Example: Multi-Instrument Workflow
→ API: Calibration Transfer

Typical workflow:

from foodspec.calibration_transfer import piecewise_direct_standardization

# Transfer from instrument A to B
transfer = piecewise_direct_standardization(
    X_source=spectra_instrument_A,
    X_target=spectra_instrument_B,
    window_size=11
)

# Apply to new measurements
X_harmonized = transfer.transform(X_new_from_A)
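
For the Direct Standardization (DS) option listed above, the core idea fits in a few lines of NumPy: estimate one transfer matrix from samples measured on both instruments. This is an illustration with synthetic data only; foodspec.calibration_transfer is the supported route shown above.

import numpy as np

# Synthetic paired measurements: the same 25 samples on instruments A and B
rng = np.random.default_rng(0)
X_A = rng.normal(size=(25, 400))
X_B = X_A + 0.05 * rng.normal(size=(25, 400))

# Direct Standardization: one least-squares transfer matrix F with X_B ~= X_A @ F
F = np.linalg.pinv(X_A) @ X_B
X_A_on_B_scale = X_A @ F  # instrument-A spectra expressed on instrument B's scale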


4.2 Different Matrices (Same Measurement Goal)

When: Compare oils in pure form vs. oils extracted from fried chips, or milk vs. cheese.

Decision factors:

- Known matrix effects: Apply matrix-specific corrections first
- Unknown effects: Domain adaptation or transfer learning
- Small target-matrix dataset: Use the source-matrix model with caution

→ Method: Matrix Effects
→ Example: Matrix Correction
→ API: Workflows - Matrix Correction

Typical workflow:

from foodspec.matrix_correction import apply_matrix_correction

# Correct for matrix effects
corrected = apply_matrix_correction(
    X_target=chips_spectra,
    X_reference=oil_spectra,
    method="msc"
)


5. Preprocessing & Data Cleaning

5.1 Which Preprocessing Steps Do I Need?

Decision factors by symptom:

| Symptom | Solution | Method | API |
| --- | --- | --- | --- |
| Curved baselines, fluorescence | Baseline correction | Baseline Correction | baseline_als |
| Different intensities, scaling issues | Normalization | Normalization | normalize_snv |
| Noisy spectra, hard to see peaks | Smoothing | Smoothing | savgol_smooth |
| Cosmic ray spikes (Raman) | Spike removal | Cosmic Rays | CosmicRayRemover |
| Overlapping peaks | Derivatives (1st/2nd) | Derivatives | savgol_smooth |
| Scatter effects, particle size | MSC/SNV | Scatter Correction | MSCNormalizer |

Recommended preprocessing order:

1. Cosmic ray removal (if Raman)
2. Baseline correction (if curved backgrounds)
3. Smoothing (if noisy)
4. Normalization (SNV or MSC)
5. Derivatives (optional, for overlapping peaks)
6. Feature extraction or full-spectrum modeling

→ Full Guide: Preprocessing Methods Overview
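
As a rough sketch of that order using the fluent API shown earlier in this guide: only baseline_als() and normalize_snv() appear in the examples above, so the remaining steps are placeholders pointing at the table entries, whose exact method names and signatures may differ in your FoodSpec version. The file name is hypothetical.

from foodspec import FoodSpec

fs = FoodSpec.from_csv("samples.csv", modality="raman")  # hypothetical file

# 1. Cosmic ray removal (Raman only)        -> CosmicRayRemover in the table above
# 2. Baseline correction (curved backgrounds)
fs = fs.baseline_als()
# 3. Smoothing (noisy spectra)              -> savgol_smooth
# 4. Normalization
fs = fs.normalize_snv()
# 5. Derivatives (optional, overlapping peaks)
# 6. Continue to feature extraction or full-spectrum modeling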


📊 Dataset Size & Complexity Guide

Small Datasets (<100 samples)

Challenges: Limited statistical power, risk of overfitting.

Recommended approaches:

- Preprocessing: Conservative (avoid over-smoothing)
- Feature selection: Use a priori knowledge (literature-based peaks)
- Validation: Leave-one-out CV or stratified 5-fold CV
- Models: Simple models (PLS-DA, linear regression)
- Avoid: Deep learning, complex ensemble methods

→ Guide: Study Design - Sample Size
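
A leave-one-out sketch with scikit-learn, using a linear discriminant as a simple stand-in for PLS-DA and synthetic data for illustration; foodspec's classify() shown in section 1.1 is the in-library route for stratified k-fold.

import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import LeaveOneOut, cross_val_score

# Synthetic tiny dataset: 40 spectra, 50 features, 2 classes
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 50))
y = np.repeat([0, 1], 20)

# Leave-one-out keeps every sample available for training
scores = cross_val_score(LinearDiscriminantAnalysis(), X, y, cv=LeaveOneOut())
print(f"LOO accuracy: {scores.mean():.1%}")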


Medium Datasets (100-1000 samples)

Opportunities: Moderate statistical power, can test multiple methods.

Recommended approaches:

- Preprocessing: Standard pipelines
- Feature selection: Data-driven + domain knowledge hybrid
- Validation: Nested cross-validation with a holdout test set
- Models: PLS, random forests, gradient boosting
- Hyperparameter tuning: Grid search feasible

→ Guide: Cross-Validation Best Practices
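
The holdout-plus-tuning pattern can be sketched with scikit-learn: hold out a test set, tune the model by cross-validation on the training portion, then report performance on the untouched holdout. Synthetic data for illustration; foodspec.ml.nested_cross_validate in section 2.2 covers the fully nested variant.

import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic medium-sized calibration set
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 600))
y = 2.0 * X[:, 100] + rng.normal(scale=0.1, size=300)

# Hold out a test set, tune n_components by 5-fold CV on the training portion
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
search = GridSearchCV(
    PLSRegression(),
    param_grid={"n_components": [2, 4, 6, 8, 10]},
    cv=5,
    scoring="neg_root_mean_squared_error",
)
search.fit(X_train, y_train)
print("Best n_components:", search.best_params_["n_components"])
print(f"Held-out R²: {search.best_estimator_.score(X_test, y_test):.3f}")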


Large Datasets (>1000 samples)

Opportunities: High statistical power, can use complex models.

Recommended approaches:

- Preprocessing: Automated pipelines acceptable
- Feature selection: Automatic feature-importance ranking
- Validation: Train/validation/test splits
- Models: Neural networks, deep learning, ensembles
- Advanced techniques: Transfer learning, multi-task learning

→ Guide: Advanced Deep Learning


🧪 Sample Matrix Guide

Pure Liquids (Oils, Solvents)

Characteristics: Minimal scatter, good optical contact.

Preprocessing:

- Baseline: Mild (ALS with conservative parameters)
- Normalization: Vector or area normalization
- Scatter correction: Usually not needed

→ Example: Oil Authentication


Powders & Solids (Flour, Spices)

Characteristics: High scatter from particle size variations.

Preprocessing:

- Baseline: Aggressive (ALS or rubberband)
- Normalization: SNV or MSC (critical)
- Scatter correction: Essential

→ Method: Scatter Correction
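
What SNV and MSC actually do can be written in a few lines of NumPy (illustration only; use the library's normalize_snv / MSCNormalizer in practice):

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 800))  # synthetic powder spectra, one row per sample

# SNV: centre and scale each spectrum by its own mean and standard deviation
X_snv = (X - X.mean(axis=1, keepdims=True)) / X.std(axis=1, keepdims=True)

# MSC: regress each spectrum against the mean spectrum, remove slope and offset
reference = X.mean(axis=0)
X_msc = np.empty_like(X)
for i, spectrum in enumerate(X):
    slope, offset = np.polyfit(reference, spectrum, 1)
    X_msc[i] = (spectrum - offset) / slope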


Emulsions & Suspensions (Milk, Juices)

Characteristics: Complex scatter, heterogeneous.

Preprocessing:

- Baseline: Moderate
- Normalization: MSC with robust mean
- Homogenization: May need sample prep guidance


Tissue & Meat Products

Characteristics: Variable water content, complex matrix.

Preprocessing:

- Baseline: Essential
- Normalization: SNV recommended
- Water bands: May need masking (1640 cm⁻¹, 3200-3600 cm⁻¹)
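
Masking the water bands amounts to a boolean index over the wavenumber axis. A minimal sketch with hypothetical axis and spectra arrays:

import numpy as np

# Hypothetical wavenumber axis (cm⁻¹) and spectra (rows = samples)
wavenumbers = np.linspace(400, 4000, 1800)
spectra = np.random.default_rng(0).normal(size=(20, wavenumbers.size))

# Drop the regions around ~1640 cm⁻¹ and 3200-3600 cm⁻¹ before modelling
water = ((wavenumbers > 1600) & (wavenumbers < 1700)) | \
        ((wavenumbers > 3200) & (wavenumbers < 3600))
spectra_masked = spectra[:, ~water]
wavenumbers_masked = wavenumbers[~water]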


🔗 Cross-Reference Table

| Goal | Method Page | Example | API |
| --- | --- | --- | --- |
| Oil authentication | Classification | Oil Example | ML API |
| Heating monitoring | Statistics | Heating Example | Workflows API |
| Mixture quantification | Mixture Models | Mixture Example | Chemometrics API |
| Hyperspectral mapping | Spatial Analysis | HSI Example | Datasets API |
| Baseline correction | Baseline Methods | Recipe Card #2 | Preprocessing API |
| PCA exploration | PCA Guide | PCA Examples | Chemometrics API |
| Batch QC | Study Design | QC Workflow | Statistics API |
| Multi-instrument | Harmonization | Harmonization Workflow | Workflows API |

🧭 Still Not Sure?

If you're uncertain which approach to use:

  1. Start simple: Run PCA on preprocessed data to visualize structure
  2. Check assumptions: Read Study Design for sample size guidance
  3. Try examples: Run the closest teaching example with your data
  4. Ask for help: See FAQ or community discussions

Common pitfalls to avoid:

- ❌ Applying complex models to small datasets
- ❌ Skipping preprocessing for raw spectra
- ❌ Not validating results properly (train/test leakage)
- ❌ Ignoring matrix effects in heterogeneous samples

Best practices:

- ✅ Start with visualization (PCA, score plots)
- ✅ Use domain knowledge for feature selection
- ✅ Validate rigorously (nested CV or holdout test)
- ✅ Document preprocessing decisions
- ✅ Report uncertainty (confidence intervals, p-values)


📚 Further Reading