Workflows: Reproducible Analysis Patterns

Reproducible end‑to‑end workflows for authentication, degradation monitoring, mixture analysis, harmonization and hyperspectral mapping.


🗺️ Workflow Categories

Authentication & Identification

Determine what a sample is (classification).

| Workflow | Problem | Time | Difficulty |
| --- | --- | --- | --- |
| Oil Authentication | "What oil is this?" / "Is it adulterated?" | 30 min | Beginner |
| Matrix Effects | Compare markers across matrices (oils vs chips) | 40 min | Applied |

When to use: Verify authenticity, detect fraud, classify unknowns into known categories.


Degradation & Thermal Monitoring

Track chemical changes over time, temperature, or storage.

| Workflow | Problem | Time | Difficulty |
| --- | --- | --- | --- |
| Heating & Quality Monitoring | Track oxidation/degradation during frying | 35 min | Beginner |
| Aging Workflows | Monitor shelf-life and storage stability | 40 min | Applied |
| Batch Quality Control | Detect drift, outliers, and batch-to-batch variation | 45 min | Applied |

When to use: Monitor frying cycles, predict shelf-life, detect off-spec batches, study degradation kinetics.


Adulteration & Mixture Analysis

Quantify components in blends or detect contamination.

| Workflow | Problem | Time | Difficulty |
| --- | --- | --- | --- |
| Mixture Analysis | Quantify adulteration levels (e.g., 10% seed oil in olive) | 40 min | Applied |
| Calibration & Regression | Build calibration curves for quantitative prediction | 50 min | Advanced |

When to use: Quantify adulterants, build concentration models, detect contamination thresholds.


Harmonization & Instrument Effects

Handle multi-instrument data or transfer models.

| Workflow | Problem | Time | Difficulty |
| --- | --- | --- | --- |
| Harmonization & Automated Calibration | Transfer models between instruments, correct batch effects | 60 min | Advanced |
| Standard Templates | Create reusable workflow templates for common tasks | 45 min | Advanced |

When to use: Combine data from multiple instruments, transfer models to new sites, standardize QA protocols.


Spatial & Hyperspectral Analysis

Map chemical composition across surfaces.

| Workflow | Problem | Time | Difficulty |
| --- | --- | --- | --- |
| Hyperspectral Mapping | Map contaminants, coatings, or ROIs on surfaces | 50 min | Advanced |

When to use: Visualize spatial distribution, segment regions of interest, analyze surface coatings.


Workflow Design & Reporting

Meta-workflow for creating new analysis pipelines.

| Workflow | Problem | Time | Difficulty |
| --- | --- | --- | --- |
| Workflow Design & Reporting | Design custom workflows with proper documentation | 60 min | Advanced |

When to use: Build new domain-specific workflows, document analysis procedures, ensure reproducibility.


📋 Workflow Structure

Every FoodSpec workflow follows a consistent template:

1. Standard Header

  • Purpose: One-sentence problem statement
  • When to Use: Specific scenarios where this workflow applies
  • Inputs: Required data format and metadata columns
  • Outputs: Expected results (plots, tables, metrics)
  • Assumptions: What the workflow assumes about your data

2. Minimal Reproducible Example (MRE)

  • Synthetic data generator or bundled example dataset
  • Copy-paste code that runs without external files
  • Complete workflow from load → preprocess → model → results
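
As a rough illustration of what such an MRE might look like, the sketch below generates synthetic spectra and runs load → preprocess → model → results with NumPy only. It does not use FoodSpec's API: the function names, peak positions, class labels, and the nearest-centroid classifier are hypothetical stand-ins chosen to keep the example self-contained.

```python
import numpy as np

def make_synthetic_spectra(n_per_class=30, seed=0):
    """Generate two classes of toy Raman-like spectra on a 600-1800 cm^-1 axis."""
    rng = np.random.default_rng(seed)
    wavenumbers = np.linspace(600.0, 1800.0, 601)

    def peak(center, width, height):
        return height * np.exp(-0.5 * ((wavenumbers - center) / width) ** 2)

    # Hypothetical class signatures: "oil_b" has a stronger band near 1655.
    templates = {
        "oil_a": peak(1440, 15, 1.0) + peak(1655, 12, 0.3),
        "oil_b": peak(1440, 15, 1.0) + peak(1655, 12, 0.9),
    }
    spectra, labels = [], []
    for label, template in templates.items():
        for _ in range(n_per_class):
            # Sloped background of random strength plus white noise.
            baseline = rng.uniform(0.0, 0.1) * (wavenumbers - 600.0) / 1200.0
            noise = rng.normal(0.0, 0.02, wavenumbers.size)
            spectra.append(template + baseline + noise)
            labels.append(label)
    return wavenumbers, np.array(spectra), np.array(labels)

def l2_normalize(spectra):
    """Scale each spectrum (row) to unit Euclidean norm."""
    return spectra / np.linalg.norm(spectra, axis=1, keepdims=True)

def nearest_centroid_accuracy(X, y, seed=0):
    """Hold out every other shuffled sample and classify by closest class mean."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(y))
    train, test = order[::2], order[1::2]
    classes = np.unique(y[train])
    centroids = np.stack([X[train][y[train] == c].mean(axis=0) for c in classes])
    dists = np.linalg.norm(X[test][:, None, :] - centroids[None, :, :], axis=2)
    predicted = classes[dists.argmin(axis=1)]
    return float((predicted == y[test]).mean())

wavenumbers, spectra, labels = make_synthetic_spectra()
accuracy = nearest_centroid_accuracy(l2_normalize(spectra), labels)
print(f"{spectra.shape[0]} spectra x {spectra.shape[1]} points, "
      f"hold-out accuracy = {accuracy:.2f}")
```

Because the data are generated from a fixed seed, the script needs no external files and gives the same result on every run, which is the property an MRE is meant to have.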

3. Validation & Sanity Checks

  • Success indicators: What plots/metrics look like when working correctly
  • Failure indicators: Red flags that something is wrong
  • Quality thresholds: Minimum acceptable performance

4. Parameters You Must Justify

  • Critical parameters (baseline λ, smoothing window, CV folds)
  • When to adjust from defaults
  • How to document parameter choices
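
One possible way to document parameter choices is a small machine-readable log stored next to the results. The sketch below is an assumption, not a FoodSpec feature: the `ParameterChoice` class, the field names, and the justification texts are hypothetical, while the default values are the ones listed on this page.

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class ParameterChoice:
    name: str
    value: float
    default: float
    justification: str

    @property
    def changed(self) -> bool:
        """True when the chosen value differs from the documented default."""
        return self.value != self.default

# Illustrative entries only; the justifications are made-up examples.
choices = [
    ParameterChoice("baseline_lambda", 1e5, 1e4,
                    "Broad background; a stiffer baseline avoided clipping peaks."),
    ParameterChoice("smoothing_window", 21, 21,
                    "Default kept; noise level acceptable on visual inspection."),
    ParameterChoice("cv_folds", 5, 5,
                    "Default kept; enough samples per fold."),
]

def parameter_log(choices):
    """Serialize choices to JSON, flagging values that differ from defaults."""
    records = [{**asdict(c), "changed_from_default": c.changed} for c in choices]
    return json.dumps(records, indent=2)

print(parameter_log(choices))
```

Keeping the default next to the chosen value makes it easy for a reviewer to see which parameters were adjusted and why.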

🚀 Quick Start Guide

New to FoodSpec?

  1. Start with Oil Authentication (simplest workflow)
  2. Try Heating & Quality Monitoring (time-series analysis)
  3. Explore Workflow Design & Reporting (custom workflows)

Have your own data?

  1. Check the Inputs section of the relevant workflow
  2. Ensure your data matches the format (CSV or HDF5 with required metadata)
  3. Run the MRE with your data path substituted
  4. Review Validation & Sanity Checks to verify results
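
Before substituting your own data path, it can help to confirm that the file carries the metadata columns the workflow expects. The following is a minimal sketch using only the standard library. The workflow keys and the wide CSV layout (one row per spectrum, metadata columns followed by wavenumber columns) are assumptions; the column names come from the Example Data Requirements table on this page, with optional columns left out.

```python
import csv
import io

# Required metadata columns per workflow (hypothetical keys, names from the
# Example Data Requirements table; columns marked optional are omitted).
REQUIRED_COLUMNS = {
    "oil_authentication": {"oil_type"},
    "heating_monitoring": {"heating_time"},
    "mixture_analysis": {"concentration", "mixture_type"},
    "batch_qc": {"batch", "date", "instrument"},
    "harmonization": {"instrument_id", "batch"},
}

def missing_columns(csv_file, workflow):
    """Return the required metadata columns absent from the CSV header, sorted."""
    header = next(csv.reader(csv_file), [])
    present = {name.strip() for name in header}
    return sorted(REQUIRED_COLUMNS[workflow] - present)

# In-memory CSV for demonstration; for a real file use
# open("your_data.csv", newline="") instead.
example = io.StringIO("sample_id,oil_type,600.0,602.0\ns1,olive,0.12,0.13\n")
print(missing_columns(example, "oil_authentication"))  # []
example.seek(0)
print(missing_columns(example, "mixture_analysis"))    # ['concentration', 'mixture_type']
```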

Building a new workflow?

  1. Read Workflow Design & Reporting
  2. Use Standard Templates as a starting point
  3. Follow the standard structure (Header → MRE → Validation → Parameters)

🔍 Choosing the Right Workflow

Decision Tree

What's your goal?
├─ Identify/classify samples?
│  └─ Oil Authentication
├─ Track degradation over time?
│  ├─ Heating cycles? → Heating & Quality Monitoring
│  └─ Storage/shelf-life? → Aging Workflows
├─ Quantify adulterants?
│  ├─ Discrete levels? → Mixture Analysis
│  └─ Continuous concentration? → Calibration & Regression
├─ Handle multiple instruments?
│  └─ Harmonization & Automated Calibration
├─ Map surfaces spatially?
│  └─ Hyperspectral Mapping
└─ Build custom workflow?
   └─ Workflow Design & Reporting

📊 Workflow Comparison

| Feature | Authentication | Degradation | Adulteration | Harmonization |
| --- | --- | --- | --- | --- |
| Output Type | Classification | Regression/Trends | Quantification | Model Transfer |
| Metadata Required | Labels | Time/Temperature | Concentration | Instrument ID |
| Typical Duration | 30–40 min | 35–45 min | 40–50 min | 60+ min |
| Model Type | RF, SVM, PLS-DA | Linear, ANCOVA | NNLS, MCR-ALS | DS, PDS, ComBat |
| Validation | CV + Confusion Matrix | R², RMSE, Trends | R², Calibration Curve | Transfer Accuracy |

⚙️ Common Parameters Across Workflows

Preprocessing (Universal)

  • Baseline correction: ALS (λ=1e4, p=0.01) to remove background curvature
  • Smoothing: Savitzky-Golay (window=21, polyorder=3) to reduce noise
  • Normalization: SNV (zero mean, unit variance per spectrum) or L2 (unit norm) to put spectra on a common scale
  • Cropping: Spectral region (e.g., 600–1800 cm⁻¹) to focus on informative peaks
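
The four preprocessing steps above can be sketched with NumPy and SciPy as follows. This is an illustrative implementation under stated assumptions, not FoodSpec's own code: `als_baseline` follows the commonly used asymmetric least squares recipe (second-difference penalty, iteratively reweighted), the function names are hypothetical, and the step order (crop, baseline, smooth, normalize) is one reasonable choice rather than a documented requirement.

```python
import numpy as np
from scipy import sparse
from scipy.signal import savgol_filter
from scipy.sparse.linalg import spsolve

def als_baseline(y, lam=1e4, p=0.01, n_iter=10):
    """Estimate a smooth baseline under y by asymmetric least squares."""
    n = y.size
    # Second-difference operator; lam controls baseline stiffness.
    D = sparse.diags([1.0, -2.0, 1.0], [0, 1, 2], shape=(n - 2, n))
    penalty = lam * (D.T @ D)
    w = np.ones(n)
    for _ in range(n_iter):
        W = sparse.diags(w, 0)
        z = spsolve((W + penalty).tocsc(), w * y)
        # Points above the baseline (peaks) get the small weight p.
        w = np.where(y > z, p, 1.0 - p)
    return z

def preprocess(wavenumbers, spectra, crop=(600.0, 1800.0)):
    """Crop, baseline-correct, smooth, and SNV-normalize spectra (rows)."""
    mask = (wavenumbers >= crop[0]) & (wavenumbers <= crop[1])
    wn, X = wavenumbers[mask], spectra[:, mask]
    X = np.array([row - als_baseline(row, lam=1e4, p=0.01) for row in X])
    X = savgol_filter(X, window_length=21, polyorder=3, axis=1)
    # SNV: subtract each spectrum's mean and divide by its standard deviation.
    X = (X - X.mean(axis=1, keepdims=True)) / X.std(axis=1, keepdims=True)
    return wn, X

# Toy input: one Gaussian band on a curved background, three noisy replicates.
rng = np.random.default_rng(0)
wavenumbers = np.arange(400.0, 2001.0, 2.0)
band = np.exp(-0.5 * ((wavenumbers - 1440.0) / 15.0) ** 2)
background = 0.5 + 0.3 * ((wavenumbers - 400.0) / 1600.0) ** 2
raw = np.array([band + background + rng.normal(0.0, 0.01, wavenumbers.size)
                for _ in range(3)])

wn, X = preprocess(wavenumbers, raw)
print(wn[0], wn[-1], X.shape)
```

A larger λ makes the estimated baseline stiffer, and a smaller p pushes it further below the peaks, so both are worth inspecting visually on a few spectra before fixing them for a study.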

Modeling (Task-Specific)

  • Classification: Random Forest (n_trees=100, max_depth=None)
  • Regression: Linear or Ridge (α=1.0)
  • Validation: 5-fold stratified CV (for classification), 5-fold CV (for regression)
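
For orientation, these defaults could be expressed with scikit-learn roughly as below. The mapping is an assumption: the page's `n_trees` is taken to correspond to scikit-learn's `n_estimators`, and the feature matrix is random toy data standing in for preprocessed spectra, so the scores say nothing about real samples.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold, StratifiedKFold, cross_val_score

rng = np.random.default_rng(0)

# Toy feature matrix standing in for preprocessed spectra (120 samples x 20 points).
X = rng.normal(size=(120, 20))
y_class = np.repeat([0, 1], 60)
X[y_class == 1, 5:10] += 1.5                                  # class 1 has a stronger "band"
y_conc = 2.0 * X[:, 12] + rng.normal(scale=0.1, size=120)     # toy continuous target

# Classification: Random Forest with 5-fold stratified CV.
clf = RandomForestClassifier(n_estimators=100, max_depth=None, random_state=0)
clf_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
acc = cross_val_score(clf, X, y_class, cv=clf_cv, scoring="accuracy")

# Regression: Ridge with plain 5-fold CV.
reg = Ridge(alpha=1.0)
reg_cv = KFold(n_splits=5, shuffle=True, random_state=0)
r2 = cross_val_score(reg, X, y_conc, cv=reg_cv, scoring="r2")

print(f"RF accuracy: {acc.mean():.2f} +/- {acc.std():.2f}")
print(f"Ridge R^2:   {r2.mean():.2f} +/- {r2.std():.2f}")
```

Stratification keeps the class proportions roughly equal in every fold, which matters most when classes are imbalanced or sample counts are small.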

Reporting (Universal)

  • Plots: Confusion matrix, PCA scores, ratio trends, calibration curves
  • Tables: Metrics (accuracy, R², RMSE), feature importance, ANOVA results
  • Narrative: report.md summarizing findings
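
To make the metrics concrete, the sketch below computes accuracy, RMSE, and R² from their usual definitions and renders them into a small Markdown table. The `render_report` helper and the report layout are hypothetical and do not reproduce FoodSpec's auto-generated report.md; the labels and concentrations are toy values.

```python
import math

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def render_report(title, metrics):
    """Render a title and a metric table as Markdown text."""
    lines = [f"# {title}", "", "| Metric | Value |", "| --- | --- |"]
    lines += [f"| {name} | {value:.3f} |" for name, value in metrics.items()]
    return "\n".join(lines) + "\n"

labels_true = ["olive", "olive", "sunflower", "sunflower"]
labels_pred = ["olive", "sunflower", "sunflower", "sunflower"]
conc_true = [0.0, 10.0, 20.0, 30.0]
conc_pred = [1.0, 9.0, 21.0, 29.0]

report = render_report("Toy workflow report", {
    "accuracy": accuracy(labels_true, labels_pred),   # 0.75
    "RMSE": rmse(conc_true, conc_pred),               # 1.0
    "R2": r_squared(conc_true, conc_pred),            # 0.992
})
print(report)
```

The returned string can be written to disk with `pathlib.Path("report.md").write_text(report)` if a file is wanted.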

See individual workflows for parameter justification guidance.


🧪 Example Data Requirements

| Workflow | Min Samples | Metadata Columns | Typical Wavenumber Range |
| --- | --- | --- | --- |
| Oil Authentication | 50–100 | oil_type, batch (optional) | 600–1800 cm⁻¹ |
| Heating Monitoring | 30–50 | heating_time, oil_type (optional) | 600–1800 cm⁻¹ |
| Mixture Analysis | 40–80 | concentration, mixture_type | 600–1800 cm⁻¹ |
| Batch QC | 100+ | batch, date, instrument | 600–1800 cm⁻¹ |
| Harmonization | 50+ per instrument | instrument_id, batch | Full range |

Related documentation:

  • Tutorials: Step-by-step learning paths
  • Cookbook: Recipe-style how-to guides
  • User Guide: CLI and automation
  • Theory: Scientific foundations
  • API Reference: Function/class documentation

Keywords

  • oil authentication
  • heating quality
  • mixture analysis
  • harmonization
  • hyperspectral mapping

🐛 Troubleshooting

Common issues across workflows:

  1. "Model accuracy too low" → Check preprocessing parameters, SNR, class balance
  2. "Trends not significant" → Increase sample size, check metadata alignment
  3. "Harmonization fails" → Verify instrument IDs, check spectral alignment
  4. "Plots don't render" → Check matplotlib backend, file paths

See Troubleshooting Guide for detailed solutions.
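
Two of the checks named above, class balance and SNR, can be approximated in a few lines of standard-library Python. Both helpers are hypothetical, and the SNR definition used here (peak height above the mean of a signal-free region, divided by that region's standard deviation) is one common convention assumed for illustration.

```python
import statistics
from collections import Counter

def class_balance(labels):
    """Return per-class counts and the smallest-to-largest class size ratio."""
    counts = Counter(labels)
    return counts, min(counts.values()) / max(counts.values())

def crude_snr(intensities, noise_region):
    """Peak height over the noise SD of a region assumed to contain no signal."""
    noise = intensities[noise_region]
    signal = max(intensities) - statistics.mean(noise)
    return signal / statistics.stdev(noise)

labels = ["olive"] * 40 + ["sunflower"] * 10
counts, ratio = class_balance(labels)
print(dict(counts), f"balance ratio = {ratio:.2f}")

# Toy spectrum: first five points are noise only, followed by one peak.
spectrum = [0.01, -0.02, 0.00, 0.02, -0.01, 0.5, 1.0, 0.5, 0.01, -0.01]
snr = crude_snr(spectrum, slice(0, 5))
print(f"SNR = {snr:.1f}")
```

A balance ratio far below 1 suggests that accuracy alone may be misleading, and a low SNR points back to acquisition settings or smoothing parameters rather than the model.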


💡 Best Practices

  1. Always start with the MRE: verify that the workflow runs on synthetic data first
  2. Document parameter choices: justify baseline λ, smoothing window, CV folds
  3. Check validation metrics: don't trust the model until you've validated it
  4. Generate reproducible reports: use FoodSpec's auto-reporting tools
  5. Version control workflows: store YAML protocols in Git alongside data

🎯 Success Criteria

After completing a workflow, you should have:

✅ Plots: Confusion matrix, PCA scores, or trend plots (depending on workflow)
✅ Tables: Metrics (accuracy, R², RMSE), feature importance, or ANOVA results
✅ Narrative: report.md summarizing findings and interpretation
✅ Reproducibility: YAML protocol or Python script that can be re-run
✅ Validation: Cross-validation metrics or test set results


Happy analyzing! 🔬