Workflows: Reproducible Analysis Patterns¶
Reproducible end-to-end workflows for authentication, degradation monitoring, mixture analysis, harmonization, and hyperspectral mapping.
Workflow Categories¶
Authentication & Identification¶
Determine what a sample is (classification).
| Workflow | Problem | Time | Difficulty |
|---|---|---|---|
| Oil Authentication | "What oil is this?" / "Is it adulterated?" | 30 min | Beginner |
| Matrix Effects | Compare markers across matrices (oils vs chips) | 40 min | Applied |
When to use: Verify authenticity, detect fraud, classify unknowns into known categories.
Degradation & Thermal Monitoring¶
Track chemical changes over time, temperature, or storage.
| Workflow | Problem | Time | Difficulty |
|---|---|---|---|
| Heating & Quality Monitoring | Track oxidation/degradation during frying | 35 min | Beginner |
| Aging Workflows | Monitor shelf-life and storage stability | 40 min | Applied |
| Batch Quality Control | Detect drift, outliers, and batch-to-batch variation | 45 min | Applied |
When to use: Monitor frying cycles, predict shelf-life, detect off-spec batches, study degradation kinetics.
Adulteration & Mixture Analysis¶
Quantify components in blends or detect contamination.
| Workflow | Problem | Time | Difficulty |
|---|---|---|---|
| Mixture Analysis | Quantify adulteration levels (e.g., 10% seed oil in olive) | 40 min | Applied |
| Calibration & Regression | Build calibration curves for quantitative prediction | 50 min | Advanced |
When to use: Quantify adulterants, build concentration models, detect contamination thresholds.
Harmonization & Instrument Effects¶
Handle multi-instrument data or transfer models.
| Workflow | Problem | Time | Difficulty |
|---|---|---|---|
| Harmonization & Automated Calibration | Transfer models between instruments, correct batch effects | 60 min | Advanced |
| Standard Templates | Create reusable workflow templates for common tasks | 45 min | Advanced |
When to use: Combine data from multiple instruments, transfer models to new sites, standardize QA protocols.
Spatial & Hyperspectral Analysis¶
Map chemical composition across surfaces.
| Workflow | Problem | Time | Difficulty |
|---|---|---|---|
| Hyperspectral Mapping | Map contaminants, coatings, or ROIs on surfaces | 50 min | Advanced |
When to use: Visualize spatial distribution, segment regions of interest, analyze surface coatings.
Workflow Design & Reporting¶
Meta-workflow for creating new analysis pipelines.
| Workflow | Problem | Time | Difficulty |
|---|---|---|---|
| Workflow Design & Reporting | Design custom workflows with proper documentation | 60 min | Advanced |
When to use: Build new domain-specific workflows, document analysis procedures, ensure reproducibility.
Workflow Structure¶
Every FoodSpec workflow follows a consistent template:
1. Standard Header¶
- Purpose: One-sentence problem statement
- When to Use: Specific scenarios where this workflow applies
- Inputs: Required data format and metadata columns
- Outputs: Expected results (plots, tables, metrics)
- Assumptions: What the workflow assumes about your data
2. Minimal Reproducible Example (MRE)¶
- Synthetic data generator or bundled example dataset
- Copy-paste code that runs without external files
- Complete workflow from load → preprocess → model → results
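A minimal sketch of the kind of synthetic data generator an MRE might bundle, assuming only NumPy is available. The function name `make_synthetic_spectra`, the band positions, and the two class names are illustrative assumptions, not part of FoodSpec's API or its bundled datasets.

```python
import numpy as np

def make_synthetic_spectra(n_per_class=20, n_points=300, seed=0):
    """Return (wavenumbers, spectra, labels) for two made-up oil classes."""
    rng = np.random.default_rng(seed)
    wavenumbers = np.linspace(600.0, 1800.0, n_points)

    def peak(center, width, height):
        # Gaussian band centered at `center` (cm^-1).
        return height * np.exp(-0.5 * ((wavenumbers - center) / width) ** 2)

    # Hypothetical class signatures: same bands, different relative heights.
    signatures = {
        "olive": peak(1265, 15, 0.6) + peak(1440, 20, 1.0) + peak(1655, 12, 0.8),
        "sunflower": peak(1265, 15, 0.9) + peak(1440, 20, 1.0) + peak(1655, 12, 1.2),
    }

    spectra, labels = [], []
    for name, signature in signatures.items():
        for _ in range(n_per_class):
            # Random offset plus a gentle slope stands in for background curvature.
            baseline = rng.uniform(0.0, 0.2) + rng.uniform(0.0, 1e-4) * (wavenumbers - 600.0)
            noise = rng.normal(0.0, 0.01, n_points)
            spectra.append(signature + baseline + noise)
            labels.append(name)
    return wavenumbers, np.asarray(spectra), labels

wavenumbers, X, y = make_synthetic_spectra()
print(X.shape, sorted(set(y)))
```

Fixing the seed keeps the example reproducible, so the same spectra feed every later step of the workflow.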
3. Validation & Sanity Checks¶
- Success indicators: What plots/metrics look like when working correctly
- Failure indicators: Red flags that something is wrong
- Quality thresholds: Minimum acceptable performance
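One way to make these indicators explicit is to encode the quality thresholds and check them in code. The sketch below assumes metrics arrive as a plain dictionary; the function name `check_quality` and the threshold values are placeholders for illustration, not FoodSpec defaults.

```python
def check_quality(metrics, thresholds):
    """Compare computed metrics against minimum acceptable values.

    Returns a list of human-readable failures; an empty list means every check passed.
    """
    failures = []
    for name, minimum in thresholds.items():
        value = metrics.get(name)
        if value is None:
            failures.append(f"{name}: metric missing")
        elif value < minimum:
            failures.append(f"{name}: {value:.3f} is below the minimum {minimum:.3f}")
    return failures

# Placeholder thresholds and metrics, chosen only to show one pass and one failure.
thresholds = {"accuracy": 0.80, "f1_macro": 0.75}
metrics = {"accuracy": 0.91, "f1_macro": 0.70}

for line in check_quality(metrics, thresholds):
    print("FAIL", line)
```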
4. Parameters You Must Justify¶
- Critical parameters (baseline λ, smoothing window, CV folds)
- When to adjust from defaults
- How to document parameter choices
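Parameter choices can be recorded next to the analysis in a machine-readable form. The sketch below is one possible layout using only the standard library; the `ParameterChoice` class, the field names, and the justification texts are hypothetical examples rather than a FoodSpec format.

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class ParameterChoice:
    name: str
    value: object
    default: object
    justification: str

# Illustrative entries; the values and reasons are invented for the example.
choices = [
    ParameterChoice("baseline_lambda", 1e5, 1e4,
                    "Broad background; a stiffer baseline avoided clipping peaks."),
    ParameterChoice("smoothing_window", 21, 21,
                    "Default kept; peaks are wider than the window."),
    ParameterChoice("cv_folds", 5, 5,
                    "Default kept; the smallest class has more than 5 samples."),
]

def changed_from_default(items):
    """Names of parameters whose value differs from the documented default."""
    return [c.name for c in items if c.value != c.default]

record = json.dumps([asdict(c) for c in choices], indent=2)
print(record)
print("Changed from defaults:", changed_from_default(choices))
```

Listing the parameters that differ from their defaults makes it easy to see which choices need a written justification in the report.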
Quick Start Guide¶
New to FoodSpec?¶
- Start with Oil Authentication (simplest workflow)
- Try Heating & Quality Monitoring (time-series analysis)
- Explore Workflow Design & Reporting (custom workflows)
Have your own data?¶
- Check the Inputs section of the relevant workflow
- Ensure your data matches the format (CSV or HDF5 with required metadata)
- Run the MRE with your data path substituted
- Review Validation & Sanity Checks to verify results
Building a new workflow?¶
- Read Workflow Design & Reporting
- Use Standard Templates as a starting point
- Follow the standard structure (Header → MRE → Validation → Parameters)
Choosing the Right Workflow¶
Decision Tree¶
What's your goal?
├─ Identify/classify samples?
│  └─ Oil Authentication
├─ Track degradation over time?
│  ├─ Heating cycles? → Heating & Quality Monitoring
│  └─ Storage/shelf-life? → Aging Workflows
├─ Quantify adulterants?
│  ├─ Discrete levels? → Mixture Analysis
│  └─ Continuous concentration? → Calibration & Regression
├─ Handle multiple instruments?
│  └─ Harmonization & Automated Calibration
├─ Map surfaces spatially?
│  └─ Hyperspectral Mapping
└─ Build custom workflow?
   └─ Workflow Design & Reporting
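The same decision tree can be written as a lookup table, which is handy when a script needs to pick a workflow from a configuration value. The goal and detail keys below are made-up identifiers for this sketch; only the workflow names come from this page.

```python
# Keys are (goal, detail) pairs; the key strings are hypothetical identifiers.
WORKFLOW_BY_GOAL = {
    ("classify", None): "Oil Authentication",
    ("degradation", "heating"): "Heating & Quality Monitoring",
    ("degradation", "storage"): "Aging Workflows",
    ("quantify", "discrete"): "Mixture Analysis",
    ("quantify", "continuous"): "Calibration & Regression",
    ("multi_instrument", None): "Harmonization & Automated Calibration",
    ("spatial", None): "Hyperspectral Mapping",
    ("custom", None): "Workflow Design & Reporting",
}

def choose_workflow(goal, detail=None):
    """Return the workflow name for a goal, or raise ValueError if none matches."""
    try:
        return WORKFLOW_BY_GOAL[(goal, detail)]
    except KeyError:
        raise ValueError(f"No workflow for goal={goal!r}, detail={detail!r}") from None

print(choose_workflow("degradation", "storage"))  # prints: Aging Workflows
```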
Workflow Comparison¶
| Feature | Authentication | Degradation | Adulteration | Harmonization |
|---|---|---|---|---|
| Output Type | Classification | Regression/Trends | Quantification | Model Transfer |
| Metadata Required | Labels | Time/Temperature | Concentration | Instrument ID |
| Typical Duration | 30–40 min | 35–45 min | 40–50 min | 60+ min |
| Model Type | RF, SVM, PLS-DA | Linear, ANCOVA | NNLS, MCR-ALS | DS, PDS, ComBat |
| Validation | CV + Confusion Matrix | R², RMSE, Trends | R², Calibration Curve | Transfer Accuracy |
Common Parameters Across Workflows¶
Preprocessing (Universal)¶
- Baseline correction: ALS (λ=1e4, p=0.01) – Remove background curvature
- Smoothing: Savitzky-Golay (window=21, polyorder=3) – Reduce noise
- Normalization: SNV (zero mean, unit variance per spectrum) or L2 (unit vector norm) – Put spectra on a common intensity scale
- Cropping: Spectral region (e.g., 600–1800 cm⁻¹) – Focus on informative peaks
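The sketch below shows what three of these steps compute, using NumPy only: an asymmetric least squares (ALS) baseline in the Eilers-Boelens style, cropping, and SNV. It is not FoodSpec's implementation; the function names are assumptions, the dense linear solve is only practical for spectra of a few hundred points, and smoothing is left out to keep the example NumPy-only (`scipy.signal.savgol_filter` is the usual choice for Savitzky-Golay).

```python
import numpy as np

def als_baseline(y, lam=1e4, p=0.01, n_iter=10):
    """Asymmetric least squares baseline estimate for one spectrum."""
    n = y.size
    second_diff = np.diff(np.eye(n), n=2, axis=0)  # (n-2, n) second-difference operator
    penalty = lam * second_diff.T @ second_diff    # smoothness penalty, scaled by lambda
    weights = np.ones(n)
    for _ in range(n_iter):
        baseline = np.linalg.solve(np.diag(weights) + penalty, weights * y)
        # Points above the baseline (peaks) get the small weight p, the rest 1 - p.
        weights = np.where(y > baseline, p, 1.0 - p)
    return baseline

def crop(wavenumbers, spectra, low=600.0, high=1800.0):
    """Keep only the columns whose wavenumber lies in [low, high]."""
    mask = (wavenumbers >= low) & (wavenumbers <= high)
    return wavenumbers[mask], spectra[:, mask]

def snv(spectra):
    """Standard normal variate: center and scale each spectrum (row) separately."""
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, keepdims=True)  # population std (ddof=0) in this sketch
    return (spectra - mean) / std

# Two noise-free toy spectra: one band at 1440 cm^-1 on a sloping background.
wavenumbers = np.linspace(400.0, 2000.0, 161)
peak = np.exp(-0.5 * ((wavenumbers - 1440.0) / 20.0) ** 2)
true_baseline = 0.5 + 2e-4 * (wavenumbers - 400.0)
raw = np.vstack([peak + true_baseline, 2.0 * peak + true_baseline])

corrected = np.vstack([s - als_baseline(s) for s in raw])
cropped_wn, cropped = crop(wavenumbers, corrected)
processed = snv(cropped)
print(processed.shape)
```

After SNV each row has zero mean and unit variance, which is a different scaling from L2 normalization (unit Euclidean length); the two are alternatives, not synonyms.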
Modeling (Task-Specific)¶
- Classification: Random Forest (n_trees=100, max_depth=None)
- Regression: Linear or Ridge (α=1.0)
- Validation: 5-fold stratified CV (for classification), 5-fold CV (for regression)
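To illustrate what "stratified" means in the validation step, the sketch below assigns samples to folds so that each fold keeps roughly the same class proportions. It is a standard-library illustration with a hypothetical helper name; in practice scikit-learn's `StratifiedKFold` is the usual tool.

```python
import random
from collections import defaultdict

def stratified_folds(labels, k=5, seed=0):
    """Assign each sample index to one of k folds, keeping class proportions similar."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal the shuffled indices of this class round-robin across the folds.
        for position, index in enumerate(indices):
            folds[position % k].append(index)
    return folds

labels = ["olive"] * 30 + ["sunflower"] * 20
folds = stratified_folds(labels, k=5)
for test_fold in folds:
    counts = {c: sum(labels[i] == c for i in test_fold) for c in ("olive", "sunflower")}
    print(counts)  # each fold: {'olive': 6, 'sunflower': 4}
```

Each fold serves once as the test set while the remaining four are used for training, so every sample is predicted exactly once.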
Reporting (Universal)¶
- Plots: Confusion matrix, PCA scores, ratio trends, calibration curves
- Tables: Metrics (accuracy, R², RMSE), feature importance, ANOVA results
- Narrative: report.md summarizing findings
See individual workflows for parameter justification guidance.
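For reference, the reported metrics can be computed and written to a small `report.md` with the standard library alone. This is a sketch with toy numbers, not FoodSpec's auto-reporting tool; the helper names and the report layout are assumptions.

```python
import math
import tempfile
from pathlib import Path

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def write_report(path, title, metrics):
    """Write a minimal Markdown report containing one metrics table."""
    lines = [f"# {title}", "", "| Metric | Value |", "|---|---|"]
    lines += [f"| {name} | {value:.3f} |" for name, value in metrics.items()]
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")

# Toy values for illustration only.
metrics = {
    "accuracy": accuracy(["a", "b", "a", "b"], ["a", "b", "b", "b"]),
    "RMSE": rmse([0, 10, 20, 30], [1, 9, 21, 29]),
    "R2": r_squared([0, 10, 20, 30], [1, 9, 21, 29]),
}
report_path = Path(tempfile.mkdtemp()) / "report.md"
write_report(report_path, "Example workflow report", metrics)
print(report_path.read_text(encoding="utf-8"))
```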
Example Data Requirements¶
| Workflow | Min Samples | Metadata Columns | Typical Wavenumber Range |
|---|---|---|---|
| Oil Authentication | 50–100 | oil_type, batch (optional) | 600–1800 cm⁻¹ |
| Heating Monitoring | 30–50 | heating_time, oil_type (optional) | 600–1800 cm⁻¹ |
| Mixture Analysis | 40–80 | concentration, mixture_type | 600–1800 cm⁻¹ |
| Batch QC | 100+ | batch, date, instrument | 600–1800 cm⁻¹ |
| Harmonization | 50+ per instrument | instrument_id, batch | Full range |
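Before running a workflow it can help to check a metadata file against the table above. The sketch below uses the lower bound of each sample range and the non-optional columns; the `REQUIREMENTS` dictionary and `check_metadata` helper are hypothetical, and Harmonization is omitted because its minimum applies per instrument.

```python
import csv
import io

# Transcribed from the requirements table (lower bounds, non-optional columns only).
REQUIREMENTS = {
    "Oil Authentication": {"min_samples": 50, "required": ["oil_type"]},
    "Heating Monitoring": {"min_samples": 30, "required": ["heating_time"]},
    "Mixture Analysis": {"min_samples": 40, "required": ["concentration", "mixture_type"]},
    "Batch QC": {"min_samples": 100, "required": ["batch", "date", "instrument"]},
}

def check_metadata(rows, workflow):
    """Return a list of problems found in the metadata rows for the given workflow."""
    spec = REQUIREMENTS[workflow]
    problems = []
    columns = set(rows[0].keys()) if rows else set()
    for column in spec["required"]:
        if column not in columns:
            problems.append(f"missing column: {column}")
    if len(rows) < spec["min_samples"]:
        problems.append(f"only {len(rows)} samples; at least {spec['min_samples']} recommended")
    return problems

# A tiny in-memory CSV stands in for a real metadata file.
csv_text = "sample_id,oil_type\n" + "".join(f"s{i},olive\n" for i in range(12))
rows = list(csv.DictReader(io.StringIO(csv_text)))
print(check_metadata(rows, "Oil Authentication"))
print(check_metadata(rows, "Heating Monitoring"))
```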
Related Documentation¶
- Tutorials – Step-by-step learning paths
- Cookbook – Recipe-style how-to guides
- User Guide – CLI and automation
- Theory – Scientific foundations
- API Reference – Function/class documentation
Keywords¶
- oil authentication
- heating quality
- mixture analysis
- harmonization
- hyperspectral mapping
Troubleshooting¶
Common issues across workflows:
- "Model accuracy too low" → Check preprocessing parameters, SNR, class balance
- "Trends not significant" → Increase sample size, check metadata alignment
- "Harmonization fails" → Verify instrument IDs, check spectral alignment
- "Plots don't render" → Check matplotlib backend, file paths
See Troubleshooting Guide for detailed solutions.
Best Practices¶
- Always start with the MRE – Verify that the workflow runs on synthetic data first
- Document parameter choices – Justify baseline λ, smoothing window, CV folds
- Check validation metrics – Don't trust the model until you've validated it
- Generate reproducible reports – Use FoodSpec's auto-reporting tools
- Version control workflows – Store YAML protocols in Git alongside data
Success Criteria¶
After completing a workflow, you should have:
✅ Plots: Confusion matrix, PCA scores, or trend plots (depending on workflow)
✅ Tables: Metrics (accuracy, R², RMSE), feature importance, or ANOVA results
✅ Narrative: report.md summarizing findings and interpretation
✅ Reproducibility: YAML protocol or Python script that can be re-run
✅ Validation: Cross-validation metrics or test set results
Quick Links¶
- Beginner-Friendly: Oil Authentication, Heating Monitoring
- Most Common: Batch QC, Mixture Analysis
- Advanced: Harmonization, Hyperspectral
- Meta: Workflow Design, Templates
Happy analyzing!