Quickstart (CLI)¶
Who needs this? QC engineers, lab technicians, anyone without Python experience
What problem does this solve? Run a complete oil authentication analysis from CSV to results using the command line
When to use this? First-time users, batch processing, automation scripts
Why it matters? CLI workflows are scriptable, reproducible, and don't require Python knowledge
Time to complete: 10 minutes
Prerequisites: FoodSpec installed (pip install foodspec), CSV file with spectra, basic terminal knowledge
Installation¶
Option 1: pip (Recommended)¶
pip install foodspec
Option 2: conda¶
conda install -c conda-forge foodspec
Option 3: Development version (from source)¶
git clone https://github.com/chandrasekarnarayana/foodspec.git
cd foodspec
pip install -e .
Verify installation:
foodspec --version
# Output: 1.0.0
Dataset Format¶
Data Format Reference
See Data Format Reference for complete schema specifications, unit conventions, and validation checklist. Key terms defined in Glossary.
CSV Requirements¶
Your input CSV must have:
1. Wavenumber/Wavelength column (e.g., "wavenumber", "wavelength", "cm-1")
2. Spectra columns, one per sample (intensity values)
3. Label column (optional): class labels for classification (e.g., "oil_type", "sample_id")
Example CSV (wide format):¶
wavenumber,olive_oil_1,olive_oil_2,sunflower_oil_1,palm_oil_1
1000.0,5.2,5.1,4.8,6.1
1010.0,5.5,5.3,5.0,6.3
1020.0,5.8,5.6,5.2,6.5
...
3000.0,2.1,2.0,2.3,1.9
Minimal Requirements:¶
- Min spectra: 3 per class (10+ recommended)
- Min wavenumbers: 50 (100+ recommended)
- Spacing: Regular or irregular (FoodSpec handles both)
- Units: Any (cm⁻¹, nm, etc.); FoodSpec is unit-agnostic
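Before converting, a quick pandas pre-flight against these minimums can save a failed run. A minimal sketch assuming the wide layout shown above (`check_wide` is our helper, not a FoodSpec API; the per-class minimum needs labels and is not checked here):

```python
import pandas as pd

def check_wide(df, wavenumber_column="wavenumber"):
    """Check a wide-format table (rows = wavenumbers, one column per spectrum)."""
    if wavenumber_column not in df.columns:
        raise ValueError(f"missing {wavenumber_column!r} column")
    n_wavenumbers = len(df)                # one row per wavenumber point
    n_spectra = df.shape[1] - 1            # every other column is one spectrum
    problems = []
    if n_spectra < 3:
        problems.append(f"only {n_spectra} spectra (minimum is 3 per class)")
    if n_wavenumbers < 50:
        problems.append(f"only {n_wavenumbers} wavenumber points (minimum is 50)")
    return n_spectra, n_wavenumbers, problems

# Deliberately tiny table: 2 spectra x 3 wavenumbers, so both checks fail
tiny = pd.DataFrame({"wavenumber": [1000.0, 1010.0, 1020.0],
                     "s1": [5.2, 5.5, 5.8],
                     "s2": [5.1, 5.3, 5.6]})
n_spectra, n_wavenumbers, problems = check_wide(tiny)
print(problems)
```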
Add labels (CSV with metadata):¶
wavenumber,oil_type,batch,s1,s2,s3
1000.0,olive,A,5.2,5.1,4.8
1010.0,olive,A,5.5,5.3,5.0
...
One Complete Example (Oil Authentication)¶
Step 1: Create sample data¶
# Generate synthetic oil spectra for testing
python << 'EOF'
import numpy as np
import pandas as pd
# Create toy dataset
np.random.seed(42)
wavenumbers = np.linspace(1000, 3000, 150)
# Generate 10 samples per class
olive_spectra = np.random.normal(5.0, 0.3, (10, 150))
palm_spectra = np.random.normal(6.0, 0.3, (10, 150))
sunflower_spectra = np.random.normal(4.5, 0.3, (10, 150))
# Combine
X = np.vstack([olive_spectra, palm_spectra, sunflower_spectra])
labels = ['olive']*10 + ['palm']*10 + ['sunflower']*10
# Create DataFrame: one row per sample, wavenumber values as the column headers
df = pd.DataFrame(X, columns=[f'{w:.1f}' for w in wavenumbers])
df.insert(0, 'oil_type', labels)
# Save
df.to_csv('oils_demo.csv', index=False)
print("✓ Created oils_demo.csv (30 samples, 150 wavenumbers)")
EOF
Step 2: Convert CSV to FoodSpec library¶
foodspec csv-to-library \
oils_demo.csv \
oils_demo.h5 \
--wavenumber-column wavenumber \
--label-column oil_type \
--modality raman
Expected output:
✓ Loaded 30 spectra from oils_demo.csv
✓ Wrote 30 spectra to oils_demo.h5
Step 3: Run oil authentication workflow¶
foodspec oil-auth \
oils_demo.h5 \
--output-dir runs/demo
Expected output:
✓ Loaded 30 spectra
✓ Preprocessing (ALS baseline, normalization, smoothing)
✓ Running RQ (ratiometric features) analysis
✓ Cross-validation: 5-fold
✓ Mean accuracy: 0.87 (±0.12)
✓ Results saved to runs/demo/20241228_120000_run/
Step 4: View results¶
# Print summary report
cat runs/demo/*/report.txt
# View metrics
cat runs/demo/*/metrics.json | python -m json.tool
# List all outputs
ls -lh runs/demo/*/
Expected files:
report.txt – Text summary with accuracy, key ratios
metrics.json – Detailed CV metrics
confusion_matrix.png – Classification visualization
rq_summary.csv – Extracted ratiometric features
tables/ – Additional analysis tables
Explore Other Workflows¶
Oil Heating Stability¶
Track degradation during frying:
foodspec heating \
oils_demo.h5 \
--output-dir runs/heating
Mixture Analysis¶
Quantify oil blends:
foodspec mixture \
oils_demo.h5 \
--components olive,sunflower \
--output-dir runs/mixture
List all commands¶
foodspec --help
Expected Outputs¶
Directory Structure¶
runs/demo/20241228_120000_run/
├── report.txt             # Text summary
├── metrics.json           # Cross-validation metrics
├── confusion_matrix.png   # Classification plot
├── rq_summary.csv         # Feature matrix
├── tables/
│   ├── rq_features.csv
│   └── cv_results.csv
└── metadata.json          # Run provenance
Typical Metrics (metrics.json)¶
{
  "balanced_accuracy": 0.87,
  "f1_macro": 0.85,
  "roc_auc_ovr": 0.92,
  "n_samples": 30,
  "n_features": 12,
  "cv_fold": 5,
  "timestamp": "2024-12-28T12:00:00Z"
}
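Automation scripts often need the newest timestamped run folder under the output directory. Because the folder names begin with YYYYMMDD_HHMMSS, lexicographic order is chronological order. A sketch (`latest_run` is our helper, not part of FoodSpec):

```python
import json
from pathlib import Path

def latest_run(run_names):
    """Newest timestamped run: names like '20241228_120000_run' sort chronologically."""
    if not run_names:
        raise ValueError("no runs found")
    return max(run_names)

newest = latest_run(["20241226_090000_run", "20241228_120000_run", "20241227_233000_run"])
print(newest)

# Applied to a real output directory (guarded so the sketch runs anywhere):
base = Path("runs/demo")
if base.is_dir():
    run_dir = base / latest_run([p.name for p in base.iterdir() if p.is_dir()])
    metrics = json.loads((run_dir / "metrics.json").read_text())
    print(metrics["balanced_accuracy"])
```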
Troubleshooting (Top 5 Issues)¶
1. "No such command: oil-auth"¶
Cause: FoodSpec not installed or installed incorrectly
Fix:
# Reinstall
pip uninstall -y foodspec && pip install foodspec
# Verify
foodspec --version
foodspec oil-auth --help
2. "FileNotFoundError: oils_demo.csv not found"¶
Cause: CSV path is wrong or file doesn't exist
Fix:
# Check file exists
ls -l oils_demo.csv
# Use absolute path
foodspec csv-to-library /full/path/to/oils_demo.csv oils_demo.h5
# Or run from correct directory
cd /path/to/data
foodspec csv-to-library oils_demo.csv oils_demo.h5
3. "Wavenumber column not found"¶
Cause: Column name doesn't match data
Fix:
# Check column names
head -1 oils_demo.csv
# Use correct name
foodspec csv-to-library oils_demo.csv oils_demo.h5 \
--wavenumber-column "cm-1" # or whatever column name exists
# If no wavenumber column, generate one
python << 'EOF'
import pandas as pd
df = pd.read_csv('oils_demo.csv')
df.insert(0, 'wavenumber', range(len(df)))
df.to_csv('oils_demo_fixed.csv', index=False)
EOF
4. "Error: expected 4 arguments, got 1"¶
Cause: Missing required arguments
Fix:
# Always provide: input, output, wavenumber-column
foodspec csv-to-library \
oils_demo.csv oils_demo.h5 \
--wavenumber-column wavenumber
# Or use --help to see required args
foodspec csv-to-library --help
5. "Out of memory" or "Empty results"¶
Cause: Dataset too large or no valid spectra
Fix:
# Check file size and number of spectra
wc -l oils_demo.csv
# For large files, subsample
python << 'EOF'
import pandas as pd
df = pd.read_csv('oils_demo.csv')
df_small = df.iloc[::10, :] # every 10th row
df_small.to_csv('oils_demo_small.csv', index=False)
print(f"Reduced to {len(df_small)} spectra")
EOF
# Run on smaller subset
foodspec csv-to-library oils_demo_small.csv oils_demo_small.h5
Copy-Paste Quick Reference¶
Create synthetic data¶
python << 'EOF'
import numpy as np
import pandas as pd
np.random.seed(42)
w = np.linspace(1000, 3000, 150)
df = pd.DataFrame(
    np.vstack([np.random.normal(5, 0.3, (10, 150)),
               np.random.normal(6, 0.3, (10, 150))]),
    columns=[f'{x:.1f}' for x in w]  # one row per sample, wavenumbers as headers
)
df.insert(0, 'oil_type', ['olive']*10 + ['palm']*10)
df.to_csv('oils_demo.csv', index=False)
print("✓ oils_demo.csv created")
EOF
Full pipeline¶
foodspec csv-to-library oils_demo.csv oils_demo.h5 --wavenumber-column wavenumber --label-column oil_type --modality raman && \
foodspec oil-auth oils_demo.h5 --output-dir runs/demo && \
cat runs/demo/*/report.txt
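Batch processing, one of the use cases this quickstart targets, can wrap the same two commands in a small Python loop. A sketch assuming foodspec is on PATH (`pipeline_commands` is our helper; the flags simply mirror the pipeline above):

```python
import shutil
import subprocess
from pathlib import Path

def pipeline_commands(csv_path, out_root="runs"):
    """Build the convert + analyze CLI calls for one CSV, flags as above."""
    stem = Path(csv_path).stem
    h5 = f"{stem}.h5"
    convert = ["foodspec", "csv-to-library", str(csv_path), h5,
               "--wavenumber-column", "wavenumber",
               "--label-column", "oil_type",
               "--modality", "raman"]
    analyze = ["foodspec", "oil-auth", h5, "--output-dir", f"{out_root}/{stem}"]
    return [convert, analyze]

for csv_file in ["oils_demo.csv"]:          # extend with your own batch of CSVs
    for cmd in pipeline_commands(csv_file):
        print(" ".join(cmd))
        if shutil.which("foodspec"):        # only execute where FoodSpec is installed
            subprocess.run(cmd, check=True)
```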
Additional Resources¶
- Data Format Reference - Data validation checklist, schema formats
- Glossary - Terminology (wavenumber, baseline, CV strategy, leakage)
- CLI Help - Complete CLI command documentation
- Workflows - Ready-to-use analysis protocols
Next Steps¶
- ✅ Run foodspec oil-auth --help to explore options
- ✅ Try foodspec heating or foodspec mixture for other analyses
- ✅ Switch to the Python API for custom workflows: Python Quickstart
- ✅ Deep dive: CLI Guide
- report.md summarizes run parameters and files
Tips:
- Use --classifier-name to switch models (rf, svm_rbf, logreg, etc.).
- Add --save-model to persist the fitted pipeline via the model registry.
- For long/tidy CSVs, use --format long --sample-id-column ... --intensity-column ....
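If your instrument exports the wide layout but you want the long/tidy route, pandas `melt` produces the three columns those flags name. A sketch; the column names `sample_id` and `intensity` are our choices and must match the flags you pass:

```python
import pandas as pd

# Wide layout as in the format section: rows = wavenumbers, one column per sample
wide = pd.DataFrame({"wavenumber": [1000.0, 1010.0],
                     "olive_oil_1": [5.2, 5.5],
                     "palm_oil_1": [6.1, 6.3]})

# Long/tidy layout: one (wavenumber, sample, intensity) triple per row
tidy = wide.melt(id_vars="wavenumber", var_name="sample_id", value_name="intensity")
print(tidy)
tidy.to_csv("oils_long.csv", index=False)
```

The result would then be loaded with something like foodspec csv-to-library oils_long.csv oils.h5 --format long --sample-id-column sample_id --intensity-column intensity.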
Run from exp.yml (one command)¶
Define everything in a single YAML file exp.yml:
dataset:
  path: data/oils_demo.h5
  modality: raman
  schema:
    label_column: oil_type
preprocessing:
  preset: standard
qc:
  method: robust_z
  thresholds:
    outlier_rate: 0.1
features:
  preset: specs
  specs:
    - name: band_1
      ftype: band
      regions:
        - [1000, 1100]
modeling:
  suite:
    - algorithm: rf
      params:
        n_estimators: 50
reporting:
  targets: [metrics, diagnostics]
outputs:
  base_dir: runs/oils_exp
Run it end-to-end:
foodspec run-exp exp.yml
# Dry-run (validate + hashes only)
foodspec run-exp exp.yml --dry-run
# Emit single-file artifact for deployment
foodspec run-exp exp.yml --artifact-path runs/oils_exp.foodspec
All outputs are written under the configured base_dir.
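Because a typo in exp.yml only surfaces at run time, it can be worth parsing the file yourself first (foodspec run-exp --dry-run does the full validation). A sketch requiring PyYAML; the sections checked here just mirror the example above, not an official schema:

```python
import yaml  # PyYAML

def load_exp(text):
    """Parse exp.yml content and check a couple of structural basics."""
    cfg = yaml.safe_load(text)
    for section in ("dataset", "outputs"):
        if section not in cfg:
            raise ValueError(f"missing top-level section: {section!r}")
    return cfg

cfg = load_exp("""
dataset:
  path: data/oils_demo.h5
  modality: raman
outputs:
  base_dir: runs/oils_exp
""")
print(cfg["outputs"]["base_dir"])
```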
Temporal & Shelf-life (CLI)¶
Aging (degradation trajectories + stages)¶
foodspec aging \
libraries/time_series_demo.h5 \
--value-col degrade_index \
--method linear \
--time-col time \
--entity-col sample_id \
--output-dir runs/aging_demo
Outputs: aging_metrics.csv, stages.csv, and a sample fit figure under a timestamped folder.
Shelf-life (remaining time to threshold)¶
foodspec shelf-life \
libraries/time_series_demo.h5 \
--value-col degrade_index \
--threshold 2.0 \
--time-col time \
--entity-col sample_id \
--output-dir runs/shelf_life_demo
Outputs: shelf_life_estimates.csv with t_star, ci_low, ci_high per entity, plus a quick-look figure.
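Conceptually, the linear method fits the degradation index against time and solves for the threshold crossing. A back-of-envelope sketch of t_star with NumPy (the confidence interval that produces ci_low/ci_high is omitted, and the helper name is ours):

```python
import numpy as np

def time_to_threshold(t, y, threshold):
    """Fit y ~ a*t + b and solve for the time at which y reaches threshold."""
    a, b = np.polyfit(t, y, 1)
    if a <= 0:
        raise ValueError("no upward trend: threshold is never reached")
    return (threshold - b) / a

t = np.array([0.0, 1.0, 2.0, 3.0])
y = 0.5 * t                      # ideal, noise-free degradation index
t_star = time_to_threshold(t, y, threshold=2.0)
print(t_star)                    # close to 4.0 for this example
```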
Multi-Modal & Cross-Technique Analysis (Python API)¶
FoodSpec supports multi-modal spectroscopy (Raman + FTIR + NIR) for enhanced authentication and cross-validation. While there's no dedicated CLI command yet, the Python API enables powerful multi-modal workflows:
Quick Example¶
from foodspec.core import FoodSpectrumSet, MultiModalDataset
from foodspec.ml.fusion import late_fusion_concat, decision_fusion_vote
from foodspec.stats.fusion_metrics import modality_agreement_kappa
from sklearn.ensemble import RandomForestClassifier
# Load aligned datasets (same samples, different techniques)
raman = FoodSpectrumSet.from_hdf5("olive_raman.h5")
ftir = FoodSpectrumSet.from_hdf5("olive_ftir.h5")
mmd = MultiModalDataset.from_datasets({"raman": raman, "ftir": ftir})
# Late fusion: concatenate features, train a joint model
features = mmd.to_feature_dict()
result = late_fusion_concat(features)
X_fused = result.X_fused
y = raman.sample_table["authentic"]
clf = RandomForestClassifier()
clf.fit(X_fused, y)
y_pred = clf.predict(X_fused)
# Decision fusion: train separate models, combine predictions
predictions = {}
for mod, ds in mmd.datasets.items():
clf = RandomForestClassifier()
clf.fit(ds.X, ds.sample_table["authentic"])
predictions[mod] = clf.predict(ds.X)
# Majority voting
vote_result = decision_fusion_vote(predictions, strategy="majority")
# Agreement metrics: check cross-technique consistency
kappa_df = modality_agreement_kappa(predictions)
print(kappa_df)  # Cohen's kappa matrix (κ > 0.8 = excellent agreement)
See full guide: Multi-Modal Workflows
Use cases:
- ✓ Olive oil authentication (Raman confirms FTIR)
- ✓ Novelty detection (modality disagreement flags unknowns)
- ✓ Robustness validation (cross-lab/cross-technique agreement)
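For intuition, the kind of kappa matrix modality_agreement_kappa returns can be reproduced with plain scikit-learn. A standalone sketch (`pairwise_kappa` is our helper, not a FoodSpec function):

```python
from itertools import combinations
from sklearn.metrics import cohen_kappa_score

def pairwise_kappa(predictions):
    """Cohen's kappa for every pair of modalities' predicted labels."""
    return {(a, b): cohen_kappa_score(predictions[a], predictions[b])
            for a, b in combinations(predictions, 2)}

preds = {
    "raman": ["olive", "palm", "olive", "palm"],
    "ftir":  ["olive", "palm", "olive", "palm"],   # full agreement with Raman
    "nir":   ["olive", "olive", "olive", "palm"],  # one disagreement
}
for pair, k in pairwise_kappa(preds).items():
    print(pair, round(k, 2))
```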
Need Help?¶
- Installation errors, NaNs, shape mismatches? → Troubleshooting Guide
- Questions about methods or usage? → FAQ
- Report a bug: GitHub Issues