Getting Started

Purpose: Understand FoodSpec's workflow and choose your path (Python vs. CLI).
Audience: Complete beginners; no spectroscopy background required.
Time: 5-10 minutes.
Prerequisites: FoodSpec installed; basic Python or terminal knowledge.


The 30-Second Version

# The absolute minimal example
from foodspec import __version__
print(f"FoodSpec {__version__} is ready!")

# Load some oil data
from foodspec.io import load_csv_spectra
spectra = load_csv_spectra("examples/data/oils.csv")
print(f"Loaded {len(spectra)} spectra")

Expected output:

FoodSpec 1.0.0 is ready!
Loaded 96 spectra


Choose Your Path

Path 1: Python API (Interactive, Customizable)

Best for: Learning, experimentation, custom analysis

Quickstart:

1. 15-Minute Quickstart: Get working code now
2. Oil Authentication: Real example
3. Full End-to-End: Every step explained

Key modules:

from foodspec.datasets import load_oil_example_data
from foodspec.preprocess import baseline_als, normalize_snv
from foodspec.ml import ClassifierFactory
from foodspec.validation import run_stratified_cv


Path 2: CLI (Reproducible, Automatable)

Best for: Reproducibility, batch processing, production use

Quickstart:

1. First Steps (CLI): Run your first command
2. Protocol Design: Create YAML configs
3. Reproducibility: Best practices

Key commands:

# Verify installation
foodspec --version

# Run analysis (a comment after a trailing backslash would break the line continuation)
foodspec-run-protocol \
  --protocol myprotocol.yaml \
  --input data.csv \
  --output-dir results


Who is it for?

Food scientists, analytical chemists, QA engineers, and data scientists working with Raman/FTIR spectra who need reproducible preprocessing, chemometrics, and reporting.


What Does FoodSpec Do?

In: Spectral data (CSV or HDF5) + metadata (labels, batch info)
Out: Validated model + metrics + figures + reproducibility report

Typical workflow:

Raw spectra (instrument file)
        ↓
  CSV → HDF5 (standardized format)
        ↓
   Preprocess (baseline, smooth, normalize)
        ↓
   Extract features (peaks, ratios, PCA)
        ↓
   Train & validate (cross-validation)
        ↓
   Results (metrics, figures, JSON report)
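
The "Train & validate (cross-validation)" step can be illustrated with a small stratified split. This is a standard-library sketch of the general idea, not FoodSpec's run_stratified_cv; the function name stratified_folds and the oil labels are made up for illustration, and real implementations usually shuffle within each class first.

```python
from collections import defaultdict


def stratified_folds(labels, n_splits=3):
    """Assign each sample index to a fold so class proportions stay similar."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(n_splits)]
    for label in sorted(by_class):
        # Deal each class's samples round-robin across the folds.
        for position, idx in enumerate(by_class[label]):
            folds[position % n_splits].append(idx)
    return folds


# Hypothetical labels: six olive and three sunflower spectra.
labels = ["olive"] * 6 + ["sunflower"] * 3
folds = stratified_folds(labels, n_splits=3)

for k, test_idx in enumerate(folds):
    # Each fold serves once as the test set; the rest is training data.
    train_idx = [i for i in range(len(labels)) if i not in test_idx]
    print(f"fold {k}: test={test_idx}, n_train={len(train_idx)}")
```

Each fold here ends up with two olive and one sunflower spectrum, so every test set mirrors the 2:1 class ratio of the full dataset.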

Installation Options

Core (always install first):

pip install foodspec

Deep learning (optional, for 1D CNN):

pip install "foodspec[deep]"

Verify installation:

foodspec --version  # Should print version
foodspec about      # Detailed info


Data Format Quick Check

FoodSpec expects data in one of these formats:

CSV (Simplest)

sample_id,wavenumber,intensity,label
OO_001,4000.0,0.234,Olive
OO_001,3998.0,0.235,Olive
OO_002,4000.0,0.241,Olive

Load with:

from foodspec.io import load_csv_spectra
spectra = load_csv_spectra("oils.csv", label_column="label")
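
To make the long-format layout concrete, the sketch below pivots rows like the ones above into one intensity row per sample on a shared wavenumber axis. It uses only the standard library and does not claim to mirror how load_csv_spectra works internally; pivot_long_csv is a hypothetical helper, and the fourth data row is invented to complete the second spectrum.

```python
import csv
import io

# The first three data rows come from the example above; the last is illustrative.
LONG_CSV = """sample_id,wavenumber,intensity,label
OO_001,4000.0,0.234,Olive
OO_001,3998.0,0.235,Olive
OO_002,4000.0,0.241,Olive
OO_002,3998.0,0.243,Olive
"""


def pivot_long_csv(text):
    """Group long-format rows into one intensity row per sample."""
    spectra = {}  # sample_id -> {wavenumber: intensity}
    labels = {}   # sample_id -> label
    for row in csv.DictReader(io.StringIO(text)):
        sid = row["sample_id"]
        spectra.setdefault(sid, {})[float(row["wavenumber"])] = float(row["intensity"])
        labels[sid] = row["label"]

    # Shared axis, sorted ascending; every sample must cover all of it.
    axis = sorted({wn for points in spectra.values() for wn in points})
    sample_ids = sorted(spectra)
    matrix = []
    for sid in sample_ids:
        missing = [wn for wn in axis if wn not in spectra[sid]]
        if missing:
            raise ValueError(f"{sid} is missing wavenumbers: {missing}")
        matrix.append([spectra[sid][wn] for wn in axis])
    return sample_ids, axis, matrix, [labels[s] for s in sample_ids]


sample_ids, axis, matrix, y = pivot_long_csv(LONG_CSV)
print(sample_ids)  # one entry per spectrum
print(axis)        # shared wavenumber axis
print(matrix)      # n_samples rows, n_wavenumbers columns
```

Raising on missing wavenumbers is a design choice for the sketch: a silently padded spectrum is harder to debug than an early error.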

HDF5 (Efficient for large datasets)

from foodspec.io import load_hdf5_library
spectra = load_hdf5_library("oils_library.h5")

See Data Format Reference for full details.


Next Steps

Never used FoodSpec before? → Start with 15-Minute Quickstart

Prefer command-line? → Go to First Steps (CLI)

Want to understand reproducibility? → Read Reproducibility Guide

Ready for real examples? → Oil Authentication


FAQ (Quick Answers)

Q: Can I use my own data?
A: Yes! See Data Format Reference for import instructions.

Q: Do I need to know Python?
A: Not necessarily. The CLI with YAML configs requires no Python; the Python API is there when you want more flexibility.

Q: How long does an analysis take?
A: Usually under 1 minute for about 100 samples; the exact time depends on data size and model.

Q: Can I reproduce old analyses?
A: Yes! Save protocols and metadata with each run. See Reproducibility.
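
One way to "save protocols and metadata with each run" is to write a small manifest next to the results. The sketch below is an assumption about what such a record could contain, not FoodSpec's own reproducibility report: the file name run_manifest.json, the helper names, and the placeholder file contents are all hypothetical.

```python
import hashlib
import json
import platform
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def sha256_of(path):
    """Hash a file so a later run can confirm it used identical inputs."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def write_manifest(protocol_path, data_path, output_dir):
    """Write run_manifest.json (hypothetical name) into the output folder."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    manifest = {
        "created_utc": datetime.now(timezone.utc).isoformat(),
        "python_version": platform.python_version(),
        "protocol": {"path": str(protocol_path), "sha256": sha256_of(protocol_path)},
        "input": {"path": str(data_path), "sha256": sha256_of(data_path)},
    }
    target = out / "run_manifest.json"
    target.write_text(json.dumps(manifest, indent=2))
    return target


# Demo with throwaway files; the contents are placeholders, not a real protocol.
with tempfile.TemporaryDirectory() as tmp:
    protocol = Path(tmp) / "myprotocol.yaml"
    data = Path(tmp) / "data.csv"
    protocol.write_bytes(b"name: demo\n")
    data.write_bytes(b"sample_id,wavenumber,intensity,label\n")
    manifest_path = write_manifest(protocol, data, Path(tmp) / "results")
    manifest = json.loads(manifest_path.read_text())
    print(sorted(manifest))
```

Comparing the stored hashes against the current files is a quick check that an old analysis is being rerun on the same protocol and data.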


Installation

  • Core:
    pip install foodspec
    
  • Deep-learning extra (optional 1D CNN prototype):
    pip install "foodspec[deep]"
    
  • Verify:
    foodspec about
    

Data formats and metadata

  • Instrument exports: commercial Raman/FTIR instruments often export per-spectrum TXT/CSV files (wavenumber/intensity columns) or wide CSVs (one column per spectrum).
  • FoodSpec standard: convert to a validated FoodSpectrumSet (HDF5 library) with:
      • x: spectra matrix (n_samples × n_wavenumbers)
      • wavenumbers: monotonic axis (cm⁻¹)
      • metadata: one row per sample (e.g., oil_type, meat_type, species, heating_time)
      • modality: raman/ftir/nir
  • Why this protocol? Keeps spectra + metadata together, enables reproducible preprocessing/models, and matches downstream workflows (oil-auth, heating, QC).
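
The four fields listed above imply a few consistency checks. The sketch below spells them out with a plain dataclass; SpectrumSetSketch is a hypothetical stand-in, not the real FoodSpectrumSet class, and treating "monotonic" as strictly increasing or strictly decreasing is an assumption.

```python
from dataclasses import dataclass

ALLOWED_MODALITIES = {"raman", "ftir", "nir"}


@dataclass
class SpectrumSetSketch:
    x: list            # n_samples rows, each with n_wavenumbers intensities
    wavenumbers: list  # shared axis in cm^-1
    metadata: list     # one dict per sample
    modality: str

    def validate(self):
        """Raise ValueError if the fields are inconsistent; return self otherwise."""
        n = len(self.wavenumbers)
        if any(len(row) != n for row in self.x):
            raise ValueError("every spectrum must have one value per wavenumber")
        if len(self.metadata) != len(self.x):
            raise ValueError("metadata needs exactly one row per sample")
        steps = [b - a for a, b in zip(self.wavenumbers, self.wavenumbers[1:])]
        if not (all(s > 0 for s in steps) or all(s < 0 for s in steps)):
            raise ValueError("wavenumber axis must be strictly monotonic")
        if self.modality not in ALLOWED_MODALITIES:
            raise ValueError(f"unknown modality: {self.modality!r}")
        return self


# Two illustrative spectra on a descending two-point axis.
demo = SpectrumSetSketch(
    x=[[0.234, 0.235], [0.241, 0.243]],
    wavenumbers=[4000.0, 3998.0],
    metadata=[{"oil_type": "olive"}, {"oil_type": "olive"}],
    modality="ftir",
).validate()
print(len(demo.x), "spectra,", len(demo.wavenumbers), "wavenumbers")
```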

Typical pipeline (text diagram)

Raw spectra (instrument CSV/TXT) → CSV→HDF5 library → Preprocess (baseline, smoothing, normalization, crop) → Features/chemometrics (peaks/ratios/PCA/PLS/models) → Metrics & reports (plots, JSON/Markdown).

Minimal examples (stepwise)

For full code, see the dedicated quickstarts. Highlights:

Python (steps)

1) Load library & validate.
2) Apply simple preprocessing (ALS baseline → Savitzky–Golay → vector norm).
3) Run PCA for a quick check.

from pathlib import Path

import matplotlib.pyplot as plt

from foodspec import load_library
from foodspec.validation import validate_spectrum_set
from foodspec.preprocess.baseline import ALSBaseline
from foodspec.preprocess.smoothing import SavitzkyGolaySmoother
from foodspec.preprocess.normalization import VectorNormalizer
from foodspec.chemometrics.pca import run_pca

# 1) Load the HDF5 library and check that it is a well-formed spectrum set
fs = load_library(Path("libraries/oils_demo.h5"))
validate_spectrum_set(fs)

# 2) Apply the preprocessing steps in order: baseline, smoothing, normalization
X = fs.x
for step in [ALSBaseline(lambda_=1e5, p=0.01, max_iter=10),
             SavitzkyGolaySmoother(window_length=9, polyorder=3),
             VectorNormalizer(norm="l2")]:
    X = step.fit_transform(X)

# 3) Project onto two principal components and save the score plot
_, pca_res = run_pca(X, n_components=2)
plt.scatter(pca_res.scores[:, 0], pca_res.scores[:, 1])
plt.tight_layout()
plt.savefig("pca_scores.png", dpi=150)
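
For readers new to spectral normalization, the following standard-library sketch shows what the two schemes named on this page do to a single spectrum: L2 (vector) normalization and standard normal variate (SNV). It follows the textbook definitions and is not taken from FoodSpec's VectorNormalizer or normalize_snv; the function names and input values are illustrative.

```python
import math


def l2_normalize(spectrum):
    """Scale a spectrum so its Euclidean (L2) norm equals 1."""
    norm = math.sqrt(sum(v * v for v in spectrum))
    if norm == 0:
        return list(spectrum)  # leave an all-zero spectrum unchanged
    return [v / norm for v in spectrum]


def snv(spectrum):
    """Standard normal variate: subtract the mean, divide by the sample std."""
    n = len(spectrum)
    mean = sum(spectrum) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in spectrum) / (n - 1))
    if std == 0:
        return [0.0] * n  # a flat spectrum has no variation to scale
    return [(v - mean) / std for v in spectrum]


unit = l2_normalize([3.0, 4.0])
scaled = snv([1.0, 2.0, 3.0])
print(unit)
print(scaled)
```

Both operate per spectrum, so they remove overall intensity differences between samples (for example from varying laser power or path length) without mixing information across samples.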

CLI (steps)

1) Convert CSV (wide example) to HDF5:

foodspec csv-to-library data/oils.csv libraries/oils.h5 \
  --format wide --wavenumber-column wavenumber \
  --label-column oil_type --modality raman

2) Run oil authentication:

foodspec oil-auth libraries/oils.h5 \
  --label-column oil_type \
  --output-dir runs/oils_demo

Outputs: metrics.json/CSV, confusion_matrix.png, report.md in a timestamped folder.
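
Because each run lands in its own timestamped folder, a small helper can pick up the most recent metrics.json for inspection. The sketch below assumes the run folders sit directly under the output directory and that their names sort chronologically; both are assumptions, as are the helper name and the placeholder file contents in the demo.

```python
import json
import tempfile
from pathlib import Path


def latest_metrics(run_root):
    """Return (run_dir, metrics) for the newest run folder holding metrics.json."""
    candidates = list(Path(run_root).glob("*/metrics.json"))
    if not candidates:
        raise FileNotFoundError(f"no metrics.json under {run_root}")
    # Assumes timestamped folder names sort lexicographically by time.
    newest = max(candidates, key=lambda p: p.parent.name)
    return newest.parent, json.loads(newest.read_text())


# Demo with two fake run folders; the JSON content is a placeholder.
with tempfile.TemporaryDirectory() as tmp:
    for name, note in [("20240101_120000", "older run"), ("20240102_090000", "newer run")]:
        run = Path(tmp) / name
        run.mkdir()
        (run / "metrics.json").write_text(json.dumps({"note": note}))
    run_dir, metrics = latest_metrics(tmp)
    run_dir_name = run_dir.name
    print(run_dir_name, metrics)
```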

Quickstarts

See also:

  • ../workflows/oil_authentication.md
  • ../workflows/heating_quality_monitoring.md
  • ../user-guide/csv_to_library.md