
Workflows API

High-level domain-specific analysis workflows.

The foodspec.workflows module provides end-to-end workflows for common food spectroscopy applications.

Aging

AgingResult

Structured results for degradation trajectory analysis.

TrajectoryFit

Per-entity trajectory fit parameters and diagnostics.

compute_degradation_trajectories

Fit degradation trajectories across entities over storage time.
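A minimal usage sketch follows. The exact signature of compute_degradation_trajectories is not shown on this page, so the keyword names below (entity_column, time_column) and the result attribute access are assumptions for illustration only.

```python
# Hedged sketch: keyword names and result attributes are assumptions,
# not the documented signature.
from foodspec.workflows import compute_degradation_trajectories

result = compute_degradation_trajectories(
    dataset,                      # a FoodSpectrumSet with storage-time metadata
    entity_column="sample_id",    # assumed: groups spectra into per-entity trajectories
    time_column="storage_days",   # assumed: numeric storage-time column
)

# AgingResult is described above as a structured result containing per-entity
# TrajectoryFit objects; the attribute name below is illustrative.
for fit in getattr(result, "trajectories", []):
    print(fit)
```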

Oil Authentication

Complete oil authentication workflows are available through the CLI and protocol system. See the Oil Authentication Guide for details.

Heating Quality

analyze_heating_trajectory

Analyze thermal degradation patterns over time.

Analyze heating/oxidation trajectory from time-series spectra.

Workflow:

1. Extract oxidation indices
2. Fit trajectory models per index
3. (Optional) Classify degradation stages
4. (Optional) Estimate shelf life

Assumptions:

- time_column exists in metadata and is numeric
- For stage classification: stage_column must be provided
- For shelf-life estimation: shelf_life_threshold must be provided
- Sufficient time points (≥5) for trajectory fitting

Parameters

dataset : FoodSpectrumSet
    Input dataset with time-series spectra.
time_column : str
    Metadata column with time values.
indices : list of str, default=['pi', 'tfc', 'oit_proxy']
    Indices to extract and model.
classify_stages : bool, default=False
    Whether to train a degradation stage classifier.
stage_column : str, optional
    Metadata column with degradation stage labels (required if classify_stages=True).
estimate_shelf_life : bool, default=False
    Whether to estimate shelf life.
shelf_life_threshold : float, optional
    Threshold for the shelf-life criterion (required if estimate_shelf_life=True).
shelf_life_index : str, default='pi'
    Index to use for shelf-life estimation.

Returns

results : dict
    - 'indices': DataFrame of extracted indices
    - 'trajectory_models': dict of fit metrics per index
    - 'stage_classification' (if enabled): classification metrics
    - 'shelf_life' (if enabled): shelf-life estimation metrics
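The sketch below exercises the parameters documented above. The import path follows this page's module description (foodspec.workflows); whether the function is re-exported at that level, and the metadata column name used, are assumptions.

```python
# Hedged usage sketch for analyze_heating_trajectory; column names and the
# threshold value are illustrative, the keyword names come from the
# Parameters section above.
from foodspec.workflows import analyze_heating_trajectory

results = analyze_heating_trajectory(
    dataset,                        # FoodSpectrumSet with time-series spectra
    time_column="heating_minutes",  # hypothetical metadata column name
    indices=["pi", "tfc", "oit_proxy"],
    estimate_shelf_life=True,
    shelf_life_threshold=0.8,       # illustrative limit on the chosen index
    shelf_life_index="pi",
)

print(results["trajectory_models"])  # fit metrics per index
print(results["shelf_life"])         # present because estimate_shelf_life=True
```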

LibrarySearchWorkflow

Search spectral library for matches.

Run a library search workflow (scaffold).

Parameters

metric : str
    Similarity metric ('cosine', 'sid', 'sam').
top_k : int
    Number of top matches to return.

Methods

run(library_df, query, wavenumbers)
    Return a DataFrame with columns ['index', 'label', 'score', 'confidence', 'metric'].
validate()
    Validate configuration.
to_dict()
    JSON-friendly dict of configuration.
hash()
    Hash of configuration for reproducibility.

run

run(library_df, query, wavenumbers=None)

Execute the library search (placeholder).

Parameters

library_df : pd.DataFrame
    Library spectra in wide format (metadata + spectral columns).
query : np.ndarray
    Query spectrum as a 1-D array.
wavenumbers : Optional[np.ndarray]
    Wavenumber grid for plotting or alignment (unused in scaffold).

Returns

pd.DataFrame Ranked matches with placeholder scores/confidence.
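The sketch below follows the constructor parameters, methods, and return columns documented above. The import path and the file-loading steps are assumptions; remember the scaffold returns placeholder scores.

```python
# Hedged sketch of the documented LibrarySearchWorkflow scaffold API.
# Data shapes and file names are assumptions for illustration.
import numpy as np
import pandas as pd
from foodspec.workflows import LibrarySearchWorkflow  # import path assumed

workflow = LibrarySearchWorkflow(metric="cosine", top_k=5)
workflow.validate()                       # documented: validates configuration

library_df = pd.read_csv("library.csv")   # wide format: metadata + spectral columns
query = np.loadtxt("query_spectrum.txt")  # 1-D query spectrum

matches = workflow.run(library_df, query)
print(matches[["label", "score", "confidence"]].head())
print(workflow.hash())                    # configuration hash for reproducibility
```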

Shelf Life

estimate_remaining_shelf_life

Model shelf life from spectral evolution.
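Only a one-line summary is given for estimate_remaining_shelf_life on this page, so every keyword in the sketch below is an assumption (modeled on analyze_heating_trajectory's shelf-life options), not the documented signature.

```python
# Hedged sketch: keyword names and values are assumptions, not the
# documented API of estimate_remaining_shelf_life.
from foodspec.workflows import estimate_remaining_shelf_life

shelf_life = estimate_remaining_shelf_life(
    dataset,                      # FoodSpectrumSet tracked over storage time
    time_column="storage_days",   # assumed keyword
    index="pi",                   # assumed: spectral index used as the ageing marker
    threshold=0.8,                # assumed: acceptability limit on that index
)
print(shelf_life)
```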

Data Governance

summarize_dataset

Generate comprehensive dataset quality report.

Comprehensive dataset summary for at-a-glance quality assessment.

Workflow:

1. Samples per class distribution
2. Spectral quality metrics (SNR, range, NaN/inf)
3. Metadata completeness

Assumptions:

- Dataset is a valid FoodSpectrumSet
- label_column (if provided) is categorical
- Spectral data is in standard format (no major preprocessing artifacts)

Parameters

dataset : FoodSpectrumSet
    Input dataset. Required.
label_column : str | None, default=None
    Column with class labels for balance analysis.
required_metadata_columns : list[str] | None, default=None
    Columns required for workflows.

Returns

Dict[str, Any]
    Summary sections including class distribution, spectral quality, metadata completeness, and dataset info.
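A short usage sketch based on the parameters documented above; the column names are hypothetical and the exact keys of the returned dict may differ from what is iterated here.

```python
# Hedged usage sketch for summarize_dataset; column names are illustrative.
from foodspec.workflows import summarize_dataset

report = summarize_dataset(
    dataset,
    label_column="oil_type",                          # hypothetical class-label column
    required_metadata_columns=["batch", "storage_days"],
)

# Print each summary section (class distribution, spectral quality, metadata
# completeness, dataset info).
for section, content in report.items():
    print(section, "->", content)
```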

compute_readiness_score

Assess dataset readiness for modeling.

Compute comprehensive dataset readiness score (0-100).

Workflow:

1. Score sample size
2. Score class balance
3. Score replicate consistency (if replicate_column provided)
4. Score metadata completeness
5. Score spectral quality
6. Score leakage risk (if batch/replicate columns provided)
7. Weighted average → overall score (see the sketch after the assumptions below)

Assumptions:

- Default weights: sample_size=0.20, balance=0.20, replicates=0.15, metadata=0.15, spectral=0.15, leakage=0.15
- Thresholds: min_samples_per_class=20, imbalance_ratio=10, technical_cv=10%
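A minimal sketch of step 7 (weighted average → overall score) using the default weights listed above. compute_readiness_score performs this aggregation internally; the per-dimension scores here are illustrative only.

```python
# Illustrative weighted average of dimension scores (0-100 scale) using the
# default weights documented above.
dimension_scores = {
    "sample_size": 85, "balance": 70, "replicates": 90,
    "metadata": 100, "spectral": 80, "leakage": 60,
}
default_weights = {
    "sample_size": 0.20, "balance": 0.20, "replicates": 0.15,
    "metadata": 0.15, "spectral": 0.15, "leakage": 0.15,
}
overall = sum(default_weights[k] * dimension_scores[k] for k in default_weights)
print(round(overall, 1))  # 80.5 for the illustrative scores above
```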

Parameters

dataset : FoodSpectrumSet
    Input dataset.
label_column : str
    Column with class labels.
batch_column : str, optional
    Column defining batches (for leakage detection).
replicate_column : str, optional
    Column defining replicate groups (for consistency and leakage).
required_metadata_columns : list of str, optional
    Metadata columns that must be complete.
weights : dict, optional
    Custom weights for scoring dimensions. Keys: 'sample_size', 'balance', 'replicates', 'metadata', 'spectral', 'leakage'.

Returns

score_report : dict
    - 'overall_score': 0-100 score
    - 'dimension_scores': dict with individual dimension scores
    - 'passed_criteria': list of criteria that passed (score ≥ 70)
    - 'failed_criteria': list of criteria that failed (score < 70)
    - 'recommendation': text guidance based on overall score
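A usage sketch built from the documented parameters and return keys; the metadata column names are hypothetical.

```python
# Hedged usage sketch for compute_readiness_score; column names are illustrative,
# the keyword names and return keys follow the sections above.
from foodspec.workflows import compute_readiness_score

score_report = compute_readiness_score(
    dataset,
    label_column="oil_type",
    batch_column="batch",
    replicate_column="replicate_id",
    required_metadata_columns=["harvest_year"],
)

print(score_report["overall_score"])     # 0-100 readiness score
print(score_report["failed_criteria"])   # dimensions scoring below 70
print(score_report["recommendation"])    # text guidance based on the overall score
```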

See Also