Frequently Asked Questions (FAQ)

Quick Navigation

For technical troubleshooting (installation errors, NaNs, etc.), see Troubleshooting.


Installation

Data Formats & Input/Output

  • Supported formats and headers: see User Guide: Data Formats.
  • For registry-based runs, ensure metadata tables point to existing files and sample IDs match filenames; a minimal validation sketch follows below.
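
A minimal registry sanity check, assuming a metadata CSV with hypothetical columns sample_id and file_path (adapt to your schema):

from pathlib import Path
import pandas as pd

registry = pd.read_csv('metadata.csv')

# Flag registry rows pointing to files that do not exist
missing = [p for p in registry['file_path'] if not Path(p).exists()]
print(f"{len(missing)} registry entries point to missing files")

# Flag sample IDs that do not match their filename stems
mismatched = registry[registry.apply(
    lambda r: Path(r['file_path']).stem != str(r['sample_id']), axis=1)]
print(f"{len(mismatched)} sample IDs do not match their filenames")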

Preprocessing & Methods

Which baseline method should I use?

The choice depends on your spectroscopy modality and baseline characteristics:

Quick Decision Tree

1. Do you have a rolling/wavy baseline?
   ├─ YES → ALS (Asymmetric Least Squares)
   └─ NO → Go to 2

2. Is the baseline a smooth polynomial curve?
   ├─ YES → Polynomial fitting (order 2-4)
   └─ NO → Go to 3

3. Do you have sharp spectral features?
   ├─ YES → ALS (preserves peaks better than polynomial)
   └─ NO → Go to 4

4. Is your baseline flat but offset?
   ├─ YES → Constant offset removal (subtract minimum)
   └─ NO → Rubberband or ALS

Detailed Recommendations

| Modality | Recommended Method | Parameters | When to Use |
|----------|--------------------|------------|-------------|
| FTIR-ATR | ALS | λ=10⁶, p=0.01 | Wavy baseline from ATR crystal contact |
| FTIR-ATR | Rubberband | 64 points | Sharp baseline curvature |
| Raman | ALS | λ=10⁵, p=0.001 | Fluorescence background (broad, rolling) |
| Raman | Polynomial | Order 4-6 | Smooth fluorescence curve |
| NIR | MSC or SNV | N/A | Scatter correction (not baseline) |
| NIR | Detrend | Order 2 | Linear/quadratic baseline trend |

Code Examples

ALS (Most Common):

from foodspec.preprocessing import baseline_als

# FTIR-ATR (typical)
X_corrected = baseline_als(X, lam=1e6, p=0.01, max_iter=10)

# Raman (fluorescence)
X_corrected = baseline_als(X, lam=1e5, p=0.001, max_iter=15)

Rubberband (Alternative):

from foodspec.preprocessing import rubberband_baseline

X_corrected = rubberband_baseline(X, num_points=64)

Polynomial Fitting:

from scipy.signal import detrend

# Linear detrend
X_corrected = detrend(X, axis=1, type='linear')

# Polynomial (order 4)
import numpy as np
from numpy.polynomial import Polynomial

x = np.arange(X.shape[1])
X_corrected = np.zeros_like(X)
for i in range(X.shape[0]):
    poly = Polynomial.fit(x, X[i], deg=4)
    X_corrected[i] = X[i] - poly(x)

Performance Comparison

We benchmarked baseline methods on 200 FTIR-ATR olive oil spectra:

| Method | RMSEP (Accuracy) | Speed (ms/spectrum) | Preserves Peaks |
|--------|------------------|---------------------|-----------------|
| ALS | 0.003 | 12 | ✅ Excellent |
| Rubberband | 0.005 | 18 | ✅ Good |
| Polynomial (order 4) | 0.008 | 3 | ⚠️ Moderate |
| No baseline | 0.025 | 0 | N/A |

Conclusion: ALS is the best all-around choice for FTIR and Raman spectroscopy.
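
To check these timings on your own hardware, a minimal sketch, assuming X is already loaded as an n_samples × n_points array:

import time
from foodspec.preprocessing import baseline_als

# Time one full pass of ALS and report the per-spectrum cost
start = time.perf_counter()
_ = baseline_als(X, lam=1e6, p=0.01, max_iter=10)
elapsed_ms = (time.perf_counter() - start) * 1000 / X.shape[0]
print(f"ALS: {elapsed_ms:.2f} ms/spectrum")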

Visual Comparison

import matplotlib.pyplot as plt
from foodspec.preprocessing import baseline_als, rubberband_baseline

spectrum = X[0]  # Single spectrum (assumes X and wavenumbers are already loaded)

# Apply methods
baseline_als_result = baseline_als(spectrum.reshape(1, -1), lam=1e6, p=0.01)[0]
baseline_rubberband = rubberband_baseline(spectrum.reshape(1, -1), num_points=64)[0]

# Plot
fig, axes = plt.subplots(2, 2, figsize=(12, 8))

# Raw
axes[0,0].plot(wavenumbers, spectrum)
axes[0,0].set_title('Raw Spectrum')
axes[0,0].invert_xaxis()

# ALS
axes[0,1].plot(wavenumbers, baseline_als_result)
axes[0,1].set_title('ALS Corrected')
axes[0,1].invert_xaxis()

# Rubberband
axes[1,0].plot(wavenumbers, baseline_rubberband)
axes[1,0].set_title('Rubberband Corrected')
axes[1,0].invert_xaxis()

# Comparison
axes[1,1].plot(wavenumbers, baseline_als_result, label='ALS', alpha=0.7)
axes[1,1].plot(wavenumbers, baseline_rubberband, label='Rubberband', alpha=0.7)
axes[1,1].legend()
axes[1,1].set_title('Comparison')
axes[1,1].invert_xaxis()

plt.tight_layout()
plt.show()

How many samples do I need?

The required sample size depends on your task complexity and acceptable uncertainty.

Quick Guidelines

| Task | Minimum Samples | Recommended | Example |
|------|-----------------|-------------|---------|
| Binary classification (balanced) | 20 per class | 30-50 per class | EVOO vs. Lampante |
| Multiclass (3-5 classes) | 15 per class | 25-40 per class | 5 olive oil origins |
| Multiclass (>5 classes) | 10 per class | 20-30 per class | 10 olive oil varieties |
| Regression (quantification) | 30 total | 50-100 | Adulteration level prediction |
| Exploratory analysis | 10 total | 20-30 | PCA, clustering |

Note: These are biological samples (not technical replicates). If measuring 3 replicates per sample, you need 20 samples × 3 = 60 spectra. Keep all replicates of a sample in the same cross-validation fold to prevent replicate leakage, as in the sketch below.
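
A minimal grouped cross-validation sketch, assuming X, y, and a groups array with one sample ID per spectrum:

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GroupKFold, cross_val_score

# GroupKFold keeps all replicates (same sample ID) in one fold
cv = GroupKFold(n_splits=5)
scores = cross_val_score(RandomForestClassifier(random_state=42), X, y,
                         cv=cv, groups=groups)
print(f"Grouped CV accuracy: {scores.mean():.3f} ± {scores.std():.3f}")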

Statistical Power Analysis

For more precise estimates, use statistical power analysis:

from statsmodels.stats.power import tt_ind_solve_power
import numpy as np

# Binary classification: What sample size for 80% power?
effect_size = 0.8  # Cohen's d (medium effect)
alpha = 0.05  # Significance level
power = 0.8  # Desired power (80%)

n_per_group = tt_ind_solve_power(
    effect_size=effect_size,
    alpha=alpha,
    power=power,
    alternative='two-sided'
)

print(f"Required sample size per group: {int(np.ceil(n_per_group))}")
# Output: 26 samples per group

Learning Curves

Estimate whether you have enough data by plotting learning curves:

from sklearn.model_selection import learning_curve
from sklearn.ensemble import RandomForestClassifier
import numpy as np
import matplotlib.pyplot as plt

train_sizes, train_scores, val_scores = learning_curve(
    RandomForestClassifier(random_state=42),
    X, y,
    train_sizes=np.linspace(0.1, 1.0, 10),
    cv=5,
    scoring='accuracy',
    n_jobs=-1
)

# Plot
plt.figure(figsize=(10, 6))
plt.plot(train_sizes, train_scores.mean(axis=1), label='Training accuracy', marker='o')
plt.plot(train_sizes, val_scores.mean(axis=1), label='Validation accuracy', marker='s')
plt.fill_between(train_sizes, 
                 val_scores.mean(axis=1) - val_scores.std(axis=1),
                 val_scores.mean(axis=1) + val_scores.std(axis=1),
                 alpha=0.2)
plt.xlabel('Training Set Size')
plt.ylabel('Accuracy')
plt.title('Learning Curve: Do I Need More Data?')
plt.legend()
plt.grid(True, alpha=0.3)
plt.show()

# Interpretation:
# - If validation curve plateaus → You have enough data
# - If validation curve still rising → Collect more data
# - If train/val gap large → Model overfitting (regularize or simplify)

Rules of Thumb

Feature-to-Sample Ratio:

  • p/n < 1: Safe (more samples than features after PCA/feature selection)
  • p/n = 1-5: Moderate overfitting risk (use regularization)
  • p/n = 5-10: High overfitting risk (use PCA, LDA, or PLS)
  • p/n > 10: Severe overfitting (dimensionality reduction mandatory)

Where:

  • p = number of features (wavenumbers, typically 1000-2000)
  • n = number of samples (biological samples, not replicates)

Example:

n_samples, n_features = X.shape
ratio = n_features / n_samples

if ratio > 10:
    print(f"⚠️ High overfitting risk (p/n = {ratio:.1f})")
    print("   Recommendation: Use PCA to reduce to <100 components")

    from sklearn.decomposition import PCA
    pca = PCA(n_components=min(50, n_samples // 2))
    X_reduced = pca.fit_transform(X)
    print(f"   Reduced to {X_reduced.shape[1]} components ({pca.explained_variance_ratio_.sum()*100:.1f}% variance)")


Can I compare Raman vs. FTIR?

Short answer: Not directly, but a meaningful comparison is possible with proper preprocessing and domain adaptation.

Why Direct Comparison Fails

Raman and FTIR probe different vibrational modes:

| Aspect | FTIR | Raman |
|--------|------|-------|
| Molecular transitions | IR-active (change in dipole moment) | Raman-active (change in polarizability) |
| Water sensitivity | Strong water interference | Weak water interference |
| Peak positions | Wavenumber (cm⁻¹) | Raman shift (cm⁻¹) |
| Intensity scale | Absorbance (0-3) | Counts (0-65535) or arbitrary units |
| Peak patterns | Different relative intensities | Different relative intensities |

Example: the C=O stretch near 1740 cm⁻¹:

  • FTIR: Strong absorption (IR-active)
  • Raman: Weak scatter (only weakly Raman-active)

Conversely, the C=C stretch near 1660 cm⁻¹:

  • FTIR: Weak absorption
  • Raman: Strong scatter (Raman-active)
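
To see this band by band in your own data, a minimal sketch, assuming paired matrices X_ftir/X_raman with wavenumber axes wn_ftir/wn_raman (hypothetical names):

import numpy as np

def band_intensity(X, wn, target):
    # Mean intensity at the channel closest to the target position (cm⁻¹)
    idx = np.argmin(np.abs(wn - target))
    return X[:, idx].mean()

for band in (1740, 1660):
    print(f"{band} cm⁻¹  FTIR: {band_intensity(X_ftir, wn_ftir, band):.3f}  "
          f"Raman: {band_intensity(X_raman, wn_raman, band):.3f}")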

When Comparison Makes Sense

1. Qualitative Comparison (Exploratory)

# Both modalities should separate the same classes
from sklearn.decomposition import PCA
import matplotlib.pyplot as plt

# FTIR PCA
pca_ftir = PCA(n_components=2)
X_ftir_pca = pca_ftir.fit_transform(X_ftir)

# Raman PCA
pca_raman = PCA(n_components=2)
X_raman_pca = pca_raman.fit_transform(X_raman)

# Plot side-by-side
fig, axes = plt.subplots(1, 2, figsize=(14, 6))

axes[0].scatter(X_ftir_pca[:, 0], X_ftir_pca[:, 1], c=y, cmap='viridis', alpha=0.7)
axes[0].set_title('FTIR PCA')
axes[0].set_xlabel('PC1')
axes[0].set_ylabel('PC2')

axes[1].scatter(X_raman_pca[:, 0], X_raman_pca[:, 1], c=y, cmap='viridis', alpha=0.7)
axes[1].set_title('Raman PCA')
axes[1].set_xlabel('PC1')
axes[1].set_ylabel('PC2')

plt.tight_layout()
plt.show()

# If both show similar class separation → Both modalities capture the same information

2. Complementary Fusion (Multimodal)

Combine FTIR + Raman for improved classification:

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Separate models
model_ftir = RandomForestClassifier(random_state=42)
model_raman = RandomForestClassifier(random_state=42)

acc_ftir = cross_val_score(model_ftir, X_ftir, y, cv=5).mean()
acc_raman = cross_val_score(model_raman, X_raman, y, cv=5).mean()

print(f"FTIR only:  {acc_ftir:.3f}")
print(f"Raman only: {acc_raman:.3f}")

# Concatenated features (early fusion)
X_fused = np.hstack([X_ftir, X_raman])
model_fused = RandomForestClassifier(random_state=42)
acc_fused = cross_val_score(model_fused, X_fused, y, cv=5).mean()

print(f"FTIR + Raman (fused): {acc_fused:.3f}")

# Decision-level fusion (late fusion), assuming a prior train/test split
# per modality (X_ftir_train/X_ftir_test, X_raman_train/X_raman_test, y_train, y_test)
proba_ftir = model_ftir.fit(X_ftir_train, y_train).predict_proba(X_ftir_test)
proba_raman = model_raman.fit(X_raman_train, y_train).predict_proba(X_raman_test)

# Average probabilities
proba_avg = (proba_ftir + proba_raman) / 2
y_pred = proba_avg.argmax(axis=1)
acc_late_fusion = (y_pred == y_test).mean()

print(f"FTIR + Raman (late fusion): {acc_late_fusion:.3f}")

Best Practices

  1. Separate preprocessing pipelines: FTIR and Raman require different preprocessing (different baseline methods, normalization)
  2. Separate models: Train one model per modality, then fuse decisions
  3. Use mid-level fusion: Extract features (e.g., PCA scores) from each modality, then combine (see the sketch after this list)
  4. Report per-modality performance: Always show FTIR-only, Raman-only, and fused results
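
A minimal mid-level fusion sketch, assuming X_ftir, X_raman, and y are loaded; the per-modality PCA is fitted inside each CV fold, so the scores are never fit on test data:

import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline

# Concatenate the raw matrices; the ColumnTransformer applies a
# separate PCA to each modality's column block
X_all = np.hstack([X_ftir, X_raman])
n_ftir = X_ftir.shape[1]

mid_fusion = Pipeline([
    ('per_modality_pca', ColumnTransformer([
        ('pca_ftir', PCA(n_components=10), slice(0, n_ftir)),
        ('pca_raman', PCA(n_components=10), slice(n_ftir, X_all.shape[1])),
    ])),
    ('rf', RandomForestClassifier(random_state=42)),
])

acc_mid = cross_val_score(mid_fusion, X_all, y, cv=5).mean()
print(f"FTIR + Raman (mid-level fusion): {acc_mid:.3f}")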

FoodSpec Example:

from foodspec.ml.multimodal import MultimodalFusion

fusion = MultimodalFusion(
    modalities=['ftir', 'raman'],
    fusion_strategy='late',  # 'early', 'late', or 'stacked'
    models={'ftir': RandomForestClassifier(), 'raman': RandomForestClassifier()}
)

fusion.fit({'ftir': X_ftir_train, 'raman': X_raman_train}, y_train)
accuracy = fusion.score({'ftir': X_ftir_test, 'raman': X_raman_test}, y_test)

print(f"Multimodal Fusion Accuracy: {accuracy:.3f}")


How do I handle chips vs. pure oils?

This is a domain shift problem: models trained on pure oils often fail on oils extracted from complex food matrices (chips, fried foods).

Why Matrix Effects Matter

Pure oil spectrum:

  • Clean baseline
  • Sharp peaks
  • No interference

Oil-in-chips spectrum:

  • Broad background from the food matrix (starch, protein)
  • Overlapping peaks (carbohydrate absorption at 1000-1200 cm⁻¹)
  • Lower signal-to-noise ratio

Solution Strategies

1. Include Matrix Samples in Training

Best approach: Collect spectra from both pure oils AND oils extracted from chips/fried foods.

# Training set: 50% pure oils, 50% oils from chips
import numpy as np

X_train = np.vstack([X_pure_oils, X_oils_from_chips])
y_train = np.hstack([y_pure, y_chips])

# Mark matrix type
matrix_type = np.array(['pure']*len(X_pure_oils) + ['chips']*len(X_oils_from_chips))

# Train model on mixed data
from sklearn.ensemble import RandomForestClassifier
model = RandomForestClassifier(random_state=42)
model.fit(X_train, y_train)

# Test on chips → Should generalize
accuracy_chips = model.score(X_chips_test, y_chips_test)
print(f"Accuracy on chips: {accuracy_chips:.3f}")

2. Domain Adaptation (Transfer Learning)

Use when: You have a model trained on pure oils, but no labeled chips data.

from foodspec.ml.harmonization import transfer_component_analysis

# Align chips spectra to pure oil distribution
X_chips_aligned = transfer_component_analysis(
    X_source=X_pure_oils_train,  # Source domain (pure oils)
    X_target=X_chips_test,        # Target domain (chips)
    n_components=10
)

# Predict with pure oil model
y_pred = model_pure_oils.predict(X_chips_aligned)

3. Background Subtraction

Remove food matrix background (if you have pure matrix spectra):

# Measure a pure potato chip spectrum (no oil); load_spectrum stands for
# any loader returning a 1-D array on the same wavenumber axis as X_chips
pure_chips_spectrum = load_spectrum('pure_potato_chips.csv')

# Subtract the matrix contribution, assuming linear mixing; the 30%
# fraction is illustrative and should be estimated for your samples
X_chips_corrected = X_chips - 0.3 * pure_chips_spectrum

# Now apply pure oil model
y_pred = model_pure_oils.predict(X_chips_corrected)
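
Instead of assuming a fixed 30%, a minimal sketch of estimating the matrix fraction per spectrum by least squares, reusing the variables above:

import numpy as np

# Per-spectrum fraction a_i minimizing ||x_i - a_i * m||², where m is the
# pure matrix spectrum: a_i = <x_i, m> / <m, m>
m = pure_chips_spectrum
fractions = X_chips @ m / (m @ m)
X_chips_corrected = X_chips - np.outer(fractions, m)
print(f"Estimated matrix fractions: {fractions.min():.2f}-{fractions.max():.2f}")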

4. Feature Selection (Focus on Discriminative Regions)

Use spectral regions robust to matrix effects:

import numpy as np

# Lipid-specific regions (less interference), as (low, high) in cm⁻¹
lipid_regions = [
    (2800, 3020),  # C-H stretch
    (1700, 1780),  # C=O stretch (carbonyl)
    (1400, 1500),  # C-H bend
]

# Extract regions
mask = np.zeros(len(wavenumbers), dtype=bool)
for low, high in lipid_regions:
    mask |= (wavenumbers >= low) & (wavenumbers <= high)

X_lipid_only = X[:, mask]

# Train on the lipid regions only (X_lipid_only_train/_test come from a
# prior train/test split of X_lipid_only)
model.fit(X_lipid_only_train, y_train)
accuracy = model.score(X_lipid_only_test, y_test)
print(f"Accuracy (lipid regions only): {accuracy:.3f}")

5. Ensemble with Confidence Thresholding

Flag uncertain predictions:

from sklearn.ensemble import RandomForestClassifier

model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_pure_oils_train, y_pure_train)

# Predict on chips with confidence
proba = model.predict_proba(X_chips_test)
confidence = proba.max(axis=1)

# Flag low-confidence predictions
threshold = 0.7
uncertain_mask = confidence < threshold

print(f"Uncertain predictions: {uncertain_mask.sum()} / {len(X_chips_test)}")
print(f"High-confidence accuracy: {(model.predict(X_chips_test)[~uncertain_mask] == y_chips_test[~uncertain_mask]).mean():.3f}")

Case Study: Olive Oil in Fried Chips

Dataset:

  • 30 pure EVOO samples (training)
  • 20 EVOO-fried chips samples (testing)

Results:

| Method | Accuracy (chips) | Notes |
|--------|------------------|-------|
| No adaptation | 0.62 | Baseline (poor) |
| Background subtraction | 0.73 | +11% improvement |
| Transfer learning (TCA) | 0.78 | +16% improvement |
| Include chips in training | 0.87 | +25% improvement (best) |
| Lipid regions only | 0.81 | +19% improvement |

Conclusion: Including matrix samples in training is the most effective strategy (+25% accuracy).


How do I cite FoodSpec?

Software Citation

BibTeX:

@software{foodspec2025,
  author = {Chandrasekar, Narayana},
  title = {FoodSpec: Spectroscopic Analysis Toolkit for Food Quality and Authentication},
  year = {2025},
  version = {1.0.0},
  url = {https://github.com/chandrasekarnarayana/foodspec},
  doi = {10.5281/zenodo.XXXXXXX}
}

APA:

Chandrasekar, N. (2025). FoodSpec: Spectroscopic Analysis Toolkit for Food Quality and 
Authentication (Version 1.0.0) [Computer software]. https://github.com/chandrasekarnarayana/foodspec

IEEE:

N. Chandrasekar, "FoodSpec: Spectroscopic Analysis Toolkit for Food Quality and Authentication," 
Version 1.0.0, 2025. [Online]. Available: https://github.com/chandrasekarnarayana/foodspec

When to Cite

Cite FoodSpec if you use:

  • The FoodSpec Python package for data analysis
  • FoodSpec preprocessing methods (ALS baseline, SNV, MSC)
  • FoodSpec protocols or workflows
  • FoodSpec validation strategies
  • Any results generated using FoodSpec code

Example Citation in Paper

Methods Section:

"Spectral data were analyzed using the FoodSpec toolkit (v1.0.0) [1]. Spectra underwent Asymmetric Least Squares (ALS) baseline correction (λ=10⁶, p=0.01) and Standard Normal Variate (SNV) normalization. Classification was performed using Random Forest models with grouped 10-fold cross-validation to prevent replicate leakage."

[1] N. Chandrasekar, "FoodSpec: Spectroscopic Analysis Toolkit for Food Quality and Authentication," Version 1.0.0, 2025. https://github.com/chandrasekarnarayana/foodspec

Citing Specific Methods

If you use specific algorithms implemented in FoodSpec, also cite the original papers:

ALS Baseline Correction:

@article{eilers2005baseline,
  title={Baseline correction with asymmetric least squares smoothing},
  author={Eilers, Paul HC and Boelens, Hans FM},
  journal={Leiden University Medical Centre Report},
  volume={1},
  number={1},
  pages={5},
  year={2005}
}

SNV Normalization:

@article{barnes1989standard,
  title={Standard normal variate transformation and de-trending of near-infrared diffuse reflectance spectra},
  author={Barnes, RJ and Dhanoa, MS and Lister, Susan J},
  journal={Applied spectroscopy},
  volume={43},
  number={5},
  pages={772--777},
  year={1989},
  publisher={SAGE Publications Sage UK: London, England}
}

Grouped Cross-Validation:

@article{brereton2010support,
  title={Support vector machines for classification and regression},
  author={Brereton, Richard G and Lloyd, Gavin R},
  journal={Analyst},
  volume={135},
  number={2},
  pages={230--267},
  year={2010},
  publisher={Royal Society of Chemistry}
}

Acknowledgments (Optional)

If FoodSpec significantly contributed to your research but doesn't warrant a methods citation:

"We thank the developers of FoodSpec for providing open-source tools for spectroscopic data analysis."

Get DOI for Your Specific Version

Archive your exact FoodSpec version on Zenodo for reproducibility:

  1. Fork the FoodSpec repository
  2. Create a release tag (e.g., v1.0.0-myproject)
  3. Link to Zenodo: https://zenodo.org/account/settings/github/
  4. Cite the specific Zenodo DOI in your paper

Example:

@software{foodspec_myproject,
  author = {Chandrasekar, Narayana and {Your Name}},
  title = {FoodSpec (Modified for Olive Oil Study)},
  year = {2025},
  version = {1.0.0-oliveoil},
  doi = {10.5281/zenodo.YYYYYYY}
}


Chemometrics & Machine Learning

Performance & Optimization

  • Profiling tips and hardware guidance: see User Guide: Automation.
  • Use vectorized preprocessing and batch operations; avoid Python loops over spectra (see the sketch below).
  • Benchmark with smaller subsets first to validate pipeline settings.
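
A minimal illustration of the difference, using SNV normalization as the example (plain NumPy; prefer foodspec's own implementations in practice):

import numpy as np

# Loop version: one Python-level iteration per spectrum (slow)
def snv_loop(X):
    out = np.empty_like(X)
    for i in range(X.shape[0]):
        out[i] = (X[i] - X[i].mean()) / X[i].std()
    return out

# Vectorized version: a single pass over the whole matrix (fast)
def snv_vectorized(X):
    mu = X.mean(axis=1, keepdims=True)
    sd = X.std(axis=1, keepdims=True)
    return (X - mu) / sd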

General Questions

What file formats does FoodSpec support?

Input formats:

  • CSV: Most common (wide format: rows = samples, columns = wavenumbers)
  • Excel (.xlsx): Via pandas read_excel()
  • SPC (Thermo/JCAMP): Via the spc_to_df() utility
  • HDF5: Efficient for large datasets
  • NumPy arrays (.npy, .npz): Direct loading

Output formats:

  • HDF5: Recommended for reproducibility (stores spectra + metadata + preprocessing steps)
  • CSV: For sharing with collaborators
  • Pickle (.pkl): For serializing trained models

Example:

# CSV input
import pandas as pd
df = pd.read_csv('olive_oils.csv', index_col=0)
X = df.values
wavenumbers = df.columns.astype(float).values

# HDF5 output
from foodspec.io import save_hdf5
save_hdf5('olive_oils.h5', X=X, y=y, wavenumbers=wavenumbers, metadata={'instrument': 'Bruker Alpha'})

# HDF5 input
from foodspec.io import load_hdf5
data = load_hdf5('olive_oils.h5')
X, y, wavenumbers = data['X'], data['y'], data['wavenumbers']


Should I use CLI or Python API?

Use the CLI if you want:

  • Quick exploratory analysis
  • Standardized workflows (protocols)
  • Batch processing of many files
  • Reproducible reports (automatic HTML/PDF generation)
  • To avoid writing Python code

Use the Python API if you need:

  • Custom analysis pipelines
  • Integration with other libraries (scikit-learn, TensorFlow)
  • Interactive exploration (Jupyter notebooks)
  • Fine-grained control over parameters
  • To develop new methods

Example Workflow:

# 1. Quick exploration with CLI
foodspec analyze olive_oils.csv --plot --save-report

# 2. If satisfied, export protocol
foodspec protocol export my_workflow.yaml

# 3. Customize protocol in Python for publication
# (Fine-tune hyperparameters, add validation, etc.)
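
A minimal sketch of step 3, reusing the Protocol API shown later in this FAQ (the YAML filename matches the export above):

from foodspec.protocols import Protocol

# Load the exported protocol, adjust it in Python, and rerun it;
# hyperparameters and validation settings can be edited before
# regenerating the report
protocol = Protocol.from_yaml('my_workflow.yaml')
results = protocol.run(data_path='olive_oils.csv')
protocol.generate_report(output='report.html')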


Can FoodSpec handle hyperspectral imaging (HSI)?

Yes! FoodSpec supports spatial spectroscopy (hyperspectral imaging) via the foodspec.spatial module.

Example:

from foodspec.spatial import HyperspectralImage

# Load HSI datacube (x, y, wavenumbers)
hsi = HyperspectralImage.from_file('apple_hsi.hdr')

# Preprocess (per-pixel)
hsi.preprocess(baseline='als', normalize='snv')

# Classify each pixel
from sklearn.ensemble import RandomForestClassifier
model = RandomForestClassifier()
model.fit(X_train, y_train)

# Apply model to entire image
classification_map = hsi.predict(model)

# Visualize
hsi.plot_classification_map(classification_map, cmap='viridis')

See Hyperspectral Mapping Tutorial for details.


What's the difference between PLS-DA and PCA-LDA?

Both are dimensionality reduction + classification, but differ in how they reduce dimensions:

| Aspect | PCA-LDA | PLS-DA |
|--------|---------|--------|
| Step 1 | PCA (unsupervised) | PLS (supervised) |
| Step 2 | LDA (supervised) | DA (discriminant) |
| When to use | Linear separability, exploratory | High p/n ratio, predictive |
| Pros | Simple, interpretable | Better for p >> n |
| Cons | PCA ignores labels | Less interpretable |

Code Comparison:

from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.cross_decomposition import PLSRegression
from sklearn.pipeline import Pipeline

# PCA-LDA
pca_lda = Pipeline([
    ('pca', PCA(n_components=50)),
    ('lda', LinearDiscriminantAnalysis())
])

# PLS-DA
from foodspec.ml import PLSDA
pls_da = PLSDA(n_components=10)

# Compare
from sklearn.model_selection import cross_val_score
scores_pca_lda = cross_val_score(pca_lda, X, y, cv=5)
scores_pls_da = cross_val_score(pls_da, X, y, cv=5)

print(f"PCA-LDA: {scores_pca_lda.mean():.3f}")
print(f"PLS-DA:  {scores_pls_da.mean():.3f}")

Recommendation: Try both and compare. PLS-DA often performs better when p >> n.


How do I export results for publication?

FoodSpec provides several export options:

1. Figures (High-Resolution):

import matplotlib.pyplot as plt

# Create figure
fig, ax = plt.subplots(figsize=(8, 6))
ax.plot(wavenumbers, X.mean(axis=0))
ax.invert_xaxis()
ax.set_xlabel('Wavenumber (cm⁻¹)')
ax.set_ylabel('Absorbance')

# Save as high-res PNG or vector PDF
plt.savefig('spectrum.png', dpi=300, bbox_inches='tight')
plt.savefig('spectrum.pdf', bbox_inches='tight')  # Vector (preferred for publication)

2. Tables (CSV/Excel):

import pandas as pd

# Results table
results = pd.DataFrame({
    'Sample': sample_ids,
    'True_Label': y_true,
    'Predicted_Label': y_pred,
    'Confidence': proba.max(axis=1)
})

results.to_csv('results.csv', index=False)
results.to_excel('results.xlsx', index=False)

3. Statistical Reports (LaTeX):

from foodspec.reporting import generate_latex_table

# Confusion matrix
latex_table = generate_latex_table(
    confusion_matrix,
    row_labels=class_names,
    col_labels=class_names,
    caption='Confusion matrix for olive oil classification',
    label='tab:confusion'
)

with open('table.tex', 'w') as f:
    f.write(latex_table)

# Include in LaTeX document:
# \input{table.tex}

4. Complete HTML Reports:

from foodspec.protocols import Protocol

protocol = Protocol.from_yaml('my_analysis.yaml')
results = protocol.run(data_path='olive_oils.csv')

# Generate HTML report with all figures, tables, and methods text
protocol.generate_report(output='report.html')


Still Have Questions?