
Introduction to Statistical Analysis in Food Spectroscopy

Statistical analysis complements chemometrics and machine learning by testing hypotheses, quantifying uncertainty, and framing results in a way reviewers and regulators expect. This chapter situates classical statistics within Raman/FTIR/NIR workflows in FoodSpec.

Why statistics matters

  • Validation: Confirm that observed differences (e.g., between oil types) are unlikely to be random.
  • Interpretation: Link spectral features (peaks/ratios/PCs) to food science questions (authenticity, degradation).
  • Reporting: Provide p-values, confidence intervals, and effect sizes alongside ML metrics for rigor and reproducibility.

Data types in FoodSpec

  • Raw spectra: Intensity vs wavenumber (cm⁻¹).
  • Derived features: Peak heights/areas, band integrals, ratios, PCA scores, mixture coefficients.
  • Metadata: Group labels (oil_type), time/temperature (heating), batches/instruments.
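
These layers are easiest to analyze when derived features and metadata sit in one tidy table, one row per spectrum. A minimal sketch of such a table; the column names (ratio_1655_1443, pc1_score, batch) are illustrative, not a FoodSpec schema:

import pandas as pd

# One row per spectrum: derived features alongside the metadata used for grouping.
# Column names are hypothetical, chosen only for this sketch.
features = pd.DataFrame({
    "sample_id": ["S1", "S2", "S3", "S4"],
    "ratio_1655_1443": [1.02, 0.98, 1.75, 1.81],  # peak-intensity ratio
    "pc1_score": [-2.1, -1.8, 2.0, 1.9],          # PCA score
    "oil_type": ["olive", "olive", "sunflower", "sunflower"],
    "batch": ["B1", "B1", "B2", "B2"],
})
print(features.groupby("oil_type")["ratio_1655_1443"].describe())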

Where tests fit in workflows

  • Oil authentication: ANOVA/Tukey on ratios or PC scores across oil types; t-tests on binary comparisons.
  • Heating quality: Correlation of ratios vs time; ANOVA across stages.
  • Mixture analysis: MANOVA/ANOVA on mixture proportions vs spectral features.
  • Batch QC: Tests comparing reference vs suspect sets; correlation maps.
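
For the heating-quality workflow, the correlation step can be sketched directly with SciPy; the heating times and ratio values below are invented for illustration, not FoodSpec outputs:

import numpy as np
from scipy import stats

# Hypothetical degradation ratio measured at successive heating times (minutes).
time_min = np.array([0, 30, 60, 90, 120, 150])
ratio = np.array([1.00, 0.96, 0.91, 0.83, 0.78, 0.70])

r, p_r = stats.pearsonr(time_min, ratio)        # linear association
rho, p_rho = stats.spearmanr(time_min, ratio)   # monotonic association
print(f"Pearson r = {r:.2f} (p = {p_r:.3g}); Spearman rho = {rho:.2f} (p = {p_rho:.3g})")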

Assumptions and preprocessing

  • Many tests assume approximate normality, homoscedasticity, and independence.
  • Good preprocessing (baseline, normalization, scatter correction) reduces artifacts that violate assumptions.
  • When assumptions fail, consider nonparametric tests or robust designs (see Nonparametric methods).
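
A minimal pre-test check with SciPy, using made-up per-group ratio values: Shapiro–Wilk screens each group for non-normality, Levene's test screens for unequal variances, and Kruskal–Wallis is one nonparametric fallback if either check fails.

from scipy import stats

# Hypothetical peak-ratio values for three oil types.
groups = {
    "olive":     [1.00, 1.05, 0.95, 1.02],
    "sunflower": [1.80, 1.75, 1.85, 1.78],
    "rapeseed":  [1.40, 1.38, 1.45, 1.42],
}

for name, values in groups.items():
    _, p_norm = stats.shapiro(values)           # normality within each group
    print(f"{name}: Shapiro-Wilk p = {p_norm:.3f}")

_, p_var = stats.levene(*groups.values())       # homogeneity of variances
print(f"Levene p = {p_var:.3f}")

# If either check fails, a rank-based test avoids the normality assumption.
_, p_kw = stats.kruskal(*groups.values())
print(f"Kruskal-Wallis p = {p_kw:.4f}")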

Quick example

import pandas as pd
from foodspec.stats import run_anova

# Toy peak-ratio values for two oil types (three spectra each).
df = pd.DataFrame({"ratio": [1.0, 1.1, 0.9, 1.8, 1.7, 1.9],
                   "oil_type": ["olive", "olive", "olive", "sunflower", "sunflower", "sunflower"]})
# One-way ANOVA of the ratio across oil_type groups.
res = run_anova(df["ratio"], df["oil_type"])
print(res.summary)

Decision aid: tests vs models

flowchart LR
  A[Question] --> B{Compare means?}
  B -->|Yes| C{Groups > 2?}
  C -->|No| D[t-test]
  C -->|Yes| E[ANOVA/MANOVA + post-hoc]
  B -->|No| F{Association?}
  F -->|Yes| G["Correlation (Pearson/Spearman)"]
  F -->|No| H["Predictive modeling (see ML chapters)"]

When Results Cannot Be Trusted

⚠️ Red flags for statistical analysis validity across all methods:

  1. Assumptions not checked before test selection
     • Each test (t-test, ANOVA, correlation) assumes normality, independence, or linearity.
     • Violating these assumptions without correction inflates the Type I error rate.
     • Fix: Check Q-Q plots, Shapiro–Wilk, and Levene's test before selecting a test; use robust or nonparametric alternatives when checks fail (see the check sketched under Assumptions and preprocessing).

  2. P-value interpreted as truth (p = 0.04 → "the result is true with 96% confidence")
     • A p-value is the probability of observing data at least as extreme as those obtained if the null hypothesis were true; it is not the probability that the result is true.
     • This misinterpretation is widespread and leads to overconfidence.
     • Fix: Report effect sizes and confidence intervals; avoid binary "significant/not significant" language.

  3. Multiple testing without correction (testing 50 hypotheses at α = 0.05 yields about 2.5 false positives by chance alone)
     • Uncorrected p-values are misleading when many tests are performed.
     • The problem scales with the number of tests.
     • Fix: Declare hypotheses a priori; apply Bonferroni, FDR, or permutation-based correction (see the correction sketch after this list).

  4. Sample size chosen arbitrarily ("n = 10 seems reasonable") without power analysis
     • Underpowered studies miss real effects and inflate false negatives.
     • A missing rationale for the sample size reduces credibility.
     • Fix: Conduct an a priori power analysis based on the target effect size and desired power (typically 0.80); see the power-analysis sketch after this list.

  5. Preprocessing choices undisclosed (baseline correction, outlier removal, transformation)
     • Different preprocessing → different results, even on the same raw data.
     • Undisclosed choices enable hidden p-hacking.
     • Fix: Freeze preprocessing before analysis; document all choices; report sensitivity to preprocessing.

  6. Batch confounding (treatment A = Day 1 analyzer, treatment B = Day 2 analyzer)
     • Systematic batch effects mimic biological differences.
     • It is impossible to know whether the test detects biology or a batch artifact.
     • Fix: Randomize sample order; include batch in the model; use batch-aware cross-validation.

  7. Selective reporting of results (reporting 3 significant tests out of 20 performed)
     • Bias toward positive results inflates the false-positive rate.
     • Non-significant results are equally informative.
     • Fix: Pre-register analyses; report all tests performed; use exploratory vs confirmatory designations.

  8. Independence assumption violated (treating technical replicates as if they were independent biological replicates)
     • Repeated measurements of the same unit are autocorrelated.
     • Tests that assume independence produce inflated significance.
     • Fix: Analyze ≥3 distinct biological samples; document which measurements are replicates; use mixed-effects models for nested designs (see the sketch after this list).
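
A hedged sketch of the correction step from red flag 3, using statsmodels; the raw p-values are placeholders standing in for a batch of band-wise comparisons.

from statsmodels.stats.multitest import multipletests

# Placeholder p-values from several band-wise comparisons.
p_values = [0.001, 0.020, 0.041, 0.250, 0.700]

# Benjamini-Hochberg FDR control; method="bonferroni" is the stricter family-wise option.
reject, p_adj, _, _ = multipletests(p_values, alpha=0.05, method="fdr_bh")
for p, pa, rej in zip(p_values, p_adj, reject):
    print(f"raw p = {p:.3f} -> adjusted p = {pa:.3f}, reject H0: {rej}")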
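
A sketch of the a priori power analysis from red flag 4, using statsmodels; the target effect size (Cohen's d = 0.8) is an assumption you would justify from pilot data or the literature.

from statsmodels.stats.power import TTestIndPower

# Per-group sample size needed to detect a large effect (d = 0.8)
# with 80% power at alpha = 0.05 in a two-sided independent t-test.
n_per_group = TTestIndPower().solve_power(effect_size=0.8, power=0.80, alpha=0.05)
print(f"Required samples per group: {n_per_group:.1f}")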
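
One way to respect the nesting in red flag 8 is a mixed-effects model with a random intercept per biological sample; the column names and toy values below are illustrative, not a FoodSpec schema.

import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical data: 3 biological samples per oil type, 2 technical replicates each.
df = pd.DataFrame({
    "ratio":     [1.00, 1.02, 0.97, 0.99, 1.05, 1.03, 1.80, 1.78, 1.75, 1.77, 1.83, 1.81],
    "oil_type":  ["olive"] * 6 + ["sunflower"] * 6,
    "sample_id": ["O1", "O1", "O2", "O2", "O3", "O3", "S1", "S1", "S2", "S2", "S3", "S3"],
})

# The random intercept per sample_id absorbs the correlation between replicates.
model = smf.mixedlm("ratio ~ oil_type", df, groups=df["sample_id"])
result = model.fit()
print(result.summary())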

Further reading


Next Steps