Statistics API

Statistical testing and effect size calculations for spectral analysis.

The foodspec.stats module provides hypothesis testing, correlation analysis, and statistical reporting tools for comparing spectral measurements.

Time Metrics

Functions for analyzing temporal degradation trends.

linear_slope

Compute linear slope and intercept for time series.

Parameters:

t : ndarray
    Time points (1D array). Required.
y : ndarray
    Observation values (1D array, same length as t). Required.

Returns:

Tuple[float, float]
    Tuple of (slope, intercept) from the linear fit.

Raises:

ValueError
    If t and y are not 1D arrays of equal length.
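
Example (a minimal sketch; the import path follows the foodspec.stats module named above):

>>> import numpy as np
>>> from foodspec.stats import linear_slope
>>> t = np.array([0.0, 1.0, 2.0, 3.0])
>>> y = 2.0 * t + 1.0  # exactly linear, so the fit is exact
>>> slope, intercept = linear_slope(t, y)
>>> print(f"slope={slope:.2f}, intercept={intercept:.2f}")
slope=2.00, intercept=1.00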

quadratic_acceleration

Compute quadratic acceleration (second derivative) from a quadratic fit to a time series.

Parameters:

t : ndarray
    Time points (1D array). Required.
y : ndarray
    Observation values (1D array, same length as t). Required.

Returns:

float
    Acceleration coefficient (2 × the quadratic coefficient a from y = at² + bt + c).

Raises:

ValueError
    If t and y are not 1D arrays of equal length.
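
Example (a minimal sketch under the same import assumption; for y = t² the second derivative is 2 everywhere):

>>> import numpy as np
>>> from foodspec.stats import quadratic_acceleration
>>> t = np.array([0.0, 1.0, 2.0, 3.0])
>>> y = t ** 2  # quadratic coefficient a = 1, so acceleration = 2a = 2
>>> print(f"{quadratic_acceleration(t, y):.2f}")
2.00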

rolling_slope

Compute rolling-window linear slope for a time series.

Parameters:

t : ndarray
    Time points (1D array). Required.
y : ndarray
    Observation values (1D array, same length as t). Required.
window : int, optional
    Rolling window size (must be >= 2). Default: 5.

Returns:

ndarray
    Array of rolling slopes (same length as t; early values are NaN).

Raises:

ValueError
    If window < 2.
ValueError
    If t and y are not 1D arrays of equal length.
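
Example (a minimal sketch; per the Returns note above, the leading entries are NaN where the window is incomplete):

>>> import numpy as np
>>> from foodspec.stats import rolling_slope
>>> t = np.arange(6, dtype=float)
>>> y = 2.0 * t  # exactly linear, so every full window has slope 2
>>> slopes = rolling_slope(t, y, window=3)
>>> print(f"last slope = {slopes[-1]:.2f}")
last slope = 2.00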

Hypothesis Testing

run_ttest

t-test: one-sample, paired, or two-sample (Welch's).

Tests whether sample mean(s) differ significantly from a population mean or each other. Welch's t-test (default for two-sample) does NOT assume equal variances, making it more robust than Student's t-test.

Test Selection:
- One-sample: run_ttest(x, popmean=5) — Does x differ from 5?
- Paired: run_ttest(before, after, paired=True) — Before vs. after?
- Two-sample: run_ttest(groupA, groupB) — Do groups differ?

Assumptions: Normality (robust if n > 30), independence, random sampling

Significance: See Metric Significance Tables

Parameters:

sample1 : array-like
    First sample. Required.
sample2 : array-like, optional
    Second sample. Default: None.
popmean : float, optional
    Population mean (for a one-sample test). Default: None.
paired : bool, optional
    Whether the samples are paired. Default: False.
alternative : str, optional
    "two-sided", "less", or "greater". Default: "two-sided".

Returns:

TestResult
    TestResult with t-statistic, p-value, and df.

Examples:

>>> from foodspec.stats import run_ttest
>>> result = run_ttest([1, 2, 3], [4, 5, 6])
>>> print(f"p = {result.pvalue:.3f}")

run_anova

One-way ANOVA: test whether 3+ groups have different means.

Tests the null hypothesis that all group means are equal. ANOVA partitions variance into between-group and within-group components; the F-statistic is their ratio, so a large F indicates a significant difference.

When to Use:
- 3+ groups (use a t-test for 2 groups)
- Roughly equal sample sizes per group
- Data approximately normal (robust if n > 20 per group)

Assumptions: Normality, homogeneity of variance, independence

Post-hoc Tests (if p < 0.05):
- Balanced design: Tukey HSD (run_tukey_hsd)
- Unequal variances: Games-Howell (games_howell)
- Multiple hypotheses: Benjamini-Hochberg FDR (benjamini_hochberg)

Red Flags:
- Significant ANOVA but non-significant post-hoc tests: likely Type I error
- Large effect size (η² > 0.14) but p > 0.05: underpowered

Parameters:

data : array-like
    All observations. Required.
groups : array-like
    Group labels (same length as data). Required.

Returns:

TestResult
    TestResult with F-statistic and p-value.

Examples:

>>> from foodspec.stats import run_anova
>>> result = run_anova([1, 2, 1, 5, 6, 5, 9, 10, 9],
...                    ['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C'])
>>> assert result.pvalue < 0.05  # Groups differ

run_manova

Multivariate ANOVA (MANOVA) for multiple dependent variables, implemented via statsmodels.

Supports two usage patterns:
- run_manova(df, group_col="group", dependent_cols=["f1", "f2"])  # formula
- run_manova(data_df, groups=labels)  # use all columns in data_df

Returns a TestResult with pvalue extracted from the MANOVA table (prefers Wilks' lambda, falls back to Pillai's trace).
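
Example (a minimal sketch of the first calling pattern; the data and the foodspec.stats import path are illustrative assumptions):

>>> import pandas as pd
>>> from foodspec.stats import run_manova
>>> df = pd.DataFrame({
...     'group': ['A'] * 5 + ['B'] * 5,
...     'f1': [1, 2, 1, 2, 1, 5, 6, 5, 6, 5],
...     'f2': [2, 3, 2, 3, 2, 8, 9, 8, 9, 8],
... })
>>> result = run_manova(df, group_col='group', dependent_cols=['f1', 'f2'])
>>> print(f"p = {result.pvalue:.4f}")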

run_kruskal_wallis

Perform Kruskal-Wallis H-test for comparing three or more independent groups.

The Kruskal-Wallis test is a non-parametric alternative to one-way ANOVA. It tests whether samples from different groups come from the same distribution. Use this when comparing 3+ groups with non-normal data or unequal variances.

Parameters:

data : DataFrame | Iterable
    Either a DataFrame or a list of arrays. If a DataFrame, group_col and value_col must be provided. If a list, each element is an array of values for one group. Required.
group_col : str | None, optional
    Column name for group labels (required if data is a DataFrame). Default: None.
value_col : str | None, optional
    Column name for the values to compare (required if data is a DataFrame). Default: None.

Returns:

TestResult
    TestResult with:
    - statistic: Kruskal-Wallis H statistic
    - pvalue: p-value from the chi-squared distribution
    - df: None (df is implicit in the chi-squared approximation)
    - summary: DataFrame with test details

Raises:

ValueError
    If fewer than 2 groups are provided or if DataFrame arguments are missing.

Example

>>> from foodspec import run_kruskal_wallis
>>> import pandas as pd
>>> df = pd.DataFrame({
...     'group': ['A'] * 10 + ['B'] * 10 + ['C'] * 10,
...     'value': list(range(10)) + list(range(10, 20)) + list(range(20, 30))
... })
>>> result = run_kruskal_wallis(df, 'group', 'value')
>>> print(f"H={result.statistic:.2f}, p={result.pvalue:.4f}")

See Also

run_anova: Parametric alternative for normal data
run_mannwhitney_u: For comparing exactly 2 groups
run_friedman_test: For repeated measures (paired data)

run_mannwhitney_u

Perform Mann-Whitney U test for comparing two independent samples.

The Mann-Whitney U test (also called Wilcoxon rank-sum test) is a non-parametric test for comparing the distributions of two independent groups. It does not assume normality and is robust to outliers. Use this when data are ordinal or continuous but violate t-test assumptions.

Parameters:

data : DataFrame | Iterable
    Either a DataFrame or array-like. If a DataFrame, group_col and value_col must be provided. If array-like, it represents the first sample (with group_col supplying the second sample). Required.
group_col : object | None, optional
    Column name for group labels (if data is a DataFrame) or the second sample array (if data is array-like). Default: None.
value_col : str | None, optional
    Column name for the values to compare (required if data is a DataFrame). Default: None.
alternative : str, optional
    Direction of the test: "two-sided", "less", or "greater". Default: "two-sided".

Returns:

TestResult
    TestResult with:
    - statistic: Mann-Whitney U statistic
    - pvalue: two-tailed (or one-tailed) p-value
    - df: None (non-parametric tests have no degrees of freedom)
    - summary: DataFrame with test details

Raises:

ValueError
    If the data format is invalid or if not exactly two groups are provided.

Example

>>> from foodspec import run_mannwhitney_u
>>> import pandas as pd
>>> df = pd.DataFrame({'group': ['A'] * 10 + ['B'] * 10, 'value': range(20)})
>>> result = run_mannwhitney_u(df, 'group', 'value')
>>> print(f"U={result.statistic:.2f}, p={result.pvalue:.4f}")

See Also

run_ttest: Parametric alternative for normal data
run_kruskal_wallis: Extension to 3+ groups
run_wilcoxon_signed_rank: For paired samples

Effect Sizes

compute_cohens_d

Compute Cohen's d effect size for two groups.

Quantifies the magnitude of difference between two groups independent of sample size. Essential for evaluating whether statistically significant differences are also practically significant.

Parameters:

group1 : ndarray
    First group of numerical samples. Required.
group2 : ndarray
    Second group of numerical samples. Required.
pooled : bool, optional
    If True, use the pooled standard deviation (assumes equal variances). If False, average the unpooled standard deviations. Default: True.

Returns:

float
    Cohen's d effect size (unbounded; can be negative).

Examples:

>>> from foodspec.stats.effects import compute_cohens_d
>>> import numpy as np
>>> g1 = np.array([1, 2, 3, 4, 5])
>>> g2 = np.array([3, 4, 5, 6, 7])
>>> d = compute_cohens_d(g1, g2)
>>> abs(d) > 0.5
True

compute_anova_effect_sizes

Compute eta-squared and partial eta-squared for ANOVA.

Quantifies proportion of variance explained by group differences.

Interpretation Scale:
- eta-squared < 0.01: negligible effect
- 0.01 <= eta-squared < 0.06: small effect
- 0.06 <= eta-squared < 0.14: medium effect
- eta-squared >= 0.14: large effect

Parameters:

ss_between : float
    Sum of squares between groups (treatment effect). Required.
ss_total : float
    Total sum of squares (all variation in the data). Required.
ss_within : float | None, optional
    Sum of squares within groups (error). If provided, partial eta-squared is also computed. Default: None.

Returns:

Series
    A Series with 'eta_squared' and optionally 'partial_eta_squared'.

Examples:

>>> from foodspec.stats.effects import compute_anova_effect_sizes
>>> result = compute_anova_effect_sizes(ss_between=50, ss_total=200, ss_within=150)
>>> 'eta_squared' in result
True
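
For reference, the standard formulas are η² = SS_between / SS_total and partial η² = SS_between / (SS_between + SS_within). In the example above both equal 0.25, since 50 / 200 = 50 / (50 + 150), a large effect on the scale above.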

Correlations & Distances

compute_correlations

Compute Pearson or Spearman correlation between two columns of a DataFrame.

Parameters:

data : DataFrame
    DataFrame containing the columns of interest (e.g., ratios vs. a quality metric). Required.
cols : tuple | list
    Two column names to correlate. Required.
method : str, optional
    'pearson' or 'spearman'. Default: 'pearson'.

Returns:

Series
    Series with index ['r', 'pvalue']; values are the correlation coefficient and p-value.

Raises:

ValueError
    If cols does not contain exactly two column names.
ValueError
    If method is not 'pearson' or 'spearman'.
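
Example (a minimal sketch; the foodspec.stats import path follows the pattern used elsewhere on this page):

>>> import pandas as pd
>>> from foodspec.stats import compute_correlations
>>> df = pd.DataFrame({'ratio': [1.0, 2.0, 3.0, 4.0],
...                    'quality': [2.1, 3.9, 6.2, 7.8]})
>>> result = compute_correlations(df, cols=('ratio', 'quality'))
>>> print(f"r = {result['r']:.3f}, p = {result['pvalue']:.3f}")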

euclidean_distance

Compute Euclidean distance between two spectra.

Parameters:

a : ndarray
    First spectrum (1D or flattened array). Required.
b : ndarray
    Second spectrum (1D or flattened array). Required.

Returns:

float
    Euclidean distance (L2 norm of the difference).
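
Example (a minimal sketch; for a = (3, 0) and b = (0, 4), the L2 norm of a - b is √(9 + 16) = 5):

>>> import numpy as np
>>> from foodspec.stats import euclidean_distance
>>> a = np.array([3.0, 0.0])
>>> b = np.array([0.0, 4.0])
>>> print(f"{euclidean_distance(a, b):.1f}")
5.0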

cosine_distance

Compute cosine distance (1 - cosine similarity) between two spectra.

Parameters:

a : ndarray
    First spectrum (1D or flattened array). Required.
b : ndarray
    Second spectrum (1D or flattened array). Required.

Returns:

float
    Cosine distance (1 - cosine similarity).
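
Example (a minimal sketch; orthogonal vectors have cosine similarity 0, hence distance 1):

>>> import numpy as np
>>> from foodspec.stats import cosine_distance
>>> a = np.array([1.0, 0.0])
>>> b = np.array([0.0, 1.0])
>>> print(f"{cosine_distance(a, b):.1f}")
1.0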

Robustness Analysis

bootstrap_metric

Estimate the robustness of a metric via bootstrap resampling, with percentile confidence intervals.

Parameters:

metric_func : Callable[[ndarray, ndarray], float]
    Function taking (y_true, y_pred) and returning a scalar metric. Required.
y_true : ndarray
    True targets (array-like). Required.
y_pred : ndarray
    Predicted targets (array-like). Required.
n_bootstrap : int, optional
    Number of bootstrap samples. Default: 1000.
ci : tuple[float, float], optional
    Confidence interval percentiles (lower, upper). Default: (2.5, 97.5).
random_state : Optional[int], optional
    Seed for reproducibility. Default: None.

Returns:

dict
    Dictionary with keys 'observed' (original metric), 'bootstrap_samples' (array of bootstrap metrics), and 'ci' (tuple of CI bounds).
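
Example (a minimal sketch; the mean-absolute-error metric below is an illustrative assumption, not part of foodspec):

>>> import numpy as np
>>> from foodspec.stats import bootstrap_metric
>>> def mae(y_true, y_pred):
...     return float(np.mean(np.abs(y_true - y_pred)))
>>> y_true = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
>>> y_pred = np.array([1.1, 2.2, 2.8, 4.1, 5.3])
>>> result = bootstrap_metric(mae, y_true, y_pred, n_bootstrap=500, random_state=0)
>>> print(f"MAE = {result['observed']:.3f}, 95% CI = {result['ci']}")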

permutation_test_metric

Permutation test for a metric by shuffling the target labels.

Parameters:

metric_func : callable
    Function taking (y_true, y_pred) and returning a scalar metric. Required.
y_true : array-like
    True targets. Required.
y_pred : array-like
    Predicted targets. Required.
n_permutations : int, optional
    Number of permutations. Default: 1000.
metric_higher_is_better : bool, optional
    If False, the p-value computation is reversed (e.g., for RMSE). Default: True.
random_state : int, optional
    Seed for reproducibility.

Returns:

dict
    Contains the observed metric, permutation samples, and an empirical p-value.
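
Example (a minimal sketch mirroring the bootstrap example above; the mae helper is the same illustrative assumption):

>>> import numpy as np
>>> from foodspec.stats import permutation_test_metric
>>> def mae(y_true, y_pred):
...     return float(np.mean(np.abs(y_true - y_pred)))
>>> y_true = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
>>> y_pred = np.array([1.1, 2.2, 2.8, 4.1, 5.3])
>>> result = permutation_test_metric(mae, y_true, y_pred, n_permutations=500,
...                                  metric_higher_is_better=False, random_state=0)
>>> # result holds the observed metric, permutation samples, and empirical p-value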
