Troubleshooting¶
Quick Help
This page provides solutions to common technical issues encountered when using FoodSpec. For conceptual questions, see the FAQ. For reporting and reproducibility guidelines, see Reporting & Reproducibility.
Quick Problem Index¶
| Stage | Problem | Quick Fix |
|---|---|---|
| Installation | pip install fails | Check Python version ≥3.8; update pip |
| Installation | Import errors | Verify same Python/pip environment |
| Data | Missing labels | Use metadata validation tools |
| Data | Class imbalance | Use F1/PR metrics; resample or weight |
| Preprocessing | Over-smoothing | Reduce Savitzky–Golay window |
| Preprocessing | Poor baseline | Tune ALS lambda; try rubberband baseline |
| ML | Overfitting | Regularize; simplify; use stratified CV |
| ML | Data leakage | Ensure preprocessing inside Pipeline |
| Stats | Non-normal residuals | Use nonparametric tests |
| Stats | Multiple comparisons | Apply FDR/Tukey correction |
| Visualization | Unlabeled axes | Label wavenumber (cm⁻¹), intensity (a.u.) |
| Workflow | Wrong metrics | Consult workflow design guide |
Installation Issues¶
Problem: pip install foodspec fails¶
Symptoms:
ERROR: Could not find a version that satisfies the requirement foodspec
ERROR: No matching distribution found for foodspec
Causes & Solutions:
1. Python Version Incompatibility¶
Check your Python version:
python --version
Solution: FoodSpec requires Python ≥3.8. Upgrade Python:
# Using conda
conda create -n foodspec python=3.10
conda activate foodspec
pip install foodspec
# Using pyenv
pyenv install 3.10.0
pyenv local 3.10.0
pip install foodspec
2. Outdated pip¶
Update pip:
pip install --upgrade pip setuptools wheel
pip install foodspec
3. Network/Firewall Issues¶
Try alternative PyPI mirrors:
# Explicitly set the index URL (substitute your institution's mirror if PyPI is blocked)
pip install --index-url https://pypi.org/simple foodspec
# Install with verbose output to diagnose
pip install -v foodspec
4. Package Name Confusion¶
Verify the correct package name:
# Correct
pip install foodspec
NOT: pip install food-spec, FoodSpec, foodspectra, etc.
Problem: Import errors after installation¶
Symptoms:
>>> import foodspec
ModuleNotFoundError: No module named 'foodspec'
Causes & Solutions:
1. Multiple Python Environments¶
Check which Python is active:
which python
which pip
python -c "import sys; print(sys.executable)"
Solution: Ensure pip and python are from the same environment:
# Use python -m pip instead
python -m pip install foodspec
# Verify installation
python -c "import foodspec; print(foodspec.__version__)"
2. Development Installation Not Linked¶
If installing from source:
# Editable install
cd /path/to/foodspec
pip install -e .
# Verify
python -c "import foodspec; print(foodspec.__file__)"
3. PYTHONPATH Issues¶
Check PYTHONPATH:
echo $PYTHONPATH
Solution: Add FoodSpec to PYTHONPATH (if needed):
export PYTHONPATH="/path/to/foodspec/src:$PYTHONPATH"
Data Loading Issues¶
- Verify file paths are correct relative to your working directory; prefer absolute paths when scripting.
- Confirm delimiters/headers match loader expectations (e.g., a `wavenumber` column present for CSV/HDF5 helpers).
- For registry-driven runs, check that metadata tables point to existing files and have consistent sample IDs (see the sketch below).
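Before digging into loader internals, a quick sanity check of paths, headers, and metadata often pinpoints the problem. A minimal sketch using pandas (the file names, the `wavenumber` column, and the metadata `file`/`sample_id` columns are assumptions — adjust to your own layout):

```python
from pathlib import Path
import pandas as pd

spectra_path = Path("data/spectra.csv")    # hypothetical paths - adjust to your project
metadata_path = Path("data/metadata.csv")

# 1. Paths: resolve to absolute paths and confirm the files exist
for p in (spectra_path, metadata_path):
    print(p.resolve(), "exists" if p.exists() else "MISSING")

# 2. Delimiters/headers: inspect the parsed columns (pass sep=';' etc. if not comma-separated)
spectra = pd.read_csv(spectra_path)
print(spectra.columns[:5].tolist())
assert "wavenumber" in spectra.columns, "Expected a 'wavenumber' column"

# 3. Registry consistency: referenced files exist and sample IDs are not duplicated
metadata = pd.read_csv(metadata_path)
missing = [f for f in metadata["file"] if not Path(f).exists()]
print(f"Files referenced in metadata but missing on disk: {len(missing)}")
print(f"Duplicate sample IDs: {metadata['sample_id'].duplicated().sum()}")
```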
Missing Dependencies¶
Problem: Optional dependencies not installed¶
Symptoms:
>>> from foodspec.visualization import plot_spectra
ImportError: matplotlib is required for visualization. Install with: pip install foodspec[viz]
Solution: Install optional dependency groups:
# Visualization (matplotlib, seaborn)
pip install foodspec[viz]
# Machine learning (scikit-learn, xgboost)
pip install foodspec[ml]
# All optional dependencies
pip install foodspec[all]
# Multiple groups
pip install foodspec[viz,ml]
Available groups:
- viz: Plotting and visualization
- ml: Machine learning models (RF, XGBoost)
- notebooks: Jupyter notebook support
- dev: Development tools (pytest, black, mypy)
- docs: Documentation building (mkdocs, mkdocstrings)
- all: All optional dependencies
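If you are unsure which optional groups are already usable in your environment, the following sketch checks for representative packages (the group-to-package mapping here is illustrative; the package metadata is the authoritative list):

```python
import importlib.util

# Illustrative mapping of optional groups to representative packages
optional_groups = {
    "viz": ["matplotlib", "seaborn"],
    "ml": ["sklearn", "xgboost"],
    "notebooks": ["jupyter"],
}

for group, packages in optional_groups.items():
    missing = [p for p in packages if importlib.util.find_spec(p) is None]
    print(f"[{group}] {'OK' if not missing else 'missing: ' + ', '.join(missing)}")
```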
Problem: Conflicting dependency versions¶
Symptoms:
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed.
ERROR: foodspec requires numpy>=1.20, but you have numpy 1.19.5
Solution 1: Upgrade conflicting packages
pip install --upgrade numpy scipy scikit-learn
pip install foodspec
Solution 2: Use a clean environment
conda create -n foodspec-clean python=3.10
conda activate foodspec-clean
pip install foodspec
Solution 3: Use conda for dependency management
conda install -c conda-forge numpy scipy scikit-learn matplotlib
pip install foodspec
Shape/Axis Mismatch Errors¶
Problem: "Shapes do not match" during preprocessing¶
Symptoms:
>>> from foodspec.preprocessing import baseline_als
>>> X_corrected = baseline_als(X)
ValueError: operands could not be broadcast together with shapes (100, 1800) (1801,)
Diagnosis:
import numpy as np
print(f"X shape: {X.shape}") # e.g., (100, 1800)
print(f"wavenumbers shape: {wavenumbers.shape}") # e.g., (1801,)
# Problem: wavenumbers has 1801 elements, but X has 1800 columns
Causes & Solutions:
1. Wavenumber Grid Mismatch¶
Solution: Ensure wavenumber array matches spectral columns:
# Check alignment
assert X.shape[1] == len(wavenumbers), f"Mismatch: {X.shape[1]} vs {len(wavenumbers)}"
# If mismatch, trim or interpolate
if X.shape[1] != len(wavenumbers):
# Option A: Trim wavenumbers to match X
wavenumbers = wavenumbers[:X.shape[1]]
# Option B: Trim X to match wavenumbers
X = X[:, :len(wavenumbers)]
# Option C: Interpolate to common grid (recommended)
from foodspec.preprocessing import interpolate_to_grid
X, wavenumbers = interpolate_to_grid(X, wavenumbers, new_grid=np.arange(4000, 650, -2))
2. Row vs. Column Confusion¶
Problem: Transpose needed
# Wrong: X is (n_wavenumbers, n_samples) instead of (n_samples, n_wavenumbers)
print(X.shape) # (1800, 100) - WRONG
# Solution: Transpose
X = X.T
print(X.shape) # (100, 1800) - CORRECT
FoodSpec convention: Rows = samples, Columns = wavenumbers
3. 1D vs. 2D Array¶
Problem: Single spectrum treated as 2D
# Wrong
single_spectrum = X[0] # Shape: (1800,)
baseline_als(single_spectrum) # Error: expects 2D
# Solution: Reshape to 2D
single_spectrum = X[0:1] # Shape: (1, 1800)
# OR
single_spectrum = X[0].reshape(1, -1)
baseline_als(single_spectrum) # Works
Problem: "Axis out of range" errors¶
Symptoms:
>>> from foodspec.ml import fit_pls
>>> model = fit_pls(X, y, n_components=10)
numpy.AxisError: axis 1 is out of bounds for array of dimension 1
Diagnosis:
print(f"X ndim: {X.ndim}") # Should be 2
print(f"X shape: {X.shape}")
print(f"y ndim: {y.ndim}") # Should be 1 for labels
Solution:
# Ensure X is 2D
if X.ndim == 1:
X = X.reshape(1, -1) # Single sample
# Ensure y is 1D (for classification/regression)
if y.ndim > 1:
y = y.ravel() # Flatten (100, 1) → (100,)
NaNs After Preprocessing¶
Problem: NaNs appear after baseline correction¶
Symptoms:
>>> X_corrected = baseline_als(X, lam=1e6, p=0.01)
>>> np.isnan(X_corrected).sum()
1500 # Many NaNs!
Causes & Solutions:
1. Input Contains NaNs¶
Check input:
print(f"NaNs in input: {np.isnan(X).sum()}")
Solution: Remove or impute NaNs before preprocessing:
# Option A: Drop samples with NaNs
mask = ~np.isnan(X).any(axis=1)
X_clean = X[mask]
# Option B: Impute NaNs by linear interpolation (np.interp)
for i in range(X.shape[0]):
if np.isnan(X[i]).any():
nan_mask = np.isnan(X[i])
not_nan = ~nan_mask
X[i, nan_mask] = np.interp(
np.where(nan_mask)[0],
np.where(not_nan)[0],
X[i, not_nan]
)
2. Division by Zero in Normalization¶
Problem: Zero or near-zero standard deviation
# SNV normalization: (X - mean) / std
# If std ≈ 0 → division by zero → NaN
from foodspec.preprocessing import snv
X_norm = snv(X)
# Diagnosis
stds = X.std(axis=1, ddof=1)
print(f"Samples with std < 1e-6: {(stds < 1e-6).sum()}")
Solution: Add epsilon to denominator or filter flat spectra:
# Option A: Filter flat spectra
threshold = 1e-4
mask = X.std(axis=1, ddof=1) > threshold
X_filtered = X[mask]
X_norm = snv(X_filtered)
# Option B: Custom SNV with epsilon
def snv_safe(X, eps=1e-8):
mean = X.mean(axis=1, keepdims=True)
std = X.std(axis=1, ddof=1, keepdims=True)
std = np.where(std < eps, eps, std) # Avoid division by zero
return (X - mean) / std
X_norm = snv_safe(X)
3. Baseline Correction Failure¶
Problem: ALS baseline correction fails on saturated spectra
# Diagnosis: Check for saturation
print(f"Max intensity: {X.max()}")
print(f"Saturated pixels: {(X > 3.0).sum()}") # Absorbance > 3.0
Solution: Clip intensities or skip saturated spectra:
# Clip absorbance to reasonable range
X_clipped = np.clip(X, -0.5, 3.0)
X_corrected = baseline_als(X_clipped, lam=1e6, p=0.01)
Problem: NaNs after derivative calculation¶
Symptoms:
>>> from foodspec.preprocessing import savgol_filter
>>> X_deriv = savgol_filter(X, window_length=11, polyorder=2, deriv=1)
>>> np.isnan(X_deriv).sum()
50 # NaNs at edges
Cause: Edge effects in Savitzky-Golay filter
Solution: Use mode='interp' or trim edges:
from scipy.signal import savgol_filter
# Option A: Use interp mode (extrapolates to edges)
X_deriv = savgol_filter(X, window_length=11, polyorder=2, deriv=1, mode='interp', axis=1)
# Option B: Trim edges
window = 11
edge = window // 2
X_deriv = savgol_filter(X, window_length=window, polyorder=2, deriv=1, axis=1)
X_deriv = X_deriv[:, edge:-edge] # Remove edge columns
wavenumbers = wavenumbers[edge:-edge] # Also trim wavenumbers
Model Overfitting / Too-Good Accuracy¶
Problem: Suspiciously high accuracy (>95%)¶
Symptoms:
>>> from sklearn.model_selection import cross_val_score
>>> scores = cross_val_score(RandomForestClassifier(), X, y, cv=5)
>>> scores.mean()
0.989 # 98.9% accuracy - too good to be true!
Diagnosis Checklist:
# 1. Check for data leakage (replicate leakage)
print(f"Number of samples: {len(np.unique(sample_ids))}")
print(f"Number of spectra: {len(X)}")
print(f"Replicates per sample: {len(X) / len(np.unique(sample_ids))}")
# 2. Check train/test split
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
print(f"Train size: {len(X_train)}, Test size: {len(X_test)}")
# 3. Check class balance
from collections import Counter
print(f"Class distribution: {Counter(y)}")
# 4. Check for preprocessing leakage
# Did you normalize BEFORE splitting? That's leakage!
Common Causes & Solutions:
1. Replicate Leakage¶
Problem: Technical replicates split across train/test
Diagnosis:
from sklearn.model_selection import GroupKFold, KFold
# Random CV (leaky)
cv_random = KFold(n_splits=5, shuffle=True, random_state=42)
scores_random = cross_val_score(model, X, y, cv=cv_random)
# Grouped CV (correct)
cv_grouped = GroupKFold(n_splits=5)
scores_grouped = cross_val_score(model, X, y, cv=cv_grouped, groups=sample_ids)
print(f"Random CV: {scores_random.mean():.3f}")
print(f"Grouped CV: {scores_grouped.mean():.3f}")
print(f"Drop: {scores_random.mean() - scores_grouped.mean():.3f}")
# If drop > 0.10 → replicate leakage!
Solution: Always use grouped CV
from foodspec.ml.validation import grouped_cross_validation
results = grouped_cross_validation(
X, y,
groups=sample_ids, # Critical!
model=RandomForestClassifier(),
n_splits=5,
n_repeats=10
)
print(f"Realistic Accuracy: {results['accuracy_mean']:.3f} ± {results['accuracy_ci']:.3f}")
2. Preprocessing Leakage¶
Problem: Normalization fit on entire dataset before splitting
Wrong:
# ❌ WRONG: Preprocessing before splitting
X_norm = snv(X) # Uses statistics from entire dataset
X_train, X_test, y_train, y_test = train_test_split(X_norm, y)
model.fit(X_train, y_train)
Correct:
# ✅ CORRECT: Preprocessing within CV folds
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
pipe = Pipeline([
('scaler', StandardScaler()), # Fit only on train in each fold
('model', RandomForestClassifier())
])
scores = cross_val_score(pipe, X, y, cv=GroupKFold(5), groups=sample_ids)
3. Overfitting Small Datasets¶
Problem: More features than samples (p >> n)
Diagnosis:
n_samples, n_features = X.shape
print(f"Samples: {n_samples}, Features: {n_features}")
print(f"Feature-to-sample ratio: {n_features / n_samples:.1f}")
# If ratio > 10 → high overfitting risk
Solution: Reduce features or regularize
# Option A: Feature selection (PCA)
from sklearn.decomposition import PCA
pca = PCA(n_components=min(50, n_samples // 2))
X_reduced = pca.fit_transform(X)
# Option B: Regularized models
from sklearn.linear_model import LogisticRegression
model = LogisticRegression(penalty='l1', solver='saga', C=0.1) # L1 regularization
# Option C: Simpler models
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
model = LinearDiscriminantAnalysis(solver='lsqr', shrinkage='auto')  # Shrinkage stabilizes LDA when p >> n
Domain Shift Failure¶
Problem: Model fails on new batches/instruments¶
Symptoms:
# Trained on Batch 1-4, tested on Batch 5
>>> model.fit(X_train, y_train) # Batches 1-4
>>> accuracy_train = model.score(X_train, y_train)
>>> accuracy_test = model.score(X_test, y_test) # Batch 5
>>> print(f"Train: {accuracy_train:.3f}, Test: {accuracy_test:.3f}")
Train: 0.93, Test: 0.65 # 28% drop!
Diagnosis:
# Visualize batch separation (PCA)
from sklearn.decomposition import PCA
import matplotlib.pyplot as plt
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X)
plt.figure(figsize=(10, 6))
for batch in np.unique(batches):
mask = batches == batch
plt.scatter(X_pca[mask, 0], X_pca[mask, 1], label=f'Batch {batch}', alpha=0.7)
plt.xlabel('PC1')
plt.ylabel('PC2')
plt.legend()
plt.title('Batch Effect Visualization')
plt.show()
# If batches form distinct clusters → domain shift
Causes & Solutions:
1. Instrument/Day Variability¶
Solution A: Batch Correction (ComBat)
import pandas as pd
from neuroCombat import neuroCombat
# Harmonize spectra to remove batch effects
covars = pd.DataFrame({'batch': batches})
X_harmonized = neuroCombat(
    dat=X.T,  # Features × samples
    covars=covars,
    batch_col='batch'
)['data'].T
# Re-train on harmonized data (split X_harmonized into train/test as before)
model.fit(X_harmonized_train, y_train)
accuracy_harmonized = model.score(X_harmonized_test, y_test)
print(f"Post-Harmonization: {accuracy_harmonized:.3f}")
Solution B: Transfer Learning
from foodspec.ml.harmonization import transfer_component_analysis
# Align source (old batches) to target (new batch)
X_aligned = transfer_component_analysis(
X_source=X_train,
X_target=X_test,
n_components=10
)
# Re-train on aligned data
model.fit(X_aligned_train, y_train)
Solution C: Standard Addition (Calibration Transfer)
# Measure standard samples on both instruments
# Use piecewise direct standardization (PDS)
from foodspec.ml.calibration import piecewise_direct_standardization
X_test_corrected = piecewise_direct_standardization(
X_source=X_train_standards,
X_target=X_test_standards,
X_to_correct=X_test,
window_size=9
)
2. Temperature/Humidity Drift¶
Solution: Include environmental covariates or normalize by reference
# Option A: Reference normalization (MSC to reference spectrum)
from foodspec.preprocessing import msc
reference = X_train.mean(axis=0) # Use training mean as reference
X_train_norm = msc(X_train, reference=reference)
X_test_norm = msc(X_test, reference=reference)
# Option B: Model environmental variables
import pandas as pd
metadata = pd.DataFrame({
'temperature': [...],
'humidity': [...],
'spectrum': X.tolist()
})
# Include as features or stratify
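A minimal sketch of the "include as features" option, assuming the metadata table above is filled in with one row per spectrum, aligned with the rows of X:

```python
import numpy as np

# Environmental covariates, one row per spectrum (assumed aligned with X)
env = metadata[['temperature', 'humidity']].to_numpy(dtype=float)

# Append the covariates as extra feature columns so the model can account for drift
X_with_env = np.hstack([X, env])
print(X_with_env.shape)  # (n_samples, n_wavenumbers + 2)

# Alternatively, bin temperature and pass the bins as CV strata/groups
temp_bins = np.digitize(env[:, 0], bins=np.quantile(env[:, 0], [0.33, 0.66]))
```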
Reproducibility Mismatch¶
Problem: Results differ across runs despite setting seed¶
Symptoms:
# Run 1
>>> np.random.seed(42)
>>> scores1 = cross_val_score(RandomForestClassifier(random_state=42), X, y, cv=5)
>>> scores1.mean()
0.873
# Run 2 (same code)
>>> np.random.seed(42)
>>> scores2 = cross_val_score(RandomForestClassifier(random_state=42), X, y, cv=5)
>>> scores2.mean()
0.879 # Different!
Causes & Solutions:
1. Missing Random State in CV¶
Problem: CV splitter not seeded
# Wrong
cv = KFold(n_splits=5, shuffle=True) # No random_state!
# Correct
cv = KFold(n_splits=5, shuffle=True, random_state=42)
2. Library Version Mismatch¶
Check versions:
import sklearn, numpy, scipy, foodspec
print(f"scikit-learn: {sklearn.__version__}")
print(f"numpy: {numpy.__version__}")
print(f"scipy: {scipy.__version__}")
print(f"foodspec: {foodspec.__version__}")
Solution: Document and freeze versions
# Save environment
pip freeze > requirements.txt
# Or use conda
conda env export > environment.yml
# Share requirements.txt with collaborators
Known version-dependent behaviors:
- NumPy <1.20 vs ≥1.20: RNG behavior changed (use np.random.Generator for consistency; see the sketch after this list)
- scikit-learn 0.24 vs 1.0+: random_state behavior changed in some estimators
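For new code, an explicitly seeded Generator avoids relying on global state that other libraries may silently reset; a small sketch:

```python
import numpy as np

# Preferred: a local, explicitly seeded generator (NumPy >= 1.17)
rng = np.random.default_rng(42)
noise = rng.normal(size=(5, 3))

# Legacy global seeding still works, but any other code can re-seed or consume the stream
np.random.seed(42)
noise_legacy = np.random.normal(size=(5, 3))
```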
3. Parallelism Non-Determinism¶
Problem: n_jobs=-1 causes non-deterministic behavior
Solution: Set n_jobs=1 for reproducibility (slower but deterministic)
# For reproducibility
model = RandomForestClassifier(n_estimators=100, random_state=42, n_jobs=1)
# For speed (may not be perfectly reproducible)
model = RandomForestClassifier(n_estimators=100, random_state=42, n_jobs=-1)
4. Floating-Point Precision¶
Problem: Different hardware (CPU vs GPU, Intel vs ARM) gives slightly different results
Solution: Accept small differences (<1e-6) or use lower precision
# Compare with tolerance
np.testing.assert_allclose(result1, result2, rtol=1e-5, atol=1e-6)
# Round for comparison
result1_rounded = np.round(result1, decimals=5)
result2_rounded = np.round(result2, decimals=5)
assert np.allclose(result1_rounded, result2_rounded)
Quick Diagnostic Script¶
Run this script to diagnose common issues:
import numpy as np
import sys
def diagnose_data(X, y=None, wavenumbers=None, sample_ids=None):
"""Comprehensive data diagnostics."""
print("="*60)
print("FOODSPEC DATA DIAGNOSTICS")
print("="*60)
# 1. Shape checks
print(f"\n[1] SHAPE CHECKS")
print(f" X shape: {X.shape}")
print(f" X dtype: {X.dtype}")
if y is not None:
print(f" y shape: {y.shape}")
print(f" y dtype: {y.dtype}")
if wavenumbers is not None:
print(f" wavenumbers shape: {wavenumbers.shape}")
if X.shape[1] != len(wavenumbers):
print(f" ⚠️ WARNING: Shape mismatch! {X.shape[1]} != {len(wavenumbers)}")
# 2. Missing values
print(f"\n[2] MISSING VALUES")
n_nan = np.isnan(X).sum()
n_inf = np.isinf(X).sum()
print(f" NaNs: {n_nan} ({100*n_nan/X.size:.2f}%)")
print(f" Infs: {n_inf} ({100*n_inf/X.size:.2f}%)")
if n_nan > 0 or n_inf > 0:
print(f" ⚠️ WARNING: Missing/infinite values detected!")
# 3. Intensity range
print(f"\n[3] INTENSITY RANGE")
print(f" Min: {X.min():.4f}")
print(f" Max: {X.max():.4f}")
print(f" Mean: {X.mean():.4f}")
print(f" Std: {X.std():.4f}")
if X.max() > 5.0:
print(f" ⚠️ WARNING: Unusually high absorbance (>5.0)")
if X.min() < -1.0:
print(f" ⚠️ WARNING: Negative absorbance (<-1.0)")
# 4. Flat spectra
print(f"\n[4] FLAT SPECTRA CHECK")
stds = X.std(axis=1, ddof=1)
n_flat = (stds < 1e-4).sum()
print(f" Flat spectra (std < 1e-4): {n_flat} ({100*n_flat/len(X):.2f}%)")
if n_flat > 0:
print(f" ⚠️ WARNING: Flat spectra may cause normalization issues")
# 5. Class balance (if labels provided)
if y is not None:
print(f"\n[5] CLASS BALANCE")
from collections import Counter
counts = Counter(y)
for label, count in sorted(counts.items()):
print(f" {label}: {count} ({100*count/len(y):.1f}%)")
min_count = min(counts.values())
max_count = max(counts.values())
imbalance_ratio = max_count / min_count
if imbalance_ratio > 3:
print(f" ⚠️ WARNING: Severe class imbalance (ratio: {imbalance_ratio:.1f})")
# 6. Replicate structure (if sample_ids provided)
if sample_ids is not None:
print(f"\n[6] REPLICATE STRUCTURE")
n_samples = len(np.unique(sample_ids))
n_spectra = len(sample_ids)
avg_reps = n_spectra / n_samples
print(f" Unique samples: {n_samples}")
print(f" Total spectra: {n_spectra}")
print(f" Avg replicates/sample: {avg_reps:.1f}")
if avg_reps > 1:
print(f" ⚠️ IMPORTANT: Use grouped CV to prevent replicate leakage!")
# 7. Feature-to-sample ratio
print(f"\n[7] OVERFITTING RISK")
n_samples, n_features = X.shape
ratio = n_features / n_samples
print(f" Samples: {n_samples}")
print(f" Features: {n_features}")
print(f" Feature/Sample ratio: {ratio:.1f}")
if ratio > 10:
print(f" ⚠️ WARNING: High overfitting risk (p >> n). Consider PCA/feature selection.")
# 8. System info
print(f"\n[8] SYSTEM INFO")
print(f" Python: {sys.version.split()[0]}")
try:
import sklearn, scipy, foodspec
print(f" NumPy: {np.__version__}")
print(f" SciPy: {scipy.__version__}")
print(f" scikit-learn: {sklearn.__version__}")
print(f" FoodSpec: {foodspec.__version__}")
except ImportError as e:
print(f" ⚠️ Missing package: {e}")
print("\n" + "="*60)
print("DIAGNOSTICS COMPLETE")
print("="*60)
# Usage:
# diagnose_data(X, y, wavenumbers, sample_ids)
Still Having Issues?¶
If your problem isn't covered here:
- Check the FAQ: Common questions answered
- Search existing issues: GitHub Issues
- Ask for help: Open a new issue with:
  - FoodSpec version (`import foodspec; print(foodspec.__version__)`)
  - Python version (`python --version`)
  - Minimal reproducible example
  - Full error traceback
- Consult documentation: see Related Pages below
Related Pages¶
- FAQ – Frequently asked questions
- Reporting & Reproducibility – Document results for publication
- How to Cite – Citation instructions for FoodSpec
- Validation → Leakage Prevention – Prevent data leakage
- Reference → Data Format – Data validation checklist