Skip to content

Instrument & File Formats Guide

FoodSpec normalizes instrument exports into a common representation (FoodSpectrumSet / HDF5 libraries). Vendor formats are input routes; analysis always operates on the normalized form.

Supported formats (overview)

Format type Extension(s) How to load Extra dependency?
CSV (wide) .csv read_spectra("file.csv") No
CSV (folder) .csv in a directory read_spectra("folder/") No
JCAMP-DX .jdx, .dx read_spectra("file.jdx") No (built-in parser)
SPC .spc read_spectra("file.spc") Yes (pip install foodspec[spc])
Bruker OPUS .0, .1, .opus read_spectra("file.0") Yes (pip install foodspec[opus])
TXT .txt read_spectra("file.txt") No

Typical structure and metadata

  • Spectral axis: wavenumber (cm⁻¹), ascending, 1D.
  • Intensity: arbitrary units; one column per spectrum (wide CSV) or one file per spectrum (folder/JCAMP/vendor).
  • Metadata: sample_id (from filename/column), plus any vendor header info (instrument, date) when available.
  • Normalization: vendor loaders return raw intensities; downstream preprocessing handles baselines/normalization.
  • Coverage: document spectral range/resolution; ensure exported range includes target bands (fingerprint, CH stretch).

Examples

from foodspec.io import read_spectra

# CSV (wide)
fs = read_spectra("data/oils_wide.csv")

# Folder of instrument CSV exports
fs_folder = read_spectra("data/export_folder/")

# JCAMP-DX
fs_jdx = read_spectra("data/sample.jdx")

# SPC (requires optional extra)
# pip install foodspec[spc]
fs_spc = read_spectra("data/sample.spc")

# OPUS (requires optional extra)
# pip install foodspec[opus]
fs_opus = read_spectra("data/sample.0")

# Run a quick PCA to verify structure
from foodspec.chemometrics.pca import run_pca
pca, res = run_pca(fs_opus.x, n_components=2)
print(res.explained_variance_ratio_)

# After ingest, run any standard workflow (e.g., oil authentication)
# from foodspec.apps.oils import run_oil_authentication_workflow
# result = run_oil_authentication_workflow(fs_opus, label_column=\"oil_type\")

Notes on formats and quirks

  • CSV (wide/folder): Sometimes missing units; ensure wavenumber (cm⁻¹) and ascending axis. Folder exports often store one spectrum per file; filenames become sample_id.
  • JCAMP-DX (.jdx/.dx): Multi-block files may contain multiple spectra; FoodSpec reads blocks into separate spectra. Check headers for units (wavenumber vs wavelength); we assume cm⁻¹ and convert when obvious.
  • SPC (.spc): Binary; may contain multiple traces. Requires pip install foodspec[spc]. If missing, you’ll see ImportError: SPC support requires the 'spc' extra; install the extra to proceed.
  • OPUS (.0/.1/.opus): Binary Bruker format; may contain multiple spectra. Requires pip install foodspec[opus]. Missing dependency raises ImportError: OPUS support requires the 'opus' extra.
  • TXT: Treat as CSV-like; ensure delimiter and column names; wavenumber column required.

All vendor formats are normalized to the same internal FoodSpectrumSet representation (x, wavenumbers, metadata). Downstream workflows are format-agnostic once loaded.

Example: SPC (commercial) → FoodSpectrumSet → HDF5

from foodspec.io import read_spectra
from foodspec.data.libraries import create_library

# Requires: pip install foodspec[spc]
fs_spc = read_spectra("data/vendor/sample.spc")
create_library(
    spectra=fs_spc.x,
    wavenumbers=fs_spc.wavenumbers,
    metadata=fs_spc.metadata,
    modality=fs_spc.modality,
    path="libraries/sample_spc.h5",
)
If the extra is missing, you’ll see an ImportError mentioning the spc extra—install it and rerun.

Example: OPUS (commercial) → FoodSpectrumSet → HDF5

# Requires: pip install foodspec[opus]
fs_opus = read_spectra("data/vendor/sample.0")
create_library(
    spectra=fs_opus.x,
    wavenumbers=fs_opus.wavenumbers,
    metadata=fs_opus.metadata,
    modality=fs_opus.modality,
    path="libraries/sample_opus.h5",
)
If the extra is missing, an ImportError will suggest installing the opus extra.

Synthetic vendor overlay

Synthetic vendor overlay

Synthetic spectra (mimicking SPC/OPUS after normalization) overlaid to illustrate that vendor imports are reduced to the standard wavenumber/intensity layout before analysis.

Troubleshooting

  • Unsupported format: ensure extension matches table; otherwise convert to CSV/JCAMP.
  • Missing dependency: install the appropriate extra (spc, opus); ImportError messages guide installation.
  • Wavenumber issues: verify axis is ascending cm⁻¹; flip/order if needed before analysis.
  • Sparse metadata: filenames become sample_id; vendor headers may provide instrument/date; add missing metadata manually if needed.

See also