Skip to content

CSV → HDF5 Pipeline

Who needs this? Users with CSV/TXT spectral data who need to convert to FoodSpec's HDF5 format. What problem does this solve? Converting various CSV formats (wide, long, multi-file) into a standardized HDF5 library. When to use this? Immediately after collecting spectra from instruments; before running any FoodSpec workflow. Why it matters? HDF5 libraries preserve metadata, enable faster loading, and are required for all FoodSpec analysis workflows. Time to complete: 5 minutes to understand formats; conversion takes seconds. Prerequisites: CSV or TXT spectral data; FoodSpec installed


Overview

FoodSpec converts CSVs to reusable spectral libraries (HDF5) for all workflows.

Supported inputs

Format Description Required columns Typical use
CSV (wide) One column per spectrum, one row per wavenumber wavenumber, sample columns Fast conversion to HDF5
CSV (long/tidy) One row per (sample_id, wavenumber, intensity) sample_id, wavenumber, intensity Public datasets, tidy data
Folder of TXT/CSV One file per spectrum, aligned axes filename, wavenumber/intensity columns Instrument exports
HDF5 library Serialized FoodSpectrumSet x, wavenumbers, metadata, modality Primary format for workflows

CSV layouts

Wide format

wavenumber,s1,s2,s3
500,10.1,12.2,9.9
502,10.3,12.4,10.0
- wavenumber is the axis; each other column is a spectrum.

Long/tidy format

sample_id,wavenumber,intensity,oil_type
s001,500,10.1,olive
s001,502,10.3,olive
s002,500,12.2,sunflower
- one row per (sample, wavenumber); extra columns become metadata (e.g., oil_type).

CLI: csv-to-library

foodspec csv-to-library \
  data/oils.csv \
  libraries/oils.h5 \
  --format wide \
  --wavenumber-column wavenumber \
  --modality raman \
  --label-column oil_type
- Use --format long with --sample-id-column / --intensity-column for tidy data. - Parent directories are created automatically; output is an HDF5 library.

Python API

from foodspec.io.csv_import import load_csv_spectra
from foodspec.io import create_library

fs = load_csv_spectra("data/oils.csv", format="wide", modality="raman")
create_library("libraries/oils.h5", spectra=fs.x, wavenumbers=fs.wavenumbers, metadata=fs.metadata, modality=fs.modality)

Troubleshooting

  • Missing column errors: check spelling of wavenumber/sample_id/intensity.
  • Non-monotonic axis: sort wavenumbers before conversion.
  • Mixed labels: ensure consistent sample IDs and metadata rows.