IO & Data Loading API

Functions for loading, saving, and converting spectral data across various formats.

The foodspec.io module handles data import/export with support for CSV, HDF5, JCAMP-DX, and vendor-specific formats (Bruker OPUS, SPC).

Primary Functions

load_folder

Load spectra from a folder of text files.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| folder | PathLike | Directory containing spectra files. | required |
| pattern | str | Glob pattern for spectra files. | '*.txt' |
| modality | str | Spectroscopy modality label. | 'raman' |
| metadata_csv | Optional[PathLike] | Optional CSV with a sample_id column used to merge metadata by file basename. | None |
| wavenumber_column | int | Column index for wavenumbers in the spectra files. | 0 |
| intensity_columns | Optional[Sequence[int]] | Optional indices for intensity columns. If multiple are provided, their mean is taken. When omitted, all columns except wavenumber_column are used. | None |

Returns:

| Type | Description |
| --- | --- |
| FoodSpectrumSet | A FoodSpectrumSet with a common wavenumber axis. |

Raises:

| Type | Description |
| --- | --- |
| ValueError | If no files match the pattern or files are malformed. |
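
The column-selection rule described for wavenumber_column and intensity_columns can be illustrated with a small standard-library sketch. The helper name parse_spectrum_text is hypothetical and not part of foodspec, and whitespace-delimited rows are an assumption; the library's actual parsing may differ.

```python
from statistics import fmean
from typing import List, Optional, Sequence, Tuple


def parse_spectrum_text(
    text: str,
    wavenumber_column: int = 0,
    intensity_columns: Optional[Sequence[int]] = None,
) -> Tuple[List[float], List[float]]:
    """Hypothetical helper: parse whitespace-delimited rows into two lists."""
    wavenumbers: List[float] = []
    intensities: List[float] = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        # float() raises ValueError on malformed numeric fields.
        values = [float(field) for field in fields]
        if intensity_columns is None:
            # Default: every column except the wavenumber column.
            columns = [i for i in range(len(values)) if i != wavenumber_column]
        else:
            columns = list(intensity_columns)
        if not columns:
            raise ValueError("row has no intensity columns")
        wavenumbers.append(values[wavenumber_column])
        # Several intensity columns collapse to their mean.
        intensities.append(fmean(values[i] for i in columns))
    return wavenumbers, intensities


sample = "400.0 1.0 3.0\n402.0 2.0 4.0\n"
wn, inten = parse_spectrum_text(sample)
print(wn, inten)
```

Passing intensity_columns=[2] to the same helper would keep only the last column instead of averaging both.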

read_spectra

Auto-detect format and read spectra.

Read spectra from multiple possible formats into FoodSpectrumSet.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| path | str \| PathLike | File or folder path. | required |
| format | str \| None | Optional override for the detected format. One of "csv", "folder_csv", "jcamp", "spc", "opus". | None |
| **kwargs | Any | Extra keyword arguments forwarded to the underlying loader. | {} |

Returns:

| Type | Description |
| --- | --- |
| FoodSpectrumSet | A FoodSpectrumSet loaded from the provided path. |

Raises:

| Type | Description |
| --- | --- |
| ValueError | If the format is unsupported or cannot be inferred. |
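
The override-then-detect behaviour can be sketched as a small dispatcher. Everything here is illustrative: dispatch_read, the stand-in loaders, and the toy detector are hypothetical names and do not reflect how foodspec is implemented internally. Only the format keys and the ValueError come from the description above.

```python
from typing import Any, Callable, Dict, Optional

# Format keys listed in the parameter description above.
SUPPORTED_FORMATS = ("csv", "folder_csv", "jcamp", "spc", "opus")


def dispatch_read(
    path: str,
    loaders: Dict[str, Callable[..., Any]],
    detect: Callable[[str], str],
    format: Optional[str] = None,
    **kwargs: Any,
) -> Any:
    """Hypothetical dispatcher: an explicit format wins, otherwise detect it."""
    chosen = format if format is not None else detect(path)
    if chosen not in SUPPORTED_FORMATS or chosen not in loaders:
        raise ValueError(f"Unsupported or undetectable format: {chosen!r}")
    # Extra keyword arguments pass through to the selected loader unchanged.
    return loaders[chosen](path, **kwargs)


# Stand-in loaders that only report how they were called.
demo_loaders = {
    "csv": lambda path, **kw: ("csv", path, kw),
    "jcamp": lambda path, **kw: ("jcamp", path, kw),
}


def demo_detect(path: str) -> str:
    """Toy detector used only for this demonstration."""
    return "csv" if path.endswith(".csv") else "unknown"


result = dispatch_read("oils.csv", demo_loaders, demo_detect, modality="ftir")
print(result)
```

Keeping the override as a plain optional argument means callers with unusual file extensions can bypass detection without touching the loader itself.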

detect_format

Identify file format by inspection.

Detect input format based on path.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| path | str \| PathLike | File or directory path. | required |

Returns:

| Type | Description |
| --- | --- |
| str | A short string key such as "csv", "folder_csv", "jcamp", "spc", "opus", "txt", or "unknown". |

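A path-based detector of this kind might look like the sketch below. The extension table and the rule that a directory maps to "folder_csv" are assumptions made for illustration, and guess_format is a hypothetical name. OPUS is left out on purpose because OPUS files often carry numeric extensions such as .0, which a simple suffix table does not capture.

```python
from pathlib import Path

# Assumed extension table; the library's real detection rules may differ.
_EXTENSION_KEYS = {
    ".csv": "csv",
    ".jdx": "jcamp",
    ".dx": "jcamp",
    ".spc": "spc",
    ".txt": "txt",
}


def guess_format(path) -> str:
    """Hypothetical detector: return a short format key for a path."""
    p = Path(path)
    if p.is_dir():
        # Assumption: a directory is treated as a folder of spectra files.
        return "folder_csv"
    # Compare suffixes case-insensitively; anything unrecognized is "unknown".
    return _EXTENSION_KEYS.get(p.suffix.lower(), "unknown")


for candidate in ("oil.csv", "oil.JDX", "oil.bin"):
    print(candidate, guess_format(candidate))
```
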
CSV & Text Formats

load_csv_spectra

Load spectra from CSV files (wide or long format).

Load spectra from a CSV file into a FoodSpectrumSet.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| csv_path | str \| Path | Path to the CSV file. | required |
| format | str | "wide" for one row per wavenumber and one column per spectrum, or "long" for one row per (sample_id, wavenumber) with an intensity column. | 'wide' |
| wavenumber_column | str | Name of the wavenumber column (both formats). | 'wavenumber' |
| intensity_columns | Optional[Iterable[str]] | For "wide" format, which columns contain intensities. If None, all non-wavenumber columns are treated as spectra. | None |
| sample_id_column | str | For "long" format, column giving sample identifiers. | 'sample_id' |
| intensity_column | str | For "long" format, column giving intensity values. | 'intensity' |
| label_column | Optional[str] | Optional column name to copy into metadata (e.g., label). | None |
| modality | str | Spectroscopy modality (e.g., "raman", "ftir"). | 'raman' |

Returns:

| Type | Description |
| --- | --- |
| FoodSpectrumSet | A FoodSpectrumSet ready for preprocessing and modeling. |

Raises:

| Type | Description |
| --- | --- |
| FileNotFoundError | If the CSV file does not exist. |
| ValueError | If required columns are missing or the format is invalid. |
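
To make the difference between the "wide" and "long" layouts concrete, the sketch below holds the same two spectra in both layouts and reads each into a plain dictionary using only the standard library. The helper names are hypothetical and the parsing is a simplification, not the foodspec loader; the column names match the defaults listed above.

```python
import csv
import io
from collections import defaultdict

# "wide": one row per wavenumber, one column per spectrum.
WIDE_CSV = """wavenumber,s1,s2
400,1.0,2.0
402,1.5,2.5
"""

# "long": one row per (sample_id, wavenumber) pair.
LONG_CSV = """sample_id,wavenumber,intensity
s1,400,1.0
s1,402,1.5
s2,400,2.0
s2,402,2.5
"""


def wide_to_spectra(text, wavenumber_column="wavenumber"):
    """Hypothetical helper: map each spectrum column to its intensities."""
    rows = list(csv.DictReader(io.StringIO(text)))
    if not rows or wavenumber_column not in rows[0]:
        raise ValueError(f"no rows or missing column: {wavenumber_column}")
    sample_ids = [name for name in rows[0] if name != wavenumber_column]
    return {sid: [float(row[sid]) for row in rows] for sid in sample_ids}


def long_to_spectra(
    text,
    sample_id_column="sample_id",
    wavenumber_column="wavenumber",
    intensity_column="intensity",
):
    """Hypothetical helper: group long-format rows by sample identifier."""
    pairs_by_sample = defaultdict(list)
    for row in csv.DictReader(io.StringIO(text)):
        for required in (sample_id_column, wavenumber_column, intensity_column):
            if required not in row:
                raise ValueError(f"missing column: {required}")
        pairs_by_sample[row[sample_id_column]].append(
            (float(row[wavenumber_column]), float(row[intensity_column]))
        )
    # Sort each sample by wavenumber, then keep only the intensities.
    return {sid: [y for _, y in sorted(pairs)] for sid, pairs in pairs_by_sample.items()}


print(wide_to_spectra(WIDE_CSV))
print(long_to_spectra(LONG_CSV))
```

Both calls yield the same mapping, which is why either layout can feed a single dataset type.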

read_jcamp

Read JCAMP-DX spectroscopy files.

Read a JCAMP-DX file (.jdx, .dx) into FoodSpectrumSet.

Minimal parser that extracts numeric pairs (wavenumber, intensity) while ignoring header tags.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| path | str \| Path | Path to a JCAMP-DX file. | required |
| modality | str | Spectroscopy modality label for the dataset. | 'raman' |

Returns:

| Type | Description |
| --- | --- |
| FoodSpectrumSet | A FoodSpectrumSet containing a single spectrum from the file. |

Raises:

| Type | Description |
| --- | --- |
| ValueError | If no spectral data is found in the file. |
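
A minimal parser of the kind described (skip header tags, keep numeric pairs) could be sketched as follows. parse_jcamp_pairs is a hypothetical name, and the sketch assumes one (wavenumber, intensity) pair per line; JCAMP-DX files that pack several Y values per line or use compressed encodings would need a fuller parser.

```python
def parse_jcamp_pairs(text):
    """Hypothetical helper: extract (wavenumber, intensity) pairs, ignoring tags."""
    wavenumbers, intensities = [], []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("##"):
            continue  # JCAMP-DX labelled records (header tags) start with "##"
        fields = stripped.replace(",", " ").split()
        if len(fields) < 2:
            continue
        try:
            x, y = float(fields[0]), float(fields[1])
        except ValueError:
            continue  # not a numeric pair
        wavenumbers.append(x)
        intensities.append(y)
    if not wavenumbers:
        raise ValueError("No spectral data found")
    return wavenumbers, intensities


sample_jdx = """##TITLE=olive oil
##JCAMP-DX=4.24
##XYPOINTS=(XY..XY)
400.0, 1.25
402.0, 1.50
##END=
"""
x_values, y_values = parse_jcamp_pairs(sample_jdx)
print(x_values, y_values)
```
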

Vendor Formats

read_opus

Read Bruker OPUS files (requires optional dependency).

Read a Bruker OPUS file into FoodSpectrumSet.

Uses the optional brukeropusreader package when available; raises an informative ImportError otherwise.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| path | str \| Path | Path to the OPUS file. | required |
| modality | str | Spectroscopy modality label. | 'ftir' |

Returns:

| Type | Description |
| --- | --- |
| FoodSpectrumSet | A FoodSpectrumSet parsed from the OPUS data. |

Raises:

| Type | Description |
| --- | --- |
| ImportError | If brukeropusreader is not installed. |
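
Because the reader is optional, calling code may want to check for it before processing a batch of OPUS files. The sketch below probes for the module without importing it, using importlib.util.find_spec. The helper names are hypothetical, and the error text is illustrative rather than the message foodspec emits.

```python
import importlib.util


def optional_module_available(module_name: str = "brukeropusreader") -> bool:
    """Return True if the module can be found, without importing it."""
    return importlib.util.find_spec(module_name) is not None


def require_opus_reader() -> None:
    """Hypothetical guard: fail early with a readable message."""
    if not optional_module_available():
        raise ImportError(
            "Reading OPUS files requires the optional 'brukeropusreader' package."
        )


print("OPUS reader available:", optional_module_available())
```
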

read_spc

Read Thermo Galactic SPC files (requires optional dependency).

Read an SPC file into FoodSpectrumSet.

Attempts to import known SPC readers (e.g., "spc" or "spc_io"). If none are available, an informative ImportError is raised.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| path | str \| Path | Path to the SPC file. | required |
| modality | str | Spectroscopy modality label. | 'raman' |

Returns:

| Type | Description |
| --- | --- |
| FoodSpectrumSet | A FoodSpectrumSet constructed from the SPC data. |

Raises:

| Type | Description |
| --- | --- |
| ImportError | If an SPC reader dependency is not installed. |
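
The "try several known readers, then fail informatively" pattern can be sketched with importlib. The function name import_first_available is hypothetical; the default candidate names "spc" and "spc_io" are taken from the description above, and how foodspec actually orders or uses them is not specified here.

```python
import importlib
from types import ModuleType
from typing import Sequence


def import_first_available(candidates: Sequence[str] = ("spc", "spc_io")) -> ModuleType:
    """Hypothetical helper: return the first candidate module that imports."""
    for name in candidates:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue  # try the next candidate reader
    raise ImportError(
        "No SPC reader is installed; tried: " + ", ".join(candidates)
    )


# Demonstration with a missing name followed by a standard-library module.
module = import_first_available(["no_such_reader_xyz", "json"])
print(module.__name__)
```
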

Export Functions

to_hdf5

Save dataset to HDF5 format.

Persist spectra to an HDF5 file.

Stores datasets x, wavenumbers, and metadata_json (serialized via DataFrame.to_json), plus the modality attribute.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| spectra | FoodSpectrumSet | Dataset to save. | required |
| path | PathLike | Target HDF5 file path. | required |

Raises:

| Type | Description |
| --- | --- |
| ImportError | If h5py is not installed. |
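
The file layout described above (datasets x, wavenumbers, and metadata_json, plus a modality attribute) might be written along these lines. This is a sketch under stated assumptions, not the foodspec implementation: save_hdf5_sketch is a hypothetical name, json.dumps stands in for DataFrame.to_json, and placing modality on the file root is an assumption.

```python
import json


def save_hdf5_sketch(x, wavenumbers, metadata_records, modality, path):
    """Illustrative writer for the layout described above."""
    try:
        import h5py  # optional dependency, imported lazily
    except ImportError as exc:
        raise ImportError("h5py is required for HDF5 export") from exc
    with h5py.File(path, "w") as handle:
        handle.create_dataset("x", data=x)
        handle.create_dataset("wavenumbers", data=wavenumbers)
        # json.dumps stands in for DataFrame.to_json in this sketch.
        handle.create_dataset("metadata_json", data=json.dumps(metadata_records))
        # Assumption: modality is stored as a file-level attribute.
        handle.attrs["modality"] = modality
```

Importing h5py inside the function keeps the rest of the module usable when the optional dependency is missing.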

to_tidy_csv

Export dataset to tidy (long-format) CSV.

Export spectra to a tidy (long-form) CSV file.

Produces columns sample_id, all metadata fields, wavenumber, and intensity.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| spectra | FoodSpectrumSet | Dataset to export. | required |
| path | PathLike | Output file path where the CSV will be written. | required |
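
The tidy layout (one row per sample and wavenumber, with metadata repeated on each row) can be sketched with the standard csv module. The helper names and the in-memory inputs are hypothetical; the real function takes a FoodSpectrumSet, whose attributes are not shown here.

```python
import csv
import io


def tidy_rows(sample_ids, metadata, wavenumbers, x):
    """Yield one dict per (sample, wavenumber) pair."""
    for sample_id, meta, spectrum in zip(sample_ids, metadata, x):
        for wavenumber, intensity in zip(wavenumbers, spectrum):
            yield {
                "sample_id": sample_id,
                **meta,
                "wavenumber": wavenumber,
                "intensity": intensity,
            }


def write_tidy_csv(handle, sample_ids, metadata, wavenumbers, x):
    """Hypothetical writer: sample_id, metadata fields, wavenumber, intensity."""
    meta_fields = list(metadata[0]) if metadata else []
    fieldnames = ["sample_id", *meta_fields, "wavenumber", "intensity"]
    writer = csv.DictWriter(handle, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(tidy_rows(sample_ids, metadata, wavenumbers, x))


buffer = io.StringIO()
write_tidy_csv(
    buffer,
    sample_ids=["s1", "s2"],
    metadata=[{"label": "olive"}, {"label": "palm"}],
    wavenumbers=[400.0, 402.0],
    x=[[1.0, 1.5], [2.0, 2.5]],
)
lines = buffer.getvalue().splitlines()
print(lines[0])
```
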

See Also