Libraries & Public Datasets¶
Who needs this? Users converting raw spectral files to FoodSpec's HDF5 format; researchers accessing public datasets. What problem does this solve? Understanding FoodSpec's data storage format and how to create/load spectral libraries. When to use this? When you have raw spectral data (CSV/TXT files) and need to create a reusable library for workflows. Why it matters? Proper data organization in HDF5 format enables reproducible analysis, faster loading, and metadata tracking. Time to complete: 10 minutes to read; library creation takes seconds to minutes depending on dataset size. Prerequisites: Raw spectral data files; FoodSpec installed; basic understanding of data formats
Questions this page answers - What is stored in a FoodSpec HDF5 library? - How do I build/load libraries? - Which public datasets are used in the protocol? - How does CSV → FoodSpectrumSet → HDF5 work?
What is stored in an HDF5 library?¶
x: spectra matrix (n_samples × n_wavenumbers)wavenumbers: shared axis (cm⁻¹)metadata: one row per sample (labels, acquisition info)modality: Raman / FTIR / NIR tagprovenance: basic version/timestamp metadata
Building and loading libraries¶
from pathlib import Path
from foodspec.io import create_library
from foodspec import load_library
input_folder = Path("data/oils_raw")
out_path = Path("libraries/oils_raman.h5")
create_library(input_folder, out_path, modality="raman")
fs = load_library(out_path)
Public datasets (used in protocol examples)¶
| Dataset | Modality | Task | Approx size | DOI/URL |
|---|---|---|---|---|
| Mendeley edible oils | Raman/FTIR | Classification | Tens–hundreds spectra | Mendeley Data (edible oils) |
| EVOO–sunflower mixtures | Raman | Regression (fraction) | Tens spectra | DOI: 10.57745/DOGT0E |
| FTIR edible oils (ATR) | FTIR | Classification/adulteration | Tens spectra | Public FTIR oil datasets |
Loaders: load_public_mendeley_oils, load_public_evoo_sunflower_raman, load_public_ftir_oils (data must be pre-downloaded).
CSV → FoodSpectrumSet → HDF5¶
- Use
foodspec csv-to-libraryfor CSV inputs. - Wide: one column per spectrum,
wavenumbercolumn for the axis. - Long/tidy: rows = (sample_id, wavenumber, intensity), extra metadata columns allowed.
- Command:
foodspec csv-to-library data/oils.csv libraries/oils.h5 \ --format wide \ --wavenumber-column wavenumber \ --label-column oil_type \ --modality raman - Internally: CSV → FoodSpectrumSet (validated) → HDF5 with spectra, axis, metadata, modality, provenance.
Public loader examples¶
from foodspec.data import load_public_mendeley_oils, load_public_evoo_sunflower_raman
fs_cls = load_public_mendeley_oils(root="path/to/mendeley")
fs_mix = load_public_evoo_sunflower_raman(root="path/to/evoo_sunflower")
See also - csv_to_library.md - workflows/oil_authentication.md - workflows/mixture_analysis.md