Chemometrics API

Multivariate analysis tools: PCA, PLS, and mixture modeling.

The foodspec.chemometrics module provides classical chemometric methods for dimensionality reduction, classification, and quantitative analysis.

Principal Component Analysis (PCA)

run_pca

Perform PCA on spectral data with comprehensive outputs.

Run PCA on data matrix.

Parameters

X : Array of shape (n_samples, n_features).
n_components : Number of components to compute.

Returns

tuple : Fitted PCA estimator and PCAResult container.
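To make the outputs concrete, the following is a minimal NumPy sketch of what a PCA decomposition computes. It is not the foodspec implementation: `pca_sketch` and its return values are hypothetical names chosen for illustration, and the fields of `PCAResult` are not specified in this reference.

```python
import numpy as np

def pca_sketch(X, n_components):
    """Hypothetical helper (not part of foodspec): PCA via SVD of centered data."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                           # center each feature
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]   # (n_samples, n_components)
    loadings = Vt[:n_components]                      # (n_components, n_features)
    explained_ratio = (s**2 / np.sum(s**2))[:n_components]
    return scores, loadings, explained_ratio

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 50))                         # 20 synthetic spectra, 50 points each
scores, loadings, explained_ratio = pca_sketch(X, n_components=3)
print(scores.shape, loadings.shape)
```

The scores place each sample in the reduced space, the loadings describe each component as a pattern over the original features, and the explained-variance ratios indicate how much of the total variance each component carries.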

Partial Least Squares (PLS)

make_pls_da

Create PLS Discriminant Analysis classifier.

Create a PLS-DA (PLS + Logistic Regression) pipeline.

make_pls_regression

Create PLS regression model.

Create a PLS regression pipeline with scaling.

Mixture Modeling

mcr_als

Multivariate Curve Resolution - Alternating Least Squares.

Perform a simplified MCR-ALS decomposition with non-negativity clipping.

Parameters

X: Data matrix of shape (n_samples, n_points).
n_components: Number of components to estimate.
max_iter: Maximum number of ALS iterations.
tol: Convergence tolerance on reconstruction error.
random_state: Optional seed for reproducible initialization.

Returns

C: Concentration profiles (n_samples, n_components).
S: Spectral profiles (n_points, n_components).
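The alternating scheme can be sketched in a few lines of NumPy. This is an independent illustration of ALS with non-negativity clipping, assuming the model X ≈ C Sᵀ and a random non-negative initialization; `mcr_als_sketch` is a hypothetical name, and the actual `mcr_als` may differ in initialization, convergence test, and normalization.

```python
import numpy as np

def mcr_als_sketch(X, n_components, max_iter=100, tol=1e-6, random_state=None):
    """Hypothetical sketch (not the foodspec code): ALS for X ~ C @ S.T with clipping."""
    rng = np.random.default_rng(random_state)
    n_samples, n_points = X.shape
    S = rng.random((n_points, n_components))            # random non-negative start
    prev_err = np.inf
    for _ in range(max_iter):
        # Least-squares update of C for fixed S, then clip negatives to zero.
        C = np.clip(X @ np.linalg.pinv(S.T), 0.0, None)
        # Least-squares update of S for fixed C, then clip negatives to zero.
        S = np.clip((np.linalg.pinv(C) @ X).T, 0.0, None)
        err = np.linalg.norm(X - C @ S.T)
        if abs(prev_err - err) < tol:                    # stop when the error stalls
            break
        prev_err = err
    return C, S

# Synthetic mixtures of two Gaussian-shaped pure spectra.
rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.0, 80)
S_true = np.column_stack([np.exp(-((x - 0.3) / 0.05) ** 2),
                          np.exp(-((x - 0.7) / 0.08) ** 2)])
C_true = rng.random((15, 2))
X = C_true @ S_true.T                                    # shape (15, 80)

C, S = mcr_als_sketch(X, n_components=2, random_state=0)
print(C.shape, S.shape)
```

Clipping is the simplest way to enforce non-negativity; it does not resolve the scale and rotation ambiguities inherent in curve resolution, so recovered profiles are generally compared to references only up to scaling and ordering.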

nnls_mixture

Non-negative least squares mixture deconvolution.

Fit a non-negative least squares mixture.

Parameters

spectrum: Array of shape (n_points,) representing the mixture spectrum.
pure_spectra: Array of shape (n_points, n_components) containing pure component spectra as columns.

Returns

coefficients: Non-negative coefficients for each component (shape (n_components,)).
residual_norm: Euclidean norm of the residual.
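The same problem, minimizing the residual norm subject to non-negative coefficients, is solved by `scipy.optimize.nnls`, which likewise returns the coefficients and the residual norm. The sketch below uses SciPy directly on synthetic data; whether `nnls_mixture` calls SciPy internally is not stated in this reference and is assumed here only for illustration.

```python
import numpy as np
from scipy.optimize import nnls

# Two synthetic pure spectra stored as columns, shape (n_points, n_components).
x = np.linspace(0.0, 1.0, 100)
pure_spectra = np.column_stack([np.exp(-((x - 0.3) / 0.05) ** 2),
                                np.exp(-((x - 0.7) / 0.08) ** 2)])

# Noise-free mixture with known fractions 0.3 and 0.7.
spectrum = 0.3 * pure_spectra[:, 0] + 0.7 * pure_spectra[:, 1]

# Solve min ||pure_spectra @ c - spectrum||_2 subject to c >= 0.
coefficients, residual_norm = nnls(pure_spectra, spectrum)
print(coefficients, residual_norm)
```

With real spectra the residual norm will not be zero; a large value relative to the norm of the spectrum suggests that a component is missing from `pure_spectra` or that baseline effects remain.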

Variable Importance

calculate_vip

Variable Importance in Projection scores for PLS models.

Calculate Variable Importance in Projection (VIP) scores for PLS model.

VIP scores indicate the importance of each variable in the PLS model. Variables with VIP scores > 1 are considered highly influential. Variables with VIP scores > 0.8 are considered moderately important.

Parameters

pls_model : PLSRegression or Pipeline
    Fitted PLS regression model. If Pipeline, the last step must be a PLSRegression.
X : array-like of shape (n_samples, n_features)
    Training data used to fit the model.
y : array-like of shape (n_samples,) or (n_samples, n_targets)
    Training targets used to fit the model.

Returns

vip_scores : ndarray of shape (n_features,)
    VIP score for each feature. Scores > 1 indicate high importance.

Raises

ValueError
    If model is not fitted or is not a PLS model.
TypeError
    If pls_model is not PLSRegression or Pipeline containing PLSRegression.

Notes

VIP is calculated as:

.. math::

    VIP_j = \sqrt{\frac{p \sum_{a=1}^{A} SS_a \, w_{aj}^2}{\sum_{a=1}^{A} SS_a}}

where:

- p is the number of variables
- A is the number of PLS components
- SS_a is the sum of squares explained by component a
- w_{aj} is the weight of variable j in component a
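The formula can be evaluated directly from the scores, weights, and y-loadings of a PLS fit. The NumPy sketch below is an illustration, not the `calculate_vip` source: `vip_from_parts` and `pls1_nipals` are hypothetical helpers, SS_a is taken as the squared y-loading times the squared norm of the score vector, and each weight vector is assumed to be normalized to unit length. For a fitted scikit-learn `PLSRegression`, the corresponding arrays are `x_scores_`, `x_weights_`, and `y_loadings_`.

```python
import numpy as np

def vip_from_parts(T, W, Q):
    """Hypothetical helper: VIP from scores T (n, A), weights W (p, A), y-loadings Q (k, A)."""
    p = W.shape[0]
    ss = np.sum(Q**2, axis=0) * np.sum(T**2, axis=0)     # SS_a for each component
    Wn = W / np.linalg.norm(W, axis=0, keepdims=True)    # unit-length weight vectors
    return np.sqrt(p * (Wn**2 @ ss) / ss.sum())

def pls1_nipals(X, y, n_components):
    """Hypothetical minimal single-target PLS (NIPALS) used only to produce T, W, Q."""
    Xr = X - X.mean(axis=0)
    yr = y - y.mean()
    T, W, Q = [], [], []
    for _ in range(n_components):
        w = Xr.T @ yr
        w /= np.linalg.norm(w)              # weight vector for this component
        t = Xr @ w                          # score vector
        tt = t @ t
        p_load = Xr.T @ t / tt              # X-loading used for deflation
        q = (yr @ t) / tt                   # y-loading
        Xr = Xr - np.outer(t, p_load)       # deflate X
        yr = yr - q * t                     # deflate y
        T.append(t); W.append(w); Q.append(q)
    return np.column_stack(T), np.column_stack(W), np.array(Q)[None, :]

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
y = X[:, 0] + 2 * X[:, 1] + rng.normal(scale=0.1, size=100)

vip = vip_from_parts(*pls1_nipals(X, y, n_components=3))
print(np.round(vip, 2))
```

With unit-length weight vectors the squared VIP scores sum to p, so their mean square is 1, which is the usual justification for treating VIP > 1 as above-average importance.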

Examples

    import numpy as np
    from sklearn.cross_decomposition import PLSRegression
    from foodspec.chemometrics.vip import calculate_vip

    # Generate synthetic data
    X = np.random.randn(100, 10)
    y = X[:, 0] + 2 * X[:, 1] + np.random.randn(100) * 0.1

    # Fit PLS model
    pls = PLSRegression(n_components=3)
    pls.fit(X, y)

    # Calculate VIP scores
    vip_scores = calculate_vip(pls, X, y)
    print(f"VIP scores shape: {vip_scores.shape}")
    print(f"Important features (VIP > 1): {np.where(vip_scores > 1)[0]}")

See Also