Chemometrics API¶
Multivariate analysis tools: PCA, PLS, and mixture modeling.
The foodspec.chemometrics module provides classical chemometric methods for dimensionality reduction, classification, and quantitative analysis.
Principal Component Analysis (PCA)¶
run_pca¶
Perform PCA on spectral data with comprehensive outputs.
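The signature and return value of run_pca are not shown on this page, so the following minimal sketch uses scikit-learn's PCA directly to illustrate the quantities a PCA of spectral data typically yields (scores, loadings, explained variance). The synthetic spectra and all variable names are assumptions made for illustration and are not part of the foodspec API.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Synthetic "spectra": 30 samples x 200 points built from two latent patterns.
axis = np.linspace(0.0, 1.0, 200)
patterns = np.vstack([
    np.sin(2 * np.pi * axis),
    np.exp(-((axis - 0.5) ** 2) / 0.01),
])
amounts = rng.normal(size=(30, 2))
X = amounts @ patterns + 0.01 * rng.normal(size=(30, 200))

pca = PCA(n_components=2)
scores = pca.fit_transform(X)              # sample coordinates, (n_samples, 2)
loadings = pca.components_                 # spectral directions, (2, n_points)
explained = pca.explained_variance_ratio_  # variance fraction per component
```

Scores are usually plotted to look for sample groupings, while loadings show which spectral regions drive each component.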
Partial Least Squares (PLS)¶
make_pls_da¶
Create a PLS Discriminant Analysis (PLS-DA) classifier.
make_pls_regression¶
Create a PLS regression model.
Mixture Modeling¶
mcr_als¶
Multivariate Curve Resolution - Alternating Least Squares.
Perform a simplified MCR-ALS decomposition with non-negativity clipping.
Parameters¶
- X: Data matrix of shape (n_samples, n_points).
- n_components: Number of components to estimate.
- max_iter: Maximum number of ALS iterations.
- tol: Convergence tolerance on reconstruction error.
- random_state: Optional seed for reproducible initialization.
Returns¶
- C: Concentration profiles, shape (n_samples, n_components).
- S: Spectral profiles, shape (n_points, n_components).
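The description above (alternating least squares with non-negativity clipping, stopping on the change in reconstruction error) can be sketched in NumPy as follows. This is an illustrative reimplementation under those stated assumptions, not the foodspec source; the function name mcr_als_sketch, the default values, and the random initialization are hypothetical.

```python
import numpy as np

def mcr_als_sketch(X, n_components, max_iter=100, tol=1e-6, random_state=None):
    """Illustrative clipped ALS for X ~ C @ S.T (not the foodspec code)."""
    rng = np.random.default_rng(random_state)
    n_samples, n_points = X.shape
    S = rng.random((n_points, n_components))  # random non-negative start
    prev_err = np.inf
    for _ in range(max_iter):
        # Least-squares update of C with S fixed, then clip negatives to zero.
        C = np.clip(X @ np.linalg.pinv(S.T), 0.0, None)
        # Least-squares update of S with C fixed, then clip negatives to zero.
        S = np.clip((np.linalg.pinv(C) @ X).T, 0.0, None)
        err = np.linalg.norm(X - C @ S.T)
        if abs(prev_err - err) < tol:
            break
        prev_err = err
    return C, S

# Synthetic non-negative two-component mixture data.
rng = np.random.default_rng(1)
C_true = rng.random((20, 2))
S_true = rng.random((50, 2))
X = C_true @ S_true.T

C, S = mcr_als_sketch(X, n_components=2, random_state=0)
rel_err = np.linalg.norm(X - C @ S.T) / np.linalg.norm(X)
```

Clipping is the simplest way to impose non-negativity. The recovered profiles are only defined up to scaling and ordering of the components, so they should not be compared with reference profiles without normalization.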
nnls_mixture¶
Non-negative least squares mixture deconvolution.
Fit a non-negative least squares mixture.
Parameters¶
- spectrum: Array of shape (n_points,) representing the mixture spectrum.
- pure_spectra: Array of shape (n_points, n_components) containing pure component spectra as columns.
Returns¶
- coefficients: Non-negative coefficients for each component, shape (n_components,).
- residual_norm: Euclidean norm of the residual.
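The inputs and outputs listed above match the conventions of scipy.optimize.nnls, which solves the same non-negative least squares problem. Whether nnls_mixture wraps that routine is an assumption; the sketch below calls SciPy directly on a noise-free synthetic mixture of two Gaussian bands to show the expected shapes.

```python
import numpy as np
from scipy.optimize import nnls

# Two synthetic pure-component spectra stored as columns.
axis = np.linspace(0.0, 1.0, 100)
pure_spectra = np.column_stack([
    np.exp(-((axis - 0.3) ** 2) / 0.005),
    np.exp(-((axis - 0.7) ** 2) / 0.005),
])

# Noise-free mixture with known fractions.
true_fractions = np.array([0.25, 0.75])
spectrum = pure_spectra @ true_fractions

# Minimize ||pure_spectra @ c - spectrum|| subject to c >= 0.
coefficients, residual_norm = nnls(pure_spectra, spectrum)
```

With real spectra the residual norm is a useful diagnostic: a large value suggests a missing component or a baseline that the pure spectra cannot explain.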
Variable Importance¶
calculate_vip¶
Variable Importance in Projection scores for PLS models.
Calculate Variable Importance in Projection (VIP) scores for a fitted PLS model.
VIP scores indicate how much each variable contributes to the PLS model. As a common rule of thumb, variables with VIP > 1 are considered highly influential, and variables with VIP between 0.8 and 1 are considered moderately important.
Parameters¶
- pls_model (PLSRegression or Pipeline): Fitted PLS regression model. If a Pipeline, the last step must be a PLSRegression.
- X (array-like of shape (n_samples, n_features)): Training data used to fit the model.
- y (array-like of shape (n_samples,) or (n_samples, n_targets)): Training targets used to fit the model.
Returns¶
- vip_scores (ndarray of shape (n_features,)): VIP score for each feature. Scores > 1 indicate high importance.
Raises¶
- ValueError: If the model is not fitted or is not a PLS model.
- TypeError: If pls_model is not a PLSRegression or a Pipeline containing a PLSRegression.
Notes¶
VIP is calculated as:
$$
\mathrm{VIP}_j = \sqrt{\frac{p \sum_{a=1}^{A} SS_a \, w_{aj}^2}{\sum_{a=1}^{A} SS_a}}
$$

where:
- p is the number of variables
- A is the number of PLS components
- SS_a is the sum of squares explained by component a
- w_{aj} is the weight of variable j in component a
Examples¶
    from sklearn.cross_decomposition import PLSRegression
    from foodspec.chemometrics.vip import calculate_vip
    import numpy as np

    # Generate synthetic data
    X = np.random.randn(100, 10)
    y = X[:, 0] + 2 * X[:, 1] + np.random.randn(100) * 0.1

    # Fit PLS model
    pls = PLSRegression(n_components=3)
    pls.fit(X, y)

    # Calculate VIP scores
    vip_scores = calculate_vip(pls, X, y)
    print(f"VIP scores shape: {vip_scores.shape}")
    print(f"Important features (VIP > 1): {np.where(vip_scores > 1)[0]}")
See Also¶
- Chemometrics Methods - Detailed methodology
- PCA Guide - PCA concepts
- Examples - Chemometric workflows