Variable Importance in Projection (VIP) for PLS Models¶
Overview¶
VIP (Variable Importance in Projection) scores quantify which spectral wavenumbers contribute most to the predictions of a PLS or PLS-DA model. Higher VIP scores indicate more discriminative wavenumbers.
What is VIP?¶
VIP is a post-hoc interpretation metric derived from the PLS weight matrix and explained variance. Mathematically:
$$\text{VIP}_j = \sqrt{\frac{p \sum_a \text{SS}_a \cdot w_{ja}^2}{\sum_a \text{SS}_a}}$$
where:
- $p$ = number of wavenumber variables
- $a$ = PLS latent component index
- $\text{SS}_a$ = sum of squares explained by component $a$
- $w_{ja}$ = PLS weight for wavenumber $j$ in component $a$
Interpretation: VIP > 1.0 suggests a wavenumber has above-average importance; VIP < 0.8 suggests below-average importance. Both cutoffs are rules of thumb, not statistical tests.
Using VIP in FoodSpec¶
Extract VIP from PLS-DA Pipeline¶
import numpy as np
from foodspec.chemometrics.models import make_pls_da
# Train PLS-DA model
X = np.random.randn(100, 500) # 100 samples, 500 wavenumbers
y = np.random.randint(0, 3, 100) # 3 classes
pipeline = make_pls_da(n_components=5)
pipeline.fit(X, y)
# Extract VIP scores
pls_projector = pipeline.named_steps["pls_proj"]
vip_scores = pls_projector.get_vip_scores()
print(f"VIP shape: {vip_scores.shape}") # (500,)
print(f"Top 10 VIP indices: {np.argsort(vip_scores)[::-1][:10]}") # column indices, highest VIP first
Interpretation & Visualization¶
import matplotlib.pyplot as plt
wavenumbers = np.linspace(400, 4000, 500) # example grid
# Plot VIP across spectrum
fig, ax = plt.subplots(figsize=(12, 4))
ax.bar(wavenumbers, vip_scores, width=2.0, alpha=0.7)
ax.axhline(y=1.0, color="red", linestyle="--", label="VIP = 1.0 threshold")
ax.axhline(y=0.8, color="orange", linestyle="--", label="VIP = 0.8 threshold")
ax.set_xlabel("Wavenumber (cm$^{-1}$)")
ax.set_ylabel("VIP Score")
ax.legend()
plt.tight_layout()
plt.savefig("pls_da_vip_scores.png")
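Neighbouring wavenumbers tend to cross a threshold together, so it can help to summarize the plot as contiguous bands rather than individual points. The sketch below groups runs of VIP > 1.0 into bands. `contiguous_bands` is a hypothetical helper, not part of FoodSpec, and the VIP values are a synthetic stand-in for real model output.

```python
import numpy as np


def contiguous_bands(mask):
    """Return (start, stop) index pairs, stop exclusive, for each run of True."""
    padded = np.concatenate(([False], mask, [False])).astype(int)
    edges = np.flatnonzero(np.diff(padded))  # alternating run starts and stops
    return [(int(s), int(e)) for s, e in zip(edges[::2], edges[1::2])]


wavenumbers = np.linspace(400, 4000, 500)  # same example grid as above
# Synthetic stand-in for vip_scores: one broad peak near 1600 cm^-1
vip_scores = 0.5 + 1.5 * np.exp(-(((wavenumbers - 1600.0) / 60.0) ** 2))

bands = contiguous_bands(vip_scores > 1.0)
for start, stop in bands:
    lo, hi = wavenumbers[start], wavenumbers[stop - 1]
    peak = vip_scores[start:stop].max()
    print(f"{lo:.0f}-{hi:.0f} cm^-1, peak VIP {peak:.2f}")
```

Each band can then be shaded on the bar plot or passed on for chemical assignment.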
Chemical Interpretation¶
Combine VIP with spectroscopic knowledge:
from foodspec.chemometrics.validation import vip_table_with_meanings
# Get top VIP wavenumbers with chemical meanings
vip_table = vip_table_with_meanings(
vip_scores,
wavenumbers,
top_n=15,
modality="raman",
tolerance=20.0 # ±20 cm^-1 tolerance for band matching
)
print(vip_table)
# Output:
# wavenumber vip meaning
# 0 1602.3 2.145 C=C aromatic stretch
# 1 1450.1 1.892 CH2 bending
# ...
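To make the `tolerance` argument concrete, here is a rough sketch of nearest-band matching. It is not FoodSpec's implementation of `vip_table_with_meanings`: the `match_band` helper and the `REFERENCE_BANDS` dictionary are hypothetical, and the two band centers are placeholders chosen only to line up with the example output above.

```python
import numpy as np

# Placeholder reference bands (illustrative centers, not a real database)
REFERENCE_BANDS = {
    1602.0: "C=C aromatic stretch",
    1450.0: "CH2 bending",
}


def match_band(wavenumber, bands=REFERENCE_BANDS, tolerance=20.0):
    """Return the meaning of the nearest reference band within +/- tolerance."""
    centers = np.array(list(bands))
    nearest = float(centers[np.argmin(np.abs(centers - wavenumber))])
    if abs(nearest - wavenumber) <= tolerance:
        return bands[nearest]
    return "unassigned"


for wn in (1602.3, 1450.1, 1475.0):
    print(wn, match_band(wn))
```

A wavenumber that falls outside the tolerance of every reference band, such as 1475.0 here, is left unassigned rather than forced onto the nearest label.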
Limitations & Assumptions¶
- Scale Sensitivity: VIP depends on feature scaling. Always standardize X before PLS fitting.
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import StandardScaler
    pipeline = Pipeline([
        ("scaler", StandardScaler()),
        ("pls_da", make_pls_da(n_components=5)),
    ])
- Multicollinearity: Highly correlated wavenumbers may have unstable VIP scores. Use regularization.
- Sample Size: With small sample sizes (n < 30), VIP scores are unreliable. Ensure adequate replication.
- Number of Components: VIP depends on n_components. Validate the component count via cross-validation.
    # Use cross-validated component count
    from sklearn.model_selection import cross_val_score
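The cross-validation advice for the number of components can be sketched as follows. This example uses scikit-learn's `PLSRegression` on synthetic data as a stand-in; whether the pipeline returned by `make_pls_da` can be passed to `cross_val_score` in the same way is an assumption to verify, and the candidate range of 1 to 8 components is arbitrary.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import cross_val_score

# Synthetic regression data: three informative variables out of 30
rng = np.random.default_rng(0)
X = rng.normal(size=(80, 30))
y = X[:, :3] @ np.array([1.5, -1.0, 0.5]) + 0.1 * rng.normal(size=80)

# Mean 5-fold cross-validated R^2 for each candidate component count
cv_scores = {}
for n in range(1, 9):
    scores = cross_val_score(PLSRegression(n_components=n), X, y, cv=5, scoring="r2")
    cv_scores[n] = scores.mean()

best_n = max(cv_scores, key=cv_scores.get)
print(f"Best n_components by mean CV R^2: {best_n}")
```

VIP scores should then be extracted from a model refitted with the selected component count, since they change with `n_components`.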
Outputs in Protocol Results¶
VIP scores are automatically saved in protocol runs when using PLS-DA:
foodspec run-exp config.yml --output-dir results/
# Outputs:
# results/tables/vip_scores.csv
# results/figures/vip_barplot.png
# results/interpretation/top_vip_bands.md
See Also¶
- PLS Regression & PLS-DA
- Model Interpretability
- Feature Interpretation (under review)
- Spectroscopic Databases