Mixture Analysis: Quantification via NNLS¶
Level: Intermediate → Advanced
Runtime: <1 second
Key Concepts: Linear spectral unmixing, NNLS, quantification, mixture models
What You Will Learn¶
In this example, you'll learn how to: - Create pure component spectra and synthetic mixtures - Use Non-Negative Least Squares (NNLS) to estimate component fractions - Assess unmixing accuracy by comparing estimated vs. actual fractions - Understand limitations of linear mixture models - Apply this to real-world adulterant detection
After completing this example, you'll understand how to quantify ingredient blends, detect mineral oil adulterations, and validate mixture assumptions.
Prerequisites¶
- Understanding of linear algebra (matrix equations, least squares)
- Familiarity with optimization problems
numpy,scipyinstalled- Basic knowledge of mixture models (optional, we explain)
Optional background: Read Mixture Models
The Problem¶
Real-world scenario: You've measured spectra for two pure oils (olive and sunflower). A customer supplies an unknown "olive oil" that might be adulterated with sunflower oil. Can you: 1. Build a mixture model from pure components? 2. Unmix the unknown to estimate its composition? 3. Assess whether the claimed olive oil is authentic?
Assumption: Spectra combine linearly (i.e., mixture spectrum = fraction₁ × spectrum₁ + fraction₂ × spectrum₂)
Goal: Recover unknown fractions by unmixing.
Step 1: Create Pure Components & Mixtures¶
import numpy as np
from scipy.optimize import nnls
# Create synthetic pure component spectra
np.random.seed(42)
n_wavelengths = 1500
# Pure component 1 (Olive oil): single broad peak
wavelengths = np.linspace(800, 3000, n_wavelengths)
pure_1 = np.exp(-((wavelengths - 1600) / 200) ** 2) + 0.05 * np.random.randn(n_wavelengths)
# Pure component 2 (Sunflower oil): different peak
pure_2 = np.exp(-((wavelengths - 1500) / 150) ** 2) + 0.05 * np.random.randn(n_wavelengths)
# Mixture: 70% olive + 30% sunflower
true_fractions = np.array([0.70, 0.30])
mixture = true_fractions[0] * pure_1 + true_fractions[1] * pure_2
print(f"Pure 1 intensity range: {pure_1.min():.3f} to {pure_1.max():.3f}")
print(f"Pure 2 intensity range: {pure_2.min():.3f} to {pure_2.max():.3f}")
print(f"Mixture composition: {true_fractions[0]*100:.0f}% Pure1 + {true_fractions[1]*100:.0f}% Pure2")
What's happening:
- pure_1 and pure_2: Two reference spectra with different spectral features
- mixture: Linear combination of pure components with known fractions
- Noise is added to make it realistic
Step 2: Unmix with NNLS¶
# Set up linear system: A @ fractions = mixture
# where A contains pure components as columns
A = np.column_stack([pure_1, pure_2])
# Solve with Non-Negative Least Squares
estimated_fractions, residual = nnls(A, mixture)
# Normalize (fractions should sum to ~1.0)
estimated_fractions = estimated_fractions / estimated_fractions.sum()
print(f"True fractions: {true_fractions}")
print(f"Estimated fractions: {estimated_fractions}")
print(f"Residual (RMSE): {np.sqrt(residual):.6f}")
Interpretation: - NNLS output: Estimated fractions ≥ 0 (enforces physical constraint: can't have negative amounts) - Residual: How well the linear model fits (lower = better) - Normalized: Fractions sum to 1.0 for direct interpretation
Step 3: Assess Accuracy¶
# Compare true vs. estimated
error = np.abs(estimated_fractions - true_fractions)
mae = error.mean()
rmse = np.sqrt((error ** 2).mean())
print(f"\nAccuracy Assessment:")
print(f" Component 1 - True: {true_fractions[0]:.3f}, Estimated: {estimated_fractions[0]:.3f}, Error: {error[0]:.3f}")
print(f" Component 2 - True: {true_fractions[1]:.3f}, Estimated: {estimated_fractions[1]:.3f}, Error: {error[1]:.3f}")
print(f" Mean Absolute Error: {mae:.4f}")
print(f" Root Mean Sq Error: {rmse:.4f}")
# Practical decision: Is adulterant level acceptable?
adulterant_level = estimated_fractions[1] * 100
threshold = 5.0 # tolerance: up to 5% sunflower oil is acceptable
if adulterant_level > threshold:
print(f"\n⚠️ WARNING: Adulterant detected! ({adulterant_level:.1f}% > {threshold}% threshold)")
else:
print(f"\n✓ PASS: Adulterant level acceptable ({adulterant_level:.1f}%)")
What's happening: - We compare estimated vs. true fractions (possible because we created synthetic data) - Mean Absolute Error (MAE) summarizes overall accuracy - Real scenario: Compare against known adulterant thresholds (regulatory limits)
Step 4: Visualize Pure Components & Unmixing¶
import matplotlib.pyplot as plt
fig, axes = plt.subplots(2, 2, figsize=(12, 8))
# Plot 1: Pure components
ax = axes[0, 0]
ax.plot(wavelengths, pure_1, label="Pure Component 1 (Olive)", linewidth=2)
ax.plot(wavelengths, pure_2, label="Pure Component 2 (Sunflower)", linewidth=2)
ax.set_xlabel("Wavenumber (cm⁻¹)")
ax.set_ylabel("Intensity")
ax.set_title("Pure Component Spectra")
ax.legend()
ax.grid(True, alpha=0.3)
# Plot 2: Mixture spectrum
ax = axes[0, 1]
ax.plot(wavelengths, mixture, label="Unknown Mixture", linewidth=2, color="red")
ax.set_xlabel("Wavenumber (cm⁻¹)")
ax.set_ylabel("Intensity")
ax.set_title(f"Mixture Spectrum (True: 70% Comp1, 30% Comp2)")
ax.legend()
ax.grid(True, alpha=0.3)
# Plot 3: Composition comparison
ax = axes[1, 0]
x = np.arange(2)
width = 0.35
ax.bar(x - width/2, true_fractions, width, label="True", alpha=0.8)
ax.bar(x + width/2, estimated_fractions, width, label="Estimated", alpha=0.8)
ax.set_ylabel("Fraction")
ax.set_title("Composition: True vs. Estimated")
ax.set_xticks(x)
ax.set_xticklabels(["Component 1", "Component 2"])
ax.legend()
ax.set_ylim([0, 1])
ax.grid(True, alpha=0.3, axis="y")
# Plot 4: Residual (fit quality)
reconstructed = A @ estimated_fractions
residuals = mixture - reconstructed
ax = axes[1, 1]
ax.plot(wavelengths, residuals, label="Unmixing Residual", linewidth=1, color="gray")
ax.axhline(0, color="black", linestyle="--", alpha=0.5)
ax.fill_between(wavelengths, residuals, 0, alpha=0.3)
ax.set_xlabel("Wavenumber (cm⁻¹)")
ax.set_ylabel("Residual Intensity")
ax.set_title(f"Unmixing Error (RMSE={np.sqrt(residual):.4f})")
ax.legend()
ax.grid(True, alpha=0.3)
plt.tight_layout()
plt.savefig("mixture_analysis_unmixing.png", dpi=150, bbox_inches="tight")
plt.show()
Figure interpretation: - Top-left: Pure components have distinct spectral features - Top-right: Mixture shows combination of both features - Bottom-left: Bar chart compares true vs. estimated fractions (should match closely) - Bottom-right: Residual shows unmixing error (should be small noise)
Full Working Script¶
See the production script with multiple synthetic mixtures and detailed accuracy assessment:
📄 examples/mixture_analysis_quickstart.py – Full working code (55 lines)
Key Takeaways¶
✅ Linear mixture model: Assumes spectra combine additively
✅ NNLS solver: Enforces non-negativity (physically realistic)
✅ Quantification workflow: Measure pure components → Create standard → Unmix unknown
✅ Validation: Compare estimated vs. known fractions to assess accuracy
Assumptions & Limitations¶
⚠️ Linear mixing: Works for Raman/IR, not always for fluorescence or imaging ⚠️ Known pure components: Must measure or obtain reference spectra ⚠️ Spectral stability: Component spectra shouldn't change with concentration ⚠️ No chemical reactions: Binary mixtures are simpler than complex food matrices
Real-World Applications¶
- 🫒 Oil adulterant detection: Quantify mineral oil, seed oil additions
- 🍯 Honey verification: Estimate corn syrup, high fructose content
- 🧈 Butter authenticity: Detect margarine or vegetable oil blends
- 🧂 Salt purity: Measure mineral contaminants
- 🍶 Alcohol purity: Quantify water and flavor additives
Advanced Topics¶
Want to go deeper? - Multiple components: Unmix > 2 components simultaneously - Constraints: Add inequality bounds on fractions - Regularization: Use Ridge or Lasso regression for ill-posed problems - Non-linear unmixing: Handle concentration-dependent spectral shifts
See Mixture Models for complete details.
Next Steps¶
- Try it: Use actual oils or other food samples as pure components
- Explore: Add measurement noise and assess robustness
- Learn more: Read Mixture Models
- Advance: Combine with Oil Authentication for classification + quantification
Interactive Notebook¶
For step-by-step exploration with multiple components:
📓 examples/tutorials/03_mixture_analysis_teaching.ipynb
Figure provenance¶
- Generated by scripts/generate_docs_figures.py
- Output: ../assets/figures/cv_boxplot.png