Documentation Style Guide¶
Purpose: Maintain consistent, high-quality documentation across the FoodSpec project.
Audience: Contributors writing or updating documentation (tutorials, workflows, API docs, etc.).
Quick Checklist (Definition of Done)¶
Before submitting any new documentation page, verify:
- [ ] Headings follow hierarchy (H1 โ H2 โ H3, no skipping)
- [ ] All code blocks have language tags (
python,bash, etc.) - [ ] All links are relative (no absolute URLs to internal docs)
- [ ] All images have descriptive alt text
- [ ] At least one working code example (if applicable)
- [ ] Cross-references to related pages
- [ ] Runs
mkdocs buildwithout errors - [ ] Passes
scripts/check_docs_links.py(no broken links) - [ ] Passes
markdownlint(if configured) - [ ] Added to
mkdocs.ymlnavigation - [ ] Reviewed for spelling/grammar (use spell checker)
1. Writing Style & Tone¶
General Principles¶
โ DO: - Write in active voice: "Run the command" (not "The command should be run") - Be concise: Remove filler words ("just", "simply", "basically") - Use present tense: "FoodSpec processes spectra" (not "will process") - Address the reader as "you": "You can use..." (not "One can use...") - Explain why, not just what: "Use ALS baseline correction because it preserves sharp peaks"
โ DON'T: - Use jargon without defining it (define technical terms on first use) - Write walls of text (break into sections, lists, or tables) - Assume prior knowledge (link to background docs for complex concepts) - Use emojis excessively (sparingly OK for emphasis: โ โ ๐)
Example: Before/After¶
โ BAD (Passive, wordy):
"The baseline correction functionality can be applied to the spectra by using the baseline_als() function, which should be called with appropriate parameters."
โ GOOD (Active, concise):
"Correct the baseline using
baseline_als(). Setlam=1e6for FTIR orlam=1e5for Raman."
2. Document Structure¶
2.1 Headings¶
Rules:
1. One H1 per page (# Title) โ matches the page title
2. Hierarchical nesting: H1 โ H2 โ H3 (never skip levels: H1 โ H3 โ)
3. Sentence case: "How to preprocess spectra" (not "How To Preprocess Spectra")
4. No trailing periods: ## Installation (not ## Installation.)
5. Descriptive, not generic: ## Load FTIR Spectra from CSV (not ## Step 1)
Example Hierarchy:
# Oil Authentication Workflow โ H1 (once per page)
## Overview โ H2
Brief description...
## Step 1: Load Data โ H2
Instructions...
### CSV Format โ H3 (nested under H2)
Details...
### HDF5 Format โ H3
Details...
## Step 2: Preprocess โ H2 (back to H2 level)
...
2.2 Page Template¶
Every tutorial/workflow should follow this structure:
# [Title]
**Purpose:** One-sentence summary (e.g., "Authenticate olive oils using FTIR spectroscopy").
**Audience:** Who is this for? (e.g., "Food scientists with basic Python knowledge").
**Time:** Estimated completion time (e.g., "20 minutes").
**Prerequisites:** Required knowledge/tools (e.g., "Install FoodSpec, have CSV data").
---
## Overview
- What you'll learn
- What problem this solves
- Key concepts
## Step 1: [Action]
Instructions...
## Step 2: [Action]
Instructions...
## Complete Code
Full working example (copy-paste ready)
## Expected Output
What the user should see
## Troubleshooting
Common issues and fixes
## Next Steps
Links to related tutorials/workflows
---
**Related Pages:**
- Link 1: path/to/page.md
- Link 2: path/to/page.md
3. Code Blocks¶
3.1 Language Tags (REQUIRED)¶
Always specify the language:
โ GOOD:
```python
from foodspec import SpectralDataset
```
โ BAD (missing language tag):
```python
from foodspec import SpectralDataset
```
Common language tags:
- python โ Python code
- bash โ Shell commands (Linux/macOS)
- console โ Terminal output (use $ prefix for commands)
- yaml โ YAML configuration files
- json โ JSON data
- text โ Plain text (logs, output)
- markdown โ Markdown examples
3.2 Python Code Rules¶
1. Imports at the top:
# โ
GOOD
from foodspec import SpectralDataset
from sklearn.ensemble import RandomForestClassifier
import numpy as np
ds = SpectralDataset.from_csv('data.csv')
# โ BAD (imports scattered)
from foodspec import SpectralDataset
ds = SpectralDataset.from_csv('data.csv')
from sklearn.ensemble import RandomForestClassifier # โ Import later โ
2. Use print() for outputs:
# โ
GOOD (shows what user will see)
accuracy = 0.95
print(f"Accuracy: {accuracy:.3f}")
# Output: Accuracy: 0.950
3. Include comments for non-obvious steps:
# โ
GOOD
# ALS removes baseline drift; lambda=1e6 for smooth baseline
X_corrected = baseline_als(X, lam=1e6, p=0.01)
4. Keep code blocks short (<30 lines): - If longer, break into multiple blocks with explanatory text between them - Or provide a "Complete Code" section at the end
3.3 Shell Command Rules¶
Use $ prefix for commands, no prefix for output:
# โ
GOOD
$ foodspec --version
FoodSpec 1.0.0
$ ls data/
oils.csv spectra.h5
For multi-line commands, use backslash:
$ foodspec-run-protocol \
--input oils.csv \
--protocol oil_authentication \
--output-dir runs/demo
3.4 Expected Output¶
Always show what the user should expect:
from foodspec import SpectralDataset
ds = SpectralDataset.from_csv('oils.csv')
print(f"Loaded {len(ds)} spectra")
# Expected output:
# Loaded 30 spectra
4. Links¶
4.1 Internal Links (Relative Paths)¶
โ ALWAYS use relative paths:
[Preprocessing Guide](../methods/preprocessing/normalization_smoothing.md)
โ NEVER use absolute URLs:
<!-- โ BAD -->
[Preprocessing Guide](../methods/preprocessing/normalization_smoothing.md)
Path Rules:
- From docs/tutorials/beginner/01-load-and-plot.md to docs/methods/preprocessing/normalization_smoothing.md:
[Preprocessing](../methods/preprocessing/normalization_smoothing.md)
[Home](../index.md)
- Child directory: tutorials/oil_auth.md (example)
4.2 Anchor Links¶
Link to specific sections using #heading-slug:
See [Installation โ Python Requirements](../getting-started/installation.md#python-requirements)
Heading slug rules:
- Lowercase
- Replace spaces with hyphens
- Remove special characters
- Example: ## Python 3.8+ Requirements โ #python-38-requirements
4.3 External Links¶
Format:
[Link text](https://example.com)
For long URLs, use reference-style:
See the [NumPy documentation][numpy-docs] for details.
[numpy-docs]: https://numpy.org/doc/stable/
4.4 Code References¶
Link to specific functions in API docs:
Use [`baseline_als()`](../api/preprocessing.md#baseline_als) to correct the baseline.
Inline code (backticks):
The `SpectralDataset` class provides...
5. Images & Figures¶
5.1 Naming Convention¶
Format: <section>_<description>_<optional-detail>.png
Examples:
- preprocessing_baseline_comparison.png โ Shows before/after baseline correction
- workflow_oil_auth_pipeline.png โ Oil authentication workflow diagram
- tutorial_pca_scores_plot.png โ PCA scores plot from tutorial
Rules:
- Lowercase with underscores
- Descriptive (not image1.png โ)
- Use .png for screenshots/plots, .svg for diagrams (vector graphics)
5.2 Storage Location¶
Store images in:
docs/assets/images/<section>/
Examples:
- docs/assets/images/tutorials/oil_auth_result.png
- docs/assets/images/workflows/heating_quality_flowchart.svg
- docs/assets/images/preprocessing/baseline_als_demo.png
5.3 Alt Text (REQUIRED for Accessibility)¶
โ GOOD (descriptive alt text):

โ BAD (generic alt text - example only):
<!-- Example of bad alt text - DO NOT USE; placeholder commented to avoid broken link checks -->
<!--  -->
Alt text rules: - Describe what the image shows, not what it is - ~10-20 words - No need for "Image of..." (screen readers add this)
5.4 Image Size¶
Guidelines:
- Screenshots: 800-1200px wide (readable but not huge)
- Plots: 600-800px wide (matplotlib: figsize=(8, 6))
- Diagrams: SVG (scalable) or PNG at 1000px wide
Optimize file size:
# Install optimizer
$ pip install pillow
# Optimize PNG (reduce file size without quality loss)
$ python -m PIL.Image input.png --optimize --quality=85 output.png
5.5 Captions¶
Use italics for captions:

*Figure 1: PCA scores plot shows clear separation between EVOO (blue) and lampante (red) olive oils.*
(Note: Example images in this style guide are for illustration only and may not exist)
6. Lists & Tables¶
6.1 Unordered Lists¶
Use - (hyphen) consistently:
- Item 1
- Item 2
- Sub-item 2.1 (indent with 2 spaces)
- Sub-item 2.2
- Item 3
Not * or + (inconsistent with existing docs).
6.2 Ordered Lists¶
Use 1. for all items (auto-numbered):
1. First step
1. Second step
1. Third step
(Markdown auto-increments: 1, 2, 3...)
6.3 Tables¶
Always include header row:
| Column 1 | Column 2 | Column 3 |
|----------|----------|----------|
| Value A | Value B | Value C |
| Value D | Value E | Value F |
Alignment:
- Left: |----------|
- Center: |:--------:|
- Right: |---------:|
Example:
| Method | Accuracy | Speed |
|--------|:--------:|------:|
| ALS | 95% | 12 ms |
| Poly | 92% | 3 ms |
7. Admonitions (Callouts)¶
Use Material for MkDocs admonitions for emphasis:
!!! note "Optional Title"
This is a note. Use for supplementary information.
!!! warning
This is a warning. Use for potential pitfalls.
!!! tip
This is a tip. Use for helpful advice.
!!! danger
This is a danger box. Use for critical warnings (data loss, etc.).
!!! info
This is an info box. Use for general information.
!!! example
This is an example box. Use for worked examples.
!!! quote
This is a quote box. Use for citations or quotes.
Example Usage:
!!! warning "Data Leakage Risk"
Always use `GroupKFold` to prevent replicate leakage. See [Validation Guide](../methods/validation/cross_validation_and_leakage.md).
!!! tip
Use `baseline_als(lam=1e6)` for FTIR and `lam=1e5` for Raman spectra.
8. How to Add a New Workflow¶
Step-by-step process:
1. Create the markdown file¶
Location: docs/workflows/<workflow_name>.md
Example: docs/workflows/olive_oil_authentication.md
Template:
# [Workflow Name]
**Purpose:** One-sentence summary.
**Use case:** Who needs this? What problem does it solve?
**Time:** 20-30 minutes
**Prerequisites:**
- FoodSpec installed
- Basic Python knowledge
- CSV data with labels
---
## Overview
Brief description of the workflow...
## Step 1: Load and Inspect Data
...
## Step 2: Preprocess
...
## Step 3: Train Model
...
## Step 4: Validate
...
## Complete Code
Full working example...
## Expected Output
What the user should see...
## Troubleshooting
Common issues...
## Next Steps
- Related Workflow 1 (example)
- Related Workflow 2 (example)
---
**Related Pages:**
- [Preprocessing Guide](../methods/preprocessing/normalization_smoothing.md)
- [Cross-Validation & Leakage](../methods/validation/cross_validation_and_leakage.md)
2. Add to mkdocs.yml navigation¶
Open: mkdocs.yml
Add under Applications & Workflows section:
# Level 6: Applications & Workflows (Domain Expertise)
- Applications & Workflows:
- Workflows Overview: workflows/index.md
- Authentication & Identification:
- Oil Authentication: workflows/oil_authentication.md
- [NEW WORKFLOW]: workflows/authentication/oil_authentication.md # โ Add here
3. Add to workflows index¶
Open: docs/workflows/index.md
Add to the workflow list:
## Available Workflows
| Workflow | Use Case | Difficulty | Time |
|----------|----------|------------|------|
| Olive Oil Authentication (authentication/oil_authentication.md) | Classify oil varieties | Beginner | 20 min |
4. Test locally¶
# Build docs
$ mkdocs build
# Check for broken links
$ python scripts/check_docs_links.py
# Serve locally (preview at http://localhost:8000)
$ mkdocs serve
5. Create example data (if needed)¶
If your workflow needs sample data:
- Create CSV in
examples/data/<workflow_name>_demo.csv - Document the format in the workflow page
- Provide download link or generation script
Example:
# Generate demo data
import numpy as np
import pandas as pd
np.random.seed(42)
wavenumbers = np.linspace(1000, 3000, 150)
X = np.random.normal(5, 0.3, (30, 150))
y = ['EVOO'] * 15 + ['Lampante'] * 15
df = pd.DataFrame(X, columns=[f'{w:.1f}' for w in wavenumbers])
df.insert(0, 'label', y)
df.to_csv('examples/data/olive_oil_demo.csv', index=False)
9. How to Add a New Tutorial¶
Similar to workflows, but more educational (step-by-step learning).
1. Choose the difficulty level¶
- Level 1 (Beginner): 5-15 minutes, single concept
- Level 2 (Applied): 20-40 minutes, real-world example
- Level 3 (Advanced): 45-90 minutes, complex integration
2. Create the file¶
Location: docs/tutorials/<category>/<name>.md
Example: docs/tutorials/beginner/01-load-and-plot.md
3. Follow the tutorial template¶
# [Tutorial Name]
**Difficulty:** Level 1 (Beginner) / Level 2 (Applied) / Level 3 (Advanced)
**Time:** 10 minutes
**Learning objectives:**
- Learn X
- Understand Y
- Practice Z
**Prerequisites:**
- FoodSpec installed
- (Optional) Jupyter notebook
---
## Background
Brief theory (1-2 paragraphs)...
## Hands-On Exercise
### Step 1: [Action]
Instructions...
Code here¶
**Expected output:**
### Step 2: [Action]
...
## Summary
What you learned...
## Practice Exercises
1. Try modifying X to see Y
2. Apply this to your own data
## Next Tutorial
[Next Level Tutorial](../tutorials/intermediate/01-oil-authentication.md)
---
**Related Pages:**
- [Theory](../theory/spectroscopy_basics.md)
- [API Reference](../api/index.md)
4. Add to mkdocs.yml¶
Add under appropriate level:
# Level 7: Tutorials & Learning (Step-by-Step)
- Tutorials:
- Tutorial Ladder: 02-tutorials/index.md
- Level 1 - Beginner (5-15 min):
- Baseline Correction: 02-tutorials/level1_baseline_and_smoothing.md # โ Add here
```yaml
---
### 5. Update tutorial index
**Open:** `docs/02-tutorials/index.md`
**Add to the ladder:**
Level 1: Beginner (5โ15 min)¶
| Tutorial | What You'll Learn | Time |
|---|---|---|
| Baseline Correction | Remove baseline drift using ALS | 10 min |
|
||
| # Main Title (H1) | ||
| ### Subsection (H3) โ โ Skipped H2! | ||
|
||
| Guide โ โ | ||
|
||
| ```` |
Fix: Add python tag.
โ Pitfall 4: Untested code examples¶
```python
โ This code doesn't actually work!¶
ds = SpectralDataset.from_csv('nonexistent_file.csv') ```
Fix: Test all code examples before publishing.
โ Pitfall 5: No expected output¶
```python print(f"Accuracy: {accuracy:.3f}")
โ User doesn't know what to expect¶
```
Fix: Add comment: ```python print(f"Accuracy: {accuracy:.3f}")
Expected output: Accuracy: 0.950¶
```
12. Documentation Linting & Validation¶
Automated checks to maintain quality.
12.1 Markdownlint¶
Install:
bash
$ npm install -g markdownlint-cli
Run:
bash
$ markdownlint docs/**/*.md
Configuration: See .markdownlint.json (if configured).
12.2 Link Checker¶
Script: scripts/check_docs_links.py
Run:
bash
$ python scripts/check_docs_links.py
What it checks: - Broken internal links (missing files) - Invalid anchor links (non-existent headings) - Broken external URLs (optional, slower)
12.3 MkDocs Build¶
Always run before submitting:
bash
$ mkdocs build
Look for:
- ERROR lines (broken links, missing files)
- WARNING lines (investigate and fix if critical)
13. Docstring Standards (FoodSpec API)¶
Standard: Google style docstrings for all public APIs (modules, classes, functions, methods).
Required sections (in this order):
- Args: Each parameter with type, shape, and units if applicable (e.g., np.ndarray, shape (n_samples, n_wavenumbers), units: a.u.)
- Returns: Type (and shape/units if array-like)
- Raises: Explicit exceptions and conditions
- Notes: "When to use / when not to use" for important functions; mention performance or data-quality caveats
- Examples: One minimal runnable snippet (keep <15 lines)
Conventions:
- Prefer precise types (np.ndarray, Sequence[float], Mapping[str, Any]), not "array-like" unless truly generic
- Document expected dtypes and units for spectral axes (e.g., wavenumber cmโปยน, wavelength nm)
- For batch inputs, include axis meaning (e.g., axis 0 = samples)
- If randomness is involved, mention random_state handling
- If heavy I/O or memory use is expected, note it in Notes
Example docstring (representative of foodspec.metrics.signal_to_noise_ratio)
```python def signal_to_noise_ratio(spectra: np.ndarray, baseline: float = 0.0) -> float: """Compute mean signal-to-noise ratio for a batch of spectra.
Args:
spectra (np.ndarray): Array of spectra, shape (n_samples, n_wavenumbers),
units: arbitrary intensity. Axis 0 = samples, axis 1 = spectral points.
baseline (float): Baseline offset to subtract before SNR estimation, units: intensity.
Returns:
float: Mean signal-to-noise ratio across samples (unitless).
Raises:
ValueError: If `spectra.ndim != 2` or contains NaN/inf values.
Notes:
When to use: quick QC on raw spectra prior to preprocessing.
When not to use: already normalized/denoised dataโprefer post-QC metrics in `foodspec.metrics`.
Examples:
>>> import numpy as np
>>> rng = np.random.default_rng(0)
>>> spectra = 1.0 + 0.02 * rng.standard_normal((3, 5)) # mean signal ~1, noise ~0.02
>>> round(signal_to_noise_ratio(spectra), 1) >= 30
True
"""
...
```
14. Maintainer Instructions¶
For maintainers reviewing documentation PRs:
Pre-Merge Checklist¶
-
Automated checks pass:
bash $ mkdocs build # No errors $ python scripts/check_docs_links.py # No broken links $ markdownlint docs/**/*.md # (If configured) -
Manual review:
- [ ] Code examples tested (copy-paste and run)
- [ ] Screenshots/images are clear and appropriately sized
- [ ] Tone is consistent with existing docs
-
[ ] No duplicate content (link to existing docs instead)
-
Test locally:
bash $ mkdocs serve # Open http://localhost:8000 # Navigate to the new page # Check rendering, links, images -
Check mobile rendering:
- Resize browser to mobile width
- Verify tables don't overflow
- Ensure images scale properly
Post-Merge Tasks¶
- Verify deployment:
- Wait for GitHub Actions to deploy
-
Check live site: https://chandrasekarnarayana.github.io/foodspec/
-
Update index pages:
- If new tutorial โ update
docs/02-tutorials/index.md -
If new workflow โ update
docs/workflows/index.md -
Announce in release notes:
- Add to
CHANGELOG.mdunder "Documentation" - Mention in next release notes
15. Tools & Resources¶
Recommended Tools¶
| Tool | Purpose | Link |
|---|---|---|
| VS Code | Markdown editor with preview | code.visualstudio.com |
| Markdown All in One | VS Code extension (auto-formatting, TOC) | Marketplace |
| markdownlint | Markdown linter (CLI or VS Code extension) | GitHub |
| MkDocs Material | Theme documentation | squidfunk.github.io/mkdocs-material |
| Grammarly | Spell/grammar checker | grammarly.com |
Style References¶
- Microsoft Writing Style Guide: docs.microsoft.com/style-guide
- Google Developer Documentation Style Guide: developers.google.com/style
- Markdown Guide: markdownguide.org
16. Examples of Good Documentation¶
Internal examples (FoodSpec): - Oil Authentication Workflow โ Clear structure, working code - Troubleshooting Guide โ Problem โ Solution format - Validation & Leakage โ Theory + practice
External examples (inspiration): - scikit-learn: scikit-learn.org/stable/user_guide.html - FastAPI: fastapi.tiangolo.com - NumPy: numpy.org/doc/stable/user/
17. FAQ (Meta)¶
Q: How long should a tutorial be?
A: Level 1: 5-15 min, Level 2: 20-40 min, Level 3: 45-90 min. Break long tutorials into multiple parts.
Q: Should I include theory or just code?
A: Balance both. Start with brief theory (1-2 paragraphs), then code. Link to detailed theory in "Theory & Background" section.
Q: How many code examples per page?
A: At least 1-2 working examples. More is better, but keep each block <30 lines.
Q: Can I reuse content from other pages?
A: No. Link to the existing page instead. Reduces maintenance burden.
Q: What if I find outdated documentation?
A: Open a GitHub issue or submit a PR to update it.
18. Contact & Questions¶
Need help with documentation?
- Ask in GitHub Discussions: foodspec/discussions
- Open an issue: foodspec/issues (label: documentation)
- Tag maintainers: @chandrasekarnarayana (for urgent issues)
Last updated: December 28, 2024 | Version: 1.0.0