Section Length
Rule ID: content-section-length
Warn about markdown sections longer than ~500 tokens
| Severity | info (auto) |
| Autofix | llm |
| Since | v0.7.0 |
Configuration
| Parameter | Description | Default |
|---|---|---|
| max-tokens | Maximum estimated tokens per section before triggering a warning | 500 |
Research Basis
Warns about markdown sections exceeding ~500 estimated tokens.
Long monolithic text blocks degrade both human readability and LLM attention. The lost-in-the-middle effect operates within sections: the longer a contiguous block of text, the worse recall becomes for information in its interior. Breaking content into smaller sections with headings creates natural retrieval anchors.
The ~500 token threshold aligns with RAG chunking research. Pinecone's chunking
guide recommends ~512 tokens as the standard baseline for optimal retrieval and
comprehension. The threshold is configurable via the max-tokens parameter.
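As a rough illustration of the check described above, the following sketch splits a markdown document at its headings, estimates the tokens in each section, and reports the sections that exceed the limit. It is not the rule's actual implementation: the function names, the `(preamble)` label, and the 4-characters-per-token estimate are all assumptions made for this example, and the real estimator may count differently. Only the `max_tokens` argument mirrors the documented max-tokens parameter and its default of 500.

```python
import re

# Assumption: roughly 4 characters per token, a common rule of thumb for
# English text. The linter's real estimator may differ.
CHARS_PER_TOKEN = 4

# ATX-style headings: one to six '#' characters followed by whitespace.
HEADING = re.compile(r"^#{1,6}\s+(.*)$")


def estimate_tokens(text: str) -> int:
    """Rough token estimate from character count (hypothetical heuristic)."""
    return len(text) // CHARS_PER_TOKEN


def split_sections(markdown: str) -> list[tuple[str, str]]:
    """Split markdown into (heading, body) pairs at ATX headings."""
    sections = []
    title, body = "(preamble)", []
    in_fence = False
    for line in markdown.splitlines():
        # Track fenced code blocks so '# comment' lines inside code
        # are not mistaken for headings.
        if line.lstrip().startswith(("```", "~~~")):
            in_fence = not in_fence
        match = None if in_fence else HEADING.match(line)
        if match:
            sections.append((title, "\n".join(body)))
            title, body = match.group(1).strip(), []
        else:
            body.append(line)
    sections.append((title, "\n".join(body)))
    # Drop sections with no content, such as an empty preamble.
    return [(t, b) for t, b in sections if b.strip()]


def long_sections(markdown: str, max_tokens: int = 500) -> list[tuple[str, int]]:
    """Return (heading, estimated_tokens) for sections over max_tokens."""
    results = []
    for title, body in split_sections(markdown):
        tokens = estimate_tokens(body)
        if tokens > max_tokens:
            results.append((title, tokens))
    return results
```

Calling `long_sections(text)` on a document returns only the headings whose bodies exceed the limit. Passing a larger `max_tokens` loosens the check in the same way that raising the configured threshold would.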
References:
- Liu et al., Lost in the Middle — Attention degrades within long contiguous blocks
- Chroma, Context Rot — Attention dilution is quadratic in token count
- Pinecone: Chunking Strategies for LLM Applications — 512 tokens as standard chunking baseline
- Miller, G. A. (1956), The Magical Number Seven, Plus or Minus Two — Working memory limits and the value of chunking