Section Length

Rule ID: content-section-length

Warn about markdown sections longer than ~500 estimated tokens.


| Property | Value |
| --- | --- |
| Severity | info (auto) |
| Autofix | llm |
| Since | v0.7.0 |

Configuration

| Parameter | Description | Default |
| --- | --- | --- |
| `max-tokens` | Maximum estimated tokens per section before triggering a warning | `500` |

Research Basis

Warns about markdown sections exceeding ~500 estimated tokens.

Long monolithic text blocks degrade both human readability and LLM attention. The lost-in-the-middle effect operates within sections: the longer a contiguous block of text, the worse recall becomes for information in its interior. Breaking content into smaller sections with headings creates natural retrieval anchors.

The ~500 token threshold aligns with RAG chunking research: Pinecone's chunking guide recommends ~512 tokens as a standard baseline chunk size for retrieval and comprehension. The threshold is configurable via the `max-tokens` parameter.
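To make the check concrete, here is a minimal Python sketch of how a per-section token limit could be applied. It is not the rule's actual implementation: the function names, the split on ATX (`#`) headings, and the estimate of roughly 4 characters per token are all assumptions made for illustration. The `max_tokens` argument plays the role of the `max-tokens` parameter.

```python
import re

# Assumption: ~4 characters per token, a common rule of thumb for English text.
# The estimator the rule really uses is not described on this page.
CHARS_PER_TOKEN = 4

HEADING_RE = re.compile(r"^#{1,6}\s+(.*)$")
FENCE = "`" * 3  # marker that opens or closes a fenced code block


def estimate_tokens(text: str) -> int:
    """Return a rough token estimate based on character count (rounded up)."""
    return (len(text) + CHARS_PER_TOKEN - 1) // CHARS_PER_TOKEN


def split_sections(markdown: str) -> list[tuple[str, str]]:
    """Split markdown into (heading, body) pairs at ATX headings.

    Lines inside fenced code blocks are never treated as headings.
    Text before the first heading is reported under "(preamble)".
    """
    sections = []
    heading, body = "(preamble)", []
    in_fence = False
    for line in markdown.splitlines():
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence
        match = None if in_fence else HEADING_RE.match(line)
        if match:
            sections.append((heading, "\n".join(body)))
            heading, body = match.group(1).strip(), []
        else:
            body.append(line)
    sections.append((heading, "\n".join(body)))
    return sections


def find_long_sections(markdown: str, max_tokens: int = 500) -> list[tuple[str, int]]:
    """Return (heading, estimated_tokens) for each section over max_tokens."""
    return [
        (heading, tokens)
        for heading, body in split_sections(markdown)
        if (tokens := estimate_tokens(body)) > max_tokens
    ]
```

Calling `find_long_sections(text)` returns one entry per section over the limit, and passing a larger `max_tokens` relaxes the check. A character-based estimate is cheap and needs no tokenizer, at the cost of drifting from real token counts on code-heavy or non-English text.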

References:

Liu et al., Lost in the Middle — Attention degrades within long contiguous blocks
Chroma, Context Rot — Attention dilution is quadratic in token count
Pinecone: Chunking Strategies for LLM Applications — 512 tokens as standard chunking baseline
Miller, G. A. (1956), The Magical Number Seven, Plus or Minus Two — Working memory limits and the value of chunking