
content-section-length

Warn about markdown sections longer than ~500 tokens

- Severity: info (auto)
- Autofix: none
- Since: v0.7.0
- Category: Content Intelligence

Why

Long, unbroken sections make it hard for a model to keep a single topic in focus. Once a section runs past ~500 tokens, instructions near its start and end tend to get the most attention, and instructions buried in the middle are the most likely to be overlooked.

Examples

Bad:

A single `## Setup` section that spans 200 lines and covers environment, dependencies, database, Docker, and CI configuration.

Good:

```markdown
## Environment setup
...

## Database setup
...

## Docker
...
```

How to fix

Split long sections into focused subsections, each under its own heading one level deeper than the parent. Aim for roughly 10–30 lines per subsection. The rule has no autofix, but a coding agent can insert the headings for you.
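The mechanical half of this fix (packing paragraphs into chunks that stay within the 30-line upper bound and nesting each chunk one heading level deeper) can be sketched in Python. This is an illustrative sketch, not skillsaw's implementation: the function `split_section` and the placeholder titles `Part N (rename me)` are hypothetical, and picking meaningful titles and better split points is still up to you or a coding agent. It assumes ATX (`#`-style) headings and treats any blank line as a paragraph break, so a fenced code block containing blank lines may be split.

```python
import re

# ATX heading such as "## Setup"; group 1 is the run of '#' characters.
HEADING = re.compile(r"^(#{1,6})\s+\S")


def split_section(section: str, max_lines: int = 30) -> str:
    """Rewrite one long markdown section as nested subsections (hypothetical helper).

    Whole paragraphs are packed into chunks of at most max_lines lines
    (counting the blank lines between them); a single paragraph longer than
    that becomes its own chunk. Each chunk gets a placeholder heading one
    level below the parent, capped at level 6.
    """
    lines = section.splitlines()
    match = HEADING.match(lines[0]) if lines else None
    if match is None:
        raise ValueError("section must start with an ATX heading")
    child = "#" * min(len(match.group(1)) + 1, 6)

    # Paragraphs are runs of lines separated by one or more blank lines.
    body = "\n".join(lines[1:]).strip("\n")
    paragraphs = [p.splitlines() for p in re.split(r"\n\s*\n", body) if p.strip()]

    chunks: list[list[str]] = []
    for para in paragraphs:
        # +1 for the blank line that separates this paragraph from the previous one.
        if chunks and len(chunks[-1]) + 1 + len(para) <= max_lines:
            chunks[-1] += [""] + para
        else:
            chunks.append(list(para))

    # Keep the parent heading and nest each chunk under a placeholder child heading.
    out = [lines[0]]
    for i, chunk in enumerate(chunks, start=1):
        out += ["", f"{child} Part {i} (rename me)", ""] + chunk
    return "\n".join(out) + "\n"
```

Packing whole paragraphs keeps related sentences together; after running it on a section like the Bad example's `## Setup`, the remaining manual step is renaming the `###` placeholders (for example, to Environment, Dependencies, Database, Docker, and CI) and moving paragraphs so each subsection covers one topic.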

Tuning

Adjust the token threshold per section:

```yaml
rules:
  content-section-length:
    max-tokens: 800
```

Configuration

```yaml
rules:
  content-section-length:
    enabled: auto  # true | false | auto
    severity: info
```

| Parameter | Description | Default |
|---|---|---|
| `max-tokens` | Maximum estimated tokens per section before a warning is triggered | 500 |
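To make `max-tokens` concrete, here is a minimal Python sketch of a section-length check. It is not skillsaw's code, and two details are assumptions: tokens are estimated at roughly four characters per token (a common rule of thumb for English text), and a section runs from one ATX heading to the next heading of any level, so nested subsections are measured separately. The names `split_sections`, `long_sections`, and `Finding` are hypothetical.

```python
import re
from dataclasses import dataclass

# ATX heading: 1-6 '#' characters, whitespace, then the title (optional closing '#'s).
HEADING = re.compile(r"^#{1,6}\s+(.*?)\s*#*\s*$")
# Opening or closing code fence; headings inside fences are ignored.
FENCE = re.compile(r"^\s*(```|~~~)")


@dataclass
class Finding:
    heading: str  # heading title without the leading '#'s
    line: int     # 1-based line number of the heading
    tokens: int   # estimated tokens in the section body


def estimate_tokens(text: str) -> int:
    # Assumption: about 4 characters per token, rounded up.
    return (len(text) + 3) // 4


def split_sections(markdown: str) -> list[tuple[str, int, list[str]]]:
    """Return (title, heading line, body lines) for each heading.

    Text before the first heading is not part of any section.
    Fence tracking is approximate: any ``` or ~~~ line toggles it.
    """
    sections: list[tuple[str, int, list[str]]] = []
    in_fence = False
    for number, line in enumerate(markdown.splitlines(), start=1):
        if FENCE.match(line):
            in_fence = not in_fence
        match = None if in_fence else HEADING.match(line)
        if match:
            sections.append((match.group(1), number, []))
        elif sections:
            sections[-1][2].append(line)
    return sections


def long_sections(markdown: str, max_tokens: int = 500) -> list[Finding]:
    """Report sections whose estimated size exceeds max_tokens."""
    findings = []
    for title, line, body in split_sections(markdown):
        tokens = estimate_tokens("\n".join(body))
        if tokens > max_tokens:
            findings.append(Finding(title, line, tokens))
    return findings
```

Because every heading starts a new section in this sketch, splitting a long section into subsections is what clears the warning; calling `long_sections(text, max_tokens=800)` corresponds to the Tuning example above.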

Research Basis

Warns about markdown sections exceeding ~500 estimated tokens.

Long monolithic text blocks degrade both human readability and LLM attention. The lost-in-the-middle effect, in which models recall information near the start and end of a long context better than information in its middle, also operates within sections: the longer a contiguous block of text, the worse recall tends to be for information in its interior. Breaking content into smaller sections with headings creates natural retrieval anchors.

The ~500-token threshold aligns with chunking research for retrieval-augmented generation (RAG): Pinecone's chunking guide recommends ~512 tokens as the standard baseline for retrieval and comprehension. The threshold is configurable via the `max-tokens` parameter.


Run `skillsaw explain content-section-length` to see this documentation and the rule's effective configuration in your terminal.