Your AI coding agent burns 15,000–20,000 tokens on instruction files before you type a word. I found exactly where compression breaks behavior — and built a tool to automate the safe zone.
There's a threshold where compression stops being lossless. Beyond roughly 47–54% total reduction, the model's compliance with safety rules becomes probabilistic instead of deterministic.
| Content Type | Safe Reduction | Strategy |
|---|---|---|
| Paths, references, lists | 60–70% | Maximum compression |
| Personality, style rules | 50–60% | Heavy compression |
| Safety rules, preferences | 20–30% | Formatting only |
| Code examples | 0% | No compression |
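One way to apply the table is as a per-type budget check before accepting a compressed file. The sketch below is illustrative only: the `SAFE_REDUCTION` keys, the function names, and the whitespace-based token count are assumptions, not part of context-compress, and a real check would use the model's own tokenizer.

```python
# Hypothetical ceilings: the upper bound of each "Safe Reduction" range in the table.
SAFE_REDUCTION = {
    "paths": 0.70,   # paths, references, lists
    "style": 0.60,   # personality, style rules
    "safety": 0.30,  # safety rules, preferences
    "code": 0.0,     # code examples: never compressed
}


def count_tokens(text: str) -> int:
    """Crude stand-in for a tokenizer: counts whitespace-separated words."""
    return len(text.split())


def reduction(original: str, compressed: str) -> float:
    """Fraction of tokens removed, e.g. 0.25 means 25% smaller."""
    before = count_tokens(original)
    if before == 0:
        return 0.0
    return 1 - count_tokens(compressed) / before


def within_safe_zone(content_type: str, original: str, compressed: str) -> bool:
    """True if the achieved reduction stays at or under the ceiling for this type."""
    return reduction(original, compressed) <= SAFE_REDUCTION[content_type]


verbose = "Always ask for permission before deleting any file"
merged = "Ask permission before deleting files"
print(round(reduction(verbose, merged), 3))
print(within_safe_zone("safety", verbose, merged))
print(within_safe_zone("paths", verbose, merged))
```

The same rewrite passes for a low-risk content type and fails for a safety rule, which is the point of budgeting by type rather than by file.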
```bash
pip install context-compress

# LLM compression (best results, needs kiro-cli)
context-compress llm ~/.kiro/steering/ -o ~/.kiro/steering-compressed/

# Regex compression (fast, offline)
context-compress compress-dir ~/.kiro/steering/ -o ~/.kiro/steering-compressed/

# Find duplicates across your context stack
context-compress dedup ~/.kiro/steering/

# Token usage stats
context-compress stats ~/.kiro/steering/
```
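To make the idea behind duplicate detection concrete, here is a rough sketch that reports lines repeated across several instruction files. It is not the tool's implementation: `find_duplicates`, the `min_len` cutoff, and the normalization (lowercasing, collapsing whitespace) are assumptions chosen for illustration.

```python
from collections import defaultdict


def find_duplicates(files: dict[str, str], min_len: int = 20) -> dict[str, list[str]]:
    """Map each normalized line to the files it appears in, keeping only
    lines that show up in more than one file."""
    seen: dict[str, set[str]] = defaultdict(set)
    for name, text in files.items():
        for line in text.splitlines():
            # Normalize case and whitespace so near-identical lines match.
            key = " ".join(line.lower().split())
            # Skip short lines (headings, blank lines) that repeat by chance.
            if len(key) >= min_len:
                seen[key].add(name)
    return {line: sorted(names) for line, names in seen.items() if len(names) > 1}


files = {
    "a.md": "- Never commit secrets to the repository\n- Use tabs",
    "b.md": "-  never commit secrets to the repository",
}
for line, names in find_duplicates(files).items():
    print(names, line)
```

Exact-line matching only catches copy-pasted rules; rules that are reworded between files would need a fuzzier comparison.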
On lean steering files, regex compression cut tokens by only 2.7%, while LLM semantic compression cut the same files by 24%. The LLM can tell which words carry meaning and which are scaffolding.
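A small sketch shows why a regex pass tops out quickly on already-lean files: it can only strip surface patterns. The rules and the `FILLERS` list below are assumptions for illustration, not the patterns context-compress ships, and the sketch makes no attempt to protect code examples.

```python
import re

# Illustrative filler phrases; a real rule set would be longer.
FILLERS = [
    (re.compile(r"\bin order to\b", re.IGNORECASE), "to"),
    (re.compile(r"\bplease note that\s+", re.IGNORECASE), ""),
]


def regex_compress(text: str) -> str:
    # Strip trailing whitespace on every line.
    text = re.sub(r"[ \t]+$", "", text, flags=re.MULTILINE)
    # Collapse runs of spaces inside a line; leading indentation is untouched.
    text = re.sub(r"(?<=\S)[ \t]{2,}", " ", text)
    # Collapse three or more newlines into a single blank line.
    text = re.sub(r"\n{3,}", "\n\n", text)
    # Replace or drop known filler phrases.
    for pattern, replacement in FILLERS:
        text = pattern.sub(replacement, text)
    return text.strip()


sample = "Please note that you must run tests   in order to  verify changes.  \n\n\n\nDone."
print(repr(regex_compress(sample)))
```

Once whitespace and stock phrases are gone there is nothing left for the patterns to match, whereas a semantic pass can still merge or rephrase sentences.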
Merging 8 safety bullets into 3 sentences (same meaning, 54% reduction) made compliance probabilistic. The verbose version asked for permission every time; the merged version asked only 1 time in 3.
| Layer | Strategy | Target |
|---|---|---|
| 1 | Skills over steering | Load prose on demand |
| 2 | Cache-aware ordering | Stable content above dynamic |
| 3 | LLM compression | Semantic compression of remaining prose |
| 4 | TOON encoding | Token-efficient structured payloads |
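Layer 2 can be sketched in a few lines. Prompt caches generally match on an identical prefix, so placing content that never changes ahead of content that changes per turn keeps more of the context reusable. The `Section` class, its `dynamic` flag, and the helper names are hypothetical, introduced only for this example.

```python
from dataclasses import dataclass


@dataclass
class Section:
    name: str
    text: str
    dynamic: bool  # True if the text changes between turns


def order_for_cache(sections: list[Section]) -> list[Section]:
    """Stable sections first; sorted() keeps the original order within each group."""
    return sorted(sections, key=lambda s: s.dynamic)


def build_context(sections: list[Section]) -> str:
    return "\n\n".join(s.text for s in sections)


def shared_prefix_len(a: str, b: str) -> int:
    """Number of leading characters two contexts have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def turn(branch: str) -> list[Section]:
    return [
        Section("git status", f"branch: {branch}", dynamic=True),
        Section("safety", "Ask before deleting files.", dynamic=False),
        Section("style", "Prefer short answers.", dynamic=False),
    ]


unordered = shared_prefix_len(build_context(turn("main")), build_context(turn("feature")))
ordered = shared_prefix_len(
    build_context(order_for_cache(turn("main"))),
    build_context(order_for_cache(turn("feature"))),
)
print(unordered, ordered)
```

With the dynamic section first, the two turns diverge almost immediately; with stable sections first, the whole stable block is shared between them.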