Compression
How naxxen compression benefits you and how to tune it.
Benefits
naxxen compression reduces input tokens sent to your LLM provider. This gives you three benefits, all equally important:
Token cost savings
Fewer input tokens mean a lower bill from your provider. Depending on your prompts, expect 30–60% savings on input token costs. System prompts, chat history, and tool descriptions compress well. Short messages, code, and structured data pass through unchanged.
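To make the savings range concrete, here is a rough back-of-the-envelope sketch. The function name, the monthly token volume, and the per-million-token price are illustrative assumptions, not naxxen or provider figures; substitute your own numbers.

```python
def estimate_input_cost(input_tokens: int, price_per_million_usd: float,
                        reduction: float) -> tuple[float, float]:
    """Return (cost_before, cost_after) in USD for a fractional token reduction."""
    if not 0.0 <= reduction < 1.0:
        raise ValueError("reduction must be in [0, 1)")
    cost_before = input_tokens / 1_000_000 * price_per_million_usd
    cost_after = cost_before * (1.0 - reduction)
    return cost_before, cost_after


# Hypothetical workload: 200M input tokens per month at $3.00 per million tokens.
for reduction in (0.30, 0.60):
    before, after = estimate_input_cost(200_000_000, 3.00, reduction)
    print(f"{reduction:.0%} reduction: ${before:.2f} -> ${after:.2f}")
```

This treats the reduction as applying to the whole input. Content that passes through unchanged (short messages, code, structured data) lowers the effective reduction for a given request.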
Context window expansion
The same context window fits more content after compression. If you're hitting context limits with long conversations or large system prompts, compression lets you include more turns before the window fills up.
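As a rough illustration of the effect on conversation length, the sketch below estimates how many history turns fit in a window before and after compression. The window size, reserved budget, and per-turn token count are hypothetical values chosen for the example.

```python
def turns_that_fit(context_window: int, reserved_tokens: int,
                   tokens_per_turn: int, reduction: float = 0.0) -> int:
    """Estimate how many history turns fit after a fractional token reduction."""
    if not 0.0 <= reduction < 1.0:
        raise ValueError("reduction must be in [0, 1)")
    # Tokens left for history once the system prompt, reply, etc. are reserved.
    budget = context_window - reserved_tokens
    if budget <= 0:
        return 0
    effective_turn = tokens_per_turn * (1.0 - reduction)
    return int(budget // effective_turn)


# Hypothetical setup: 128k window, 8k reserved, 600 tokens per turn.
print(turns_that_fit(128_000, 8_000, 600))                 # no compression
print(turns_that_fit(128_000, 8_000, 600, reduction=0.5))  # ~50% reduction
```

Treat the compressed figure as an upper bound: turns below the minimum token threshold and the last user message are not compressed, so real conversations gain somewhat less.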
Faster responses
Fewer input tokens mean less processing time for the provider. This translates to lower time-to-first-token, especially on large prompts.
Settings
Each API key has independent compression settings. Configure them in your dashboard settings.
Compression toggle
Turn compression on or off per key. When off, all requests pass through unchanged (pure proxy, zero overhead).
Compression rate
Controls how aggressively text is compressed:
| Rate | Token reduction | Best for |
|---|---|---|
| Light | ~30% | Conservative — preserves more nuance |
| Medium (default) | ~50% | Balanced — good savings with minimal quality impact |
| Aggressive | ~65% | Maximum savings — best for verbose system prompts |
Minimum token threshold
Text blocks shorter than this threshold are skipped (not compressed). Default: 200 tokens. Compressing very short text adds latency without meaningful savings.
Skip code blocks
When enabled (default), code fences, JSON, XML, and structured data pass through uncompressed. Recommended — compressing code risks breaking syntax.
What gets compressed
| Content type | Compressed? | Why |
|---|---|---|
| System prompts | Yes | Biggest savings — often verbose |
| Chat history (older messages) | Yes | Conversation context accumulates tokens |
| Tool descriptions | Yes | Can be very verbose |
| Code blocks (...) | No | One wrong character breaks code |
| JSON / XML / structured data | No | Structural integrity matters |
| Images, audio, PDFs | No | Binary content, not text |
| Last user message | No | Your intent — never touched |
| Short text (below threshold) | No | Overhead exceeds savings |
| Thinking blocks | No | Model reasoning preserved |
Passthrough
If a request has nothing to compress (e.g., only images, only short messages, all code), it passes through with zero overhead. These requests show up as "passthrough" turns in your dashboard, and you can filter for them on the Usage page.
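The content-type table and the passthrough rule can be read as a simple decision procedure. The sketch below models that reading; it is not naxxen's implementation. The `Block` type, the kind names, and both functions are hypothetical, and the behavior with "Skip code blocks" turned off is an assumption.

```python
from dataclasses import dataclass

# Content kinds, named here for illustration only.
COMPRESSIBLE = {"system_prompt", "chat_history", "tool_description"}
CODE_LIKE = {"code_block", "json", "xml"}
NEVER_COMPRESSED = {"image", "audio", "pdf", "thinking"}


@dataclass
class Block:
    kind: str
    tokens: int
    is_last_user_message: bool = False


def should_compress(block: Block, min_tokens: int = 200,
                    skip_code: bool = True) -> bool:
    """Apply the table's rules to a single content block."""
    if block.is_last_user_message or block.kind in NEVER_COMPRESSED:
        return False
    if block.kind in CODE_LIKE:
        # Assumption: with "Skip code blocks" off, code-like content
        # is treated like ordinary text.
        if skip_code:
            return False
    elif block.kind not in COMPRESSIBLE:
        return False
    # Blocks shorter than the threshold are skipped.
    return block.tokens >= min_tokens


def is_passthrough(blocks: list[Block], **settings) -> bool:
    """A request is a passthrough when no block qualifies for compression."""
    return not any(should_compress(b, **settings) for b in blocks)


request = [
    Block("system_prompt", 1200),
    Block("chat_history", 80),       # below the default 200-token threshold
    Block("code_block", 900),
    Block("chat_history", 350, is_last_user_message=True),
]
print([should_compress(b) for b in request])
print(is_passthrough(request))
```

In this example only the system prompt qualifies, so the request is compressed rather than passed through. A request containing only the short history message and the code block would count as a passthrough.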