Compression

Benefits

naxxen compression reduces input tokens sent to your LLM provider. This gives you three benefits, all equally important:

Lower costs

Fewer input tokens mean a lower bill from your provider. Depending on your prompts, expect 30–60% savings on input token costs. System prompts, chat history, and tool descriptions compress well. Short messages, code, and structured data pass through unchanged.
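As a rough illustration of the arithmetic, the sketch below estimates a monthly input bill before and after compression. The monthly volume and the price per million tokens are made-up figures, and the function name is hypothetical. The 30% and 60% reductions are the ends of the range quoted above.

```python
def input_cost(tokens: int, usd_per_million: float, reduction: float = 0.0) -> float:
    """Estimate input-token cost in USD after a fractional token reduction."""
    if not 0.0 <= reduction < 1.0:
        raise ValueError("reduction must be in [0, 1)")
    billed_tokens = tokens * (1.0 - reduction)
    return billed_tokens * usd_per_million / 1_000_000


# Assumed figures, for illustration only:
# 10M input tokens per month at $3.00 per million tokens.
MONTHLY_TOKENS = 10_000_000
PRICE = 3.00

baseline = input_cost(MONTHLY_TOKENS, PRICE)
for reduction in (0.30, 0.60):
    cost = input_cost(MONTHLY_TOKENS, PRICE, reduction)
    print(f"{reduction:.0%} reduction: ${cost:.2f} (saves ${baseline - cost:.2f})")
```

Actual savings depend on how much of each request is compressible, since short messages, code, and structured data are billed at their original size.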

Context window expansion

The same context window fits more content after compression. If you're hitting context limits with long conversations or large system prompts, compression lets you include more turns before the window fills up.
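To make this concrete, here is a back-of-the-envelope sketch of how many conversation turns fit in a fixed window. The window size, system prompt size, and tokens per turn are all assumptions. The model is also deliberately simplified: it applies one uniform reduction to the system prompt and every turn, whereas in practice some content (such as the last user message and short blocks) is not compressed.

```python
def turns_that_fit(window: int, system_tokens: int, tokens_per_turn: int,
                   reduction: float = 0.0) -> int:
    """Count whole turns that fit in the window after a uniform token reduction."""
    keep = 1.0 - reduction  # fraction of tokens remaining after compression
    remaining = window - system_tokens * keep
    if remaining <= 0:
        return 0
    return int(remaining // (tokens_per_turn * keep))


# Assumed figures: 128k window, 8k-token system prompt, 1,200 tokens per turn.
print(turns_that_fit(128_000, 8_000, 1_200))        # 100
print(turns_that_fit(128_000, 8_000, 1_200, 0.5))   # 206
```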

Faster responses

Fewer input tokens mean less processing time for the provider. This translates to lower time-to-first-token, especially on large prompts.

Settings

Each API key has independent compression settings. Configure them in your dashboard settings.

Compression toggle

Turn compression on or off per key. When off, all requests pass through unchanged (pure proxy, zero overhead).

Compression rate

Controls how aggressively text is compressed:

Rate	Token reduction	Best for
Light	~30%	Conservative — preserves more nuance
Medium (default)	~50%	Balanced — good savings with minimal quality impact
Aggressive	~65%	Maximum savings — best for verbose system prompts

Minimum token threshold

Text blocks shorter than this threshold are skipped (not compressed). Default: 200 tokens. Compressing very short text adds latency without meaningful savings.

Skip code blocks

When enabled (default), code fences, JSON, XML, and structured data pass through uncompressed. Recommended — compressing code risks breaking syntax.
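The four settings above can be pictured as a small per-key record. The sketch below is a mental model only: the class and field names are hypothetical and do not represent naxxen's API or configuration format. The defaults and approximate reductions are taken from the descriptions above; the toggle has no documented default, so it is a required field here.

```python
from dataclasses import dataclass

# Approximate token reduction per rate, from the compression rate table.
APPROX_REDUCTION = {"light": 0.30, "medium": 0.50, "aggressive": 0.65}


@dataclass(frozen=True)
class CompressionSettings:
    """Hypothetical per-key settings record (names are illustrative)."""
    enabled: bool                    # compression toggle
    rate: str = "medium"             # "light", "medium" (default), or "aggressive"
    min_token_threshold: int = 200   # blocks shorter than this are skipped
    skip_code_blocks: bool = True    # code, JSON, XML pass through when enabled

    def __post_init__(self) -> None:
        if self.rate not in APPROX_REDUCTION:
            raise ValueError(f"unknown rate: {self.rate!r}")
        if self.min_token_threshold < 0:
            raise ValueError("min_token_threshold must be non-negative")

    def expected_reduction(self) -> float:
        """Approximate reduction on compressible text; 0 when compression is off."""
        return APPROX_REDUCTION[self.rate] if self.enabled else 0.0


# Two keys with independent settings.
prod_key = CompressionSettings(enabled=True)
debug_key = CompressionSettings(enabled=False, rate="aggressive")
print(prod_key.expected_reduction(), debug_key.expected_reduction())
```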

What gets compressed

Content type	Compressed?	Why
System prompts	Yes	Biggest savings — often verbose
Chat history (older messages)	Yes	Conversation context accumulates tokens
Tool descriptions	Yes	Can be very verbose
Code blocks (```...```)	No	One wrong character breaks code
JSON / XML / structured data	No	Structural integrity matters
Images, audio, PDFs	No	Binary content, not text
Last user message	No	Your intent — never touched
Short text (below threshold)	No	Overhead exceeds savings
Thinking blocks	No	Model reasoning preserved
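Read as a decision procedure, the table above amounts to a short chain of checks. The sketch below assumes default settings and a simplified, hypothetical representation of a request part: the `Part` fields and the `kind` labels are invented for illustration, and token counts are taken as given rather than computed. It is not naxxen's actual implementation.

```python
from dataclasses import dataclass


@dataclass
class Part:
    """Hypothetical view of one piece of a request."""
    kind: str     # "system", "history", "tool_description", "code",
                  # "structured", "binary", or "thinking"
    tokens: int   # token count, assumed to be known already
    is_last_user_message: bool = False


# Content types the table marks as compressed.
COMPRESSIBLE_KINDS = {"system", "history", "tool_description"}


def is_compressed(part: Part, min_tokens: int = 200) -> bool:
    """Return True if the table above says this part would be compressed."""
    if part.is_last_user_message:
        return False              # the last user message is never touched
    if part.kind not in COMPRESSIBLE_KINDS:
        return False              # code, structured data, binary, thinking
    return part.tokens >= min_tokens  # blocks below the threshold are skipped


parts = [
    Part("system", 1500),
    Part("history", 900),
    Part("code", 5000),
    Part("history", 40),
    Part("history", 300, is_last_user_message=True),
]
for p in parts:
    print(p.kind, p.tokens, "compress" if is_compressed(p) else "skip")
```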

Passthrough

If a request has nothing to compress (e.g., only images, only short messages, all code), it passes through with zero overhead. These show up as "passthrough" turns in your dashboard, and you can filter for them on the Usage page.