na{xx}en

Compression

How naxxen compression benefits you and how to tune it.

Benefits

naxxen compression reduces the number of input tokens sent to your LLM provider. This gives you three benefits:

Token cost savings

Fewer input tokens mean a lower bill from your provider. Depending on your prompts, expect 30–60% savings on input token costs. System prompts, chat history, and tool descriptions compress well; short messages, code, and structured data pass through unchanged.
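As a rough illustration of the arithmetic, the sketch below estimates input-token spend before and after compression. The price per million tokens, the monthly token volume, and the function names are hypothetical placeholders, not naxxen or provider figures; the 30% and 60% reductions are the two ends of the range quoted above. Real savings depend on how much of each request is compressible.

```python
def input_cost(tokens: int, usd_per_million: float) -> float:
    """Cost of `tokens` input tokens at a per-million-token price."""
    return tokens / 1_000_000 * usd_per_million


def estimate_savings(tokens: int, usd_per_million: float, reduction: float) -> tuple[float, float]:
    """Return (cost_before, cost_after) for a fractional token reduction.

    `reduction` is the share of input tokens removed, e.g. 0.30 for 30%.
    """
    if not 0.0 <= reduction < 1.0:
        raise ValueError("reduction must be in [0, 1)")
    before = input_cost(tokens, usd_per_million)
    after = input_cost(round(tokens * (1 - reduction)), usd_per_million)
    return before, after


# Hypothetical workload: 50M input tokens per month at $3 per million tokens.
for r in (0.30, 0.60):
    before, after = estimate_savings(50_000_000, 3.0, r)
    print(f"{r:.0%} reduction: ${before:,.2f} -> ${after:,.2f}")
# 30% reduction: $150.00 -> $105.00
# 60% reduction: $150.00 -> $60.00
```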

Context window expansion

The same context window fits more content after compression. If you're hitting context limits with long conversations or large system prompts, compression lets you include more turns before the window fills up.
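A back-of-the-envelope sketch makes the effect concrete: if compression removes a fraction of each older turn, proportionally more turns fit in the space that is left. The window size, the budget reserved for content that stays uncompressed (such as the last user message and the model's output), and the assumption that every older turn has the same size are illustrative simplifications; `turns_that_fit` is a hypothetical helper.

```python
def turns_that_fit(window: int, reserved: int, tokens_per_turn: int, reduction: float) -> int:
    """How many equal-sized older turns fit in the context window.

    window          -- total context window, in tokens
    reserved        -- tokens set aside for content that is not compressed
    tokens_per_turn -- raw size of one older turn before compression
    reduction       -- fraction of each older turn removed (0.0 = compression off)
    """
    if not 0.0 <= reduction < 1.0:
        raise ValueError("reduction must be in [0, 1)")
    if reserved > window or tokens_per_turn <= 0:
        raise ValueError("invalid window, reserve, or turn size")
    effective_turn = tokens_per_turn * (1 - reduction)
    return int((window - reserved) // effective_turn)


# Hypothetical numbers: 128k window, 8k reserved, 1,500-token turns.
print(turns_that_fit(128_000, 8_000, 1_500, 0.0))   # 80 turns without compression
print(turns_that_fit(128_000, 8_000, 1_500, 0.50))  # 160 turns at ~50% reduction
```

Under these assumptions a ~50% reduction roughly doubles the number of older turns that fit before the window fills up.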

Faster responses

Fewer input tokens mean less prompt-processing time on the provider's side, which lowers time-to-first-token, especially on large prompts.

Settings

Each API key has independent compression settings. Configure them in your dashboard settings.

Compression toggle

Turn compression on or off per key. When off, all requests pass through unchanged (pure proxy, zero overhead).

Compression rate

Controls how aggressively text is compressed:

| Rate | Token reduction | Best for |
|---|---|---|
| Light | ~30% | Conservative; preserves more nuance |
| Medium (default) | ~50% | Balanced; good savings with minimal quality impact |
| Aggressive | ~65% | Maximum savings; best for verbose system prompts |

Minimum token threshold

Text blocks shorter than this threshold are skipped (not compressed). Default: 200 tokens. Compressing very short text adds latency without meaningful savings.

Skip code blocks

When enabled (the default), code fences, JSON, XML, and other structured data pass through uncompressed. Keep this on: compressing code risks breaking its syntax.
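To reason about how the four per-key settings combine, they can be modeled as one small config object. This is a hypothetical client-side representation, not naxxen's dashboard schema or API: the class and field names are invented, while the defaults (Medium rate, 200-token threshold, code skipping on) and the approximate reductions come from this page. This page doesn't state whether compression is on by default, so `enabled` has no default value.

```python
from dataclasses import dataclass
from enum import Enum


class Rate(Enum):
    """Compression levels and their approximate token reduction (from the rate table)."""
    LIGHT = 0.30
    MEDIUM = 0.50
    AGGRESSIVE = 0.65


@dataclass(frozen=True)
class KeySettings:
    """Hypothetical model of one API key's compression settings."""
    enabled: bool               # compression toggle; off = pure proxy
    rate: Rate = Rate.MEDIUM    # default rate
    min_tokens: int = 200       # blocks shorter than this are skipped
    skip_code: bool = True      # leave code, JSON, and XML untouched

    def expected_tokens(self, tokens: int) -> int:
        """Rough size of one eligible text block after compression."""
        if not self.enabled or tokens < self.min_tokens:
            return tokens  # passes through unchanged
        return round(tokens * (1 - self.rate.value))


verbose_key = KeySettings(enabled=True, rate=Rate.AGGRESSIVE)
print(verbose_key.expected_tokens(2_000))  # 700
print(verbose_key.expected_tokens(150))    # 150, below the threshold
```

Because the settings are per key, a key reserved for verbose system prompts could use Aggressive while another key stays on Medium.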

What gets compressed

| Content type | Compressed? | Why |
|---|---|---|
| System prompts | Yes | Biggest savings; often verbose |
| Chat history (older messages) | Yes | Conversation context accumulates tokens |
| Tool descriptions | Yes | Can be very verbose |
| Code blocks (...) | No | One wrong character breaks code |
| JSON / XML / structured data | No | Structural integrity matters |
| Images, audio, PDFs | No | Binary content, not text |
| Last user message | No | Your intent; never touched |
| Short text (below threshold) | No | Overhead exceeds savings |
| Thinking blocks | No | Model reasoning preserved |
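The table reads naturally as a per-block decision rule. The sketch below encodes it under stated assumptions: a request is already split into blocks, each with a `kind` label and a precomputed token count, and the default settings apply (200-token threshold, code skipping on). The `Block` type and kind labels are hypothetical, not the naxxen wire format, and the sketch only decides eligibility; it says nothing about how compression itself works.

```python
from dataclasses import dataclass

# Kinds the table lists as compressible (hypothetical labels).
COMPRESSIBLE_KINDS = {"system", "history", "tool_description"}


@dataclass
class Block:
    kind: str                   # e.g. "system", "history", "code", "json", "image", "thinking"
    tokens: int                 # token count of the block's text
    is_last_user: bool = False  # the final user message in the request


def is_compressible(block: Block, min_tokens: int = 200) -> bool:
    """Apply the table's rules to a single block."""
    if block.is_last_user:
        return False  # your intent: never touched
    if block.kind not in COMPRESSIBLE_KINDS:
        return False  # code, structured data, media, and thinking blocks pass through
    return block.tokens >= min_tokens  # short text: overhead exceeds savings


request = [
    Block("system", 1_200),
    Block("history", 90),
    Block("code", 800),
    Block("history", 300, is_last_user=True),
]
print([is_compressible(b) for b in request])  # [True, False, False, False]
```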

Passthrough

If a request has nothing to compress (e.g., only images, only short messages, or only code), it passes through with zero overhead. These appear as "passthrough" turns in your dashboard; you can filter for them on the Usage page.
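Passthrough amounts to a short-circuit in the proxy: if no block is eligible, the original payload is forwarded as-is and the compressor never runs. A minimal sketch, where the `(kind, token_count)` block shape, `process`, and the toy predicate and compressor are hypothetical stand-ins rather than naxxen internals:

```python
from typing import Callable, List, Tuple

Block = Tuple[str, int]  # hypothetical (kind, token_count) pair


def process(
    blocks: List[Block],
    eligible: Callable[[Block], bool],
    compress: Callable[[Block], Block],
) -> Tuple[List[Block], str]:
    """Return the (possibly compressed) blocks and a turn label."""
    if not any(eligible(b) for b in blocks):
        # Nothing to compress: forward the original list untouched, no compressor call.
        return blocks, "passthrough"
    return [compress(b) if eligible(b) else b for b in blocks], "compressed"


# Toy predicate and compressor, for illustration only.
def is_long_text(b: Block) -> bool:
    return b[0] == "text" and b[1] >= 200


def halve(b: Block) -> Block:
    return (b[0], b[1] // 2)


print(process([("image", 0), ("text", 40)], is_long_text, halve))
# ([('image', 0), ('text', 40)], 'passthrough')
print(process([("text", 1_000), ("code", 300)], is_long_text, halve))
# ([('text', 500), ('code', 300)], 'compressed')
```

Returning the same list object in the passthrough case reflects the "zero overhead" behavior: nothing is copied or rewritten.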