Amir-Zecharia/compress-tokens
MCP server that compresses text by removing unnecessary tokens using local LLM surprisal scoring
Platform-specific configuration:

```json
{
  "mcpServers": {
    "compress-tokens": {
      "command": "npx",
      "args": [
        "-y",
        "compress-tokens"
      ]
    }
  }
}
```

Add the config above to `.claude/settings.json` under the `mcpServers` key.
An MCP server that compresses text by removing low-information tokens using local LLM surprisal scoring via candle. No API keys. No cloud. Everything runs on your machine.
Each token in the input is scored by its surprisal — how unexpected it is given all preceding tokens, computed by a local quantized LLM. Tokens with low surprisal (predictable filler) are dropped; tokens with high surprisal (informative content) are kept. The remaining tokens are decoded back to text.
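The keep/drop step described above can be sketched in plain Rust. This is an illustrative sketch, not the server's code: the function name `keep_top_tokens` is hypothetical, and the surprisal values are hard-coded here, whereas the real server computes them with a local quantized LLM via candle.

```rust
/// Keep the highest-surprisal fraction of tokens, preserving original order.
/// `surprisals[i]` is the surprisal (-ln p) of token i given all preceding
/// tokens. Hypothetical helper, for illustration only.
fn keep_top_tokens<'a>(tokens: &[&'a str], surprisals: &[f64], keep_ratio: f64) -> Vec<&'a str> {
    let n_keep = ((tokens.len() as f64) * keep_ratio).ceil() as usize;
    // Rank token indices by surprisal, highest (most informative) first.
    let mut idx: Vec<usize> = (0..tokens.len()).collect();
    idx.sort_by(|&a, &b| surprisals[b].partial_cmp(&surprisals[a]).unwrap());
    // Take the top n_keep, then restore original order before "decoding".
    let mut kept: Vec<usize> = idx.into_iter().take(n_keep).collect();
    kept.sort();
    kept.into_iter().map(|i| tokens[i]).collect()
}

fn main() {
    // Predictable filler ("the", "is", "a") gets low surprisal and is dropped.
    let tokens = ["the", "quantized", "model", "is", "a", "llama"];
    let surprisals = [0.5, 6.2, 3.1, 0.4, 0.3, 5.0];
    let kept = keep_top_tokens(&tokens, &surprisals, 0.5);
    assert_eq!(kept, ["quantized", "model", "llama"]);
    println!("{}", kept.join(" "));
}
```

With `keep_ratio = 0.5`, half the tokens survive, and they come back in their original order so the decoded text stays readable.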
The primary use case is reducing context window usage: Claude Code can call compress_file on a large file and get back a shorter version that preserves the information-dense parts before reasoning over it.
| Tool | Description |
|---|---|
| `compress_text` | Compress text with an explicit `keep_ratio` (fraction of tokens to keep, default 0.7) |
| `compress_text_auto` | Compress text with an automatic keep ratio via elbow detection on the surprisal curve |
| `compress_file` | Read a file, compress it, and return the result. Optionally write to `output_path`. Large files are chunked at 2048 tokens. |
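The "elbow detection" behind `compress_text_auto` can be illustrated with one common heuristic: sort the surprisals into a descending curve and pick the point farthest below the straight line joining the curve's endpoints. This is a sketch of that generic technique, not necessarily the method the server uses; the function name `elbow_keep_ratio` is hypothetical.

```rust
/// Pick a keep ratio from the surprisal curve via a max-distance-to-chord
/// elbow heuristic. Illustrative sketch; the server's method may differ.
fn elbow_keep_ratio(surprisals: &[f64]) -> f64 {
    let mut sorted = surprisals.to_vec();
    sorted.sort_by(|a, b| b.partial_cmp(a).unwrap()); // descending curve
    let n = sorted.len();
    if n < 3 {
        return 1.0; // too few points for an elbow; keep everything
    }
    let (y0, y1) = (sorted[0], sorted[n - 1]);
    let mut best_i = 0;
    let mut best_d = f64::MIN;
    for i in 0..n {
        // x normalized to [0, 1]; the chord runs from (0, y0) to (1, y1).
        let x = i as f64 / (n - 1) as f64;
        let chord_y = y0 + (y1 - y0) * x;
        // A convex decreasing curve sags below the chord; the deepest
        // sag marks the elbow where informative tokens give way to filler.
        let d = chord_y - sorted[i];
        if d > best_d {
            best_d = d;
            best_i = i;
        }
    }
    // Keep the tokens ranked before the elbow (at least one).
    (best_i as f64 / n as f64).max(1.0 / n as f64)
}

fn main() {
    // Three informative tokens, then a long tail of near-zero filler:
    // the elbow lands after the third token, giving keep_ratio = 0.3.
    let s = [10.0, 9.0, 8.0, 1.0, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4];
    let r = elbow_keep_ratio(&s);
    assert!((r - 0.3).abs() < 1e-9);
    println!("keep_ratio = {r}");
}
```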
Requires Rust (install via `curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh`).

```sh
git clone https://github.com/Amir-Zecharia/compress-tokens
cd compress-tokens
cargo build --release
```

On macOS, enable Metal GPU acceleration:

```sh
cargo build --release --features metal
```

Register the server with Claude Code:

```sh
claude mcp add compress-tokens /path/to/compress-tokens/target/release/compress-tokens --scope user
```

On first use, the server downloads the default model (~700MB) from HuggingFace and caches it locally. All subsequent starts load from cache.
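Once registered, Claude Code invokes the tools over MCP's JSON-RPC `tools/call` method. A request to `compress_text` would look roughly like this (the argument values are illustrative; `text` and `keep_ratio` are the parameters listed in the tools table above):

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "compress_text",
    "arguments": {
      "text": "The quick brown fox jumps over the lazy dog...",
      "keep_ratio": 0.7
    }
  }
}
```

In practice you rarely write this by hand; Claude Code issues the call itself when it decides compression is useful.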
The server loads the model at startup and exits automatically after 60 seconds of inactivity, freeing memory.