# ShipItAndPray/mcp-turboquant

MCP server for LLM quantization. Compress any model to GGUF/GPTQ/AWQ in one tool call. First MCP server for model compression.
The first MCP server for LLM quantization. Compress any Hugging Face model to GGUF, GPTQ, or AWQ format in a single tool call.
Built on TurboQuant — the unified CLI for model compression.
LLM quantization is one of the most common tasks in the open-source AI workflow, yet there has been no way for AI assistants to do it autonomously. Until now.
With mcp-turboquant, Claude (or any MCP-compatible agent) can quantize models on its own: convert a Hugging Face checkpoint to GGUF, GPTQ, or AWQ, inspect model metadata, get hardware-aware format recommendations, and check which quantization backends are installed.

## Installation

Install the underlying CLI:

```shell
pip install turboquant
```

Add to your Claude Code MCP settings (`~/.claude/settings.json`):
```json
{
  "mcpServers": {
    "turboquant": {
      "command": "npx",
      "args": ["-y", "mcp-turboquant"]
    }
  }
}
```

Or run locally:
```json
{
  "mcpServers": {
    "turboquant": {
      "command": "node",
      "args": ["/path/to/mcp-turboquant/index.js"]
    }
  }
}
```

Add to `claude_desktop_config.json`:
```json
{
  "mcpServers": {
    "turboquant": {
      "command": "npx",
      "args": ["-y", "mcp-turboquant"]
    }
  }
}
```

## Tools

| Tool | Description |
|------|-------------|
| `quantize` | Quantize a HF model to GGUF/GPTQ/AWQ. Params: `model` (required), `format` (gguf/gptq/awq), `bits` (2-8), `output` (path) |
| `info` | Get model info: param count, architecture, size estimates |
| `recommend` | Hardware-aware recommendation for the best format and bit width |
| `check` | List available quantization backends on the system |
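The size estimates behind `info` and the bit-width trade-offs behind `recommend` come down to simple arithmetic over parameter count and bit width. A rough sketch of that calculation (illustrative only; the function name, overhead factor, and numbers are not TurboQuant's actual implementation):

```python
def quantized_size_gb(n_params: float, bits: int, overhead: float = 1.1) -> float:
    """Estimate on-disk size in GB for n_params weights at the given bit width.

    `overhead` roughly accounts for quantization scales, zero-points,
    and file metadata on top of the raw packed weights.
    """
    return n_params * bits / 8 * overhead / 1e9

# An 8B-parameter model at 4 bits lands around 4-5 GB,
# versus an fp16 baseline of roughly 17-18 GB.
print(f"{quantized_size_gb(8e9, 4):.1f} GB")   # 4-bit
print(f"{quantized_size_gb(8e9, 16):.1f} GB")  # fp16 baseline
```

This is why `recommend` needs to know your hardware: the answer is mostly a question of which bit width fits your available VRAM or RAM with headroom for activations and KV cache.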
## Usage

Once configured, just ask Claude:
> "Quantize meta-llama/Llama-3.1-8B to 4-bit GGUF"
> "What quantization format should