loaditout.ai

mcp-turboquant

MCP Tool

ShipItAndPray/mcp-turboquant

MCP server for LLM quantization. Compress any model to GGUF/GPTQ/AWQ in one tool call. First MCP server for model compression.

Install

$ npx loaditout add ShipItAndPray/mcp-turboquant

Platform-specific configuration:

.claude/settings.json
{
  "mcpServers": {
    "mcp-turboquant": {
      "command": "npx",
      "args": [
        "-y",
        "mcp-turboquant"
      ]
    }
  }
}

Add the config above to .claude/settings.json under the mcpServers key.

About

mcp-turboquant

The first MCP server for LLM quantization. Compress any Hugging Face model to GGUF, GPTQ, or AWQ format in a single tool call.

Built on TurboQuant — the unified CLI for model compression.

Why?

LLM quantization is one of the most common tasks in the open-source AI workflow, yet there has been no way for AI assistants to do it autonomously. Until now.

With mcp-turboquant, Claude (or any MCP-compatible agent) can:

  • Quantize models — convert any HF model to GGUF/GPTQ/AWQ with specified bit widths
  • Inspect models — get parameter counts, architecture details, and size estimates
  • Recommend settings — analyze your hardware and suggest optimal format + bits
  • Check backends — verify which quantization engines are installed
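As a sketch of what an invocation looks like on the wire: MCP tool calls are JSON-RPC 2.0 `tools/call` requests. The argument names below (`model`, `format`, `bits`) follow the tool list in this README, but treat the exact schema as an assumption until you check the server's `tools/list` response.

```python
import json

# Hypothetical wire-level request an MCP client would send to invoke
# the quantize tool; MCP messages are JSON-RPC 2.0.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "quantize",
        "arguments": {
            "model": "meta-llama/Llama-3.1-8B",  # Hugging Face repo id
            "format": "gguf",                    # gguf | gptq | awq
            "bits": 4,                           # 2-8
        },
    },
}
print(json.dumps(request, indent=2))
```

In practice Claude builds this request for you; the sketch only shows what the agent sends under the hood.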
Install

Prerequisites

pip install turboquant

Claude Code

Add to your Claude Code MCP settings (~/.claude/settings.json):

{
  "mcpServers": {
    "turboquant": {
      "command": "npx",
      "args": ["-y", "mcp-turboquant"]
    }
  }
}

Or run locally:

{
  "mcpServers": {
    "turboquant": {
      "command": "node",
      "args": ["/path/to/mcp-turboquant/index.js"]
    }
  }
}
Claude Desktop

Add to claude_desktop_config.json:

{
  "mcpServers": {
    "turboquant": {
      "command": "npx",
      "args": ["-y", "mcp-turboquant"]
    }
  }
}
Tools

| Tool | Description |
|------|-------------|
| quantize | Quantize a HF model to GGUF/GPTQ/AWQ. Params: model (required), format (gguf/gptq/awq), bits (2-8), output (path) |
| info | Get model info: param count, architecture, size estimates |
| recommend | Hardware-aware recommendation for best format and bit width |
| check | List available quantization backends on the system |

Examples

Once configured, just ask Claude:

> "Quantize meta-llama/Llama-3.1-8B to 4-bit GGUF"

> "What quantization format should

Tags

gguf, llm, mcp, mcp-server, quantization, turboquant


Quality Signals

Installs: 0
Last updated: 24 days ago
Security: A

Safety

Risk level: medium
Data access: read
Network access: none

Details

Source: github-crawl
Last commit: 3/25/2026
View on GitHub →

Embed Badge

[![Loaditout](https://loaditout.ai/api/badge/ShipItAndPray/mcp-turboquant)](https://loaditout.ai/skills/ShipItAndPray/mcp-turboquant)