loaditout.ai

MCP_server_local_image_and_music_generation

MCP Tool

davidemodolo/MCP_server_local_image_and_music_generation

A local, open-source MCP (Model Context Protocol) server for high-quality batched image and audio generation using SSD-1B and AudioLDM 2.

Install

$ npx loaditout add davidemodolo/MCP_server_local_image_and_music_generation

Platform-specific configuration:

.claude/settings.json
{
  "mcpServers": {
    "MCP_server_local_image_and_music_generation": {
      "command": "npx",
      "args": [
        "-y",
        "MCP_server_local_image_and_music_generation"
      ]
    }
  }
}

Add the config above to .claude/settings.json under the mcpServers key.
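If `.claude/settings.json` already holds other settings, the entry should be merged into it rather than pasted over the whole file. Below is a minimal sketch that uses only the Python standard library. `merge_server` and `add_to_settings_file` are hypothetical helper names, and the entry simply mirrors the JSON shown above.

```python
import json
from pathlib import Path

SERVER_NAME = "MCP_server_local_image_and_music_generation"

# Mirrors the mcpServers entry from the configuration above.
SERVER_ENTRY = {"command": "npx", "args": ["-y", SERVER_NAME]}


def merge_server(settings: dict, name: str = SERVER_NAME, entry: dict = SERVER_ENTRY) -> dict:
    """Return a copy of settings with entry stored under mcpServers[name].

    Other top-level keys and other servers are left untouched, and the
    input dict is not mutated.
    """
    merged = dict(settings)
    servers = dict(merged.get("mcpServers", {}))
    servers[name] = entry
    merged["mcpServers"] = servers
    return merged


def add_to_settings_file(path: Path) -> None:
    """Read path (if it exists), merge the server entry, and write it back."""
    text = path.read_text(encoding="utf-8") if path.exists() else ""
    settings = json.loads(text) if text.strip() else {}
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(merge_server(settings), indent=2) + "\n", encoding="utf-8")
```

Running `add_to_settings_file(Path(".claude/settings.json"))` from the project root would create the file if needed and otherwise preserve any existing keys.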

About

Local MCP Server for Batched Image and Audio Generation

This repository provides a local MCP (Model Context Protocol) server exposing open-source image and audio generation tools. Inference runs on your machine using fast, lightweight modern models.

What You Get
  • Image Generation: Powered by Segmind SSD-1B, a distilled SDXL model that generates high-quality 1024x1024 images at half the size of SDXL. Batched prompts are supported.
  • Audio Generation: Driven by AudioLDM 2 for generating music, sound effects, and speech-style prompts.
  • stdio MCP server for integration with MCP-compatible clients.
  • Local-first workflow (models downloaded on first run, outputs saved locally).
  • Unified, clean dependencies using Hugging Face diffusers.
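The stdio MCP server listed above exchanges JSON-RPC 2.0 messages with its client, one message per line on stdin/stdout. The standard-library sketch below shows what a `tools/call` request looks like on the wire. The tool name `generate_images` and the `prompts` argument are hypothetical placeholders, not the names defined in `mcp_server/main.py`, and a real client must complete the `initialize` handshake before calling any tool.

```python
import json


def encode_tools_call(request_id: int, tool_name: str, arguments: dict) -> bytes:
    """Serialize an MCP tools/call request as one newline-terminated JSON-RPC line."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    # json.dumps without indent emits a single line; newlines inside strings
    # are escaped as \n, so the only raw newline is the trailing delimiter.
    return (json.dumps(message, separators=(",", ":")) + "\n").encode("utf-8")


# Hypothetical tool name and arguments, for illustration only.
line = encode_tools_call(1, "generate_images", {"prompts": ["a red fox in snow", "a lighthouse at dusk"]})
```

The client writes `line` to the server process's stdin and reads the matching response (same `id`) from its stdout.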
Supported Models
Image Generation
  • Model: segmind/SSD-1B
  • Why: Distilled version of Stable Diffusion XL (SDXL). It delivers strong prompt adherence, lighting, and anatomy and can render text accurately, while being 50% smaller and 60% faster than SDXL.
  • Download Size: ~2.5GB
  • VRAM Requirement: ~4GB+ for GPU-accelerated generation.
Audio Generation
  • Model: cvssp/audioldm2
  • Why: Replaces earlier sequential (autoregressive) generators such as MusicGen. AudioLDM 2 handles a wide range of tasks, including complex music, environmental sound effects, and speech synthesis, with a single streamlined model.
  • Download Size: ~1.1GB
  • VRAM Requirement: ~2GB+
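Generated audio is saved locally, so the raw waveform from AudioLDM 2 has to be written to disk in some file format. Below is a minimal standard-library sketch that writes a 16-bit PCM WAV file. It assumes the pipeline returns mono float samples in [-1.0, 1.0] at 16 kHz (the rate used in the diffusers AudioLDM 2 examples). `to_wav_bytes` and `save_wav` are hypothetical helpers, not functions from `audio_gen/audio_generator.py`.

```python
import io
import struct
import wave
from typing import Sequence

# Assumption: AudioLDM 2 output is mono float audio sampled at 16 kHz.
SAMPLE_RATE = 16_000


def to_wav_bytes(samples: Sequence[float], sample_rate: int = SAMPLE_RATE) -> bytes:
    """Encode mono float samples in [-1.0, 1.0] as a 16-bit PCM WAV file in memory."""
    # Clip to [-1, 1], then scale to the signed 16-bit integer range.
    ints = [int(round(max(-1.0, min(1.0, float(s))) * 32767)) for s in samples]
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)       # mono
        wav.setsampwidth(2)       # 2 bytes per sample = 16-bit PCM
        wav.setframerate(sample_rate)
        wav.writeframes(struct.pack(f"<{len(ints)}h", *ints))
    return buf.getvalue()


def save_wav(path: str, samples: Sequence[float], sample_rate: int = SAMPLE_RATE) -> None:
    """Write the encoded WAV bytes to path."""
    with open(path, "wb") as f:
        f.write(to_wav_bytes(samples, sample_rate))
```

`save_wav("out.wav", waveform)` accepts any sequence of floats, including a 1-D NumPy array, so it can take a generated waveform directly.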
Architecture
  • mcp_server/main.py: MCP tool definitions and input validation
  • image_gen/image_generator.py: SSD-1B pipeline and batched image generation
  • audio_gen/audio_generator.py: AudioLDM2 loading and audio generation
  • config.py: defaults and limits (including image batch limit)
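The split between `config.py` (limits) and `mcp_server/main.py` (input validation) can be sketched as a small check that runs before any prompts reach the SSD-1B pipeline. `MAX_IMAGE_BATCH`, its value of 4, and `validate_image_prompts` are hypothetical. The repository's actual constant name and limit may differ.

```python
from typing import List, Sequence

# Hypothetical stand-in for the image batch limit defined in config.py;
# the real name and value in the repository may differ.
MAX_IMAGE_BATCH = 4


def validate_image_prompts(prompts: Sequence[str], max_batch: int = MAX_IMAGE_BATCH) -> List[str]:
    """Return cleaned prompts, or raise ValueError for input the tool should reject."""
    if isinstance(prompts, str):
        # Treat a bare string as a single-prompt batch.
        prompts = [prompts]
    cleaned = []
    for i, p in enumerate(prompts):
        if not isinstance(p, str) or not p.strip():
            raise ValueError(f"prompt {i} must be a non-empty string")
        cleaned.append(p.strip())
    if not cleaned:
        raise ValueError("at least one prompt is required")
    if len(cleaned) > max_batch:
        raise ValueError(f"got {len(cleaned)} prompts; the batch limit is {max_batch}")
    return cleaned
```

Rejecting oversized batches up front, rather than silently truncating them, lets the tool return a clear error to the MCP client before any GPU memory is committed.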
Requirements
  • Python 3.8+
  • Recommended for image generation: CUDA GPU with 6GB+ VRAM
  • Recommended RAM: 16GB+
  • For audio proces

Tags

audioldm, generative-ai, local-ai, mcp-server, model-context-protocol, stable-diffusion, text-to-audio, text-to-image


Quality Signals

Installs: 0
Last updated: 24 days ago
Security: A

Safety

Risk Level: medium
Data Access: read
Network Access: none

Details

Source: github-crawl
Last commit: 3/23/2026
