MCP_server_local_image_and_music_generation

MCP Tool

davidemodolo/MCP_server_local_image_and_music_generation

A local, open-source MCP (Model Context Protocol) server for high-quality batched image and audio generation using SSD-1B and AudioLDM 2.

Install

$ npx loaditout add davidemodolo/MCP_server_local_image_and_music_generation

Platform-specific configuration:

.claude/settings.json

{
  "mcpServers": {
    "MCP_server_local_image_and_music_generation": {
      "command": "npx",
      "args": [
        "-y",
        "MCP_server_local_image_and_music_generation"
      ]
    }
  }
}

Add the config above to .claude/settings.json under the mcpServers key.
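If you would rather script that step than edit the file by hand, the merge can be sketched as below. This is a minimal sketch, not part of the project: the helper name `add_mcp_server` is hypothetical, and it assumes the settings file is plain JSON that can be rewritten with two-space indentation.

```python
import json
from pathlib import Path

SERVER_NAME = "MCP_server_local_image_and_music_generation"


def add_mcp_server(settings_path: Path) -> dict:
    """Merge the server entry into settings_path, keeping any existing keys.

    Hypothetical helper for illustration; not shipped with the project.
    """
    settings = {}
    if settings_path.exists():
        # Treat an empty file the same as an empty JSON object.
        settings = json.loads(settings_path.read_text(encoding="utf-8") or "{}")

    # Create the mcpServers key if it is missing, then add or overwrite our entry.
    servers = settings.setdefault("mcpServers", {})
    servers[SERVER_NAME] = {"command": "npx", "args": ["-y", SERVER_NAME]}

    settings_path.parent.mkdir(parents=True, exist_ok=True)
    settings_path.write_text(json.dumps(settings, indent=2) + "\n", encoding="utf-8")
    return settings


if __name__ == "__main__":
    add_mcp_server(Path(".claude") / "settings.json")
```

Other entries already present under mcpServers are left untouched; only the entry for this server is added or replaced.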

About

Local MCP Server for Batched Image and Audio Generation

This repository provides a local MCP (Model Context Protocol) server that exposes open-source image and audio generation tools. Inference runs on your machine using fast, lightweight models.

What You Get

Image Generation: Driven by Segmind SSD-1B, a distilled SDXL model that produces high-quality 1024x1024 images at half the size of SDXL. Batched prompts are supported.
Audio Generation: Driven by AudioLDM 2 for generating music, sound effects, and speech-style prompts.
stdio MCP server for integration with MCP-compatible clients.
Local-first workflow (models downloaded on first run, outputs saved locally).
Unified, clean dependencies using Hugging Face diffusers.

Supported Models

Image Generation

Model: segmind/SSD-1B
Why: A distilled version of Stable Diffusion XL (SDXL), about 50% smaller and 60% faster than SDXL. The project credits it with strong prompt adherence, lighting, and anatomy, and with the ability to render legible text.
Download Size: ~2.5GB
VRAM Requirement: ~4GB+ for GPU-accelerated generation.
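Batched generation with this model can be sketched with the diffusers `StableDiffusionXLPipeline`, which accepts a list of prompts. This is an illustrative sketch, not the code in image_gen/image_generator.py: the batch limit of 4, the function names, and the output file naming are assumptions.

```python
from typing import Iterator, List

MAX_BATCH_SIZE = 4  # assumed value; the project keeps its real limit in config.py


def chunk_prompts(prompts: List[str], batch_size: int = MAX_BATCH_SIZE) -> Iterator[List[str]]:
    """Split prompts into consecutive batches of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(prompts), batch_size):
        yield prompts[start:start + batch_size]


def generate_images(prompts: List[str], out_dir: str = "outputs") -> List[str]:
    """Generate one image per prompt and return the saved file paths."""
    # Heavy imports are deferred so the batching helper works without them.
    import os

    import torch
    from diffusers import StableDiffusionXLPipeline

    use_cuda = torch.cuda.is_available()
    pipe = StableDiffusionXLPipeline.from_pretrained(
        "segmind/SSD-1B",  # weights are downloaded on first use
        torch_dtype=torch.float16 if use_cuda else torch.float32,
        use_safetensors=True,
        variant="fp16",
    )
    pipe = pipe.to("cuda" if use_cuda else "cpu")

    os.makedirs(out_dir, exist_ok=True)
    paths: List[str] = []
    for batch in chunk_prompts(prompts):
        # Passing a list of prompts yields one image per prompt.
        for image in pipe(prompt=batch).images:
            path = os.path.join(out_dir, f"image_{len(paths):04d}.png")
            image.save(path)
            paths.append(path)
    return paths
```

Capping the batch size keeps peak VRAM use bounded, since each extra prompt in a batch adds to the memory needed for a single pipeline call.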

Audio Generation

Model: cvssp/audioldm2
Why: Replaces earlier autoregressive generators like MusicGen. AudioLDM 2 handles a wide range of tasks, including complex music, environmental sound effects, and speech synthesis, in a single model.
Download Size: ~1.1GB
VRAM Requirement: ~2GB+
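A text-to-audio call with this model can be sketched with the diffusers `AudioLDM2Pipeline`. This is a sketch under stated assumptions, not the code in audio_gen/audio_generator.py: the duration limits, the function names, and the output path are hypothetical, and the 16 kHz sample rate follows the diffusers usage example for this pipeline.

```python
def clamp_duration(seconds: float, min_seconds: float = 1.0, max_seconds: float = 30.0) -> float:
    """Keep the requested clip length inside assumed limits (not the project's)."""
    return max(min_seconds, min(float(seconds), max_seconds))


def generate_audio(prompt: str, seconds: float = 10.0, out_path: str = "output.wav") -> str:
    """Generate one clip for prompt and write it to out_path as a WAV file."""
    # Heavy imports are deferred so clamp_duration works without them.
    import scipy.io.wavfile
    import torch
    from diffusers import AudioLDM2Pipeline

    use_cuda = torch.cuda.is_available()
    pipe = AudioLDM2Pipeline.from_pretrained(
        "cvssp/audioldm2",  # weights are downloaded on first use
        torch_dtype=torch.float16 if use_cuda else torch.float32,
    )
    pipe = pipe.to("cuda" if use_cuda else "cpu")

    audio = pipe(
        prompt,
        num_inference_steps=200,
        audio_length_in_s=clamp_duration(seconds),
    ).audios[0]

    # The pipeline returns a waveform sampled at 16 kHz.
    scipy.io.wavfile.write(out_path, rate=16000, data=audio)
    return out_path
```

A call such as `generate_audio("gentle rain on a tin roof", seconds=5)` would write a five-second clip to output.wav.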

Architecture

mcp_server/main.py: MCP tool definitions and input validation
image_gen/image_generator.py: SSD-1B pipeline and batched image generation
audio_gen/audio_generator.py: AudioLDM 2 loading and audio generation
config.py: defaults and limits (including image batch limit)
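To show how tool definitions, input validation, and a configured batch limit can fit together, here is a minimal sketch using `FastMCP` from the official MCP Python SDK. It is not the project's mcp_server/main.py: the tool name, the server name, the limit of 4, and the placeholder return value are all assumptions.

```python
from typing import List

MAX_IMAGE_BATCH = 4  # assumed default; the project defines its own in config.py


def validate_prompts(prompts: List[str], limit: int = MAX_IMAGE_BATCH) -> List[str]:
    """Return stripped prompts, or raise ValueError if the request is invalid."""
    if not isinstance(prompts, list) or not prompts:
        raise ValueError("prompts must be a non-empty list")
    cleaned = [p.strip() for p in prompts if isinstance(p, str) and p.strip()]
    if len(cleaned) != len(prompts):
        raise ValueError("every prompt must be a non-empty string")
    if len(cleaned) > limit:
        raise ValueError(f"at most {limit} prompts per request")
    return cleaned


def build_server():
    """Create an MCP server exposing one illustrative tool."""
    # Imported here so validate_prompts can be used without the SDK installed.
    from mcp.server.fastmcp import FastMCP

    server = FastMCP("local-image-and-audio")  # hypothetical server name

    @server.tool()
    def generate_images(prompts: List[str]) -> str:
        """Validate a batch of prompts before handing them to the image pipeline."""
        cleaned = validate_prompts(prompts)
        # Placeholder: a real server would call the image generator here.
        return f"accepted {len(cleaned)} prompt(s)"

    return server


if __name__ == "__main__":
    # stdio transport lets MCP-compatible clients launch the server as a subprocess.
    build_server().run(transport="stdio")
```

Validating before any model is loaded means a malformed request fails quickly, without paying for pipeline start-up.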

Requirements

Python 3.8+
Recommended for image generation: CUDA GPU with 6GB+ VRAM
Recommended RAM: 16GB+
For audio proces

Quality Signals

Last updated: 24 days ago
Security: A
README

Safety

Risk Level: medium
Data Access: read
Network Access: none

Details

Source: github-crawl
Last commit: 3/23/2026

Embed Badge

[![Loaditout](https://loaditout.ai/api/badge/davidemodolo/MCP_server_local_image_and_music_generation)](https://loaditout.ai/skills/davidemodolo/MCP_server_local_image_and_music_generation)
