
How to Run Stable Diffusion in the Cloud: Complete 2026 Guide

Deploy Stable Diffusion XL, ControlNet, and ComfyUI on cloud GPU servers. Complete setup guide with optimization tips for production image generation at scale.


Why Cloud GPUs Are Essential for Stable Diffusion

Stable Diffusion has transformed AI image generation, enabling anyone to create photorealistic images, art, and designs from text prompts. But running it effectively requires GPU acceleration — and cloud GPUs make this accessible without buying expensive hardware.

On a modern GPU like the NVIDIA A100, you can generate 512x512 images in under a second. At 1024x1024 resolution with Stable Diffusion XL, generation takes 2-5 seconds. Scale this to a production API serving hundreds of concurrent users, and cloud GPU infrastructure becomes the only practical choice.

Understanding Stable Diffusion Models

Before deploying, understand the model ecosystem:

Stable Diffusion 1.5 (SD 1.5)

The original public release that started the revolution. Requires only 4-6GB VRAM, making it highly accessible. Massive community support with thousands of fine-tuned models (LoRAs, checkpoints) on Hugging Face and Civitai.

Use case: Legacy workflows, maximum compatibility, smaller GPU instances

Stable Diffusion XL (SDXL)

A major upgrade with a two-stage pipeline: a base model at 1024x1024 resolution and a refiner model for detail enhancement. Requires 8-16GB VRAM for the base model.

Use case: High-quality commercial image generation, photography-style outputs

SDXL Turbo and LCM (Latent Consistency Models)

Distilled models that generate images in 1-4 steps instead of 20-50. This provides up to 10x speed improvement at the cost of some quality.

Use case: Real-time generation, interactive applications, high-throughput APIs

FLUX.1

The latest generation from Black Forest Labs (founded by former Stability AI researchers). Significantly improved text rendering, composition, and photorealism over SDXL. Requires 16-24GB VRAM for the full model.

Use case: State-of-the-art quality for commercial applications

Setting Up ComfyUI on a Cloud GPU

ComfyUI is the most powerful and flexible Stable Diffusion interface, built on a node-based workflow system.

Initial Setup

# Clone ComfyUI
git clone https://github.com/comfyanonymous/ComfyUI
cd ComfyUI

# Install dependencies
pip install -r requirements.txt

# Install additional nodes (optional but recommended)
cd custom_nodes
git clone https://github.com/ltdrdata/ComfyUI-Manager
cd ..

# Start ComfyUI
python main.py --listen 0.0.0.0 --port 8188

Configure for Remote Access

Since you're running on a cloud server, you need to access the UI remotely:

# Option 1: SSH tunnel (most secure)
ssh -L 8188:localhost:8188 user@your-gpu-server-ip

# Option 2: Nginx reverse proxy with authentication
# Install nginx and certbot for HTTPS
apt install nginx certbot python3-certbot-nginx
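For option 2, a minimal nginx server block with HTTP basic auth might look like the following sketch. The domain, certificate paths, and htpasswd file are placeholders you would create yourself (the certificates via certbot, the password file via `htpasswd`):

```nginx
server {
    listen 443 ssl;
    server_name comfyui.example.com;  # placeholder domain

    # Certificate paths as issued by certbot for the domain above
    ssl_certificate /etc/letsencrypt/live/comfyui.example.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/comfyui.example.com/privkey.pem;

    location / {
        auth_basic "ComfyUI";
        auth_basic_user_file /etc/nginx/.htpasswd;  # created with htpasswd
        proxy_pass http://127.0.0.1:8188;

        # WebSocket support (ComfyUI's UI streams progress over websockets)
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        proxy_set_header Host $host;
    }
}
```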

Downloading and Managing Models

Using Hugging Face CLI

pip install huggingface_hub

# Download SDXL base model
python -c "
from huggingface_hub import hf_hub_download
hf_hub_download(
    repo_id='stabilityai/stable-diffusion-xl-base-1.0',
    filename='sd_xl_base_1.0.safetensors',
    local_dir='./models/checkpoints'
)
"

Organizing Your Model Directory

ComfyUI/
├── models/
│   ├── checkpoints/      # Main SDXL/SD models
│   ├── loras/            # LoRA fine-tuning adapters
│   ├── controlnet/       # ControlNet models
│   ├── vae/              # VAE models
│   ├── upscale_models/   # ESRGAN, etc.
│   └── embeddings/       # Textual inversions
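On a fresh instance it is easy to misplace a download, so a small helper that creates the layout above can be useful. This is a convenience sketch, not part of ComfyUI itself; `ensure_model_dirs` and its default root path are names chosen here:

```python
from pathlib import Path

# Subdirectories ComfyUI expects under models/, matching the layout above
MODEL_DIRS = [
    "checkpoints", "loras", "controlnet",
    "vae", "upscale_models", "embeddings",
]

def ensure_model_dirs(root: str = "ComfyUI/models") -> list[str]:
    """Create any missing model subdirectories and return their paths."""
    created = []
    for name in MODEL_DIRS:
        path = Path(root) / name
        path.mkdir(parents=True, exist_ok=True)
        created.append(str(path))
    return created
```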

Building a Production Image Generation API

For serving image generation at scale, you need an API layer:

Using the ComfyUI API

ComfyUI has a built-in REST API. Here's how to call it programmatically:

import requests
import json
import uuid
import websocket
import threading

COMFYUI_URL = "http://localhost:8188"

def generate_image(prompt: str, negative_prompt: str = "", steps: int = 20):
    # Load your workflow JSON (export from ComfyUI in API format)
    with open("workflow.json") as f:
        workflow = json.load(f)

    # Modify prompt nodes (node IDs depend on your exported workflow;
    # "steps" could be wired into the sampler node the same way)
    workflow["6"]["inputs"]["text"] = prompt
    workflow["7"]["inputs"]["text"] = negative_prompt

    # Submit to queue
    client_id = str(uuid.uuid4())
    response = requests.post(
        f"{COMFYUI_URL}/prompt",
        json={"prompt": workflow, "client_id": client_id},
    )
    prompt_id = response.json()["prompt_id"]

    # Wait for completion via websocket
    # ... (implementation details)

    return get_image(prompt_id)
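Once a prompt finishes, ComfyUI exposes the results through its `/history/{prompt_id}` endpoint, and the image files themselves can then be fetched from `/view`. As a minimal sketch, a helper that pulls the output image entries out of a history response (the response shape shown reflects ComfyUI's history format; the node IDs and filenames in the usage below are illustrative):

```python
def extract_image_files(history: dict, prompt_id: str) -> list[dict]:
    """Collect output image entries from a ComfyUI /history response.

    The history maps prompt_id -> {"outputs": {node_id: {"images": [...]}}};
    each image entry carries filename/subfolder/type, which identify the
    file to request from the /view endpoint.
    """
    outputs = history.get(prompt_id, {}).get("outputs", {})
    images = []
    for node_output in outputs.values():
        images.extend(node_output.get("images", []))
    return images
```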

Scaling with Multiple GPU Workers

For production scale, run multiple ComfyUI instances:

# Worker 1 on GPU 0
CUDA_VISIBLE_DEVICES=0 python main.py --port 8188

# Worker 2 on GPU 1
CUDA_VISIBLE_DEVICES=1 python main.py --port 8189

Use a load balancer (nginx upstream or a Python queue) to distribute requests across workers.
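The simplest Python-side dispatcher is round-robin. A minimal sketch, assuming the two worker instances started above (a production setup would add health checks and route by per-worker queue depth instead):

```python
import itertools

class RoundRobinBalancer:
    """Hand out ComfyUI worker URLs in rotation for request dispatch."""

    def __init__(self, worker_urls: list[str]):
        if not worker_urls:
            raise ValueError("need at least one worker URL")
        self._cycle = itertools.cycle(worker_urls)

    def next_worker(self) -> str:
        return next(self._cycle)

balancer = RoundRobinBalancer([
    "http://localhost:8188",  # worker on GPU 0
    "http://localhost:8189",  # worker on GPU 1
])
```

Each incoming request then posts its workflow to `balancer.next_worker()` rather than a fixed URL.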

Optimization Techniques for Maximum Throughput

Enable xFormers

xFormers provides memory-efficient attention and can improve generation speed significantly:

pip install xformers --pre --index-url https://download.pytorch.org/whl/nightly/cu121

# ComfyUI will automatically use xformers if installed
python main.py

# Alternative if xformers has issues
python main.py --use-pytorch-cross-attention

Optimize Batch Size

Processing multiple images in a batch is more efficient than sequential generation. In ComfyUI, batch size is an input of the Empty Latent Image node (not the KSampler):

# In the default workflow, node "5" is the Empty Latent Image node
workflow["5"]["inputs"]["batch_size"] = 4  # Generate 4 images at once

Use Float16 Precision

# Start ComfyUI with reduced precision (fp16 VAE, bf16 UNet) for faster generation
python main.py --fp16-vae --bf16-unet

Implement Request Queuing

For API services, implement proper queuing to prevent GPU memory overflow:

from queue import Queue
from threading import Thread

request_queue = Queue(maxsize=50)  # bounded queue applies backpressure

def worker():
    while True:
        request = request_queue.get()
        result = generate_image(request["prompt"])
        request["callback"](result)
        request_queue.task_done()

# Start worker thread
Thread(target=worker, daemon=True).start()

Advanced Workflows: ControlNet and IP-Adapter

ControlNet for Precise Control

ControlNet allows you to control image composition using:

  • Canny edges: Match the outline of reference images
  • Depth maps: Control 3D composition
  • Pose estimation: Match human poses
  • Segmentation: Control region-by-region content

# ControlNet preprocessing
from controlnet_aux import CannyDetector

detector = CannyDetector()
# reference_image is a PIL image loaded beforehand
control_image = detector(reference_image, low_threshold=100, high_threshold=200)

IP-Adapter for Style Transfer

IP-Adapter allows you to use an image as a style reference while generating with a text prompt:

# Download IP-Adapter models
cd ComfyUI/models/ipadapter
wget https://huggingface.co/h94/IP-Adapter/resolve/main/models/ip-adapter-plus_sd15.bin

Performance Benchmarks

On NVIDIA A100 80GB with ComfyUI (SDXL, 20 steps, 1024x1024):

Configuration         | Time per Image | Images/Hour
SDXL Standard         | ~5 seconds     | ~720
SDXL + xFormers       | ~3.5 seconds   | ~1,030
SDXL Turbo (4 steps)  | ~0.8 seconds   | ~4,500
FLUX.1-schnell        | ~2 seconds     | ~1,800
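The images-per-hour column follows directly from per-image latency, so capacity planning reduces to simple arithmetic. A small helper for projecting throughput across one or more workers (`images_per_hour` is a name chosen here, not a library function):

```python
def images_per_hour(seconds_per_image: float, num_workers: int = 1) -> int:
    """Project hourly throughput from per-image generation latency."""
    if seconds_per_image <= 0:
        raise ValueError("seconds_per_image must be positive")
    return int(3600 / seconds_per_image) * num_workers
```

For example, at ~5 seconds per SDXL image a single worker yields ~720 images/hour, and two workers roughly double that.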

Conclusion

Running Stable Diffusion on cloud GPUs gives you access to state-of-the-art image generation capabilities without the upfront cost of purchasing GPU hardware. Cloud infrastructure enables you to scale from a single development instance to a production API serving thousands of concurrent users.

FAQ

Q: What's the minimum GPU for running Stable Diffusion XL?

A: SDXL requires at least 8GB VRAM for basic operation. 16GB is recommended for comfortable workflows with ControlNet and refiners enabled.

Q: Can I run multiple models simultaneously?

A: Yes, with sufficient VRAM. SDXL base (5.1GB) + refiner (6.7GB) can fit in 16GB VRAM. Keep models loaded in VRAM for faster switching.

Q: How do I handle NSFW content filtering?

A: Use the built-in CLIP safety checker or implement a separate content moderation API. Responsible use of generative AI is critical for any production deployment.
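A keyword blocklist is no substitute for a real moderation model, but it illustrates the gating pattern: reject a request before spending GPU time on it. A hedged sketch, where the blocked term and function name are placeholders:

```python
BLOCKED_TERMS = {"example_blocked_term"}  # placeholder; use a moderation API in production

def is_prompt_allowed(prompt: str) -> bool:
    """Cheap pre-filter run before queuing a generation request.

    A real deployment should also scan the generated images themselves,
    e.g. with a safety checker, since prompts alone are easy to evade.
    """
    lowered = prompt.lower()
    return not any(term in lowered for term in BLOCKED_TERMS)
```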

Maria Santos

ML Engineer & AI Researcher

Maria is a machine learning engineer with a PhD in AI from USP. She specializes in LLM fine-tuning and inference optimization, and has published research on efficiency in large language models.

Published: March 1, 2026

Updated: March 1, 2026


Apply This Knowledge Today

Deploy your GPU server and put these techniques into practice. Referral credits subject to Vultr's official program terms.