Realistic Vision V6.0 Inpainting - CoreML
CoreML conversion of Realistic Vision V6.0 Inpainting optimized for Apple Silicon devices (iPhone, iPad, Mac).
Model Details
| Property | Value |
|---|---|
| Base Model | Stable Diffusion 1.5 Inpainting |
| Fine-tune | Realistic Vision V6.0 |
| Resolution | 512x512 |
| UNet Channels | 9 (latent + mask + masked image) |
| Prediction Type | Epsilon |
| Attention | SPLIT_EINSUM (optimized for ANE) |
| Safety Checker | Included |
Files
| File | Size | Description |
|---|---|---|
| realistic-vision-inpaint-safe.zip | 2.4 GB | Full model with NSFW safety checker |
| realistic-vision-inpaint-coreml.zip | 2.0 GB | Model without safety checker (legacy) |
Bundle Contents
Resources/
├── TextEncoder.mlmodelc # CLIP text encoder
├── Unet.mlmodelc # 9-channel inpainting UNet
├── VAEDecoder.mlmodelc # Latent to image decoder
├── VAEEncoder.mlmodelc # Image to latent encoder
├── SafetyChecker.mlmodelc # NSFW content filter
├── vocab.json # Tokenizer vocabulary
└── merges.txt # BPE merges
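Before loading the models, it can help to confirm that an unzipped bundle is complete. The following is a minimal Python sketch of such a check, not part of this repository. The file names come from the listing above; the helper name `missing_bundle_items` is hypothetical, and the sketch assumes the legacy archive differs only by omitting `SafetyChecker.mlmodelc`.

```python
from pathlib import Path

# Entries taken from the bundle listing above.
REQUIRED = [
    "TextEncoder.mlmodelc",
    "Unet.mlmodelc",
    "VAEDecoder.mlmodelc",
    "VAEEncoder.mlmodelc",
    "vocab.json",
    "merges.txt",
]
# Assumption: only the "-safe" archive ships this entry.
OPTIONAL = ["SafetyChecker.mlmodelc"]


def missing_bundle_items(resources_dir, require_safety_checker=False):
    """Return the expected bundle entries that are absent from resources_dir."""
    root = Path(resources_dir)
    expected = REQUIRED + (OPTIONAL if require_safety_checker else [])
    # .mlmodelc entries are directories, so test existence rather than is_file().
    return [name for name in expected if not (root / name).exists()]
```

An empty list means every expected entry was found; pass `require_safety_checker=True` when the app depends on the NSFW filter.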
Usage
This model is designed for use with iOS/macOS apps using CoreML. It requires a custom inpainting pipeline that:
- Encodes the input image to latent space using VAEEncoder
- Prepares a 9-channel input: [noised_latent(4) + mask(1) + masked_image_latent(4)]
- Runs denoising with the UNet
- Decodes the result with VAEDecoder
- Checks output with SafetyChecker (optional but recommended)
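The 9-channel layout can be illustrated with a small dependency-free Python sketch. It only shows the channel concatenation order from the list above, not the denoising loop or any CoreML call. The helper names `make_channels` and `build_unet_input` are hypothetical, and the 64x64 latent size assumes the standard Stable Diffusion 1.5 VAE downsampling factor of 8 applied to 512x512 images.

```python
def make_channels(count, height, width, fill=0.0):
    """Build `count` channels, each a height x width grid filled with `fill`."""
    return [[[fill] * width for _ in range(height)] for _ in range(count)]


def build_unet_input(noised_latent, mask, masked_image_latent):
    """Concatenate along the channel axis: 4 + 1 + 4 = 9 channels."""
    if len(noised_latent) != 4 or len(mask) != 1 or len(masked_image_latent) != 4:
        raise ValueError("expected 4 + 1 + 4 channels")
    channels = noised_latent + mask + masked_image_latent
    # Every channel must share the same spatial size.
    sizes = {(len(ch), len(ch[0])) for ch in channels}
    if len(sizes) != 1:
        raise ValueError("all channels must share one spatial size")
    return channels


LATENT = 512 // 8  # assumed VAE downsampling factor of 8
unet_input = build_unet_input(
    make_channels(4, LATENT, LATENT),            # noised latent
    make_channels(1, LATENT, LATENT, fill=1.0),  # mask at latent resolution
    make_channels(4, LATENT, LATENT),            # masked image latent
)
print(len(unet_input), len(unet_input[0]), len(unet_input[0][0]))  # 9 64 64
```

In a real pipeline the placeholder grids would be replaced by the VAEEncoder output, the downsampled mask, and the scheduler's noised latent.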
Input Format
- Image: 512x512 RGB
- Mask: 512x512 grayscale (white = regenerate, black = keep)
- Prompt: Text description of desired content in masked area
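The mask convention (white = regenerate, black = keep) has to be carried from 512x512 pixels to the latent grid. The sketch below shows one possible way to do this in plain Python. The threshold of 128, the reduction factor of 8, and the use of max pooling are assumptions made for illustration; other pipelines resize the mask by interpolation instead, and this README does not specify which method the author's pipeline uses.

```python
def binarize(mask, threshold=128):
    """Map grayscale 0-255 values to 1 (regenerate) or 0 (keep)."""
    return [[1 if px >= threshold else 0 for px in row] for row in mask]


def downsample_mask(mask, factor=8):
    """Max-pool by `factor`: a latent cell is regenerated if any pixel in its block is white."""
    height, width = len(mask), len(mask[0])
    if height % factor or width % factor:
        raise ValueError("mask size must be a multiple of the factor")
    return [
        [
            max(mask[y + dy][x + dx] for dy in range(factor) for dx in range(factor))
            for x in range(0, width, factor)
        ]
        for y in range(0, height, factor)
    ]


# Example: a white 100x100 square on a black 512x512 mask.
mask = [
    [255 if 100 <= y < 200 and 100 <= x < 200 else 0 for x in range(512)]
    for y in range(512)
]
latent_mask = downsample_mask(binarize(mask))
print(len(latent_mask), len(latent_mask[0]), sum(map(sum, latent_mask)))  # 64 64 169
```

Max pooling errs on the side of regenerating slightly more than the painted area, which avoids leaving a thin ring of original pixels along the mask edge.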
Performance
| Device | Generation Time (20 steps) |
|---|---|
| iPhone 15 Pro | ~15-20 seconds |
| M1 Mac | ~10-15 seconds |
| M2/M3 Mac | ~8-12 seconds |
Safety Checker
The realistic-vision-inpaint-safe.zip archive includes a CLIP-based safety checker that filters NSFW content. When it is integrated:
- Generated images are analyzed before being returned
- NSFW content is blocked with an error
- Safe content passes through normally
Recommended for App Store distribution.
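The gating behavior described above can be sketched as a thin wrapper around the generation result. This is an illustrative Python outline only: `UnsafeContentError`, `return_if_safe`, and the `is_nsfw` callable are hypothetical names, with `is_nsfw` standing in for whatever code runs SafetyChecker.mlmodelc in the host app.

```python
class UnsafeContentError(Exception):
    """Raised when a generated image is flagged as NSFW."""


def return_if_safe(image, is_nsfw):
    """Analyze the image before returning it; block flagged images with an error."""
    # `is_nsfw` is a hypothetical callable that wraps the safety checker model.
    if is_nsfw(image):
        raise UnsafeContentError("generated image was flagged by the safety checker")
    return image
```

Raising an error rather than returning a blank image keeps the blocked case explicit, so the calling app can decide how to report it to the user.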
License
This model is released under the CreativeML Open RAIL-M License.
You CAN:
- Use commercially
- Redistribute
- Modify and create derivatives
You MUST:
- Include license and attribution
- Not use for illegal purposes
- Not generate content exploiting minors
- Not use for harassment or deception
Attribution
- Original Model: Realistic Vision V6.0 by SG_161222
- Inpainting Variant: stablediffusionapi/realistic-vision-v6.0-b1-inpaint
- CoreML Conversion: Using Apple ml-stable-diffusion
Conversion Details
Converted using Apple's ml-stable-diffusion toolkit:
python -m python_coreml_stable_diffusion.torch2coreml \
    --model-version stablediffusionapi/realistic-vision-v6.0-b1-inpaint \
    --convert-unet \
    --convert-text-encoder \
    --convert-vae-decoder \
    --convert-vae-encoder \
    --convert-safety-checker \
    --attention-implementation SPLIT_EINSUM \
    --bundle-resources-for-swift-cli \
    -o output