Add files using upload-large-folder tool
README.md CHANGED

@@ -1,4 +1,4 @@
-<!-- README Version: v1.0 -->
+<!-- README Version: v1.1 -->
 
 ---
 license: apache-2.0
@@ -12,6 +12,8 @@ tags:
 - fp16
 - diffusion
 - stable-diffusion
+- ip-adapter
+- style-transfer
 base_model: black-forest-labs/FLUX.1-dev
 ---
 
@@ -48,8 +50,12 @@ flux-dev-fp16/
 │   └── t5xxl_fp16.safetensors (9.2 GB)    # T5-XXL text encoder
 ├── clip/
 │   └── t5xxl_fp16.safetensors (9.2 GB)    # T5-XXL encoder (alternate location)
-
-
+├── clip_vision/
+│   └── clip_vision_h.safetensors (1.2 GB) # CLIP vision encoder
+├── vae/flux/
+│   └── flux-vae-bf16.safetensors (160 MB) # VAE decoder in BF16 precision
+└── ipadapter-flux/
+    └── ip-adapter.bin (5.0 GB)            # IP-Adapter for image prompting
 
 Total Repository Size: 72 GB
 ```
@@ -58,6 +64,8 @@ Total Repository Size: 72 GB
 - **Main Model**: `flux1-dev-fp16.safetensors` (23 GB) - Core diffusion transformer
 - **Text Encoders**: CLIP-L, CLIP-G, T5-XXL for advanced text understanding
 - **Vision Encoder**: CLIP vision model for image understanding capabilities
+- **VAE**: `flux-vae-bf16.safetensors` (160 MB) - Variational autoencoder for latent/image conversion
+- **IP-Adapter**: `ip-adapter.bin` (5.0 GB) - Image prompt adapter for style transfer and image conditioning
 
 ## Hardware Requirements
 
@@ -180,6 +188,45 @@ image = pipe(
 image.save("optimized_output.png")
 ```
 
+### IP-Adapter Image Prompting
+
+```python
+import torch
+from PIL import Image
+from diffusers import FluxPipeline
+from ip_adapter import IPAdapter
+
+# Load FLUX pipeline
+pipe = FluxPipeline.from_single_file(
+    "E:/huggingface/flux-dev-fp16/checkpoints/flux/flux1-dev-fp16.safetensors",
+    torch_dtype=torch.float16
+)
+pipe.to("cuda")
+
+# Load IP-Adapter for image conditioning
+ip_adapter = IPAdapter(
+    pipe,
+    image_encoder_path="E:/huggingface/flux-dev-fp16/clip_vision",
+    ip_ckpt="E:/huggingface/flux-dev-fp16/ipadapter-flux/ip-adapter.bin",
+    device="cuda"
+)
+
+# Load reference image (as a PIL image) for style/composition transfer
+reference_image = Image.open("reference_style.jpg")
+
+# Generate image with text prompt + image reference
+image = ip_adapter.generate(
+    pil_image=reference_image,
+    prompt="A landscape in the style of the reference image",
+    num_inference_steps=50,
+    guidance_scale=7.5,
+    scale=0.6,  # IP-Adapter influence strength (0.0-1.0)
+    height=1024, width=1024
+)[0]
+
+image.save("style_transfer_output.png")
+```
+
 ## Model Specifications
 
 | Specification | Details |
@@ -200,6 +247,7 @@ image.save("optimized_output.png")
 - Multi-aspect ratio generation
 - Img2img workflows
 - Inpainting and outpainting
+- IP-Adapter image prompting and style transfer
 - ControlNet compatibility
 - LoRA fine-tuning support
 
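Of the two weight files this commit adds, only the IP-Adapter gets a usage example in the README; the standalone VAE can be exercised the same way. Below is a minimal sketch, assuming the local `E:/huggingface/flux-dev-fp16/` layout from the directory tree and that your diffusers version supports `from_single_file` loading for FLUX-style VAE checkpoints: `flux-vae-bf16.safetensors` is loaded separately and handed to the pipeline.

```python
import torch
from diffusers import AutoencoderKL, FluxPipeline

# Load the standalone VAE; the path mirrors the repository layout above.
vae = AutoencoderKL.from_single_file(
    "E:/huggingface/flux-dev-fp16/vae/flux/flux-vae-bf16.safetensors",
    torch_dtype=torch.bfloat16,
)

# Pass the externally loaded VAE to the pipeline in place of the baked-in one.
pipe = FluxPipeline.from_single_file(
    "E:/huggingface/flux-dev-fp16/checkpoints/flux/flux1-dev-fp16.safetensors",
    vae=vae,
    torch_dtype=torch.float16,
)
pipe.to("cuda")
```

Keeping the VAE in BF16 while the transformer runs in FP16 is a common split; cast with `vae.to(torch.float16)` if your setup needs a uniform dtype.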