📂 Part of the Lance MLX collection on mlx-community.
Lance-3B-8bit (MLX, image specialist, 8-bit quantized)
An 8-bit groupwise affine quantization of mlx-community/Lance-3B-bf16, the image-specialist Lance checkpoint. It was produced with mlx-lm's quantize_model utility and a per-tower skip predicate: time_embedder, llm2vae, and vae_in_proj stay at bf16 for numerical safety, while the bulk LLM weights (attention projections, MLP, embeddings, lm_head) are quantized.
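The skip predicate described above can be sketched in plain Python. This is an illustration only, not the code used to build this repo: the function name `should_quantize`, the `SKIP_FRAGMENTS` tuple, and the example layer paths are assumptions, and the real predicate may match on different path strings.

```python
# Name fragments for the towers this card says are kept at bf16.
# Matching on substrings of the module path is an assumption for illustration.
SKIP_FRAGMENTS = ("time_embedder", "llm2vae", "vae_in_proj")


def should_quantize(path: str, in_features: int, group_size: int = 64) -> bool:
    """Return True if the linear layer at `path` should be quantized."""
    # Explicitly skipped towers stay at bf16.
    if any(fragment in path for fragment in SKIP_FRAGMENTS):
        return False
    # Groupwise quantization splits the input dimension into whole groups,
    # so layers whose input size is not a multiple of group_size are skipped.
    return in_features % group_size == 0


# Hypothetical layer paths, for demonstration only.
for path, dim in [("layers.0.self_attn.q_proj", 2048),
                  ("time_embedder.proj_in", 2048),
                  ("vae_in_proj.vae2llm", 48)]:
    print(path, should_quantize(path, dim))
```

A predicate of this shape could be adapted to the callable that MLX's `nn.quantize` accepts as `class_predicate`, which receives the module path and the module itself.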
Status
🟢 Production-ready for image tasks on Apple Silicon as of 2026-05-21.
| Capability | Status | Speedup vs bf16 |
|---|---|---|
| t2i (text → image) | ✅ Photorealistic, prompt-aligned | ~2.7× faster (75 s vs 201 s for 768² × 30 steps × CFG=4.0) |
| image_edit (instruction-based) | ✅ Identity + style preservation | ~2.5× faster expected |
| x2t_image (image VQA) | ✅ Content-correct | similar / faster |
Memory footprint: 6.59 GB on disk (53% of the bf16 12.37 GB). Runtime RAM ~8–10 GB, comfortable on a 16 GB Mac.
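As a quick sanity check, the size ratio and the t2i speedup quoted above follow directly from the raw figures in this section. The variable names below are only for illustration.

```python
# Figures copied from the Status section of this card.
bf16_gb = 12.37          # bf16 checkpoint size on disk
q8_gb = 6.59             # 8-bit checkpoint size on disk
bf16_seconds = 201       # t2i, 768 x 768, 30 steps, CFG=4.0
q8_seconds = 75          # same settings, 8-bit

size_ratio = q8_gb / bf16_gb
speedup = bf16_seconds / q8_seconds

print(f"{size_ratio:.0%} of bf16 size, {speedup:.1f}x faster")
# prints "53% of bf16 size, 2.7x faster"
```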
Quality notes vs bf16
- Photorealism + content fidelity preserved. Cats, dragons, portraits, etc., all generate cleanly.
- Fine text on generated objects shows slight degradation. E.g. "STOP" on a sign may render as "SNICS" or similar near-miss. The content is otherwise correct (correct color, correct rectangular sign shape, recognizable text-like glyphs).
- For prompts that don't require legible in-image text, output is visually indistinguishable from bf16 to a casual eye.
Quickstart
from huggingface_hub import snapshot_download
weights = snapshot_download("mlx-community/Lance-3B-8bit")
Text-to-image
from lance_mlx.pipeline.t2i import TextToImagePipeline
pipe = TextToImagePipeline.from_pretrained(
    lance_weights_dir=weights,
    vae_safetensors=f"{weights}/vae.safetensors",
)
image = pipe.generate(
    "A photorealistic tabby cat in a sunlit window.",
    height=768, width=768, num_steps=30, cfg_scale=4.0, seed=42,
)
image.save("cat.png")
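To compare your own hardware against the timings in the Status table, a small wall-clock wrapper is enough. The helper name `timed` is hypothetical and not part of lance_mlx; the commented call assumes the `pipe` object from the snippet above.

```python
import time


def timed(fn, *args, **kwargs):
    """Call fn(*args, **kwargs) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


# With the pipeline from the Text-to-image snippet, usage might look like:
# image, seconds = timed(
#     pipe.generate,
#     "A photorealistic tabby cat in a sunlit window.",
#     height=768, width=768, num_steps=30, cfg_scale=4.0, seed=42,
# )
# print(f"generated in {seconds:.1f} s")

# Stand-in call so the sketch runs without the model installed.
total, seconds = timed(sum, [1, 2, 3])
print(total, seconds >= 0)
```

The first call after loading may include one-off warm-up cost, so timing a second run gives a fairer comparison.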
Image editing + VQA
Same API as the bf16 variant — ImageEditPipeline and UnderstandingPipeline both pick up the quantization block in config.json automatically via lance_mlx.model._loader.load_lance_model.
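If you want to confirm that a downloaded checkpoint carries the quantization block before loading it, the check can be sketched as below. The top-level key name `quantization` is an assumption based on the usual mlx-lm convention, and `read_quantization` is a hypothetical helper, not part of lance_mlx.

```python
import json


def read_quantization(config: dict):
    """Return (bits, group_size, mode) from a config dict, or None if absent."""
    block = config.get("quantization")  # assumed key name
    if not block:
        return None
    return block.get("bits"), block.get("group_size"), block.get("mode")


# Inline stand-in for config.json, using the values this card reports.
example = json.loads(
    '{"quantization": {"bits": 8, "group_size": 64, "mode": "affine"}}'
)
print(read_quantization(example))   # (8, 64, 'affine')
print(read_quantization({}))        # None
```

For a real checkpoint, load the dict with `json.load(open(f"{weights}/config.json"))` and pass it to the same function.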
What's quantized vs skipped
| Component | Quantization | Why |
|---|---|---|
| embed_tokens (151,936 × 2,048) | ✅ 8-bit | Big, tolerant |
| lm_head (151,936 × 2,048) | ✅ 8-bit | Big, used in AR decode only |
| 32 layers × q/k/v/o_proj (UND) | ✅ 8-bit | Bulk of LLM compute |
| 32 layers × q/k/v/o_proj_moe_gen (GEN) | ✅ 8-bit | Bulk of GEN compute |
| 32 layers × mlp.{up,gate,down}_proj | ✅ 8-bit | Bulk of LLM compute |
| 32 layers × mlp_moe_gen.{up,gate,down} | ✅ 8-bit | Bulk of GEN compute |
| time_embedder.proj_in/out | ❌ bf16 | Timestep info, numerically sensitive |
| llm2vae (flow head, 2048 × 48) | ❌ bf16 | Tiny + critical to flow prediction |
| vae_in_proj.vae2llm (2048 × 48) | ❌ bf16 | Auto-skipped (input_dim 48 is not a multiple of 64) |
| latent_pos_embed.pos_embed | ❌ bf16 | Custom param holder, no to_quantized |
| All RMSNorms + QK-norms | ❌ bf16 | F32 / bf16 norm scales preserved |
| Wan2.2 VAE (encoder + decoder) | ❌ bf16 | Pixel fidelity matters |
| Qwen2.5-VL ViT | ❌ bf16 | Semantic fidelity matters for x2t |
Recipe: 8-bit affine, group_size 64. quantization_report.json in this repo has full provenance.
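For readers unfamiliar with the recipe, here is a minimal pure-Python sketch of min/max affine quantization over a single group. It shows the general scheme, not MLX's actual kernel, and the function names are illustrative. The bits-per-weight estimate assumes one scale and one bias per group, each stored at 16 bits.

```python
def quantize_group(values, bits=8):
    """Affine-quantize one group: code = round((w - bias) / scale)."""
    lo, hi = min(values), max(values)
    levels = (1 << bits) - 1                 # 255 for 8-bit
    scale = (hi - lo) / levels or 1.0        # fall back to 1.0 for a constant group
    codes = [round((v - lo) / scale) for v in values]
    return codes, scale, lo                  # lo plays the role of the bias


def dequantize_group(codes, scale, bias):
    """Recover approximate weights: w ~ code * scale + bias."""
    return [c * scale + bias for c in codes]


def effective_bits(bits=8, group_size=64, meta_bits=16):
    """Bits per weight once the per-group scale and bias are counted."""
    return bits + 2 * meta_bits / group_size


group = [i / 63 - 0.5 for i in range(64)]    # one synthetic group of 64 weights
codes, scale, bias = quantize_group(group)
restored = dequantize_group(codes, scale, bias)
worst = max(abs(a - b) for a, b in zip(group, restored))

print(worst <= scale / 2 + 1e-9)             # rounding error is at most half a step
print(effective_bits())                      # 8.5
```

Under these assumptions, 8.5 bits per weight against 16 for bf16 gives about 53%, in line with the on-disk ratio reported in the Status section. The match is approximate, since model.safetensors also holds the tensors kept at bf16.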
Why no Video 8-bit yet
The video specialist (Lance_3B_Video) does not quantize cleanly to 8-bit with this recipe — t2v output collapses to a gray gradient regardless of whether the GEN tower is included or skipped, and finer group_sizes don't help. The video-specialist fine-tune has different weight distributions that affine 8-bit can't capture.
Reza2kn/lance-quant's findings suggest that DWQ (dynamic weight quantization) with calibration is the right approach for Lance video at 8-bit and below. That is a Phase 5c project. For now, use mlx-community/Lance-3B-Video-bf16 for video tasks.
Files in this repo
| File | Size | Notes |
|---|---|---|
| model.safetensors | 6.59 GB | Quantized LLM weights (2033 tensors: each Linear becomes weight + scales + biases) |
| vit.safetensors | 1.34 GB | bf16 (not quantized) |
| vae.safetensors | 1.41 GB | bf16 (not quantized) |
| config.json | – | With quantization block (bits=8, group_size=64, mode=affine) |
| quantization_report.json | – | Provenance + footprint stats |
| tokenizer.json / vocab.json | – | Qwen2.5-VL vocabulary |
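After downloading, it can be useful to confirm that the files listed above are all present in the snapshot directory. The helper below is a hypothetical sketch; `missing_files` and `EXPECTED_FILES` are illustrative names, and the list is taken from the table in this section.

```python
from pathlib import Path

# File names as listed in the table above.
EXPECTED_FILES = (
    "model.safetensors",
    "vit.safetensors",
    "vae.safetensors",
    "config.json",
    "quantization_report.json",
    "tokenizer.json",
    "vocab.json",
)


def missing_files(weights_dir, expected=EXPECTED_FILES):
    """Return the expected file names that are not present in weights_dir."""
    root = Path(weights_dir)
    return [name for name in expected if not (root / name).is_file()]


# With the Quickstart snippet, `weights` is the snapshot directory:
# print(missing_files(weights))   # an empty list means nothing is missing
```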
Architecture (same as the bf16 variant)
See mlx-community/Lance-3B-bf16 for the full architecture description.
License
This MLX port + quantization: Apache 2.0.
Underlying weights:
- Lance: Apache 2.0 (ByteDance Intelligent Creation Lab).
- Wan2.2 VAE: Apache 2.0 (Alibaba).
- Qwen2.5-VL: Apache 2.0 (Alibaba).
Citation
@article{fu2026lance,
title={Lance: Unified Multimodal Modeling by Multi-Task Synergy},
author={Fu, Fengyi and Huang, Mengqi and Wu, Shaojin and others},
journal={arXiv preprint arXiv:2605.18678},
year={2026}
}
Links
- MLX port code: github.com/xocialize/lance-mlx
- bf16 source: mlx-community/Lance-3B-bf16
- Standalone VAE: mlx-community/Wan2.2-VAE-Lance-bf16
- Video specialist (bf16, alpha 8-bit pending): mlx-community/Lance-3B-Video-bf16