# Qwen3-Embedding-0.6B CoreML (ANE-optimized)

A CoreML model optimized for the Apple Neural Engine (ANE). It uses fixed-shape profiles for maximum ANE throughput, and was converted with the tooktang/qwen3-ane-embed converter.
## Performance (M1 Pro)

| Profile | Latency | Throughput | CPU/GPU load |
|---|---|---|---|
| b1_s128 | 25 ms | 40 items/sec | none |
| b1_s512 | 140 ms | 7 items/sec | none |
Runs entirely on the Apple Neural Engine, with no CPU or GPU load.
## Profiles
| Profile | Batch | Seq len | Use case |
|---|---|---|---|
| b1_s128 | 1 | 128 | Tags, short text |
| b1_s512 | 1 | 512 | Documents, episodes |
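The "pick the smallest profile that fits" rule described below can be sketched in a few lines. This is a minimal illustration, not part of the model: the profile names and sequence lengths come from the table above, while the `pick_profile` helper itself is an assumption.

```python
# Hypothetical helper: map a token count to the smallest fitting profile.
# Profile names and seq_lens are taken from the Profiles table.
PROFILES = {"b1_s128": 128, "b1_s512": 512}

def pick_profile(num_tokens: int) -> str:
    """Return the smallest profile whose seq_len holds num_tokens tokens."""
    for name, seq_len in sorted(PROFILES.items(), key=lambda kv: kv[1]):
        if num_tokens <= seq_len:
            return name
    raise ValueError(f"input of {num_tokens} tokens exceeds all profiles")
```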
## Files

- `compiled/b1_s128.mlmodelc` – pre-compiled, load directly with `MLModel`
- `compiled/b1_s512.mlmodelc` – pre-compiled
- `packages/b1_s128.mlpackage` – source package (for re-compilation)
- `packages/b1_s512.mlpackage` – source package
- `tokenizer/` – Qwen3 tokenizer files
## Input/Output

- Inputs: `input_ids` (int32), `attention_mask` (int32), left-padded to the profile's seq_len
- Output: `embedding` (float32, 1024-dim, L2-normalized)
- Pad token: 151643 (Qwen3 `eos_token`)
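Because the `embedding` output is already L2-normalized, cosine similarity between two embeddings reduces to a plain dot product. A small sketch with made-up 4-dim vectors (the real output is 1024-dim):

```python
import math

def dot(a, b):
    # Dot product; equals cosine similarity when both vectors are unit-length.
    return sum(x * y for x, y in zip(a, b))

def l2_normalize(v):
    # Mimics the normalization the model applies internally.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

a = l2_normalize([1.0, 2.0, 3.0, 4.0])
b = l2_normalize([2.0, 1.0, 0.0, 1.0])
similarity = dot(a, b)  # no extra division by norms needed
```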
## Usage
Left-pad input tokens to the profile seq_len. Pick the smallest profile that fits your input. The model handles last-token pooling and L2 normalization internally.
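The left-padding step above can be sketched as follows. This is an illustrative helper, not the converter's code: `PAD_ID` is the Qwen3 `eos_token` id stated in this card, and tokenization itself (the `tokenizer/` files) is out of scope here.

```python
PAD_ID = 151643  # Qwen3 eos_token, used for padding per this card

def left_pad(token_ids, seq_len):
    """Left-pad token ids to seq_len; mask marks real tokens with 1."""
    if len(token_ids) > seq_len:
        raise ValueError("input longer than profile seq_len; use a larger profile")
    n_pad = seq_len - len(token_ids)
    input_ids = [PAD_ID] * n_pad + list(token_ids)
    attention_mask = [0] * n_pad + [1] * len(token_ids)
    return input_ids, attention_mask

ids, mask = left_pad([100, 200, 300], seq_len=8)
# ids  -> five PAD_ID entries followed by [100, 200, 300]
# mask -> [0, 0, 0, 0, 0, 1, 1, 1]
```

Left-padding keeps the last real token in the final position, which matches the last-token pooling the model performs internally.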
## Attribution
ANE-optimized conversion based on tooktang/Qwen3-Embedding-0.6B-CoreML.