# Qwen3-Embedding-0.6B CoreML (ANE-optimized)

A CoreML model optimized for the Apple Neural Engine (ANE). It uses fixed-shape profiles for maximum ANE throughput, and was converted with the tooktang/qwen3-ane-embed converter.
## Performance (M1 Pro)

| Profile | Latency | Throughput | CPU/GPU load |
|---|---|---|---|
| b1_s128 | 25 ms | 40 items/sec | none |
| b1_s512 | 140 ms | 7 items/sec | none |
Runs entirely on the Apple Neural Engine, with no CPU or GPU load.
## Profiles
| Profile | Batch | Seq len | Use case |
|---|---|---|---|
| b1_s128 | 1 | 128 | Tags, short text |
| b1_s512 | 1 | 512 | Documents, episodes |
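The "pick the smallest profile that fits" rule described below can be sketched in a few lines. This is a minimal illustration, not part of the model: the profile names and sequence lengths come from the table above, while the `pick_profile` helper itself is an assumption.

```python
# Hypothetical helper: map a token count to the smallest fitting profile.
# Profile names and seq_lens are taken from the Profiles table.
PROFILES = {"b1_s128": 128, "b1_s512": 512}

def pick_profile(num_tokens: int) -> str:
    """Return the smallest profile whose seq_len holds num_tokens tokens."""
    for name, seq_len in sorted(PROFILES.items(), key=lambda kv: kv[1]):
        if num_tokens <= seq_len:
            return name
    raise ValueError(f"input of {num_tokens} tokens exceeds all profiles")
```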
## Files

- `compiled/b1_s128.mlmodelc` – pre-compiled, load directly with `MLModel`
- `compiled/b1_s512.mlmodelc` – pre-compiled
- `packages/b1_s128.mlpackage` – source package (for re-compilation)
- `packages/b1_s512.mlpackage` – source package
- `tokenizer/` – Qwen3 tokenizer files
## Input/Output

- Inputs: `input_ids` (int32), `attention_mask` (int32), left-padded to the profile's seq_len
- Output: `embedding` (float32, 1024-dim, L2-normalized)
- Pad token: 151643 (Qwen3 `eos_token`)
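Because the `embedding` output is already L2-normalized, cosine similarity between two embeddings reduces to a plain dot product. A small sketch with made-up 4-dim vectors (the real output is 1024-dim):

```python
import math

def dot(a, b):
    # Dot product; equals cosine similarity when both vectors are unit-length.
    return sum(x * y for x, y in zip(a, b))

def l2_normalize(v):
    # Mimics the normalization the model applies internally.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

a = l2_normalize([1.0, 2.0, 3.0, 4.0])
b = l2_normalize([2.0, 1.0, 0.0, 1.0])
similarity = dot(a, b)  # no extra division by norms needed
```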
## Usage
Left-pad input tokens to the profile seq_len. Pick the smallest profile that fits your input. The model handles last-token pooling and L2 normalization internally.
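The left-padding step above can be sketched as follows. This is an illustrative helper, not the converter's code: `PAD_ID` is the Qwen3 `eos_token` id stated in this card, and tokenization itself (the `tokenizer/` files) is out of scope here.

```python
PAD_ID = 151643  # Qwen3 eos_token, used for padding per this card

def left_pad(token_ids, seq_len):
    """Left-pad token ids to seq_len; mask marks real tokens with 1."""
    if len(token_ids) > seq_len:
        raise ValueError("input longer than profile seq_len; use a larger profile")
    n_pad = seq_len - len(token_ids)
    input_ids = [PAD_ID] * n_pad + list(token_ids)
    attention_mask = [0] * n_pad + [1] * len(token_ids)
    return input_ids, attention_mask

ids, mask = left_pad([100, 200, 300], seq_len=8)
# ids  -> five PAD_ID entries followed by [100, 200, 300]
# mask -> [0, 0, 0, 0, 0, 1, 1, 1]
```

Left-padding keeps the last real token in the final position, which matches the last-token pooling the model performs internally.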
## Attribution
ANE-optimized conversion based on tooktang/Qwen3-Embedding-0.6B-CoreML.