Qwen3-Embedding-0.6B CoreML (ANE-optimized)

A CoreML build of Qwen3-Embedding-0.6B optimized for the Apple Neural Engine (ANE), with fixed-shape profiles for maximum ANE throughput. Converted with the tooktang/qwen3-ane-embed converter.

Performance (M1 Pro)

Profile   Latency   Throughput      CPU/GPU usage
b1_s128   25 ms     40 items/sec    none
b1_s512   140 ms    7 items/sec     none

Runs entirely on the Apple Neural Engine; no CPU or GPU load.

Profiles

Profile   Batch   Seq len   Use case
b1_s128   1       128       Tags, short text
b1_s512   1       512       Documents, episodes

Files

  • compiled/b1_s128.mlmodelc – pre-compiled, load directly with MLModel
  • compiled/b1_s512.mlmodelc – pre-compiled
  • packages/b1_s128.mlpackage – source package (for re-compilation)
  • packages/b1_s512.mlpackage – source package
  • tokenizer/ – Qwen3 tokenizer files
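If you need to re-compile a source package yourself (e.g. after editing it), Apple's coremlcompiler can be invoked from the command line on macOS. A minimal sketch; the output directory name is illustrative:

```shell
# Compile an .mlpackage into an .mlmodelc bundle (requires Xcode command-line tools)
xcrun coremlcompiler compile packages/b1_s128.mlpackage compiled/
```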

Input/Output

  • Inputs: input_ids (int32), attention_mask (int32), left-padded to the profile's seq_len
  • Output: embedding (float32, 1024-dim, L2-normalized)
  • Pad token: 151643 (Qwen3 eos_token)
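The left-padding scheme above can be sketched in Python. The `prepare_inputs` helper and array names are illustrative, not part of this repo; only the pad token id and shapes come from the spec above:

```python
import numpy as np

PAD_TOKEN_ID = 151643  # Qwen3 eos_token, used for left-padding (per the spec above)

def prepare_inputs(token_ids, seq_len):
    """Left-pad token ids to seq_len and build the matching attention mask."""
    if len(token_ids) > seq_len:
        raise ValueError(f"input has {len(token_ids)} tokens; profile allows {seq_len}")
    pad = seq_len - len(token_ids)
    # Real tokens sit at the right edge; pad positions get mask 0
    input_ids = np.array([[PAD_TOKEN_ID] * pad + list(token_ids)], dtype=np.int32)
    attention_mask = np.array([[0] * pad + [1] * len(token_ids)], dtype=np.int32)
    return input_ids, attention_mask

ids, mask = prepare_inputs([9707, 1879], seq_len=128)
# ids and mask both have shape (1, 128), dtype int32
```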

Usage

Left-pad input tokens to the profile's seq_len and pick the smallest profile that fits your input. The model handles last-token pooling and L2 normalization internally.
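Picking the smallest profile that fits can be sketched as follows. The profile names and lengths come from the table above; the `pick_profile` helper is illustrative:

```python
# seq_len per profile, from the Profiles table above
PROFILES = {"b1_s128": 128, "b1_s512": 512}

def pick_profile(n_tokens):
    """Return the smallest profile whose seq_len fits the token count."""
    for name, seq_len in sorted(PROFILES.items(), key=lambda kv: kv[1]):
        if n_tokens <= seq_len:
            return name, seq_len
    raise ValueError(f"{n_tokens} tokens exceeds the largest profile (512)")

print(pick_profile(50))   # ('b1_s128', 128)
print(pick_profile(300))  # ('b1_s512', 512)
```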

Attribution

ANE-optimized conversion based on tooktang/Qwen3-Embedding-0.6B-CoreML.
