MMS-LID 1024 (Core ML, 6-bit Palettized)

Core ML conversion of facebook/mms-lid-1024 for on-device speech language identification. This variant uses 6-bit palettization (k-means). It is smaller than the 8-bit variant, at the cost of slightly more divergence from the float16 base on some inputs.

  • Source: facebook/mms-lid-1024
  • Input: Raw 16 kHz mono waveform, fixed 10 seconds (160,000 samples), shape (1, 160000) float32
  • Output: Logits shape (1, 1024); argmax → class index. Map to ISO 639-3 via labels.json or mms_lid_id2label.json
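
The fixed-length input requirement above can be sketched as a small helper. This is a minimal illustration, not the runner's actual code: the function name `pad_or_trim` is hypothetical, and zero-padding at the end and trimming the tail are assumptions, since the card does not say which side is padded or trimmed.

```python
TARGET_SAMPLES = 160_000  # 10 s at 16 kHz, per the input spec above


def pad_or_trim(samples, target=TARGET_SAMPLES):
    """Return exactly `target` samples.

    Assumption: long clips keep their first `target` samples and short
    clips are zero-padded at the end.
    """
    samples = list(samples)
    if len(samples) >= target:
        return samples[:target]
    return samples + [0.0] * (target - len(samples))


short_clip = pad_or_trim([0.25] * 1_000)    # padded up to 160,000
long_clip = pad_or_trim([0.25] * 200_000)   # trimmed down to 160,000
```

The resulting list would still need to be reshaped to (1, 160000) float32 before being passed to the model.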

Contents

| File | Description |
|---|---|
| Core ML .mlpackage (repo root) | 6-bit palettized MMS-LID 1024 |
| labels.json | Ordered list of 1024 ISO 639-3 language codes |
| mms_lid_id2label.json | Index → language code mapping |

When to use this variant

Usage on iOS / macOS

Same as other MMS-LID Core ML repos: load the .mlpackage, feed 10 s of 16 kHz mono as input_values, take argmax of logits, and look up the language in labels.json.
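
The last two steps (argmax over the logits, then a lookup in labels.json) might look like the sketch below. Running the Core ML model itself is not shown. The helper names `load_labels` and `top_language` are hypothetical, and the logits are assumed to have already been flattened from shape (1, 1024) into a plain sequence.

```python
import json


def load_labels(path):
    """Read labels.json, described above as an ordered list of language codes."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def top_language(logits, labels):
    """Return (class index, language code) for the largest logit."""
    logits = list(logits)
    if len(logits) != len(labels):
        raise ValueError("logits and labels must have the same length")
    best = max(range(len(logits)), key=logits.__getitem__)
    return best, labels[best]


# Toy 3-class stand-in; the real model emits 1024 logits.
index, code = top_language([0.1, 2.5, -1.0], ["eng", "jpn", "kor"])
```

In practice `labels` would come from `load_labels("labels.json")`, and the length check guards against pairing the logits with the wrong label file.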

Limitations

Same as the base model: the input is fixed at 10 s, L2-accented (non-native) speech may be misclassified, and English is sometimes confused with Hawaiian or Maori. Use chunking and a confidence threshold where appropriate.
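
One way to combine chunking with a confidence threshold is sketched below. It is an illustration under stated assumptions, not a method from the base model or this repo: the `classify` callable (returning a language code and a confidence for one 10 s window), the default threshold of 0.5, and the confidence-weighted vote are all hypothetical choices.

```python
from collections import defaultdict

CHUNK_SAMPLES = 160_000  # one 10 s window at 16 kHz


def split_chunks(samples, size=CHUNK_SAMPLES):
    """Yield consecutive fixed-size windows; the last one is zero-padded."""
    for start in range(0, len(samples), size):
        window = list(samples[start:start + size])
        yield window + [0.0] * (size - len(window))


def identify(samples, classify, threshold=0.5, size=CHUNK_SAMPLES):
    """Vote across windows, ignoring low-confidence predictions.

    `classify(window)` must return (language_code, confidence).
    Returns the code with the most accumulated confidence, or None
    when no window reaches `threshold`.
    """
    totals = defaultdict(float)
    for window in split_chunks(samples, size):
        code, conf = classify(window)
        if conf >= threshold:
            totals[code] += conf
    if not totals:
        return None
    return max(totals, key=totals.get)
```

Returning None rather than a low-confidence guess lets the caller fall back to a default language or ask for more audio.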

Mac smoke test (Core ML)

On-device smoke run. Each file under INPUT/audio was resampled to 16 kHz mono float32, padded or trimmed to 160,000 samples (10 s), and passed to input_values. In the log, pred is the ISO 639-3 code from argmax(logits), and conf is the softmax probability of the predicted class, computed by the runner rather than by the model.
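
The conf value can be reproduced from raw logits along these lines. This is a sketch of a standard, numerically stable softmax, not the runner's own code; the function name `softmax_confidence` is hypothetical.

```python
import math


def softmax_confidence(logits):
    """Return (argmax index, softmax probability of that class)."""
    logits = list(logits)
    peak = max(logits)
    # Subtracting the peak keeps exp() from overflowing on large logits.
    exps = [math.exp(value - peak) for value in logits]
    best = logits.index(peak)
    return best, exps[best] / sum(exps)


# Toy 3-class example; the real model has 1024 classes.
index, conf = softmax_confidence([1.0, 3.0, 2.0])
```

Because the probability is normalized over all classes, a moderate top logit can still yield a low conf when several other classes score nearly as high.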

Note: Filenames are hints only (e.g. English.mp3 is not ground truth). Low conf or known MMS-LID confusions (e.g. English vs haw) may still appear.

Raw runner log
MMS-LID 1024 Core ML — Mac smoke test
Model: https://huggingface.co/aoiandroid/mms-lid-1024-coreml-6bit
Model dir: $TRANSLATEBLUE/Log/mms_lid_1024_6bit_mac_test/model_repo
Audio dir: $TRANSLATEBLUE/INPUT/audio
Compiled temp: /var/folders/ky/nmbswxzs0s79wdxndfw1y6wh0000gn/T/model_repo.mlmodelc
Compute: MLComputeUnits(rawValue: 2)
Input: input_values  Output: logits
Labels: 1024
Host: ams-macbook-air.local  macOS: Version 26.3.1 (a) (Build 25D771280a)
English.mp3  pcm_samples=9054841  pred=haw  conf=0.2631  max_logit=7.6055  time_ms=1150.9
Euskara.mp3  pcm_samples=1865769  pred=hin  conf=0.3606  max_logit=7.6602  time_ms=410.0
Guaraní.mp3  pcm_samples=1682285  pred=grn  conf=0.9993  max_logit=14.7656  time_ms=404.3
Yorùbá.mp3  pcm_samples=1067049  pred=nia  conf=0.2906  max_logit=7.3477  time_ms=371.8
afrikaasns.mp3  pcm_samples=2387800  pred=nld  conf=0.9994  max_logit=15.0781  time_ms=465.9
arabic.mp3  pcm_samples=2060120  pred=ara  conf=0.9992  max_logit=14.5234  time_ms=435.7
bengali.m4a  pcm_samples=7836432  pred=ben  conf=0.9984  max_logit=14.4297  time_ms=554.0
chinese.mp3  pcm_samples=12904245  pred=cmn  conf=0.9992  max_logit=14.3203  time_ms=1261.9
isiZulu.mp3  pcm_samples=1396819  pred=heb  conf=0.5155  max_logit=8.0859  time_ms=399.8
kiswahili.mp3  pcm_samples=1888757  pred=swh  conf=0.9989  max_logit=14.2344  time_ms=416.6
korean.mp3  pcm_samples=2364395  pred=kor  conf=0.9995  max_logit=15.1797  time_ms=429.2
russinan.m4a  pcm_samples=15431029  pred=rus  conf=0.2657  max_logit=7.8242  time_ms=747.3
test.mp3  pcm_samples=274560  pred=jpn  conf=0.9983  max_logit=14.4844  time_ms=341.5
日本語.mp3  pcm_samples=1798234  pred=jpn  conf=0.9984  max_logit=14.5312  time_ms=467.0

License

CC-BY-NC-4.0 (inherited from facebook/mms-lid-1024).

Citation

@article{pratap2023mms,
  title={Scaling Speech Technology to 1,000+ Languages},
  author={Pratap, Vineel and others},
  journal={arXiv preprint arXiv:2305.13516},
  year={2023}
}