MMS-LID 1024 (Core ML, 6-bit Palettized)
Core ML conversion of facebook/mms-lid-1024 for on-device speech language identification. This variant uses 6-bit palettization (k-means): smaller than 8-bit, with slightly more divergence from the float16 base on some inputs.
- Source: facebook/mms-lid-1024
- Input: Raw 16 kHz mono waveform, fixed 10 seconds (160,000 samples), shape (1, 160000), float32
- Output: Logits of shape (1, 1024); argmax → class index. Map to ISO 639-3 via labels.json or mms_lid_id2label.json
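The fixed-length input can be prepared with a small helper. The following is a minimal sketch in plain Python, not the converter's or runner's actual code: the name fit_to_window is hypothetical, and trimming from the start and zero-padding at the end are assumptions (the card only fixes the length at 160,000 samples).

```python
from typing import List, Sequence

TARGET_SAMPLES = 160_000  # 10 s at 16 kHz, as stated in the input spec


def fit_to_window(samples: Sequence[float],
                  target: int = TARGET_SAMPLES) -> List[List[float]]:
    """Trim or zero-pad a mono waveform to `target` samples.

    Assumption: keep the first `target` samples and pad with trailing zeros.
    Returns a nested list with a leading batch axis, i.e. shape (1, target).
    """
    window = [float(s) for s in samples[:target]]
    window.extend([0.0] * (target - len(window)))
    return [window]
```

The result still has to be converted to a float32 array before it is handed to the model as input_values.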
Contents
| File | Description |
|---|---|
| Core ML .mlpackage (repo root) | 6-bit palettized MMS-LID 1024 |
| labels.json | Ordered list of 1024 ISO 639-3 language codes |
| mms_lid_id2label.json | Index → language code mapping |
When to use this variant
- Choose this variant when minimum size matters most and you want to stay on the same 1024-class head as the other variants.
- If you need maximum agreement with PyTorch / float16, prefer mms-lid-1024-coreml or mms-lid-1024-coreml-8bit.
Usage on iOS / macOS
Same as other MMS-LID Core ML repos: load the .mlpackage, feed 10 s of 16 kHz mono as input_values, take argmax of logits, and look up the language in labels.json.
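The last two steps (argmax, then label lookup) are independent of the Core ML runtime. Here is a minimal sketch in plain Python, assuming labels.json is a JSON array whose position i holds the code for class index i, as the Contents table suggests. The function names are hypothetical, and the logits are taken as a flat list of 1024 floats already extracted from the model output.

```python
import json
from typing import List, Sequence


def load_labels(path: str) -> List[str]:
    """Load the ordered list of language codes (assumed to be a JSON array)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def predict_language(logits: Sequence[float], labels: Sequence[str]) -> str:
    """Return the label at the index of the largest logit."""
    if len(logits) != len(labels):
        raise ValueError("logits and labels must have the same length")
    best = max(range(len(logits)), key=lambda i: logits[i])
    return labels[best]
```

On iOS or macOS the same logic would be written in Swift against the model's logits output; the Python form is only meant to pin down the post-processing.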
Limitations
Same as the base model: fixed 10 s input, misclassification of L2-accented speech, and English ↔ Hawaiian/Maori confusion. Use chunking for longer audio and a confidence threshold where appropriate.
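One possible way to combine chunking with a confidence threshold is sketched here. The helper names, the non-overlapping 10 s hop, the 0.5 threshold, and the confidence-weighted vote are all assumptions made for illustration; the card does not prescribe an aggregation scheme. Each chunk is assumed to have been run through the model separately, yielding a (label, confidence) pair.

```python
from collections import defaultdict
from typing import Dict, Iterator, Optional, Sequence, Tuple

WINDOW = 160_000  # 10 s at 16 kHz


def iter_chunks(samples: Sequence[float], window: int = WINDOW,
                hop: int = WINDOW) -> Iterator[Sequence[float]]:
    """Yield consecutive windows; the last one may be short and need padding."""
    for start in range(0, len(samples), hop):
        yield samples[start:start + window]


def vote(chunk_preds: Sequence[Tuple[str, float]],
         min_conf: float = 0.5) -> Optional[str]:
    """Sum confidences per label over chunks at or above `min_conf`.

    Returns the label with the largest total, or None if no chunk qualifies.
    """
    totals: Dict[str, float] = defaultdict(float)
    for label, conf in chunk_preds:
        if conf >= min_conf:
            totals[label] += conf
    if not totals:
        return None
    return max(totals, key=lambda label: totals[label])
```

Returning None when every chunk falls below the threshold lets the caller treat the clip as "language unknown" instead of reporting a low-confidence guess.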
Mac smoke test (Core ML)
On-device smoke run: each file under INPUT/audio was resampled to 16 kHz mono float32, padded or trimmed to 160,000 samples (10 s), then passed to input_values. In the log, pred is the ISO 639-3 code from argmax(logits), and conf is the softmax probability of the predicted class, computed on the runner side.
Note: Filenames are hints only (e.g. English.mp3 is not ground truth). Low conf or known MMS-LID confusions (e.g. English vs haw) may still appear.
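The conf value described above is the softmax probability of the top class. This minimal sketch shows that computation in plain Python, using the standard max-subtraction trick for numerical stability. The function name is hypothetical, and this is not the runner's actual code.

```python
import math
from typing import Sequence


def top1_confidence(logits: Sequence[float]) -> float:
    """Softmax probability of the largest logit.

    Subtracting the maximum before exponentiating avoids overflow; the top
    class then contributes exp(0) = 1 to the numerator.
    """
    m = max(logits)
    denom = sum(math.exp(x - m) for x in logits)
    return 1.0 / denom
```

Because softmax is spread over all 1024 classes, a modest max_logit with many near competitors gives a low conf, which is consistent with the low-confidence rows in the log.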
Raw runner log
MMS-LID 1024 Core ML — Mac smoke test
Model: https://huggingface.co/aoiandroid/mms-lid-1024-coreml-6bit
Model dir: $TRANSLATEBLUE/Log/mms_lid_1024_6bit_mac_test/model_repo
Audio dir: $TRANSLATEBLUE/INPUT/audio
Compiled temp: /var/folders/ky/nmbswxzs0s79wdxndfw1y6wh0000gn/T/model_repo.mlmodelc
Compute: MLComputeUnits(rawValue: 2)
Input: input_values Output: logits
Labels: 1024
Host: ams-macbook-air.local macOS: Version 26.3.1 (a) (Build 25D771280a)
English.mp3 pcm_samples=9054841 pred=haw conf=0.2631 max_logit=7.6055 time_ms=1150.9
Euskara.mp3 pcm_samples=1865769 pred=hin conf=0.3606 max_logit=7.6602 time_ms=410.0
Guaraní.mp3 pcm_samples=1682285 pred=grn conf=0.9993 max_logit=14.7656 time_ms=404.3
Yorùbá.mp3 pcm_samples=1067049 pred=nia conf=0.2906 max_logit=7.3477 time_ms=371.8
afrikaasns.mp3 pcm_samples=2387800 pred=nld conf=0.9994 max_logit=15.0781 time_ms=465.9
arabic.mp3 pcm_samples=2060120 pred=ara conf=0.9992 max_logit=14.5234 time_ms=435.7
bengali.m4a pcm_samples=7836432 pred=ben conf=0.9984 max_logit=14.4297 time_ms=554.0
chinese.mp3 pcm_samples=12904245 pred=cmn conf=0.9992 max_logit=14.3203 time_ms=1261.9
isiZulu.mp3 pcm_samples=1396819 pred=heb conf=0.5155 max_logit=8.0859 time_ms=399.8
kiswahili.mp3 pcm_samples=1888757 pred=swh conf=0.9989 max_logit=14.2344 time_ms=416.6
korean.mp3 pcm_samples=2364395 pred=kor conf=0.9995 max_logit=15.1797 time_ms=429.2
russinan.m4a pcm_samples=15431029 pred=rus conf=0.2657 max_logit=7.8242 time_ms=747.3
test.mp3 pcm_samples=274560 pred=jpn conf=0.9983 max_logit=14.4844 time_ms=341.5
日本語.mp3 pcm_samples=1798234 pred=jpn conf=0.9984 max_logit=14.5312 time_ms=467.0
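To post-process the result rows above, for example to flag low-confidence predictions, they can be parsed as whitespace-separated key=value fields. This is a minimal sketch, assuming filenames contain no spaces (true for every row above). The function name and the 0.6 cutoff in the usage note are hypothetical.

```python
from typing import Dict, Union

Value = Union[str, int, float]


def parse_result_line(line: str) -> Dict[str, Value]:
    """Parse one result row: '<file> key=value key=value ...'."""
    name, *fields = line.split()
    record: Dict[str, Value] = {"file": name}
    for field in fields:
        key, _, value = field.partition("=")
        record[key] = value
    # Convert the numeric columns that appear in the log above.
    record["pcm_samples"] = int(record["pcm_samples"])
    for key in ("conf", "max_logit", "time_ms"):
        record[key] = float(record[key])
    return record
```

Filtering the parsed rows with conf < 0.6 would flag English.mp3, Euskara.mp3, Yorùbá.mp3, isiZulu.mp3, and russinan.m4a in this run.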
License
CC-BY-NC-4.0 (inherited from facebook/mms-lid-1024).
Citation
@article{pratap2023mms,
title={Scaling Speech Technology to 1,000+ Languages},
author={Pratap, Vineel and others},
journal={arXiv preprint arXiv:2305.13516},
year={2023}
}