# OCR CoreML Detector 224
Detector224.mlpackage is a batch-1 CoreML conversion of the detector stage
from NVIDIA Nemotron OCR v2.
It is intended for Apple-device OCR pipelines that need a packaged text
detection model and will implement their own post-processing, recognition, and
layout stages.
SwiftPM package: github.com/mweinbach/OCRCoreMLDetector
## What This Is

- Source model: `nvidia/nemotron-ocr-v2`, v2_english detector
- CoreML artifact: `Detector224.mlpackage`
- Conversion config: `experiments/224_detector_ane_decomposed_int8_768`
- Input size: 768 x 768
- Batch size: 1
- Weight quantization: int8 per-channel linear symmetric
- Compute precision: fp16
- Minimum deployment target used during conversion: iOS 18
This is not a complete OCR system. The package returns detector tensors only. Downstream code still needs thresholding, rotated-box decoding, non-maximum suppression, crop/rectify, recognition, and reading-order/layout logic.
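As a concrete example, the first of those downstream steps, thresholding the probability map into a binary text mask, could be sketched like this in Swift. The function name and the 0.5 threshold are illustrative, not part of the package:

```swift
import CoreML

/// Minimal sketch: binarize the 192x192 `prob` output into a text mask.
/// Box decoding, NMS, crop/rectify, and recognition are still up to you.
func textMask(from prob: MLMultiArray, threshold: Float = 0.5) -> [[Bool]] {
    let h = prob.shape[1].intValue  // 192
    let w = prob.shape[2].intValue  // 192
    var mask = [[Bool]](repeating: [Bool](repeating: false, count: w), count: h)
    for y in 0..<h {
        for x in 0..<w {
            // prob has shape [1, 192, 192]
            let p = prob[[0, NSNumber(value: y), NSNumber(value: x)]].floatValue
            mask[y][x] = p >= threshold
        }
    }
    return mask
}
```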
## Files

| file | purpose |
|---|---|
| `Detector224.mlpackage/` | CoreML model package |
| `conversion_config.yaml` | conversion/benchmark config used to create the artifact |
| `bench.md` | local CoreML latency results |
| `parity.json` | PyTorch-vs-CoreML parity summary |
| `checksums.sha256` | SHA-256 checksums for the CoreML package files |
| `LICENSE` | NVIDIA Open Model License plus Apache 2.0 source license text from upstream |
| `NOTICE` | redistribution attribution notice |
## Input Contract

The model expects one CoreML input named `image`:

- shape: `Float32[1, 3, 768, 768]`
- layout: RGB planar
- normalization: pixel values in `[0, 1]`
The SwiftPM wrapper linked above includes a helper that converts a CGImage to
this tensor shape.
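If you are not using the wrapper, a hypothetical version of that conversion might look like the following: draw the `CGImage` at 768x768 via CoreGraphics, then write planar RGB `Float32` values normalized to `[0, 1]`. The function name and error handling are illustrative:

```swift
import CoreGraphics
import CoreML

/// Sketch of the input contract: CGImage -> Float32[1, 3, 768, 768],
/// RGB planar, values in [0, 1]. The wrapper's helper is the supported path.
func detectorInput(from image: CGImage) throws -> MLMultiArray {
    let side = 768
    var pixels = [UInt8](repeating: 0, count: side * side * 4)  // RGBA8
    pixels.withUnsafeMutableBytes { buffer in
        let ctx = CGContext(data: buffer.baseAddress,
                            width: side, height: side,
                            bitsPerComponent: 8, bytesPerRow: side * 4,
                            space: CGColorSpaceCreateDeviceRGB(),
                            bitmapInfo: CGImageAlphaInfo.premultipliedLast.rawValue)
        ctx?.draw(image, in: CGRect(x: 0, y: 0, width: side, height: side))
    }
    let tensor = try MLMultiArray(shape: [1, 3, 768, 768], dataType: .float32)
    let plane = side * side
    for i in 0..<plane {
        let px = i * 4
        // interleaved RGBA -> planar RGB, normalized to [0, 1]
        tensor[i]             = NSNumber(value: Float(pixels[px]) / 255)
        tensor[plane + i]     = NSNumber(value: Float(pixels[px + 1]) / 255)
        tensor[2 * plane + i] = NSNumber(value: Float(pixels[px + 2]) / 255)
    }
    return tensor
}
```

Note this sketch stretches the image to 768x768; aspect-preserving letterboxing (and mapping detections back to source coordinates) is left to the caller.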
## Output Contract

The model returns:

| output | shape | meaning |
|---|---|---|
| `prob` | `Float32[1, 192, 192]` | text probability map |
| `rboxes` | `Float32[1, 192, 192, 5]` | rotated-box geometry |
| `features` | `Float32[1, 128, 192, 192]` | detector feature map |
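For `rboxes`, each probability-map cell carries five geometry values. Their semantics come from the upstream Nemotron detector and are not documented here; the sketch below only shows the indexing into the `[1, 192, 192, 5]` array (the function name is illustrative):

```swift
import CoreML

/// Read the five rotated-box geometry values at probability-map cell (y, x).
func rboxGeometry(at y: Int, _ x: Int, in rboxes: MLMultiArray) -> [Float] {
    (0..<5).map { c in
        rboxes[[0, NSNumber(value: y), NSNumber(value: x),
                NSNumber(value: c)]].floatValue
    }
}
```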
## Local Performance

Measured on the bundled sample image after warmup:

| compute units | median prediction latency |
|---|---|
| `ALL` | 13.53 ms |
| `CPU_AND_GPU` | 13.65 ms |
| `CPU_AND_NE` | 54.51 ms |
| `CPU_ONLY` | 298.28 ms |
For the lowest single-image latency in this test environment, run with the `CPU_AND_GPU` or `ALL` compute units. CPU+ANE works but is slower for this detector.
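Compute units are selected at load time. For pipelines that skip the SwiftPM wrapper and load the model directly, a minimal sketch (the local path is a placeholder):

```swift
import CoreML
import Foundation

// Load the compiled model with explicit compute units; .cpuAndGPU matches
// the fastest configuration in the local benchmark above.
let config = MLModelConfiguration()
config.computeUnits = .cpuAndGPU
let url = URL(fileURLWithPath: "/path/to/Detector224.mlmodelc")
let model = try MLModel(contentsOf: url, configuration: config)
```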
## Use From Swift

Add the Swift package:

```swift
.package(url: "https://github.com/mweinbach/OCRCoreMLDetector.git", from: "0.1.0")
```

Then:

```swift
import CoreML
import OCRCoreMLDetector

let detector = try OCRDetector(computeUnits: .cpuAndGPU)
let prediction = try detector.prediction(for: cgImage)

let prob = prediction.output.prob
let rboxes = prediction.output.rboxes
let features = prediction.output.features
```
## License
The converted model weights inherit the
NVIDIA Open Model License Agreement.
The upstream source code and helper scripts are Apache 2.0. See LICENSE and
NOTICE for redistribution terms and attribution.