OCR CoreML Detector 224

Detector224.mlpackage is a batch-1 CoreML conversion of the detector stage from NVIDIA Nemotron OCR v2. It is intended for Apple-device OCR pipelines that need a packaged text detection model and will implement their own post-processing, recognition, and layout stages.

SwiftPM package: github.com/mweinbach/OCRCoreMLDetector

What This Is

  • Source model: nvidia/nemotron-ocr-v2, v2_english detector
  • CoreML artifact: Detector224.mlpackage
  • Conversion config: experiments/224_detector_ane_decomposed_int8_768
  • Input size: 768 x 768
  • Batch size: 1
  • Weight quantization: int8 per-channel linear symmetric
  • Compute precision: fp16
  • Minimum deployment target used during conversion: iOS 18

This is not a complete OCR system. The package returns detector tensors only. Downstream code still needs thresholding, rotated-box decoding, non-maximum suppression, crop/rectify, recognition, and reading-order/layout logic.
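As a minimal illustration of the first downstream step, thresholding the 192 x 192 probability map into a binary text mask could look like this. The 0.5 threshold is an assumption for the sketch, not a value shipped with the model:

```swift
// Threshold the detector probability map into a binary text mask.
// `prob` is the flattened 192x192 map; `threshold` is an assumed value.
func textMask(prob: [Float], width: Int = 192, height: Int = 192,
              threshold: Float = 0.5) -> [Bool] {
    precondition(prob.count == width * height, "prob map has wrong size")
    return prob.map { $0 > threshold }
}
```

Rotated-box decoding, NMS, and the later stages then operate on the cells this mask keeps.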

Files

file                    purpose
Detector224.mlpackage/  CoreML model package
conversion_config.yaml  conversion/benchmark config used to create the artifact
bench.md                local CoreML latency results
parity.json             PyTorch-vs-CoreML parity summary
checksums.sha256        SHA-256 checksums for the CoreML package files
LICENSE                 NVIDIA Open Model License plus upstream Apache 2.0 source license text
NOTICE                  redistribution attribution notice

Input Contract

The model expects one CoreML input named image:

  • shape: Float32[1, 3, 768, 768]
  • layout: RGB planar
  • normalization: pixel values in [0, 1]

The SwiftPM wrapper linked above includes a helper that converts a CGImage to this tensor shape.
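If you are not using that helper, the layout conversion it performs can be sketched as follows: interleaved 8-bit RGBA pixels are rearranged into planar RGB and scaled to [0, 1]. The RGBA input format here is an assumption for illustration:

```swift
// Convert interleaved RGBA bytes into the planar RGB float layout the
// `image` input expects, scaling each channel from [0, 255] to [0, 1].
func planarRGB(fromRGBA bytes: [UInt8], width: Int, height: Int) -> [Float] {
    precondition(bytes.count == width * height * 4, "expected RGBA8 input")
    let plane = width * height
    var out = [Float](repeating: 0, count: 3 * plane)
    for i in 0..<plane {
        out[i]             = Float(bytes[4 * i])     / 255.0  // R plane
        out[plane + i]     = Float(bytes[4 * i + 1]) / 255.0  // G plane
        out[2 * plane + i] = Float(bytes[4 * i + 2]) / 255.0  // B plane
    }
    return out
}
```

The resulting buffer maps directly onto the Float32[1, 3, 768, 768] tensor when width and height are 768.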

Output Contract

The model returns:

output    shape                      meaning
prob      Float32[1, 192, 192]       text probability map
rboxes    Float32[1, 192, 192, 5]    rotated-box geometry
features  Float32[1, 128, 192, 192]  detector feature map
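Since the rboxes tensor is laid out with its five geometry channels innermost, indexing it from a flattened buffer is a common first step. The sketch below only shows that indexing; the meaning of the five channels is not documented here, so it does not attempt the actual box decode:

```swift
// Flat-index helper for the rboxes tensor Float32[1, 192, 192, 5],
// where the 5 geometry channels are the fastest-varying dimension.
func rboxValue(_ rboxes: [Float], y: Int, x: Int, channel: Int,
               gridSize: Int = 192, channels: Int = 5) -> Float {
    precondition(channel < channels && x < gridSize && y < gridSize)
    return rboxes[(y * gridSize + x) * channels + channel]
}
```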

Local Performance

Measured on the bundled sample image after warmup:

compute units  median prediction latency
ALL            13.53 ms
CPU_AND_GPU    13.65 ms
CPU_AND_NE     54.51 ms
CPU_ONLY       298.28 ms

For the lowest single-image latency in this test environment, use the ALL or CPU_AND_GPU compute units. CPU plus Neural Engine works but is roughly 4x slower for this detector.
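If you load the package directly with CoreML rather than through the SwiftPM wrapper, the compute-unit choice goes through MLModelConfiguration. The generated class name Detector224 below is an assumption based on the package name:

```swift
import CoreML

// Pick the compute units explicitly; .cpuAndGPU and .all were the
// fastest options in the benchmark above.
let config = MLModelConfiguration()
config.computeUnits = .cpuAndGPU
// Detector224 is the Xcode-generated class name assumed for the package.
let model = try Detector224(configuration: config)
```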

Use From Swift

Add the Swift package:

.package(url: "https://github.com/mweinbach/OCRCoreMLDetector.git", from: "0.1.0")

Then:

import CoreML
import OCRCoreMLDetector

let detector = try OCRDetector(computeUnits: .cpuAndGPU)
let prediction = try detector.prediction(for: cgImage)

let prob = prediction.output.prob
let rboxes = prediction.output.rboxes
let features = prediction.output.features

License

The converted model weights inherit the NVIDIA Open Model License Agreement. The upstream source code and helper scripts are Apache 2.0. See LICENSE and NOTICE for redistribution terms and attribution.
