OCR CoreML Detector 224

Detector224.mlpackage is a batch-1 CoreML conversion of the detector stage from NVIDIA Nemotron OCR v2. It is intended for Apple-device OCR pipelines that need a packaged text detection model and will implement their own post-processing, recognition, and layout stages.

SwiftPM package: github.com/mweinbach/OCRCoreMLDetector

What This Is

  • Source model: nvidia/nemotron-ocr-v2, v2_english detector
  • CoreML artifact: Detector224.mlpackage
  • Conversion config: experiments/224_detector_ane_decomposed_int8_768
  • Input size: 768 x 768
  • Batch size: 1
  • Weight quantization: int8 per-channel linear symmetric
  • Compute precision: fp16
  • Minimum deployment target used during conversion: iOS 18

This is not a complete OCR system. The package returns detector tensors only. Downstream code still needs thresholding, rotated-box decoding, non-maximum suppression, crop/rectify, recognition, and reading-order/layout logic.
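As a minimal illustration of the first downstream step, thresholding the 192 x 192 probability map into a binary text mask could look like this. The 0.5 threshold is an assumption for the sketch, not a value shipped with the model:

```swift
// Threshold the detector probability map into a binary text mask.
// `prob` is the flattened 192x192 map; `threshold` is an assumed value.
func textMask(prob: [Float], width: Int = 192, height: Int = 192,
              threshold: Float = 0.5) -> [Bool] {
    precondition(prob.count == width * height, "prob map has wrong size")
    return prob.map { $0 > threshold }
}
```

Rotated-box decoding, NMS, and the later stages then operate on the cells this mask keeps.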

Files

file                    purpose
Detector224.mlpackage/  CoreML model package
conversion_config.yaml  conversion/benchmark config used to create the artifact
bench.md                local CoreML latency results
parity.json             PyTorch-vs-CoreML parity summary
checksums.sha256        SHA-256 checksums for the CoreML package files
LICENSE                 NVIDIA Open Model License plus upstream Apache 2.0 source license text
NOTICE                  redistribution attribution notice

Input Contract

The model expects one CoreML input named image:

  • shape: Float32[1, 3, 768, 768]
  • layout: RGB planar
  • normalization: pixel values in [0, 1]

The SwiftPM wrapper linked above includes a helper that converts a CGImage to this tensor shape.
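If you are not using that helper, the layout conversion it performs can be sketched as follows: interleaved 8-bit RGBA pixels are rearranged into planar RGB and scaled to [0, 1]. The RGBA input format here is an assumption for illustration:

```swift
// Convert interleaved RGBA bytes into the planar RGB float layout the
// `image` input expects, scaling each channel from [0, 255] to [0, 1].
func planarRGB(fromRGBA bytes: [UInt8], width: Int, height: Int) -> [Float] {
    precondition(bytes.count == width * height * 4, "expected RGBA8 input")
    let plane = width * height
    var out = [Float](repeating: 0, count: 3 * plane)
    for i in 0..<plane {
        out[i]             = Float(bytes[4 * i])     / 255.0  // R plane
        out[plane + i]     = Float(bytes[4 * i + 1]) / 255.0  // G plane
        out[2 * plane + i] = Float(bytes[4 * i + 2]) / 255.0  // B plane
    }
    return out
}
```

The resulting buffer maps directly onto the Float32[1, 3, 768, 768] tensor when width and height are 768.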

Output Contract

The model returns:

output    shape                      meaning
prob      Float32[1, 192, 192]       text probability map
rboxes    Float32[1, 192, 192, 5]    rotated-box geometry
features  Float32[1, 128, 192, 192]  detector feature map
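Since the rboxes tensor is laid out with its five geometry channels innermost, indexing it from a flattened buffer is a common first step. The sketch below only shows that indexing; the meaning of the five channels is not documented here, so it does not attempt the actual box decode:

```swift
// Flat-index helper for the rboxes tensor Float32[1, 192, 192, 5],
// where the 5 geometry channels are the fastest-varying dimension.
func rboxValue(_ rboxes: [Float], y: Int, x: Int, channel: Int,
               gridSize: Int = 192, channels: Int = 5) -> Float {
    precondition(channel < channels && x < gridSize && y < gridSize)
    return rboxes[(y * gridSize + x) * channels + channel]
}
```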

Local Performance

Measured on the bundled sample image after warmup:

compute units  median prediction latency
ALL            13.53 ms
CPU_AND_GPU    13.65 ms
CPU_AND_NE     54.51 ms
CPU_ONLY       298.28 ms

For the lowest single-image latency in this test environment, use the ALL or CPU_AND_GPU compute units. CPU plus Neural Engine works but is roughly 4x slower for this detector.
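If you load the package directly with CoreML rather than through the SwiftPM wrapper, the compute-unit choice goes through MLModelConfiguration. The generated class name Detector224 below is an assumption based on the package name:

```swift
import CoreML

// Pick the compute units explicitly; .cpuAndGPU and .all were the
// fastest options in the benchmark above.
let config = MLModelConfiguration()
config.computeUnits = .cpuAndGPU
// Detector224 is the Xcode-generated class name assumed for the package.
let model = try Detector224(configuration: config)
```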

Use From Swift

Add the Swift package:

.package(url: "https://github.com/mweinbach/OCRCoreMLDetector.git", from: "0.1.0")

Then:

import CoreML
import OCRCoreMLDetector

let detector = try OCRDetector(computeUnits: .cpuAndGPU)
let prediction = try detector.prediction(for: cgImage)

let prob = prediction.output.prob
let rboxes = prediction.output.rboxes
let features = prediction.output.features

License

The converted model weights inherit the NVIDIA Open Model License Agreement. The upstream source code and helper scripts are Apache 2.0. See LICENSE and NOTICE for redistribution terms and attribution.
