InsightFace Batch-Optimized Models (Max Batch 32)
Re-exported InsightFace models with proper dynamic batch support and no cross-frame contamination.
Version Comparison
| Repository | Max Batch | Recommendation |
|---|---|---|
| This repo | 1-32 | ✅ Recommended - Optimal performance |
| alonsorobots/scrfd_320_batched_64 | 1-64 | For experimentation |
Batch=32 is the recommended maximum. In testing on an RTX 5090, batch=64 provided no additional throughput benefit.
Why These Models?
The original InsightFace ONNX models have issues with batch inference:
- buffalo_l detection model: hardcoded batch=1
- buffalo_l_batch detection model: broken; it has cross-frame contamination due to reshape operations that flatten the batch dimension
These re-exports fix the dynamic_axes in the ONNX graph for true batch inference.
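To illustrate what "flattening the batch dimension" means, here is a minimal NumPy sketch. It is a toy stand-in, not the actual InsightFace graph: the tensor sizes and the transpose/reshape pair are assumptions chosen only to show how a reshape without a batch axis merges rows from different frames into one list.

```python
import numpy as np

# Toy tensor standing in for one head output: 2 frames, 2 channels, 3 positions.
# Frame 0 holds the values 0..5, frame 1 holds 100..105, so the origin of each
# value is easy to see after reshaping.
n, c, p = 2, 2, 3
x = np.stack([
    np.arange(c * p).reshape(c, p),
    100 + np.arange(c * p).reshape(c, p),
])  # shape (2, 2, 3)

# Batch-aware reshape: axis 0 stays the batch axis.
good = x.transpose(0, 2, 1).reshape(n, -1, c)  # shape (2, 3, 2)

# Batch-flattening reshape (hypothetical illustration of the failure mode):
# the target shape has no batch axis, so rows from every frame share one list.
bad = x.transpose(0, 2, 1).reshape(-1, c)      # shape (6, 2)

print(good[0])  # rows from frame 0 only
print(bad)      # rows from frame 0 followed by rows from frame 1
```

Any downstream step that treats `bad` as the output of a single frame will also consume frame 1's rows, which is the kind of cross-frame mixing described above.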
Models
| Model | Task | Input Shape | Output | Batch | Speedup |
|---|---|---|---|---|---|
| scrfd_10g_320_batch.onnx | Face Detection | [N, 3, 320, 320] | boxes, landmarks | 1-32 | 6× |
| arcface_w600k_r50_batch.onnx | Face Embedding | [N, 3, 112, 112] | 512-dim vectors | 1-32 | 10× |
Performance (TensorRT FP16, RTX 5090)
SCRFD Face Detection
| Batch Size | FPS | ms/frame |
|---|---|---|
| 1 | 867 | 1.15 |
| 16 | 5,498 | 0.18 |
ArcFace Embeddings
| Batch Size | FPS | ms/embedding |
|---|---|---|
| 1 | 292 | 3.4 |
| 16 | 3,029 | 0.33 |
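The per-item latency and speedup figures follow directly from the FPS columns. The short sketch below reproduces that arithmetic from the numbers in the tables above; the helper names are illustrative, and it assumes the quoted speedups compare batch 16 against batch 1, which the tables suggest but do not state.

```python
def ms_per_item(fps: float) -> float:
    """Average milliseconds per frame (or embedding) at a given throughput."""
    return 1000.0 / fps


def speedup(fps_batched: float, fps_single: float) -> float:
    """Throughput ratio of batched inference over batch=1."""
    return fps_batched / fps_single


# FPS values copied from the tables above: {batch size: FPS}.
scrfd_fps = {1: 867, 16: 5498}
arcface_fps = {1: 292, 16: 3029}

for name, fps in (("SCRFD", scrfd_fps), ("ArcFace", arcface_fps)):
    for batch_size, value in fps.items():
        print(f"{name} batch={batch_size}: {ms_per_item(value):.2f} ms/item")
    print(f"{name} speedup at batch=16: {speedup(fps[16], fps[1]):.1f}x")
```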
Usage
import numpy as np
import onnxruntime as ort
# Load model
sess = ort.InferenceSession("scrfd_10g_320_batch.onnx",
providers=["TensorrtExecutionProvider", "CUDAExecutionProvider"])
# Batch inference
batch = np.random.randn(16, 3, 320, 320).astype(np.float32)
outputs = sess.run(None, {"input.1": batch})
# outputs[0:3]: scores per FPN level (strides 8, 16, 32)
# outputs[3:6]: bboxes per FPN level
# outputs[6:9]: keypoints per FPN level
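Because the exported batch range tops out at 32, longer frame sequences need to be split before calling `sess.run`. A minimal sketch of such a splitter follows; `iter_batches` and `MAX_BATCH` are hypothetical names, and the frames are assumed to be already preprocessed to shape [3, 320, 320].

```python
import numpy as np

MAX_BATCH = 32  # upper end of the batch range stated for these models


def iter_batches(frames, max_batch=MAX_BATCH):
    """Yield float32 arrays of shape [n, 3, H, W] with n <= max_batch.

    `frames` is a sequence of preprocessed arrays, each of shape [3, H, W].
    """
    for start in range(0, len(frames), max_batch):
        chunk = frames[start:start + max_batch]
        yield np.ascontiguousarray(np.stack(chunk), dtype=np.float32)


# Example with 70 dummy frames: they are split into chunks of at most 32.
frames = [np.zeros((3, 320, 320), dtype=np.float32) for _ in range(70)]
batch_sizes = [b.shape[0] for b in iter_batches(frames)]
print(batch_sizes)

# With a loaded session, each chunk could then be passed on as:
#   outputs = sess.run(None, {"input.1": batch})
```

The last chunk is simply smaller than the rest, which the dynamic batch axis allows without padding.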
Verified: No Batch Contamination
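# The same frame processed alone and inside a batch should give matching results.
# `frame` is one preprocessed float32 array of shape [3, 320, 320];
# `batch` is the [16, 3, 320, 320] array from the Usage example.
single_output = sess.run(None, {"input.1": frame[np.newaxis, ...]})
batch[7] = frame
batch_output = sess.run(None, {"input.1": batch})
max_diff = np.max(np.abs(single_output[0][0] - batch_output[0][7]))
# max_diff < 1e-5 ✅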
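The same comparison can be wrapped in a small helper that checks every output tensor rather than only the first. This is a sketch under assumptions: `max_batch_deviation` is a hypothetical name, and `run` stands for any callable that maps an [N, ...] array to a list of arrays whose first axis is N, for example `lambda x: sess.run(None, {"input.1": x})`. The two toy callables below only demonstrate the check without needing a model file.

```python
import numpy as np


def max_batch_deviation(run, frame, batch, index):
    """Largest absolute difference, over all outputs, between running `frame`
    alone and running it at position `index` inside `batch`."""
    batch = batch.copy()  # leave the caller's array untouched
    batch[index] = frame
    single = run(frame[np.newaxis, ...])
    batched = run(batch)
    return max(
        float(np.max(np.abs(s[0] - b[index])))
        for s, b in zip(single, batched)
    )


# Toy stand-ins for a model: one treats frames independently, one does not.
def independent(x):
    return [x * 2.0, x[:, 0]]


def leaky(x):
    return [x - x.mean()]  # the mean is taken across the whole batch


frame = np.ones((3, 4, 4), dtype=np.float32)
batch = np.zeros((4, 3, 4, 4), dtype=np.float32)

print(max_batch_deviation(independent, frame, batch, 2))
print(max_batch_deviation(leaky, frame, batch, 2))
```

A model without cross-frame contamination should behave like `independent`: the deviation stays within floating-point noise regardless of what the other frames in the batch contain.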
Re-export Process
These models were re-exported from InsightFace's PyTorch source using MMDetection with proper dynamic_axes:
dynamic_axes = {
"input.1": {0: "batch"},
"score_8": {0: "batch"},
"score_16": {0: "batch"},
# ... all outputs
}
See SCRFD_320_EXPORT_INSTRUCTIONS.md for details.
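As a rough sketch of the idea, the mapping can be built from a list of tensor names, and the result of an export can be sanity-checked from the shape that ONNX Runtime reports for a tensor. Both helper names are hypothetical, only tensor names that appear above are used, and the check assumes that a dynamic dimension shows up as a string or `None` rather than an integer in `sess.get_inputs()[0].shape`.

```python
def make_dynamic_axes(tensor_names, axis=0, label="batch"):
    """Mark `axis` of every named tensor as dynamic, in the dict layout that
    the `dynamic_axes` argument of `torch.onnx.export` expects."""
    return {name: {axis: label} for name in tensor_names}


def has_dynamic_batch(shape):
    """True if the first dimension of a reported shape is not a fixed integer."""
    return len(shape) > 0 and not isinstance(shape[0], int)


# Only names that appear in the snippet above; the full output list is elided there.
dynamic_axes = make_dynamic_axes(["input.1", "score_8", "score_16"])
print(dynamic_axes)

# Shapes in the form ONNX Runtime reports them for an input or output.
print(has_dynamic_batch(["batch", 3, 320, 320]))  # dynamic batch axis
print(has_dynamic_batch([1, 3, 320, 320]))        # batch fixed at 1
```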
License
Non-commercial research purposes only - per InsightFace license.
For commercial licensing, contact: [email protected]
Credits
- Original models: InsightFace by Jia Guo et al.
- SCRFD paper: Sample and Computation Redistribution for Efficient Face Detection
- ArcFace paper: ArcFace: Additive Angular Margin Loss for Deep Face Recognition