RysOCR - Polish OCR LoRA for PaddleOCR-VL

A LoRA adapter for PaddleOCR-VL, fine-tuned specifically for Polish text recognition, with emphasis on correct handling of Polish diacritics (ą, ć, ę, ł, ń, ó, ś, ź, ż).

Motivation

Polish is underrepresented in OCR training data. Most vision-language OCR models struggle with Polish diacritics, often substituting:

  • ą → a
  • ę → e
  • ł → l or t
  • ó → o
  • etc.

This model addresses that gap by fine-tuning on synthetic Polish document images covering addresses, invoices, receipts, names, and common phrases.

Model Details

Property            | Value
--------------------|--------------------------------
Base Model          | PaddlePaddle/PaddleOCR-VL
Method              | LoRA (Low-Rank Adaptation)
LoRA Rank           | 16
LoRA Alpha          | 32
Target Modules      | q_proj, k_proj, v_proj, o_proj
Training Framework  | PEFT 0.18.0 + Transformers
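
To make the rank and alpha values above more concrete, here is a minimal sketch in plain Python. It assumes the standard LoRA formulation, in which the low-rank update is scaled by alpha / rank and each adapted linear layer gains two small matrices. The `LoraSettings` class, the helper function, and the `HIDDEN` width are hypothetical and only for illustration; the real projection sizes of PaddleOCR-VL are not stated in this card and may differ between the four target modules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoraSettings:
    """Hypothetical container mirroring the table above."""
    rank: int = 16
    alpha: int = 32
    target_modules: tuple = ("q_proj", "k_proj", "v_proj", "o_proj")

    @property
    def scaling(self) -> float:
        # Standard LoRA scales the low-rank update (B @ A) by alpha / rank.
        return self.alpha / self.rank


def lora_params_per_block(settings: LoraSettings, d_in: int, d_out: int) -> int:
    # Each adapted linear layer adds A (rank x d_in) and B (d_out x rank).
    per_module = settings.rank * (d_in + d_out)
    return per_module * len(settings.target_modules)


settings = LoraSettings()
HIDDEN = 1024  # ASSUMPTION: illustrative width, not taken from the model card

print("scaling factor:", settings.scaling)
print("extra parameters per attention block:",
      lora_params_per_block(settings, HIDDEN, HIDDEN))
```

In PEFT these values would typically be passed as `LoraConfig(r=16, lora_alpha=32, target_modules=[...])`; the exact configuration used for training is not reproduced here.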

Usage

from transformers import AutoModelForCausalLM, AutoProcessor
from peft import PeftModel
from PIL import Image

# Load base model
base_model = AutoModelForCausalLM.from_pretrained(
    "PaddlePaddle/PaddleOCR-VL",
    trust_remote_code=True,
    torch_dtype="auto",
    device_map="auto"
)

# Load LoRA adapter
model = PeftModel.from_pretrained(base_model, "anon13370/RysOCR")

processor = AutoProcessor.from_pretrained(
    "anon13370/RysOCR",
    trust_remote_code=True
)

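# Run inference
image = Image.open("your_document.png").convert("RGB")  # normalize alpha/palette modes
prompt = "OCR: "

inputs = processor(images=image, text=prompt, return_tensors="pt")
inputs = {k: v.to(model.device) for k, v in inputs.items()}

outputs = model.generate(**inputs, max_new_tokens=256)
# Decode only the newly generated tokens so the prompt is not echoed back
# (assumes the processor returns "input_ids", as most processors do)
prompt_len = inputs["input_ids"].shape[1]
text = processor.decode(outputs[0][prompt_len:], skip_special_tokens=True)
print(text)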

Training Details

  • Training Data: 10,000 synthetic Polish document images
  • Categories: Addresses, invoice lines, receipt lines, dates, names, prices, phrases
  • Hardware: Consumer-grade GPU; LoRA keeps fine-tuning within 4-6 GB of VRAM
  • Epochs: 1 (a single pass over the full dataset)
  • Optimizer: AdamW with linear learning rate schedule
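
As an illustration of the linear learning rate schedule mentioned above, the following sketch computes the learning rate multiplier at a given step. It follows the usual "linear warmup, then linear decay to zero" shape. The batch size, peak learning rate, and warmup length are assumptions chosen for the example; none of them is stated in this card. Only the dataset size and epoch count come from the list above.

```python
import math


def linear_schedule_multiplier(step: int, total_steps: int, warmup_steps: int = 0) -> float:
    """Multiplier applied to the peak learning rate at a given optimizer step."""
    if step < warmup_steps:
        # Linear ramp from 0 up to 1 during warmup.
        return step / max(1, warmup_steps)
    # Linear decay from 1 down to 0 over the remaining steps.
    return max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))


NUM_SAMPLES = 10_000  # from the model card
EPOCHS = 1            # from the model card
BATCH_SIZE = 8        # ASSUMPTION: not stated in the model card
PEAK_LR = 1e-4        # ASSUMPTION: not stated in the model card

total_steps = math.ceil(NUM_SAMPLES / BATCH_SIZE) * EPOCHS

for step in (0, total_steps // 2, total_steps):
    lr = PEAK_LR * linear_schedule_multiplier(step, total_steps)
    print(f"step {step:>5}: lr = {lr:.2e}")
```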

Baseline Performance (Pre-Fine-Tuning)

Performance of the unmodified PaddleOCR-VL base model on the Polish test set:

Metric                      | Value
----------------------------|--------
Character Error Rate (CER)  | 5.58%
Word Error Rate (WER)       | 13.37%
Exact Match                 | 74.00%
Diacritic Accuracy          | 74.14%
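
The card does not say how these metrics were computed. The sketch below uses the common definitions, which may differ from the author's evaluation script: CER and WER are the Levenshtein edit distance between reference and prediction, divided by the reference length in characters or words, and exact match is plain string equality. The sample strings are made up for illustration.

```python
import unicodedata


def edit_distance(ref, hyp) -> int:
    """Levenshtein distance between two sequences (strings or word lists)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def _nfc(text: str) -> str:
    # Compose characters so "o" + combining acute compares equal to "ó".
    return unicodedata.normalize("NFC", text)


def cer(ref: str, hyp: str) -> float:
    ref, hyp = _nfc(ref), _nfc(hyp)
    return edit_distance(ref, hyp) / max(1, len(ref))


def wer(ref: str, hyp: str) -> float:
    ref_words, hyp_words = _nfc(ref).split(), _nfc(hyp).split()
    return edit_distance(ref_words, hyp_words) / max(1, len(ref_words))


reference = "ul. Długa 5, Łódź"   # hypothetical ground truth
hypothesis = "ul. Dluga 5, Lódź"  # hypothetical OCR output with two lost diacritics

print(f"CER:   {cer(reference, hypothesis):.2%}")
print(f"WER:   {wer(reference, hypothesis):.2%}")
print(f"Exact: {_nfc(reference) == _nfc(hypothesis)}")
```

Note how two wrong characters out of 17 produce a small CER but a WER of 50%, because each error spoils a whole word. The same effect explains why WER in the table is much higher than CER.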

Summary of baseline vs. fine-tuned results:

Metric       | Baseline | Fine-tuned
-------------|----------|-----------
CER          | 5.58%    | 1.60%
WER          | 13.37%   | 7.21%
Exact Match  | 74%      | 76%
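
To put the table above in relative terms, this short calculation derives the relative error reduction from the reported CER and WER figures. It uses only the numbers listed in the table; the helper name is illustrative.

```python
# Error rates in percent, copied from the summary table.
baseline = {"CER": 5.58, "WER": 13.37}
fine_tuned = {"CER": 1.60, "WER": 7.21}


def relative_reduction(before: float, after: float) -> float:
    """Fraction of the original error that was removed."""
    return (before - after) / before


for metric in baseline:
    drop = relative_reduction(baseline[metric], fine_tuned[metric])
    print(f"{metric}: {drop:.1%} relative reduction")
```

This works out to roughly a 71% relative reduction in CER and a 46% relative reduction in WER, while exact match rises by 2 percentage points.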

Key diacritic confusions in baseline:

  • ł frequently confused with l or t
  • ę sometimes rendered as e
  • ś confused with š
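
Confusions like those listed above can be tallied by aligning each prediction with its reference. The sketch below is one possible way to do this with the standard library's `difflib`; it is not the evaluation code used for this model, and the definition of diacritic accuracy here (diacritics that survive unchanged, divided by diacritics in the reference) is an assumption. The function name and sample strings are hypothetical.

```python
import unicodedata
from collections import Counter
from difflib import SequenceMatcher

POLISH_DIACRITICS = set("ąćęłńóśźżĄĆĘŁŃÓŚŹŻ")


def diacritic_report(reference: str, hypothesis: str):
    """Return (correct, total, confusions) for Polish diacritics in the reference."""
    ref = unicodedata.normalize("NFC", reference)
    hyp = unicodedata.normalize("NFC", hypothesis)
    total = sum(ch in POLISH_DIACRITICS for ch in ref)
    correct = 0
    confusions = Counter()
    matcher = SequenceMatcher(None, ref, hyp, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Diacritics inside matching spans were recognized correctly.
            correct += sum(ch in POLISH_DIACRITICS for ch in ref[i1:i2])
        elif tag == "replace" and (i2 - i1) == (j2 - j1):
            # Same-length replacements are treated as character substitutions.
            for ref_ch, hyp_ch in zip(ref[i1:i2], hyp[j1:j2]):
                if ref_ch in POLISH_DIACRITICS:
                    confusions[(ref_ch, hyp_ch)] += 1
    return correct, total, confusions


correct, total, confusions = diacritic_report("żółw śpi", "zótw špi")
print(f"diacritic accuracy: {correct}/{total}")
for (ref_ch, hyp_ch), count in confusions.most_common():
    print(f"{ref_ch} -> {hyp_ch}: {count}")
```

Deleted or inserted characters are counted as incorrect but are not attributed to a specific confusion pair, which keeps the tally conservative.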

Limitations

  • Optimized for printed Polish text; handwritten recognition may vary
  • Best results on clean document scans; heavily degraded images may still have errors
  • Inference requires loading both base model and LoRA weights

License

Apache 2.0 (same as base model)

Citation

If you use this model, please cite:

@misc{rysocr2024,
  title={RysOCR: Polish OCR LoRA for PaddleOCR-VL},
  author={Kacper Wikieł},
  year={2024},
  publisher={Hugging Face},
  url={https://huggingface.co/anon13370/RysOCR}
}