---
language: pl
tags:
- ner
- polish
- token-classification
- onnx
datasets:
- custom
widget:
- text: "Wczoraj kupiłem w Biedronce mleko za 4,99 zł"
- text: "Spotkałem Pawła Kowalskiego w Warszawie"
- text: "W przyszły wtorek jadę do Krakowa"
---

# nodoki-ner-polish

Custom Polish NER model for [nodoki.app](https://github.com/yourusername/nodoki) - a local-first PWA for personal information management.

## Model Description

This model is a fine-tuned DistilBERT for Polish Named Entity Recognition with custom entity types:

- **PERSON** - Names of people (👤 osoba)
- **POLISH_CITY** - Polish cities (📍 miasto)
- **FOREIGN_CITY** - International cities (📍 miasto)
- **POLISH_STORE** - Polish retail chains (🛒 sklep)
- **AMOUNT_PLN** - Money amounts in PLN (💰 kwota PLN)
- **MONTH** - Polish month names (📅 miesiąc)
- **WEEKDAY** - Days of the week (📆 dzień tygodnia)
- **DATE_RELATIVE** - Relative dates like wczoraj, dzisiaj, jutro (🕐 data względna)

## Performance

- **F1 Score**: 0.9985
- **Format**: ONNX quantized (uint8)
- **Size**: ~130 MB

## Usage with Transformers.js

```javascript
import { pipeline } from '@xenova/transformers';

// Load the quantized ONNX model as a token-classification pipeline
const classifier = await pipeline(
  'token-classification',
  'HerqAI/nodoki-ner-polish'
);

// Run NER on a Polish sentence; the result is an array of token-level predictions
const result = await classifier('Wczoraj kupiłem w Biedronce mleko za 4,99 zł');
console.log(result);
```

## Training Details

- Base model: `distilbert-base-multilingual-cased`
- Training samples: 5,000
- Validation samples: 1,000
- Test samples: 750
- Epochs: 4
- Learning rate: 2e-5
- Batch size: 16

## License

Apache 2.0
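
## Grouping Tokens into Entities

The Transformers.js pipeline typically returns one prediction per token, so a name like "Pawła Kowalskiego" may arrive as several items, some of them WordPiece fragments. The sketch below merges such items into entity spans. It is not part of this model or of Transformers.js: `groupEntities` is a hypothetical helper, and it assumes that each item has the shape `{ entity, score, index, word }`, that labels may carry `B-`/`I-` prefixes, and that subword fragments start with `##`. The sample input is hand-written for illustration, not real model output.

```javascript
// Hypothetical helper: merge token-level predictions into entity spans.
// Assumptions: items look like { entity, score, index, word }; labels may be
// prefixed with "B-" / "I-"; "O" marks non-entities; subword pieces start with "##".
function groupEntities(tokens) {
  const spans = [];
  let current = null;   // span being built
  let prevIndex = null; // token index of the previous entity token

  for (const tok of tokens) {
    if (tok.entity === 'O') {
      // Non-entity token: close the current span
      current = null;
      prevIndex = null;
      continue;
    }

    const type = tok.entity.replace(/^[BI]-/, '');
    const isPiece = tok.word.startsWith('##');
    const adjacent = prevIndex !== null && tok.index === prevIndex + 1;

    // A token extends the current span if it directly follows it and is either
    // a subword piece or a same-type token that does not start a new entity.
    const continues =
      current !== null &&
      adjacent &&
      (isPiece || (current.type === type && !tok.entity.startsWith('B-')));

    if (continues) {
      current.text += isPiece ? tok.word.slice(2) : ' ' + tok.word;
      current.scores.push(tok.score);
    } else {
      current = {
        type,
        text: isPiece ? tok.word.slice(2) : tok.word,
        scores: [tok.score],
      };
      spans.push(current);
    }
    prevIndex = tok.index;
  }

  // Report the weakest token score as the span score (a conservative choice)
  return spans.map(({ type, text, scores }) => ({
    type,
    text,
    score: Math.min(...scores),
  }));
}

// Hand-written sample input, for illustration only
const sample = [
  { entity: 'B-PERSON', score: 0.99, index: 2, word: 'Pawła' },
  { entity: 'I-PERSON', score: 0.98, index: 3, word: 'Kowal' },
  { entity: 'I-PERSON', score: 0.97, index: 4, word: '##skiego' },
  { entity: 'B-POLISH_CITY', score: 0.99, index: 6, word: 'Warszawie' },
];
console.log(groupEntities(sample));
```

Because whole-word tokens are joined with a single space, punctuation inside a span (for example the comma in an amount such as "4,99 zł") will not be reproduced exactly; recovering the original text would require character offsets, which this sketch does not assume are available.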