IrishCore-GlobalPointer-ContextPII-135M-v1-rc8
IrishCore-GlobalPointer-ContextPII-135M-v1-rc8 is the current expanded-label raw-only PII masking release for Irish public-sector, HSE, and citizen-support flows.
It keeps the same DistilBERT-size GlobalPointer span extractor family as temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc7, but ships a stronger bundled decoder for contextual DOB, phone, and address boundary recovery.
Context labels served by this line:
- STREET_ADDRESS
- CITY
- COUNTY
- DATE_OF_BIRTH
- AGE
Core labels retained:
- PPSN
- POSTCODE
- PHONE_NUMBER
- EMAIL
- PASSPORT_NUMBER
- ACCOUNT_NUMBER
- BANK_ROUTING_NUMBER
- SWIFT_BIC
- CREDIT_DEBIT_CARD
- FIRST_NAME
- LAST_NAME
Positioning
rc8 is a decoder-hardening release over rc7.
- weights unchanged
- ONNX graph unchanged
- no external scanner or validator added
- deployment path still single-pass span extraction plus deterministic [PII:LABEL] replacement
What changed in rc8:
- the decoder now recovers date-of-birth spans from phrasing like born 14/03/1991
- the decoder now repairs full Irish phone spans like +353 (0)87 123 4567 and +353 (0)1 671 1633
- the decoder now preserves longer cue-led street-address spans such as Apartment 4B, 12 Main Street and Teach na Trá, 7 Bóthar na Trá
- the decoder now recovers additional prefixed Gaelic city forms such as hUaimh
- the new globalpointer_context_redteam_v2 suite is 1.0000 on both full and q8 paths and captures the exact contextual regressions fixed in this release
- the new globalpointer_location_coverage_v3 suite remains 1.0000 on both full and q8 paths
Use this release when you need broader masking for Irish gov / HSE / citizen-support text, including user turns and assistant answers that contain:
- personal address fragments
- city / county
- date of birth
- age
- official callback numbers or public-service mailbox emails that still need masking in assistant output
If you only need the narrower Irish-core structured label set and want maximum CPU throughput, temsa/IrishCore-GlobalPointer-135M-v1-rc4 remains the faster option.
Architecture
- base encoder: OpenMed/OpenMed-PII-mLiteClinical-Base-135M-v1
- extractor head: GlobalPointer-style typed span matrix
- runtime: single-pass span extraction
- output policy: deterministic bracket masking, for example [PII:PPSN]
- deployment target: ONNX Runtime CPU with dynamic q8 per-channel quantization
This is not a generative model and does not rewrite text. It predicts typed spans, and the provided inference scripts replace those spans with [PII:LABEL] placeholders.
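As a rough illustration of the span-matrix idea, the sketch below decodes a toy score tensor and applies bracket masking. It is not the code shipped in this repo: the function names, the score layout (`scores[label][start][end]`), the token offsets, and the threshold of 0.0 are all assumptions made for the example, and overlapping spans are not resolved here.

```python
from typing import List, Sequence, Tuple

Span = Tuple[int, int, str]  # (char_start, char_end_exclusive, label)


def decode_span_matrix(
    scores: Sequence[Sequence[Sequence[float]]],
    labels: Sequence[str],
    offsets: Sequence[Tuple[int, int]],
    threshold: float = 0.0,
) -> List[Span]:
    """Collect every (label, start, end) cell whose score exceeds the threshold.

    scores[l][i][j] scores a span of label l from token i to token j inclusive;
    offsets[i] is the (char_start, char_end) of token i. Layout is assumed.
    """
    spans: List[Span] = []
    for l, label in enumerate(labels):
        for i, row in enumerate(scores[l]):
            for j in range(i, len(row)):  # upper triangle only: start <= end
                if row[j] > threshold:
                    spans.append((offsets[i][0], offsets[j][1], label))
    return spans


def mask_text(text: str, spans: Sequence[Span]) -> str:
    """Replace each span with [PII:LABEL], right to left so offsets stay valid."""
    out = text
    for start, end, label in sorted(spans, reverse=True):
        out = out[:start] + f"[PII:{label}]" + out[end:]
    return out


# Toy input: three tokens, one label, one cell above the threshold.
demo_text = "PPSN 1234567T here"
demo_offsets = [(0, 4), (5, 13), (14, 18)]
demo_scores = [[[-5.0, -5.0, -5.0], [-5.0, 3.2, -5.0], [-5.0, -5.0, -5.0]]]

demo_spans = decode_span_matrix(demo_scores, ["PPSN"], demo_offsets)
masked = mask_text(demo_text, demo_spans)
print(masked)
```

Replacing from the rightmost span first means earlier character offsets are never shifted by a placeholder that is longer or shorter than the text it replaces.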
Decoder Policy
The serving path is still raw-only in the sense that there is no external scanner or validator service. The repo does include bundled decoder repairs for highly structured or contextual spans:
- PPSN normalization and overlap repair
- passport cue-based repair
- contextual date-of-birth repair
- contextual phone recovery, including Fón: cues and +353 (0) formats
- full-email recovery
- Eircode recovery
- contextual street-address recovery, including cue-led suffixless address blocks and longer apartment / house-name spans
- explicit county phrase recovery with county-over-name overlap cleanup
- prefixed Irish city-form recovery
These repairs live inside common.py and are part of the published inference path.
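To make the idea of a bundled span repair concrete, here is a minimal sketch of one possible phone-boundary repair: if the model predicts only part of a +353 (0) number, the span is widened to the full pattern match that overlaps it. The regular expression and the `repair_phone_span` helper are hypothetical and written for this example; the actual logic in common.py may differ.

```python
import re
from typing import Tuple

# Hypothetical pattern for illustration: +353, an optional (0), a 1-2 digit
# prefix, then two groups of 3-4 digits separated by optional space or hyphen.
IRISH_INTL_PHONE = re.compile(r"\+353\s*(?:\(0\)\s*)?\d{1,2}(?:[\s-]?\d{3,4}){2}")


def repair_phone_span(text: str, start: int, end: int) -> Tuple[int, int]:
    """Widen a partial phone span to the full pattern match that overlaps it."""
    for match in IRISH_INTL_PHONE.finditer(text):
        if match.start() < end and start < match.end():  # overlap check
            return min(start, match.start()), max(end, match.end())
    return start, end  # no overlapping match: leave the span untouched


# Toy input: the predicted span covers only "87 123 4567".
phone_text = "Call +353 (0)87 123 4567 today."
partial = (13, 24)
repaired = repair_phone_span(phone_text, *partial)
print(phone_text[repaired[0]:repaired[1]])
```

Because the repair only fires when a pattern match overlaps an existing prediction, it adjusts boundaries without acting as an independent scanner over the whole text.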
Benchmarks
ONNX q8
| Suite | F1 | Examples/s |
|---|---|---|
| Irish core | 1.0000 | 128.8825 |
| Irish extended | 1.0000 | 77.2357 |
| Demographic holdout v2 | 0.9996 | 97.4922 |
| Gov contact policy v1 | 1.0000 | 58.7415 |
| Gov chatbot red-team v2 | 0.9861 | 77.4549 |
| Gov chatbot gap holdout v2 | 1.0000 | 78.1248 |
| Context red-team v1 | 1.0000 | 77.1930 |
| Context red-team v2 | 1.0000 | 71.8467 |
| Location coverage v1 | 1.0000 | 106.2670 |
| Location coverage v2 | 1.0000 | 109.5137 |
| Location coverage v3 | 1.0000 | 114.7391 |
| Numeric qafix v2 | 1.0000 | 150.9314 |
| Multilingual PPSN overall | 0.9333 | 137.9681 |
| Multilingual PPSN label-only | 1.0000 | n/a |
Full checkpoint
| Suite | F1 | Examples/s |
|---|---|---|
| Irish core | 1.0000 | 35.0576 |
| Irish extended | 1.0000 | 24.1513 |
| Demographic holdout v2 | 0.9996 | 56.7818 |
| Gov contact policy v1 | 1.0000 | 21.4863 |
| Gov chatbot red-team v2 | 0.9861 | 32.1788 |
| Gov chatbot gap holdout v2 | 1.0000 | 51.5550 |
| Context red-team v1 | 1.0000 | 22.9530 |
| Context red-team v2 | 1.0000 | 13.1421 |
| Location coverage v1 | 1.0000 | 28.4516 |
| Location coverage v2 | 1.0000 | 33.2787 |
| Location coverage v3 | 1.0000 | 36.8150 |
| Numeric qafix v2 | 1.0000 | 29.9216 |
| Multilingual PPSN overall | 0.9333 | 56.2667 |
| Multilingual PPSN label-only | 1.0000 | n/a |
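The gap between the two tables is easier to read as a ratio. The short sketch below divides q8 throughput by full-checkpoint throughput for three suites, using figures copied from the tables above; the choice of suites is arbitrary and the variable names are only for illustration.

```python
# Examples/s copied from the ONNX q8 and Full checkpoint tables: (q8, full).
throughput = {
    "Irish core": (128.8825, 35.0576),
    "Context red-team v2": (71.8467, 13.1421),
    "Demographic holdout v2": (97.4922, 56.7818),
}

# Ratio of q8 throughput to full-checkpoint throughput per suite.
speedup = {suite: q8 / full for suite, (q8, full) in throughput.items()}

for suite, ratio in speedup.items():
    print(f"{suite}: q8 runs at {ratio:.2f}x the full-checkpoint throughput")
```

The ratio varies noticeably by suite, so a single headline speedup would hide how much the benefit depends on the input mix.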
Comparison
| Model | Core F1 | Gov Contact Policy v1 F1 | Context Red-team v2 F1 | Multilingual F1 | Core examples/s |
|---|---|---|---|---|---|
| ContextPII rc8 q8 | 1.0000 | 1.0000 | 1.0000 | 0.9333 | 128.8825 |
| ContextPII rc7 q8 | 1.0000 | 1.0000 | 0.7500 | 0.9333 | 88.9982 |
| ContextPII rc6 q8 | 1.0000 | 1.0000 | n/a | 0.9333 | 88.9982 |
| GlobalPointer rc4 q8 | 1.0000 | 0.7843 | n/a | 0.9333 | 221.5743 |
| DiffMask rc6 q8 | 0.9733 | n/a | n/a | 0.9274 | 130.3415 |
Tradeoff:
- this expanded-label line is materially better on contextual Irish masking tasks, including DOB, phone, address-boundary, and prefixed Gaelic city repairs
- it is slower than the core-only GlobalPointer line on CPU because it carries a broader label inventory and more decoder work
Evaluation Notes
Additional q8 release checks shipped in this repo:
- eval/q8_irish_numeric_qafix_v2.json: numeric false-positive guardrail suite, 1.0000 F1
- eval/q8_globalpointer_context_redteam_v1.json: contextual hardening suite for apartment-prefix street addresses, explicit County/Contae forms, and public-office address blocks, 1.0000 F1
- eval/q8_globalpointer_context_redteam_v2.json: contextual regression suite for born DOB phrasing, +353 (0) phone formatting, longer apartment / house-name address spans, and hUaimh, 1.0000 F1 in rc8 versus 0.7500 in the public rc7 bundle
- eval/q8_globalpointer_location_coverage_v1.json: broader Irish city/county/address coverage suite, 1.0000 F1
- eval/q8_globalpointer_location_coverage_v2.json: prefixed-Gaelic city-form suite, 1.0000 F1
- eval/q8_globalpointer_location_coverage_v3.json: additional prefixed-Gaelic city-form suite, 1.0000 F1
Other notes:
- globalpointer_demographic_patch_v2_test is the corrected held-out benchmark. The earlier v1 demographic patch contained invalid synthetic Eircodes in some rows.
- irish_gov_contact_policy_v1 is the policy-aligned assistant-output benchmark for this release.
- globalpointer_context_redteam_v2 is the new contextual regression benchmark for DOB phrasing, +353 (0) phones, and longer cue-led address spans.
- The legacy irish_gov_chatbot_redteam_v2 negatives still assume some public assistant contact details should not be masked. That assumption does not match this release's target policy.
Usage
Full checkpoint:
python3 inference_mask.py --model temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc8 --text "The patient was born 14/03/1991 and can be reached on +353 (0)87 123 4567."
ONNX q8:
python3 inference_mask_onnx.py --model temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc8 --text "Seoladh: Teach na Trá, 7 Bóthar na Trá, nGaillimh, H91 F4E2."
Expected masking style:
Seoladh: [PII:STREET_ADDRESS], [PII:CITY], [PII:POSTCODE].
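If you want to call the script from another Python process, a thin wrapper around the command line is one option. The sketch below uses only the --model and --text flags shown above; the helper names are hypothetical, and it assumes the script prints the masked text to stdout, which you should confirm against the script itself.

```python
import subprocess
import sys
from typing import List

MODEL = "temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc8"


def build_mask_command(
    text: str, script: str = "inference_mask_onnx.py", model: str = MODEL
) -> List[str]:
    """Build the argv list for one masking call, using only the documented flags."""
    return [sys.executable, script, "--model", model, "--text", text]


def mask_via_cli(text: str, script: str = "inference_mask_onnx.py") -> str:
    """Run the script and return stdout (assumed here to hold the masked text)."""
    result = subprocess.run(
        build_mask_command(text, script=script),
        capture_output=True,  # collect stdout/stderr instead of printing them
        text=True,            # decode bytes to str
        check=True,           # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout.strip()
```

Passing the arguments as a list avoids shell quoting problems with text that contains parentheses, quotes, or plus signs, such as +353 (0) phone numbers. Each call pays the full model load cost, so this suits occasional use rather than high-volume masking.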
Files
- model.safetensors: full checkpoint
- onnx/model_quantized.onnx: recommended CPU artifact
- inference_mask.py: full-checkpoint inference
- inference_mask_onnx.py: ONNX q8 inference
- eval/benchmark_summary.json: machine-readable benchmark summary
- training_sources.json: data provenance
Limitations
- This release is Irish-first. Multilingual overall precision is still pulled down by extra name detections outside the primary Irish target domain.
- The decoder deliberately prefers recall on structured Irish identifiers and contextual Irish masking. If you need a stricter non-Irish name policy, test on your own corpora before promoting beyond rc.
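To see how extra detections pull precision and F1 down while recall stays intact, consider the worked example below. The counts are hypothetical and chosen only for illustration; they are not the actual composition of the multilingual benchmark.

```python
from typing import Tuple


def precision_recall_f1(
    true_positives: int, false_positives: int, false_negatives: int
) -> Tuple[float, float, float]:
    """Span-level precision, recall, and F1 from raw counts."""
    predicted = true_positives + false_positives
    gold = true_positives + false_negatives
    precision = true_positives / predicted if predicted else 0.0
    recall = true_positives / gold if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical counts: all 7 gold spans found, plus 1 extra name detection.
p, r, f1 = precision_recall_f1(true_positives=7, false_positives=1, false_negatives=0)
print(f"precision={p:.4f} recall={r:.4f} f1={f1:.4f}")
# prints: precision=0.8750 recall=1.0000 f1=0.9333
```

With nothing missed, a single extra detection in eight predictions is enough to move F1 from 1.0000 to 0.9333, which is why a recall-leaning decoder can score lower on text outside its target domain.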
Portfolio Comparison
Updated: 2026-03-16.
Use this section for the fastest public comparison across the temsa PII masking portfolio.
- The first core table only includes public checkpoints that ship both comparable q8 accuracy and q8 CPU throughput.
- The first PPSN table only includes public artifacts that ship comparable PPSN accuracy and CPU throughput.
- Missing cells in the archive tables mean the older release did not ship that metric in its public bundle.
- DiffMask rows use the reconciled clean_single_pass harness that matches the deployed runtime.
- GlobalPointer rows use the public raw-only span-matrix release bundle and its packaged q8 ONNX artifact.
- The same content is shipped as PORTFOLIO_COMPARISON.md inside each public model repo.
Irish Core PII: Comparable Public Checkpoints
| Repo | Stack | Full Core F1 | Q8 Core F1 | Q8 Multilingual PPSN F1 | Q8 Core ex/s |
|---|---|---|---|---|---|
| temsa/IrishCore-GlobalPointer-ContextPII-4L-122M-v1-rc6 | 4-layer GlobalPointer distilled fast student | 1.0000 | 1.0000 | 0.9333 | 282.9 |
| temsa/IrishCore-GlobalPointer-ContextPII-4L-122M-v1-rc5 | 4-layer GlobalPointer distilled fast student | 1.0000 | 1.0000 | 0.9333 | 282.9 |
| temsa/IrishCore-GlobalPointer-ContextPII-4L-122M-v1-rc3 | 4-layer GlobalPointer distilled fast student | 1.0000 | 1.0000 | 0.9333 | 317.9 |
| temsa/IrishCore-GlobalPointer-ContextPII-4L-122M-v1-rc2 | 4-layer GlobalPointer distilled fast student | 1.0000 | 1.0000 | 0.9333 | 292.5 |
| temsa/IrishCore-GlobalPointer-ContextPII-4L-122M-v1-rc1 | 4-layer GlobalPointer distilled fast student | 1.0000 | 1.0000 | 0.9333 | 337.3 |
| temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc29 | GlobalPointer raw-only + context labels | 1.0000 | 1.0000 | 0.9333 | 232.7 |
| temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc28 | GlobalPointer raw-only + context labels | 1.0000 | 1.0000 | 0.9333 | 232.7 |
| temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc25 | GlobalPointer raw-only + context labels | 1.0000 | 1.0000 | 0.9333 | 212.1 |
| temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc24 | GlobalPointer raw-only + context labels | 1.0000 | 1.0000 | 0.9333 | 278.9 |
| temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc23 | GlobalPointer raw-only + context labels | 1.0000 | 1.0000 | 0.9333 | 237.6 |
| temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc22 | GlobalPointer raw-only + context labels | 1.0000 | 1.0000 | 0.9333 | 106.8 |
| temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc21 | GlobalPointer raw-only + context labels | 1.0000 | 1.0000 | 0.9333 | 150.8 |
| temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc20 | GlobalPointer raw-only + context labels | 1.0000 | 1.0000 | 0.9333 | 181.9 |
| temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc19 | GlobalPointer raw-only + context labels | 1.0000 | 1.0000 | 0.9333 | 73.1 |
| temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc18 | GlobalPointer raw-only + context labels | 1.0000 | 1.0000 | 0.9333 | 126.2 |
| temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc17 | GlobalPointer raw-only + context labels | 1.0000 | 1.0000 | 0.9333 | 125.5 |
| temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc16 | GlobalPointer raw-only + context labels | 1.0000 | 1.0000 | 0.9333 | 125.5 |
| temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc15 | GlobalPointer raw-only + context labels | 1.0000 | 1.0000 | 0.9333 | 125.5 |
| temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc14 | GlobalPointer raw-only + context labels | 1.0000 | 1.0000 | 0.9333 | 119.2 |
| temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc13 | GlobalPointer raw-only + context labels | 1.0000 | 1.0000 | 0.9333 | 126.1 |
| temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc12 | GlobalPointer raw-only + context labels | 1.0000 | 1.0000 | 0.9333 | 73.6 |
| temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc11 | GlobalPointer raw-only + context labels | 1.0000 | 1.0000 | 0.9333 | 94.1 |
| temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc10 | GlobalPointer raw-only + context labels | 1.0000 | 1.0000 | 0.9333 | 125.8 |
| temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc9 | GlobalPointer raw-only + context labels | 1.0000 | 1.0000 | 0.9333 | 119.8 |
| temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc8 | GlobalPointer raw-only + context labels | 1.0000 | 1.0000 | 0.9333 | 128.9 |
| temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc7 | GlobalPointer raw-only + context labels | 1.0000 | 1.0000 | 0.9333 | 89.0 |
| temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc6 | GlobalPointer raw-only + context labels | 1.0000 | 1.0000 | 0.9333 | 89.0 |
| temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc5 | GlobalPointer raw-only + context labels | 1.0000 | 1.0000 | 0.9333 | 84.5 |
| temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc4 | GlobalPointer raw-only + context labels | 0.9935 | 0.9935 | 0.9333 | 61.5 |
| temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc3 | GlobalPointer raw-only + context labels | 0.9935 | 0.9935 | 0.9333 | 61.5 |
| temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc2 | GlobalPointer raw-only + context labels | 0.9935 | 0.9935 | 0.9222 | 61.5 |
| temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc1 | GlobalPointer raw-only + context labels | 0.9935 | 0.9935 | 0.9222 | 61.5 |
| temsa/IrishCore-GlobalPointer-135M-v1-rc4 | GlobalPointer raw-only span-matrix | 1.0000 | 1.0000 | 0.9333 | 221.6 |
| temsa/IrishCore-GlobalPointer-135M-v1-rc3 | GlobalPointer raw-only span-matrix | 1.0000 | 1.0000 | 0.9213 | 204.9 |
| temsa/IrishCore-GlobalPointer-135M-v1-rc2 | GlobalPointer raw-only span-matrix | 0.9934 | 0.9934 | 0.9326 | 231.2 |
| temsa/OpenMed-mLiteClinical-IrishCorePII-135M-v2-rc8 | Raw-only token-span | 0.9737 | 0.9737 | 0.9176 | 46.1 |
| temsa/OpenMed-mLiteClinical-IrishCorePII-135M-v2-rc7 | Hybrid classifier + generated scanner spec | 1.0000 | 0.9934 | 1.0000 | 30.0 |
| temsa/OpenMed-mLiteClinical-IrishCorePII-135M-v2-rc6 | Hybrid classifier + repair decoders | 1.0000 | 0.9934 | 1.0000 | 29.5 |
| temsa/OpenMed-mLiteClinical-IrishCorePII-135M-v2-rc5 | Hybrid classifier + repair decoders | 0.9737 | 0.9669 | 0.9333 | 34.4 |
| temsa/OpenMed-mLiteClinical-IrishCorePII-135M-v2-rc4 | Hybrid classifier + repair decoders | 0.9870 | 0.9740 | 0.9600 | 114.2 |
| temsa/OpenMed-mLiteClinical-IrishCorePII-135M-v2-rc3 | Hybrid classifier + repair decoders | 0.9806 | 0.9677 | 0.9333 | 44.9 |
| temsa/OpenMed-mLiteClinical-IrishCorePII-135M-v2-rc2 | Hybrid classifier + repair decoders | 0.9554 | 0.9615 | 0.7887 | 119.1 |
| temsa/OpenMed-mLiteClinical-IrishCorePII-135M-v1 | Hybrid classifier baseline | 0.9530 | 0.9333 | 0.9882 | 103.3 |
| temsa/IrishCore-DiffMask-135M-v1-rc6 | DiffMask token-span, scanner-free | 0.9801 | 0.9733 | 0.9274 | 130.3 |
| temsa/IrishCore-DiffMask-135M-v1-rc5 | DiffMask token-span, scanner-free | 0.9733 | 0.9733 | 0.9379 | 249.2 |
| temsa/IrishCore-DiffMask-135M-v1-rc4 | DiffMask token-span, scanner-free | 0.9733 | 0.9733 | 0.9371 | 29.5 |
| temsa/IrishCore-DiffMask-135M-v1-rc3 | DiffMask token-span, scanner-free | 0.9664 | 0.9664 | 0.9591 | 30.0 |
| temsa/IrishCore-DiffMask-135M-v1-rc2 | DiffMask token-span, scanner-free | 0.9664 | 0.9664 | 0.9212 | 247.1 |
| temsa/IrishCore-DiffMask-135M-v1-rc1 | DiffMask token-span, scanner-free | 0.9801 | 0.9934 | 0.9412 | 251.2 |
Irish Core PII: Other Public Checkpoints
| Repo | Stack | Full Core F1 | Q8 Core F1 | Q8 Multilingual PPSN F1 | Notes |
|---|---|---|---|---|---|
| temsa/OpenMed-mLiteClinical-IrishCorePII-135M-v2-rc1 | Hybrid classifier prototype | 0.9487 | n/a | n/a | Predates the public q8 artifact. |
Finance-boundary q8 F1 is 1.0000 for OpenMed-mLiteClinical-IrishCorePII-135M-v2-rc6, OpenMed-mLiteClinical-IrishCorePII-135M-v2-rc7, OpenMed-mLiteClinical-IrishCorePII-135M-v2-rc8, and all public IrishCore-DiffMask releases from rc1 to rc6. OpenMed-mLiteClinical-IrishCorePII-135M-v2-rc5 ships 0.8750 on that public q8 suite.
PPSN-Only: Comparable Public Artifacts
| Repo | Artifact | Irish Large F1 | Multilingual PPSN F1 | User Raw F1 | QA v8 F1 | CPU ex/s |
|---|---|---|---|---|---|---|
| temsa/OpenMed-mLiteClinical-IrishPPSN-135M-v1 | fp32 canonical checkpoint | 0.8979 | 0.9704 | 0.8000 | 0.7385 | 57.4 |
| temsa/OpenMed-mLiteClinical-IrishPPSN-135M-v1-fp16 | fp16 CPU/GPU artifact | n/a | 0.9704 | 0.8000 | 0.7385 | 45.8 |
| temsa/OpenMed-mLiteClinical-IrishPPSN-135M-v1-q8 | dynamic int8 CPU artifact | n/a | 0.9040 | n/a | n/a | 132.1 |
PPSN-Only: Historical Public Checkpoints
| Repo | Main Published Metrics | Notes |
|---|---|---|
| temsa/OpenMed-PPSN-mLiteClinical-v1 | same as canonical fp32 repo: multilingual 0.9704, user raw 0.8000 | Legacy alias; prefer temsa/OpenMed-mLiteClinical-IrishPPSN-135M-v1. |
| temsa/OpenMed-PPSN-v6-raw-rc2 | irish_reg_v5 0.8750; user_raw 0.8000; qa_v8 0.7385 | Raw PPSN-only research checkpoint; no packaged multilingual CPU benchmark row. |
| temsa/OpenMed-PPSN-v5_1 | irish_large_v2 raw 0.9285; qa_v6 hybrid strict 1.0000 | Hybrid PPSN-only checkpoint; predates the canonical multilingual suite packaging. |
| temsa/OpenMed-PPSN-v5 | irish_reg_v5 raw 0.8235; irish_reg_v5 hybrid strict 1.0000 | Hybrid PPSN-only checkpoint; predates the canonical multilingual suite packaging. |
| temsa/OpenMed-PPSN-v4 | synthetic non-PPSN drift check only | Predates the current PPSN eval suite; no packaged apples-to-apples multilingual CPU row. |
If you need the strongest current raw-only Irish core model, start with IrishCore-GlobalPointer-135M-v1-rc4. If you need the fastest CPU-first raw-only line, compare it against IrishCore-DiffMask-135M-v1-rc6. If you need a PPSN-only artifact, compare the canonical fp32, fp16, and q8 variants of OpenMed-mLiteClinical-IrishPPSN-135M-v1 directly in the table above.