IrishCore-GlobalPointer-ContextPII-135M-v1-rc8

IrishCore-GlobalPointer-ContextPII-135M-v1-rc8 is the current expanded-label raw-only PII masking release for Irish public-sector, HSE, and citizen-support flows.

It keeps the same DistilBERT-size GlobalPointer span extractor family as temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc7, but ships a stronger bundled decoder for contextual DOB, phone, and address boundary recovery.

Context labels served by this line:

  • STREET_ADDRESS
  • CITY
  • COUNTY
  • DATE_OF_BIRTH
  • AGE

Core labels retained:

  • PPSN
  • POSTCODE
  • PHONE_NUMBER
  • EMAIL
  • PASSPORT_NUMBER
  • ACCOUNT_NUMBER
  • BANK_ROUTING_NUMBER
  • SWIFT_BIC
  • CREDIT_DEBIT_CARD
  • FIRST_NAME
  • LAST_NAME

Positioning

rc8 is a decoder-hardening release over rc7.

  • weights unchanged
  • ONNX graph unchanged
  • no external scanner or validator added
  • deployment path still single-pass span extraction plus deterministic [PII:LABEL] replacement

What changed in rc8:

  • the decoder now recovers date-of-birth spans from phrasing like born 14/03/1991
  • the decoder now repairs full Irish phone spans like +353 (0)87 123 4567 and +353 (0)1 671 1633
  • the decoder now preserves longer cue-led street-address spans such as Apartment 4B, 12 Main Street and Teach na Trá, 7 Bóthar na Trá
  • the decoder now recovers additional prefixed Gaelic city forms such as hUaimh
  • the new globalpointer_context_redteam_v2 suite scores 1.0000 F1 on both the full and q8 paths and captures the exact contextual regressions fixed in this release
  • the new globalpointer_location_coverage_v3 suite also scores 1.0000 F1 on both the full and q8 paths
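
To make the DOB and phone items above concrete, here is a minimal sketch of cue-led span recovery with regular expressions. It is not the code in common.py: the pattern names, the cue list, and the function `recover_spans` are assumptions made for illustration only.

```python
import re

# Hypothetical patterns for illustration; the bundled decoder may differ.
DOB_CUE = re.compile(
    r"\b(?:born|date of birth|DOB)\b[:\s]*(?:on\s+)?"
    r"(\d{1,2}[/.-]\d{1,2}[/.-]\d{2,4})",
    re.IGNORECASE,
)
# +353, an optional "(0)", a 1-2 digit prefix, then two digit groups.
IRISH_PHONE = re.compile(r"\+353\s*(?:\(0\))?\s*\d{1,2}(?:[\s-]?\d{3,4}){2}")


def recover_spans(text):
    """Return sorted (start, end, label) character spans found by the cues."""
    spans = []
    for match in DOB_CUE.finditer(text):
        # Only the date itself is masked, not the "born" cue word.
        spans.append((match.start(1), match.end(1), "DATE_OF_BIRTH"))
    for match in IRISH_PHONE.finditer(text):
        spans.append((match.start(), match.end(), "PHONE_NUMBER"))
    return sorted(spans)


text = "The patient was born 14/03/1991 and can be reached on +353 (0)87 123 4567."
found = [(text[start:end], label) for start, end, label in recover_spans(text)]
print(found)
```

In a real decoder, spans like these would be merged with the model's own predictions rather than used alone.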

Use this release when you need broader masking for Irish gov / HSE / citizen-support text, including user turns and assistant answers that contain:

  • personal address fragments
  • city / county
  • date of birth
  • age
  • official callback numbers or public-service mailbox emails that still need masking in assistant output

If you only need the narrower Irish-core structured label set and want maximum CPU throughput, temsa/IrishCore-GlobalPointer-135M-v1-rc4 remains the faster option.

Architecture

  • base encoder: OpenMed/OpenMed-PII-mLiteClinical-Base-135M-v1
  • extractor head: GlobalPointer-style typed span matrix
  • runtime: single-pass span extraction
  • output policy: deterministic bracket masking, for example [PII:PPSN]
  • deployment target: ONNX Runtime CPU with dynamic q8 per-channel quantization
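
As a rough illustration of what a typed span matrix means, the sketch below decodes a score tensor indexed as label, start token, end token. The threshold of 0.0, the nested-list layout, and the function name `decode_span_matrix` are assumptions for this example, not the repo's actual decoding code.

```python
def decode_span_matrix(scores, labels, threshold=0.0):
    """Read spans out of scores[k][i][j], the score that tokens i..j
    (inclusive) form a span of type labels[k]."""
    spans = []
    for k, label in enumerate(labels):
        for i, row in enumerate(scores[k]):
            # Only the upper triangle is valid: a span cannot end before it starts.
            for j in range(i, len(row)):
                if row[j] > threshold:
                    spans.append((i, j, label, row[j]))
    return spans


# Toy input: four tokens, two labels, every cell strongly negative by default.
tokens = ["PPSN", "1234567T", "in", "Galway"]
labels = ["PPSN", "CITY"]
n = len(tokens)
scores = [[[-10.0] * n for _ in range(n)] for _ in labels]
scores[0][1][1] = 5.2   # token 1 alone scored as a PPSN span
scores[1][3][3] = 4.1   # token 3 alone scored as a CITY span

spans = decode_span_matrix(scores, labels)
print(spans)
```

Because every label and every start/end pair is scored in one forward pass, extraction stays single-pass regardless of how many spans a text contains.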

This is not a generative model and does not rewrite text. It predicts typed spans and the provided inference scripts replace them with [PII:LABEL] placeholders.
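
The replacement step can be sketched as follows, assuming spans arrive as (start, end, label) character offsets. The function name `mask_text` and the rule for skipping overlapping spans are illustrative assumptions; the shipped inference scripts may handle these details differently.

```python
def mask_text(text, spans):
    """Replace each (start, end, label) character span with [PII:LABEL]."""
    pieces = []
    cursor = 0
    for start, end, label in sorted(spans):
        if start < cursor:
            # Overlaps a span that was already masked; skip it.
            continue
        pieces.append(text[cursor:start])
        pieces.append(f"[PII:{label}]")
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces)


text = "My PPSN is 1234567T and my email is a@b.ie."
ppsn_start = text.index("1234567T")
email_start = text.index("a@b.ie")
spans = [
    (email_start, email_start + len("a@b.ie"), "EMAIL"),
    (ppsn_start, ppsn_start + len("1234567T"), "PPSN"),
]
masked = mask_text(text, spans)
print(masked)
```

The output is fully determined by the predicted spans, which is what makes the masking deterministic: text outside the spans is copied through unchanged.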

Decoder Policy

The serving path is still raw-only in the sense that there is no external scanner or validator service. The repo does include bundled decoder repairs for highly structured or contextual spans:

  • PPSN normalization and overlap repair
  • passport cue-based repair
  • contextual date-of-birth repair
  • contextual phone recovery, including Fón: cues and +353 (0) formats
  • full-email recovery
  • Eircode recovery
  • contextual street-address recovery, including cue-led suffixless address blocks and longer apartment / house-name spans
  • explicit county phrase recovery with county-over-name overlap cleanup
  • prefixed Irish city-form recovery

These repairs live inside common.py and are part of the published inference path.
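
One of the repairs listed above, county-over-name overlap cleanup, can be pictured as priority-based overlap resolution. The sketch below is an assumption about how such a cleanup could work: the `PRIORITY` table and the function `resolve_overlaps` are hypothetical and are not taken from common.py.

```python
# Hypothetical priorities for illustration: a higher number wins an overlap.
PRIORITY = {"COUNTY": 2, "FIRST_NAME": 1, "LAST_NAME": 1}


def resolve_overlaps(spans):
    """Keep a non-overlapping subset of (start, end, label) spans,
    preferring higher-priority labels and then longer spans."""
    ranked = sorted(
        spans,
        key=lambda span: (PRIORITY.get(span[2], 0), span[1] - span[0]),
        reverse=True,
    )
    kept = []
    for start, end, label in ranked:
        # Accept the span only if it does not intersect any span already kept.
        if all(end <= k_start or start >= k_end for k_start, k_end, _ in kept):
            kept.append((start, end, label))
    return sorted(kept)


text = "She lives in County Clare."
county_start = text.index("County Clare")
name_start = text.index("Clare")
candidates = [
    (name_start, name_start + len("Clare"), "FIRST_NAME"),
    (county_start, county_start + len("County Clare"), "COUNTY"),
]
resolved = resolve_overlaps(candidates)
print(resolved)
```

Here the explicit county phrase displaces the name reading of "Clare", so the text is masked once as a county rather than partly as a name.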

Benchmarks

ONNX q8

| Suite | F1 | Examples/s |
| --- | --- | --- |
| Irish core | 1.0000 | 128.8825 |
| Irish extended | 1.0000 | 77.2357 |
| Demographic holdout v2 | 0.9996 | 97.4922 |
| Gov contact policy v1 | 1.0000 | 58.7415 |
| Gov chatbot red-team v2 | 0.9861 | 77.4549 |
| Gov chatbot gap holdout v2 | 1.0000 | 78.1248 |
| Context red-team v1 | 1.0000 | 77.1930 |
| Context red-team v2 | 1.0000 | 71.8467 |
| Location coverage v1 | 1.0000 | 106.2670 |
| Location coverage v2 | 1.0000 | 109.5137 |
| Location coverage v3 | 1.0000 | 114.7391 |
| Numeric qafix v2 | 1.0000 | 150.9314 |
| Multilingual PPSN overall | 0.9333 | 137.9681 |
| Multilingual PPSN label-only | 1.0000 | n/a |

Full checkpoint

| Suite | F1 | Examples/s |
| --- | --- | --- |
| Irish core | 1.0000 | 35.0576 |
| Irish extended | 1.0000 | 24.1513 |
| Demographic holdout v2 | 0.9996 | 56.7818 |
| Gov contact policy v1 | 1.0000 | 21.4863 |
| Gov chatbot red-team v2 | 0.9861 | 32.1788 |
| Gov chatbot gap holdout v2 | 1.0000 | 51.5550 |
| Context red-team v1 | 1.0000 | 22.9530 |
| Context red-team v2 | 1.0000 | 13.1421 |
| Location coverage v1 | 1.0000 | 28.4516 |
| Location coverage v2 | 1.0000 | 33.2787 |
| Location coverage v3 | 1.0000 | 36.8150 |
| Numeric qafix v2 | 1.0000 | 29.9216 |
| Multilingual PPSN overall | 0.9333 | 56.2667 |
| Multilingual PPSN label-only | 1.0000 | n/a |
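
For a quick read of the two throughput columns, the ratio of q8 to full-checkpoint examples per second can be computed directly from the published figures. The snippet below copies three rows from the tables above; the variable names are arbitrary.

```python
# Examples/s copied from the ONNX q8 and full-checkpoint benchmark tables.
q8 = {
    "Irish core": 128.8825,
    "Irish extended": 77.2357,
    "Context red-team v2": 71.8467,
}
full = {
    "Irish core": 35.0576,
    "Irish extended": 24.1513,
    "Context red-team v2": 13.1421,
}

# Throughput ratio per suite: how many times faster the q8 path runs.
speedup = {suite: q8[suite] / full[suite] for suite in q8}
for suite, ratio in speedup.items():
    print(f"{suite}: {ratio:.2f}x")
```

On these three suites the q8 path is between roughly 3x and 5.5x faster than the full checkpoint, with identical F1.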

Comparison

| Model | Core F1 | Gov Contact Policy v1 F1 | Context Red-team v2 F1 | Multilingual F1 | Core examples/s |
| --- | --- | --- | --- | --- | --- |
| ContextPII rc8 q8 | 1.0000 | 1.0000 | 1.0000 | 0.9333 | 128.8825 |
| ContextPII rc7 q8 | 1.0000 | 1.0000 | 0.7500 | 0.9333 | 88.9982 |
| ContextPII rc6 q8 | 1.0000 | 1.0000 | n/a | 0.9333 | 88.9982 |
| GlobalPointer rc4 q8 | 1.0000 | 0.7843 | n/a | 0.9333 | 221.5743 |
| DiffMask rc6 q8 | 0.9733 | n/a | n/a | 0.9274 | 130.3415 |

Tradeoff:

  • this expanded-label line is materially better on contextual Irish masking tasks, including DOB, phone, address-boundary, and prefixed Gaelic city repairs
  • it is slower than the core-only GlobalPointer line on CPU because it carries a broader label inventory and more decoder work

Evaluation Notes

Additional q8 release checks shipped in this repo:

  • eval/q8_irish_numeric_qafix_v2.json: numeric false-positive guardrail suite, 1.0000 F1
  • eval/q8_globalpointer_context_redteam_v1.json: contextual hardening suite for apartment-prefix street addresses, explicit County/Contae forms, and public-office address blocks, 1.0000 F1
  • eval/q8_globalpointer_context_redteam_v2.json: contextual regression suite for born DOB phrasing, +353 (0) phone formatting, longer apartment / house-name address spans, and hUaimh, 1.0000 F1 in rc8 versus 0.7500 in the public rc7 bundle
  • eval/q8_globalpointer_location_coverage_v1.json: broader Irish city/county/address coverage suite, 1.0000 F1
  • eval/q8_globalpointer_location_coverage_v2.json: prefixed-Gaelic city-form suite, 1.0000 F1
  • eval/q8_globalpointer_location_coverage_v3.json: additional prefixed-Gaelic city-form suite, 1.0000 F1

Other notes:

  • globalpointer_demographic_patch_v2_test is the corrected held-out benchmark. The earlier v1 demographic patch contained invalid synthetic Eircodes in some rows.
  • irish_gov_contact_policy_v1 is the policy-aligned assistant-output benchmark for this release.
  • globalpointer_context_redteam_v2 is the new contextual regression benchmark for DOB phrasing, +353 (0) phones, and longer cue-led address spans.
  • The legacy irish_gov_chatbot_redteam_v2 negatives still assume some public assistant contact details should not be masked. That assumption does not match this release's target policy.
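
For readers unfamiliar with span-level scoring, the sketch below shows one common way to compute F1 over typed spans, where a prediction counts only if start, end, and label all match a gold span. Exact matching is an assumption here; the evaluation harness behind the published numbers may use a different matching rule.

```python
def span_f1(gold, predicted):
    """Exact-match F1 over sets of (start, end, label) spans."""
    gold_set, pred_set = set(gold), set(predicted)
    if not gold_set and not pred_set:
        return 1.0  # nothing to find and nothing predicted
    true_positives = len(gold_set & pred_set)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred_set)
    recall = true_positives / len(gold_set)
    return 2 * precision * recall / (precision + recall)


# Illustrative spans only, not taken from any released suite.
gold = [(0, 10, "DATE_OF_BIRTH"), (15, 34, "PHONE_NUMBER"),
        (40, 62, "STREET_ADDRESS"), (64, 70, "CITY")]
# Three spans match exactly; the address prediction is cut short.
predicted = [(0, 10, "DATE_OF_BIRTH"), (15, 34, "PHONE_NUMBER"),
             (40, 55, "STREET_ADDRESS"), (64, 70, "CITY")]

score = span_f1(gold, predicted)
print(f"{score:.4f}")
```

Under exact matching, a single truncated boundary costs both a false positive and a false negative, which is why boundary repairs can move a suite score noticeably.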

Usage

Full checkpoint:

python3 inference_mask.py \
  --model temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc8 \
  --text "The patient was born 14/03/1991 and can be reached on +353 (0)87 123 4567."

ONNX q8:

python3 inference_mask_onnx.py \
  --model temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc8 \
  --text "Seoladh: Teach na Trá, 7 Bóthar na Trá, nGaillimh, H91 F4E2."

Expected masking style:

Seoladh: [PII:STREET_ADDRESS], [PII:CITY], [PII:POSTCODE].

Files

  • model.safetensors: full checkpoint
  • onnx/model_quantized.onnx: recommended CPU artifact
  • inference_mask.py: full-checkpoint inference
  • inference_mask_onnx.py: ONNX q8 inference
  • eval/benchmark_summary.json: machine-readable benchmark summary
  • training_sources.json: data provenance

Limitations

  • This release is Irish-first. Multilingual overall precision is still pulled down by extra name detections outside the primary Irish target domain.
  • The decoder deliberately prefers recall on structured Irish identifiers and contextual Irish masking. If you need a stricter non-Irish name policy, test on your own corpora before promoting beyond rc.

Portfolio Comparison

Updated: 2026-03-16.

Use this section for the fastest public comparison across the temsa PII masking portfolio.

  • The first core table only includes public checkpoints that ship both comparable q8 accuracy and q8 CPU throughput.
  • The first PPSN table only includes public artifacts that ship comparable PPSN accuracy and CPU throughput.
  • Missing cells in the archive tables mean the older release did not ship that metric in its public bundle.
  • DiffMask rows use the reconciled clean_single_pass harness that matches the deployed runtime.
  • GlobalPointer rows use the public raw-only span-matrix release bundle and its packaged q8 ONNX artifact.
  • The same content is shipped as PORTFOLIO_COMPARISON.md inside each public model repo.

Irish Core PII: Comparable Public Checkpoints

| Repo | Stack | Full Core F1 | Q8 Core F1 | Q8 Multilingual PPSN F1 | Q8 Core ex/s |
| --- | --- | --- | --- | --- | --- |
| temsa/IrishCore-GlobalPointer-ContextPII-4L-122M-v1-rc6 | 4-layer GlobalPointer distilled fast student | 1.0000 | 1.0000 | 0.9333 | 282.9 |
| temsa/IrishCore-GlobalPointer-ContextPII-4L-122M-v1-rc5 | 4-layer GlobalPointer distilled fast student | 1.0000 | 1.0000 | 0.9333 | 282.9 |
| temsa/IrishCore-GlobalPointer-ContextPII-4L-122M-v1-rc3 | 4-layer GlobalPointer distilled fast student | 1.0000 | 1.0000 | 0.9333 | 317.9 |
| temsa/IrishCore-GlobalPointer-ContextPII-4L-122M-v1-rc2 | 4-layer GlobalPointer distilled fast student | 1.0000 | 1.0000 | 0.9333 | 292.5 |
| temsa/IrishCore-GlobalPointer-ContextPII-4L-122M-v1-rc1 | 4-layer GlobalPointer distilled fast student | 1.0000 | 1.0000 | 0.9333 | 337.3 |
| temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc29 | GlobalPointer raw-only + context labels | 1.0000 | 1.0000 | 0.9333 | 232.7 |
| temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc28 | GlobalPointer raw-only + context labels | 1.0000 | 1.0000 | 0.9333 | 232.7 |
| temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc25 | GlobalPointer raw-only + context labels | 1.0000 | 1.0000 | 0.9333 | 212.1 |
| temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc24 | GlobalPointer raw-only + context labels | 1.0000 | 1.0000 | 0.9333 | 278.9 |
| temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc23 | GlobalPointer raw-only + context labels | 1.0000 | 1.0000 | 0.9333 | 237.6 |
| temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc22 | GlobalPointer raw-only + context labels | 1.0000 | 1.0000 | 0.9333 | 106.8 |
| temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc21 | GlobalPointer raw-only + context labels | 1.0000 | 1.0000 | 0.9333 | 150.8 |
| temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc20 | GlobalPointer raw-only + context labels | 1.0000 | 1.0000 | 0.9333 | 181.9 |
| temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc19 | GlobalPointer raw-only + context labels | 1.0000 | 1.0000 | 0.9333 | 73.1 |
| temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc18 | GlobalPointer raw-only + context labels | 1.0000 | 1.0000 | 0.9333 | 126.2 |
| temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc17 | GlobalPointer raw-only + context labels | 1.0000 | 1.0000 | 0.9333 | 125.5 |
| temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc16 | GlobalPointer raw-only + context labels | 1.0000 | 1.0000 | 0.9333 | 125.5 |
| temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc15 | GlobalPointer raw-only + context labels | 1.0000 | 1.0000 | 0.9333 | 125.5 |
| temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc14 | GlobalPointer raw-only + context labels | 1.0000 | 1.0000 | 0.9333 | 119.2 |
| temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc13 | GlobalPointer raw-only + context labels | 1.0000 | 1.0000 | 0.9333 | 126.1 |
| temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc12 | GlobalPointer raw-only + context labels | 1.0000 | 1.0000 | 0.9333 | 73.6 |
| temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc11 | GlobalPointer raw-only + context labels | 1.0000 | 1.0000 | 0.9333 | 94.1 |
| temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc10 | GlobalPointer raw-only + context labels | 1.0000 | 1.0000 | 0.9333 | 125.8 |
| temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc9 | GlobalPointer raw-only + context labels | 1.0000 | 1.0000 | 0.9333 | 119.8 |
| temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc8 | GlobalPointer raw-only + context labels | 1.0000 | 1.0000 | 0.9333 | 128.9 |
| temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc7 | GlobalPointer raw-only + context labels | 1.0000 | 1.0000 | 0.9333 | 89.0 |
| temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc6 | GlobalPointer raw-only + context labels | 1.0000 | 1.0000 | 0.9333 | 89.0 |
| temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc5 | GlobalPointer raw-only + context labels | 1.0000 | 1.0000 | 0.9333 | 84.5 |
| temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc4 | GlobalPointer raw-only + context labels | 0.9935 | 0.9935 | 0.9333 | 61.5 |
| temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc3 | GlobalPointer raw-only + context labels | 0.9935 | 0.9935 | 0.9333 | 61.5 |
| temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc2 | GlobalPointer raw-only + context labels | 0.9935 | 0.9935 | 0.9222 | 61.5 |
| temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc1 | GlobalPointer raw-only + context labels | 0.9935 | 0.9935 | 0.9222 | 61.5 |
| temsa/IrishCore-GlobalPointer-135M-v1-rc4 | GlobalPointer raw-only span-matrix | 1.0000 | 1.0000 | 0.9333 | 221.6 |
| temsa/IrishCore-GlobalPointer-135M-v1-rc3 | GlobalPointer raw-only span-matrix | 1.0000 | 1.0000 | 0.9213 | 204.9 |
| temsa/IrishCore-GlobalPointer-135M-v1-rc2 | GlobalPointer raw-only span-matrix | 0.9934 | 0.9934 | 0.9326 | 231.2 |
| temsa/OpenMed-mLiteClinical-IrishCorePII-135M-v2-rc8 | Raw-only token-span | 0.9737 | 0.9737 | 0.9176 | 46.1 |
| temsa/OpenMed-mLiteClinical-IrishCorePII-135M-v2-rc7 | Hybrid classifier + generated scanner spec | 1.0000 | 0.9934 | 1.0000 | 30.0 |
| temsa/OpenMed-mLiteClinical-IrishCorePII-135M-v2-rc6 | Hybrid classifier + repair decoders | 1.0000 | 0.9934 | 1.0000 | 29.5 |
| temsa/OpenMed-mLiteClinical-IrishCorePII-135M-v2-rc5 | Hybrid classifier + repair decoders | 0.9737 | 0.9669 | 0.9333 | 34.4 |
| temsa/OpenMed-mLiteClinical-IrishCorePII-135M-v2-rc4 | Hybrid classifier + repair decoders | 0.9870 | 0.9740 | 0.9600 | 114.2 |
| temsa/OpenMed-mLiteClinical-IrishCorePII-135M-v2-rc3 | Hybrid classifier + repair decoders | 0.9806 | 0.9677 | 0.9333 | 44.9 |
| temsa/OpenMed-mLiteClinical-IrishCorePII-135M-v2-rc2 | Hybrid classifier + repair decoders | 0.9554 | 0.9615 | 0.7887 | 119.1 |
| temsa/OpenMed-mLiteClinical-IrishCorePII-135M-v1 | Hybrid classifier baseline | 0.9530 | 0.9333 | 0.9882 | 103.3 |
| temsa/IrishCore-DiffMask-135M-v1-rc6 | DiffMask token-span, scanner-free | 0.9801 | 0.9733 | 0.9274 | 130.3 |
| temsa/IrishCore-DiffMask-135M-v1-rc5 | DiffMask token-span, scanner-free | 0.9733 | 0.9733 | 0.9379 | 249.2 |
| temsa/IrishCore-DiffMask-135M-v1-rc4 | DiffMask token-span, scanner-free | 0.9733 | 0.9733 | 0.9371 | 29.5 |
| temsa/IrishCore-DiffMask-135M-v1-rc3 | DiffMask token-span, scanner-free | 0.9664 | 0.9664 | 0.9591 | 30.0 |
| temsa/IrishCore-DiffMask-135M-v1-rc2 | DiffMask token-span, scanner-free | 0.9664 | 0.9664 | 0.9212 | 247.1 |
| temsa/IrishCore-DiffMask-135M-v1-rc1 | DiffMask token-span, scanner-free | 0.9801 | 0.9934 | 0.9412 | 251.2 |

Irish Core PII: Other Public Checkpoints

| Repo | Stack | Full Core F1 | Q8 Core F1 | Q8 Multilingual PPSN F1 | Notes |
| --- | --- | --- | --- | --- | --- |
| temsa/OpenMed-mLiteClinical-IrishCorePII-135M-v2-rc1 | Hybrid classifier prototype | 0.9487 | n/a | n/a | Predates the public q8 artifact. |

Finance-boundary q8 F1 is 1.0000 for OpenMed-mLiteClinical-IrishCorePII-135M-v2-rc6, OpenMed-mLiteClinical-IrishCorePII-135M-v2-rc7, OpenMed-mLiteClinical-IrishCorePII-135M-v2-rc8, and all public IrishCore-DiffMask releases from rc1 to rc6. OpenMed-mLiteClinical-IrishCorePII-135M-v2-rc5 ships 0.8750 on that public q8 suite.

PPSN-Only: Comparable Public Artifacts

| Repo | Artifact | Irish Large F1 | Multilingual PPSN F1 | User Raw F1 | QA v8 F1 | CPU ex/s |
| --- | --- | --- | --- | --- | --- | --- |
| temsa/OpenMed-mLiteClinical-IrishPPSN-135M-v1 | fp32 canonical checkpoint | 0.8979 | 0.9704 | 0.8000 | 0.7385 | 57.4 |
| temsa/OpenMed-mLiteClinical-IrishPPSN-135M-v1-fp16 | fp16 CPU/GPU artifact | n/a | 0.9704 | 0.8000 | 0.7385 | 45.8 |
| temsa/OpenMed-mLiteClinical-IrishPPSN-135M-v1-q8 | dynamic int8 CPU artifact | n/a | 0.9040 | n/a | n/a | 132.1 |

PPSN-Only: Historical Public Checkpoints

| Repo | Main Published Metrics | Notes |
| --- | --- | --- |
| temsa/OpenMed-PPSN-mLiteClinical-v1 | same as canonical fp32 repo: multilingual 0.9704, user raw 0.8000 | Legacy alias; prefer temsa/OpenMed-mLiteClinical-IrishPPSN-135M-v1. |
| temsa/OpenMed-PPSN-v6-raw-rc2 | irish_reg_v5 0.8750; user_raw 0.8000; qa_v8 0.7385 | Raw PPSN-only research checkpoint; no packaged multilingual CPU benchmark row. |
| temsa/OpenMed-PPSN-v5_1 | irish_large_v2 raw 0.9285; qa_v6 hybrid strict 1.0000 | Hybrid PPSN-only checkpoint; predates the canonical multilingual suite packaging. |
| temsa/OpenMed-PPSN-v5 | irish_reg_v5 raw 0.8235; irish_reg_v5 hybrid strict 1.0000 | Hybrid PPSN-only checkpoint; predates the canonical multilingual suite packaging. |
| temsa/OpenMed-PPSN-v4 | synthetic non-PPSN drift check only | Predates the current PPSN eval suite; no packaged apples-to-apples multilingual CPU row. |

If you need the strongest current raw-only Irish core model, start with IrishCore-GlobalPointer-135M-v1-rc4. If you need the fastest CPU-first raw-only line, compare it against IrishCore-DiffMask-135M-v1-rc6. If you need a PPSN-only artifact, compare the canonical fp32, fp16, and q8 variants of OpenMed-mLiteClinical-IrishPPSN-135M-v1 directly in the table above.
