HaluGate Sentinel — Prompt Fact-Check Switch for Hallucination Gatekeeper
HaluGate Sentinel is a ModernBERT + LoRA classifier that decides whether an incoming user prompt requires factual verification.
It does not check facts itself. Instead, it acts as a frontline switch in an LLM routing / gateway system, deciding whether a request should enter a fact-checking / RAG / hallucination-mitigation pipeline.
The model classifies prompts into:
FACT_CHECK_NEEDED:
Information-seeking queries that depend on external/world knowledge
- e.g., “When was the Eiffel Tower built?”
- e.g., “What is the GDP of Japan in 2023?”
NO_FACT_CHECK_NEEDED:
Creative, coding, opinion, or pure reasoning/math tasks
- e.g., “Write a poem about spring”
- e.g., “Implement quicksort in Python”
- e.g., “What is the meaning of life?”
This model is part of the Hallucination Gatekeeper stack for llm-semantic-router.
Model Details
- Model name: HaluGate Sentinel
- Repository: llm-semantic-router/halugate-sentinel
- Task: Binary text classification (prompt-level)
- Labels: 0 → NO_FACT_CHECK_NEEDED, 1 → FACT_CHECK_NEEDED
- Base model: answerdotai/ModernBERT-base
- Fine-tuning method: LoRA (rank = 16, alpha = 32)
- Validation Accuracy: 96.4%
- Validation F1 Score: 0.965
- Edge-case accuracy: 100% on a 27-sample curated test set of borderline prompt types
Position in a Hallucination Mitigation Pipeline
HaluGate Sentinel is designed as Stage 0 in a multi-stage hallucination mitigation architecture:
Stage 0 — HaluGate Sentinel (this model)
Classifies user prompts and decides whether fact-checking is needed:
- NO_FACT_CHECK_NEEDED → route directly to LLM generation.
- FACT_CHECK_NEEDED → route into the Hallucination Gatekeeper path (RAG, tools, verifiers).
Stage 1+ — Answer-level hallucination models (e.g., “HaluGate Verifier”)
Operate on (query, answer, evidence) to detect hallucinations and enforce trust policies.
HaluGate Sentinel focuses solely on prompt intent classification to minimize unnecessary compute while preserving safety for factual queries.
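This split can be wired up in a few lines. The sketch below assumes the classify_prompt helper defined in the Usage section; generate_directly, retrieve_evidence, generate_with_evidence, and verify_answer are hypothetical stand-ins for your own generation, RAG, and answer-level verification components.
def handle_request(user_prompt: str) -> str:
    # Stage 0: prompt-level switch (this model)
    label, confidence = classify_prompt(user_prompt)
    if label == "NO_FACT_CHECK_NEEDED":
        # Creative / coding / opinion traffic skips the expensive path
        return generate_directly(user_prompt)

    # Stage 1+: fact-aware path (hypothetical components)
    evidence = retrieve_evidence(user_prompt)               # RAG / tool calls
    answer = generate_with_evidence(user_prompt, evidence)
    if not verify_answer(user_prompt, answer, evidence):    # answer-level verifier
        return "The answer could not be verified against available sources."
    return answer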
Usage
Basic Inference
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch
MODEL_ID = "llm-semantic-router/halugate-sentinel"
model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID)
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
id2label = model.config.id2label # {0: 'NO_FACT_CHECK_NEEDED', 1: 'FACT_CHECK_NEEDED'}
def classify_prompt(text: str):
    inputs = tokenizer(
        text,
        return_tensors="pt",
        truncation=True,
        max_length=512,
    )
    with torch.no_grad():
        outputs = model(**inputs)  # single forward pass, no gradients needed
    probs = torch.softmax(outputs.logits, dim=-1)[0]
    pred_id = int(torch.argmax(probs).item())
    label = id2label.get(pred_id, str(pred_id))
    confidence = float(probs[pred_id].item())  # probability of the predicted label
    return label, confidence
# Examples
print(classify_prompt("When was the Eiffel Tower built?"))
# → ('FACT_CHECK_NEEDED', 0.99...)
print(classify_prompt("Write a poem about spring"))
# → ('NO_FACT_CHECK_NEEDED', 0.98...)
print(classify_prompt("Implement a binary search in Python"))
# → ('NO_FACT_CHECK_NEEDED', 0.97...)
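For gateway-scale traffic it can help to classify prompts in batches rather than one at a time. The helper below is a minimal sketch reusing the model and tokenizer loaded above; it is illustrative, not part of the published API.
def classify_prompts(texts, batch_size: int = 32):
    """Batched variant of classify_prompt; returns one (label, confidence) pair per prompt."""
    results = []
    for start in range(0, len(texts), batch_size):
        batch = texts[start:start + batch_size]
        inputs = tokenizer(
            batch,
            return_tensors="pt",
            padding=True,
            truncation=True,
            max_length=512,
        )
        with torch.no_grad():
            logits = model(**inputs).logits
        probs = torch.softmax(logits, dim=-1)
        for row in probs:
            pred_id = int(torch.argmax(row).item())
            results.append((id2label.get(pred_id, str(pred_id)), float(row[pred_id].item())))
    return results

print(classify_prompts(["What is the GDP of Japan in 2023?", "Write a haiku about rain"]))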
Example: Integrating with a Router / Gateway
Pseudocode for a routing decision:
label, prob = classify_prompt(user_prompt)
FACT_CHECK_THRESHOLD = 0.6 # configurable based on your risk appetite
if label == "FACT_CHECK_NEEDED" and prob >= FACT_CHECK_THRESHOLD:
    route = "hallucination_gatekeeper"  # RAG / tools / verifiers
else:
    route = "direct_generation"
# Use `route` to select downstream pipelines in your LLM gateway.
Training Data
Balanced dataset of 50,000 prompts:
FACT_CHECK_NEEDED (25,000 samples)
Information-seeking and knowledge-intensive questions drawn from:
- NISQ-ISQ: Gold-standard information-seeking questions
- HaluEval: Hallucination-focused QA benchmark
- FaithDial: Information-seeking dialogue questions
- FactCHD: Fact-conflicting / hallucination-prone queries
- SQuAD, TriviaQA, HotpotQA: Standard factual QA datasets
- TruthfulQA: High-risk factual queries
- CoQA: Conversational factual questions
NO_FACT_CHECK_NEEDED (25,000 samples)
Tasks that typically do not require external factual verification:
- NISQ-NonISQ: Non-information-seeking questions
- Databricks Dolly: Creative writing, summarization, brainstorming
- WritingPrompts: Creative writing prompts
- Alpaca: Coding, math, opinion, and general instructions
The objective is to approximate “does this prompt require world knowledge / external facts?” rather than “is the answer true?”.
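A hedged sketch of how a balanced, source-labeled corpus like this can be assembled; load_prompts and the per-source entries are placeholders for illustration, not the actual build pipeline.
import random

def load_prompts(source_name: str) -> list[str]:
    # Placeholder: load raw prompts for one source corpus (SQuAD, Dolly, ...).
    raise NotImplementedError("hook up your own dataset loading here")

# 1 = FACT_CHECK_NEEDED, 0 = NO_FACT_CHECK_NEEDED, per the source lists above
SOURCE_LABELS = {
    "NISQ-ISQ": 1, "HaluEval": 1, "FaithDial": 1, "FactCHD": 1,
    "SQuAD": 1, "TriviaQA": 1, "HotpotQA": 1, "TruthfulQA": 1, "CoQA": 1,
    "NISQ-NonISQ": 0, "Dolly": 0, "WritingPrompts": 0, "Alpaca": 0,
}

def build_balanced_dataset(per_class: int = 25_000):
    by_label = {0: [], 1: []}
    for source, label in SOURCE_LABELS.items():
        by_label[label].extend(load_prompts(source))
    examples = []
    for label, prompts in by_label.items():
        examples.extend((p, label) for p in random.sample(prompts, per_class))
    random.shuffle(examples)
    return examples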
Intended Use
Primary Use Cases
LLM Gateway / Router
- Decide if a prompt must be routed into a fact-aware pipeline (RAG, tools, knowledge base, verifiers).
- Avoid unnecessary compute for creative / coding / opinion tasks.
Hallucination Gatekeeper Frontline
- Only enable expensive hallucination detection for prompts labeled FACT_CHECK_NEEDED.
- Implement different safety and latency policies for the two classes.
Traffic Analytics & Risk Scoring
- Monitor proportion of factual vs non-factual traffic.
- Adjust infrastructure sizing for retrieval / tool-heavy pipelines accordingly.
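A minimal sketch of this kind of monitoring, assuming each request's Stage 0 label is appended to a log; the log contents and sizing rule here are hypothetical.
from collections import Counter

def factual_traffic_share(stage0_labels):
    """Fraction of requests routed to the fact-checking path, from a log of Stage 0 labels."""
    counts = Counter(stage0_labels)
    total = sum(counts.values())
    return counts["FACT_CHECK_NEEDED"] / total if total else 0.0

# Example with a tiny hypothetical log; use the share to size retrieval / tool capacity
labels_today = ["FACT_CHECK_NEEDED", "NO_FACT_CHECK_NEEDED", "FACT_CHECK_NEEDED"]
print(f"factual share: {factual_traffic_share(labels_today):.0%}")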
Non-Goals
- It does not verify the correctness of any answer.
- It should not be used as a generic toxicity / safety classifier.
- It does not handle non-English prompts reliably (trained on English only).
How It Works
Architecture:
- ModernBERT-base encoder
- Classification head on top of the [CLS] / pooled representation
Fine-tuning:
- LoRA on the base encoder
- Binary cross-entropy / cross-entropy loss on the two labels
- Balanced sampling between FACT_CHECK_NEEDED and NO_FACT_CHECK_NEEDED
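A hedged sketch of what such a LoRA setup looks like with peft: r and lora_alpha match the values in Model Details, while the target modules, dropout, and label wiring are illustrative assumptions rather than the exact recipe used for this checkpoint.
from transformers import AutoModelForSequenceClassification
from peft import LoraConfig, TaskType, get_peft_model

base = AutoModelForSequenceClassification.from_pretrained(
    "answerdotai/ModernBERT-base",
    num_labels=2,
    id2label={0: "NO_FACT_CHECK_NEEDED", 1: "FACT_CHECK_NEEDED"},
    label2id={"NO_FACT_CHECK_NEEDED": 0, "FACT_CHECK_NEEDED": 1},
)

# r and lora_alpha follow the model card; target_modules and dropout are assumptions
lora_config = LoraConfig(
    task_type=TaskType.SEQ_CLS,
    r=16,
    lora_alpha=32,
    lora_dropout=0.1,
    target_modules=["Wqkv", "Wo"],  # ModernBERT attention projections (assumed naming)
)

model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # only LoRA adapters + classification head are trainable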
Decision Boundary:
- Borderline / philosophical / highly abstract questions may be assigned lower confidence.
- Downstream systems are encouraged to use the confidence score as a soft signal, not a hard oracle.
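One way to treat the confidence as a soft signal is a banded policy that defaults to the safe path when the model is unsure. The sketch below builds on classify_prompt from the Usage section; the 0.8 / 0.6 band edges are illustrative assumptions to be tuned on your own traffic.
def route_with_bands(user_prompt: str) -> str:
    label, confidence = classify_prompt(user_prompt)
    if label == "FACT_CHECK_NEEDED" and confidence >= 0.8:
        return "hallucination_gatekeeper"
    if label == "NO_FACT_CHECK_NEEDED" and confidence >= 0.6:
        return "direct_generation"
    # Low-confidence / borderline prompts default to the safe (fact-checked) path
    return "hallucination_gatekeeper"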
Limitations
Language:
- Trained on English data only.
- Performance on other languages is not guaranteed.
Borderline Queries:
- Philosophical or hybrid prompts (e.g. “Is time travel possible?”) may be ambiguous.
- In such cases, consider inspecting the model confidence and implementing a “default-to-safe” policy.
Domain Coverage:
- General-purpose factual tasks are well-covered; highly specialized verticals (e.g. niche scientific domains) are not explicitly targeted during fine-tuning.
Not a Verifier:
- This model only decides if a prompt needs factual support.
- Actual hallucination detection and answer verification must be handled by separate models (e.g., answer-level verifiers).
Ethical Considerations
Risk Trade-off:
- Over-classifying prompts as NO_FACT_CHECK_NEEDED may reduce safety for borderline factual tasks.
- Over-classifying prompts as FACT_CHECK_NEEDED increases compute cost but is safer in high-risk environments.
Deployment Recommendation:
- For safety-critical domains (finance, healthcare, legal, etc.), configure conservative thresholds and fallbacks that favor routing more traffic through the fact-checking path.
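A hedged sketch of such a conservative setup, again using classify_prompt from the Usage section; the domain names and confidence floors are illustrative assumptions, not recommendations.
# Minimum confidence required in a NO_FACT_CHECK_NEEDED prediction before the
# fact-checking path may be skipped; higher values are more conservative.
SKIP_FACT_CHECK_MIN_CONFIDENCE = {
    "finance": 0.95,
    "healthcare": 0.95,
    "legal": 0.95,
    "general": 0.60,
}

def route_for_domain(user_prompt: str, domain: str = "general") -> str:
    label, confidence = classify_prompt(user_prompt)
    min_conf = SKIP_FACT_CHECK_MIN_CONFIDENCE.get(domain, 0.60)
    if label == "NO_FACT_CHECK_NEEDED" and confidence >= min_conf:
        return "direct_generation"
    # Everything else (factual label, or an unconfident non-factual call) is fact-checked
    return "hallucination_gatekeeper"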
Citation
If you use HaluGate Sentinel in academic work or production systems, please cite:
@software{halugate_sentinel_2024,
  title  = {HaluGate Sentinel: Prompt-Level Fact-Check Switch for Hallucination Gatekeepers},
  author = {vLLM Project},
  year   = {2024},
  url    = {https://github.com/vllm-project/semantic-router}
}
Acknowledgements
- Base encoder: answerdotai/ModernBERT-base
- Training datasets: SQuAD, TriviaQA, HotpotQA, TruthfulQA, CoQA, Dolly, Alpaca, WritingPrompts, HaluEval, and others listed above.
- Designed for integration with the vLLM Semantic Router and broader Hallucination Gatekeeper ecosystem.