GigaCheck-Detector-Multi
🌐 LLMTrace Website | 📜 LLMTrace Paper on arXiv | 🤗 LLMTrace - Detection Dataset | GitHub
Model Card
Model Description
This is the official GigaCheck-Detector-Multi model from the LLMTrace project. It is a multilingual transformer-based model trained for AI interval detection. Its purpose is to identify and localize the specific spans of text within a document that were generated by an AI.
The model was trained jointly on the English and Russian portions of the LLMTrace Detection dataset, which includes human, fully AI, and mixed-authorship texts with character-level annotations.
For complete details on the training data, methodology, and evaluation, please refer to our research paper (arXiv:2509.21269).
Intended Use & Limitations
This model is intended for fine-grained analysis of documents, academic integrity tools, and research into human-AI collaboration.
Limitations:
- The model's performance may degrade on text generated by LLMs released after its training date (September 2025).
- It is not infallible and may miss some AI-generated spans or incorrectly flag human-written parts.
- The boundary predictions may not be perfectly precise in all cases.
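Because predicted boundaries can be slightly off and some spans may be spurious, downstream tools may want to post-process the raw intervals. The following is a minimal sketch, not part of the gigacheck package: the helper `clean_intervals` and its parameters `min_score` and `max_gap` are hypothetical, and it assumes intervals arrive as `(start_char, end_char, score)` tuples.

```python
from typing import List, Tuple

# Assumed interval layout: (start_char, end_char, score)
Interval = Tuple[int, int, float]


def clean_intervals(
    intervals: List[Interval],
    min_score: float = 0.5,
    max_gap: int = 5,
) -> List[Interval]:
    """Drop low-confidence intervals, then merge intervals that overlap
    or are separated by at most `max_gap` characters."""
    kept = sorted(iv for iv in intervals if iv[2] >= min_score)
    merged: List[Interval] = []
    for start, end, score in kept:
        if merged and start - merged[-1][1] <= max_gap:
            prev_start, prev_end, prev_score = merged[-1]
            # Extend the previous interval and keep the higher score.
            merged[-1] = (prev_start, max(prev_end, end), max(prev_score, score))
        else:
            merged.append((start, end, score))
    return merged
```

Suitable values for `min_score` and `max_gap` depend on the application and should be tuned on held-out data rather than taken from this sketch.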
Evaluation
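The model was evaluated on the test split of the LLMTrace Detection dataset. Performance is reported as mean Average Precision (mAP), the standard object-detection metric, adapted to text spans: a predicted span counts as correct when its intersection over union (IoU) with a ground-truth span meets the stated threshold. Following the object-detection convention, the IoU=0.5:0.95 row averages mAP over IoU thresholds from 0.5 to 0.95.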
| Metric | Value |
|---|---|
| mAP @ IoU=0.5 | 0.8976 |
| mAP @ IoU=0.5:0.95 | 0.7921 |
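For readers unfamiliar with IoU on text, the matching criterion behind these thresholds can be sketched for one-dimensional character spans. This is an illustrative helper, not the project's evaluation code; the function names are hypothetical and spans are assumed to be half-open `(start_char, end_char)` pairs.

```python
from typing import Tuple

Span = Tuple[int, int]  # assumed half-open: [start_char, end_char)


def span_iou(pred: Span, gold: Span) -> float:
    """Intersection over union of two character spans."""
    intersection = max(0, min(pred[1], gold[1]) - max(pred[0], gold[0]))
    union = (pred[1] - pred[0]) + (gold[1] - gold[0]) - intersection
    return intersection / union if union > 0 else 0.0


def is_match(pred: Span, gold: Span, iou_threshold: float = 0.5) -> bool:
    """A prediction matches a ground-truth span when IoU reaches the threshold."""
    return span_iou(pred, gold) >= iou_threshold
```

For example, a prediction covering characters 0 to 10 against a ground-truth span of 5 to 15 shares 5 characters out of a union of 15, so it would not count as a match at IoU=0.5.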
Quick start
Requirements:
- Python 3.11
- gigacheck
pip install git+https://github.com/ai-forever/gigacheck
Inference with transformers (with trust_remote_code=True)
import torch
from transformers import AutoModel

model_name = "iitolstykh/GigaCheck-Detector-Multi"

# trust_remote_code=True lets transformers load the custom model code from the repository.
gigacheck_model = AutoModel.from_pretrained(
    model_name, trust_remote_code=True, device_map="cuda:0", torch_dtype=torch.float32
)

text = "The critic's review of the recent publication was scathing. The book failed miserably in portraying the harmful subjective discourses associated with the hegemony of the political system."

# The model is called with a list of texts.
output = gigacheck_model([text], conf_interval_thresh=0.5)

# [(start_char, end_char, score)]
print(output.ai_intervals)
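To inspect which passages were flagged, the character offsets can be mapped back onto the input string. The sketch below is an assumption-laden illustration: `flagged_snippets` is a hypothetical helper, it expects a flat list of `(start_char, end_char, score)` tuples for a single text with an exclusive end offset, and the sample intervals are made up rather than real model output. If `output.ai_intervals` is nested per input text, pass the entry for the text of interest.

```python
from typing import List, Sequence, Tuple


def flagged_snippets(
    text: str, intervals: Sequence[Tuple[float, float, float]]
) -> List[Tuple[str, float]]:
    """Return (snippet, score) pairs for each predicted AI interval."""
    snippets = []
    for start, end, score in intervals:
        # Offsets are cast to int in case they arrive as floats or tensors.
        snippets.append((text[int(start):int(end)], float(score)))
    return snippets


# Made-up intervals for illustration only, not real model output.
sample_text = "Human part. AI part."
sample_intervals = [(12, 20, 0.9)]
for snippet, score in flagged_snippets(sample_text, sample_intervals):
    print(f"{score:.2f}: {snippet}")
```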
Inference with gigacheck
import torch
from transformers import AutoConfig
from gigacheck.inference.src.mistral_detector import MistralDetector

model_name = "iitolstykh/GigaCheck-Detector-Multi"
config = AutoConfig.from_pretrained(model_name)

# Use the GPU when one is available, otherwise fall back to the CPU.
model = MistralDetector(
    max_seq_len=config.max_length,
    with_detr=config.with_detr,
    id2label=config.id2label,
    device="cuda:0" if torch.cuda.is_available() else "cpu",
    conf_interval_thresh=0.5,
).from_pretrained(model_name)

text = "The critic's review of the recent publication was scathing. The book failed miserably in portraying the harmful subjective discourses associated with the hegemony of the political system."

output = model.predict(text)
print(output)
Citation
If you use this model in your research, please cite our papers:
@article{Layer2025LLMTrace,
  title={{LLMTrace: A Corpus for Classification and Fine-Grained Localization of AI-Written Text}},
  author={Irina Tolstykh and Aleksandra Tsybina and Sergey Yakubson and Maksim Kuprashevich},
  year={2025},
  eprint={arXiv:2509.21269}
}
@article{tolstykh2024gigacheck,
  title={{GigaCheck: Detecting LLM-generated Content}},
  author={Irina Tolstykh and Aleksandra Tsybina and Sergey Yakubson and Aleksandr Gordeev and Vladimir Dokholyan and Maksim Kuprashevich},
  journal={arXiv preprint arXiv:2410.23728},
  year={2024}
}
Base model: mistralai/Mistral-7B-v0.3