---
license: apache-2.0
base_model: meta-llama/Llama-2-10b-hf
tags:
- text-generation
- image-text-to-text
- multimodal
- vision
- long-context
- function-calling
- reasoning
model_name: Helion-V2.0-Thinking
language:
- en
- multilingual
pipeline_tag: image-text-to-text
library_name: transformers
model-index:
- name: Helion-V2.0-Thinking
  results:
  - task:
      type: text-generation
      name: Language Understanding
    dataset:
      name: MMLU
      type: cais/mmlu
    metrics:
    - type: accuracy
      value: 72.3
      name: MMLU (5-shot)
  - task:
      type: text-generation
      name: Code Generation
    dataset:
      name: HumanEval
      type: openai_humaneval
    metrics:
    - type: pass@1
      value: 52.8
      name: HumanEval Pass@1
---

# Helion-V2.0-Thinking
*Helion-V2 Logo*

---

Advanced 10.2B-parameter multimodal language model with a 200K-token context window, native vision, and tool-use capabilities.

## Key Features

- **200K Token Context Window** - Process entire books and codebases
- **Native Vision Understanding** - Analyze images, charts, documents, and diagrams
- **Function Calling & Tool Use** - Structured outputs and API integration
- **Strong Reasoning** - Excellent performance on math, code, and logic tasks
- **Multilingual Support** - 12+ languages with strong performance
- **Production-Ready Safety** - Comprehensive content filtering and guardrails

## Quick Start

```python
from transformers import AutoModelForCausalLM, AutoProcessor
from PIL import Image

model = AutoModelForCausalLM.from_pretrained(
    "DeepXR/Helion-V2.0-Thinking",
    torch_dtype="auto",
    device_map="auto"
)
processor = AutoProcessor.from_pretrained("DeepXR/Helion-V2.0-Thinking")

# Text generation
prompt = "Explain quantum computing in simple terms:"
inputs = processor(text=prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(processor.decode(outputs[0], skip_special_tokens=True))

# Image understanding
image = Image.open("photo.jpg")
inputs = processor(text="What's in this image?", images=image, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(processor.decode(outputs[0], skip_special_tokens=True))
```
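For multi-turn conversation, the snippet below is a minimal sketch that assumes the bundled processor ships a chat template (standard for instruction-tuned checkpoints on the Hub, via the `apply_chat_template` API); the exact turn format is an assumption, not documented by this card. If no template is bundled, fall back to the plain-prompt pattern above.

```python
# Minimal multi-turn sketch. Assumes the processor bundles a chat template;
# if it does not, build the prompt string by hand as in Quick Start.
messages = [
    {"role": "user", "content": "Explain quantum computing in simple terms."},
    {"role": "assistant", "content": "Quantum computers use qubits, which can..."},
    {"role": "user", "content": "How is that different from a classical bit?"},
]

# tokenize=False returns the formatted prompt string, which is then fed
# through the processor like any other text input.
prompt = processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
inputs = processor(text=prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(processor.decode(outputs[0], skip_special_tokens=True))
```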
## Benchmarks

### Language Understanding

| Benchmark | Helion-V2.0 | Helion-V2.0-Thinking | Relative Improvement |
|-----------|-------------|----------------------|----------------------|
| MMLU (5-shot) | 64.2% | **72.3%** | +12.6% |
| HellaSwag (10-shot) | 80.5% | **84.8%** | +5.3% |
| ARC-Challenge (25-shot) | 58.3% | **68.7%** | +17.8% |
| TruthfulQA MC2 | 52.1% | **58.4%** | +12.1% |
| GSM8K (8-shot) | 68.7% | **72.1%** | +4.9% |
| HumanEval (0-shot) | 48.2% | **52.8%** | +9.5% |

### Vision & Multimodal

| Benchmark | Score | Notes |
|-----------|-------|-------|
| VQA v2 | **78.9%** | Visual question answering |
| TextVQA | **72.4%** | Text in images |
| ChartQA | **76.8%** | Chart understanding |
| DocVQA | **84.3%** | Document analysis |
| AI2D | **78.2%** | Scientific diagrams |

### Tool Use & Function Calling

| Benchmark | Score |
|-----------|-------|
| Berkeley Function Calling | **89.7%** |
| API-Bank | **86.4%** |
| JSON Schema Adherence | **94.8%** |

## Model Details

- **Architecture**: LLaVA-style (Llama-2 backbone + SigLIP vision encoder)
- **Parameters**: 10.2B (text: 10.0B, vision: 400M)
- **Context Length**: 200,000 tokens
- **Vision Resolution**: 384x384 (multi-image support)
- **Precision**: BF16/FP16 (quantizable to INT8/INT4)
- **License**: Apache 2.0

## Hardware Requirements

| Configuration | VRAM | Throughput |
|---------------|------|------------|
| BF16 | 24GB | 42 tok/s (RTX 4090) |
| INT8 | 16GB | 67 tok/s (RTX 4080) |
| INT4 | 12GB | 89 tok/s (RTX 4070) |

## Use Cases

- **Conversational AI** - Multi-turn dialogue with long memory
- **Document Analysis** - Process reports, contracts, research papers
- **Code Generation** - Write, debug, and explain code
- **Visual Understanding** - Analyze images, charts, screenshots
- **Data Analysis** - Interpret data and create insights
- **Content Creation** - Articles, stories, marketing copy
- **RAG Systems** - Retrieval-augmented generation
- **Tool Integration** - Function calling and API workflows

## Installation

```bash
pip install transformers torch accelerate pillow
```

### With Quantization

Quantized loading additionally requires the `bitsandbytes` package (`pip install bitsandbytes`).

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# 8-bit (16GB VRAM)
config = BitsAndBytesConfig(load_in_8bit=True)

# 4-bit (12GB VRAM)
config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_quant_type="nf4"
)

model = AutoModelForCausalLM.from_pretrained(
    "DeepXR/Helion-V2.0-Thinking",
    quantization_config=config,
    device_map="auto"
)
```

## Advanced Features

### Function Calling

```python
import json

tools = [{
    "name": "calculator",
    "description": "Perform calculations",
    "parameters": {"expression": {"type": "string"}}
}]

prompt = f"Available tools: {json.dumps(tools)}\n\nUser: What is 127 * 89?\nAssistant:"
inputs = processor(text=prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=True, temperature=0.2)
```
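The card does not pin down the exact tool-call syntax the model emits, so the following post-processing step is a sketch under the assumption that the model replies with a single JSON object of the form `{"name": ..., "arguments": ...}`; adapt the extraction to the real output format.

```python
import json
import re

# Hypothetical post-processing: assumes the completion contains one JSON
# object, e.g. {"name": "calculator", "arguments": {"expression": "127 * 89"}}.
completion = processor.decode(
    outputs[0, inputs["input_ids"].shape[-1]:], skip_special_tokens=True
)
match = re.search(r"\{.*\}", completion, re.DOTALL)  # first JSON-looking span
if match:
    call = json.loads(match.group(0))
    if call.get("name") == "calculator":
        # Dispatch to the real tool; eval() with stripped builtins is for
        # this arithmetic demo only, not a production-safe evaluator.
        result = eval(call["arguments"]["expression"], {"__builtins__": {}})
        print(f"Tool result: {result}")  # 127 * 89 = 11303
else:
    print(completion)  # model answered directly without a tool call
```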
### Long Context (200K)

```python
# Process entire documents
with open("long_document.txt") as f:
    document = f.read()  # up to 200K tokens

prompt = f"{document}\n\nSummarize the key points:"
inputs = processor(text=prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=1024)
```

### Multi-Image Analysis

```python
images = [Image.open(f"image{i}.jpg") for i in range(3)]
prompt = "Compare these images and describe the differences:"

inputs = processor(text=prompt, images=images, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
```

## Safety Features

Built-in safety guardrails include:

- Content filtering for harmful outputs
- PII detection and redaction
- Rate-limiting capabilities
- Toxicity detection
- Appropriate refusal behavior

See `safety_wrapper.py` for production deployment.

## Limitations

- Primarily optimized for English (multilingual support is good but secondary)
- Vision works best with clear, well-lit images
- Very long contexts (150K+ tokens) require substantial VRAM
- May occasionally generate incorrect information
- Not suitable for medical or legal advice without human review

## Files Included

- `inference.py` - Full inference script with examples
- `safety_wrapper.py` - Production safety wrapper
- `evaluate.py` - Comprehensive evaluation suite
- `benchmark.py` - Performance benchmarking
- `QUICKSTART.md` - Quick start guide
- `USE_CASES.md` - Detailed use case examples
- `safety_config.json` - Safety configuration
- `requirements.txt` - Dependencies
- `Dockerfile` - Container deployment

## Citation

```bibtex
@misc{helion-v2-thinking-2025,
  title={Helion-V2.0-Thinking: A 10.2B Multimodal Language Model},
  author={DeepXR},
  year={2025},
  publisher={Hugging Face},
  url={https://huggingface.co/DeepXR/Helion-V2.0-Thinking}
}
```

## License

Apache 2.0. See the LICENSE file for details.

## Acknowledgments

Built with Transformers and trained on diverse open datasets. Thanks to the open-source AI community.