|
|
--- |
|
|
license: apache-2.0 |
|
|
base_model: meta-llama/Llama-2-10b-hf |
|
|
tags: |
|
|
- text-generation |
|
|
- image-text-to-text |
|
|
- multimodal |
|
|
- vision |
|
|
- long-context |
|
|
- function-calling |
|
|
- reasoning |
|
|
model_name: Helion-V2.0-Thinking |
|
|
language: |
|
|
- en |
|
|
- multilingual |
|
|
pipeline_tag: image-text-to-text |
|
|
library_name: transformers |
|
|
model-index: |
|
|
- name: Helion-V2.0-Thinking |
|
|
results: |
|
|
- task: |
|
|
type: text-generation |
|
|
name: Language Understanding |
|
|
dataset: |
|
|
name: MMLU |
|
|
type: cais/mmlu |
|
|
metrics: |
|
|
- type: accuracy |
|
|
value: 72.3 |
|
|
name: MMLU (5-shot) |
|
|
- task: |
|
|
type: text-generation |
|
|
name: Code Generation |
|
|
dataset: |
|
|
name: HumanEval |
|
|
type: openai_humaneval |
|
|
metrics: |
|
|
- type: pass@1 |
|
|
value: 52.8 |
|
|
name: HumanEval Pass@1 |
|
|
--- |
|
|
|
|
|
|
|
|
# Helion-V2.0-Thinking |
|
|
|
|
|
<div align="center"> |
|
|
|
|
|
<img src="https://i.imgur.com/QWzVuIQ.png" alt="Helion-V2 Logo" width="100%"/>
|
|
|
|
|
</div> |
|
|
|
|
|
--- |
|
|
|
|
|
Helion-V2.0-Thinking is an advanced 10.2B-parameter multimodal language model with a 200K-token context window, native vision understanding, and tool-use capabilities.
|
|
|
|
|
## Key Features |
|
|
|
|
|
- **200K Token Context Window** - Process entire books and codebases |
|
|
- **Native Vision Understanding** - Analyze images, charts, documents, and diagrams |
|
|
- **Function Calling & Tool Use** - Structured outputs and API integration |
|
|
- **Strong Reasoning** - Excellent performance on math, code, and logic tasks |
|
|
- **Multilingual Support** - 12+ languages with strong performance |
|
|
- **Production-Ready Safety** - Comprehensive content filtering and guardrails |
|
|
|
|
|
## Quick Start |
|
|
|
|
|
```python |
|
|
from transformers import AutoModelForCausalLM, AutoProcessor |
|
|
from PIL import Image |
|
|
|
|
|
model = AutoModelForCausalLM.from_pretrained( |
|
|
"DeepXR/Helion-V2.0-Thinking", |
|
|
torch_dtype="auto", |
|
|
device_map="auto" |
|
|
) |
|
|
processor = AutoProcessor.from_pretrained("DeepXR/Helion-V2.0-Thinking") |
|
|
|
|
|
# Text generation |
|
|
prompt = "Explain quantum computing in simple terms:" |
|
|
inputs = processor(text=prompt, return_tensors="pt").to(model.device) |
|
|
outputs = model.generate(**inputs, max_new_tokens=256) |
|
|
print(processor.decode(outputs[0], skip_special_tokens=True)) |
|
|
|
|
|
# Image understanding |
|
|
image = Image.open("photo.jpg") |
|
|
inputs = processor(text="What's in this image?", images=image, return_tensors="pt") |
|
|
outputs = model.generate(**inputs, max_new_tokens=256) |
|
|
print(processor.decode(outputs[0], skip_special_tokens=True)) |
|
|
``` |
|
|
|
|
|
## Benchmarks |
|
|
|
|
|
### Language Understanding |
|
|
|
|
|
| Benchmark | Helion-V2.0 | Helion-V2.0-Thinking | Relative Gain |
|
|
|-----------|-------------|---------------------|-------------| |
|
|
| MMLU (5-shot) | 64.2% | **72.3%** | +12.6% | |
|
|
| HellaSwag (10-shot) | 80.5% | **84.8%** | +5.3% | |
|
|
| ARC-Challenge (25-shot) | 58.3% | **68.7%** | +17.8% | |
|
|
| TruthfulQA MC2 | 52.1% | **58.4%** | +12.1% | |
|
|
| GSM8K (8-shot) | 68.7% | **72.1%** | +4.9% | |
|
|
| HumanEval (0-shot) | 48.2% | **52.8%** | +9.5% | |
|
|
|
|
|
### Vision & Multimodal |
|
|
|
|
|
| Benchmark | Score | Notes | |
|
|
|-----------|-------|-------| |
|
|
| VQA v2 | **78.9%** | Visual question answering | |
|
|
| TextVQA | **72.4%** | Text in images | |
|
|
| ChartQA | **76.8%** | Chart understanding | |
|
|
| DocVQA | **84.3%** | Document analysis | |
|
|
| AI2D | **78.2%** | Scientific diagrams | |
|
|
|
|
|
### Tool Use & Function Calling |
|
|
|
|
|
| Benchmark | Score | |
|
|
|-----------|-------| |
|
|
| Berkeley Function Calling | **89.7%** | |
|
|
| API-Bank | **86.4%** | |
|
|
| JSON Schema Adherence | **94.8%** | |
|
|
|
|
|
## Model Details |
|
|
|
|
|
- **Architecture**: LLaVA (Llama-2 + SigLIP vision encoder) |
|
|
- **Parameters**: 10.2B total (approx. 9.8B language model + 0.4B vision encoder)
|
|
- **Context Length**: 200,000 tokens |
|
|
- **Vision Resolution**: 384x384 (multi-image support) |
|
|
- **Precision**: BF16/FP16 (quantizable to INT8/INT4) |
|
|
- **License**: Apache 2.0 |
|
|
|
|
|
## Hardware Requirements |
|
|
|
|
|
| Configuration | VRAM | Performance | |
|
|
|--------------|------|-------------| |
|
|
| BF16 | 24GB | 42 tok/s (RTX 4090) | |
|
|
| INT8 | 16GB | 67 tok/s (RTX 4080) | |
|
|
| INT4 | 12GB | 89 tok/s (RTX 4070) | |
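As a rough rule of thumb, weight memory is parameters times bytes per parameter: 10.2B parameters at 2 bytes each (BF16) is about 20.4 GB, which is why a 24GB card is the practical floor once activations and the KV cache are added; INT8 and INT4 roughly halve and quarter that weight footprint.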
|
|
|
|
|
## Use Cases |
|
|
|
|
|
- **Conversational AI** - Multi-turn dialogue with long memory |
|
|
- **Document Analysis** - Process reports, contracts, research papers |
|
|
- **Code Generation** - Write, debug, and explain code |
|
|
- **Visual Understanding** - Analyze images, charts, screenshots |
|
|
- **Data Analysis** - Interpret data and create insights |
|
|
- **Content Creation** - Articles, stories, marketing copy |
|
|
- **RAG Systems** - Retrieval-augmented generation (see the sketch after this list)
|
|
- **Tool Integration** - Function calling and API workflows |
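To make the RAG workflow concrete, here is a minimal sketch of assembling retrieved passages into the long context before generation. `retrieve` is a placeholder for whatever retriever or vector store you use; `model` and `processor` are as loaded in the Quick Start.

```python
# Minimal RAG-style prompt assembly (retrieve() is a hypothetical retriever;
# any vector store or search backend can fill that role).
def build_rag_prompt(question: str, passages: list[str]) -> str:
    # With a 200K-token window, many passages can be included verbatim.
    context = "\n\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return (
        f"Use only the sources below to answer.\n\n{context}\n\n"
        f"Question: {question}\nAnswer (cite sources by number):"
    )

passages = retrieve("warranty terms", top_k=8)  # placeholder for your retriever
prompt = build_rag_prompt("What does the warranty cover?", passages)
inputs = processor(text=prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
print(processor.decode(outputs[0], skip_special_tokens=True))
```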
|
|
|
|
|
## Installation |
|
|
|
|
|
```bash |
|
|
pip install transformers torch accelerate pillow |
|
|
``` |
|
|
|
|
|
### With Quantization |
|
|
|
|
|
```python |
|
|
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig  # requires the bitsandbytes package
|
|
|
|
|
# Option 1: 8-bit quantization (~16GB VRAM)
|
|
config = BitsAndBytesConfig(load_in_8bit=True) |
|
|
|
|
|
# Option 2: 4-bit quantization (~12GB VRAM); use one option or the other,
# as this assignment replaces the 8-bit config above
|
|
config = BitsAndBytesConfig( |
|
|
load_in_4bit=True, |
|
|
bnb_4bit_compute_dtype=torch.bfloat16, |
|
|
bnb_4bit_quant_type="nf4" |
|
|
) |
|
|
|
|
|
model = AutoModelForCausalLM.from_pretrained( |
|
|
"DeepXR/Helion-V2.0-Thinking", |
|
|
quantization_config=config, |
|
|
device_map="auto" |
|
|
) |
|
|
``` |
|
|
|
|
|
## Advanced Features |
|
|
|
|
|
### Function Calling |
|
|
|
|
|
```python |
|
|
import json |
|
|
|
|
|
tools = [{ |
|
|
"name": "calculator", |
|
|
"description": "Perform calculations", |
|
|
"parameters": {"expression": {"type": "string"}} |
|
|
}] |
|
|
|
|
|
prompt = f"Available tools: {json.dumps(tools)}\n\nUser: What is 127 * 89?\nAssistant:" |
|
|
inputs = processor(text=prompt, return_tensors="pt") |
|
|
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=True, temperature=0.2)
print(processor.decode(outputs[0], skip_special_tokens=True))
|
|
``` |
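What the model emits for a tool call is not pinned down by this card; assuming it returns a JSON object like `{"name": ..., "arguments": ...}` after the `Assistant:` turn, a minimal dispatch might look like this sketch:

```python
# Hypothetical tool-call handling; adjust the parsing to the format the
# model actually emits.
response = processor.decode(outputs[0], skip_special_tokens=True)
completion = response.split("Assistant:")[-1].strip()

def run_tool(call: dict) -> str:
    if call.get("name") == "calculator":
        # eval() is unsafe for untrusted input; use a real expression parser in production
        return str(eval(call["arguments"]["expression"], {"__builtins__": {}}))
    raise ValueError(f"unknown tool: {call.get('name')}")

try:
    result = run_tool(json.loads(completion))
    # Feed `result` back to the model in a follow-up turn to get the final answer.
except (json.JSONDecodeError, KeyError, ValueError):
    result = completion  # the model answered directly instead of calling a tool
```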
|
|
|
|
|
### Long Context (200K) |
|
|
|
|
|
```python |
|
|
# Process entire documents |
|
|
with open("long_document.txt") as f: |
|
|
    document = f.read()  # characters, not tokens; check the token count before generating
|
|
|
|
|
prompt = f"{document}\n\nSummarize the key points:" |
|
|
inputs = processor(text=prompt, return_tensors="pt").to(model.device)
|
|
outputs = model.generate(**inputs, max_new_tokens=1024) |
|
|
``` |
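Because `f.read()` returns characters rather than tokens, it is worth verifying the tokenized length before generating. A small sketch, assuming the processor exposes its underlying tokenizer (standard for Transformers processors):

```python
# Verify the prompt fits in the 200K-token window before generating
n_tokens = len(processor.tokenizer(prompt)["input_ids"])
if n_tokens > 200_000:
    raise ValueError(f"Prompt is {n_tokens} tokens; truncate or chunk the document.")
```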
|
|
|
|
|
### Multi-Image Analysis |
|
|
|
|
|
```python |
|
|
images = [Image.open(f"image{i}.jpg") for i in range(3)] |
|
|
prompt = "Compare these images and describe the differences:" |
|
|
inputs = processor(text=prompt, images=images, return_tensors="pt").to(model.device)
|
|
outputs = model.generate(**inputs, max_new_tokens=512)
print(processor.decode(outputs[0], skip_special_tokens=True))
|
|
``` |
|
|
|
|
|
## Safety Features |
|
|
|
|
|
Built-in safety guardrails including: |
|
|
- Content filtering for harmful outputs |
|
|
- PII detection and redaction |
|
|
- Rate limiting capabilities |
|
|
- Toxicity detection |
|
|
- Appropriate refusal behavior |
|
|
|
|
|
See `safety_wrapper.py` for production deployment. |
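To illustrate how these guardrails compose, here is a minimal sketch in the spirit of `safety_wrapper.py`; the patterns and the wrapper itself are illustrative, not the shipped implementation or the actual `safety_config.json` values.

```python
import re

# Illustrative PII patterns only; the production list lives in safety_config.json
PII_PATTERNS = [
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),        # US SSN-like numbers
    re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),  # email addresses
]

def redact_pii(text: str) -> str:
    for pattern in PII_PATTERNS:
        text = pattern.sub("[REDACTED]", text)
    return text

def safe_generate(prompt: str, **gen_kwargs) -> str:
    inputs = processor(text=prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, **gen_kwargs)
    text = processor.decode(outputs[0], skip_special_tokens=True)
    return redact_pii(text)  # hook in toxicity scoring and refusal checks here
```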
|
|
|
|
|
## Limitations |
|
|
|
|
|
- Primarily optimized for English; performance in the other supported languages is good but weaker
|
|
- Vision works best with clear, well-lit images |
|
|
- Very long contexts (150K+) require substantial VRAM |
|
|
- May occasionally generate incorrect information |
|
|
- Not suitable for medical/legal advice without human review |
|
|
|
|
|
## Files Included |
|
|
|
|
|
- `inference.py` - Full inference script with examples |
|
|
- `safety_wrapper.py` - Production safety wrapper |
|
|
- `evaluate.py` - Comprehensive evaluation suite |
|
|
- `benchmark.py` - Performance benchmarking |
|
|
- `QUICKSTART.md` - Quick start guide |
|
|
- `USE_CASES.md` - Detailed use case examples |
|
|
- `safety_config.json` - Safety configuration |
|
|
- `requirements.txt` - Dependencies |
|
|
- `Dockerfile` - Container deployment |
|
|
|
|
|
## Citation |
|
|
|
|
|
```bibtex |
|
|
@misc{helion-v2-thinking-2025, |
|
|
title={Helion-V2.0-Thinking: A 10.2B Multimodal Language Model}, |
|
|
author={DeepXR}, |
|
|
year={2025}, |
|
|
publisher={Hugging Face}, |
|
|
url={https://huggingface.co/DeepXR/Helion-V2.0-Thinking} |
|
|
} |
|
|
``` |
|
|
|
|
|
## License |
|
|
|
|
|
Apache 2.0 - See LICENSE file for details. |
|
|
|
|
|
## Acknowledgments |
|
|
|
|
|
Built with Hugging Face Transformers and trained on diverse open datasets. Thanks to the open-source AI community.