Trouter-Library committed · Commit a95fb3c · verified · 1 Parent(s): e489ad0

Create QUICKSTART.md

Files changed (1)
  1. QUICKSTART.md +287 -0
QUICKSTART.md ADDED
# Helion-V2.0-Thinking Quickstart Guide

Get started with Helion-V2.0-Thinking in minutes.

## Installation

### Basic Installation

```bash
pip install transformers torch accelerate pillow requests
```

### Full Installation (with all features)

```bash
pip install -r requirements.txt
```

### GPU Requirements

- **Minimum**: 24GB VRAM (RTX 4090, A5000)
- **Recommended**: 40GB+ VRAM (A100, H100)
- **Quantized (8-bit)**: 16GB VRAM
- **Quantized (4-bit)**: 12GB VRAM
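If you are not sure which tier your hardware falls into, the short check below (plain PyTorch, nothing model-specific) reports the VRAM of the first CUDA device:

```python
import torch

# Report total memory of the first CUDA device, if any.
if torch.cuda.is_available():
    total_gib = torch.cuda.get_device_properties(0).total_memory / 1024**3
    print(f"{torch.cuda.get_device_name(0)}: {total_gib:.0f} GiB VRAM")
else:
    print("No CUDA device detected; CPU-only inference will be very slow.")
```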
## Quick Examples

### 1. Basic Text Generation

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "DeepXR/Helion-V2.0-Thinking"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)

prompt = "What is artificial intelligence?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
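For longer generations it is often nicer to stream tokens as they are produced. The sketch below reuses `model` and `tokenizer` from the example above and relies only on `TextStreamer` from `transformers`; treat it as a minimal illustration rather than part of the repository's scripts.

```python
from transformers import TextStreamer

# Stream decoded text to stdout as tokens are generated,
# skipping the echoed prompt and special tokens.
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

inputs = tokenizer("Explain transformers in two sentences.", return_tensors="pt").to(model.device)
model.generate(**inputs, max_new_tokens=256, streamer=streamer)
```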
### 2. Image Understanding

```python
from transformers import AutoModelForCausalLM, AutoProcessor
from PIL import Image

model_name = "DeepXR/Helion-V2.0-Thinking"
processor = AutoProcessor.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)

image = Image.open("photo.jpg")
prompt = "What is in this image?"
inputs = processor(text=prompt, images=image, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(processor.decode(outputs[0], skip_special_tokens=True))
```
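Because `requests` and `pillow` are already part of the basic install, images can also be fetched from a URL. A minimal sketch reusing `processor` and `model` from above (the URL is a placeholder):

```python
from io import BytesIO

import requests
from PIL import Image

# Placeholder URL for illustration only.
url = "https://example.com/photo.jpg"
image = Image.open(BytesIO(requests.get(url, timeout=30).content))

inputs = processor(text="Describe this image.", images=image, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(processor.decode(outputs[0], skip_special_tokens=True))
```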
### 3. Using the Inference Script

```bash
# Interactive chat mode
python inference.py --interactive

# With image analysis
python inference.py --image photo.jpg --prompt "Describe this image"

# Run demos
python inference.py --demo

# With quantization (saves memory)
python inference.py --interactive --load-in-4bit
```
### 4. With Safety Wrapper

```python
from safety_wrapper import SafeHelionWrapper

# Initialize with safety features
wrapper = SafeHelionWrapper(
    model_name="DeepXR/Helion-V2.0-Thinking",
    enable_safety=True,
    enable_rate_limiting=True
)

# Safe generation
response = wrapper.generate(
    prompt="Explain photosynthesis",
    max_new_tokens=256
)
print(response)
```
### 5. Function Calling

```python
import json

tools = [{
    "name": "calculator",
    "description": "Perform calculations",
    "parameters": {
        "type": "object",
        "properties": {
            "expression": {"type": "string"}
        }
    }
}]

prompt = f"""Available tools: {json.dumps(tools)}

User: What is 125 * 48?
Assistant (respond with JSON):"""

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=True, temperature=0.2)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
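The model is expected to answer with a JSON tool call; parsing and executing it is up to your application. The sketch below assumes the reply is a single flat JSON object with `name` and `arguments` keys, which is an assumption about the output format rather than a guarantee.

```python
import json
import re

# Decode only the newly generated tokens (skip the echoed prompt).
new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
reply = tokenizer.decode(new_tokens, skip_special_tokens=True)

# Pull the first {...} block out of the reply; assumes a single flat JSON object.
match = re.search(r"\{.*\}", reply, re.DOTALL)
if match:
    try:
        call = json.loads(match.group(0))
        if call.get("name") == "calculator":
            expression = call.get("arguments", {}).get("expression", "")
            # eval() is for illustration only; use a proper expression parser in practice.
            print("calculator result:", eval(expression, {"__builtins__": {}}))
    except json.JSONDecodeError:
        print("Reply was not valid JSON:", reply)
else:
    print("No JSON object found in reply:", reply)
```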
## Memory-Efficient Options

Both options below use the `bitsandbytes` backend, so install it first (`pip install bitsandbytes`).

### 8-bit Quantization

```python
from transformers import BitsAndBytesConfig

quantization_config = BitsAndBytesConfig(load_in_8bit=True)

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=quantization_config,
    device_map="auto"
)
```

### 4-bit Quantization

```python
import torch
from transformers import BitsAndBytesConfig

quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4"
)

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=quantization_config,
    device_map="auto"
)
```
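To see how much memory a given configuration actually uses once loaded, `transformers` models expose `get_memory_footprint()`:

```python
# Report the loaded model's weight memory in GiB.
footprint_gib = model.get_memory_footprint() / 1024**3
print(f"Model weights occupy roughly {footprint_gib:.1f} GiB")
```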
## Running Benchmarks

```bash
# Full benchmark suite
python benchmark.py --model DeepXR/Helion-V2.0-Thinking

# Evaluation suite
python evaluate.py --model DeepXR/Helion-V2.0-Thinking
```
## Common Use Cases

### Chatbot

```python
conversation = []

while True:
    user_input = input("You: ")
    if user_input.lower() == 'quit':
        break

    conversation.append({"role": "user", "content": user_input})

    prompt = "\n".join([
        f"{msg['role'].capitalize()}: {msg['content']}"
        for msg in conversation
    ]) + "\nAssistant:"

    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=512)
    response = tokenizer.decode(outputs[0], skip_special_tokens=True)
    response = response.split("Assistant:")[-1].strip()

    conversation.append({"role": "assistant", "content": response})
    print(f"Assistant: {response}")
```
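If the tokenizer ships a chat template (check `tokenizer.chat_template`; this is an assumption about the release), `apply_chat_template` builds the prompt in the format the model was trained on instead of the hand-rolled `Role: text` lines above. A sketch of the generation step inside the same loop:

```python
# Build the prompt from the accumulated conversation using the model's chat template.
input_ids = tokenizer.apply_chat_template(
    conversation,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(input_ids, max_new_tokens=512)
# Decode only the tokens generated after the prompt.
response = tokenizer.decode(outputs[0][input_ids.shape[1]:], skip_special_tokens=True).strip()
```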
### Document Analysis

```python
# Read long document
with open("document.txt", "r") as f:
    document = f.read()

prompt = f"""{document}

Please provide:
1. A summary of the main points
2. Key takeaways
3. Any recommendations

Summary:"""

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=1024)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
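Documents longer than the model's context window have to be split first. The sketch below chunks by characters for simplicity (a token-based splitter would be more precise), summarizes each chunk with a `summarize` helper defined here for illustration, then summarizes the partial summaries; the 8,000-character chunk size is an arbitrary assumption.

```python
def summarize(text: str, max_new_tokens: int = 256) -> str:
    """Summarize one chunk of text with the already-loaded model and tokenizer."""
    chunk_prompt = f"{text}\n\nSummarize the main points.\n\nSummary:"
    inputs = tokenizer(chunk_prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Keep only the newly generated continuation.
    reply = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
    return reply.strip()

# Roughly 8,000 characters per chunk; tune to the model's context length.
chunk_size = 8000
chunks = [document[i:i + chunk_size] for i in range(0, len(document), chunk_size)]
partial_summaries = [summarize(chunk) for chunk in chunks]

# Summarize the summaries for the final answer.
print(summarize("\n\n".join(partial_summaries)))
```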
### Code Generation

```python
prompt = """Write a Python function that:
1. Takes a list of numbers
2. Removes duplicates
3. Returns them sorted in descending order

Include type hints and a docstring."""

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.3  # Lower temperature for code
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
## Troubleshooting

### Out of Memory

1. Use quantization (4-bit or 8-bit)
2. Reduce `max_new_tokens`
3. Enable gradient checkpointing (relevant when fine-tuning)
4. Use smaller batch sizes
### Slow Performance

1. Enable Flash Attention 2 by loading the model with `attn_implementation="flash_attention_2"` (see the sketch below)
2. Use a GPU if available
3. Reduce context length
4. Use quantization
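A minimal loading sketch with Flash Attention 2 enabled, assuming the `flash-attn` package is installed and the GPU supports it:

```python
import torch
from transformers import AutoModelForCausalLM

# Flash Attention 2 requires the flash-attn package and a supported GPU.
model = AutoModelForCausalLM.from_pretrained(
    "DeepXR/Helion-V2.0-Thinking",
    torch_dtype=torch.bfloat16,
    device_map="auto",
    attn_implementation="flash_attention_2",
)
```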
### Installation Issues

```bash
# Update pip
pip install --upgrade pip

# Install from scratch
pip uninstall transformers torch
pip install transformers torch accelerate

# CUDA issues
pip install torch --index-url https://download.pytorch.org/whl/cu121
```
## Next Steps

- Read the full [README.md](README.md) for detailed documentation
- Check out [inference.py](inference.py) for more examples
- Review [safety_wrapper.py](safety_wrapper.py) for safety features
- Run [benchmark.py](benchmark.py) to test performance
- See [evaluate.py](evaluate.py) for quality metrics

## Support

For issues and questions:
- Check the Hugging Face model page
- Review existing issues
- Submit a new issue with details

## License

Apache 2.0 - See LICENSE file for details