# Helion-V2.0-Thinking Quickstart Guide

Get started with Helion-V2.0-Thinking in minutes.

## Installation

### Basic Installation

```bash
pip install transformers torch accelerate pillow requests
```

### Full Installation (with all features)

```bash
pip install -r requirements.txt
```

### GPU Requirements

- **Minimum**: 24GB VRAM (RTX 4090, A5000)
- **Recommended**: 40GB+ VRAM (A100, H100)
- **Quantized (8-bit)**: 16GB VRAM
- **Quantized (4-bit)**: 12GB VRAM

## Quick Examples

### 1. Basic Text Generation

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "DeepXR/Helion-V2.0-Thinking"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)

prompt = "What is artificial intelligence?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

### 2. Image Understanding

```python
from transformers import AutoModelForCausalLM, AutoProcessor
from PIL import Image

# model_name as defined in the previous example
processor = AutoProcessor.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)

image = Image.open("photo.jpg")
prompt = "What is in this image?"

inputs = processor(text=prompt, images=image, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(processor.decode(outputs[0], skip_special_tokens=True))
```

### 3. Using the Inference Script

```bash
# Interactive chat mode
python inference.py --interactive

# With image analysis
python inference.py --image photo.jpg --prompt "Describe this image"

# Run demos
python inference.py --demo

# With quantization (saves memory)
python inference.py --interactive --load-in-4bit
```

### 4. With Safety Wrapper

```python
from safety_wrapper import SafeHelionWrapper

# Initialize with safety features
wrapper = SafeHelionWrapper(
    model_name="DeepXR/Helion-V2.0-Thinking",
    enable_safety=True,
    enable_rate_limiting=True
)

# Safe generation
response = wrapper.generate(
    prompt="Explain photosynthesis",
    max_new_tokens=256
)
print(response)
```

### 5. Function Calling

```python
import json

tools = [{
    "name": "calculator",
    "description": "Perform calculations",
    "parameters": {
        "type": "object",
        "properties": {
            "expression": {"type": "string"}
        }
    }
}]

prompt = f"""Available tools: {json.dumps(tools)}

User: What is 125 * 48?
Assistant (respond with JSON):"""

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=True, temperature=0.2)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
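The example above prints the raw completion. As a minimal follow-up sketch, the generated text can be parsed and dispatched to the named tool; the exact JSON shape checked here (a `name` key plus a `parameters` object) is an assumption for illustration, not a documented output format of the model.

```python
import json

# Sketch only. Assumes `tokenizer`, `inputs`, and `outputs` from the
# function-calling example above; the JSON keys below are assumptions.
generated = tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[-1]:],  # keep only the newly generated tokens
    skip_special_tokens=True,
)

try:
    call = json.loads(generated.strip())
except json.JSONDecodeError:
    call = None  # model did not return valid JSON

if call and call.get("name") == "calculator":
    expression = call.get("parameters", {}).get("expression", "")
    print(f"Model requested: calculator({expression!r})")
    # Dispatch to a real calculator here (e.g. a safe arithmetic parser);
    # calling eval() on model output is not recommended.
```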
## Memory-Efficient Options

### 8-bit Quantization

```python
from transformers import BitsAndBytesConfig

quantization_config = BitsAndBytesConfig(load_in_8bit=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=quantization_config,
    device_map="auto"
)
```

### 4-bit Quantization

```python
import torch

quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4"
)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=quantization_config,
    device_map="auto"
)
```

## Running Benchmarks

```bash
# Full benchmark suite
python benchmark.py --model DeepXR/Helion-V2.0-Thinking

# Evaluation suite
python evaluate.py --model DeepXR/Helion-V2.0-Thinking
```

## Common Use Cases

### Chatbot

```python
conversation = []

while True:
    user_input = input("You: ")
    if user_input.lower() == 'quit':
        break

    conversation.append({"role": "user", "content": user_input})

    prompt = "\n".join([
        f"{msg['role'].capitalize()}: {msg['content']}"
        for msg in conversation
    ]) + "\nAssistant:"

    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=512)
    response = tokenizer.decode(outputs[0], skip_special_tokens=True)
    response = response.split("Assistant:")[-1].strip()

    conversation.append({"role": "assistant", "content": response})
    print(f"Assistant: {response}")
```

### Document Analysis

```python
# Read long document
with open("document.txt", "r") as f:
    document = f.read()

prompt = f"""{document}

Please provide:
1. A summary of the main points
2. Key takeaways
3. Any recommendations

Summary:"""

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=1024)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

### Code Generation

```python
prompt = """Write a Python function that:
1. Takes a list of numbers
2. Removes duplicates
3. Returns sorted in descending order
Include type hints and docstring."""

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.3  # Lower temperature for code
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

## Troubleshooting

### Out of Memory

1. Use quantization (4-bit or 8-bit)
2. Reduce `max_new_tokens`
3. Enable gradient checkpointing
4. Use smaller batch sizes

### Slow Performance

1. Enable Flash Attention 2: `use_flash_attention_2=True` (see the sketch below)
2. Use GPU if available
3. Reduce context length
4. Use quantization
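For the first item above, a minimal loading sketch follows, assuming the `flash-attn` package is installed and the GPU supports it. Recent transformers releases spell the option `attn_implementation="flash_attention_2"` (the older `use_flash_attention_2=True` flag enables the same feature), and Flash Attention 2 expects half-precision weights such as `torch.bfloat16`.

```python
import torch
from transformers import AutoModelForCausalLM

# Sketch: load with Flash Attention 2. Requires `pip install flash-attn`
# and a supported GPU; loading fails if the package is missing.
model = AutoModelForCausalLM.from_pretrained(
    "DeepXR/Helion-V2.0-Thinking",
    torch_dtype=torch.bfloat16,               # FA2 needs fp16/bf16 weights
    attn_implementation="flash_attention_2",  # newer form of use_flash_attention_2=True
    device_map="auto",
)
```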
### Installation Issues

```bash
# Update pip
pip install --upgrade pip

# Install from scratch
pip uninstall transformers torch
pip install transformers torch accelerate

# CUDA issues
pip install torch --index-url https://download.pytorch.org/whl/cu121
```

## Next Steps

- Read the full [README.md](README.md) for detailed documentation
- Check out [inference.py](inference.py) for more examples
- Review [safety_wrapper.py](safety_wrapper.py) for safety features
- Run [benchmark.py](benchmark.py) to test performance
- See [evaluate.py](evaluate.py) for quality metrics

## Support

For issues and questions:

- Check the Hugging Face model page
- Review existing issues
- Submit a new issue with details

## License

Apache 2.0 - See LICENSE file for details