# Helion-V2.0-Thinking Quickstart Guide
Get started with Helion-V2.0-Thinking in minutes.
## Installation
### Basic Installation
```bash
pip install transformers torch accelerate pillow requests
```
### Full Installation (with all features)
```bash
pip install -r requirements.txt
```
### GPU Requirements
- **Minimum**: 24GB VRAM (RTX 4090, A5000)
- **Recommended**: 40GB+ VRAM (A100, H100)
- **Quantized (8-bit)**: 16GB VRAM
- **Quantized (4-bit)**: 12GB VRAM
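Not sure which tier applies to your machine? The following sketch (assuming PyTorch with CUDA support is installed) checks available VRAM and suggests a loading mode matching the tiers above:

```python
import torch

if torch.cuda.is_available():
    # Total memory of the first visible GPU, in GiB
    vram = torch.cuda.get_device_properties(0).total_memory / 1024**3
    print(f"GPU 0: {vram:.1f} GiB VRAM")
    if vram >= 40:
        print("Recommended tier: load in full precision.")
    elif vram >= 24:
        print("Minimum tier: full precision should fit; use 8-bit if you hit OOM.")
    elif vram >= 16:
        print("Use 8-bit quantization (see Memory-Efficient Options below).")
    else:
        print("Use 4-bit quantization (see Memory-Efficient Options below).")
else:
    print("No CUDA GPU detected; CPU inference will be very slow.")
```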
## Quick Examples
### 1. Basic Text Generation
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
model_name = "DeepXR/Helion-V2.0-Thinking"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
prompt = "What is artificial intelligence?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
### 2. Image Understanding
```python
from transformers import AutoModelForCausalLM, AutoProcessor
from PIL import Image
processor = AutoProcessor.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
image = Image.open("photo.jpg")
prompt = "What is in this image?"
inputs = processor(text=prompt, images=image, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(processor.decode(outputs[0], skip_special_tokens=True))
```
### 3. Using the Inference Script
```bash
# Interactive chat mode
python inference.py --interactive
# With image analysis
python inference.py --image photo.jpg --prompt "Describe this image"
# Run demos
python inference.py --demo
# With quantization (saves memory)
python inference.py --interactive --load-in-4bit
```
### 4. With Safety Wrapper
```python
from safety_wrapper import SafeHelionWrapper
# Initialize with safety features
wrapper = SafeHelionWrapper(
    model_name="DeepXR/Helion-V2.0-Thinking",
    enable_safety=True,
    enable_rate_limiting=True
)
# Safe generation
response = wrapper.generate(
    prompt="Explain photosynthesis",
    max_new_tokens=256
)
print(response)
```
### 5. Function Calling
```python
import json
tools = [{
    "name": "calculator",
    "description": "Perform calculations",
    "parameters": {
        "type": "object",
        "properties": {
            "expression": {"type": "string"}
        }
    }
}]
prompt = f"""Available tools: {json.dumps(tools)}
User: What is 125 * 48?
Assistant (respond with JSON):"""
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=True, temperature=0.2)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
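The model is asked to reply with a JSON tool call. The exact reply schema (`{"name": ..., "parameters": {"expression": ...}}`) is an assumption here, so the following is only a sketch of how you might extract and dispatch the call:

```python
import json
import re

generated = tokenizer.decode(outputs[0], skip_special_tokens=True)
# Keep only the text after the JSON instruction
reply = generated.split("Assistant (respond with JSON):")[-1]

# Grab the first {...} span; assumes the model emits a single JSON object
match = re.search(r"\{.*\}", reply, re.DOTALL)
if match:
    try:
        call = json.loads(match.group(0))
    except json.JSONDecodeError:
        call = None
    if call and call.get("name") == "calculator":
        expression = call.get("parameters", {}).get("expression", "")
        # eval() is fine for a demo; use a proper math parser in production
        print("Tool result:", eval(expression, {"__builtins__": {}}))
else:
    print("No tool call found in the model output.")
```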
## Memory-Efficient Options
### 8-bit Quantization
```python
from transformers import BitsAndBytesConfig
quantization_config = BitsAndBytesConfig(load_in_8bit=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=quantization_config,
    device_map="auto"
)
```
### 4-bit Quantization
```python
import torch

quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4"
)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=quantization_config,
    device_map="auto"
)
```
## Running Benchmarks
```bash
# Full benchmark suite
python benchmark.py --model DeepXR/Helion-V2.0-Thinking
# Evaluation suite
python evaluate.py --model DeepXR/Helion-V2.0-Thinking
```
## Common Use Cases
### Chatbot
```python
conversation = []
while True:
    user_input = input("You: ")
    if user_input.lower() == 'quit':
        break
    conversation.append({"role": "user", "content": user_input})
    prompt = "\n".join([
        f"{msg['role'].capitalize()}: {msg['content']}"
        for msg in conversation
    ]) + "\nAssistant:"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=512)
    response = tokenizer.decode(outputs[0], skip_special_tokens=True)
    response = response.split("Assistant:")[-1].strip()
    conversation.append({"role": "assistant", "content": response})
    print(f"Assistant: {response}")
```
### Document Analysis
```python
# Read long document
with open("document.txt", "r") as f:
document = f.read()
prompt = f"""{document}
Please provide:
1. A summary of the main points
2. Key takeaways
3. Any recommendations
Summary:"""
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=1024)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
### Code Generation
```python
prompt = """Write a Python function that:
1. Takes a list of numbers
2. Removes duplicates
3. Returns sorted in descending order
Include type hints and docstring."""
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.3  # Lower temperature for code
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
## Troubleshooting
### Out of Memory
1. Use quantization (4-bit or 8-bit, as shown above)
2. Reduce `max_new_tokens` (see the sketch below)
3. Enable gradient checkpointing (only relevant when fine-tuning)
4. Use smaller batch sizes
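A minimal sketch of items 2 and 3, reusing the `model`, `tokenizer`, and `inputs` objects from the earlier examples (the quantization configs for item 1 are in Memory-Efficient Options above):

```python
import torch

# Item 2: keep the generation budget small
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

# Free cached memory between generations if fragmentation builds up
del outputs
torch.cuda.empty_cache()

# Item 3: gradient checkpointing trades compute for memory, but only during fine-tuning
model.gradient_checkpointing_enable()
```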
### Slow Performance
1. Enable Flash Attention 2 by passing `attn_implementation="flash_attention_2"` to `from_pretrained` (see the sketch below)
2. Use GPU if available
3. Reduce context length
4. Use quantization
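A minimal sketch for item 1, assuming the `flash-attn` package is installed and your GPU supports it (Ampere or newer):

```python
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "DeepXR/Helion-V2.0-Thinking",
    torch_dtype=torch.bfloat16,
    device_map="auto",
    attn_implementation="flash_attention_2",  # raises an error if flash-attn is missing
)
```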
### Installation Issues
```bash
# Update pip
pip install --upgrade pip
# Install from scratch
pip uninstall transformers torch
pip install transformers torch accelerate
# CUDA issues
pip install torch --index-url https://download.pytorch.org/whl/cu121
```
## Next Steps
- Read the full [README.md](README.md) for detailed documentation
- Check out [inference.py](inference.py) for more examples
- Review [safety_wrapper.py](safety_wrapper.py) for safety features
- Run [benchmark.py](benchmark.py) to test performance
- See [evaluate.py](evaluate.py) for quality metrics
## Support
For issues and questions:
- Check the Hugging Face model page
- Review existing issues
- Submit a new issue with details
## License
Apache 2.0 - See LICENSE file for details