# Helion-V2.0-Thinking Quickstart Guide

Get started with Helion-V2.0-Thinking in minutes.

## Installation

### Basic Installation

```bash
pip install transformers torch accelerate pillow requests
```

### Full Installation (with all features)

```bash
pip install -r requirements.txt
```

### GPU Requirements

- **Minimum**: 24GB VRAM (RTX 4090, A5000)
- **Recommended**: 40GB+ VRAM (A100, H100)
- **Quantized (8-bit)**: 16GB VRAM
- **Quantized (4-bit)**: 12GB VRAM

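Not sure which tier your hardware falls into? Here is a minimal check with PyTorch (a convenience snippet, not part of the project):

```python
import torch

# Report the name and total VRAM of the first CUDA device, if any.
if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print(f"{props.name}: {props.total_memory / 1024**3:.1f} GB VRAM")
else:
    print("No CUDA GPU detected")
```
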
## Quick Examples

### 1. Basic Text Generation

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "DeepXR/Helion-V2.0-Thinking"

# torch_dtype="auto" picks a suitable precision; device_map="auto" spreads
# layers across the available GPUs.
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)

prompt = "What is artificial intelligence?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

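For interactive use you can stream tokens to the terminal as they are produced instead of waiting for the full completion. `TextStreamer` ships with Transformers, so no extra install is needed:

```python
from transformers import TextStreamer

# Print tokens as they are generated; skip_prompt hides the echoed prompt.
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

inputs = tokenizer("Explain quantum entanglement in two sentences.", return_tensors="pt").to(model.device)
model.generate(**inputs, max_new_tokens=256, streamer=streamer)
```
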
### 2. Image Understanding

```python
from transformers import AutoModelForCausalLM, AutoProcessor
from PIL import Image

# Reuses model_name from the previous example; the processor handles both
# text tokenization and image preprocessing.
processor = AutoProcessor.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)

image = Image.open("photo.jpg")
prompt = "What is in this image?"

inputs = processor(text=prompt, images=image, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(processor.decode(outputs[0], skip_special_tokens=True))
```

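Since `requests` and `pillow` are already installed, the image can also come from a URL instead of a local file (the URL below is only a placeholder):

```python
import requests
from PIL import Image

# Fetch the image over HTTP and open it directly from the response stream.
url = "https://example.com/photo.jpg"  # placeholder URL
image = Image.open(requests.get(url, stream=True).raw)

inputs = processor(text="What is in this image?", images=image, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(processor.decode(outputs[0], skip_special_tokens=True))
```
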
### 3. Using the Inference Script

```bash
# Interactive chat mode
python inference.py --interactive

# With image analysis
python inference.py --image photo.jpg --prompt "Describe this image"

# Run demos
python inference.py --demo

# With quantization (saves memory)
python inference.py --interactive --load-in-4bit
```

### 4. With Safety Wrapper

```python
from safety_wrapper import SafeHelionWrapper

# Initialize with safety features
wrapper = SafeHelionWrapper(
    model_name="DeepXR/Helion-V2.0-Thinking",
    enable_safety=True,
    enable_rate_limiting=True
)

# Safe generation
response = wrapper.generate(
    prompt="Explain photosynthesis",
    max_new_tokens=256
)
print(response)
```

### 5. Function Calling

```python
import json

tools = [{
    "name": "calculator",
    "description": "Perform calculations",
    "parameters": {
        "type": "object",
        "properties": {
            "expression": {"type": "string"}
        }
    }
}]

prompt = f"""Available tools: {json.dumps(tools)}
User: What is 125 * 48?
Assistant (respond with JSON):"""

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
# do_sample=True is required for temperature to take effect; a low value
# keeps the JSON output stable.
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=True, temperature=0.2)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

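The exact reply format depends on how the model was trained for tool use, so the parsing below is only a sketch; it assumes a reply shaped like `{"name": "calculator", "arguments": {"expression": "..."}}`:

```python
import json

def run_calculator(expression: str) -> float:
    # Toy dispatcher for the "calculator" tool declared above.
    # eval() is fine for this arithmetic demo; never use it on untrusted input.
    return eval(expression, {"__builtins__": {}}, {})

raw = tokenizer.decode(outputs[0], skip_special_tokens=True)
reply = raw.split("Assistant (respond with JSON):")[-1].strip()

try:
    call = json.loads(reply)  # assumed shape: {"name": ..., "arguments": {...}}
    if call.get("name") == "calculator":
        print("Tool result:", run_calculator(call["arguments"]["expression"]))
except (json.JSONDecodeError, KeyError):
    print("Model did not return a parseable tool call:", reply)
```
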
## Memory-Efficient Options

### 8-bit Quantization

```python
from transformers import BitsAndBytesConfig

# Requires the bitsandbytes package: pip install bitsandbytes
quantization_config = BitsAndBytesConfig(load_in_8bit=True)

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=quantization_config,
    device_map="auto"
)
```

### 4-bit Quantization

```python
import torch
from transformers import BitsAndBytesConfig

quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4"
)

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=quantization_config,
    device_map="auto"
)
```

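To confirm the savings, Transformers can report how much memory the loaded weights occupy (weights only, not activations or the KV cache):

```python
# get_memory_footprint() returns bytes; convert to GiB for readability.
print(f"Model weights: {model.get_memory_footprint() / 1024**3:.1f} GB")
```
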
## Running Benchmarks

```bash
# Full benchmark suite
python benchmark.py --model DeepXR/Helion-V2.0-Thinking

# Evaluation suite
python evaluate.py --model DeepXR/Helion-V2.0-Thinking
```

## Common Use Cases

### Chatbot

```python
conversation = []

while True:
    user_input = input("You: ")
    if user_input.lower() == 'quit':
        break

    conversation.append({"role": "user", "content": user_input})

    prompt = "\n".join([
        f"{msg['role'].capitalize()}: {msg['content']}"
        for msg in conversation
    ]) + "\nAssistant:"

    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=512)
    response = tokenizer.decode(outputs[0], skip_special_tokens=True)

    # The decoded text includes the prompt; keep only what follows the final
    # "Assistant:" marker.
    response = response.split("Assistant:")[-1].strip()

    conversation.append({"role": "assistant", "content": response})
    print(f"Assistant: {response}")
```

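Manually joining role-prefixed strings works, but if the model repository ships a chat template you can let the tokenizer build the prompt and decode only the newly generated tokens (a sketch, assuming such a template is provided):

```python
# Build the prompt from the structured conversation via the chat template.
input_ids = tokenizer.apply_chat_template(
    conversation,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(input_ids, max_new_tokens=512)

# Decode only the tokens generated after the prompt.
response = tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True)
print(f"Assistant: {response}")
```
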
### Document Analysis

```python
# Read long document
with open("document.txt", "r") as f:
    document = f.read()

prompt = f"""{document}
Please provide:
1. A summary of the main points
2. Key takeaways
3. Any recommendations
Summary:"""

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=1024)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

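If the document is longer than the model's context window, a simple token-count split keeps each request within budget. The 4096-token budget below is a placeholder, not the model's actual limit:

```python
# Split the document into chunks that fit an assumed per-request token budget.
MAX_PROMPT_TOKENS = 4096  # placeholder; set this to the model's real context length

token_ids = tokenizer(document, add_special_tokens=False)["input_ids"]
chunks = [
    tokenizer.decode(token_ids[i:i + MAX_PROMPT_TOKENS])
    for i in range(0, len(token_ids), MAX_PROMPT_TOKENS)
]

# Summarize each chunk separately; the partial summaries can then be merged.
summaries = []
for chunk in chunks:
    prompt = f"{chunk}\nSummarize the main points of the passage above:"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=256)
    summaries.append(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
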
### Code Generation

```python
prompt = """Write a Python function that:
1. Takes a list of numbers
2. Removes duplicates
3. Returns sorted in descending order
Include type hints and docstring."""

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.3  # Lower temperature for code
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

## Troubleshooting

### Out of Memory

1. Use quantization (4-bit or 8-bit)
2. Reduce `max_new_tokens`
3. Enable gradient checkpointing (only relevant when fine-tuning)
4. Use smaller batch sizes

### Slow Performance

1. Enable Flash Attention 2 by loading the model with `attn_implementation="flash_attention_2"` (see the snippet after this list)
2. Use a GPU if available
3. Reduce the context length
4. Use quantization

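For example, Flash Attention 2 can be requested at load time; it needs the `flash-attn` package and a supported GPU:

```python
import torch
from transformers import AutoModelForCausalLM

# attn_implementation="flash_attention_2" requires flash-attn to be installed.
model = AutoModelForCausalLM.from_pretrained(
    "DeepXR/Helion-V2.0-Thinking",
    torch_dtype=torch.bfloat16,
    device_map="auto",
    attn_implementation="flash_attention_2",
)
```
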
### Installation Issues

```bash
# Update pip
pip install --upgrade pip

# Install from scratch
pip uninstall transformers torch
pip install transformers torch accelerate

# CUDA issues
pip install torch --index-url https://download.pytorch.org/whl/cu121
```

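After reinstalling, a quick check confirms that PyTorch was built with CUDA support and can see the GPU:

```bash
# Should print the torch version followed by "True" if CUDA is usable
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
```
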
## Next Steps

- Read the full [README.md](README.md) for detailed documentation
- Check out [inference.py](inference.py) for more examples
- Review [safety_wrapper.py](safety_wrapper.py) for safety features
- Run [benchmark.py](benchmark.py) to test performance
- See [evaluate.py](evaluate.py) for quality metrics

## Support

For issues and questions:

- Check the Hugging Face model page
- Review existing issues
- Submit a new issue with details

## License

Apache 2.0 - See LICENSE file for details