Trouter-Library committed · Commit a95fb3c · verified · 1 Parent(s): e489ad0

Create QUICKSTART.md

Files changed (1)
  1. QUICKSTART.md +287 -0
QUICKSTART.md ADDED
# Helion-V2.0-Thinking Quickstart Guide

Get started with Helion-V2.0-Thinking in minutes.

## Installation

### Basic Installation

```bash
pip install transformers torch accelerate pillow requests
```

### Full Installation (with all features)

```bash
pip install -r requirements.txt
```

### GPU Requirements

- **Minimum**: 24GB VRAM (RTX 4090, A5000)
- **Recommended**: 40GB+ VRAM (A100, H100)
- **Quantized (8-bit)**: 16GB VRAM
- **Quantized (4-bit)**: 12GB VRAM
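If you are not sure which tier your hardware falls into, the short check below (plain PyTorch, nothing model-specific) reports the VRAM of the first CUDA device:

```python
import torch

# Report total memory of the first CUDA device, if any.
if torch.cuda.is_available():
    total_gib = torch.cuda.get_device_properties(0).total_memory / 1024**3
    print(f"{torch.cuda.get_device_name(0)}: {total_gib:.0f} GiB VRAM")
else:
    print("No CUDA device detected; CPU-only inference will be very slow.")
```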
## Quick Examples

### 1. Basic Text Generation

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "DeepXR/Helion-V2.0-Thinking"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)

prompt = "What is artificial intelligence?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
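For longer generations it is often nicer to stream tokens as they are produced. The sketch below reuses `model` and `tokenizer` from the example above and relies only on `TextStreamer` from `transformers`; treat it as a minimal illustration rather than part of the repository's scripts.

```python
from transformers import TextStreamer

# Stream decoded text to stdout as tokens are generated,
# skipping the echoed prompt and special tokens.
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

inputs = tokenizer("Explain transformers in two sentences.", return_tensors="pt").to(model.device)
model.generate(**inputs, max_new_tokens=256, streamer=streamer)
```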
### 2. Image Understanding

```python
from transformers import AutoModelForCausalLM, AutoProcessor
from PIL import Image

model_name = "DeepXR/Helion-V2.0-Thinking"
processor = AutoProcessor.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)

image = Image.open("photo.jpg")
prompt = "What is in this image?"
inputs = processor(text=prompt, images=image, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(processor.decode(outputs[0], skip_special_tokens=True))
```
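Because `requests` and `pillow` are already part of the basic install, images can also be fetched from a URL. A minimal sketch reusing `processor` and `model` from above (the URL is a placeholder):

```python
from io import BytesIO

import requests
from PIL import Image

# Placeholder URL for illustration only.
url = "https://example.com/photo.jpg"
image = Image.open(BytesIO(requests.get(url, timeout=30).content))

inputs = processor(text="Describe this image.", images=image, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(processor.decode(outputs[0], skip_special_tokens=True))
```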
### 3. Using the Inference Script

```bash
# Interactive chat mode
python inference.py --interactive

# With image analysis
python inference.py --image photo.jpg --prompt "Describe this image"

# Run demos
python inference.py --demo

# With quantization (saves memory)
python inference.py --interactive --load-in-4bit
```
### 4. With Safety Wrapper

```python
from safety_wrapper import SafeHelionWrapper

# Initialize with safety features
wrapper = SafeHelionWrapper(
    model_name="DeepXR/Helion-V2.0-Thinking",
    enable_safety=True,
    enable_rate_limiting=True
)

# Safe generation
response = wrapper.generate(
    prompt="Explain photosynthesis",
    max_new_tokens=256
)
print(response)
```
### 5. Function Calling

```python
import json

tools = [{
    "name": "calculator",
    "description": "Perform calculations",
    "parameters": {
        "type": "object",
        "properties": {
            "expression": {"type": "string"}
        }
    }
}]

prompt = f"""Available tools: {json.dumps(tools)}

User: What is 125 * 48?
Assistant (respond with JSON):"""

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=True, temperature=0.2)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
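The model is expected to answer with a JSON tool call; parsing and executing it is up to your application. The sketch below assumes the reply is a single flat JSON object with `name` and `arguments` keys, which is an assumption about the output format rather than a guarantee.

```python
import json
import re

# Decode only the newly generated tokens (skip the echoed prompt).
new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
reply = tokenizer.decode(new_tokens, skip_special_tokens=True)

# Pull the first {...} block out of the reply; assumes a single flat JSON object.
match = re.search(r"\{.*\}", reply, re.DOTALL)
if match:
    try:
        call = json.loads(match.group(0))
        if call.get("name") == "calculator":
            expression = call.get("arguments", {}).get("expression", "")
            # eval() is for illustration only; use a proper expression parser in practice.
            print("calculator result:", eval(expression, {"__builtins__": {}}))
    except json.JSONDecodeError:
        print("Reply was not valid JSON:", reply)
else:
    print("No JSON object found in reply:", reply)
```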
## Memory-Efficient Options

Both options below use the `bitsandbytes` backend, so install it first (`pip install bitsandbytes`).

### 8-bit Quantization

```python
from transformers import BitsAndBytesConfig

quantization_config = BitsAndBytesConfig(load_in_8bit=True)

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=quantization_config,
    device_map="auto"
)
```

### 4-bit Quantization

```python
import torch
from transformers import BitsAndBytesConfig

quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4"
)

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=quantization_config,
    device_map="auto"
)
```
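To see how much memory a given configuration actually uses once loaded, `transformers` models expose `get_memory_footprint()`:

```python
# Report the loaded model's weight memory in GiB.
footprint_gib = model.get_memory_footprint() / 1024**3
print(f"Model weights occupy roughly {footprint_gib:.1f} GiB")
```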
## Running Benchmarks

```bash
# Full benchmark suite
python benchmark.py --model DeepXR/Helion-V2.0-Thinking

# Evaluation suite
python evaluate.py --model DeepXR/Helion-V2.0-Thinking
```
## Common Use Cases

### Chatbot

```python
conversation = []

while True:
    user_input = input("You: ")
    if user_input.lower() == 'quit':
        break

    conversation.append({"role": "user", "content": user_input})

    prompt = "\n".join([
        f"{msg['role'].capitalize()}: {msg['content']}"
        for msg in conversation
    ]) + "\nAssistant:"

    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=512)
    response = tokenizer.decode(outputs[0], skip_special_tokens=True)
    response = response.split("Assistant:")[-1].strip()

    conversation.append({"role": "assistant", "content": response})
    print(f"Assistant: {response}")
```
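If the tokenizer ships a chat template (check `tokenizer.chat_template`; this is an assumption about the release), `apply_chat_template` builds the prompt in the format the model was trained on instead of the hand-rolled `Role: text` lines above. A sketch of the generation step inside the same loop:

```python
# Build the prompt from the accumulated conversation using the model's chat template.
input_ids = tokenizer.apply_chat_template(
    conversation,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(input_ids, max_new_tokens=512)
# Decode only the tokens generated after the prompt.
response = tokenizer.decode(outputs[0][input_ids.shape[1]:], skip_special_tokens=True).strip()
```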
### Document Analysis

```python
# Read long document
with open("document.txt", "r") as f:
    document = f.read()

prompt = f"""{document}

Please provide:
1. A summary of the main points
2. Key takeaways
3. Any recommendations

Summary:"""

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=1024)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
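Documents longer than the model's context window have to be split first. The sketch below chunks by characters for simplicity (a token-based splitter would be more precise), summarizes each chunk with a `summarize` helper defined here for illustration, then summarizes the partial summaries; the 8,000-character chunk size is an arbitrary assumption.

```python
def summarize(text: str, max_new_tokens: int = 256) -> str:
    """Summarize one chunk of text with the already-loaded model and tokenizer."""
    chunk_prompt = f"{text}\n\nSummarize the main points.\n\nSummary:"
    inputs = tokenizer(chunk_prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Keep only the newly generated continuation.
    reply = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
    return reply.strip()

# Roughly 8,000 characters per chunk; tune to the model's context length.
chunk_size = 8000
chunks = [document[i:i + chunk_size] for i in range(0, len(document), chunk_size)]
partial_summaries = [summarize(chunk) for chunk in chunks]

# Summarize the summaries for the final answer.
print(summarize("\n\n".join(partial_summaries)))
```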
### Code Generation

```python
prompt = """Write a Python function that:
1. Takes a list of numbers
2. Removes duplicates
3. Returns them sorted in descending order

Include type hints and a docstring."""

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.3  # Lower temperature for code
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
## Troubleshooting

### Out of Memory

1. Use quantization (4-bit or 8-bit)
2. Reduce `max_new_tokens`
3. Enable gradient checkpointing (relevant when fine-tuning)
4. Use smaller batch sizes
### Slow Performance

1. Enable Flash Attention 2 by loading the model with `attn_implementation="flash_attention_2"` (see the sketch below)
2. Use a GPU if available
3. Reduce context length
4. Use quantization
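A minimal loading sketch with Flash Attention 2 enabled, assuming the `flash-attn` package is installed and the GPU supports it:

```python
import torch
from transformers import AutoModelForCausalLM

# Flash Attention 2 requires the flash-attn package and a supported GPU.
model = AutoModelForCausalLM.from_pretrained(
    "DeepXR/Helion-V2.0-Thinking",
    torch_dtype=torch.bfloat16,
    device_map="auto",
    attn_implementation="flash_attention_2",
)
```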
### Installation Issues

```bash
# Update pip
pip install --upgrade pip

# Install from scratch
pip uninstall transformers torch
pip install transformers torch accelerate

# CUDA issues
pip install torch --index-url https://download.pytorch.org/whl/cu121
```
## Next Steps

- Read the full [README.md](README.md) for detailed documentation
- Check out [inference.py](inference.py) for more examples
- Review [safety_wrapper.py](safety_wrapper.py) for safety features
- Run [benchmark.py](benchmark.py) to test performance
- See [evaluate.py](evaluate.py) for quality metrics

## Support

For issues and questions:
- Check the Hugging Face model page
- Review existing issues
- Submit a new issue with details

## License

Apache 2.0 - See LICENSE file for details