# Deployment Configuration Guide
## Critical Issues and Solutions
### 1. Cache Directory Permissions
**Problem**: `PermissionError: [Errno 13] Permission denied: '/.cache'`
**Solution**: The code now automatically detects Docker and uses `/tmp/huggingface_cache`. However, ensure the Dockerfile sets proper permissions.
**Dockerfile Fix**:
```dockerfile
# Create cache directory with proper permissions
RUN mkdir -p /tmp/huggingface_cache && chmod 777 /tmp/huggingface_cache
ENV HF_HOME=/tmp/huggingface_cache
ENV TRANSFORMERS_CACHE=/tmp/huggingface_cache
```
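The auto-detection described above can be sketched roughly as follows. The helper name and exact checks are illustrative; the real logic lives in the application code:

```python
import os

def pick_cache_dir(default: str = "/tmp/huggingface_cache") -> str:
    """Illustrative sketch: honor HF_HOME if set; otherwise fall back to a
    world-writable /tmp path when running in Docker (marker file
    /.dockerenv) or when the home directory is not writable."""
    explicit = os.getenv("HF_HOME")
    if explicit:
        return explicit
    home = os.path.expanduser("~")
    in_docker = os.path.exists("/.dockerenv")
    if in_docker or not os.access(home, os.W_OK):
        os.makedirs(default, exist_ok=True)
        return default
    return os.path.join(home, ".cache", "huggingface")
```

Setting `HF_HOME` in the Dockerfile (as above) short-circuits the detection, which is the most predictable option in a container.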
### 2. User ID Issues
**Problem**: `KeyError: 'getpwuid(): uid not found: 1000'`
**Solution**: Run the container as a user that actually exists in the image, or create that user in the Dockerfile.
**Option A - Use root (simplest for HF Spaces)**:
```dockerfile
# Already running as root in HF Spaces - this is fine
# Just ensure cache directories are writable
```
**Option B - Create user in Dockerfile**:
```dockerfile
RUN useradd -m -u 1000 -s /bin/bash appuser && \
    mkdir -p /tmp/huggingface_cache && \
    chown -R appuser:appuser /tmp/huggingface_cache /app
USER appuser
```
**For Hugging Face Spaces**: Spaces typically run as root, so Option A is fine.
### 3. HuggingFace Token Configuration
**Problem**: Gated repository access errors
**Solution**: Set HF_TOKEN in Hugging Face Spaces secrets.
**Steps**:
1. Go to your Space → Settings → Repository secrets
2. Add `HF_TOKEN` with your Hugging Face access token
3. Token should have read access to gated models
**Verify Token**:
```bash
# Test token access
curl -H "Authorization: Bearer YOUR_TOKEN" https://huggingface.co/api/models/Qwen/Qwen2.5-7B-Instruct
```
### 4. GPU Tensor Device Placement
**Problem**: `Tensor on device cuda:0 is not on the expected device meta!`
**Solution**: Use explicit device placement instead of `device_map="auto"` for non-quantized models.
**Code Fix**: Already implemented in `src/local_model_loader.py`: `device_map="auto"` is used only with quantization; otherwise the model is placed on a device explicitly.
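The distinction can be sketched as a small helper that picks the loading kwargs. The helper name and exact kwargs are hypothetical; the real implementation is in `src/local_model_loader.py`:

```python
def build_load_kwargs(use_quantization: bool) -> dict:
    """Hypothetical sketch: device_map='auto' is only used together with
    quantization (accelerate's layer dispatch); otherwise the model is
    loaded normally and moved to one device explicitly, which avoids
    'meta' tensor placement errors."""
    if use_quantization:
        # quantized path: let accelerate dispatch layers across devices
        return {"device_map": "auto", "load_in_4bit": True}
    # non-quantized path: load normally, then call model.to("cuda:0") yourself
    return {"torch_dtype": "float16", "low_cpu_mem_usage": True}
```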
### 5. Model Selection for Testing
**Current Models**:
- Primary: `Qwen/Qwen2.5-7B-Instruct` (gated - requires access)
- Fallback: `microsoft/Phi-3-mini-4k-instruct` (non-gated, verified)
**For Testing Without Gated Models**:
Update `src/models_config.py` to use non-gated models:
```python
"reasoning_primary": {
"model_id": "microsoft/Phi-3-mini-4k-instruct", # Non-gated
...
}
```
## Recommended Dockerfile Updates
```dockerfile
FROM python:3.10-slim
WORKDIR /app
# Install system dependencies
RUN apt-get update && apt-get install -y \
    gcc \
    g++ \
    cmake \
    libopenblas-dev \
    libomp-dev \
    curl \
    && rm -rf /var/lib/apt/lists/*
# Create cache directories with proper permissions
RUN mkdir -p /tmp/huggingface_cache && \
    chmod 777 /tmp/huggingface_cache && \
    mkdir -p /tmp/logs && \
    chmod 777 /tmp/logs
# Copy requirements file
COPY requirements.txt .
# Install Python dependencies
RUN pip install --no-cache-dir --upgrade pip && \
    pip install --no-cache-dir -r requirements.txt
# Copy application code
COPY . .
# Set environment variables
ENV PYTHONUNBUFFERED=1
ENV PORT=7860
ENV OMP_NUM_THREADS=4
ENV MKL_NUM_THREADS=4
ENV DB_PATH=/tmp/sessions.db
ENV FAISS_INDEX_PATH=/tmp/embeddings.faiss
ENV LOG_DIR=/tmp/logs
ENV HF_HOME=/tmp/huggingface_cache
ENV TRANSFORMERS_CACHE=/tmp/huggingface_cache
ENV RATE_LIMIT_ENABLED=true
# Expose port
EXPOSE 7860
# Health check
HEALTHCHECK --interval=30s --timeout=30s --start-period=120s --retries=3 \
    CMD curl -f http://localhost:7860/api/health || exit 1
# Run with Gunicorn
CMD ["gunicorn", "--bind", "0.0.0.0:7860", "--workers", "4", "--threads", "2", "--timeout", "120", "--access-logfile", "-", "--error-logfile", "-", "--log-level", "info", "flask_api_standalone:app"]
```
## Hugging Face Spaces Configuration
### Required Secrets:
1. `HF_TOKEN` - Your Hugging Face access token (for gated models)
### Environment Variables (Optional):
- `HF_HOME` - Will auto-detect to `/tmp/huggingface_cache` in Docker
- `TRANSFORMERS_CACHE` - Will auto-detect to `/tmp/huggingface_cache` in Docker
### Hardware Requirements:
- GPU: NVIDIA T4 (16GB VRAM) - ✅ Detected in logs
- Memory: At least 8GB RAM
- Disk: 20GB+ for model cache
## Verification Steps
1. **Check Cache Directory**:
```bash
ls -la /tmp/huggingface_cache
# Should show writable directory
```
2. **Check HF Token**:
```python
import os
print("HF_TOKEN set:", bool(os.getenv("HF_TOKEN")))
```
3. **Check GPU**:
```python
import torch
print("CUDA available:", torch.cuda.is_available())
print("GPU:", torch.cuda.get_device_name(0) if torch.cuda.is_available() else "None")
```
4. **Test Model Loading**:
- Check logs for: `✓ Cache directory verified: /tmp/huggingface_cache`
- Check logs for: `✓ HF_TOKEN authenticated for gated model access` (if token set)
- Check logs for: `✓ Model loaded successfully`
## Troubleshooting
### Issue: Still getting permission errors
**Fix**: Ensure Dockerfile creates cache directory with 777 permissions
### Issue: Gated repository errors persist
**Fix**:
1. Verify HF_TOKEN is set in Spaces secrets
2. Visit model page and request access
3. Wait for approval (usually instant)
4. Use fallback model (Phi-3-mini) until access granted
### Issue: Tensor device errors
**Fix**: The code now handles this: if quantized loading fails, it retries without quantization and uses explicit device placement.
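That fallback control flow can be sketched like this, where the loader callables stand in for the real `transformers` calls:

```python
def load_with_fallback(load_quantized, load_full_precision):
    """Illustrative sketch: try the quantized load first; on any failure
    (e.g. bitsandbytes errors), retry at full precision with explicit
    device placement."""
    try:
        return load_quantized(), "quantized"
    except Exception:
        return load_full_precision(), "full-precision"
```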
### Issue: Model too large for GPU
**Fix**:
- Code automatically falls back to no quantization if bitsandbytes fails
- Consider using smaller model (Phi-3-mini) for testing
- Check GPU memory: `nvidia-smi`
## Quick Start Checklist
- [ ] HF_TOKEN set in Spaces secrets
- [ ] Dockerfile creates cache directory with proper permissions
- [ ] GPU detected (check logs)
- [ ] Cache directory writable (check logs)
- [ ] Model access granted (or using non-gated fallback)
- [ ] No tensor device errors (check logs)
## Next Steps
1. Update Dockerfile with cache directory creation
2. Set HF_TOKEN in Spaces secrets
3. Request access to gated models (Qwen)
4. Test with fallback model first (Phi-3-mini)
5. Monitor logs for successful model loading