ianshank committed · verified
Commit bc47fb9 · 1 Parent(s): 6510698

🚨 Emergency fix: Ensure prestart script execution and proper dependency installation

Files changed (2):
  1. README.md +32 -153
  2. start.sh +20 -0
README.md CHANGED
@@ -6,7 +6,8 @@ colorTo: purple
 sdk: gradio
 sdk_version: 4.44.0
 app_file: app/app.py
-startup_duration_timeout: 300
 pinned: false
 license: mit
 short_description: AI assistant with expert routing and CPU/GPU support
@@ -18,184 +19,62 @@ models:
 
 A robust, production-ready AI assistant powered by Microsoft's Phi-3.5-MoE model with intelligent expert routing and comprehensive CPU/GPU environment support.
 
-## ✨ Features
 
 - **🧠 Expert Routing**: Automatically routes queries to specialized experts (Code, Math, Reasoning, Multilingual, General)
 - **🔧 Environment Adaptive**: Works seamlessly on both CPU and GPU environments
 - **🛡️ Robust Dependency Management**: Conditional installation of dependencies based on environment
 - **📦 Modular Architecture**: Clean, maintainable, and testable codebase
-- **🧪 Comprehensive Testing**: Unit, contract, and integration tests
 - **⚡ Performance Optimized**: Environment-specific optimizations for best performance
 
-## 🏗️ Architecture
 
-### Modular Components
 
 ```
 app/
 ├── app.py # Main application entry point
-├── model_loader.py # Model loading with environment detection
-├── interface.py # Gradio interface and expert routing
 ├── config/
 │   └── model_config.py # Environment detection and configuration
 └── requirements.txt # Core dependencies (no flash-attn)
 
 scripts/
-├── select_revision.py # CPU-safe model revision selector
-└── utils/ # Utility functions
 
-tests/
-├── unit/ # Unit tests for individual components
-├── contract/ # Contract tests for external APIs
-└── integration/ # Full workflow integration tests
-
-prestart.sh # Environment setup and conditional dependency installation
 ```
 
-### Key Innovations
-
-1. **Conditional Dependency Installation**: Flash-attention is only installed when GPU is available
-2. **CPU-Safe Revision Selection**: Automatically selects model revisions that work on CPU
-3. **Environment-Specific Configuration**: Optimized settings for CPU vs GPU environments
-4. **Comprehensive Error Handling**: Graceful fallbacks when components fail
-5. **Expert Query Classification**: Intelligent routing based on query content
-
-## 🚀 Quick Start
-
-### For Hugging Face Spaces
-
-The application automatically handles environment setup. Simply deploy and it will:
-
-1. Detect CPU/GPU environment
-2. Install appropriate dependencies
-3. Select compatible model revision (if needed)
-4. Launch the interface
-
-### Local Development
-
-```bash
-# Clone the repository
-git clone <repository-url>
-cd phi35-moe-expert-assistant
-
-# Run prestart setup
-./prestart.sh
-
-# Start the application
-python app/app.py
-```
-
-### Testing
-
-```bash
-# Run all tests
-pytest tests/
-
-# Run specific test categories
-pytest tests/unit/        # Unit tests
-pytest tests/contract/    # Contract tests
-pytest tests/integration/ # Integration tests
-
-# Run with coverage
-pytest --cov=app tests/
-```
-
-## 🔧 Configuration
-
-### Environment Variables
-
-- `HF_MODEL_ID`: Model to use (default: microsoft/Phi-3.5-MoE-instruct)
-- `HF_REVISION`: Specific model revision (auto-selected for CPU if not set)
-- `HF_TOKEN`: Hugging Face token for private models
-
-### CPU vs GPU Behavior
 
-| Environment | Model Dtype | Device Map | Attention | Flash-Attn | Revision |
-|-------------|-------------|------------|-----------|------------|----------|
-| **CPU** | float32 | cpu | eager | ❌ No | Auto-selected safe |
-| **GPU** | bfloat16 | auto | sdpa | ✅ Yes | Latest |
-
-## 🧪 Testing Strategy
-
-### Unit Tests
-- Individual component testing
-- Mocked external dependencies
-- Fast execution for CI/CD
-
-### Contract Tests
-- External API interaction validation
-- Hugging Face API contracts
-- Transformers library contracts
-
-### Integration Tests
-- Full workflow testing
-- CPU/GPU environment simulation
-- Error handling scenarios
-
-## 🛠️ Development
-
-### Code Quality
-- **Black**: Code formatting
-- **Flake8**: Linting
-- **Type hints**: For better IDE support
-- **Docstrings**: Comprehensive documentation
-
-### Best Practices
-- Modular, reusable components
-- Comprehensive error handling
-- Environment-specific optimizations
-- Extensive testing coverage
-
-## 🔍 Troubleshooting
-
-### Common Issues
-
-1. **Import Errors**: Run `./prestart.sh` to install dependencies
-2. **Model Loading Fails**: Check internet connection and HF_TOKEN
-3. **CPU Performance**: Model automatically uses CPU-optimized settings
-4. **Memory Issues**: Reduce max_tokens or use smaller model
-
-### Debug Mode
-
-Set environment variables for detailed logging:
-```bash
-export PYTHONPATH=.
-export LOG_LEVEL=DEBUG
-python app/app.py
-```
 
 ## 📊 Performance
 
-### Benchmarks
-
-| Environment | Startup Time | Memory Usage | Tokens/sec |
-|-------------|--------------|--------------|------------|
-| CPU (16GB) | ~3-5 min | ~8-12 GB | ~2-5 |
-| GPU (24GB) | ~2-3 min | ~16-20 GB | ~15-30 |
-
-### Optimizations
-
-- **CPU**: float32 precision, eager attention, memory optimization
-- **GPU**: bfloat16 precision, flash attention, parallel processing
 
-## 🤝 Contributing
-
-1. Fork the repository
-2. Create a feature branch
-3. Add comprehensive tests
-4. Ensure all tests pass
-5. Submit a pull request
-
-## 📄 License
-
-MIT License - see LICENSE file for details.
-
-## 🙏 Acknowledgments
 
-- Microsoft for the Phi-3.5-MoE model
-- Hugging Face for the transformers library and hosting
-- The open-source community for various dependencies
 
 ---
 
-**Built with ❤️ for robust, production-ready AI applications**
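The removed "CPU vs GPU Behavior" table above maps naturally onto a small configuration helper. A minimal sketch of that mapping (the function name and kwargs mirror the table but are assumptions, not the repo's actual `model_config.py` API):

```python
def build_model_kwargs(has_gpu: bool) -> dict:
    """Choose model-loading settings per environment (illustrative only)."""
    if has_gpu:
        return {
            "dtype": "bfloat16",            # half precision on GPU
            "device_map": "auto",
            "attn_implementation": "sdpa",  # flash-attn installed separately
        }
    return {
        "dtype": "float32",                 # full precision for CPU stability
        "device_map": "cpu",
        "attn_implementation": "eager",     # no flash-attn on CPU
    }

print(build_model_kwargs(False)["attn_implementation"])  # eager
```

Keeping the two branches in one place makes the table's guarantees (eager attention on CPU, bf16 on GPU) easy to test in isolation.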
 
 sdk: gradio
 sdk_version: 4.44.0
 app_file: app/app.py
+startup_duration_timeout: 600
+prestart: ./prestart.sh
 pinned: false
 license: mit
 short_description: AI assistant with expert routing and CPU/GPU support
 
 
 A robust, production-ready AI assistant powered by Microsoft's Phi-3.5-MoE model with intelligent expert routing and comprehensive CPU/GPU environment support.
 
+## 🚀 Key Features
 
 - **🧠 Expert Routing**: Automatically routes queries to specialized experts (Code, Math, Reasoning, Multilingual, General)
 - **🔧 Environment Adaptive**: Works seamlessly on both CPU and GPU environments
 - **🛡️ Robust Dependency Management**: Conditional installation of dependencies based on environment
 - **📦 Modular Architecture**: Clean, maintainable, and testable codebase
 - **⚡ Performance Optimized**: Environment-specific optimizations for best performance
 
+## 🔧 Recent Fixes
+
+- ✅ **Missing Dependencies**: Added `einops` to requirements, conditional `flash_attn` installation
+- ✅ **Deprecated Parameters**: Fixed all `torch_dtype` → `dtype` usage
+- ✅ **CPU Compatibility**: Automatic CPU-safe model revision selection
+- ✅ **Error Handling**: Comprehensive fallback mechanisms
+- ✅ **Security**: Updated to Gradio 4.44.0+ for security fixes
 
+## 🏗️ Architecture
 
 ```
 app/
 ├── app.py # Main application entry point
+├── model_loader.py # Environment-adaptive model loading
+├── interface.py # Expert routing and Gradio interface
 ├── config/
 │   └── model_config.py # Environment detection and configuration
 └── requirements.txt # Core dependencies (no flash-attn)
 
 scripts/
+└── select_revision.py # CPU-safe model revision selector
 
+prestart.sh # Environment setup and conditional dependencies
 ```
 
+## 🎯 How It Works
 
+1. **Environment Detection**: Automatically detects CPU vs GPU environment
+2. **Conditional Dependencies**: Installs `flash_attn` only when GPU is available
+3. **Model Configuration**: Uses optimal settings for each environment
+4. **Expert Routing**: Classifies queries and routes to appropriate expert
+5. **Graceful Fallbacks**: Works even when model loading fails
 
 ## 📊 Performance
 
+| Environment | Startup | Memory | Tokens/sec |
+|-------------|---------|--------|------------|
+| **CPU** | 3-5 min | 8-12 GB | 2-5 |
+| **GPU** | 2-3 min | 16-20 GB | 15-30 |
 
+## 🔍 Troubleshooting
 
+If you encounter issues:
+1. Check the logs for dependency installation
+2. Verify `prestart.sh` executed successfully
+3. Ensure all required packages are installed
+4. Try the fallback mode if model loading fails
 
 ---
 
+**Built with ❤️ for reliable, production-ready AI applications**
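The expert-routing step in "How It Works" ("classifies queries and routes to appropriate expert") can be approximated with a keyword classifier. A rough sketch only; the patterns and fallthrough order are illustrative assumptions, since the actual `interface.py` logic is not shown in this diff:

```python
import re

# Illustrative keyword table; the real router's rules are not part of this diff.
EXPERT_PATTERNS = {
    "Code": r"\b(def|class|function|bug|python|compile)\b",
    "Math": r"\b(solve|equation|integral|derivative|calculate)\b",
    "Multilingual": r"\b(translate|translation)\b",
    "Reasoning": r"\b(why|explain|compare)\b",
}

def route_query(query: str) -> str:
    """Return the first expert whose pattern matches, else fall back to General."""
    q = query.lower()
    for expert, pattern in EXPERT_PATTERNS.items():
        if re.search(pattern, q):
            return expert
    return "General"

print(route_query("Fix this Python function"))  # Code
print(route_query("Good morning!"))             # General
```

A rule-based first pass like this keeps routing cheap; a model-based classifier could replace it without changing the interface.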
start.sh ADDED
@@ -0,0 +1,20 @@
+#!/bin/bash
+set -euo pipefail
+
+echo "🚀 Starting Phi-3.5-MoE Expert Assistant..."
+echo "📅 $(date)"
+
+# Ensure we're in the right directory
+cd /home/user
+
+# Make prestart script executable
+chmod +x prestart.sh
+
+# Run prestart setup
+echo "🔧 Running prestart setup..."
+./prestart.sh
+
+# Start the application
+echo "🚀 Starting application..."
+cd /home/user
+python app/app.py
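For context, the conditional dependency step that start.sh delegates to prestart.sh could be mirrored in Python roughly like this. A sketch under stated assumptions: using `nvidia-smi` presence as the GPU heuristic is a guess, not the repo's actual check, and the install defaults to a dry run:

```python
import shutil
import subprocess
import sys

def gpu_available() -> bool:
    """Heuristic: treat the presence of nvidia-smi as a GPU environment."""
    return shutil.which("nvidia-smi") is not None

def flash_attn_install_cmd() -> list:
    """The pip command a prestart step would run only on GPU machines."""
    return [sys.executable, "-m", "pip", "install", "flash-attn"]

def run_prestart(dry_run: bool = True) -> bool:
    """Install GPU-only extras conditionally; return True if an install ran."""
    if gpu_available() and not dry_run:
        subprocess.run(flash_attn_install_cmd(), check=True)
        return True
    return False
```

Gating `flash-attn` this way is what keeps the base `requirements.txt` installable on CPU-only Spaces.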