ianshank committed · verified
Commit bc47fb9 · 1 Parent(s): 6510698

🚨 Emergency fix: Ensure prestart script execution and proper dependency installation

Files changed (2):
  1. README.md +32 -153
  2. start.sh +20 -0
README.md CHANGED
@@ -6,7 +6,8 @@ colorTo: purple
 sdk: gradio
 sdk_version: 4.44.0
 app_file: app/app.py
-startup_duration_timeout: 300
 pinned: false
 license: mit
 short_description: AI assistant with expert routing and CPU/GPU support
@@ -18,184 +19,62 @@ models:
 
 A robust, production-ready AI assistant powered by Microsoft's Phi-3.5-MoE model with intelligent expert routing and comprehensive CPU/GPU environment support.
 
-## ✨ Features
 
 - **🧠 Expert Routing**: Automatically routes queries to specialized experts (Code, Math, Reasoning, Multilingual, General)
 - **🔧 Environment Adaptive**: Works seamlessly on both CPU and GPU environments
 - **🛡️ Robust Dependency Management**: Conditional installation of dependencies based on environment
 - **📦 Modular Architecture**: Clean, maintainable, and testable codebase
-- **🧪 Comprehensive Testing**: Unit, contract, and integration tests
 - **⚡ Performance Optimized**: Environment-specific optimizations for best performance
 
-## 🏗️ Architecture
 
-### Modular Components
 
 ```
 app/
 ├── app.py # Main application entry point
-├── model_loader.py # Model loading with environment detection
-├── interface.py # Gradio interface and expert routing
 ├── config/
 │   └── model_config.py # Environment detection and configuration
 └── requirements.txt # Core dependencies (no flash-attn)
 
 scripts/
-├── select_revision.py # CPU-safe model revision selector
-└── utils/ # Utility functions
 
-tests/
-├── unit/ # Unit tests for individual components
-├── contract/ # Contract tests for external APIs
-└── integration/ # Full workflow integration tests
-
-prestart.sh # Environment setup and conditional dependency installation
 ```
 
-### Key Innovations
-
-1. **Conditional Dependency Installation**: Flash-attention is only installed when GPU is available
-2. **CPU-Safe Revision Selection**: Automatically selects model revisions that work on CPU
-3. **Environment-Specific Configuration**: Optimized settings for CPU vs GPU environments
-4. **Comprehensive Error Handling**: Graceful fallbacks when components fail
-5. **Expert Query Classification**: Intelligent routing based on query content
-
-## 🚀 Quick Start
-
-### For Hugging Face Spaces
-
-The application automatically handles environment setup. Simply deploy and it will:
-
-1. Detect CPU/GPU environment
-2. Install appropriate dependencies
-3. Select compatible model revision (if needed)
-4. Launch the interface
-
-### Local Development
-
-```bash
-# Clone the repository
-git clone <repository-url>
-cd phi35-moe-expert-assistant
-
-# Run prestart setup
-./prestart.sh
-
-# Start the application
-python app/app.py
-```
-
-### Testing
-
-```bash
-# Run all tests
-pytest tests/
-
-# Run specific test categories
-pytest tests/unit/        # Unit tests
-pytest tests/contract/    # Contract tests
-pytest tests/integration/ # Integration tests
-
-# Run with coverage
-pytest --cov=app tests/
-```
-
-## 🔧 Configuration
-
-### Environment Variables
-
-- `HF_MODEL_ID`: Model to use (default: microsoft/Phi-3.5-MoE-instruct)
-- `HF_REVISION`: Specific model revision (auto-selected for CPU if not set)
-- `HF_TOKEN`: Hugging Face token for private models
-
-### CPU vs GPU Behavior
 
-| Environment | Model Dtype | Device Map | Attention | Flash-Attn | Revision |
-|-------------|-------------|------------|-----------|------------|----------|
-| **CPU** | float32 | cpu | eager | ❌ No | Auto-selected safe |
-| **GPU** | bfloat16 | auto | sdpa | ✅ Yes | Latest |
-
-## 🧪 Testing Strategy
-
-### Unit Tests
-- Individual component testing
-- Mocked external dependencies
-- Fast execution for CI/CD
-
-### Contract Tests
-- External API interaction validation
-- Hugging Face API contracts
-- Transformers library contracts
-
-### Integration Tests
-- Full workflow testing
-- CPU/GPU environment simulation
-- Error handling scenarios
-
-## 🛠️ Development
-
-### Code Quality
-- **Black**: Code formatting
-- **Flake8**: Linting
-- **Type hints**: For better IDE support
-- **Docstrings**: Comprehensive documentation
-
-### Best Practices
-- Modular, reusable components
-- Comprehensive error handling
-- Environment-specific optimizations
-- Extensive testing coverage
-
-## 🔍 Troubleshooting
-
-### Common Issues
-
-1. **Import Errors**: Run `./prestart.sh` to install dependencies
-2. **Model Loading Fails**: Check internet connection and HF_TOKEN
-3. **CPU Performance**: Model automatically uses CPU-optimized settings
-4. **Memory Issues**: Reduce max_tokens or use smaller model
-
-### Debug Mode
-
-Set environment variables for detailed logging:
-```bash
-export PYTHONPATH=.
-export LOG_LEVEL=DEBUG
-python app/app.py
-```
 
 ## 📊 Performance
 
-### Benchmarks
-
-| Environment | Startup Time | Memory Usage | Tokens/sec |
-|-------------|--------------|--------------|------------|
-| CPU (16GB) | ~3-5 min | ~8-12 GB | ~2-5 |
-| GPU (24GB) | ~2-3 min | ~16-20 GB | ~15-30 |
-
-### Optimizations
-
-- **CPU**: float32 precision, eager attention, memory optimization
-- **GPU**: bfloat16 precision, flash attention, parallel processing
 
-## 🤝 Contributing
-
-1. Fork the repository
-2. Create a feature branch
-3. Add comprehensive tests
-4. Ensure all tests pass
-5. Submit a pull request
-
-## 📄 License
-
-MIT License - see LICENSE file for details.
-
-## 🙏 Acknowledgments
 
-- Microsoft for the Phi-3.5-MoE model
-- Hugging Face for the transformers library and hosting
-- The open-source community for various dependencies
 
 ---
 
-**Built with ❤️ for robust, production-ready AI applications**
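The removed "CPU vs GPU Behavior" table above maps naturally onto a small configuration helper. A minimal sketch of that mapping (the function name and kwargs mirror the table but are assumptions, not the repo's actual `model_config.py` API):

```python
def build_model_kwargs(has_gpu: bool) -> dict:
    """Choose model-loading settings per environment (illustrative only)."""
    if has_gpu:
        return {
            "dtype": "bfloat16",            # half precision on GPU
            "device_map": "auto",
            "attn_implementation": "sdpa",  # flash-attn installed separately
        }
    return {
        "dtype": "float32",                 # full precision for CPU stability
        "device_map": "cpu",
        "attn_implementation": "eager",     # no flash-attn on CPU
    }

print(build_model_kwargs(False)["attn_implementation"])  # eager
```

Keeping the two branches in one place makes the table's guarantees (eager attention on CPU, bf16 on GPU) easy to test in isolation.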
 
 sdk: gradio
 sdk_version: 4.44.0
 app_file: app/app.py
+startup_duration_timeout: 600
+prestart: ./prestart.sh
 pinned: false
 license: mit
 short_description: AI assistant with expert routing and CPU/GPU support
 
 
 A robust, production-ready AI assistant powered by Microsoft's Phi-3.5-MoE model with intelligent expert routing and comprehensive CPU/GPU environment support.
 
+## 🚀 Key Features
 
 - **🧠 Expert Routing**: Automatically routes queries to specialized experts (Code, Math, Reasoning, Multilingual, General)
 - **🔧 Environment Adaptive**: Works seamlessly on both CPU and GPU environments
 - **🛡️ Robust Dependency Management**: Conditional installation of dependencies based on environment
 - **📦 Modular Architecture**: Clean, maintainable, and testable codebase
 - **⚡ Performance Optimized**: Environment-specific optimizations for best performance
 
+## 🔧 Recent Fixes
+
+- ✅ **Missing Dependencies**: Added `einops` to requirements, conditional `flash_attn` installation
+- ✅ **Deprecated Parameters**: Fixed all `torch_dtype` → `dtype` usage
+- ✅ **CPU Compatibility**: Automatic CPU-safe model revision selection
+- ✅ **Error Handling**: Comprehensive fallback mechanisms
+- ✅ **Security**: Updated to Gradio 4.44.0+ for security fixes
 
+## 🏗️ Architecture
 
 ```
 app/
 ├── app.py # Main application entry point
+├── model_loader.py # Environment-adaptive model loading
+├── interface.py # Expert routing and Gradio interface
 ├── config/
 │   └── model_config.py # Environment detection and configuration
 └── requirements.txt # Core dependencies (no flash-attn)
 
 scripts/
+└── select_revision.py # CPU-safe model revision selector
 
+prestart.sh # Environment setup and conditional dependencies
 ```
 
+## 🎯 How It Works
 
+1. **Environment Detection**: Automatically detects CPU vs GPU environment
+2. **Conditional Dependencies**: Installs `flash_attn` only when GPU is available
+3. **Model Configuration**: Uses optimal settings for each environment
+4. **Expert Routing**: Classifies queries and routes to appropriate expert
+5. **Graceful Fallbacks**: Works even when model loading fails
 
 ## 📊 Performance
 
+| Environment | Startup | Memory | Tokens/sec |
+|-------------|---------|--------|------------|
+| **CPU** | 3-5 min | 8-12 GB | 2-5 |
+| **GPU** | 2-3 min | 16-20 GB | 15-30 |
 
+## 🔍 Troubleshooting
 
+If you encounter issues:
+1. Check the logs for dependency installation
+2. Verify `prestart.sh` executed successfully
+3. Ensure all required packages are installed
+4. Try the fallback mode if model loading fails
 
 ---
 
+**Built with ❤️ for reliable, production-ready AI applications**
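The expert-routing step in "How It Works" ("classifies queries and routes to appropriate expert") can be approximated with a keyword classifier. A rough sketch only; the patterns and fallthrough order are illustrative assumptions, since the actual `interface.py` logic is not shown in this diff:

```python
import re

# Illustrative keyword table; the real router's rules are not part of this diff.
EXPERT_PATTERNS = {
    "Code": r"\b(def|class|function|bug|python|compile)\b",
    "Math": r"\b(solve|equation|integral|derivative|calculate)\b",
    "Multilingual": r"\b(translate|translation)\b",
    "Reasoning": r"\b(why|explain|compare)\b",
}

def route_query(query: str) -> str:
    """Return the first expert whose pattern matches, else fall back to General."""
    q = query.lower()
    for expert, pattern in EXPERT_PATTERNS.items():
        if re.search(pattern, q):
            return expert
    return "General"

print(route_query("Fix this Python function"))  # Code
print(route_query("Good morning!"))             # General
```

A rule-based first pass like this keeps routing cheap; a model-based classifier could replace it without changing the interface.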
start.sh ADDED
@@ -0,0 +1,20 @@
+#!/bin/bash
+set -euo pipefail
+
+echo "🚀 Starting Phi-3.5-MoE Expert Assistant..."
+echo "📅 $(date)"
+
+# Ensure we're in the right directory
+cd /home/user
+
+# Make prestart script executable
+chmod +x prestart.sh
+
+# Run prestart setup
+echo "🔧 Running prestart setup..."
+./prestart.sh
+
+# Start the application
+echo "🚀 Starting application..."
+cd /home/user
+python app/app.py
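For context, the conditional dependency step that start.sh delegates to prestart.sh could be mirrored in Python roughly like this. A sketch under stated assumptions: using `nvidia-smi` presence as the GPU heuristic is a guess, not the repo's actual check, and the install defaults to a dry run:

```python
import shutil
import subprocess
import sys

def gpu_available() -> bool:
    """Heuristic: treat the presence of nvidia-smi as a GPU environment."""
    return shutil.which("nvidia-smi") is not None

def flash_attn_install_cmd() -> list:
    """The pip command a prestart step would run only on GPU machines."""
    return [sys.executable, "-m", "pip", "install", "flash-attn"]

def run_prestart(dry_run: bool = True) -> bool:
    """Install GPU-only extras conditionally; return True if an install ran."""
    if gpu_available() and not dry_run:
        subprocess.run(flash_attn_install_cmd(), check=True)
        return True
    return False
```

Gating `flash-attn` this way is what keeps the base `requirements.txt` installable on CPU-only Spaces.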