phi35-moe-demo / README.md

ianshank

🚀 Final fix v20250913_220639: Comprehensive solution for dependency and configuration issues

3eeba36 verified 3 months ago

	---
	title: Phi-3.5-MoE Expert Assistant
	emoji: 🤖
	colorFrom: blue
	colorTo: purple
	sdk: gradio
	sdk_version: 4.44.0
	app_file: app.py
	entrypoint: start.sh
	startup_duration_timeout: 600
	pinned: false
	license: mit
	short_description: AI assistant with expert routing and CPU/GPU support
	models:
	- microsoft/Phi-3.5-MoE-instruct
	---

	# 🤖 Phi-3.5-MoE Expert Assistant

	An AI assistant powered by Microsoft's Phi-3.5-MoE-instruct model. It routes each query to a specialized expert and runs in both CPU and GPU environments.

	## 🚀 Key Features

	- 🧠 Expert Routing: Automatically routes queries to specialized experts (Code, Math, Reasoning, Multilingual, General)
	- 🔧 Environment Adaptive: Works seamlessly on both CPU and GPU environments
	- 🛡️ Robust Dependency Management: Conditional installation of dependencies based on environment
	- 📦 Fault Tolerance: Handles missing dependencies with fallback mechanisms
	- ⚡ Performance Optimized: Environment-specific optimizations for best performance

	## 🔧 Recent Fixes

	- ✅ Missing Dependencies: Added `einops` to `requirements.txt`; `flash_attn` is now installed conditionally, based on the environment
	- ✅ Deprecated Parameters: Replaced every use of the deprecated `torch_dtype` argument with `dtype`
	- ✅ CPU Compatibility: Automatic CPU-safe model revision selection
	- ✅ Error Handling: Comprehensive fallback mechanisms
	- ✅ Security: Updated to Gradio 4.44.0+ for security fixes
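
The `flash_attn` and `dtype` fixes above can be illustrated with a minimal sketch. It is not the code in `app.py`: the helper names `detect_environment` and `build_load_kwargs` are hypothetical, and the specific dtype choices (`bfloat16` on GPU, `float32` on CPU) are assumptions. It also assumes a `transformers` release recent enough to accept `dtype` in place of `torch_dtype`.

```python
def detect_environment() -> tuple[bool, bool]:
    """Return (use_gpu, flash_attn_installed) without failing on missing packages."""
    try:
        import torch
        use_gpu = torch.cuda.is_available()
    except ImportError:
        use_gpu = False
    try:
        import flash_attn  # noqa: F401  (optional dependency)
        flash_attn_installed = True
    except ImportError:
        flash_attn_installed = False
    return use_gpu, flash_attn_installed


def build_load_kwargs(use_gpu: bool, flash_attn_installed: bool) -> dict:
    """Build keyword arguments for a from_pretrained call (hypothetical helper)."""
    # Flash attention needs both a GPU and the flash_attn package;
    # otherwise fall back to the plain "eager" attention implementation.
    use_flash = use_gpu and flash_attn_installed
    return {
        # `dtype` replaces the deprecated `torch_dtype` argument.
        "dtype": "bfloat16" if use_gpu else "float32",  # assumed choices
        "attn_implementation": "flash_attention_2" if use_flash else "eager",
    }
```

The result could then be passed along the lines of `AutoModelForCausalLM.from_pretrained("microsoft/Phi-3.5-MoE-instruct", **build_load_kwargs(*detect_environment()))`.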

	## 🏗️ Architecture

	```
	app.py            # Main application entry point
	preinstall.py     # Pre-installation script for dependencies
	model_patch.py    # Patch for handling missing dependencies
	start.sh          # Startup script
	requirements.txt  # Core dependencies
	```

	## 🎯 How It Works

	1. Environment Detection: Automatically detects CPU vs GPU environment
	2. Dependency Management: Installs required dependencies based on environment
	3. Model Configuration: Uses optimal settings for each environment
	4. Expert Routing: Classifies each query and routes it to the appropriate expert
	5. Graceful Fallbacks: Works even when dependencies are missing
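
Step 4 can be sketched as a simple keyword classifier. This is only an illustration of the idea, not the routing logic the app actually uses: the keyword lists and the `classify_query` function are hypothetical, and only the expert names come from the feature list above.

```python
import re

# Hypothetical keyword lists; only the expert names come from this README.
EXPERT_KEYWORDS = {
    "Code": ("python", "function", "bug", "compile", "code"),
    "Math": ("solve", "equation", "integral", "derivative", "calculate"),
    "Reasoning": ("why", "explain", "logic", "because", "analyze"),
    "Multilingual": ("translate", "translation", "spanish", "french"),
}


def classify_query(query: str) -> str:
    """Return the expert whose keywords match the query most often."""
    words = set(re.findall(r"[a-z]+", query.lower()))
    scores = {
        expert: sum(keyword in words for keyword in keywords)
        for expert, keywords in EXPERT_KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    # With no keyword hits at all, fall back to the General expert.
    return best if scores[best] > 0 else "General"


print(classify_query("Fix the bug in this Python function"))  # Code
print(classify_query("Translate hello into French"))          # Multilingual
print(classify_query("Hello there"))                          # General
```

A keyword match keeps routing cheap and predictable; a learned classifier would be a drop-in replacement behind the same function signature.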

	## 📊 Performance

	| Environment | Startup | Memory   | Tokens/sec |
	|-------------|---------|----------|------------|
	| CPU         | 3-5 min | 8-12 GB  | 2-5        |
	| GPU         | 2-3 min | 16-20 GB | 15-30      |
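
To turn the throughput column into an expected wait per response, divide the response length by the tokens/sec range. The sketch below assumes a steady generation rate and ignores prompt processing and startup time; `generation_time_range` is a hypothetical helper, and the throughput values are copied from the table.

```python
# Tokens/sec ranges copied from the table above: (low, high).
THROUGHPUT_TOKENS_PER_SEC = {"CPU": (2, 5), "GPU": (15, 30)}


def generation_time_range(environment: str, num_tokens: int) -> tuple[float, float]:
    """Return (best_case_seconds, worst_case_seconds) to generate num_tokens."""
    low, high = THROUGHPUT_TOKENS_PER_SEC[environment]
    return num_tokens / high, num_tokens / low


print(generation_time_range("CPU", 300))  # (60.0, 150.0)
print(generation_time_range("GPU", 300))  # (10.0, 20.0)
```

For a 300-token reply, that is roughly 1 to 2.5 minutes on CPU versus 10 to 20 seconds on GPU.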

	## 🔍 Troubleshooting

	If you encounter issues:

	1. Check the Space logs for errors during dependency installation
	2. Verify that the pre-installation script (`preinstall.py`) ran successfully
	3. Ensure all packages listed in `requirements.txt` are installed
	4. If model loading fails, try the fallback mode
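
A quick way to check which packages are installed is to ask Python which ones it can locate. The sketch below is an assumption-laden example: `find_missing` is a hypothetical helper, and the package list is a guess based on the names mentioned in this README, so compare it against the real `requirements.txt`.

```python
import importlib.util

# Assumed package list; check it against requirements.txt.
ASSUMED_REQUIRED = ("torch", "transformers", "gradio", "einops")
ASSUMED_OPTIONAL = ("flash_attn",)


def find_missing(packages) -> list[str]:
    """Return the top-level package names that cannot be located for import."""
    return [name for name in packages if importlib.util.find_spec(name) is None]


print("Missing required packages:", find_missing(ASSUMED_REQUIRED))
print("Missing optional packages:", find_missing(ASSUMED_OPTIONAL))
```

An empty list for the required packages means the installation step succeeded. A missing optional package is expected on CPU-only hardware.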

	---

	Built with ❤️ for reliable, production-ready AI applications