---
title: AI Research Assistant MVP
emoji: 🧠
colorFrom: blue
colorTo: purple
sdk: docker
app_port: 7860
pinned: false
license: apache-2.0
tags:
- ai
- chatbot
- research
- education
- transformers
models:
- meta-llama/Llama-3.1-8B-Instruct
- intfloat/e5-base-v2
- Qwen/Qwen2.5-1.5B-Instruct
datasets:
- wikipedia
- commoncrawl
base_path: research-assistant
hf_oauth: true
hf_token: true
disable_embedding: false
duplicated_from: null
extra_gated_prompt: null
extra_gated_fields: {}
gated: false
public: true
---
# AI Research Assistant - MVP
<div align="center">
![HF Spaces](https://img.shields.io/badge/🤗-Hugging%20Face%20Spaces-blue)
![Python](https://img.shields.io/badge/Python-3.9%2B-green)
![Gradio](https://img.shields.io/badge/Interface-Gradio-FF6B6B)
![NVIDIA T4](https://img.shields.io/badge/GPU-NVIDIA%20T4-blue)
**Academic-grade AI assistant with transparent reasoning and mobile-optimized interface**
[![Demo](https://img.shields.io/badge/🚀-Live%20Demo-9cf)](https://huggingface.co/spaces/your-username/research-assistant)
[![Documentation](https://img.shields.io/badge/📚-Documentation-blue)](https://github.com/your-org/research-assistant/wiki)
</div>
## 🎯 Overview
This MVP demonstrates an intelligent research assistant framework featuring **transparent reasoning chains**, **specialized agent architecture**, and **mobile-first design**. Built for Hugging Face Spaces with NVIDIA T4 GPU acceleration for local model inference.
### Key Differentiators
- **🔍 Transparent Reasoning**: Watch the AI think step-by-step with Chain of Thought
- **🧠 Specialized Agents**: Multiple AI models working together for optimal performance
- **📱 Mobile-First**: Optimized for a seamless mobile web experience
- **🎓 Academic Focus**: Designed for research and educational use cases
## 📚 API Documentation
**Comprehensive API documentation is available:** [API_DOCUMENTATION.md](API_DOCUMENTATION.md)
The API provides REST endpoints for:
- Chat interactions with AI assistant
- Health checks
- Context management
- Session tracking
**Quick API Example:**
```python
import requests

response = requests.post(
    "https://huggingface.co/spaces/JatinAutonomousLabs/HonestAI/api/chat",
    json={
        "message": "What is machine learning?",
        "session_id": "my-session",
        "user_id": "user-123",
    },
)
data = response.json()
print(data["message"])
print(f"Performance: {data.get('performance', {})}")
```
## 🚀 Quick Start
### Option 1: Use Our Demo
Visit our live demo on Hugging Face Spaces:
```bash
https://huggingface.co/spaces/JatinAutonomousLabs/HonestAI
```
### Option 2: Deploy Your Own Instance
#### Prerequisites
- Hugging Face account with [write token](https://huggingface.co/settings/tokens)
- Basic understanding of Hugging Face Spaces
#### Deployment Steps
1. **Fork this space** using the Hugging Face UI
2. **Add your HF token** (optional):
   - Go to your Space → Settings → Repository secrets
   - Add `HF_TOKEN` with your Hugging Face token
   - **Note**: Inference runs on local models; `HF_TOKEN` is only needed to download gated models
3. **The space will auto-build** (takes 5-10 minutes)
#### Manual Build (Advanced)
```bash
# Clone the repository
git clone https://huggingface.co/spaces/your-username/research-assistant
cd research-assistant
# Install dependencies
pip install -r requirements.txt
# Set up environment (optional - only needed for gated models)
export HF_TOKEN="your_hugging_face_token_here" # Optional: only for downloading gated models
# Launch the application (multiple options)
python main.py # Full integration with error handling
python launch.py # Simple launcher
python app.py # UI-only mode
```
## 📁 Integration Structure
The MVP now includes complete integration files for deployment:
```
├── main.py                    # 🎯 Main integration entry point
├── launch.py                  # 🚀 Simple launcher for HF Spaces
├── app.py                     # 📱 Mobile-optimized UI
├── requirements.txt           # 📦 Dependencies
└── src/
    ├── __init__.py            # 📦 Package initialization
    ├── database.py            # 🗄️ SQLite database management
    ├── event_handlers.py      # 🔗 UI event integration
    ├── config.py              # ⚙️ Configuration
    ├── llm_router.py          # 🤖 LLM routing
    ├── orchestrator_engine.py # 🎭 Request orchestration
    ├── context_manager.py     # 🧠 Context management
    ├── mobile_handlers.py     # 📱 Mobile UX handlers
    └── agents/
        ├── __init__.py        # 🤖 Agents package
        ├── intent_agent.py    # 🎯 Intent recognition
        ├── synthesis_agent.py # ✨ Response synthesis
        └── safety_agent.py    # 🛡️ Safety checking
```
### Key Features
- **🔄 Graceful Degradation**: Falls back to mock mode if components fail (see the sketch after this list)
- **📱 Mobile-First**: Optimized for mobile devices and small screens
- **🗄️ Database Ready**: SQLite integration with session management
- **🔗 Event Handling**: Complete UI-to-backend integration
- **⚡ Error Recovery**: Robust error handling throughout
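The fallback path looks roughly like this. A minimal sketch, assuming a `build_orchestrator` factory and a `MockOrchestrator` stand-in; both names are illustrative, not the actual API in `main.py`:

```python
# Illustrative graceful-degradation sketch; class and function names are
# assumptions, not the actual implementation in main.py.
import logging

logger = logging.getLogger(__name__)

class MockOrchestrator:
    """Stand-in used when the real pipeline cannot be constructed."""
    async def process(self, message: str, session_id: str) -> dict:
        return {"message": f"(mock mode) You said: {message}", "performance": {}}

def build_orchestrator():
    try:
        from src.orchestrator_engine import OrchestratorEngine  # assumed class name
        return OrchestratorEngine()
    except Exception as exc:
        logger.warning("Component init failed, falling back to mock mode: %s", exc)
        return MockOrchestrator()
```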
## 🏗️ Architecture
```
┌─────────────────┐     ┌──────────────────┐     ┌─────────────────┐
│   Mobile Web    │ ─── │   ORCHESTRATOR   │ ─── │   AGENT SWARM   │
│   Interface     │     │  (Core Engine)   │     │ (5 Specialists) │
└─────────────────┘     └──────────────────┘     └─────────────────┘
         │                       │                        │
         └───────────────────────┼────────────────────────┘
                                 │
                    ┌────────────┴────────────┐
                    │    PERSISTENCE LAYER    │
                    │  (SQLite + FAISS Lite)  │
                    └─────────────────────────┘
```
### Core Components
| Component | Purpose | Technology |
|-----------|---------|------------|
| **Orchestrator** | Main coordination engine | Python + Async |
| **Intent Recognition** | Understand user goals | Qwen2.5-1.5B-Instruct + CoT |
| **Context Manager** | Session memory & recall | FAISS + SQLite |
| **Response Synthesis** | Generate final answers | Llama-3.1-8B-Instruct |
| **Safety Checker** | Content moderation | Llama-3.1-8B-Instruct |
| **Research Agent** | Information gathering | Web search + analysis |
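To make the coordination concrete, here is a hedged sketch of how the orchestrator might fan a request out to the specialists with `asyncio`; the agent keys and result shapes are assumptions based on the agent protocol shown under Development, and the real `orchestrator_engine.py` may sequence things differently:

```python
# Hedged sketch of parallel agent coordination; not the actual engine code.
import asyncio

async def orchestrate(user_input: str, context: dict, agents: dict) -> dict:
    # Intent runs first so downstream agents know the user's goal
    intent = await agents["intent"].execute(user_input, context)
    enriched = {**context, "intent": intent["result"]}

    # Research and safety checks are independent, so run them concurrently
    research, safety = await asyncio.gather(
        agents["research"].execute(user_input, enriched),
        agents["safety"].execute(user_input, enriched),
    )
    if not safety["result"].get("safe", True):  # assumed result shape
        return {"result": "Request blocked by the safety checker.",
                "confidence": safety["confidence"], "metadata": {}}

    # Synthesis combines everything into the final answer
    return await agents["synthesis"].execute(
        user_input, {**enriched, "research": research["result"]}
    )
```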
## 💡 Usage Examples
### Basic Research Query
```
User: "Explain quantum entanglement in simple terms"
Assistant:
1. 🤔 [Reasoning] Breaking down quantum physics concepts...
2. 🔍 [Research] Gathering latest explanations...
3. ✍️ [Synthesis] Creating simplified explanation...
[Final Response]: Quantum entanglement is when two particles become linked...
```
### Technical Analysis
```
User: "Compare transformer models for text classification"
Assistant:
1. 🏷️ [Intent] Identifying technical comparison request
2. 📊 [Analysis] Evaluating BERT vs RoBERTa vs DistilBERT
3. 📈 [Synthesis] Creating comparison table with metrics...
```
## ⚙️ Configuration
### Environment Variables
```bash
# Optional: only needed to download gated models
HF_TOKEN="your_hugging_face_token"

# Optional tuning
MAX_WORKERS=4
CACHE_TTL=3600
DEFAULT_MODEL="meta-llama/Llama-3.1-8B-Instruct"
EMBEDDING_MODEL="intfloat/e5-base-v2"
CLASSIFICATION_MODEL="Qwen/Qwen2.5-1.5B-Instruct"
HF_HOME="/tmp/huggingface"  # Cache directory (auto-configured)
LOG_LEVEL="INFO"
```
**Cache Directory Management:**
- Automatically configured with a secure fallback chain (a minimal sketch follows this list)
- Supports HF_HOME, TRANSFORMERS_CACHE, or user cache
- Validates write permissions automatically
- See `.env.example` for all available options
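A minimal sketch of that fallback chain, assuming only the behavior described above; the actual logic in `src/config.py` may differ:

```python
# Illustrative cache-directory fallback chain; not the repo's actual code.
import os
import tempfile
from pathlib import Path

def resolve_cache_dir() -> Path:
    """Return the first writable candidate: HF_HOME, TRANSFORMERS_CACHE, user cache, tmp."""
    candidates = [
        os.environ.get("HF_HOME"),
        os.environ.get("TRANSFORMERS_CACHE"),
        str(Path.home() / ".cache" / "huggingface"),
        os.path.join(tempfile.gettempdir(), "huggingface"),
    ]
    for candidate in filter(None, candidates):
        path = Path(candidate)
        try:
            path.mkdir(parents=True, exist_ok=True)
            probe = path / ".write_test"   # validate write permissions
            probe.touch()
            probe.unlink()
            return path
        except OSError:
            continue  # not writable; try the next candidate
    raise RuntimeError("No writable cache directory found")
```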
### Model Configuration
The system uses multiple specialized models optimized for T4 16GB GPU:
| Task | Model | Purpose | Quantization |
|------|-------|---------|--------------|
| Primary Reasoning | `meta-llama/Llama-3.1-8B-Instruct` | General responses | 4-bit NF4 |
| Embeddings | `intfloat/e5-base-v2` | Semantic search | None (768-dim) |
| Intent Classification | `Qwen/Qwen2.5-1.5B-Instruct` | User goal detection | 4-bit NF4 |
| Safety Checking | `meta-llama/Llama-3.1-8B-Instruct` | Content moderation | 4-bit NF4 |
**Performance Optimizations:**
- ✅ 4-bit quantization (NF4) for memory efficiency (loading sketch below)
- ✅ Model preloading for faster responses
- ✅ Connection pooling for API calls
- ✅ Parallel agent processing
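For reference, a sketch of what 4-bit NF4 loading looks like with `transformers` and `bitsandbytes`; the repo's actual loader in `llm_router.py` may differ:

```python
# Sketch of 4-bit NF4 model loading for a T4 GPU; the actual loader may differ.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",             # NF4 quantization
    bnb_4bit_compute_dtype=torch.float16,  # T4 lacks bfloat16 support
)

model_id = "Qwen/Qwen2.5-1.5B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)
```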
## 📱 Mobile Optimization
### Key Mobile Features
- **Touch-friendly** interface (44px+ touch targets)
- **Progressive Web App** capabilities
- **Offline functionality** for cached sessions
- **Reduced data usage** with optimized responses
- **Keyboard-aware** layout adjustments
### Supported Devices
- ✅ Smartphones (iOS/Android)
- ✅ Tablets
- ✅ Desktop browsers
- ✅ Screen readers (accessibility)
## 🛠️ Development
### Project Structure
```
research-assistant/
├── app.py               # Main Gradio application
├── requirements.txt     # Dependencies
├── Dockerfile           # Container configuration
├── src/
│   ├── orchestrator.py  # Core orchestration engine
│   ├── agents/          # Specialized agent modules
│   ├── llm_router.py    # Multi-model routing
│   └── mobile_ux.py     # Mobile optimizations
├── tests/               # Test suites
└── docs/                # Documentation
```
### Adding New Agents
1. Create an agent module in `src/agents/`
2. Implement the agent protocol:
```python
class YourNewAgent:
    async def execute(self, user_input: str, context: dict) -> dict:
        processed_output = ...  # Your agent logic here
        return {
            "result": processed_output,
            "confidence": 0.95,
            "metadata": {}
        }
```
3. Register agent in orchestrator configuration
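A hypothetical registration call, assuming the orchestrator exposes a simple registry (the actual configuration hook may differ):

```python
# Hypothetical registration hook; the real orchestrator API may differ.
from src.agents.your_new_agent import YourNewAgent

orchestrator.register_agent("your_new_agent", YourNewAgent())
```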
## 🧪 Testing
### Run Test Suite
```bash
# Install test dependencies
pip install -r requirements.txt
# Run all tests
pytest tests/ -v
# Run specific test categories
pytest tests/test_agents.py -v
pytest tests/test_mobile_ux.py -v
```
### Test Coverage
- ✅ Agent functionality
- ✅ Mobile UX components
- ✅ LLM routing logic
- ✅ Error handling
- ✅ Performance benchmarks
## 🚨 Troubleshooting
### Common Build Issues
| Issue | Solution |
|-------|----------|
| **HF_TOKEN not found** | Optional - only needed for gated model access |
| **Local models unavailable** | Check transformers/torch installation |
| **Build timeout** | Reduce model sizes in requirements |
| **Memory errors** | Check GPU memory usage, optimize model loading |
| **Import errors** | Check Python version (3.9+) |
### Performance Optimization
1. **Enable caching** in the context manager (a minimal TTL-cache sketch follows this list)
2. **Use smaller models** for initial deployment
3. **Implement lazy loading** for mobile users
4. **Monitor memory usage** with built-in tools
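As a starting point for item 1, here is a minimal TTL cache keyed on session or query. This is an illustrative sketch, not the repo's actual caching layer; the `CACHE_TTL` variable above suggests a 3600-second default:

```python
# Illustrative TTL cache; names are assumptions, not the actual context manager.
import time

class TTLCache:
    def __init__(self, ttl_seconds: int = 3600):
        self.ttl = ttl_seconds
        self._store: dict[str, tuple[float, object]] = {}

    def get(self, key: str):
        entry = self._store.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        if time.time() - stored_at > self.ttl:
            del self._store[key]  # entry expired; evict it
            return None
        return value

    def set(self, key: str, value: object) -> None:
        self._store[key] = (time.time(), value)
```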
### Debug Mode
Enable detailed logging:
```python
import logging
logging.basicConfig(level=logging.DEBUG)
```
## 📊 Performance Metrics
The API now includes comprehensive performance metrics in every response:
```json
{
  "performance": {
    "processing_time": 1230.5,
    "tokens_used": 456,
    "agents_used": 4,
    "confidence_score": 85.2,
    "agent_contributions": [
      {"agent": "Intent", "percentage": 25.0},
      {"agent": "Synthesis", "percentage": 40.0},
      {"agent": "Safety", "percentage": 15.0},
      {"agent": "Skills", "percentage": 20.0}
    ],
    "safety_score": 85.0,
    "latency_seconds": 1.230,
    "timestamp": "2024-01-15T10:30:45.123456"
  }
}
```
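Units: `processing_time` is in milliseconds, `latency_seconds` in seconds, and `confidence_score`/`safety_score` are percentages. Reading the block from the `data` object in the Quick API Example above (field names follow the sample payload):

```python
# Assumes `data` from the Quick API Example earlier in this README.
perf = data.get("performance", {})
print(f"Latency: {perf.get('latency_seconds', 0.0):.2f}s | "
      f"Confidence: {perf.get('confidence_score', 0.0):.1f}%")
for contrib in perf.get("agent_contributions", []):
    print(f"  {contrib['agent']}: {contrib['percentage']:.0f}%")
```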
| Metric | Target | Current |
|--------|---------|---------|
| Response Time | <10s | ~7s |
| Cache Hit Rate | >60% | ~65% |
| Mobile UX Score | >80/100 | 85/100 |
| Error Rate | <5% | ~3% |
| Performance Tracking | ✅ | ✅ Implemented |
## 🔮 Roadmap
### Phase 1 (Current - MVP)
- ✅ Basic agent orchestration
- ✅ Mobile-optimized interface
- ✅ Multi-model routing
- ✅ Transparent reasoning display
- ✅ Performance metrics tracking
- ✅ Enhanced configuration management
- ✅ 4-bit quantization for T4 GPU
- ✅ Model preloading and optimization
### Phase 2 (Next 3 months)
- 🚧 Advanced research capabilities
- 🚧 Plugin system for tools
- 🚧 Enhanced mobile PWA features
- 🚧 Multi-language support
### Phase 3 (Future)
- 🔮 Autonomous agent swarms
- 🔮 Voice interface integration
- 🔮 Enterprise features
- 🔮 Advanced analytics
## 👥 Contributing
We welcome contributions! Please see:
1. [Contributing Guidelines](docs/CONTRIBUTING.md)
2. [Code of Conduct](docs/CODE_OF_CONDUCT.md)
3. [Development Setup](docs/DEVELOPMENT.md)
### Quick Contribution Steps
```bash
# 1. Fork the repository
# 2. Create feature branch
git checkout -b feature/amazing-feature
# 3. Commit changes
git commit -m "Add amazing feature"
# 4. Push to branch
git push origin feature/amazing-feature
# 5. Open Pull Request
```
## 📄 Citation
If you use this framework in your research, please cite:
```bibtex
@software{research_assistant_mvp,
  title  = {AI Research Assistant - MVP},
  author = {Your Name},
  year   = {2024},
  url    = {https://huggingface.co/spaces/your-username/research-assistant}
}
```
## 📜 License
This project is licensed under the Apache 2.0 License - see the [LICENSE](LICENSE) file for details.
## 🙏 Acknowledgments
- [Hugging Face](https://huggingface.co) for the infrastructure
- [Gradio](https://gradio.app) for the web framework
- Model contributors from the HF community
- Early testers and feedback providers
---
<div align="center">
**Need help?**
- [Open an Issue](https://github.com/your-org/research-assistant/issues)
- [Join our Discord](https://discord.gg/your-discord)
- [Email Support](mailto:[email protected])
*Built with ❤️ for the research community*
</div>