---
title: AI Research Assistant MVP
emoji: 🧠
colorFrom: blue
colorTo: purple
sdk: docker
app_port: 7860
pinned: false
license: apache-2.0
tags:
- ai
- chatbot
- research
- education
- transformers
models:
- meta-llama/Llama-3.1-8B-Instruct
- intfloat/e5-base-v2
- Qwen/Qwen2.5-1.5B-Instruct
datasets:
- wikipedia
- commoncrawl
base_path: research-assistant
hf_oauth: true
hf_token: true
disable_embedding: false
duplicated_from: null
extra_gated_prompt: null
extra_gated_fields: {}
gated: false
public: true
---
# AI Research Assistant - MVP
<div align="center">
![HF Spaces](https://img.shields.io/badge/🤗-Hugging%20Face%20Spaces-blue)
![Python](https://img.shields.io/badge/Python-3.9%2B-green)
![Gradio](https://img.shields.io/badge/Interface-Gradio-FF6B6B)
![NVIDIA T4](https://img.shields.io/badge/GPU-NVIDIA%20T4-blue)
**Academic-grade AI assistant with transparent reasoning and mobile-optimized interface**
[![Demo](https://img.shields.io/badge/🚀-Live%20Demo-9cf)](https://huggingface.co/spaces/your-username/research-assistant)
[![Documentation](https://img.shields.io/badge/📚-Documentation-blue)](https://github.com/your-org/research-assistant/wiki)
</div>
## 🎯 Overview
This MVP demonstrates an intelligent research assistant framework featuring **transparent reasoning chains**, **specialized agent architecture**, and **mobile-first design**. Built for Hugging Face Spaces with NVIDIA T4 GPU acceleration for local model inference.
### Key Differentiators
- **🔍 Transparent Reasoning**: Watch the AI think step-by-step with Chain of Thought
- **🧠 Specialized Agents**: Multiple AI models working together for optimal performance
- **📱 Mobile-First**: Optimized for a seamless mobile web experience
- **🎓 Academic Focus**: Designed for research and educational use cases
## 📚 API Documentation
**Comprehensive API documentation is available:** [API_DOCUMENTATION.md](API_DOCUMENTATION.md)
The API provides REST endpoints for:
- Chat interactions with AI assistant
- Health checks
- Context management
- Session tracking
**Quick API Example:**
```python
import requests

response = requests.post(
    "https://huggingface.co/spaces/JatinAutonomousLabs/HonestAI/api/chat",
    json={
        "message": "What is machine learning?",
        "session_id": "my-session",
        "user_id": "user-123",
    },
)
data = response.json()
print(data["message"])
print(f"Performance: {data.get('performance', {})}")
```
## 🚀 Quick Start
### Option 1: Use Our Demo
Visit our live demo on Hugging Face Spaces:
```bash
https://huggingface.co/spaces/JatinAutonomousLabs/HonestAI
```
### Option 2: Deploy Your Own Instance
#### Prerequisites
- Hugging Face account with [write token](https://huggingface.co/settings/tokens)
- Basic understanding of Hugging Face Spaces
#### Deployment Steps
1. **Fork this space** using the Hugging Face UI
2. **Add your HF token** (optional):
   - Go to your Space → Settings → Repository secrets
   - Add `HF_TOKEN` with your Hugging Face token
   - **Note**: Inference runs on local models; `HF_TOKEN` is only needed to download gated models
3. **The space will auto-build** (takes 5-10 minutes)
#### Manual Build (Advanced)
```bash
# Clone the repository
git clone https://huggingface.co/spaces/your-username/research-assistant
cd research-assistant
# Install dependencies
pip install -r requirements.txt
# Set up environment (optional - only needed for gated models)
export HF_TOKEN="your_hugging_face_token_here" # Optional: only for downloading gated models
# Launch the application (multiple options)
python main.py # Full integration with error handling
python launch.py # Simple launcher
python app.py # UI-only mode
```
## 📁 Integration Structure
The MVP now includes complete integration files for deployment:
```
├── main.py                    # 🎯 Main integration entry point
├── launch.py                  # 🚀 Simple launcher for HF Spaces
├── app.py                     # 📱 Mobile-optimized UI
├── requirements.txt           # 📦 Dependencies
└── src/
    ├── __init__.py            # 📦 Package initialization
    ├── database.py            # 🗄️ SQLite database management
    ├── event_handlers.py      # 🔗 UI event integration
    ├── config.py              # ⚙️ Configuration
    ├── llm_router.py          # 🤖 LLM routing
    ├── orchestrator_engine.py # 🎭 Request orchestration
    ├── context_manager.py     # 🧠 Context management
    ├── mobile_handlers.py     # 📱 Mobile UX handlers
    └── agents/
        ├── __init__.py        # 🤖 Agents package
        ├── intent_agent.py    # 🎯 Intent recognition
        ├── synthesis_agent.py # ✨ Response synthesis
        └── safety_agent.py    # 🛡️ Safety checking
```
### Key Features
- **🔄 Graceful Degradation**: Falls back to mock mode if components fail (see the sketch after this list)
- **📱 Mobile-First**: Optimized for mobile devices and small screens
- **🗄️ Database Ready**: SQLite integration with session management
- **🔗 Event Handling**: Complete UI-to-backend integration
- **⚡ Error Recovery**: Robust error handling throughout
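The fallback path looks roughly like this. A minimal sketch, assuming a `build_orchestrator` factory and a `MockOrchestrator` stand-in; both names are illustrative, not the actual API in `main.py`:

```python
# Illustrative graceful-degradation sketch; class and function names are
# assumptions, not the actual implementation in main.py.
import logging

logger = logging.getLogger(__name__)

class MockOrchestrator:
    """Stand-in used when the real pipeline cannot be constructed."""
    async def process(self, message: str, session_id: str) -> dict:
        return {"message": f"(mock mode) You said: {message}", "performance": {}}

def build_orchestrator():
    try:
        from src.orchestrator_engine import OrchestratorEngine  # assumed class name
        return OrchestratorEngine()
    except Exception as exc:
        logger.warning("Component init failed, falling back to mock mode: %s", exc)
        return MockOrchestrator()
```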
## 🏗️ Architecture
```
┌─────────────────┐     ┌──────────────────┐     ┌─────────────────┐
│   Mobile Web    │ ─── │   ORCHESTRATOR   │ ─── │   AGENT SWARM   │
│   Interface     │     │  (Core Engine)   │     │ (5 Specialists) │
└─────────────────┘     └──────────────────┘     └─────────────────┘
         │                       │                        │
         └───────────────────────┼────────────────────────┘
                                 │
                    ┌────────────┴────────────┐
                    │    PERSISTENCE LAYER    │
                    │  (SQLite + FAISS Lite)  │
                    └─────────────────────────┘
```
### Core Components
| Component | Purpose | Technology |
|-----------|---------|------------|
| **Orchestrator** | Main coordination engine | Python + Async |
| **Intent Recognition** | Understand user goals | Qwen2.5-1.5B-Instruct + CoT |
| **Context Manager** | Session memory & recall | FAISS + SQLite |
| **Response Synthesis** | Generate final answers | Llama-3.1-8B-Instruct |
| **Safety Checker** | Content moderation | Llama-3.1-8B-Instruct |
| **Research Agent** | Information gathering | Web search + analysis |
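To make the coordination concrete, here is a hedged sketch of how the orchestrator might fan a request out to the specialists with `asyncio`; the agent keys and result shapes are assumptions based on the agent protocol shown under Development, and the real `orchestrator_engine.py` may sequence things differently:

```python
# Hedged sketch of parallel agent coordination; not the actual engine code.
import asyncio

async def orchestrate(user_input: str, context: dict, agents: dict) -> dict:
    # Intent runs first so downstream agents know the user's goal
    intent = await agents["intent"].execute(user_input, context)
    enriched = {**context, "intent": intent["result"]}

    # Research and safety checks are independent, so run them concurrently
    research, safety = await asyncio.gather(
        agents["research"].execute(user_input, enriched),
        agents["safety"].execute(user_input, enriched),
    )
    if not safety["result"].get("safe", True):  # assumed result shape
        return {"result": "Request blocked by the safety checker.",
                "confidence": safety["confidence"], "metadata": {}}

    # Synthesis combines everything into the final answer
    return await agents["synthesis"].execute(
        user_input, {**enriched, "research": research["result"]}
    )
```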
## 💡 Usage Examples
### Basic Research Query
```
User: "Explain quantum entanglement in simple terms"
Assistant:
1. 🤔 [Reasoning] Breaking down quantum physics concepts...
2. 🔍 [Research] Gathering latest explanations...
3. ✍️ [Synthesis] Creating simplified explanation...
[Final Response]: Quantum entanglement is when two particles become linked...
```
### Technical Analysis
```
User: "Compare transformer models for text classification"
Assistant:
1. 🏷️ [Intent] Identifying technical comparison request
2. 📊 [Analysis] Evaluating BERT vs RoBERTa vs DistilBERT
3. 📈 [Synthesis] Creating comparison table with metrics...
```
## ⚙️ Configuration
### Environment Variables
```bash
# Optional: only needed to download gated models
HF_TOKEN="your_hugging_face_token"

# Optional tuning
MAX_WORKERS=4
CACHE_TTL=3600
DEFAULT_MODEL="meta-llama/Llama-3.1-8B-Instruct"
EMBEDDING_MODEL="intfloat/e5-base-v2"
CLASSIFICATION_MODEL="Qwen/Qwen2.5-1.5B-Instruct"
HF_HOME="/tmp/huggingface"  # Cache directory (auto-configured)
LOG_LEVEL="INFO"
```
**Cache Directory Management:**
- Automatically configured with a secure fallback chain (a minimal sketch follows this list)
- Supports HF_HOME, TRANSFORMERS_CACHE, or user cache
- Validates write permissions automatically
- See `.env.example` for all available options
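A minimal sketch of that fallback chain, assuming only the behavior described above; the actual logic in `src/config.py` may differ:

```python
# Illustrative cache-directory fallback chain; not the repo's actual code.
import os
import tempfile
from pathlib import Path

def resolve_cache_dir() -> Path:
    """Return the first writable candidate: HF_HOME, TRANSFORMERS_CACHE, user cache, tmp."""
    candidates = [
        os.environ.get("HF_HOME"),
        os.environ.get("TRANSFORMERS_CACHE"),
        str(Path.home() / ".cache" / "huggingface"),
        os.path.join(tempfile.gettempdir(), "huggingface"),
    ]
    for candidate in filter(None, candidates):
        path = Path(candidate)
        try:
            path.mkdir(parents=True, exist_ok=True)
            probe = path / ".write_test"   # validate write permissions
            probe.touch()
            probe.unlink()
            return path
        except OSError:
            continue  # not writable; try the next candidate
    raise RuntimeError("No writable cache directory found")
```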
### Model Configuration
The system uses multiple specialized models optimized for T4 16GB GPU:
| Task | Model | Purpose | Quantization |
|------|-------|---------|--------------|
| Primary Reasoning | `meta-llama/Llama-3.1-8B-Instruct` | General responses | 4-bit NF4 |
| Embeddings | `intfloat/e5-base-v2` | Semantic search | None (768-dim) |
| Intent Classification | `Qwen/Qwen2.5-1.5B-Instruct` | User goal detection | 4-bit NF4 |
| Safety Checking | `meta-llama/Llama-3.1-8B-Instruct` | Content moderation | 4-bit NF4 |
**Performance Optimizations:**
- ✅ 4-bit quantization (NF4) for memory efficiency (loading sketch below)
- ✅ Model preloading for faster responses
- ✅ Connection pooling for API calls
- ✅ Parallel agent processing
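For reference, a sketch of what 4-bit NF4 loading looks like with `transformers` and `bitsandbytes`; the repo's actual loader in `llm_router.py` may differ:

```python
# Sketch of 4-bit NF4 model loading for a T4 GPU; the actual loader may differ.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",             # NF4 quantization
    bnb_4bit_compute_dtype=torch.float16,  # T4 lacks bfloat16 support
)

model_id = "Qwen/Qwen2.5-1.5B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)
```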
## 📱 Mobile Optimization
### Key Mobile Features
- **Touch-friendly** interface (44px+ touch targets)
- **Progressive Web App** capabilities
- **Offline functionality** for cached sessions
- **Reduced data usage** with optimized responses
- **Keyboard-aware** layout adjustments
### Supported Devices
- ✅ Smartphones (iOS/Android)
- ✅ Tablets
- ✅ Desktop browsers
- ✅ Screen readers (accessibility)
## 🛠️ Development
### Project Structure
```
research-assistant/
├── app.py               # Main Gradio application
├── requirements.txt     # Dependencies
├── Dockerfile           # Container configuration
├── src/
│   ├── orchestrator.py  # Core orchestration engine
│   ├── agents/          # Specialized agent modules
│   ├── llm_router.py    # Multi-model routing
│   └── mobile_ux.py     # Mobile optimizations
├── tests/               # Test suites
└── docs/                # Documentation
```
### Adding New Agents
1. Create an agent module in `src/agents/`
2. Implement the agent protocol:
```python
class YourNewAgent:
    async def execute(self, user_input: str, context: dict) -> dict:
        processed_output = ...  # Your agent logic here
        return {
            "result": processed_output,
            "confidence": 0.95,
            "metadata": {}
        }
```
3. Register agent in orchestrator configuration
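A hypothetical registration call, assuming the orchestrator exposes a simple registry (the actual configuration hook may differ):

```python
# Hypothetical registration hook; the real orchestrator API may differ.
from src.agents.your_new_agent import YourNewAgent

orchestrator.register_agent("your_new_agent", YourNewAgent())
```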
## 🧪 Testing
### Run Test Suite
```bash
# Install test dependencies
pip install -r requirements.txt
# Run all tests
pytest tests/ -v
# Run specific test categories
pytest tests/test_agents.py -v
pytest tests/test_mobile_ux.py -v
```
### Test Coverage
- ✅ Agent functionality
- ✅ Mobile UX components
- ✅ LLM routing logic
- ✅ Error handling
- ✅ Performance benchmarks
## 🚨 Troubleshooting
### Common Build Issues
| Issue | Solution |
|-------|----------|
| **HF_TOKEN not found** | Optional - only needed for gated model access |
| **Local models unavailable** | Check transformers/torch installation |
| **Build timeout** | Reduce model sizes in requirements |
| **Memory errors** | Check GPU memory usage, optimize model loading |
| **Import errors** | Check Python version (3.9+) |
### Performance Optimization
1. **Enable caching** in the context manager (a minimal TTL-cache sketch follows this list)
2. **Use smaller models** for initial deployment
3. **Implement lazy loading** for mobile users
4. **Monitor memory usage** with built-in tools
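As a starting point for item 1, here is a minimal TTL cache keyed on session or query. This is an illustrative sketch, not the repo's actual caching layer; the `CACHE_TTL` variable above suggests a 3600-second default:

```python
# Illustrative TTL cache; names are assumptions, not the actual context manager.
import time

class TTLCache:
    def __init__(self, ttl_seconds: int = 3600):
        self.ttl = ttl_seconds
        self._store: dict[str, tuple[float, object]] = {}

    def get(self, key: str):
        entry = self._store.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        if time.time() - stored_at > self.ttl:
            del self._store[key]  # entry expired; evict it
            return None
        return value

    def set(self, key: str, value: object) -> None:
        self._store[key] = (time.time(), value)
```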
### Debug Mode
Enable detailed logging:
```python
import logging
logging.basicConfig(level=logging.DEBUG)
```
## 📊 Performance Metrics
The API now includes comprehensive performance metrics in every response:
```json
{
  "performance": {
    "processing_time": 1230.5,
    "tokens_used": 456,
    "agents_used": 4,
    "confidence_score": 85.2,
    "agent_contributions": [
      {"agent": "Intent", "percentage": 25.0},
      {"agent": "Synthesis", "percentage": 40.0},
      {"agent": "Safety", "percentage": 15.0},
      {"agent": "Skills", "percentage": 20.0}
    ],
    "safety_score": 85.0,
    "latency_seconds": 1.230,
    "timestamp": "2024-01-15T10:30:45.123456"
  }
}
```
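Units: `processing_time` is in milliseconds, `latency_seconds` in seconds, and `confidence_score`/`safety_score` are percentages. Reading the block from the `data` object in the Quick API Example above (field names follow the sample payload):

```python
# Assumes `data` from the Quick API Example earlier in this README.
perf = data.get("performance", {})
print(f"Latency: {perf.get('latency_seconds', 0.0):.2f}s | "
      f"Confidence: {perf.get('confidence_score', 0.0):.1f}%")
for contrib in perf.get("agent_contributions", []):
    print(f"  {contrib['agent']}: {contrib['percentage']:.0f}%")
```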
| Metric | Target | Current |
|--------|---------|---------|
| Response Time | <10s | ~7s |
| Cache Hit Rate | >60% | ~65% |
| Mobile UX Score | >80/100 | 85/100 |
| Error Rate | <5% | ~3% |
| Performance Tracking | ✅ | ✅ Implemented |
## 🔮 Roadmap
### Phase 1 (Current - MVP)
- ✅ Basic agent orchestration
- ✅ Mobile-optimized interface
- ✅ Multi-model routing
- ✅ Transparent reasoning display
- ✅ Performance metrics tracking
- ✅ Enhanced configuration management
- ✅ 4-bit quantization for T4 GPU
- ✅ Model preloading and optimization
### Phase 2 (Next 3 months)
- 🚧 Advanced research capabilities
- 🚧 Plugin system for tools
- 🚧 Enhanced mobile PWA features
- 🚧 Multi-language support
### Phase 3 (Future)
- 🔮 Autonomous agent swarms
- 🔮 Voice interface integration
- 🔮 Enterprise features
- 🔮 Advanced analytics
## 👥 Contributing
We welcome contributions! Please see:
1. [Contributing Guidelines](docs/CONTRIBUTING.md)
2. [Code of Conduct](docs/CODE_OF_CONDUCT.md)
3. [Development Setup](docs/DEVELOPMENT.md)
### Quick Contribution Steps
```bash
# 1. Fork the repository
# 2. Create feature branch
git checkout -b feature/amazing-feature
# 3. Commit changes
git commit -m "Add amazing feature"
# 4. Push to branch
git push origin feature/amazing-feature
# 5. Open Pull Request
```
## 📄 Citation
If you use this framework in your research, please cite:
```bibtex
@software{research_assistant_mvp,
  title  = {AI Research Assistant - MVP},
  author = {Your Name},
  year   = {2024},
  url    = {https://huggingface.co/spaces/your-username/research-assistant}
}
```
## 📜 License
This project is licensed under the Apache 2.0 License - see the [LICENSE](LICENSE) file for details.
## 🙏 Acknowledgments
- [Hugging Face](https://huggingface.co) for the infrastructure
- [Gradio](https://gradio.app) for the web framework
- Model contributors from the HF community
- Early testers and feedback providers
---
<div align="center">
**Need help?**
- [Open an Issue](https://github.com/your-org/research-assistant/issues)
- [Join our Discord](https://discord.gg/your-discord)
- [Email Support](mailto:[email protected])
*Built with ❤️ for the research community*
</div>