---
title: AI Research Assistant MVP
emoji: 🧠
colorFrom: blue
colorTo: purple
sdk: docker
app_port: 7860
pinned: false
license: apache-2.0
tags:
- ai
- chatbot
- research
- education
- transformers
models:
- meta-llama/Llama-3.1-8B-Instruct
- intfloat/e5-base-v2
- Qwen/Qwen2.5-1.5B-Instruct
datasets:
- wikipedia
- commoncrawl
base_path: research-assistant
hf_oauth: true
hf_token: true
disable_embedding: false
duplicated_from: null
extra_gated_prompt: null
extra_gated_fields: {}
gated: false
public: true
---

# AI Research Assistant - MVP

<div align="center">

![HF Spaces](https://img.shields.io/badge/🤗-Hugging%20Face%20Spaces-blue)
![Python](https://img.shields.io/badge/Python-3.9%2B-green)
![Gradio](https://img.shields.io/badge/Interface-Gradio-FF6B6B)
![NVIDIA T4](https://img.shields.io/badge/GPU-NVIDIA%20T4-blue)

**Academic-grade AI assistant with transparent reasoning and mobile-optimized interface**

[![Demo](https://img.shields.io/badge/🚀-Live%20Demo-9cf)](https://huggingface.co/spaces/your-username/research-assistant)
[![Documentation](https://img.shields.io/badge/📚-Documentation-blue)](https://github.com/your-org/research-assistant/wiki)

</div>

## 🎯 Overview

This MVP demonstrates an intelligent research assistant framework featuring **transparent reasoning chains**, a **specialized agent architecture**, and a **mobile-first design**. It is built for Hugging Face Spaces and uses NVIDIA T4 GPU acceleration for local model inference.

### Key Differentiators
- **🔍 Transparent Reasoning**: Watch the AI think step-by-step with Chain of Thought
- **🧠 Specialized Agents**: Multiple AI models working together for optimal performance  
- **📱 Mobile-First**: Optimized for seamless mobile web experience
- **🎓 Academic Focus**: Designed for research and educational use cases

## 📚 API Documentation

**Comprehensive API documentation is available:** [API_DOCUMENTATION.md](API_DOCUMENTATION.md)

The API provides REST endpoints for:
- Chat interactions with AI assistant
- Health checks
- Context management
- Session tracking

**Quick API Example:**
```python
import requests

response = requests.post(
    "https://huggingface.co/spaces/JatinAutonomousLabs/HonestAI/api/chat",
    json={
        "message": "What is machine learning?",
        "session_id": "my-session",
        "user_id": "user-123"
    },
    timeout=60,  # local inference can take several seconds
)
response.raise_for_status()  # surface HTTP errors before parsing the body
data = response.json()
print(data["message"])
print(f"Performance: {data.get('performance', {})}")
```

## 🚀 Quick Start

### Option 1: Use Our Demo
Visit our live demo on Hugging Face Spaces: [https://huggingface.co/spaces/JatinAutonomousLabs/HonestAI](https://huggingface.co/spaces/JatinAutonomousLabs/HonestAI)

### Option 2: Deploy Your Own Instance

#### Prerequisites
- Hugging Face account (an [access token](https://huggingface.co/settings/tokens) is only needed to download gated models)
- Basic understanding of Hugging Face Spaces

#### Deployment Steps

1. **Fork this space** using the Hugging Face UI
2. **Add your HF token** (optional, only needed for gated models):
   - Go to your Space → Settings → Repository secrets
   - Add `HF_TOKEN` with your Hugging Face token
   - **Note**: Inference runs on local models; `HF_TOKEN` is used only to download them
3. **The space will auto-build** (takes 5-10 minutes)

#### Manual Build (Advanced)

```bash
# Clone the repository
git clone https://huggingface.co/spaces/your-username/research-assistant
cd research-assistant

# Install dependencies
pip install -r requirements.txt

# Set up environment (optional - only needed for gated models)
export HF_TOKEN="your_hugging_face_token_here"  # Optional: only for downloading gated models

# Launch the application (choose ONE of the following entry points)
python main.py          # Full integration with error handling
python launch.py        # Simple launcher
python app.py           # UI-only mode
```

## 📁 Integration Structure

The MVP now includes complete integration files for deployment:

```
├── main.py                    # 🎯 Main integration entry point
├── launch.py                  # 🚀 Simple launcher for HF Spaces
├── app.py                     # 📱 Mobile-optimized UI
├── requirements.txt           # 📦 Dependencies
└── src/
    ├── __init__.py           # 📦 Package initialization
    ├── database.py           # 🗄️ SQLite database management
    ├── event_handlers.py     # 🔗 UI event integration
    ├── config.py             # ⚙️ Configuration
    ├── llm_router.py         # 🤖 LLM routing
    ├── orchestrator_engine.py # 🎭 Request orchestration
    ├── context_manager.py    # 🧠 Context management
    ├── mobile_handlers.py    # 📱 Mobile UX handlers
    └── agents/
        ├── __init__.py       # 🤖 Agents package
        ├── intent_agent.py   # 🎯 Intent recognition
        ├── synthesis_agent.py # ✨ Response synthesis
        └── safety_agent.py   # 🛡️ Safety checking
```

### Key Features:
- **🔄 Graceful Degradation**: Falls back to mock mode if components fail
- **📱 Mobile-First**: Optimized for mobile devices and small screens
- **🗄️ Database Ready**: SQLite integration with session management
- **🔗 Event Handling**: Complete UI-to-backend integration
- **⚡ Error Recovery**: Robust error handling throughout
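
The graceful-degradation behaviour listed above can be sketched roughly as follows. This is a minimal illustration rather than the project's actual code: `load_component`, `MockOrchestrator`, and `broken_factory` are hypothetical names introduced for the example.

```python
import logging
from typing import Callable, TypeVar

logger = logging.getLogger(__name__)
T = TypeVar("T")


class MockOrchestrator:
    """Hypothetical stand-in used when the real orchestrator cannot start."""

    def process(self, message: str) -> dict:
        return {"message": f"[mock mode] Received: {message}", "mock": True}


def load_component(factory: Callable[[], T], fallback: T, name: str) -> T:
    """Build a component, returning `fallback` if construction fails."""
    try:
        return factory()
    except Exception as exc:  # broad on purpose: any startup failure degrades
        logger.warning("%s unavailable (%s); using fallback", name, exc)
        return fallback


def broken_factory() -> MockOrchestrator:
    # Simulates a component whose dependencies are missing.
    raise ImportError("torch is not installed")


orchestrator = load_component(broken_factory, MockOrchestrator(), "orchestrator")
print(orchestrator.process("hello"))
```

The UI keeps working against the mock object, and the warning in the log records which component degraded and why.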

## 🏗️ Architecture

```
┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│   Mobile Web    │ ── │   ORCHESTRATOR   │ ── │   AGENT SWARM   │
│   Interface     │    │   (Core Engine)  │    │   (5 Specialists)│
└─────────────────┘    └──────────────────┘    └─────────────────┘
         │                        │                        │
         └─────────────────────────┼────────────────────────┘
                                   │
                    ┌─────────────────────────────┐
                    │   PERSISTENCE LAYER         │
                    │   (SQLite + FAISS Lite)     │
                    └─────────────────────────────┘
```

### Core Components

| Component | Purpose | Technology |
|-----------|---------|------------|
| **Orchestrator** | Main coordination engine | Python + Async |
| **Intent Recognition** | Understand user goals | Qwen2.5-1.5B-Instruct + CoT |
| **Context Manager** | Session memory & recall | FAISS + SQLite |
| **Response Synthesis** | Generate final answers | Llama-3.1-8B-Instruct |
| **Safety Checker** | Content moderation | Llama-3.1-8B-Instruct |
| **Research Agent** | Information gathering | Web search + analysis |
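
To make the flow between the orchestrator and the agents concrete, here is a minimal sketch under stated assumptions. The three agent classes use toy heuristics instead of model calls, and `handle_request` is a hypothetical name; the real orchestrator lives in `src/` and may be structured differently.

```python
import asyncio


class IntentAgent:
    async def execute(self, user_input: str, context: dict) -> dict:
        # Toy heuristic standing in for a model call.
        intent = "comparison" if "compare" in user_input.lower() else "explanation"
        return {"result": intent, "confidence": 0.8, "metadata": {}}


class SynthesisAgent:
    async def execute(self, user_input: str, context: dict) -> dict:
        draft = f"({context['intent']}) Draft answer to: {user_input}"
        return {"result": draft, "confidence": 0.7, "metadata": {}}


class SafetyAgent:
    async def execute(self, user_input: str, context: dict) -> dict:
        # Placeholder check; a real agent would call a moderation model.
        safe = "forbidden" not in context["draft"].lower()
        return {"result": safe, "confidence": 0.9, "metadata": {}}


async def handle_request(user_input: str) -> dict:
    context: dict = {}
    steps = []  # reasoning trace that the UI could display

    intent = await IntentAgent().execute(user_input, context)
    context["intent"] = intent["result"]
    steps.append(f"[Intent] {intent['result']}")

    draft = await SynthesisAgent().execute(user_input, context)
    context["draft"] = draft["result"]
    steps.append("[Synthesis] draft created")

    safety = await SafetyAgent().execute(user_input, context)
    steps.append(f"[Safety] passed={safety['result']}")

    message = draft["result"] if safety["result"] else "Response withheld."
    return {"message": message, "reasoning": steps}


result = asyncio.run(handle_request("Compare BERT and RoBERTa"))
print(result["reasoning"])
```

Collecting the `steps` list alongside the final message is one simple way to expose the transparent reasoning chain to the interface.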

## 💡 Usage Examples

### Basic Research Query
```
User: "Explain quantum entanglement in simple terms"

Assistant: 
1. 🤔 [Reasoning] Breaking down quantum physics concepts...
2. 🔍 [Research] Gathering latest explanations...
3. ✍️ [Synthesis] Creating simplified explanation...

[Final Response]: Quantum entanglement is when two particles become linked...
```

### Technical Analysis
```
User: "Compare transformer models for text classification"

Assistant:
1. 🏷️ [Intent] Identifying technical comparison request
2. 📊 [Analysis] Evaluating BERT vs RoBERTa vs DistilBERT
3. 📈 [Synthesis] Creating comparison table with metrics...
```

## ⚙️ Configuration

### Environment Variables

```python
# Needed only to download gated models (the default Llama 3.1 model is gated on the Hub)
HF_TOKEN="your_hugging_face_token"

# Optional
MAX_WORKERS=4
CACHE_TTL=3600
DEFAULT_MODEL="meta-llama/Llama-3.1-8B-Instruct"
EMBEDDING_MODEL="intfloat/e5-base-v2"
CLASSIFICATION_MODEL="Qwen/Qwen2.5-1.5B-Instruct"
HF_HOME="/tmp/huggingface"  # Cache directory (auto-configured)
LOG_LEVEL="INFO"
```
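
As an illustration of how these variables might be read at startup, the sketch below loads them into a typed settings object with the defaults shown above. `Settings` and `load_settings` are hypothetical names; the project's own `src/config.py` may work differently.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    max_workers: int
    cache_ttl: int
    default_model: str
    embedding_model: str
    classification_model: str
    log_level: str
    hf_token: Optional[str]


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read configuration from `env`, falling back to the documented defaults."""
    return Settings(
        max_workers=int(env.get("MAX_WORKERS", "4")),
        cache_ttl=int(env.get("CACHE_TTL", "3600")),
        default_model=env.get("DEFAULT_MODEL", "meta-llama/Llama-3.1-8B-Instruct"),
        embedding_model=env.get("EMBEDDING_MODEL", "intfloat/e5-base-v2"),
        classification_model=env.get("CLASSIFICATION_MODEL", "Qwen/Qwen2.5-1.5B-Instruct"),
        log_level=env.get("LOG_LEVEL", "INFO"),
        hf_token=env.get("HF_TOKEN") or None,  # treat an empty string as unset
    )


settings = load_settings({"MAX_WORKERS": "8"})
print(settings.max_workers, settings.default_model)
```

Passing the mapping explicitly keeps the function easy to test without touching the real process environment.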

**Cache Directory Management:**
- Automatically configured with secure fallback chain
- Supports HF_HOME, TRANSFORMERS_CACHE, or user cache
- Validates write permissions automatically
- See `.env.example` for all available options
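
A fallback chain of this kind could look like the sketch below. The order (HF_HOME, then TRANSFORMERS_CACHE, then the user cache, then a temporary directory) and the names `is_writable` and `resolve_cache_dir` are assumptions made for illustration, not the project's confirmed implementation.

```python
import os
import tempfile
from pathlib import Path
from typing import Mapping


def is_writable(path: Path) -> bool:
    """Create the directory if needed and confirm a file can be written."""
    try:
        path.mkdir(parents=True, exist_ok=True)
        with tempfile.TemporaryFile(dir=path):
            pass
        return True
    except OSError:
        return False


def resolve_cache_dir(env: Mapping[str, str] = os.environ) -> Path:
    """Return the first candidate directory that exists and is writable."""
    candidates = [
        env.get("HF_HOME"),
        env.get("TRANSFORMERS_CACHE"),
        os.path.expanduser("~/.cache/huggingface"),
    ]
    for candidate in candidates:
        if candidate and is_writable(Path(candidate)):
            return Path(candidate)
    # Last resort: a fresh temporary directory.
    return Path(tempfile.mkdtemp(prefix="hf-cache-"))


with tempfile.TemporaryDirectory() as tmp:
    print(resolve_cache_dir({"HF_HOME": tmp}))
```

Checking writability by actually creating a file catches read-only mounts that a plain existence check would miss.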

### Model Configuration

The system uses multiple specialized models, sized to fit together on a single NVIDIA T4 GPU (16 GB):

| Task | Model | Purpose | Quantization |
|------|-------|---------|--------------|
| Primary Reasoning | `meta-llama/Llama-3.1-8B-Instruct` | General responses | 4-bit NF4 |
| Embeddings | `intfloat/e5-base-v2` | Semantic search | None (768-dim) |
| Intent Classification | `Qwen/Qwen2.5-1.5B-Instruct` | User goal detection | 4-bit NF4 |
| Safety Checking | `meta-llama/Llama-3.1-8B-Instruct` | Content moderation | 4-bit NF4 |
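
The sketch below shows one common way to load a model in 4-bit NF4 with the `transformers` `BitsAndBytesConfig` API, plus a rough weight-memory estimate. It is an illustration, not the project's loader: `estimate_weight_gb` and `load_nf4_model` are hypothetical helper names, the parameter counts are the nominal sizes from the model names, and `load_nf4_model` assumes `torch`, `transformers`, `bitsandbytes`, and `accelerate` are installed with a CUDA GPU available.

```python
def estimate_weight_gb(params_billion: float, bits_per_weight: int) -> float:
    """Approximate weight memory in GB (ignores activations and KV cache)."""
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9


def load_nf4_model(model_id: str):
    """Load a causal LM quantized to 4-bit NF4 (needs the GPU stack installed)."""
    # Imported lazily so the estimate above works without torch/transformers.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

    quant_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        # The T4 has no native bfloat16 support, so compute in float16.
        bnb_4bit_compute_dtype=torch.float16,
    )
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        quantization_config=quant_config,
        device_map="auto",  # requires the accelerate package
    )
    return tokenizer, model


for name, size_b in [("Llama-3.1-8B", 8.0), ("Qwen2.5-1.5B", 1.5)]:
    print(f"{name}: ~{estimate_weight_gb(size_b, 4):.2f} GB of weights at 4 bits")
```

By this estimate the quantized weights of both language models take well under half of the T4's 16 GB, which leaves room for the embedding model, activations, and the KV cache.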

**Performance Optimizations:**
- ✅ 4-bit quantization (NF4) for memory efficiency
- ✅ Model preloading for faster responses
- ✅ Connection pooling for API calls
- ✅ Parallel agent processing
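
Parallel agent processing can be sketched with `asyncio.gather`, which runs independent coroutines concurrently and returns their results in call order. The agent names and the `run_agent` helper are hypothetical, and `asyncio.sleep` stands in for real work; on a single GPU, actual inference calls may not overlap as cleanly as this I/O-style example suggests.

```python
import asyncio
import time


async def run_agent(name: str, delay: float) -> dict:
    # asyncio.sleep stands in for model inference or I/O.
    await asyncio.sleep(delay)
    return {"agent": name, "result": f"{name} done"}


async def run_in_parallel() -> list:
    # Independent agents are awaited together rather than one after another.
    return await asyncio.gather(
        run_agent("intent", 0.05),
        run_agent("safety", 0.05),
        run_agent("research", 0.05),
    )


start = time.perf_counter()
results = asyncio.run(run_in_parallel())
elapsed = time.perf_counter() - start
print([r["agent"] for r in results], f"{elapsed:.2f}s")
```

Because the three waits overlap, the total elapsed time is close to the longest single delay rather than the sum of all three.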

## 📱 Mobile Optimization

### Key Mobile Features
- **Touch-friendly** interface (44px+ touch targets)
- **Progressive Web App** capabilities
- **Offline functionality** for cached sessions
- **Reduced data usage** with optimized responses
- **Keyboard-aware** layout adjustments

### Supported Devices
- ✅ Smartphones (iOS/Android)
- ✅ Tablets
- ✅ Desktop browsers
- ✅ Screen readers (accessibility)

## 🛠️ Development

### Project Structure
```
research-assistant/
├── app.py                 # Main Gradio application
├── requirements.txt       # Dependencies
├── Dockerfile            # Container configuration
├── src/
│   ├── orchestrator.py   # Core orchestration engine
│   ├── agents/          # Specialized agent modules
│   ├── llm_router.py    # Multi-model routing
│   └── mobile_ux.py     # Mobile optimizations
├── tests/               # Test suites
└── docs/               # Documentation
```

### Adding New Agents

1. Create agent module in `src/agents/`
2. Implement agent protocol:
```python
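class YourNewAgent:
    async def execute(self, user_input: str, context: dict) -> dict:
        # Your agent logic here (placeholder: normalize the input)
        processed_output = user_input.strip()
        return {
            "result": processed_output,
            "confidence": 0.95,  # placeholder value; compute a real score
            "metadata": {}
        }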
```

3. Register agent in orchestrator configuration
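
The registration step is not spelled out in this README, so the following is only a sketch of what it could look like. `Agent`, `AGENT_REGISTRY`, `register_agent`, and `EchoAgent` are hypothetical names; check the orchestrator source for the actual mechanism.

```python
import asyncio
from typing import Dict, Protocol


class Agent(Protocol):
    """Structural type: anything with a matching async `execute` qualifies."""

    async def execute(self, user_input: str, context: dict) -> dict: ...


class EchoAgent:
    async def execute(self, user_input: str, context: dict) -> dict:
        return {"result": user_input.upper(), "confidence": 1.0, "metadata": {}}


AGENT_REGISTRY: Dict[str, Agent] = {}


def register_agent(name: str, agent: Agent) -> None:
    """Add an agent under a unique name so the orchestrator can look it up."""
    if name in AGENT_REGISTRY:
        raise ValueError(f"agent {name!r} is already registered")
    AGENT_REGISTRY[name] = agent


register_agent("echo", EchoAgent())
output = asyncio.run(AGENT_REGISTRY["echo"].execute("hello", {}))
print(output["result"])
```

Using a `Protocol` means new agents do not need to inherit from a base class; they only need an `execute` method with the same signature.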

## 🧪 Testing

### Run Test Suite
```bash
# Install test dependencies
pip install -r requirements.txt

# Run all tests
pytest tests/ -v

# Run specific test categories
pytest tests/test_agents.py -v
pytest tests/test_mobile_ux.py -v
```

### Test Coverage
- ✅ Agent functionality
- ✅ Mobile UX components  
- ✅ LLM routing logic
- ✅ Error handling
- ✅ Performance benchmarks
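
As an example of what an agent test might check, the sketch below validates the result shape used by the agent protocol (`result`, `confidence`, `metadata`). `DummyAgent` and `check_agent_contract` are hypothetical and are not part of the existing test suite.

```python
import asyncio


class DummyAgent:
    async def execute(self, user_input: str, context: dict) -> dict:
        return {"result": user_input[::-1], "confidence": 0.5, "metadata": {}}


def check_agent_contract(result: dict) -> None:
    """Assert that an agent result has the expected keys and value ranges."""
    assert set(result) >= {"result", "confidence", "metadata"}
    assert 0.0 <= result["confidence"] <= 1.0
    assert isinstance(result["metadata"], dict)


def test_dummy_agent_follows_contract():
    # asyncio.run keeps the test free of extra pytest plugins.
    result = asyncio.run(DummyAgent().execute("abc", {}))
    check_agent_contract(result)
    assert result["result"] == "cba"
```

Saved under `tests/`, a function named `test_*` like this one is collected automatically by `pytest`.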

## 🚨 Troubleshooting

### Common Build Issues

| Issue | Solution |
|-------|----------|
| **HF_TOKEN not found** | Optional - only needed for gated model access |
| **Local models unavailable** | Check transformers/torch installation |
| **Build timeout** | Configure smaller models (e.g. via `DEFAULT_MODEL`) and trim unused packages from `requirements.txt` |
| **Memory errors** | Check GPU memory usage, optimize model loading |
| **Import errors** | Check Python version (3.9+) |

### Performance Optimization

1. **Enable caching** in context manager
2. **Use smaller models** for initial deployment
3. **Implement lazy loading** for mobile users
4. **Monitor memory usage** with built-in tools
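
For the caching step, a time-to-live cache is one simple option. The sketch below is an assumption about how such a cache might work with a `CACHE_TTL`-style setting; `TTLCache` is a hypothetical class, not the context manager's actual implementation.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class TTLCache:
    """In-memory cache whose entries expire after `ttl_seconds`."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so tests can control time
        self._store: Dict[str, Tuple[float, Any]] = {}

    def set(self, key: str, value: Any) -> None:
        self._store[key] = (self._clock() + self._ttl, value)

    def get(self, key: str) -> Optional[Any]:
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._store[key]  # drop stale entries on read
            return None
        return value


cache = TTLCache(ttl_seconds=3600)
cache.set("session:my-session", {"topic": "machine learning"})
print(cache.get("session:my-session"))
```

Entries are evicted lazily on read, which keeps the code short; a long-running service would also want a size limit or periodic cleanup.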

### Debug Mode

Enable detailed logging:
```python
import logging
logging.basicConfig(level=logging.DEBUG)
```

## 📊 Performance Metrics

The API now includes performance metrics in every response. `processing_time` is reported in milliseconds and `confidence_score` as a percentage:

```json
{
  "performance": {
    "processing_time": 1230.5,
    "tokens_used": 456,
    "agents_used": 4,
    "confidence_score": 85.2,
    "agent_contributions": [
      {"agent": "Intent", "percentage": 25.0},
      {"agent": "Synthesis", "percentage": 40.0},
      {"agent": "Safety", "percentage": 15.0},
      {"agent": "Skills", "percentage": 20.0}
    ],
    "safety_score": 85.0,
    "latency_seconds": 1.230,
    "timestamp": "2024-01-15T10:30:45.123456"
  }
}
```
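
To show how the timing fields relate to each other, here is a sketch of how part of such a block could be assembled. `build_performance` is a hypothetical helper, and it assumes each agent's percentage is its share of total agent time; the project may compute contributions differently.

```python
from datetime import datetime


def build_performance(started: float, finished: float, tokens_used: int,
                      agent_seconds: dict) -> dict:
    """Assemble timing metrics from start/end times and per-agent durations."""
    latency = finished - started
    total = sum(agent_seconds.values()) or 1.0  # avoid division by zero
    return {
        "processing_time": round(latency * 1000, 1),  # milliseconds
        "latency_seconds": round(latency, 3),
        "tokens_used": tokens_used,
        "agents_used": len(agent_seconds),
        "agent_contributions": [
            {"agent": name, "percentage": round(100 * secs / total, 1)}
            for name, secs in agent_seconds.items()
        ],
        "timestamp": datetime.now().isoformat(),
    }


perf = build_performance(
    started=0.0,
    finished=1.2305,
    tokens_used=456,
    agent_seconds={"Intent": 0.25, "Synthesis": 0.40, "Safety": 0.15, "Skills": 0.20},
)
print(perf["processing_time"], perf["agent_contributions"])
```

In a live request the `started` and `finished` values would typically come from `time.perf_counter()` taken around the orchestrator call.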

| Metric | Target | Current |
|--------|---------|---------|
| Response Time | <10s | ~7s |
| Cache Hit Rate | >60% | ~65% |
| Mobile UX Score | >80/100 | 85/100 |
| Error Rate | <5% | ~3% |
| Performance Tracking | ✅ | ✅ Implemented |

## 🔮 Roadmap

### Phase 1 (Current - MVP)
- ✅ Basic agent orchestration
- ✅ Mobile-optimized interface  
- ✅ Multi-model routing
- ✅ Transparent reasoning display
- ✅ Performance metrics tracking
- ✅ Enhanced configuration management
- ✅ 4-bit quantization for T4 GPU
- ✅ Model preloading and optimization

### Phase 2 (Next 3 months)
- 🚧 Advanced research capabilities
- 🚧 Plugin system for tools
- 🚧 Enhanced mobile PWA features
- 🚧 Multi-language support

### Phase 3 (Future)
- 🔮 Autonomous agent swarms
- 🔮 Voice interface integration
- 🔮 Enterprise features
- 🔮 Advanced analytics

## 👥 Contributing

We welcome contributions! Please see:

1. [Contributing Guidelines](docs/CONTRIBUTING.md)
2. [Code of Conduct](docs/CODE_OF_CONDUCT.md)
3. [Development Setup](docs/DEVELOPMENT.md)

### Quick Contribution Steps
```bash
# 1. Fork the repository
# 2. Create feature branch
git checkout -b feature/amazing-feature

# 3. Commit changes
git commit -m "Add amazing feature"

# 4. Push to branch  
git push origin feature/amazing-feature

# 5. Open Pull Request
```

## 📄 Citation

If you use this framework in your research, please cite:

```bibtex
@software{research_assistant_mvp,
  title = {AI Research Assistant - MVP},
  author = {Your Name},
  year = {2024},
  url = {https://huggingface.co/spaces/your-username/research-assistant}
}
```

## 📜 License

This project is licensed under the Apache 2.0 License - see the [LICENSE](LICENSE) file for details.

## 🙏 Acknowledgments

- [Hugging Face](https://huggingface.co) for the infrastructure
- [Gradio](https://gradio.app) for the web framework
- Model contributors from the HF community
- Early testers and feedback providers

---

<div align="center">

**Need help?** 
- [Open an Issue](https://github.com/your-org/research-assistant/issues)
- [Join our Discord](https://discord.gg/your-discord)
- [Email Support](mailto:[email protected])

*Built with ❤️ for the research community*

</div>