Commit ee2c527 · Parent(s): 5068f9a

docs: add NEXT_TASK.md for LlamaIndex integration

Priority infrastructure task for async AI agent:

- Wire LlamaIndexRAGService into service loader
- Tiered upgrade: free (local) → premium (OpenAI + persistence)
- Addresses issues #54 and #64

See file for full implementation spec.

NEXT_TASK.md · ADDED · +147 -0

# NEXT_TASK: Wire LlamaIndex RAG Service into Simple Mode

**Priority:** P1 - Infrastructure
**GitHub Issues:** Addresses #64 (persistence) and #54 (wire in LlamaIndex)
**Difficulty:** Medium
**Estimated Changes:** 3-4 files

## Problem

We have two embedding services that are NOT connected:

1. `src/services/embeddings.py` - Used everywhere (free, in-memory, no persistence)
2. `src/services/llamaindex_rag.py` - Never used (better embeddings, persistence, RAG)

The LlamaIndex service provides significant value but is orphaned code.

## Solution: Tiered Service Selection

Use the existing `service_loader.py` pattern to select the right service:

```python
# When NO OpenAI key: Use free local embeddings (current behavior)
# When OpenAI key present: Upgrade to LlamaIndex (persistence + better quality)
```

## Implementation Steps

### Step 1: Add service selection in `src/utils/service_loader.py`

```python
# `settings` is assumed to be imported already at the top of service_loader.py.
def get_embedding_service() -> "EmbeddingService | LlamaIndexRAGService":
    """Get the best available embedding service.

    Returns LlamaIndexRAGService if an OpenAI key is available (better quality + persistence).
    Falls back to EmbeddingService (free, in-memory) otherwise.
    """
    if settings.openai_api_key:
        try:
            from src.services.llamaindex_rag import get_rag_service
            return get_rag_service()
        except ImportError:
            pass  # LlamaIndex deps not installed; fall back

    from src.services.embeddings import EmbeddingService
    return EmbeddingService()
```
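
One caveat worth flagging: as written, the fallback constructs a new `EmbeddingService` on every call, which may re-load a local embedding model each time. If callers should share one instance, a cached loader is a cheap fix. A minimal sketch (the `@cache` decorator is a suggestion, not existing project code):

```python
from functools import cache


@cache
def get_embedding_service() -> "EmbeddingService | LlamaIndexRAGService":
    """As in Step 1, but selection (and any model loading) runs once per process."""
    ...  # body identical to Step 1
```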

### Step 2: Create a unified interface (Protocol)

Both services need compatible methods. Create `src/services/embedding_protocol.py`:

```python
from typing import Protocol, Any

from src.utils.models import Evidence


class EmbeddingServiceProtocol(Protocol):
    """Common interface for embedding services."""

    async def add_evidence(self, evidence_id: str, content: str, metadata: dict[str, Any]) -> None:
        """Store evidence with embeddings."""
        ...

    async def search_similar(self, query: str, n_results: int = 5) -> list[dict[str, Any]]:
        """Search for similar content."""
        ...

    async def deduplicate(self, evidence: list[Evidence]) -> list[Evidence]:
        """Remove duplicate evidence."""
        ...
```
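
Because `Protocol` uses structural typing, neither service has to inherit from this class; anything with matching method signatures satisfies it. For illustration, a hypothetical consumer (`index_batch` is not existing project code) can depend on the Protocol instead of either concrete service:

```python
from src.services.embedding_protocol import EmbeddingServiceProtocol


async def index_batch(service: EmbeddingServiceProtocol, items: list[tuple[str, str]]) -> None:
    """Index (evidence_id, content) pairs via whichever tier the loader selected."""
    for evidence_id, content in items:
        await service.add_evidence(evidence_id, content, metadata={})
```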

### Step 3: Make LlamaIndexRAGService async-compatible

Current `llamaindex_rag.py` methods are sync. Wrap them:

```python
# Inside LlamaIndexRAGService; needs `import asyncio` at the top of the module.
async def add_evidence(self, evidence_id: str, content: str, metadata: dict[str, Any]) -> None:
    """Async wrapper for ingest."""
    loop = asyncio.get_running_loop()
    # Assumes metadata keys correspond to Citation's fields.
    evidence = Evidence(content=content, citation=Citation(**metadata))
    await loop.run_in_executor(None, self.ingest_evidence, [evidence])
```
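
`search_similar` needs the same treatment. A sketch under the assumption that the existing sync retrieval method is named `query` and takes `top_k` (check `llamaindex_rag.py` for the real name and signature):

```python
async def search_similar(self, query: str, n_results: int = 5) -> list[dict[str, Any]]:
    """Async wrapper around the sync retrieval path."""
    loop = asyncio.get_running_loop()
    # `self.query` / `top_k` are assumed names; adapt to the actual sync API.
    return await loop.run_in_executor(None, lambda: self.query(query, top_k=n_results))
```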

### Step 4: Update ResearchMemory to use the service loader

In `src/services/research_memory.py`:

```python
from src.services.embedding_protocol import EmbeddingServiceProtocol
from src.utils.service_loader import get_embedding_service


class ResearchMemory:
    def __init__(self, query: str, embedding_service: EmbeddingServiceProtocol | None = None):
        self._embedding_service = embedding_service or get_embedding_service()
```
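
The optional parameter keeps the class testable: unit tests can inject a stub and stay offline. A hypothetical example (`FakeEmbeddingService` is an illustrative stand-in, not existing project code):

```python
class FakeEmbeddingService:
    """Minimal stub satisfying EmbeddingServiceProtocol for tests."""

    async def add_evidence(self, evidence_id, content, metadata) -> None: ...
    async def search_similar(self, query, n_results=5): return []
    async def deduplicate(self, evidence): return evidence


# Tests: inject the stub; no model download, no API calls.
memory = ResearchMemory("some query", embedding_service=FakeEmbeddingService())
```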

### Step 5: Add tests

```python
# tests/unit/services/test_service_loader.py
from src.services.embeddings import EmbeddingService
from src.services.llamaindex_rag import LlamaIndexRAGService
from src.utils.service_loader import get_embedding_service


def test_uses_llamaindex_when_openai_key_present(monkeypatch):
    # If settings caches the key at import time, patch the settings object instead.
    monkeypatch.setenv("OPENAI_API_KEY", "test-key")
    service = get_embedding_service()
    assert isinstance(service, LlamaIndexRAGService)


def test_falls_back_to_local_when_no_key(monkeypatch):
    monkeypatch.delenv("OPENAI_API_KEY", raising=False)
    service = get_embedding_service()
    assert isinstance(service, EmbeddingService)
```
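
One branch these tests don't cover is the `ImportError` fallback when the optional LlamaIndex extras aren't installed. A sketch (a `None` entry in `sys.modules` forces the import to fail even when the package is present):

```python
import sys


def test_falls_back_when_llamaindex_deps_missing(monkeypatch):
    monkeypatch.setenv("OPENAI_API_KEY", "test-key")
    # A None entry in sys.modules makes `import` raise ImportError for that name.
    monkeypatch.setitem(sys.modules, "src.services.llamaindex_rag", None)
    service = get_embedding_service()
    assert isinstance(service, EmbeddingService)
```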

## Benefits After Implementation

| Feature | Free Tier | Premium Tier (OpenAI key) |
|---------|-----------|---------------------------|
| Embeddings | Local (sentence-transformers) | OpenAI (text-embedding-3-small) |
| Persistence | In-memory (lost on restart) | Disk (ChromaDB PersistentClient) |
| Quality | Good | Better |
| Cost | Free | API costs |
| Knowledge accumulation | No | Yes |

## Files to Modify

1. `src/utils/service_loader.py` - Add `get_embedding_service()`
2. `src/services/llamaindex_rag.py` - Add async wrappers, match interface
3. `src/services/research_memory.py` - Use service loader
4. `tests/unit/services/test_service_loader.py` - Add tests

## Acceptance Criteria

- [ ] `get_embedding_service()` returns LlamaIndex when OpenAI key present
- [ ] Falls back to local EmbeddingService when no key
- [ ] Both services have compatible async interfaces
- [ ] Persistence works (evidence survives restart with OpenAI key)
- [ ] All existing tests pass
- [ ] New tests for service selection

## Related Issues

- #64 - feat: Add persistence to EmbeddingService (this solves it via LlamaIndex)
- #54 - tech-debt: LlamaIndex RAG is dead code (this wires it in)

## Notes for AI Agent

- Run `make check` before committing
- The `service_loader.py` pattern already exists for Modal - follow that pattern
- LlamaIndex requires `uv sync --extra modal` for deps
- Test with and without `OPENAI_API_KEY` set