VibecoderMcSwaggins committed on
Commit ee2c527 · 1 Parent(s): 5068f9a

docs: add NEXT_TASK.md for LlamaIndex integration


Priority infrastructure task for async AI agent:
- Wire LlamaIndexRAGService into service loader
- Tiered upgrade: free (local) → premium (OpenAI + persistence)
- Addresses issues #54 and #64

See file for full implementation spec.

Files changed (1)
  1. NEXT_TASK.md +147 -0
NEXT_TASK.md ADDED
@@ -0,0 +1,147 @@
# NEXT_TASK: Wire LlamaIndex RAG Service into Simple Mode

**Priority:** P1 - Infrastructure
**GitHub Issues:** Addresses #64 (persistence) and #54 (wire in LlamaIndex)
**Difficulty:** Medium
**Estimated Changes:** 3-4 files

## Problem

We have two embedding services that are NOT connected:

1. `src/services/embeddings.py` - Used everywhere (free, in-memory, no persistence)
2. `src/services/llamaindex_rag.py` - Never used (better embeddings, persistence, RAG)

The LlamaIndex service provides significant value but is orphaned code.

## Solution: Tiered Service Selection

Use the existing `service_loader.py` pattern to select the right service:

```python
# When NO OpenAI key: use free local embeddings (current behavior)
# When OpenAI key present: upgrade to LlamaIndex (persistence + better quality)
```

## Implementation Steps

### Step 1: Add service selection in `src/utils/service_loader.py`

```python
def get_embedding_service() -> "EmbeddingService | LlamaIndexRAGService":
    """Get the best available embedding service.

    Returns LlamaIndexRAGService if an OpenAI key is available (better quality + persistence).
    Falls back to EmbeddingService (free, in-memory) otherwise.
    """
    # `settings` is assumed to be the app settings object already imported
    # at the top of service_loader.py.
    if settings.openai_api_key:
        try:
            from src.services.llamaindex_rag import get_rag_service
            return get_rag_service()
        except ImportError:
            pass  # LlamaIndex deps not installed, fall back

    from src.services.embeddings import EmbeddingService
    return EmbeddingService()
```

### Step 2: Create a unified interface (Protocol)

Both services need compatible methods. Create `src/services/embedding_protocol.py`:

```python
from typing import Any, Protocol

from src.utils.models import Evidence


class EmbeddingServiceProtocol(Protocol):
    """Common interface for embedding services."""

    async def add_evidence(self, evidence_id: str, content: str, metadata: dict[str, Any]) -> None:
        """Store evidence with embeddings."""
        ...

    async def search_similar(self, query: str, n_results: int = 5) -> list[dict[str, Any]]:
        """Search for similar content."""
        ...

    async def deduplicate(self, evidence: list[Evidence]) -> list[Evidence]:
        """Remove duplicate evidence."""
        ...
```
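
Because `Protocol` uses structural typing, neither concrete service has to inherit from this class; anything with matching method signatures type-checks. A minimal usage sketch (the `store_batch` helper is hypothetical, for illustration only):

```python
from typing import Any

from src.services.embedding_protocol import EmbeddingServiceProtocol


async def store_batch(
    service: EmbeddingServiceProtocol,
    items: list[tuple[str, str, dict[str, Any]]],
) -> None:
    """Works with EmbeddingService or LlamaIndexRAGService interchangeably."""
    for evidence_id, content, metadata in items:
        await service.add_evidence(evidence_id, content, metadata)
```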

### Step 3: Make LlamaIndexRAGService async-compatible

Current `llamaindex_rag.py` methods are sync. Wrap them:

```python
# At module top: import asyncio

async def add_evidence(self, evidence_id: str, content: str, metadata: dict[str, Any]) -> None:
    """Async wrapper for ingest."""
    loop = asyncio.get_running_loop()
    # Assumes metadata keys line up with Citation's fields; adjust the mapping if not.
    evidence = Evidence(content=content, citation=Citation(**metadata))
    await loop.run_in_executor(None, self.ingest_evidence, [evidence])
```
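
The other protocol methods can reuse the same `run_in_executor` pattern. A sketch for `search_similar`, assuming the sync service exposes some retrieval method (`self.query` here is a placeholder, not confirmed API):

```python
async def search_similar(self, query: str, n_results: int = 5) -> list[dict[str, Any]]:
    """Async wrapper around the sync retrieval path."""
    loop = asyncio.get_running_loop()
    # `self.query` is a placeholder; call whatever sync retrieval
    # method llamaindex_rag.py actually exposes.
    return await loop.run_in_executor(None, self.query, query, n_results)
```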

### Step 4: Update ResearchMemory to use the service loader

In `src/services/research_memory.py`:

```python
from src.services.embedding_protocol import EmbeddingServiceProtocol
from src.utils.service_loader import get_embedding_service


class ResearchMemory:
    def __init__(self, query: str, embedding_service: EmbeddingServiceProtocol | None = None):
        self._embedding_service = embedding_service or get_embedding_service()
```
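
Keeping the parameter optional preserves dependency injection, so unit tests can pass a stub instead of loading a real service. A hypothetical sketch:

```python
class FakeEmbeddingService:
    """Test stub that satisfies EmbeddingServiceProtocol structurally."""

    async def add_evidence(self, evidence_id, content, metadata):
        pass  # record calls here if the test needs assertions

    async def search_similar(self, query, n_results=5):
        return []

    async def deduplicate(self, evidence):
        return evidence


memory = ResearchMemory("some query", embedding_service=FakeEmbeddingService())
```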

### Step 5: Add tests

```python
# tests/unit/services/test_service_loader.py
from src.services.embeddings import EmbeddingService
from src.services.llamaindex_rag import LlamaIndexRAGService
from src.utils.service_loader import get_embedding_service


def test_uses_llamaindex_when_openai_key_present(monkeypatch):
    monkeypatch.setenv("OPENAI_API_KEY", "test-key")
    service = get_embedding_service()
    assert isinstance(service, LlamaIndexRAGService)


def test_falls_back_to_local_when_no_key(monkeypatch):
    monkeypatch.delenv("OPENAI_API_KEY", raising=False)
    service = get_embedding_service()
    assert isinstance(service, EmbeddingService)
```
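
The persistence acceptance criterion could get a similar smoke test that rebuilds the service and checks earlier evidence is still retrievable. A sketch only: it assumes pytest-asyncio is installed, needs a real `OPENAI_API_KEY` and writable disk, and the `integration` marker is a suggested convention, not an existing one:

```python
import pytest


@pytest.mark.integration  # requires a real OPENAI_API_KEY and writable disk
@pytest.mark.asyncio
async def test_evidence_survives_restart():
    service = get_embedding_service()
    await service.add_evidence("ev-1", "aspirin inhibits COX-1", {"source": "test"})

    # Simulate a restart by building a fresh service instance.
    service = get_embedding_service()
    results = await service.search_similar("aspirin COX", n_results=1)
    assert results  # evidence persisted across instances
```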

## Benefits After Implementation

| Feature | Free Tier | Premium Tier (OpenAI key) |
|---------|-----------|---------------------------|
| Embeddings | Local (sentence-transformers) | OpenAI (text-embedding-3-small) |
| Persistence | In-memory (lost on restart) | Disk (ChromaDB PersistentClient) |
| Quality | Good | Better |
| Cost | Free | API costs |
| Knowledge accumulation | No | Yes |
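
For the "Disk" cell, the storage wiring inside `llamaindex_rag.py` would look roughly like the following (a sketch assuming current llama-index and chromadb packages; the `./data/chroma` path and collection name are placeholders):

```python
import chromadb
from llama_index.core import StorageContext, VectorStoreIndex
from llama_index.vector_stores.chroma import ChromaVectorStore

# Disk-backed client: collections survive process restarts.
client = chromadb.PersistentClient(path="./data/chroma")
collection = client.get_or_create_collection("research_evidence")

vector_store = ChromaVectorStore(chroma_collection=collection)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents([], storage_context=storage_context)
```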

## Files to Modify

1. `src/utils/service_loader.py` - Add `get_embedding_service()`
2. `src/services/llamaindex_rag.py` - Add async wrappers, match interface
3. `src/services/research_memory.py` - Use service loader
4. `tests/unit/services/test_service_loader.py` - Add tests

## Acceptance Criteria

- [ ] `get_embedding_service()` returns LlamaIndex when OpenAI key present
- [ ] Falls back to local EmbeddingService when no key
- [ ] Both services have compatible async interfaces
- [ ] Persistence works (evidence survives restart with OpenAI key)
- [ ] All existing tests pass
- [ ] New tests for service selection

## Related Issues

- #64 - feat: Add persistence to EmbeddingService (this solves it via LlamaIndex)
- #54 - tech-debt: LlamaIndex RAG is dead code (this wires it in)

## Notes for AI Agent

- Run `make check` before committing
- The `service_loader.py` tiered-loading pattern already exists for Modal; follow the same pattern here
- LlamaIndex requires `uv sync --extra modal` for deps
- Test with and without `OPENAI_API_KEY` set