diff --git a/.cursorrules b/.cursorrules deleted file mode 100644 index 8fbe6def025d95d15c47f657eafbbbf0643a5ca5..0000000000000000000000000000000000000000 --- a/.cursorrules +++ /dev/null @@ -1,240 +0,0 @@ -# DeepCritical Project - Cursor Rules - -## Project-Wide Rules - -**Architecture**: Multi-agent research system using Pydantic AI for agent orchestration, supporting iterative and deep research patterns. Uses middleware for state management, budget tracking, and workflow coordination. - -**Type Safety**: ALWAYS use complete type hints. All functions must have parameter and return type annotations. Use `mypy --strict` compliance. Use `TYPE_CHECKING` imports for circular dependencies: `from typing import TYPE_CHECKING; if TYPE_CHECKING: from src.services.embeddings import EmbeddingService` - -**Async Patterns**: ALL I/O operations must be async (`async def`, `await`). Use `asyncio.gather()` for parallel operations. CPU-bound work must use `run_in_executor()`: `loop = asyncio.get_running_loop(); result = await loop.run_in_executor(None, cpu_bound_function, args)`. Never block the event loop. - -**Error Handling**: Use custom exceptions from `src/utils/exceptions.py`: `DeepCriticalError`, `SearchError`, `RateLimitError`, `JudgeError`, `ConfigurationError`. Always chain exceptions: `raise SearchError(...) from e`. Log with structlog: `logger.error("Operation failed", error=str(e), context=value)`. - -**Logging**: Use `structlog` for ALL logging (NOT `print` or `logging`). Import: `import structlog; logger = structlog.get_logger()`. Log with structured data: `logger.info("event", key=value)`. Use appropriate levels: DEBUG, INFO, WARNING, ERROR. - -**Pydantic Models**: All data exchange uses Pydantic models from `src/utils/models.py`. Models are frozen (`model_config = {"frozen": True}`) for immutability. Use `Field()` with descriptions. Validate with `ge=`, `le=`, `min_length=`, `max_length=` constraints. - -**Code Style**: Ruff with 100-char line length. 
Ignore rules: `PLR0913` (too many arguments), `PLR0912` (too many branches), `PLR0911` (too many returns), `PLR2004` (magic values), `PLW0603` (global statement), `PLC0415` (lazy imports). - -**Docstrings**: Google-style docstrings for all public functions. Include Args, Returns, Raises sections. Use type hints in docstrings only if needed for clarity. - -**Testing**: Unit tests in `tests/unit/` (mocked, fast). Integration tests in `tests/integration/` (real APIs, marked `@pytest.mark.integration`). Use `respx` for httpx mocking, `pytest-mock` for general mocking. - -**State Management**: Use `ContextVar` in middleware for thread-safe isolation. Never use global mutable state (except singletons via `@lru_cache`). Use `WorkflowState` from `src/middleware/state_machine.py` for workflow state. - -**Citation Validation**: ALWAYS validate references before returning reports. Use `validate_references()` from `src/utils/citation_validator.py`. Remove hallucinated citations. Log warnings for removed citations. - ---- - -## src/agents/ - Agent Implementation Rules - -**Pattern**: All agents use Pydantic AI `Agent` class. Agents have structured output types (Pydantic models) or return strings. Use factory functions in `src/agent_factory/agents.py` for creation. - -**Agent Structure**: -- System prompt as module-level constant (with date injection: `datetime.now().strftime("%Y-%m-%d")`) -- Agent class with `__init__(model: Any | None = None)` -- Main method (e.g., `async def evaluate()`, `async def write_report()`) -- Factory function: `def create_agent_name(model: Any | None = None) -> AgentName` - -**Model Initialization**: Use `get_model()` from `src/agent_factory/judges.py` if no model provided. Support OpenAI/Anthropic/HF Inference via settings. - -**Error Handling**: Return fallback values (e.g., `KnowledgeGapOutput(research_complete=False, outstanding_gaps=[...])`) on failure. Log errors with context. Use retry logic (3 retries) in Pydantic AI Agent initialization. 
- -**Input Validation**: Validate query/inputs are not empty. Truncate very long inputs with warnings. Handle None values gracefully. - -**Output Types**: Use structured output types from `src/utils/models.py` (e.g., `KnowledgeGapOutput`, `AgentSelectionPlan`, `ReportDraft`). For text output (writer agents), return `str` directly. - -**Agent-Specific Rules**: -- `knowledge_gap.py`: Outputs `KnowledgeGapOutput`. Evaluates research completeness. -- `tool_selector.py`: Outputs `AgentSelectionPlan`. Selects tools (RAG/web/database). -- `writer.py`: Returns markdown string. Includes citations in numbered format. -- `long_writer.py`: Uses `ReportDraft` input/output. Handles section-by-section writing. -- `proofreader.py`: Takes `ReportDraft`, returns polished markdown. -- `thinking.py`: Returns observation string from conversation history. -- `input_parser.py`: Outputs `ParsedQuery` with research mode detection. - ---- - -## src/tools/ - Search Tool Rules - -**Protocol**: All tools implement `SearchTool` protocol from `src/tools/base.py`: `name` property and `async def search(query, max_results) -> list[Evidence]`. - -**Rate Limiting**: Use `@retry` decorator from tenacity: `@retry(stop=stop_after_attempt(3), wait=wait_exponential(...))`. Implement `_rate_limit()` method for APIs with limits. Use shared rate limiters from `src/tools/rate_limiter.py`. - -**Error Handling**: Raise `SearchError` or `RateLimitError` on failures. Handle HTTP errors (429, 500, timeout). Return empty list on non-critical errors (log warning). - -**Query Preprocessing**: Use `preprocess_query()` from `src/tools/query_utils.py` to remove noise and expand synonyms. - -**Evidence Conversion**: Convert API responses to `Evidence` objects with `Citation`. Extract metadata (title, url, date, authors). Set relevance scores (0.0-1.0). Handle missing fields gracefully. - -**Tool-Specific Rules**: -- `pubmed.py`: Use NCBI E-utilities (ESearch → EFetch). Rate limit: 0.34s between requests. 
Parse XML with `xmltodict`. Handle single vs. multiple articles. -- `clinicaltrials.py`: Use `requests` library (NOT httpx - WAF blocks httpx). Run in thread pool: `await asyncio.to_thread(requests.get, ...)`. Filter: Only interventional studies, active/completed. -- `europepmc.py`: Handle preprint markers: `[PREPRINT - Not peer-reviewed]`. Build URLs from DOI or PMID. -- `rag_tool.py`: Wraps `LlamaIndexRAGService`. Returns Evidence from RAG results. Handles ingestion. -- `search_handler.py`: Orchestrates parallel searches across multiple tools. Uses `asyncio.gather()` with `return_exceptions=True`. Aggregates results into `SearchResult`. - ---- - -## src/middleware/ - Middleware Rules - -**State Management**: Use `ContextVar` for thread-safe isolation. `WorkflowState` uses `ContextVar[WorkflowState | None]`. Initialize with `init_workflow_state(embedding_service)`. Access with `get_workflow_state()` (auto-initializes if missing). - -**WorkflowState**: Tracks `evidence: list[Evidence]`, `conversation: Conversation`, `embedding_service: Any`. Methods: `add_evidence()` (deduplicates by URL), `async search_related()` (semantic search). - -**WorkflowManager**: Manages parallel research loops. Methods: `add_loop()`, `run_loops_parallel()`, `update_loop_status()`, `sync_loop_evidence_to_state()`. Uses `asyncio.gather()` for parallel execution. Handles errors per loop (don't fail all if one fails). - -**BudgetTracker**: Tracks tokens, time, iterations per loop and globally. Methods: `create_budget()`, `add_tokens()`, `start_timer()`, `update_timer()`, `increment_iteration()`, `check_budget()`, `can_continue()`. Token estimation: `estimate_tokens(text)` (~4 chars per token), `estimate_llm_call_tokens(prompt, response)`. - -**Models**: All middleware models in `src/utils/models.py`. `IterationData`, `Conversation`, `ResearchLoop`, `BudgetStatus` are used by middleware. 
- ---- - -## src/orchestrator/ - Orchestration Rules - -**Research Flows**: Two patterns: `IterativeResearchFlow` (single loop) and `DeepResearchFlow` (plan → parallel loops → synthesis). Both support agent chains (`use_graph=False`) and graph execution (`use_graph=True`). - -**IterativeResearchFlow**: Pattern: Generate observations → Evaluate gaps → Select tools → Execute → Judge → Continue/Complete. Uses `KnowledgeGapAgent`, `ToolSelectorAgent`, `ThinkingAgent`, `WriterAgent`, `JudgeHandler`. Tracks iterations, time, budget. - -**DeepResearchFlow**: Pattern: Planner → Parallel iterative loops per section → Synthesizer. Uses `PlannerAgent`, `IterativeResearchFlow` (per section), `LongWriterAgent` or `ProofreaderAgent`. Uses `WorkflowManager` for parallel execution. - -**Graph Orchestrator**: Uses Pydantic AI Graphs (when available) or agent chains (fallback). Routes based on research mode (iterative/deep/auto). Streams `AgentEvent` objects for UI. - -**State Initialization**: Always call `init_workflow_state()` before running flows. Initialize `BudgetTracker` per loop. Use `WorkflowManager` for parallel coordination. - -**Event Streaming**: Yield `AgentEvent` objects during execution. Event types: "started", "search_complete", "judge_complete", "hypothesizing", "synthesizing", "complete", "error". Include iteration numbers and data payloads. - ---- - -## src/services/ - Service Rules - -**EmbeddingService**: Local sentence-transformers (NO API key required). All operations async-safe via `run_in_executor()`. ChromaDB for vector storage. Deduplication threshold: 0.85 (85% similarity = duplicate). - -**LlamaIndexRAGService**: Uses OpenAI embeddings (requires `OPENAI_API_KEY`). Methods: `ingest_evidence()`, `retrieve()`, `query()`. Returns documents with metadata (source, title, url, date, authors). Lazy initialization with graceful fallback. - -**StatisticalAnalyzer**: Generates Python code via LLM. Executes in Modal sandbox (secure, isolated). 
Library versions pinned in `SANDBOX_LIBRARIES` dict. Returns `AnalysisResult` with verdict (SUPPORTED/REFUTED/INCONCLUSIVE). - -**Singleton Pattern**: Use `@lru_cache(maxsize=1)` for singletons: `@lru_cache(maxsize=1); def get_service() -> Service: return Service()`. Lazy initialization to avoid requiring dependencies at import time. - ---- - -## src/utils/ - Utility Rules - -**Models**: All Pydantic models in `src/utils/models.py`. Use frozen models (`model_config = {"frozen": True}`) except where mutation needed. Use `Field()` with descriptions. Validate with constraints. - -**Config**: Settings via Pydantic Settings (`src/utils/config.py`). Load from `.env` automatically. Use `settings` singleton: `from src.utils.config import settings`. Validate API keys with properties: `has_openai_key`, `has_anthropic_key`. - -**Exceptions**: Custom exception hierarchy in `src/utils/exceptions.py`. Base: `DeepCriticalError`. Specific: `SearchError`, `RateLimitError`, `JudgeError`, `ConfigurationError`. Always chain exceptions. - -**LLM Factory**: Centralized LLM model creation in `src/utils/llm_factory.py`. Supports OpenAI, Anthropic, HF Inference. Use `get_model()` or factory functions. Check requirements before initialization. - -**Citation Validator**: Use `validate_references()` from `src/utils/citation_validator.py`. Removes hallucinated citations (URLs not in evidence). Logs warnings. Returns validated report string. - ---- - -## src/orchestrator_factory.py Rules - -**Purpose**: Factory for creating orchestrators. Supports "simple" (legacy) and "advanced" (magentic) modes. Auto-detects mode based on API key availability. - -**Pattern**: Lazy import for optional dependencies (`_get_magentic_orchestrator_class()`). Handles `ImportError` gracefully with clear error messages. - -**Mode Detection**: `_determine_mode()` checks explicit mode or auto-detects: "advanced" if `settings.has_openai_key`, else "simple". Maps "magentic" → "advanced". 
- -**Function Signature**: `create_orchestrator(search_handler, judge_handler, config, mode) -> Any`. Simple mode requires handlers. Advanced mode uses MagenticOrchestrator. - -**Error Handling**: Raise `ValueError` with clear messages if requirements not met. Log mode selection with structlog. - ---- - -## src/orchestrator_hierarchical.py Rules - -**Purpose**: Hierarchical orchestrator using middleware and sub-teams. Adapts Magentic ChatAgent to SubIterationTeam protocol. - -**Pattern**: Uses `SubIterationMiddleware` with `ResearchTeam` and `LLMSubIterationJudge`. Event-driven via callback queue. - -**State Initialization**: Initialize embedding service with graceful fallback. Use `init_magentic_state()` (deprecated, but kept for compatibility). - -**Event Streaming**: Uses `asyncio.Queue` for event coordination. Yields `AgentEvent` objects. Handles event callback pattern with `asyncio.wait()`. - -**Error Handling**: Log errors with context. Yield error events. Process remaining events after task completion. - ---- - -## src/orchestrator_magentic.py Rules - -**Purpose**: Magentic-based orchestrator using ChatAgent pattern. Each agent has internal LLM. Manager orchestrates agents. - -**Pattern**: Uses `MagenticBuilder` with participants (searcher, hypothesizer, judge, reporter). Manager uses `OpenAIChatClient`. Workflow built in `_build_workflow()`. - -**Event Processing**: `_process_event()` converts Magentic events to `AgentEvent`. Handles: `MagenticOrchestratorMessageEvent`, `MagenticAgentMessageEvent`, `MagenticFinalResultEvent`, `MagenticAgentDeltaEvent`, `WorkflowOutputEvent`. - -**Text Extraction**: `_extract_text()` defensively extracts text from messages. Priority: `.content` → `.text` → `str(message)`. Handles buggy message objects. - -**State Initialization**: Initialize embedding service with graceful fallback. Use `init_magentic_state()` (deprecated). - -**Requirements**: Must call `check_magentic_requirements()` in `__init__`. 
Requires `agent-framework-core` and OpenAI API key. - -**Event Types**: Maps agent names to event types: "search" → "search_complete", "judge" → "judge_complete", "hypothes" → "hypothesizing", "report" → "synthesizing". - ---- - -## src/agent_factory/ - Factory Rules - -**Pattern**: Factory functions for creating agents and handlers. Lazy initialization for optional dependencies. Support OpenAI/Anthropic/HF Inference. - -**Judges**: `create_judge_handler()` creates `JudgeHandler` with structured output (`JudgeAssessment`). Supports `MockJudgeHandler`, `HFInferenceJudgeHandler` as fallbacks. - -**Agents**: Factory functions in `agents.py` for all Pydantic AI agents. Pattern: `create_agent_name(model: Any | None = None) -> AgentName`. Use `get_model()` if model not provided. - -**Graph Builder**: `graph_builder.py` contains utilities for building research graphs. Supports iterative and deep research graph construction. - -**Error Handling**: Raise `ConfigurationError` if required API keys missing. Log agent creation. Handle import errors gracefully. - ---- - -## src/prompts/ - Prompt Rules - -**Pattern**: System prompts stored as module-level constants. Include date injection: `datetime.now().strftime("%Y-%m-%d")`. Format evidence with truncation (1500 chars per item). - -**Judge Prompts**: In `judge.py`. Handle empty evidence case separately. Always request structured JSON output. - -**Hypothesis Prompts**: In `hypothesis.py`. Use diverse evidence selection (MMR algorithm). Sentence-aware truncation. - -**Report Prompts**: In `report.py`. Include full citation details. Use diverse evidence selection (n=20). Emphasize citation validation rules. - ---- - -## Testing Rules - -**Structure**: Unit tests in `tests/unit/` (mocked, fast). Integration tests in `tests/integration/` (real APIs, marked `@pytest.mark.integration`). - -**Mocking**: Use `respx` for httpx mocking. Use `pytest-mock` for general mocking. Mock LLM calls in unit tests (use `MockJudgeHandler`). 
- -**Fixtures**: Common fixtures in `tests/conftest.py`: `mock_httpx_client`, `mock_llm_response`. - -**Coverage**: Aim for >80% coverage. Test error handling, edge cases, and integration paths. - ---- - -## File-Specific Agent Rules - -**knowledge_gap.py**: Outputs `KnowledgeGapOutput`. System prompt evaluates research completeness. Handles conversation history. Returns fallback on error. - -**writer.py**: Returns markdown string. System prompt includes citation format examples. Validates inputs. Truncates long findings. Retry logic for transient failures. - -**long_writer.py**: Uses `ReportDraft` input/output. Writes sections iteratively. Reformats references (deduplicates, renumbers). Reformats section headings. - -**proofreader.py**: Takes `ReportDraft`, returns polished markdown. Removes duplicates. Adds summary. Preserves references. - -**tool_selector.py**: Outputs `AgentSelectionPlan`. System prompt lists available agents (WebSearchAgent, SiteCrawlerAgent, RAGAgent). Guidelines for when to use each. - -**thinking.py**: Returns observation string. Generates observations from conversation history. Uses query and background context. - -**input_parser.py**: Outputs `ParsedQuery`. Detects research mode (iterative/deep). Extracts entities and research questions. Improves/refines query. 
- - - - - - - diff --git a/.env.example b/.env.example index 442ff75d33f92422e78850b3c9d6d49af6f1d6e3..b8061357538326dd7fad717c627cdcfa5c0b3eb9 100644 --- a/.env.example +++ b/.env.example @@ -1,83 +1,63 @@ -# HuggingFace -HF_TOKEN=your_huggingface_token_here +# ============== LLM CONFIGURATION ============== -# OpenAI (optional) -OPENAI_API_KEY=your_openai_key_here +# Provider: "openai", "anthropic", or "huggingface" +LLM_PROVIDER=openai -# Anthropic (optional) -ANTHROPIC_API_KEY=your_anthropic_key_here +# API Keys (at least one required for full LLM analysis) +OPENAI_API_KEY=sk-your-key-here +ANTHROPIC_API_KEY=sk-ant-your-key-here # Model names (optional - sensible defaults set in config.py) -# ANTHROPIC_MODEL=claude-sonnet-4-5-20250929 # OPENAI_MODEL=gpt-5.1 +# ANTHROPIC_MODEL=claude-sonnet-4-5-20250929 +# ============== HUGGINGFACE CONFIGURATION ============== -# ============================================ -# Audio Processing Configuration (TTS) -# ============================================ -# Kokoro TTS Model Configuration -TTS_MODEL=hexgrad/Kokoro-82M -TTS_VOICE=af_heart -TTS_SPEED=1.0 -TTS_GPU=T4 -TTS_TIMEOUT=60 - -# Available TTS Voices: -# American English Female: af_heart, af_bella, af_nicole, af_aoede, af_kore, af_sarah, af_nova, af_sky, af_alloy, af_jessica, af_river -# American English Male: am_michael, am_fenrir, am_puck, am_echo, am_eric, am_liam, am_onyx, am_santa, am_adam - -# Available GPU Types (Modal): -# T4 - Cheapest, good for testing (default) -# A10 - Good balance of cost/performance -# A100 - Fastest, most expensive -# L4 - NVIDIA L4 GPU -# L40S - NVIDIA L40S GPU -# Note: GPU type is set at function definition time. Changes require app restart. 
- -# ============================================ -# Audio Processing Configuration (STT) -# ============================================ -# Speech-to-Text API Configuration -STT_API_URL=nvidia/canary-1b-v2 -STT_SOURCE_LANG=English -STT_TARGET_LANG=English - -# Available STT Languages: -# English, Bulgarian, Croatian, Czech, Danish, Dutch, Estonian, Finnish, French, German, Greek, Hungarian, Italian, Latvian, Lithuanian, Maltese, Polish, Portuguese, Romanian, Slovak, Slovenian, Spanish, Swedish, Russian, Ukrainian - -# ============================================ -# Audio Feature Flags -# ============================================ -ENABLE_AUDIO_INPUT=true -ENABLE_AUDIO_OUTPUT=true - -# ============================================ -# Image OCR Configuration -# ============================================ -OCR_API_URL=prithivMLmods/Multimodal-OCR3 -ENABLE_IMAGE_INPUT=true - -# ============== EMBEDDINGS ============== - -# OpenAI Embedding Model (used if LLM_PROVIDER is openai and performing RAG/Embeddings) -OPENAI_EMBEDDING_MODEL=text-embedding-3-small - -# Local Embedding Model (used for local/offline embeddings) -LOCAL_EMBEDDING_MODEL=sentence-transformers/all-MiniLM-L6-v2 - -# ============== HUGGINGFACE (FREE TIER) ============== - -# HuggingFace Token - enables Llama 3.1 (best quality free model) +# HuggingFace Token - enables gated models and higher rate limits # Get yours at: https://huggingface.co/settings/tokens -# -# WITHOUT HF_TOKEN: Falls back to ungated models (zephyr-7b-beta) -# WITH HF_TOKEN: Uses Llama 3.1 8B Instruct (requires accepting license) +# +# WITHOUT HF_TOKEN: Falls back to ungated models (zephyr-7b-beta, Qwen2-7B) +# WITH HF_TOKEN: Uses gated models (Llama 3.1, Gemma-2) via inference providers # # For HuggingFace Spaces deployment: # Set this as a "Secret" in Space Settings -> Variables and secrets # Users/judges don't need their own token - the Space secret is used # HF_TOKEN=hf_your-token-here +# Alternative: HUGGINGFACE_API_KEY (same 
as HF_TOKEN) + +# Default HuggingFace model for inference (gated, requires auth) +# Can be overridden in UI dropdown +# Latest reasoning models: Qwen3-Next-80B-A3B-Thinking, Qwen3-Next-80B-A3B-Instruct, Llama-3.3-70B-Instruct +HUGGINGFACE_MODEL=Qwen/Qwen3-Next-80B-A3B-Thinking + +# Fallback models for HuggingFace Inference API (comma-separated) +# Models are tried in order until one succeeds +# Format: model1,model2,model3 +# Latest reasoning models first, then reliable fallbacks +# Reasoning models: Qwen3-Next (thinking/instruct), Llama-3.3-70B, Qwen3-235B +# Fallbacks: Llama-3.1-8B, Zephyr-7B (ungated), Qwen2-7B (ungated) +HF_FALLBACK_MODELS=Qwen/Qwen3-Next-80B-A3B-Thinking,Qwen/Qwen3-Next-80B-A3B-Instruct,meta-llama/Llama-3.3-70B-Instruct,meta-llama/Llama-3.1-8B-Instruct,HuggingFaceH4/zephyr-7b-beta,Qwen/Qwen2-7B-Instruct + +# Override model/provider selection (optional, usually set via UI) +# HF_MODEL=Qwen/Qwen3-Next-80B-A3B-Thinking +# HF_PROVIDER=hyperbolic + +# ============== EMBEDDING CONFIGURATION ============== + +# Embedding Provider: "openai", "local", or "huggingface" +# Default: "local" (no API key required) +EMBEDDING_PROVIDER=local + +# OpenAI Embedding Model (used if EMBEDDING_PROVIDER=openai) +OPENAI_EMBEDDING_MODEL=text-embedding-3-small + +# Local Embedding Model (sentence-transformers, used if EMBEDDING_PROVIDER=local) +# BAAI/bge-small-en-v1.5 is newer, faster, and better than all-MiniLM-L6-v2 +LOCAL_EMBEDDING_MODEL=BAAI/bge-small-en-v1.5 + +# HuggingFace Embedding Model (used if EMBEDDING_PROVIDER=huggingface) +HUGGINGFACE_EMBEDDING_MODEL=sentence-transformers/all-MiniLM-L6-v2 # ============== AGENT CONFIGURATION ============== @@ -85,23 +65,60 @@ MAX_ITERATIONS=10 SEARCH_TIMEOUT=30 LOG_LEVEL=INFO -# ============================================ -# Modal Configuration (Required for TTS) -# ============================================ -# Modal credentials are required for TTS (Text-to-Speech) functionality -# Get your credentials from: 
https://modal.com/ -MODAL_TOKEN_ID=your_modal_token_id_here -MODAL_TOKEN_SECRET=your_modal_token_secret_here +# Graph-based execution (experimental) +# USE_GRAPH_EXECUTION=false + +# Budget & Rate Limiting +# DEFAULT_TOKEN_LIMIT=100000 +# DEFAULT_TIME_LIMIT_MINUTES=10 +# DEFAULT_ITERATIONS_LIMIT=10 + +# ============== WEB SEARCH CONFIGURATION ============== + +# Web Search Provider: "serper", "searchxng", "brave", "tavily", or "duckduckgo" +# Default: "duckduckgo" (no API key required) +WEB_SEARCH_PROVIDER=duckduckgo + +# Serper API Key (for Google search via Serper) +# SERPER_API_KEY=your-serper-key-here + +# SearchXNG Host URL (for self-hosted search) +# SEARCHXNG_HOST=http://localhost:8080 + +# Brave Search API Key +# BRAVE_API_KEY=your-brave-key-here + +# Tavily API Key +# TAVILY_API_KEY=your-tavily-key-here # ============== EXTERNAL SERVICES ============== -# PubMed (optional - higher rate limits) +# PubMed (optional - higher rate limits: 10 req/sec vs 3 req/sec) NCBI_API_KEY=your-ncbi-key-here -# Vector Database (optional - for LlamaIndex RAG) +# Modal (optional - for secure code execution sandbox) +# MODAL_TOKEN_ID=your-modal-token-id +# MODAL_TOKEN_SECRET=your-modal-token-secret + +# ============== VECTOR DATABASE (ChromaDB) ============== + +# ChromaDB storage path CHROMA_DB_PATH=./chroma_db -# Neo4j Knowledge Graph -NEO4J_URI=bolt://localhost:7687 -NEO4J_USER=neo4j -NEO4J_PASSWORD=your_neo4j_password_here -NEO4J_DATABASE=your_database_name + +# Persist ChromaDB to disk (default: true) +# CHROMA_DB_PERSIST=true + +# Remote ChromaDB server (optional) +# CHROMA_DB_HOST=localhost +# CHROMA_DB_PORT=8000 + +# ============== RAG SERVICE CONFIGURATION ============== + +# ChromaDB collection name for RAG +# RAG_COLLECTION_NAME=deepcritical_evidence + +# Number of top results to retrieve from RAG +# RAG_SIMILARITY_TOP_K=5 + +# Automatically ingest evidence into RAG +# RAG_AUTO_INGEST=true diff --git a/.github/README.md b/.github/README.md index 
8f3727f7e12fb16c26e4cc7bd30f99d7ffcf36b2..a3b61ae53484cce05c72522913bb9d15f7e67c90 100644 --- a/.github/README.md +++ b/.github/README.md @@ -3,7 +3,8 @@ > **You are reading the Github README!** > > - 📚 **Documentation**: See our [technical documentation](https://deepcritical.github.io/GradioDemo/) for detailed information -> - 📖 **Demo README**: Check out the [Demo README](..README.md) for more information > - 🏆 **Demo**: Kindly consider using our [Free Demo](https://hf.co/DataQuests/GradioDemo) +> - 📖 **Demo README**: Check out the [Demo README](../README.md) for setup, configuration, and contribution guidelines +> - 🏆 **Hackathon Submission**: Keep reading below for more information about our MCP Hackathon submission
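The `HF_FALLBACK_MODELS` entry in the `.env.example` above is a comma-separated list that is tried in order until one model succeeds. A minimal sketch of that try-in-order loop, assuming failures surface as exceptions (the `first_working` and `fake_call` names are illustrative, not part of the codebase):

```python
def first_working(models, call_model):
    """Try each fallback model in order; return the first (model, result) that succeeds."""
    last_error = None
    for model in models:
        try:
            return model, call_model(model)
        except RuntimeError as err:  # e.g. rate-limited or gated-model failure
            last_error = err
    raise RuntimeError("all fallback models failed") from last_error


def fake_call(model):
    # Simulate the first (gated) model failing and an ungated fallback succeeding
    if "Qwen3" in model:
        raise RuntimeError("gated model unavailable")
    return "ok"


fallbacks = "Qwen/Qwen3-Next-80B-A3B-Thinking,HuggingFaceH4/zephyr-7b-beta".split(",")
chosen, result = first_working(fallbacks, fake_call)
```

With the gated model failing, `chosen` falls through to the ungated `zephyr-7b-beta` entry.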
This page documents the API for DeepCritical agents.
Module: src.agents.knowledge_gap
Purpose: Evaluates research state and identifies knowledge gaps.
### evaluate

Evaluates research completeness and identifies outstanding knowledge gaps.

Parameters:

- query: Research query string
- background_context: Background context for the query (default: "")
- conversation_history: History of actions, findings, and thoughts as string (default: "")
- iteration: Current iteration number (default: 0)
- time_elapsed_minutes: Elapsed time in minutes (default: 0.0)
- max_time_minutes: Maximum time limit in minutes (default: 10)

Returns: KnowledgeGapOutput with:

- research_complete: Boolean indicating if research is complete
- outstanding_gaps: List of remaining knowledge gaps
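The output shape, and the documented behavior of returning a fallback value instead of raising on failure, can be sketched with a dataclass stand-in (the real `KnowledgeGapOutput` is a frozen Pydantic model in `src/utils/models.py`; the fallback gap text here is illustrative):

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)  # stand-in for the frozen Pydantic model
class KnowledgeGapOutput:
    research_complete: bool
    outstanding_gaps: list = field(default_factory=list)


# Documented error fallback: report research as incomplete rather than raising
fallback = KnowledgeGapOutput(
    research_complete=False,
    outstanding_gaps=["evaluation failed; gap analysis could not run"],
)
```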
Module: src.agents.tool_selector
Purpose: Selects appropriate tools for addressing knowledge gaps.
### select_tools

Selects tools for addressing a knowledge gap.

Parameters:

- gap: The knowledge gap to address
- query: Research query string
- background_context: Optional background context (default: "")
- conversation_history: History of actions, findings, and thoughts as string (default: "")

Returns: AgentSelectionPlan with a list of AgentTask objects.
Module: src.agents.writer
Purpose: Generates final reports from research findings.
### write_report

Generates a markdown report from research findings.

Parameters:

- query: Research query string
- findings: Research findings to include in the report
- output_length: Optional description of desired output length (default: "")
- output_instructions: Optional additional instructions for report generation (default: "")

Returns: Markdown string with numbered citations.
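One plausible rendering of the numbered-citation reference list the writer emits can be sketched as follows (the `format_references` helper and the `[n] Title - URL` layout are assumptions for illustration, not the agent's exact format):

```python
def format_references(citations):
    """Render a numbered reference list in a '[n] Title - URL' layout."""
    return "\n".join(
        f"[{i}] {c['title']} - {c['url']}" for i, c in enumerate(citations, start=1)
    )


refs = format_references([
    {"title": "Example trial", "url": "https://clinicaltrials.gov/study/NCT00000000"},
    {"title": "Example paper", "url": "https://pubmed.ncbi.nlm.nih.gov/12345678/"},
])
```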
Module: src.agents.long_writer
Purpose: Long-form report generation with section-by-section writing.
### write_next_section

Writes the next section of a long-form report.

Parameters:

- original_query: The original research query
- report_draft: Current report draft as a string (all sections written so far)
- next_section_title: Title of the section to write
- next_section_draft: Draft content for the next section

Returns: LongWriterOutput with the formatted section and references.

### write_report

Generates the final report from a draft.

Parameters:

- query: Research query string
- report_title: Title of the report
- report_draft: Complete report draft

Returns: Final markdown report string.
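The long writer also reformats references across sections, deduplicating and renumbering them. A minimal sketch, assuming references arrive as `(url, title)` pairs and duplicates share a URL (helper name hypothetical):

```python
def dedupe_and_renumber(refs):
    """Drop duplicate URLs (keeping the first title seen) and renumber sequentially."""
    seen = {}
    for url, title in refs:
        seen.setdefault(url, title)  # dict preserves first-seen order
    return [f"[{i}] {title} - {url}" for i, (url, title) in enumerate(seen.items(), 1)]


merged = dedupe_and_renumber([
    ("https://example.org/a", "Paper A"),
    ("https://example.org/b", "Paper B"),
    ("https://example.org/a", "Paper A (duplicate)"),
])
```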
Module: src.agents.proofreader
Purpose: Proofreads and polishes report drafts.
### proofread

Proofreads and polishes a report draft.

Parameters:

- query: Research query string
- report_title: Title of the report
- report_draft: Report draft to proofread

Returns: Polished markdown string.
Module: src.agents.thinking
Purpose: Generates observations from conversation history.
### generate_observations

Generates observations from conversation history.

Parameters:

- query: Research query string
- background_context: Optional background context (default: "")
- conversation_history: History of actions, findings, and thoughts as string (default: "")
- iteration: Current iteration number (default: 1)

Returns: Observation string.
Module: src.agents.input_parser
Purpose: Parses and improves user queries, detects research mode.
### parse

Parses and improves a user query.

Parameters:

- query: Original query string

Returns: ParsedQuery with:

- original_query: Original query string
- improved_query: Refined query string
- research_mode: "iterative" or "deep"
- key_entities: List of key entities
- research_questions: List of research questions
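In the real agent the mode decision is made by the LLM; purely to illustrate the two modes, here is a hypothetical keyword heuristic (marker words and function name are invented for this sketch):

```python
def detect_research_mode(query):
    """Hypothetical heuristic: broad report-style requests -> 'deep'; focused questions -> 'iterative'."""
    deep_markers = ("comprehensive", "in-depth", "report on", "survey", "landscape")
    lowered = query.lower()
    return "deep" if any(marker in lowered for marker in deep_markers) else "iterative"


mode_broad = detect_research_mode("Write a comprehensive report on CRISPR delivery methods")
mode_focused = detect_research_mode("What is the elimination half-life of semaglutide?")
```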
All agents have factory functions in src.agent_factory.agents:
Parameters:

- model: Optional Pydantic AI model. If None, uses get_model() from settings.
- oauth_token: Optional OAuth token from HuggingFace login (takes priority over environment variables)
Returns: Agent instance.
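The factory pattern described above (use the provided model, else resolve one from settings) can be sketched with stand-ins; `WriterAgent` here is a plain class and `get_model` a stub, since the real implementations live in `src.agent_factory`:

```python
class WriterAgent:
    """Stand-in; the real agent classes wrap Pydantic AI Agent instances."""

    def __init__(self, model):
        self.model = model


def get_model(oauth_token=None):
    # Stand-in for the settings-driven model resolver the factories fall back to
    return "default-model" if oauth_token is None else "oauth-model"


def create_writer_agent(model=None, oauth_token=None):
    """Factory pattern: use the caller's model if given, else resolve one."""
    return WriterAgent(model if model is not None else get_model(oauth_token))


default_agent = create_writer_agent()
custom_agent = create_writer_agent(model="my-model")
```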
This page documents the Pydantic models used throughout DeepCritical.
Module: src.utils.models
Purpose: Represents evidence from search results.
Fields:

- citation: Citation information (title, URL, date, authors)
- content: Evidence text content
- relevance: Relevance score (0.0-1.0)
- metadata: Additional metadata dictionary
Module: src.utils.models
Purpose: Citation information for evidence.
Fields:

- source: Source name (e.g., "pubmed", "clinicaltrials", "europepmc", "web", "rag")
- title: Article/trial title
- url: Source URL
- date: Publication date (YYYY-MM-DD or "Unknown")
- authors: List of authors (optional)
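The relationship between the two models can be sketched with frozen dataclass stand-ins (the real versions are frozen Pydantic models with `Field()` constraints such as `relevance` bounded to 0.0-1.0):

```python
from dataclasses import dataclass


@dataclass(frozen=True)  # stand-in for the frozen Pydantic Citation model
class Citation:
    source: str
    title: str
    url: str
    date: str = "Unknown"
    authors: tuple = ()


@dataclass(frozen=True)  # stand-in for the frozen Pydantic Evidence model
class Evidence:
    citation: Citation
    content: str
    relevance: float = 0.5


evidence = Evidence(
    citation=Citation(source="pubmed", title="Example article",
                      url="https://pubmed.ncbi.nlm.nih.gov/12345678/"),
    content="Key finding text...",
    relevance=0.9,
)
```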
Module: src.utils.models
Purpose: Output from knowledge gap evaluation.
Fields:

- research_complete: Boolean indicating if research is complete
- outstanding_gaps: List of remaining knowledge gaps
Module: src.utils.models
Purpose: Plan for tool/agent selection.
Fields:

- tasks: List of agent tasks to execute
Module: src.utils.models
Purpose: Individual agent task.
Fields:

- gap: The knowledge gap being addressed (optional)
- agent: Name of the agent to use
- query: The specific query for the agent
- entity_website: The website of the entity being researched, if known (optional)
Module: src.utils.models
Purpose: Draft structure for long-form reports.
Fields:

- sections: List of report sections
Module: src.utils.models
Purpose: Individual section in a report draft.
Fields:

- section_title: The title of the section
- section_content: The content of the section
Module: src.utils.models
Purpose: Parsed and improved query.
Fields:

- original_query: Original query string
- improved_query: Refined query string
- research_mode: Research mode ("iterative" or "deep")
- key_entities: List of key entities
- research_questions: List of research questions
Module: src.utils.models
Purpose: Conversation history with iterations.
Fields:

- history: List of iteration data
Module: src.utils.models
Purpose: Data for a single iteration.
Fields:

- gap: The gap addressed in this iteration
- tool_calls: The tool calls made
- findings: The findings collected from tool calls
- thought: Reflection on the success of the iteration and next steps
Module: src.utils.models
Purpose: Event emitted during research execution.
Fields:

- type: Event type (e.g., "started", "search_complete", "complete")
- iteration: Iteration number (optional)
- data: Event data dictionary
Module: src.utils.models
Purpose: Current budget status.
Fields:

- tokens_used: Total tokens used
- tokens_limit: Token budget limit
- time_elapsed_seconds: Time elapsed in seconds
- time_limit_seconds: Time budget limit (default: 600.0 seconds / 10 minutes)
- iterations: Number of iterations completed
- iterations_limit: Maximum iterations (default: 10)
- iteration_tokens: Tokens used per iteration (iteration number -> token count)
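The budget fields above feed simple limit checks. A minimal sketch, using the ~4-characters-per-token estimation heuristic noted in the middleware rules (the `can_continue` signature here is a simplification of the real `BudgetTracker` API):

```python
def estimate_tokens(text):
    """Rough budgeting heuristic: about 4 characters per token."""
    return max(1, len(text) // 4)


def can_continue(status):
    """True while every budget dimension (tokens, time, iterations) is under its limit."""
    return (status["tokens_used"] < status["tokens_limit"]
            and status["time_elapsed_seconds"] < status["time_limit_seconds"]
            and status["iterations"] < status["iterations_limit"])


status = {
    "tokens_used": estimate_tokens("x" * 4000),  # ~1000 tokens
    "tokens_limit": 100_000,
    "time_elapsed_seconds": 120.0,
    "time_limit_seconds": 600.0,
    "iterations": 3,
    "iterations_limit": 10,
}
ok = can_continue(status)
```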
This page documents the API for DeepCritical orchestrators.
Module: src.orchestrator.research_flow
Purpose: Single-loop research with search-judge-synthesize cycles.
run: Runs iterative research flow.
Parameters:
- query: Research query string
- background_context: Background context (default: "")
- output_length: Optional description of desired output length (default: "")
- output_instructions: Optional additional instructions for report generation (default: "")
Returns: Final report string.
Note: max_iterations, max_time_minutes, and token_budget are constructor parameters, not run() parameters.
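A minimal sketch of the constructor/run() split described above. The class here is a self-contained stand-in with a placeholder body, not the real flow logic, and the constructor keyword names are taken from the note above:

```python
import asyncio

# Stand-in illustrating the documented interface: budget knobs go to the
# constructor, while run() takes only query-related arguments.
class IterativeResearchFlow:
    def __init__(self, max_iterations: int = 10, max_time_minutes: float = 10.0,
                 token_budget: int = 100_000) -> None:
        self.max_iterations = max_iterations
        self.max_time_minutes = max_time_minutes
        self.token_budget = token_budget

    async def run(self, query: str, background_context: str = "",
                  output_length: str = "", output_instructions: str = "") -> str:
        # Real flow: search-judge-synthesize until complete or budget spent.
        return f"# Report\n\nFindings for: {query}"

flow = IterativeResearchFlow(max_iterations=5)
report = asyncio.run(flow.run("CRISPR off-target effects"))
```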
Module: src.orchestrator.research_flow
Purpose: Multi-section parallel research with planning and synthesis.
run: Runs deep research flow.
Parameters: - query: Research query string
Returns: Final report string.
Note: max_iterations_per_section, max_time_minutes, and token_budget are constructor parameters, not run() parameters.
Module: src.orchestrator.graph_orchestrator
Purpose: Graph-based execution using Pydantic AI agents as nodes.
run: Runs graph-based research orchestration.
Parameters: - query: Research query string
Yields: AgentEvent objects during graph execution.
Note: research_mode and use_graph are constructor parameters, not run() parameters.
Module: src.orchestrator_factory
Purpose: Factory for creating orchestrators.
create_orchestrator: Creates an orchestrator instance.
Parameters:
- search_handler: Search handler protocol implementation (optional, required for simple mode)
- judge_handler: Judge handler protocol implementation (optional, required for simple mode)
- config: Configuration object (optional)
- mode: Orchestrator mode ("simple", "advanced", "magentic", "iterative", "deep", "auto", or None for auto-detect)
- oauth_token: Optional OAuth token from HuggingFace login (takes priority over env vars)
Returns: Orchestrator instance.
Raises: - ValueError: If requirements not met
Modes:
- "simple": Legacy orchestrator
- "advanced" or "magentic": Magentic orchestrator (requires OpenAI API key)
- None: Auto-detect based on API key availability
Module: src.orchestrator_magentic
Purpose: Multi-agent coordination using Microsoft Agent Framework.
run: Runs Magentic orchestration.
Parameters: - query: Research query string
Yields: AgentEvent objects converted from Magentic events.
Note: max_rounds and max_stalls are constructor parameters, not run() parameters.
Requirements: - agent-framework-core package - OpenAI API key
This page documents the API for DeepCritical services.
Module: src.services.embeddings
Purpose: Local sentence-transformers for semantic search and deduplication.
embed: Generates embedding for a text string.
Parameters:
- text: Text to embed
Returns: Embedding vector as list of floats.
embed_batch: Generates embeddings for multiple texts.
Parameters:
- texts: List of texts to embed
Returns: List of embedding vectors.
similarity: Calculates similarity between two texts.
Parameters:
- text1: First text
- text2: Second text
Returns: Similarity score (0.0-1.0).
find_duplicates: Finds duplicate texts based on similarity threshold.

    async def find_duplicates(
        self,
        texts: list[str],
        threshold: float = 0.85
    ) -> list[tuple[int, int]]

Parameters:
- texts: List of texts to check
- threshold: Similarity threshold (default: 0.85)
Returns: List of (index1, index2) tuples for duplicate pairs.
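The pairwise-duplicate contract can be illustrated over precomputed vectors. This is a self-contained sketch only: the real service first embeds the texts with sentence-transformers, and `find_duplicate_pairs`/`cosine` are hypothetical helper names:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def find_duplicate_pairs(vectors: list[list[float]],
                         threshold: float = 0.85) -> list[tuple[int, int]]:
    """Return (i, j) index pairs whose similarity meets the threshold."""
    pairs: list[tuple[int, int]] = []
    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            if cosine(vectors[i], vectors[j]) >= threshold:
                pairs.append((i, j))
    return pairs

vecs = [[1.0, 0.0], [0.99, 0.1], [0.0, 1.0]]
print(find_duplicate_pairs(vecs))  # [(0, 1)]
```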
add_evidence: Adds evidence to vector store for semantic search.

    async def add_evidence(
        self,
        evidence_id: str,
        content: str,
        metadata: dict[str, Any]
    ) -> None

Parameters:
- evidence_id: Unique identifier for the evidence
- content: Evidence text content
- metadata: Additional metadata dictionary
search_similar: Finds semantically similar evidence.
Parameters:
- query: Search query string
- n_results: Number of results to return (default: 5)
Returns: List of dictionaries with id, content, metadata, and distance keys.
deduplicate: Removes semantically duplicate evidence.

    async def deduplicate(
        self,
        new_evidence: list[Evidence],
        threshold: float = 0.9
    ) -> list[Evidence]

Parameters:
- new_evidence: List of evidence items to deduplicate
- threshold: Similarity threshold (default: 0.9, where 90% similar counts as duplicate)
Returns: List of unique evidence items (not already in vector store).
get_embedding_service: Returns singleton EmbeddingService instance.
Module: src.services.rag
Purpose: Retrieval-Augmented Generation using LlamaIndex.
ingest_evidence: Ingests evidence into RAG service.
Parameters:
- evidence_list: List of Evidence objects to ingest
Note: Supports multiple embedding providers (OpenAI, local sentence-transformers, Hugging Face).
retrieve: Retrieves relevant documents for a query.
Parameters:
- query: Search query string
- top_k: Number of top results to return (defaults to similarity_top_k from constructor)
Returns: List of dictionaries with text, score, and metadata keys.
query: Queries the RAG service and returns a synthesized response.
Parameters:
- query_str: Query string
- top_k: Number of results to use (defaults to similarity_top_k from constructor)
Returns: Synthesized response string.
Raises:
- ConfigurationError: If no LLM API key is available for query synthesis
ingest_documents: Ingests raw LlamaIndex Documents.
Parameters:
- documents: List of LlamaIndex Document objects
clear_collection: Clears all documents from the collection.
get_rag_service: Get or create a RAG service instance.

    def get_rag_service(
        collection_name: str = "deepcritical_evidence",
        oauth_token: str | None = None,
        **kwargs: Any
    ) -> LlamaIndexRAGService

Parameters:
- collection_name: Name of the ChromaDB collection (default: "deepcritical_evidence")
- oauth_token: Optional OAuth token from HuggingFace login (takes priority over env vars)
- **kwargs: Additional arguments for LlamaIndexRAGService (e.g., use_openai_embeddings=False)
Returns: Configured LlamaIndexRAGService instance.
Note: By default, uses local embeddings (sentence-transformers), which require no API keys.
Module: src.services.statistical_analyzer
Purpose: Secure execution of AI-generated statistical code.
analyze: Analyzes a research question using statistical methods.

    async def analyze(
        self,
        query: str,
        evidence: list[Evidence],
        hypothesis: dict[str, Any] | None = None
    ) -> AnalysisResult

Parameters:
- query: The research question
- evidence: List of Evidence objects to analyze
- hypothesis: Optional hypothesis dict with drug, target, pathway, effect, confidence keys
Returns: AnalysisResult with:
- verdict: SUPPORTED, REFUTED, or INCONCLUSIVE
- confidence: Confidence in verdict (0.0-1.0)
- statistical_evidence: Summary of statistical findings
- code_generated: Python code that was executed
- execution_output: Output from code execution
- key_takeaways: Key takeaways from analysis
- limitations: List of limitations
Note: Requires Modal credentials for sandbox execution.
This page documents the API for DeepCritical search tools.
All tools implement the SearchTool protocol:

    class SearchTool(Protocol):
        @property
        def name(self) -> str: ...

        async def search(
            self,
            query: str,
            max_results: int = 10
        ) -> list[Evidence]: ...
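A self-contained sketch of a tool satisfying this protocol. `Evidence` is simplified to a stub (the real model lives in src/utils/models.py), and `StaticSearchTool` is a toy illustration, not one of the project's tools:

```python
import asyncio
from dataclasses import dataclass
from typing import Protocol

@dataclass(frozen=True)
class Evidence:  # stub for the real Pydantic model
    url: str
    content: str

class SearchTool(Protocol):
    @property
    def name(self) -> str: ...
    async def search(self, query: str, max_results: int = 10) -> list[Evidence]: ...

class StaticSearchTool:
    """Toy tool that 'searches' a fixed in-memory corpus."""

    def __init__(self, corpus: list[Evidence]) -> None:
        self._corpus = corpus

    @property
    def name(self) -> str:
        return "static"

    async def search(self, query: str, max_results: int = 10) -> list[Evidence]:
        hits = [e for e in self._corpus if query.lower() in e.content.lower()]
        return hits[:max_results]

tool: SearchTool = StaticSearchTool([Evidence("https://example.org/1", "aspirin trial")])
results = asyncio.run(tool.search("aspirin"))
```

Because `SearchTool` is a structural Protocol, `StaticSearchTool` needs no inheritance; any class with a matching `name` property and `search` coroutine qualifies.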
Module: src.tools.pubmed
Purpose: Search peer-reviewed biomedical literature from PubMed.
name: Returns tool name: "pubmed"
search: Searches PubMed for articles.
Parameters:
- query: Search query string
- max_results: Maximum number of results to return (default: 10)
Returns: List of Evidence objects with PubMed articles.
Raises:
- SearchError: If search fails (timeout, HTTP error, XML parsing error)
- RateLimitError: If rate limit is exceeded (429 status code)
Note: Uses NCBI E-utilities (ESearch → EFetch). Rate limit: 0.34s between requests. Handles single vs. multiple articles.
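The 0.34s spacing rule can be sketched as a small async rate limiter. This is an illustrative pattern under the assumptions above (NCBI allows roughly 3 requests/second without an API key); `RateLimiter` is a hypothetical helper, not the tool's actual class:

```python
import asyncio
import time

class RateLimiter:
    """Enforce a minimum interval between consecutive requests."""

    def __init__(self, min_interval: float = 0.34) -> None:
        self._min_interval = min_interval
        self._last_request = 0.0
        self._lock = asyncio.Lock()  # serialize concurrent callers

    async def acquire(self) -> None:
        async with self._lock:
            wait = self._min_interval - (time.monotonic() - self._last_request)
            if wait > 0:
                await asyncio.sleep(wait)
            self._last_request = time.monotonic()

async def demo() -> float:
    limiter = RateLimiter()
    start = time.monotonic()
    await limiter.acquire()  # first call passes immediately
    await limiter.acquire()  # second call waits out the interval
    return time.monotonic() - start

elapsed = asyncio.run(demo())
```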
Module: src.tools.clinicaltrials
Purpose: Search ClinicalTrials.gov for interventional studies.
name: Returns tool name: "clinicaltrials"
search: Searches ClinicalTrials.gov for trials.
Parameters:
- query: Search query string
- max_results: Maximum number of results to return (default: 10)
Returns: List of Evidence objects with clinical trials.
Note: Only returns interventional studies with status COMPLETED, ACTIVE_NOT_RECRUITING, RECRUITING, or ENROLLING_BY_INVITATION. Uses the requests library (NOT httpx; the site's WAF blocks httpx). Runs in a thread pool for async compatibility.
Raises:
- SearchError: If search fails (HTTP error, request exception)
Module: src.tools.europepmc
Purpose: Search Europe PMC for preprints and peer-reviewed articles.
name: Returns tool name: "europepmc"
search: Searches Europe PMC for articles and preprints.
Parameters:
- query: Search query string
- max_results: Maximum number of results to return (default: 10)
Returns: List of Evidence objects with articles/preprints.
Note: Includes both peer-reviewed articles and preprints (the latter marked with [PREPRINT - Not peer-reviewed]). Builds URLs from DOI or PMID.
Raises:
- SearchError: If search fails (HTTP error, connection error)
Module: src.tools.rag_tool
Purpose: Semantic search within collected evidence.
__init__: Constructor.

    def __init__(
        self,
        rag_service: LlamaIndexRAGService | None = None,
        oauth_token: str | None = None
    ) -> None

Parameters:
- rag_service: Optional RAG service instance. If None, it is lazily initialized.
- oauth_token: Optional OAuth token from HuggingFace login (for the RAG LLM)
name: Returns tool name: "rag"
search: Searches collected evidence using semantic similarity.
Parameters:
- query: Search query string
- max_results: Maximum number of results to return (default: 10)
Returns: List of Evidence objects from collected evidence.
Raises:
- ConfigurationError: If RAG service is unavailable
Note: Requires evidence to be ingested into the RAG service first. Wraps LlamaIndexRAGService and converts RAG results into Evidence objects.
Module: src.tools.search_handler
Purpose: Orchestrates parallel searches across multiple tools.
__init__: Constructor.

    def __init__(
        self,
        tools: list[SearchTool],
        timeout: float = 30.0,
        include_rag: bool = False,
        auto_ingest_to_rag: bool = True,
        oauth_token: str | None = None
    ) -> None

Parameters:
- tools: List of search tools to use
- timeout: Timeout for each search in seconds (default: 30.0)
- include_rag: Whether to include the RAG tool in searches (default: False)
- auto_ingest_to_rag: Whether to automatically ingest results into RAG (default: True)
- oauth_token: Optional OAuth token from HuggingFace login (for the RAG LLM)
execute: Searches multiple tools in parallel.
Parameters:
- query: Search query string
- max_results_per_tool: Maximum results per tool (default: 10)
Returns: SearchResult with:
- query: The search query
- evidence: Aggregated list of evidence
- sources_searched: List of source names searched
- total_found: Total number of results
- errors: List of error messages from failed tools
Raises:
- SearchError: If search times out
Note: Uses asyncio.gather() for parallel execution. Handles tool failures gracefully (returns errors in SearchResult.errors). Automatically ingests evidence into RAG if enabled.
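The fan-out-with-graceful-failure pattern can be sketched as follows. The two toy coroutines stand in for real SearchTool implementations, and the tuple return is a simplification of the SearchResult model:

```python
import asyncio

async def search_ok(query: str) -> list[str]:
    return [f"result for {query}"]

async def search_boom(query: str) -> list[str]:
    raise RuntimeError("tool unavailable")

async def execute(query: str) -> tuple[list[str], list[str]]:
    # return_exceptions=True keeps one failing tool from sinking the rest.
    outcomes = await asyncio.gather(
        search_ok(query), search_boom(query), return_exceptions=True
    )
    evidence: list[str] = []
    errors: list[str] = []
    for outcome in outcomes:
        if isinstance(outcome, Exception):
            errors.append(str(outcome))  # surfaced to the caller, not raised
        else:
            evidence.extend(outcome)
    return evidence, errors

evidence, errors = asyncio.run(execute("statins"))
print(evidence, errors)  # ['result for statins'] ['tool unavailable']
```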
DeepCritical uses Pydantic AI agents for all AI-powered operations. All agents follow a consistent pattern and use structured output types.
Pydantic AI agents use the Agent class with the following structure:
- Constructor: __init__(model: Any | None = None)
- Main method (e.g., async def evaluate(), async def write_report())
- Factory function: def create_agent_name(model: Any | None = None, oauth_token: str | None = None) -> AgentName
Note: Factory functions accept an optional oauth_token parameter for HuggingFace authentication, which takes priority over environment variables.
Agents use get_model() from src/agent_factory/judges.py if no model is provided. This supports OpenAI, Anthropic, and HF Inference providers.
The model selection is based on the configured LLM_PROVIDER in settings.
Agents return fallback values on failure rather than raising exceptions:
Example: KnowledgeGapOutput(research_complete=False, outstanding_gaps=[...])
All errors are logged with context using structlog.
All agents validate inputs.
Agents use structured output types from src/utils/models.py:
- KnowledgeGapOutput: Research completeness evaluation
- AgentSelectionPlan: Tool selection plan
- ReportDraft: Long-form report structure
- ParsedQuery: Query parsing and mode detection
For text output (writer agents), agents return str directly.
File: src/agents/knowledge_gap.py
Purpose: Evaluates research state and identifies knowledge gaps.
Output: KnowledgeGapOutput with:
- research_complete: Boolean indicating if research is complete
- outstanding_gaps: List of remaining knowledge gaps
Methods: - async def evaluate(query, background_context, conversation_history, iteration, time_elapsed_minutes, max_time_minutes) -> KnowledgeGapOutput
File: src/agents/tool_selector.py
Purpose: Selects appropriate tools for addressing knowledge gaps.
Output: AgentSelectionPlan with list of AgentTask objects.
Available Agents:
- WebSearchAgent: General web search for fresh information
- SiteCrawlerAgent: Research specific entities/companies
- RAGAgent: Semantic search within collected evidence
File: src/agents/writer.py
Purpose: Generates final reports from research findings.
Output: Markdown string with numbered citations.
Methods: - async def write_report(query, findings, output_length, output_instructions) -> str
Features:
- Validates inputs
- Truncates very long findings (max 50000 chars) with a warning
- Retry logic for transient failures (3 retries)
- Citation validation before returning
File: src/agents/long_writer.py
Purpose: Long-form report generation with section-by-section writing.
Input/Output: Uses ReportDraft models.
Methods:
- async def write_next_section(query, draft, section_title, section_content) -> LongWriterOutput
- async def write_report(query, report_title, report_draft) -> str
Features:
- Writes sections iteratively
- Aggregates references across sections
- Reformats section headings and references
- Deduplicates and renumbers references
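The reference deduplication and renumbering step can be sketched in the spirit described above: merge per-section reference lists, deduplicate by URL, and rewrite `[n]` citation markers onto a global numbering. `merge_references` is a hypothetical helper, not the agent's actual code:

```python
import re

def merge_references(sections: list[tuple[str, list[str]]]) -> tuple[str, list[str]]:
    """Each section is (text, refs); refs are URLs cited as [1], [2], ... locally."""
    merged: list[str] = []
    url_to_num: dict[str, int] = {}
    body_parts: list[str] = []
    for text, refs in sections:
        local_to_global: dict[int, int] = {}
        for i, url in enumerate(refs, start=1):
            if url not in url_to_num:
                merged.append(url)               # first sighting gets next number
                url_to_num[url] = len(merged)
            local_to_global[i] = url_to_num[url]
        # Rewrite [n] markers using this section's local -> global mapping.
        text = re.sub(r"\[(\d+)\]",
                      lambda m: f"[{local_to_global[int(m.group(1))]}]", text)
        body_parts.append(text)
    return "\n\n".join(body_parts), merged

body, refs = merge_references([
    ("Statins work [1].", ["https://a.org"]),
    ("Confirmed [1] and extended [2].", ["https://a.org", "https://b.org"]),
])
print(refs)  # ['https://a.org', 'https://b.org']
```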
File: src/agents/proofreader.py
Purpose: Proofreads and polishes report drafts.
Input: ReportDraft
Output: Polished markdown string
Methods: - async def proofread(query, report_title, report_draft) -> str
Features:
- Removes duplicate content across sections
- Adds an executive summary if there are multiple sections
- Preserves all references and citations
- Improves flow and readability
File: src/agents/thinking.py
Purpose: Generates observations from conversation history.
Output: Observation string
Methods: - async def generate_observations(query, background_context, conversation_history) -> str
File: src/agents/input_parser.py
Purpose: Parses and improves user queries, detects research mode.
Output: ParsedQuery with:
- original_query: Original query string
- improved_query: Refined query string
- research_mode: "iterative" or "deep"
- key_entities: List of key entities
- research_questions: List of research questions
The following agents use the BaseAgent pattern from agent-framework and are used exclusively with MagenticOrchestrator:
File: src/agents/hypothesis_agent.py
Purpose: Generates mechanistic hypotheses based on evidence.
Pattern: BaseAgent from agent-framework
Methods: - async def run(messages, thread, **kwargs) -> AgentRunResponse
Features:
- Uses an internal Pydantic AI Agent with HypothesisAssessment output type
- Accesses the shared evidence_store for evidence
- Uses the embedding service for diverse evidence selection (MMR algorithm)
- Stores hypotheses in shared context
File: src/agents/search_agent.py
Purpose: Wraps SearchHandler as an agent for Magentic orchestrator.
Pattern: BaseAgent from agent-framework
Methods: - async def run(messages, thread, **kwargs) -> AgentRunResponse
Features:
- Executes searches via SearchHandlerProtocol
- Deduplicates evidence using the embedding service
- Searches for semantically related evidence
- Updates the shared evidence store
File: src/agents/analysis_agent.py
Purpose: Performs statistical analysis using Modal sandbox.
Pattern: BaseAgent from agent-framework
Methods: - async def run(messages, thread, **kwargs) -> AgentRunResponse
Features:
- Wraps the StatisticalAnalyzer service
- Analyzes evidence and hypotheses
- Returns a verdict (SUPPORTED/REFUTED/INCONCLUSIVE)
- Stores analysis results in shared context
File: src/agents/report_agent.py
Purpose: Generates structured scientific reports from evidence and hypotheses.
Pattern: BaseAgent from agent-framework
Methods: - async def run(messages, thread, **kwargs) -> AgentRunResponse
Features:
- Uses an internal Pydantic AI Agent with ResearchReport output type
- Accesses the shared evidence store and hypotheses
- Validates citations before returning
- Formats the report as markdown
File: src/agents/judge_agent.py
Purpose: Evaluates evidence quality and determines if sufficient for synthesis.
Pattern: BaseAgent from agent-framework
Methods:
- async def run(messages, thread, **kwargs) -> AgentRunResponse
- async def run_stream(messages, thread, **kwargs) -> AsyncIterable[AgentRunResponseUpdate]
Features:
- Wraps JudgeHandlerProtocol
- Accesses the shared evidence store
- Returns JudgeAssessment with sufficient flag, confidence, and recommendation
DeepCritical uses two distinct agent patterns:
These agents use the Pydantic AI Agent class directly and are used in iterative and deep research flows:
- Core: Agent(model, output_type, system_prompt)
- Constructor: __init__(model: Any | None = None)
- Main methods (e.g., async def evaluate(), async def write_report())
- Agents: KnowledgeGapAgent, ToolSelectorAgent, WriterAgent, LongWriterAgent, ProofreaderAgent, ThinkingAgent, InputParserAgent
These agents use the BaseAgent class from agent-framework and are used in the Magentic orchestrator:
- Core: BaseAgent from agent-framework with an async def run() method
- Constructor: __init__(evidence_store, embedding_service, ...)
- Main method: async def run(messages, thread, **kwargs) -> AgentRunResponse
- Agents: HypothesisAgent, SearchAgent, AnalysisAgent, ReportAgent, JudgeAgent
Note: Magentic agents are used exclusively with the MagenticOrchestrator and follow the agent-framework protocol for multi-agent coordination.
All agents have factory functions in src/agent_factory/agents.py:
Factory functions:
- Use get_model() if no model is provided
- Accept an oauth_token parameter for HuggingFace authentication
- Raise ConfigurationError if creation fails
- Log agent creation
DeepCritical implements a graph-based orchestration system for research workflows using Pydantic AI agents as nodes. This enables better parallel execution, conditional routing, and state management compared to simple agent chains.
The iterative research graph follows this pattern:
    [Input] → [Thinking] → [Knowledge Gap] → [Decision: Complete?]
                                  ↓ No                 ↓ Yes
                           [Tool Selector]          [Writer]
                                  ↓
                          [Execute Tools] → [Loop Back]

Node IDs: thinking → knowledge_gap → continue_decision → tool_selector/writer → execute_tools → (loop back to thinking)
Special Node Handling:
- execute_tools: State node that uses search_handler to execute searches and add evidence to workflow state
- continue_decision: Decision node that routes based on the research_complete flag from KnowledgeGapOutput
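The continue_decision routing rule can be sketched as a plain function over stored node results. The dict shapes here are simplifications; the real implementation uses GraphExecutionContext and typed node objects:

```python
# Route to "writer" once the knowledge-gap agent declares research complete,
# otherwise continue the loop via "tool_selector". Node IDs match the docs.
def continue_decision(node_results: dict[str, dict]) -> str:
    gap_output = node_results["knowledge_gap"]
    return "writer" if gap_output["research_complete"] else "tool_selector"

next_node = continue_decision({"knowledge_gap": {"research_complete": False}})
print(next_node)  # tool_selector
```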
The deep research graph follows this pattern:
    [Input] → [Planner] → [Store Plan] → [Parallel Loops] → [Collect Drafts] → [Synthesizer]
                                          ↓       ↓       ↓
                                      [Loop1] [Loop2] [Loop3]

Node IDs: planner → store_plan → parallel_loops → collect_drafts → synthesizer
Special Node Handling:
- planner: Agent node that creates a ReportPlan with the report outline
- store_plan: State node that stores the ReportPlan in context for the parallel loops
- parallel_loops: Parallel node that executes IterativeResearchFlow instances for each section
- collect_drafts: State node that collects section drafts from the parallel loops
- synthesizer: Agent node that calls LongWriterAgent.write_report() directly with the ReportDraft
    sequenceDiagram
        actor User
        participant GraphOrchestrator
        participant InputParser
        participant GraphBuilder
        participant GraphExecutor
        participant Agent
        participant BudgetTracker
        participant WorkflowState

        User->>GraphOrchestrator: run(query)
        GraphOrchestrator->>InputParser: detect_research_mode(query)
        InputParser-->>GraphOrchestrator: mode (iterative/deep)
        GraphOrchestrator->>GraphBuilder: build_graph(mode)
        GraphBuilder-->>GraphOrchestrator: ResearchGraph
        GraphOrchestrator->>WorkflowState: init_workflow_state()
        GraphOrchestrator->>BudgetTracker: create_budget()
        GraphOrchestrator->>GraphExecutor: _execute_graph(graph)

        loop For each node in graph
            GraphExecutor->>Agent: execute_node(agent_node)
            Agent->>Agent: process_input
            Agent-->>GraphExecutor: result
            GraphExecutor->>WorkflowState: update_state(result)
            GraphExecutor->>BudgetTracker: add_tokens(used)
            GraphExecutor->>BudgetTracker: check_budget()
            alt Budget exceeded
                GraphExecutor->>GraphOrchestrator: emit(error_event)
            else Continue
                GraphExecutor->>GraphOrchestrator: emit(progress_event)
            end
        end

        GraphOrchestrator->>User: AsyncGenerator[AgentEvent]

    sequenceDiagram
        participant IterativeFlow
        participant ThinkingAgent
        participant KnowledgeGapAgent
        participant ToolSelector
        participant ToolExecutor
        participant JudgeHandler
        participant WriterAgent

        IterativeFlow->>IterativeFlow: run(query)

        loop Until complete or max_iterations
            IterativeFlow->>ThinkingAgent: generate_observations()
            ThinkingAgent-->>IterativeFlow: observations

            IterativeFlow->>KnowledgeGapAgent: evaluate_gaps()
            KnowledgeGapAgent-->>IterativeFlow: KnowledgeGapOutput

            alt Research complete
                IterativeFlow->>WriterAgent: create_final_report()
                WriterAgent-->>IterativeFlow: final_report
            else Gaps remain
                IterativeFlow->>ToolSelector: select_agents(gap)
                ToolSelector-->>IterativeFlow: AgentSelectionPlan

                IterativeFlow->>ToolExecutor: execute_tool_tasks()
                ToolExecutor-->>IterativeFlow: ToolAgentOutput[]

                IterativeFlow->>JudgeHandler: assess_evidence()
                JudgeHandler-->>IterativeFlow: should_continue
            end
        end

Graph nodes represent different stages in the research workflow:
Agent Nodes: Execute Pydantic AI agents
Examples: KnowledgeGapAgent, ToolSelectorAgent, ThinkingAgent
State Nodes: Update or read workflow state
Examples: Update evidence, update conversation history
Decision Nodes: Make routing decisions based on conditions
Examples: Continue research vs. complete research
Parallel Nodes: Execute multiple nodes concurrently
Edges define transitions between nodes:
Sequential Edges: Always traversed
Condition: None (always True)
Conditional Edges: Traversed based on condition
Example: If research complete → go to writer, else → continue loop
Parallel Edges: Used for parallel execution branches
State is managed via WorkflowState using ContextVar for thread-safe isolation:
State transitions occur at state nodes, which update the global workflow state.
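The ContextVar isolation described above can be sketched with two concurrent loops, each seeing only its own state. The `WorkflowState` here is a stub with just an evidence list; the real class carries more:

```python
import asyncio
from contextvars import ContextVar
from dataclasses import dataclass, field

@dataclass
class WorkflowState:  # stub for the real middleware state
    evidence: list[str] = field(default_factory=list)

_state: ContextVar[WorkflowState] = ContextVar("workflow_state")

async def research_loop(name: str) -> int:
    _state.set(WorkflowState())          # fresh state for this task's context
    _state.get().evidence.append(name)   # mutations stay task-local
    await asyncio.sleep(0)               # yield so the tasks interleave
    return len(_state.get().evidence)

async def main() -> list[int]:
    # gather() wraps each coroutine in a Task with its own context copy,
    # so the two loops cannot see each other's evidence.
    return await asyncio.gather(research_loop("a"), research_loop("b"))

counts = asyncio.run(main())
print(counts)  # [1, 1] — each loop saw only its own evidence
```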
Execution proceeds roughly as follows:
- Build the graph via create_iterative_graph() or create_deep_graph()
- Validate it with ResearchGraph.validate_structure()
- Execute it with GraphOrchestrator._execute_graph()
- Agent nodes call agent.run() with transformed input
- State nodes apply their state_updater function
- Decision nodes call their decision_function to get the next node ID
- Parallel nodes run their branches with asyncio.gather()
- Results are stored via GraphExecutionContext.update_state()
- The orchestrator yields AgentEvent objects during execution for the UI
The GraphExecutionContext class manages execution state during graph traversal:
- WorkflowState instance
- BudgetTracker instance for budget enforcement
Methods:
- set_node_result(node_id, result): Store result from node execution
- get_node_result(node_id): Retrieve stored result
- has_visited(node_id): Check if node was visited
- mark_visited(node_id): Mark node as visited
- update_state(updater, data): Update workflow state
Decision nodes evaluate conditions and return next node IDs:
Example: research_complete → writer, else → tool_selector
Parallel nodes execute multiple nodes concurrently:
Budget constraints are enforced at decision nodes:
If any budget is exceeded, execution routes to exit node.
Errors are handled at multiple levels:
Errors are logged and yield error events for UI.
Graph execution is optional via feature flag:
- USE_GRAPH_EXECUTION=true: Use graph-based execution
- USE_GRAPH_EXECUTION=false: Use agent chain execution (existing)
This allows gradual migration and fallback if needed.
DeepCritical uses middleware for state management, budget tracking, and workflow coordination.
File: src/middleware/state_machine.py
Purpose: Thread-safe state management for research workflows
Implementation: Uses ContextVar for thread-safe isolation
State Components:
- evidence: list[Evidence]: Collected evidence from searches
- conversation: Conversation: Iteration history (gaps, tool calls, findings, thoughts)
- embedding_service: Any: Embedding service for semantic search
Methods:
- add_evidence(new_evidence: list[Evidence]) -> int: Adds evidence with URL-based deduplication. Returns the number of new items added (excluding duplicates).
- async search_related(query: str, n_results: int = 5) -> list[Evidence]: Semantic search for related evidence using the embedding service
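The add_evidence contract (URL-based deduplication, returning the count of new items) can be sketched self-contained. `Evidence` is a stub for the real Pydantic model:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Evidence:  # stub for the real Pydantic model
    url: str
    content: str

class WorkflowState:
    def __init__(self) -> None:
        self.evidence: list[Evidence] = []

    def add_evidence(self, new_evidence: list[Evidence]) -> int:
        """Append only evidence with unseen URLs; return how many were added."""
        seen = {e.url for e in self.evidence}
        added = 0
        for item in new_evidence:
            if item.url not in seen:
                self.evidence.append(item)
                seen.add(item.url)
                added += 1
        return added

state = WorkflowState()
n = state.add_evidence([Evidence("https://a.org", "x"), Evidence("https://a.org", "x")])
print(n)  # 1 — the duplicate URL was dropped
```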
Initialization:
Access:
File: src/middleware/workflow_manager.py
Purpose: Coordinates parallel research loops
Methods:
- async add_loop(loop_id: str, query: str) -> ResearchLoop: Add a new research loop to manage
- async run_loops_parallel(loop_configs: list[dict], loop_func: Callable, judge_handler: Any | None = None, budget_tracker: Any | None = None) -> list[Any]: Run multiple research loops in parallel. Takes configuration dicts and a loop function.
- async update_loop_status(loop_id: str, status: LoopStatus, error: str | None = None): Update loop status
- async sync_loop_evidence_to_state(loop_id: str): Synchronize evidence from a specific loop to global state
Features:
- Uses asyncio.gather() for parallel execution
- Handles errors per loop (one failure doesn't fail all)
- Tracks loop status: pending, running, completed, failed, cancelled
- Evidence deduplication across parallel loops
Usage:
    from src.middleware.workflow_manager import WorkflowManager

    manager = WorkflowManager()
    await manager.add_loop("loop1", "Research query 1")
    await manager.add_loop("loop2", "Research query 2")

    async def run_research(config: dict) -> str:
        loop_id = config["loop_id"]
        query = config["query"]
        # ... research logic ...
        return "report"

    results = await manager.run_loops_parallel(
        loop_configs=[
            {"loop_id": "loop1", "query": "Research query 1"},
            {"loop_id": "loop2", "query": "Research query 2"},
        ],
        loop_func=run_research,
    )

File: src/middleware/budget_tracker.py
Purpose: Tracks and enforces resource limits
Budget Components:
- Tokens: LLM token usage
- Time: Elapsed time in seconds
- Iterations: Number of iterations
Methods:
- create_budget(loop_id: str, tokens_limit: int = 100000, time_limit_seconds: float = 600.0, iterations_limit: int = 10) -> BudgetStatus: Create a budget for a specific loop
- add_tokens(loop_id: str, tokens: int): Add token usage to a loop's budget
- start_timer(loop_id: str): Start time tracking for a loop
- update_timer(loop_id: str): Update elapsed time for a loop
- increment_iteration(loop_id: str): Increment iteration count for a loop
- check_budget(loop_id: str) -> tuple[bool, str]: Check if a loop's budget has been exceeded. Returns (exceeded: bool, reason: str)
- can_continue(loop_id: str) -> bool: Check if a loop can continue based on budget
Token Estimation:
- estimate_tokens(text: str) -> int: ~4 chars per token
- estimate_llm_call_tokens(prompt: str, response: str) -> int: Estimate LLM call tokens
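The ~4-characters-per-token heuristic can be sketched directly. This is a rough budget guard, not a real tokenizer, and the bodies below are illustrative stand-ins for the helpers in src/middleware/budget_tracker.py:

```python
def estimate_tokens(text: str) -> int:
    """Rough estimate: ~4 characters per token, never below 1."""
    return max(1, len(text) // 4)

def estimate_llm_call_tokens(prompt: str, response: str) -> int:
    """Estimate total tokens for one LLM call (prompt + response)."""
    return estimate_tokens(prompt) + estimate_tokens(response)

print(estimate_tokens("a" * 400))  # 100
```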
Usage:
    from src.middleware.budget_tracker import BudgetTracker

    tracker = BudgetTracker()
    budget = tracker.create_budget(
        loop_id="research_loop",
        tokens_limit=100000,
        time_limit_seconds=600,
        iterations_limit=10
    )
    tracker.start_timer("research_loop")
    # ... research operations ...
    tracker.add_tokens("research_loop", 5000)
    tracker.update_timer("research_loop")
    exceeded, reason = tracker.check_budget("research_loop")
    if exceeded:
        # Budget exceeded, stop research
        pass
    if not tracker.can_continue("research_loop"):
        # Budget exceeded, stop research
        pass

All middleware models are defined in src/utils/models.py:
- IterationData: Data for a single iteration
- Conversation: Conversation history with iterations
- ResearchLoop: Research loop state and configuration
- BudgetStatus: Current budget status
All middleware components use ContextVar for thread-safe isolation.
DeepCritical supports multiple orchestration patterns for research workflows.
File: src/orchestrator/research_flow.py
Pattern: Generate observations → Evaluate gaps → Select tools → Execute → Judge → Continue/Complete
Agents Used:
- KnowledgeGapAgent: Evaluates research completeness
- ToolSelectorAgent: Selects tools for addressing gaps
- ThinkingAgent: Generates observations
- WriterAgent: Creates final report
- JudgeHandler: Assesses evidence sufficiency
Features:
- Tracks iterations, time, and budget
- Supports graph execution (use_graph=True) and agent chains (use_graph=False)
- Iterates until research is complete or constraints are met
Usage:
File: src/orchestrator/research_flow.py
Pattern: Planner → Parallel iterative loops per section → Synthesizer
Agents Used:
- PlannerAgent: Breaks query into report sections
- IterativeResearchFlow: Per-section research (parallel)
- LongWriterAgent or ProofreaderAgent: Final synthesis
Features:
- Uses WorkflowManager for parallel execution
- Budget tracking per section and globally
- State synchronization across parallel loops
- Supports graph execution and agent chains
Usage:
File: src/orchestrator/graph_orchestrator.py
Purpose: Graph-based execution using Pydantic AI agents as nodes
Features:
- Uses graph execution (use_graph=True) or agent chains (use_graph=False) as fallback
- Routes based on research mode (iterative/deep/auto)
- Streams AgentEvent objects for the UI
- Uses GraphExecutionContext to manage execution state
Node Types:
- Agent Nodes: Execute Pydantic AI agents
- State Nodes: Update or read workflow state
- Decision Nodes: Make routing decisions
- Parallel Nodes: Execute multiple nodes concurrently
Edge Types:
- Sequential Edges: Always traversed
- Conditional Edges: Traversed based on condition
- Parallel Edges: Used for parallel execution branches
Special Node Handling:
The GraphOrchestrator has special handling for certain nodes:
- execute_tools node: State node that uses search_handler to execute searches and add evidence to workflow state
- parallel_loops node: Parallel node that executes IterativeResearchFlow instances for each section in deep research mode
- synthesizer node: Agent node that calls LongWriterAgent.write_report() directly with ReportDraft instead of using agent.run()
- writer node: Agent node that calls WriterAgent.write_report() directly with findings instead of using agent.run()
GraphExecutionContext:
The orchestrator uses GraphExecutionContext to manage execution state:
- Tracks current node, visited nodes, and node results
- Manages workflow state and budget tracker
- Provides methods to store and retrieve node execution results
File: src/orchestrator_factory.py
Purpose: Factory for creating orchestrators
Modes:
- Simple: Legacy orchestrator (backward compatible)
- Advanced: Magentic orchestrator (requires OpenAI API key)
- Auto-detect: Chooses based on API key availability
Usage:
File: src/orchestrator_magentic.py
Purpose: Multi-agent coordination using Microsoft Agent Framework
Features:
- Uses agent-framework-core
- ChatAgent pattern with internal LLMs per agent
- MagenticBuilder with participants:
  - searcher: SearchAgent (wraps SearchHandler)
  - hypothesizer: HypothesisAgent (generates hypotheses)
  - judge: JudgeAgent (evaluates evidence)
  - reporter: ReportAgent (generates final report)
- Manager orchestrates agents via chat client (OpenAI or HuggingFace)
- Event-driven: converts Magentic events to AgentEvent for UI streaming via the _process_event() method
- Supports max rounds, stall detection, and reset handling
Event Processing:
The orchestrator processes Magentic events and converts them to AgentEvent:
- MagenticOrchestratorMessageEvent → AgentEvent with type based on message content
- MagenticAgentMessageEvent → AgentEvent with type based on agent name
- MagenticAgentDeltaEvent → AgentEvent for streaming updates
- MagenticFinalResultEvent → AgentEvent with type "complete"
Requirements: - agent-framework-core package - OpenAI API key or HuggingFace authentication
File: src/orchestrator_hierarchical.py
Purpose: Hierarchical orchestrator using middleware and sub-teams
Features:
- Uses SubIterationMiddleware with ResearchTeam and LLMSubIterationJudge
- Adapts Magentic ChatAgent to SubIterationTeam protocol
- Event-driven via asyncio.Queue for coordination
- Supports sub-iteration patterns for complex research tasks
File: src/legacy_orchestrator.py
Purpose: Linear search-judge-synthesize loop
Features:
- Uses SearchHandlerProtocol and JudgeHandlerProtocol
- Generator-based design yielding AgentEvent objects
- Backward compatibility for simple use cases
All orchestrators must initialize workflow state:
All orchestrators yield AgentEvent objects:
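Both contracts can be sketched together as a minimal async generator, with plain dicts standing in for AgentEvent and the workflow state (names here are illustrative, not the project's actual API):

```python
from collections.abc import AsyncIterator


async def run_research(query: str) -> AsyncIterator[dict]:
    """Skeleton of the orchestrator contract: init state first, then stream events."""
    # Stands in for the real workflow-state initialization.
    state: dict = {"query": query, "evidence": []}
    yield {"type": "started", "message": f"Research started: {query}"}
    # ... search / judge / synthesize phases would yield their own events here ...
    yield {"type": "complete", "message": "Research completed", "state": state}
```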
Event Types:
- started: Research started
- searching: Search in progress
- search_complete: Search completed
- judging: Evidence evaluation in progress
- judge_complete: Evidence evaluation completed
- looping: Iteration in progress
- hypothesizing: Generating hypotheses
- analyzing: Statistical analysis in progress
- analysis_complete: Statistical analysis completed
- synthesizing: Synthesizing results
- complete: Research completed
- error: Error occurred
- streaming: Streaming update (delta events)
Event Structure:
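A sketch of what such an event object might look like, using a stdlib dataclass instead of the project's Pydantic models; field names beyond `type` are assumptions:

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass(frozen=True)
class AgentEvent:
    """Illustrative event shape; the real model lives in the project's models module."""

    type: str                     # one of the event types listed above
    message: str = ""             # human-readable status for the UI
    data: dict[str, Any] = field(default_factory=dict)  # optional payload
```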
DeepCritical provides several services for embeddings, RAG, and statistical analysis.
File: src/services/embeddings.py
Purpose: Local sentence-transformers for semantic search and deduplication
Features:
- No API Key Required: Uses local sentence-transformers models
- Async-Safe: All operations use run_in_executor() to avoid blocking the event loop
- ChromaDB Storage: In-memory vector storage for embeddings
- Deduplication: 0.9 similarity threshold by default (90% similarity = duplicate, configurable)
Model: Configurable via settings.local_embedding_model (default: all-MiniLM-L6-v2)
Methods:
- async def embed(text: str) -> list[float]: Generate embeddings (async-safe via run_in_executor())
- async def embed_batch(texts: list[str]) -> list[list[float]]: Batch embedding (more efficient)
- async def add_evidence(evidence_id: str, content: str, metadata: dict[str, Any]) -> None: Add evidence to vector store
- async def search_similar(query: str, n_results: int = 5) -> list[dict[str, Any]]: Find semantically similar evidence
- async def deduplicate(new_evidence: list[Evidence], threshold: float = 0.9) -> list[Evidence]: Remove semantically duplicate evidence
Usage:
```python
from src.services.embeddings import get_embedding_service

service = get_embedding_service()
embedding = await service.embed("text to embed")
```

File: src/services/llamaindex_rag.py
Purpose: Retrieval-Augmented Generation using LlamaIndex
Features:
- Multiple Embedding Providers: OpenAI embeddings (requires OPENAI_API_KEY) or local sentence-transformers (no API key)
- Multiple LLM Providers: HuggingFace LLM (preferred) or OpenAI LLM (fallback) for query synthesis
- ChromaDB Storage: Vector database for document storage (supports in-memory mode)
- Metadata Preservation: Preserves source, title, URL, date, authors
- Lazy Initialization: Graceful fallback if dependencies not available

Initialization Parameters:
- use_openai_embeddings: bool | None: Force OpenAI embeddings (None = auto-detect)
- use_in_memory: bool: Use in-memory ChromaDB client (useful for tests)
- oauth_token: str | None: Optional OAuth token from HuggingFace login (takes priority over env vars)

Methods:
- async def ingest_evidence(evidence: list[Evidence]) -> None: Ingest evidence into RAG
- async def retrieve(query: str, top_k: int = 5) -> list[Document]: Retrieve relevant documents
- async def query(query: str, top_k: int = 5) -> str: Query with RAG
Usage:
```python
from src.services.llamaindex_rag import get_rag_service

service = get_rag_service(
    use_openai_embeddings=False,  # Use local embeddings
    use_in_memory=True,  # Use in-memory ChromaDB
    oauth_token=token,  # Optional HuggingFace token
)
if service:
    documents = await service.retrieve("query", top_k=5)
```

File: src/services/statistical_analyzer.py
Purpose: Secure execution of AI-generated statistical code
Features:
- Modal Sandbox: Secure, isolated execution environment
- Code Generation: Generates Python code via LLM
- Library Pinning: Version-pinned libraries in SANDBOX_LIBRARIES
- Network Isolation: block_network=True by default

Libraries Available:
- pandas, numpy, scipy
- matplotlib, scikit-learn
- statsmodels

Output: AnalysisResult with:
- verdict: SUPPORTED, REFUTED, or INCONCLUSIVE
- code: Generated analysis code
- output: Execution output
- error: Error message if execution failed
Usage:
```python
from src.services.statistical_analyzer import StatisticalAnalyzer

analyzer = StatisticalAnalyzer()
result = await analyzer.analyze(
    hypothesis="Metformin reduces cancer risk",
    evidence=evidence_list,
)
```

Services use singleton patterns for lazy initialization:
EmbeddingService: Uses a global variable pattern:
LlamaIndexRAGService: Direct instantiation (no caching):
This ensures:
- Single instance per process
- Lazy initialization
- No dependencies required at import time
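A self-contained sketch of the global-variable singleton pattern described for EmbeddingService; the stub class stands in for the real service:

```python
from __future__ import annotations


class EmbeddingService:
    """Stub standing in for src/services/embeddings.EmbeddingService."""


_embedding_service: EmbeddingService | None = None  # module-level global


def get_embedding_service() -> EmbeddingService:
    # Global-variable pattern: create on first call, reuse afterwards.
    # (An @lru_cache-decorated factory achieves the same effect.)
    global _embedding_service
    if _embedding_service is None:
        _embedding_service = EmbeddingService()
    return _embedding_service
```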
Services check availability before use:
```python
from src.utils.config import settings

if settings.modal_available:
    # Use Modal sandbox
    pass

if settings.has_openai_key:
    # Use OpenAI embeddings for RAG
    pass
```

DeepCritical implements a protocol-based search tool system for retrieving evidence from multiple sources.
All tools implement the SearchTool protocol from src/tools/base.py:
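A minimal sketch of what that protocol shape might look like; the method name and signature are inferred from the surrounding docs, not copied from src/tools/base.py (in the real code, `search` is additionally wrapped with tenacity's @retry):

```python
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class SearchTool(Protocol):
    """Illustrative protocol shape; the real definition lives in src/tools/base.py."""

    async def search(self, query: str, max_results: int = 10) -> list[Any]:
        ...


class DummyTool:
    """Minimal conforming tool, showing that conformance is structural."""

    async def search(self, query: str, max_results: int = 10) -> list[Any]:
        return []  # a real tool returns Evidence objects
```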
All tools use the @retry decorator from tenacity:
Tools with API rate limits implement _rate_limit() method and use shared rate limiters from src/tools/rate_limiter.py.
Tools raise custom exceptions:
- SearchError: General search failures
- RateLimitError: Rate limit exceeded

Tools handle HTTP errors (429, 500, timeout) and return empty lists on non-critical errors (with warning logs).
Tools use preprocess_query() from src/tools/query_utils.py to:
All tools convert API responses to Evidence objects with:
- Citation: Title, URL, date, authors
- content: Evidence text
- relevance_score: 0.0-1.0 relevance score
- metadata: Additional metadata

Missing fields are handled gracefully with defaults.
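A sketch of this conversion using plain dicts; the `to_evidence` helper and the raw API field names are illustrative (the real code builds Evidence/Citation Pydantic models):

```python
from typing import Any


def to_evidence(raw: dict[str, Any]) -> dict[str, Any]:
    """Illustrative conversion with graceful defaults for missing fields."""
    return {
        "citation": {
            "title": raw.get("title", "Untitled"),
            "url": raw.get("url", ""),
            "date": raw.get("date"),           # None when missing
            "authors": raw.get("authors", []),
        },
        "content": raw.get("abstract", ""),
        # Clamp the score into the documented 0.0-1.0 range.
        "relevance_score": min(max(float(raw.get("score", 0.5)), 0.0), 1.0),
        "metadata": {k: v for k, v in raw.items() if k not in {"title", "url", "abstract"}},
    }
```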
File: src/tools/pubmed.py
API: NCBI E-utilities (ESearch → EFetch)
Rate Limiting:
- 0.34s between requests (3 req/sec without API key)
- 0.1s between requests (10 req/sec with NCBI API key)

Features:
- XML parsing with xmltodict
- Handles single vs. multiple articles
- Query preprocessing
- Evidence conversion with metadata extraction
File: src/tools/clinicaltrials.py
API: ClinicalTrials.gov API v2
Important: Uses requests library (NOT httpx) because WAF blocks httpx TLS fingerprint.
Execution: Runs in thread pool: await asyncio.to_thread(requests.get, ...)
Filtering:
- Only interventional studies
- Status: COMPLETED, ACTIVE_NOT_RECRUITING, RECRUITING, ENROLLING_BY_INVITATION

Features:
- Parses nested JSON structure
- Extracts trial metadata
- Evidence conversion
File: src/tools/europepmc.py
API: Europe PMC REST API
Features:
- Handles preprint markers: [PREPRINT - Not peer-reviewed]
- Builds URLs from DOI or PMID
- Checks pubTypeList for preprint detection
- Includes both preprints and peer-reviewed articles
File: src/tools/rag_tool.py
Purpose: Semantic search within collected evidence
Implementation: Wraps LlamaIndexRAGService
Features:
- Returns Evidence from RAG results
- Handles evidence ingestion
- Semantic similarity search
- Metadata preservation
File: src/tools/search_handler.py
Purpose: Orchestrates parallel searches across multiple tools
Initialization Parameters:
- tools: list[SearchTool]: List of search tools to use
- timeout: float = 30.0: Timeout for each search in seconds
- include_rag: bool = False: Whether to include RAG tool in searches
- auto_ingest_to_rag: bool = True: Whether to automatically ingest results into RAG
- oauth_token: str | None = None: Optional OAuth token from HuggingFace login (for RAG LLM)

Methods:
- async def execute(query: str, max_results_per_tool: int = 10) -> SearchResult: Execute search across all tools in parallel

Features:
- Uses asyncio.gather() with return_exceptions=True for parallel execution
- Aggregates results into SearchResult with evidence and metadata
- Handles tool failures gracefully (continues with other tools)
- Deduplicates results by URL
- Automatically ingests results into RAG if auto_ingest_to_rag=True
- Can add RAG tool dynamically via add_rag_tool() method
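The failure-tolerant fan-out can be sketched as follows; this is a simplified stand-in for SearchHandler.execute() (the real handler also logs warnings, deduplicates by URL, and builds a SearchResult):

```python
import asyncio


async def search_all(tools, query: str) -> list:
    """Run all tools in parallel; a failing tool is skipped, not fatal."""
    results = await asyncio.gather(
        *(tool.search(query) for tool in tools),
        return_exceptions=True,  # a failing tool yields its exception, not a crash
    )
    evidence: list = []
    for tool, result in zip(tools, results):
        if isinstance(result, Exception):
            # The real handler logs a warning here and continues with other tools.
            continue
        evidence.extend(result)
    return evidence
```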
Tools are registered in the search handler:
```python
from src.tools.pubmed import PubMedTool
from src.tools.clinicaltrials import ClinicalTrialsTool
from src.tools.europepmc import EuropePMCTool
from src.tools.search_handler import SearchHandler

search_handler = SearchHandler(
    tools=[
        PubMedTool(),
        ClinicalTrialsTool(),
        EuropePMCTool(),
    ],
    include_rag=True,  # Include RAG tool for semantic search
    auto_ingest_to_rag=True,  # Automatically ingest results into RAG
    oauth_token=token,  # Optional HuggingFace token for RAG LLM
)

# Execute search
result = await search_handler.execute("query", max_results_per_tool=10)
```

Architecture Pattern: Microsoft Magentic Orchestration
Design Philosophy: Simple, dynamic, manager-driven coordination
Key Innovation: Intelligent manager replaces rigid sequential phases
```mermaid
flowchart TD
    Start([User Query]) --> Manager[Magentic Manager<br/>Plan • Select • Assess • Adapt]

    Manager -->|Plans| Task1[Task Decomposition]
    Task1 --> Manager

    Manager -->|Selects & Executes| HypAgent[Hypothesis Agent]
    Manager -->|Selects & Executes| SearchAgent[Search Agent]
    Manager -->|Selects & Executes| AnalysisAgent[Analysis Agent]
    Manager -->|Selects & Executes| ReportAgent[Report Agent]

    HypAgent -->|Results| Manager
    SearchAgent -->|Results| Manager
    AnalysisAgent -->|Results| Manager
    ReportAgent -->|Results| Manager

    Manager -->|Assesses Quality| Decision{Good Enough?}
    Decision -->|No - Refine| Manager
    Decision -->|No - Different Agent| Manager
    Decision -->|No - Stalled| Replan[Reset Plan]
    Replan --> Manager

    Decision -->|Yes| Synthesis[Synthesize Final Result]
    Synthesis --> Output([Research Report])

    style Start fill:#e1f5e1
    style Manager fill:#ffe6e6
    style HypAgent fill:#fff4e6
    style SearchAgent fill:#fff4e6
    style AnalysisAgent fill:#fff4e6
    style ReportAgent fill:#fff4e6
    style Decision fill:#ffd6d6
    style Synthesis fill:#d4edda
    style Output fill:#e1f5e1
```

```mermaid
flowchart LR
    P1[1. Planning<br/>Analyze task<br/>Create strategy] --> P2[2. Agent Selection<br/>Pick best agent<br/>for subtask]
    P2 --> P3[3. Execution<br/>Run selected<br/>agent with tools]
    P3 --> P4[4. Assessment<br/>Evaluate quality<br/>Check progress]
    P4 --> Decision{Quality OK?<br/>Progress made?}
    Decision -->|Yes| P6[6. Synthesis<br/>Combine results<br/>Generate report]
    Decision -->|No| P5[5. Iteration<br/>Adjust plan<br/>Try again]
    P5 --> P2
    P6 --> Done([Complete])

    style P1 fill:#fff4e6
    style P2 fill:#ffe6e6
    style P3 fill:#e6f3ff
    style P4 fill:#ffd6d6
    style P5 fill:#fff3cd
    style P6 fill:#d4edda
    style Done fill:#e1f5e1
```

```mermaid
graph TB
    subgraph "Orchestration Layer"
        Manager[Magentic Manager<br/>• Plans workflow<br/>• Selects agents<br/>• Assesses quality<br/>• Adapts strategy]
        SharedContext[(Shared Context<br/>• Hypotheses<br/>• Search Results<br/>• Analysis<br/>• Progress)]
        Manager <--> SharedContext
    end

    subgraph "Specialist Agents"
        HypAgent[Hypothesis Agent<br/>• Domain understanding<br/>• Hypothesis generation<br/>• Testability refinement]
        SearchAgent[Search Agent<br/>• Multi-source search<br/>• RAG retrieval<br/>• Result ranking]
        AnalysisAgent[Analysis Agent<br/>• Evidence extraction<br/>• Statistical analysis<br/>• Code execution]
        ReportAgent[Report Agent<br/>• Report assembly<br/>• Visualization<br/>• Citation formatting]
    end

    subgraph "MCP Tools"
        WebSearch[Web Search<br/>PubMed • arXiv • bioRxiv]
        CodeExec[Code Execution<br/>Sandboxed Python]
        RAG[RAG Retrieval<br/>Vector DB • Embeddings]
        Viz[Visualization<br/>Charts • Graphs]
    end

    Manager -->|Selects & Directs| HypAgent
    Manager -->|Selects & Directs| SearchAgent
    Manager -->|Selects & Directs| AnalysisAgent
    Manager -->|Selects & Directs| ReportAgent

    HypAgent --> SharedContext
    SearchAgent --> SharedContext
    AnalysisAgent --> SharedContext
    ReportAgent --> SharedContext

    SearchAgent --> WebSearch
    SearchAgent --> RAG
    AnalysisAgent --> CodeExec
    ReportAgent --> CodeExec
    ReportAgent --> Viz

    style Manager fill:#ffe6e6
    style SharedContext fill:#ffe6f0
    style HypAgent fill:#fff4e6
    style SearchAgent fill:#fff4e6
    style AnalysisAgent fill:#fff4e6
    style ReportAgent fill:#fff4e6
    style WebSearch fill:#e6f3ff
    style CodeExec fill:#e6f3ff
    style RAG fill:#e6f3ff
    style Viz fill:#e6f3ff
```

```mermaid
sequenceDiagram
    participant User
    participant Manager
    participant HypAgent
    participant SearchAgent
    participant AnalysisAgent
    participant ReportAgent

    User->>Manager: "Research protein folding in Alzheimer's"

    Note over Manager: PLAN: Generate hypotheses → Search → Analyze → Report

    Manager->>HypAgent: Generate 3 hypotheses
    HypAgent-->>Manager: Returns 3 hypotheses
    Note over Manager: ASSESS: Good quality, proceed

    Manager->>SearchAgent: Search literature for hypothesis 1
    SearchAgent-->>Manager: Returns 15 papers
    Note over Manager: ASSESS: Good results, continue

    Manager->>SearchAgent: Search for hypothesis 2
    SearchAgent-->>Manager: Only 2 papers found
    Note over Manager: ASSESS: Insufficient, refine search

    Manager->>SearchAgent: Refined query for hypothesis 2
    SearchAgent-->>Manager: Returns 12 papers
    Note over Manager: ASSESS: Better, proceed

    Manager->>AnalysisAgent: Analyze evidence for all hypotheses
    AnalysisAgent-->>Manager: Returns analysis with code
    Note over Manager: ASSESS: Complete, generate report

    Manager->>ReportAgent: Create comprehensive report
    ReportAgent-->>Manager: Returns formatted report
    Note over Manager: SYNTHESIZE: Combine all results

    Manager->>User: Final Research Report
```

```mermaid
flowchart TD
    Start([Manager Receives Task]) --> Plan[Create Initial Plan]

    Plan --> Select[Select Agent for Next Subtask]
    Select --> Execute[Execute Agent]
    Execute --> Collect[Collect Results]

    Collect --> Assess[Assess Quality & Progress]

    Assess --> Q1{Quality Sufficient?}
    Q1 -->|No| Q2{Same Agent Can Fix?}
    Q2 -->|Yes| Feedback[Provide Specific Feedback]
    Feedback --> Execute
    Q2 -->|No| Different[Try Different Agent]
    Different --> Select

    Q1 -->|Yes| Q3{Task Complete?}
    Q3 -->|No| Q4{Making Progress?}
    Q4 -->|Yes| Select
    Q4 -->|No - Stalled| Replan[Reset Plan & Approach]
    Replan --> Plan

    Q3 -->|Yes| Synth[Synthesize Final Result]
    Synth --> Done([Return Report])

    style Start fill:#e1f5e1
    style Plan fill:#fff4e6
    style Select fill:#ffe6e6
    style Execute fill:#e6f3ff
    style Assess fill:#ffd6d6
    style Q1 fill:#ffe6e6
    style Q2 fill:#ffe6e6
    style Q3 fill:#ffe6e6
    style Q4 fill:#ffe6e6
    style Synth fill:#d4edda
    style Done fill:#e1f5e1
```

```mermaid
flowchart LR
    Input[Research Query] --> Domain[Identify Domain<br/>& Key Concepts]
    Domain --> Context[Retrieve Background<br/>Knowledge]
    Context --> Generate[Generate 3-5<br/>Initial Hypotheses]
    Generate --> Refine[Refine for<br/>Testability]
    Refine --> Rank[Rank by<br/>Quality Score]
    Rank --> Output[Return Top<br/>Hypotheses]

    Output --> Struct[Hypothesis Structure:<br/>• Statement<br/>• Rationale<br/>• Testability Score<br/>• Data Requirements<br/>• Expected Outcomes]

    style Input fill:#e1f5e1
    style Output fill:#fff4e6
    style Struct fill:#e6f3ff
```

```mermaid
flowchart TD
    Input[Hypotheses] --> Strategy[Formulate Search<br/>Strategy per Hypothesis]

    Strategy --> Multi[Multi-Source Search]

    Multi --> PubMed[PubMed Search<br/>via MCP]
    Multi --> ArXiv[arXiv Search<br/>via MCP]
    Multi --> BioRxiv[bioRxiv Search<br/>via MCP]

    PubMed --> Aggregate[Aggregate Results]
    ArXiv --> Aggregate
    BioRxiv --> Aggregate

    Aggregate --> Filter[Filter & Rank<br/>by Relevance]
    Filter --> Dedup[Deduplicate<br/>Cross-Reference]
    Dedup --> Embed[Embed Documents<br/>via MCP]
    Embed --> Vector[(Vector DB)]
    Vector --> RAGRetrieval[RAG Retrieval<br/>Top-K per Hypothesis]
    RAGRetrieval --> Output[Return Contextualized<br/>Search Results]

    style Input fill:#fff4e6
    style Multi fill:#ffe6e6
    style Vector fill:#ffe6f0
    style Output fill:#e6f3ff
```

```mermaid
flowchart TD
    Input1[Hypotheses] --> Extract
    Input2[Search Results] --> Extract[Extract Evidence<br/>per Hypothesis]

    Extract --> Methods[Determine Analysis<br/>Methods Needed]

    Methods --> Branch{Requires<br/>Computation?}
    Branch -->|Yes| GenCode[Generate Python<br/>Analysis Code]
    Branch -->|No| Qual[Qualitative<br/>Synthesis]

    GenCode --> Execute[Execute Code<br/>via MCP Sandbox]
    Execute --> Interpret1[Interpret<br/>Results]
    Qual --> Interpret2[Interpret<br/>Findings]

    Interpret1 --> Synthesize[Synthesize Evidence<br/>Across Sources]
    Interpret2 --> Synthesize

    Synthesize --> Verdict[Determine Verdict<br/>per Hypothesis]
    Verdict --> Support[• Supported<br/>• Refuted<br/>• Inconclusive]
    Support --> Gaps[Identify Knowledge<br/>Gaps & Limitations]
    Gaps --> Output[Return Analysis<br/>Report]

    style Input1 fill:#fff4e6
    style Input2 fill:#e6f3ff
    style Execute fill:#ffe6e6
    style Output fill:#e6ffe6
```

```mermaid
flowchart TD
    Input1[Query] --> Assemble
    Input2[Hypotheses] --> Assemble
    Input3[Search Results] --> Assemble
    Input4[Analysis] --> Assemble[Assemble Report<br/>Sections]

    Assemble --> Exec[Executive Summary]
    Assemble --> Intro[Introduction]
    Assemble --> Methods[Methods]
    Assemble --> Results[Results per<br/>Hypothesis]
    Assemble --> Discussion[Discussion]
    Assemble --> Future[Future Directions]
    Assemble --> Refs[References]

    Results --> VizCheck{Needs<br/>Visualization?}
    VizCheck -->|Yes| GenViz[Generate Viz Code]
    GenViz --> ExecViz[Execute via MCP<br/>Create Charts]
    ExecViz --> Combine
    VizCheck -->|No| Combine[Combine All<br/>Sections]

    Exec --> Combine
    Intro --> Combine
    Methods --> Combine
    Discussion --> Combine
    Future --> Combine
    Refs --> Combine

    Combine --> Format[Format Output]
    Format --> MD[Markdown]
    Format --> PDF[PDF]
    Format --> JSON[JSON]

    MD --> Output[Return Final<br/>Report]
    PDF --> Output
    JSON --> Output

    style Input1 fill:#e1f5e1
    style Input2 fill:#fff4e6
    style Input3 fill:#e6f3ff
    style Input4 fill:#e6ffe6
    style Output fill:#d4edda
```

```mermaid
flowchart TD
    User[👤 User] -->|Research Query| UI[Gradio UI]
    UI -->|Submit| Manager[Magentic Manager]

    Manager -->|Event: Planning| UI
    Manager -->|Select Agent| HypAgent[Hypothesis Agent]
    HypAgent -->|Event: Delta/Message| UI
    HypAgent -->|Hypotheses| Context[(Shared Context)]

    Context -->|Retrieved by| Manager
    Manager -->|Select Agent| SearchAgent[Search Agent]
    SearchAgent -->|MCP Request| WebSearch[Web Search Tool]
    WebSearch -->|Results| SearchAgent
    SearchAgent -->|Event: Delta/Message| UI
    SearchAgent -->|Documents| Context
    SearchAgent -->|Embeddings| VectorDB[(Vector DB)]

    Context -->|Retrieved by| Manager
    Manager -->|Select Agent| AnalysisAgent[Analysis Agent]
    AnalysisAgent -->|MCP Request| CodeExec[Code Execution Tool]
    CodeExec -->|Results| AnalysisAgent
    AnalysisAgent -->|Event: Delta/Message| UI
    AnalysisAgent -->|Analysis| Context

    Context -->|Retrieved by| Manager
    Manager -->|Select Agent| ReportAgent[Report Agent]
    ReportAgent -->|MCP Request| CodeExec
    ReportAgent -->|Event: Delta/Message| UI
    ReportAgent -->|Report| Context

    Manager -->|Event: Final Result| UI
    UI -->|Display| User

    style User fill:#e1f5e1
    style UI fill:#e6f3ff
    style Manager fill:#ffe6e6
    style Context fill:#ffe6f0
    style VectorDB fill:#ffe6f0
    style WebSearch fill:#f0f0f0
    style CodeExec fill:#f0f0f0
```

```mermaid
graph TB
    subgraph "Agent Layer"
        Manager[Magentic Manager]
        HypAgent[Hypothesis Agent]
        SearchAgent[Search Agent]
        AnalysisAgent[Analysis Agent]
        ReportAgent[Report Agent]
    end

    subgraph "MCP Protocol Layer"
        Registry[MCP Tool Registry<br/>• Discovers tools<br/>• Routes requests<br/>• Manages connections]
    end

    subgraph "MCP Servers"
        Server1[Web Search Server<br/>localhost:8001<br/>• PubMed<br/>• arXiv<br/>• bioRxiv]
        Server2[Code Execution Server<br/>localhost:8002<br/>• Sandboxed Python<br/>• Package management]
        Server3[RAG Server<br/>localhost:8003<br/>• Vector embeddings<br/>• Similarity search]
        Server4[Visualization Server<br/>localhost:8004<br/>• Chart generation<br/>• Plot rendering]
    end

    subgraph "External Services"
        PubMed[PubMed API]
        ArXiv[arXiv API]
        BioRxiv[bioRxiv API]
        Modal[Modal Sandbox]
        ChromaDB[(ChromaDB)]
    end

    SearchAgent -->|Request| Registry
    AnalysisAgent -->|Request| Registry
    ReportAgent -->|Request| Registry

    Registry --> Server1
    Registry --> Server2
    Registry --> Server3
    Registry --> Server4

    Server1 --> PubMed
    Server1 --> ArXiv
    Server1 --> BioRxiv
    Server2 --> Modal
    Server3 --> ChromaDB

    style Manager fill:#ffe6e6
    style Registry fill:#fff4e6
    style Server1 fill:#e6f3ff
    style Server2 fill:#e6f3ff
    style Server3 fill:#e6f3ff
    style Server4 fill:#e6f3ff
```

```mermaid
stateDiagram-v2
    [*] --> Initialization: User Query

    Initialization --> Planning: Manager starts

    Planning --> AgentExecution: Select agent

    AgentExecution --> Assessment: Collect results

    Assessment --> QualityCheck: Evaluate output

    QualityCheck --> AgentExecution: Poor quality<br/>(retry < max_rounds)
    QualityCheck --> Planning: Poor quality<br/>(try different agent)
    QualityCheck --> NextAgent: Good quality<br/>(task incomplete)
    QualityCheck --> Synthesis: Good quality<br/>(task complete)

    NextAgent --> AgentExecution: Select next agent

    state StallDetection <<choice>>
    Assessment --> StallDetection: Check progress
    StallDetection --> Planning: No progress<br/>(stall count < max)
    StallDetection --> ErrorRecovery: No progress<br/>(max stalls reached)

    ErrorRecovery --> PartialReport: Generate partial results
    PartialReport --> [*]

    Synthesis --> FinalReport: Combine all outputs
    FinalReport --> [*]

    note right of QualityCheck
        Manager assesses:
        • Output completeness
        • Quality metrics
        • Progress made
    end note

    note right of StallDetection
        Stall = no new progress
        after agent execution
        Triggers plan reset
    end note
```

```mermaid
graph TD
    App[Gradio App<br/>DeepCritical Research Agent]

    App --> Input[Input Section]
    App --> Status[Status Section]
    App --> Output[Output Section]

    Input --> Query[Research Question<br/>Text Area]
    Input --> Controls[Controls]
    Controls --> MaxHyp[Max Hypotheses: 1-10]
    Controls --> MaxRounds[Max Rounds: 5-20]
    Controls --> Submit[Start Research Button]

    Status --> Log[Real-time Event Log<br/>• Manager planning<br/>• Agent selection<br/>• Execution updates<br/>• Quality assessment]
    Status --> Progress[Progress Tracker<br/>• Current agent<br/>• Round count<br/>• Stall count]

    Output --> Tabs[Tabbed Results]
    Tabs --> Tab1[Hypotheses Tab<br/>Generated hypotheses with scores]
    Tabs --> Tab2[Search Results Tab<br/>Papers & sources found]
    Tabs --> Tab3[Analysis Tab<br/>Evidence & verdicts]
    Tabs --> Tab4[Report Tab<br/>Final research report]
    Tab4 --> Download[Download Report<br/>MD / PDF / JSON]

    Submit -.->|Triggers| Workflow[Magentic Workflow]
    Workflow -.->|MagenticOrchestratorMessageEvent| Log
    Workflow -.->|MagenticAgentDeltaEvent| Log
    Workflow -.->|MagenticAgentMessageEvent| Log
    Workflow -.->|MagenticFinalResultEvent| Tab4

    style App fill:#e1f5e1
    style Input fill:#fff4e6
    style Status fill:#e6f3ff
    style Output fill:#e6ffe6
    style Workflow fill:#ffe6e6
```

```mermaid
graph LR
    User[👤 Researcher<br/>Asks research questions] -->|Submits query| DC[DeepCritical<br/>Magentic Workflow]

    DC -->|Literature search| PubMed[PubMed API<br/>Medical papers]
    DC -->|Preprint search| ArXiv[arXiv API<br/>Scientific preprints]
    DC -->|Biology search| BioRxiv[bioRxiv API<br/>Biology preprints]
    DC -->|Agent reasoning| Claude[Claude API<br/>Sonnet 4 / Opus]
    DC -->|Code execution| Modal[Modal Sandbox<br/>Safe Python env]
    DC -->|Vector storage| Chroma[ChromaDB<br/>Embeddings & RAG]

    DC -->|Deployed on| HF[HuggingFace Spaces<br/>Gradio 6.0]

    PubMed -->|Results| DC
    ArXiv -->|Results| DC
    BioRxiv -->|Results| DC
    Claude -->|Responses| DC
    Modal -->|Output| DC
    Chroma -->|Context| DC

    DC -->|Research report| User

    style User fill:#e1f5e1
    style DC fill:#ffe6e6
    style PubMed fill:#e6f3ff
    style ArXiv fill:#e6f3ff
    style BioRxiv fill:#e6f3ff
    style Claude fill:#ffd6d6
    style Modal fill:#f0f0f0
    style Chroma fill:#ffe6f0
    style HF fill:#d4edda
```

```mermaid
gantt
    title DeepCritical Magentic Workflow - Typical Execution
    dateFormat mm:ss
    axisFormat %M:%S

    section Manager Planning
    Initial planning :p1, 00:00, 10s

    section Hypothesis Agent
    Generate hypotheses :h1, after p1, 30s
    Manager assessment :h2, after h1, 5s

    section Search Agent
    Search hypothesis 1 :s1, after h2, 20s
    Search hypothesis 2 :s2, after s1, 20s
    Search hypothesis 3 :s3, after s2, 20s
    RAG processing :s4, after s3, 15s
    Manager assessment :s5, after s4, 5s

    section Analysis Agent
    Evidence extraction :a1, after s5, 15s
    Code generation :a2, after a1, 20s
    Code execution :a3, after a2, 25s
    Synthesis :a4, after a3, 20s
    Manager assessment :a5, after a4, 5s

    section Report Agent
    Report assembly :r1, after a5, 30s
    Visualization :r2, after r1, 15s
    Formatting :r3, after r2, 10s

    section Manager Synthesis
    Final synthesis :f1, after r3, 10s
```

| Aspect | Original (Judge-in-Loop) | New (Magentic) |
|---|---|---|
| Control Flow | Fixed sequential phases | Dynamic agent selection |
| Quality Control | Separate Judge Agent | Manager assessment built-in |
| Retry Logic | Phase-level with feedback | Agent-level with adaptation |
| Flexibility | Rigid 4-phase pipeline | Adaptive workflow |
| Complexity | 5 agents (including Judge) | 4 agents (no Judge) |
| Progress Tracking | Manual state management | Built-in round/stall detection |
| Agent Coordination | Sequential handoff | Manager-driven dynamic selection |
| Error Recovery | Retry same phase | Try different agent or replan |
Simple 4-Agent Setup:
Manager handles quality assessment in its instructions:
- Checks hypothesis quality (testable, novel, clear)
- Validates search results (relevant, authoritative, recent)
- Assesses analysis soundness (methodology, evidence, conclusions)
- Ensures report completeness (all sections, proper citations)
No separate Judge Agent needed - manager does it all!
Document Version: 2.0 (Magentic Simplified)
Last Updated: 2025-11-24
Architecture: Microsoft Magentic Orchestration Pattern
Agents: 4 (Hypothesis, Search, Analysis, Report) + 1 Manager
License: MIT