# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview
DeepBoner is an AI-native sexual health research agent. It uses a search-and-judge loop to autonomously search biomedical databases (PubMed, ClinicalTrials.gov, Europe PMC) and synthesize evidence for queries like "What drugs improve female libido post-menopause?" or "Evidence for testosterone therapy in women with HSDD?".
Current Status: Phases 1-14 COMPLETE (Foundation through Demo Submission).
## Development Commands

```bash
# Install all dependencies (including dev)
make install    # or: uv sync --all-extras && uv run pre-commit install

# Run all quality checks (lint + typecheck + test) - MUST PASS BEFORE COMMIT
make check

# Individual commands
make test       # uv run pytest tests/unit/ -v
make lint       # uv run ruff check src tests
make format     # uv run ruff format src tests
make typecheck  # uv run mypy src
make test-cov   # uv run pytest --cov=src --cov-report=term-missing

# Run single test
uv run pytest tests/unit/utils/test_config.py::TestSettings::test_default_max_iterations -v

# Integration tests (real APIs)
uv run pytest -m integration
```
## Architecture

Pattern: Search-and-judge loop with multi-tool orchestration.

```
User Question → Orchestrator
      ↓
Search Loop:
  1. Query PubMed, ClinicalTrials.gov, Europe PMC
  2. Gather evidence
  3. Judge quality ("Do we have enough?")
  4. If NO → Refine query, search more
  5. If YES → Synthesize findings (+ optional Modal analysis)
      ↓
Research Report with Citations
```
Key Components:

- `src/orchestrators/` - Orchestrator package (simple, advanced, langgraph modes)
  - `simple.py` - Main search-and-judge loop
  - `advanced.py` - Multi-agent Magentic mode
  - `langgraph_orchestrator.py` - LangGraph-based workflow
- `src/tools/pubmed.py` - PubMed E-utilities search
- `src/tools/clinicaltrials.py` - ClinicalTrials.gov API
- `src/tools/europepmc.py` - Europe PMC search
- `src/tools/code_execution.py` - Modal sandbox execution
- `src/tools/search_handler.py` - Scatter-gather orchestration
- `src/services/embeddings.py` - Local embeddings (sentence-transformers, in-memory)
- `src/services/llamaindex_rag.py` - Premium embeddings (OpenAI, persistent ChromaDB)
- `src/services/embedding_protocol.py` - Protocol interface for embedding services (see the sketch after this list)
- `src/services/research_memory.py` - Shared memory layer for research state
- `src/services/statistical_analyzer.py` - Statistical analysis via Modal
- `src/utils/service_loader.py` - Tiered service selection (free vs premium)
- `src/agent_factory/judges.py` - LLM-based evidence assessment
- `src/agents/` - Magentic multi-agent mode (SearchAgent, JudgeAgent, etc.)
- `src/mcp_tools.py` - MCP tool wrappers for Claude Desktop
- `src/utils/config.py` - Pydantic Settings (loads from `.env`)
- `src/utils/models.py` - Evidence, Citation, SearchResult models
- `src/utils/exceptions.py` - Exception hierarchy
- `src/app.py` - Gradio UI with MCP server (HuggingFace Spaces)
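A hypothetical sketch of how the embedding Protocol and the tiered (free vs premium) selection could fit together. All class and function names below are illustrative stand-ins, not the actual APIs of `src/services/embedding_protocol.py` or `src/utils/service_loader.py`:

```python
# Illustrative sketch of a Protocol-based embedding interface with tiered selection.
from typing import Protocol


class EmbeddingService(Protocol):
    def embed(self, texts: list[str]) -> list[list[float]]:
        """Return one embedding vector per input text."""
        ...


class LocalEmbeddings:
    """Free tier stand-in: sentence-transformers, in-memory."""

    def embed(self, texts: list[str]) -> list[list[float]]:
        return [[0.0] for _ in texts]  # placeholder vectors


class OpenAIEmbeddings:
    """Premium tier stand-in: OpenAI embeddings with persistent ChromaDB."""

    def __init__(self, api_key: str) -> None:
        self.api_key = api_key

    def embed(self, texts: list[str]) -> list[list[float]]:
        return [[0.0] for _ in texts]  # placeholder vectors


def load_embedding_service(openai_api_key: str | None) -> EmbeddingService:
    """Pick the premium service when a key is configured, else the free tier."""
    if openai_api_key:
        return OpenAIEmbeddings(api_key=openai_api_key)
    return LocalEmbeddings()
```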
Break Conditions: Judge approval, token budget (50K max), or max iterations (default 10).
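A minimal sketch of the search-and-judge loop and its break conditions, assuming hypothetical callables for search, judging, and synthesis; the real implementation lives in `src/orchestrators/simple.py`:

```python
# Illustrative loop skeleton; helper callables and the Verdict shape are assumptions.
from dataclasses import dataclass
from typing import Callable


@dataclass
class Verdict:
    sufficient: bool
    refined_query: str
    tokens_used: int


@dataclass
class ResearchLoop:
    search: Callable[[str], list]           # query -> evidence items
    judge: Callable[[str, list], Verdict]   # (question, evidence) -> Verdict
    synthesize: Callable[[str, list], str]  # (question, evidence) -> report text
    max_iterations: int = 10
    token_budget: int = 50_000

    def run(self, question: str) -> str:
        query = question
        evidence: list = []
        tokens_used = 0
        for _ in range(self.max_iterations):
            evidence.extend(self.search(query))       # 1-2. scatter-gather search
            verdict = self.judge(question, evidence)  # 3. judge: "do we have enough?"
            tokens_used += verdict.tokens_used
            if verdict.sufficient or tokens_used >= self.token_budget:
                break                                 # break conditions
            query = verdict.refined_query             # 4. refine query, search more
        return self.synthesize(question, evidence)    # 5. synthesize cited report
```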
## Configuration

Settings via pydantic-settings from `.env`:

- `LLM_PROVIDER`: "openai" or "anthropic"
- `OPENAI_API_KEY` / `ANTHROPIC_API_KEY`: LLM keys
- `NCBI_API_KEY`: Optional, for higher PubMed rate limits
- `MODAL_TOKEN_ID` / `MODAL_TOKEN_SECRET`: For Modal sandbox (optional)
- `MAX_ITERATIONS`: 1-50, default 10
- `LOG_LEVEL`: DEBUG, INFO, WARNING, ERROR
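A minimal pydantic-settings sketch mirroring the variables above (field names and defaults are assumptions; the real class lives in `src/utils/config.py`):

```python
# Sketch of a pydantic-settings class matching the env vars listed above.
from typing import Literal

from pydantic import Field
from pydantic_settings import BaseSettings, SettingsConfigDict


class Settings(BaseSettings):
    model_config = SettingsConfigDict(env_file=".env", extra="ignore")

    llm_provider: Literal["openai", "anthropic"] = "openai"
    openai_api_key: str | None = None
    anthropic_api_key: str | None = None
    ncbi_api_key: str | None = None
    modal_token_id: str | None = None
    modal_token_secret: str | None = None
    max_iterations: int = Field(default=10, ge=1, le=50)
    log_level: Literal["DEBUG", "INFO", "WARNING", "ERROR"] = "INFO"


settings = Settings()  # values are read from the environment / .env
```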
## Exception Hierarchy

```
DeepBonerError (base)
├── SearchError
│   └── RateLimitError
├── JudgeError
├── ConfigurationError
└── EmbeddingError
```
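The tree maps onto plain exception classes roughly like this sketch (docstrings are illustrative; the real definitions are in `src/utils/exceptions.py`):

```python
# Illustrative class definitions mirroring the hierarchy above.
class DeepBonerError(Exception):
    """Base error for the project."""


class SearchError(DeepBonerError):
    """Raised when a biomedical search backend fails."""


class RateLimitError(SearchError):
    """Raised when a search API reports rate limiting."""


class JudgeError(DeepBonerError):
    """Raised when the LLM judge cannot assess the evidence."""


class ConfigurationError(DeepBonerError):
    """Raised for invalid or missing settings."""


class EmbeddingError(DeepBonerError):
    """Raised when an embedding service fails."""
```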
## Testing

- TDD: Write tests first in `tests/unit/`, implement in `src/`
- Markers: `unit`, `integration`, `slow`
- Mocking: `respx` for httpx, `pytest-mock` for general mocking
- Fixtures: `tests/conftest.py` has `mock_httpx_client`, `mock_llm_response`
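A hypothetical unit test illustrating the marker and `respx` conventions (the stubbed endpoint and the test itself are illustrative, not an existing test in the suite):

```python
# Illustrative respx-mocked unit test; the PubMed esearch URL and payload are assumptions.
import httpx
import pytest
import respx


@pytest.mark.unit
@respx.mock
def test_pubmed_esearch_stub_returns_ids() -> None:
    # Stub the PubMed E-utilities esearch endpoint
    route = respx.get("https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi").mock(
        return_value=httpx.Response(
            200, json={"esearchresult": {"idlist": ["12345", "67890"]}}
        )
    )

    resp = httpx.get(
        "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi",
        params={"db": "pubmed", "term": "HSDD testosterone"},
    )

    assert route.called
    assert resp.json()["esearchresult"]["idlist"] == ["12345", "67890"]
```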
## LLM Model Defaults (November 2025)

As of November 29, 2025, DeepBoner uses the following default LLM models in its configuration (`src/utils/config.py`):

- OpenAI: `gpt-5` - current flagship model (November 2025); requires Tier 5 access.
- Anthropic: `claude-sonnet-4-5-20250929` - the mid-range Claude 4.5 model, released September 29, 2025. The flagship Claude Opus 4.5 (released November 24, 2025) is also available and can be configured by advanced users for enhanced capabilities.
- HuggingFace (free tier): `meta-llama/Llama-3.1-70B-Instruct` - the default for the free tier, subject to quota limits.

Keep these defaults updated as the LLM landscape evolves.
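For illustration, a hypothetical provider-to-default-model mapping matching the list above (the actual defaults live in `src/utils/config.py`):

```python
# Illustrative default-model lookup; names mirror the list above.
DEFAULT_MODELS = {
    "openai": "gpt-5",
    "anthropic": "claude-sonnet-4-5-20250929",
    "huggingface": "meta-llama/Llama-3.1-70B-Instruct",
}


def default_model(provider: str) -> str:
    """Return the default model for a provider, falling back to the free tier."""
    return DEFAULT_MODELS.get(provider, DEFAULT_MODELS["huggingface"])
```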
## Git Workflow

- `main`: Production-ready (GitHub)
- `dev`: Development integration (GitHub)
- Remote `origin`: GitHub (source of truth for PRs/code review)
- Remote `huggingface-upstream`: HuggingFace Spaces (deployment target)

HuggingFace Spaces Collaboration:

- Each contributor should use their own dev branch: `yourname-dev` (e.g., `vcms-dev`, `mario-dev`)
- DO NOT push directly to `main` or `dev` on HuggingFace - these can be overwritten easily
- GitHub is the source of truth; HuggingFace is for deployment/demo
- Consider using git hooks to prevent accidental pushes to protected branches (see the sketch below)
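A minimal sketch of such a pre-push hook, assuming the remote and branch names above; save it as `.git/hooks/pre-push` and make it executable:

```python
#!/usr/bin/env python3
# Illustrative pre-push hook: refuse to push protected branches to the
# huggingface-upstream remote. Git passes the remote name as argv[1] and
# feeds "<local ref> <local sha> <remote ref> <remote sha>" lines on stdin.
import sys

PROTECTED = {"refs/heads/main", "refs/heads/dev"}
HF_REMOTE = "huggingface-upstream"

remote_name = sys.argv[1] if len(sys.argv) > 1 else ""

for line in sys.stdin:
    parts = line.split()
    if len(parts) >= 3 and remote_name == HF_REMOTE and parts[2] in PROTECTED:
        print(f"Refusing to push {parts[2]} to {HF_REMOTE}; use your own *-dev branch.")
        sys.exit(1)

sys.exit(0)
```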