Current GAIA Multi-Agent Framework Architecture
This document summarizes the architecture of the GAIA multi-agent framework based on the provided Python source files.
Core Framework
- Technology: The system is built on the `llama_index.core.agent.workflow.AgentWorkflow` class from the LlamaIndex library.
- Orchestration: `app.py` serves as the main entry point. It initializes a Gradio web interface, fetches benchmark questions from a specified API endpoint, manages file handling (text, image, audio) associated with questions, runs the agent workflow for each question, and submits the answers back to the API.
- Root Agent: The workflow designates `planner_agent` as the `root_agent`, meaning it receives the initial user request (question) and orchestrates the subsequent steps.
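The orchestration loop described for `app.py` (fetch questions, run the workflow on each, submit the answers) can be sketched with the standard library alone. This is a minimal illustration, not the project's code: the function `run_benchmark`, its three callable parameters, and the field names `task_id`, `question`, and `submitted_answer` are all assumptions made for the example.

```python
from typing import Callable, Dict, List

Question = Dict[str, str]
Answer = Dict[str, str]


def run_benchmark(
    fetch_questions: Callable[[], List[Question]],
    run_workflow: Callable[[str], str],
    submit_answers: Callable[[List[Answer]], None],
) -> List[Answer]:
    """Fetch questions, answer each one with the workflow, then submit all answers."""
    answers: List[Answer] = []
    for item in fetch_questions():
        try:
            result = run_workflow(item["question"])
        except Exception as exc:
            # Record the failure and keep going so one bad question
            # does not abort the whole run.
            result = f"ERROR: {exc}"
        # Field names here are hypothetical, not taken from the real API.
        answers.append({"task_id": item["task_id"], "submitted_answer": result})
    submit_answers(answers)
    return answers


# Usage with in-memory stand-ins for the API and the agent workflow.
submitted: List[Answer] = []
demo_answers = run_benchmark(
    fetch_questions=lambda: [{"task_id": "t1", "question": "What is 2 + 2?"}],
    run_workflow=lambda question: "4",
    submit_answers=submitted.extend,
)
```

Passing the three steps in as callables keeps the loop testable without network access or an LLM; in the real application they would wrap the API requests and the `AgentWorkflow` run.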
Agent Roster and Capabilities
The framework comprises several specialized agents, each designed for specific tasks:
- `planner_agent` (Root):
  - Purpose: Strategic planning, task decomposition, and final synthesis.
  - Tools: `generate_substeps` (breaks down objectives using an LLM), `synthesize_and_respond` (aggregates results into a final report using an LLM).
  - Workflow: Receives the initial objective, breaks it into sub-steps, delegates these steps to appropriate specialist agents, and finally synthesizes the collected results into a coherent answer.
  - Handoffs: Can delegate to `code_agent`, `research_agent`, `math_agent`, `role_agent`, `image_analyzer_agent`, `text_analyzer_agent`, `verifier_agent`, `reasoning_agent`.
- `role_agent`:
  - Purpose: Determines and sets the appropriate persona or context for the task.
  - Tools: `role_prompt_retriever` (uses a combination of vector search and BM25 retrieval on the `fka/awesome-chatgpt-prompts` dataset, followed by reranking, to find the best role/prompt).
  - Workflow: Interprets user intent, retrieves relevant role descriptions, selects the best fit, and provides the role/prompt.
  - Handoffs: Hands off to `planner_agent` after setting the role.
- `code_agent`:
  - Purpose: Generates and executes Python code.
  - Tools: `python_code_generator` (uses the OpenAI model `o4-mini` to generate code from a prompt), `code_interpreter` (uses LlamaIndex's tool spec, likely for sandboxed execution), and a custom `SimpleCodeExecutor` (executes Python code via `subprocess`; not safe for production).
  - Workflow: Takes a description, generates code, executes/tests it, and returns the result or final code.
  - Handoffs: Hands off to `planner_agent` or `reasoning_agent`.
- `math_agent`:
  - Purpose: Performs mathematical computations.
  - Tools: A large suite of functions covering symbolic math (SymPy), matrix operations (NumPy), statistics (NumPy), numerical methods (NumPy, SciPy), vector math (NumPy), probability (SciPy), and potentially more (file was truncated). Also includes WolframAlpha integration.
  - Workflow: Executes specific mathematical operations based on requests.
  - Handoffs: (Inferred) Likely hands off to `planner_agent` or `reasoning_agent`.
- `research_agent`:
  - Purpose: Gathers information from the web and specialized sources.
  - Tools: Web search (Google, DuckDuckGo, Tavily), web browsing/interaction (Helium/Selenium: `visit`, `get_text_by_css`, `get_page_html`, `click_element`, `search_item_ctrl_f`, `go_back`, `close_popups`), Wikipedia search/loading, Yahoo Finance data retrieval, ArXiv paper search.
  - Workflow: Executes a plan-act-observe loop to find and extract information from various online sources.
  - Handoffs: Can delegate to `code_agent`, `math_agent`, `analyzer_agent` (likely meant `text_analyzer_agent` or `image_analyzer_agent`), `planner_agent`, `reasoning_agent`.
- `text_analyzer_agent`:
  - Purpose: Extracts text from PDFs and analyzes text content.
  - Tools: `extract_text_from_pdf` (uses PyPDF2, handles URLs and local files), `analyze_text` (uses an LLM to generate a summary and key facts).
  - Workflow: If the input is a PDF, extracts its text; then analyzes the text to produce a summary and a list of facts.
  - Handoffs: Hands off to `verifier_agent`.
- `image_analyzer_agent`:
  - Purpose: Analyzes image content factually.
  - Tools: Relies directly on the multimodal capabilities of its underlying LLM (Gemini 1.5 Pro) to process image inputs provided via `ChatMessage` blocks. No specific image analysis tool is defined, but the system prompt dictates a detailed, structured analysis format.
  - Workflow: Receives an image and performs the analysis according to a strict factual template.
  - Handoffs: Hands off to `planner_agent`, `research_agent`, or `reasoning_agent`.
- `verifier_agent`:
  - Purpose: Assesses the confidence of factual statements and detects contradictions.
  - Tools: `verify_facts` (uses an LLM, Gemini 2.0 Flash, to assign confidence scores), `find_contradictions` (uses simple string matching for negation pairs).
  - Workflow: Takes a list of facts, scores them, checks for contradictions, and reports results.
  - Handoffs: Hands off to `reasoning_agent` or `planner_agent`.
- `reasoning_agent`:
  - Purpose: Performs explicit chain-of-thought reasoning.
  - Tools: `reasoning_tool` (uses the OpenAI model `o4-mini` with a detailed prompt to perform CoT reasoning over the provided context).
  - Workflow: Takes context, applies reasoning via the tool, and provides the structured reasoning output.
  - Handoffs: Hands off to `planner_agent`.
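The handoff lists in the roster form a directed graph, and writing that graph down makes it easy to check for targets that are not registered agents. The sketch below transcribes the handoffs exactly as listed in the roster (the `math_agent` entry is the inferred one); the names `HANDOFFS`, `undefined_targets`, and `reachable_from` are illustrative helpers, not part of the framework.

```python
from typing import Dict, List, Set

# Handoff targets per agent, copied from the roster in this document.
HANDOFFS: Dict[str, List[str]] = {
    "planner_agent": [
        "code_agent", "research_agent", "math_agent", "role_agent",
        "image_analyzer_agent", "text_analyzer_agent",
        "verifier_agent", "reasoning_agent",
    ],
    "role_agent": ["planner_agent"],
    "code_agent": ["planner_agent", "reasoning_agent"],
    "math_agent": ["planner_agent", "reasoning_agent"],  # inferred in the roster
    "research_agent": [
        "code_agent", "math_agent", "analyzer_agent",
        "planner_agent", "reasoning_agent",
    ],
    "text_analyzer_agent": ["verifier_agent"],
    "image_analyzer_agent": ["planner_agent", "research_agent", "reasoning_agent"],
    "verifier_agent": ["reasoning_agent", "planner_agent"],
    "reasoning_agent": ["planner_agent"],
}


def undefined_targets(handoffs: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Return, per agent, the handoff targets that are not registered agents."""
    known = set(handoffs)
    problems: Dict[str, List[str]] = {}
    for agent, targets in handoffs.items():
        missing = [t for t in targets if t not in known]
        if missing:
            problems[agent] = missing
    return problems


def reachable_from(root: str, handoffs: Dict[str, List[str]]) -> Set[str]:
    """Return every registered agent reachable from root by following handoffs."""
    seen: Set[str] = set()
    stack = [root]
    while stack:
        agent = stack.pop()
        if agent in seen or agent not in handoffs:
            continue
        seen.add(agent)
        stack.extend(handoffs[agent])
    return seen


print(undefined_targets(HANDOFFS))
print(sorted(reachable_from("planner_agent", HANDOFFS)))
```

Run against the roster as written, the check flags `analyzer_agent` under `research_agent`, which matches the note that the name was likely meant to be `text_analyzer_agent` or `image_analyzer_agent`. Every registered agent is reachable from `planner_agent`.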
Workflow and Data Flow
- A question (potentially with associated files) arrives at `app.py`.
- `app.py` formats the input (e.g., `ChatMessage` with `TextBlock`, `ImageBlock`, `AudioBlock`) and passes it to the `AgentWorkflow`, starting with `planner_agent`.
- `planner_agent` breaks down the task.
- It may call `role_agent` to set context.
- It delegates sub-tasks to specialized agents (`research`, `code`, `math`, `text_analyzer`, `image_analyzer`).
- Agents execute their tasks, potentially calling tools or other agents (e.g., `text_analyzer` calls `verifier_agent`).
- `reasoning_agent` might be called for complex logical steps or verification.
- Results flow back up, eventually reaching `planner_agent`.
- `planner_agent` synthesizes the final answer using `synthesize_and_respond`.
- `app.py` receives the final answer and submits it.
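The plan, delegate, and synthesize steps above can be illustrated with a deterministic toy version. In the real system an LLM decides the sub-steps and the routing; here `plan` and `synthesize` are hard-coded stand-ins for `generate_substeps` and `synthesize_and_respond`, and `run_planner` and the specialist lambdas are hypothetical names used only for this sketch.

```python
from typing import Callable, Dict, List, Tuple

Specialist = Callable[[str], str]


def plan(objective: str) -> List[Tuple[str, str]]:
    """Stand-in for generate_substeps: return (agent_name, sub_task) pairs."""
    return [
        ("research_agent", f"find sources for: {objective}"),
        ("math_agent", f"compute figures for: {objective}"),
    ]


def synthesize(objective: str, results: List[str]) -> str:
    """Stand-in for synthesize_and_respond: merge sub-results into one answer."""
    return f"{objective} -> " + "; ".join(results)


def run_planner(objective: str, specialists: Dict[str, Specialist]) -> str:
    """Break the objective down, delegate each sub-task, then synthesize."""
    results: List[str] = []
    for agent_name, sub_task in plan(objective):
        handler = specialists.get(agent_name)
        if handler is None:
            # No such specialist registered; note it instead of failing.
            results.append(f"[{agent_name} unavailable]")
            continue
        results.append(handler(sub_task))
    return synthesize(objective, results)


specialists: Dict[str, Specialist] = {
    "research_agent": lambda task: f"research done ({task})",
    "math_agent": lambda task: f"math done ({task})",
}
final_answer = run_planner("population growth", specialists)
print(final_answer)
```

The sketch only captures the control flow: results return to the planner, which is the single place where the final answer is assembled.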
Technology Stack Summary
- Core: Python, LlamaIndex
- LLMs: Google Gemini (1.5 Pro, 2.0 Flash), OpenAI (o4-mini)
- UI: Gradio
- Web Interaction: Selenium, Helium
- Data Handling: Pandas, PyPDF2, Requests
- Search/Retrieval: HuggingFace Embeddings/Rerankers, Datasets, LlamaIndex Tool Specs (Google, Tavily, Wikipedia, DuckDuckGo, Yahoo Finance, ArXiv)
- Math: SymPy, NumPy, SciPy, WolframAlpha
- Code Execution: Subprocess (basic executor), LlamaIndex Code Interpreter
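For the subprocess-based code execution listed above, a minimal executor might look like the following. This is a sketch under assumptions, not the project's `SimpleCodeExecutor`: the function name `run_python_snippet`, the timeout default, and the shape of the returned dictionary are invented for illustration. As with the original, it offers no sandboxing and is not safe for untrusted code.

```python
import subprocess
import sys
from typing import Dict, Union


def run_python_snippet(code: str, timeout_s: float = 5.0) -> Dict[str, Union[str, int]]:
    """Run code in a fresh Python interpreter and capture its output.

    This provides process isolation only: the child process has the same
    file system and network access as the parent.
    """
    try:
        completed = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        # The child is killed by subprocess.run before this is raised.
        return {"stdout": "", "stderr": "timed out", "returncode": -1}
    return {
        "stdout": completed.stdout,
        "stderr": completed.stderr,
        "returncode": completed.returncode,
    }


result = run_python_snippet("print(6 * 7)")
print(result["stdout"].strip())
```

Using `sys.executable` runs the snippet with the same interpreter as the caller, and the timeout keeps a runaway snippet from blocking the workflow indefinitely.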