---
title: TraceMind AI
emoji: πŸ”
colorFrom: indigo
colorTo: purple
sdk: gradio
sdk_version: 5.49.1
app_file: app.py
short_description: AI agent evaluation with MCP-powered intelligence
pinned: false
tags:
  - mcp-in-action-track-enterprise
  - agent-evaluation
  - mcp-client
  - leaderboard
  - gradio
---

# πŸ” TraceMind-AI

Agent Evaluation Platform with MCP-Powered Intelligence

## Overview

TraceMind-AI is a comprehensive platform for evaluating AI agent performance across different models, providers, and configurations. It provides real-time insights, cost analysis, and detailed trace visualization powered by the Model Context Protocol (MCP).

## Features

- **πŸ“Š Real-time Leaderboard**: Live evaluation data from HuggingFace datasets
- **πŸ€– MCP Integration**: AI-powered analysis using remote MCP servers
- **πŸ’° Cost Estimation**: Calculate evaluation costs for different models and configurations
- **πŸ” Trace Visualization**: Detailed OpenTelemetry trace analysis
- **πŸ“ˆ Performance Metrics**: GPU utilization, CO2 emissions, token usage tracking

## MCP Integration

TraceMind-AI demonstrates enterprise MCP client usage by connecting to [TraceMind-mcp-server](https://huggingface.co/spaces/kshitijthakkar/TraceMind-mcp-server) via the Model Context Protocol.

**MCP Tools Used:**
- `analyze_leaderboard` - AI-generated insights about evaluation trends
- `estimate_cost` - Cost estimation with hardware recommendations
- `debug_trace` - Interactive trace analysis and debugging
- `compare_runs` - Side-by-side run comparison
- `analyze_results` - Test case analysis with optimization recommendations

## Quick Start

### Prerequisites
- Python 3.10+
- HuggingFace account (for authentication)
- HuggingFace token (optional, for private datasets)

### Installation

1. Clone the repository:
```bash
git clone https://github.com/Mandark-droid/TraceMind-AI.git
cd TraceMind-AI
```

2. Install dependencies:
```bash
pip install -r requirements.txt
```

3. Configure environment:
```bash
cp .env.example .env
# Edit .env with your configuration
```

4. Run the application:
```bash
python app.py
```

Visit http://localhost:7860

## Configuration

Create a `.env` file with the following variables:

```env
# HuggingFace Configuration
HF_TOKEN=your_token_here

# MCP Server URL
MCP_SERVER_URL=https://kshitijthakkar-tracemind-mcp-server.hf.space/gradio_api/mcp/

# Dataset Configuration
LEADERBOARD_REPO=kshitijthakkar/smoltrace-leaderboard

# Development Mode (optional - disables OAuth for local testing)
DISABLE_OAUTH=true
```
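As a minimal sketch of how these variables might be read at startup (the helper name and defaults here are illustrative assumptions, not the app's actual code), using only the standard library:

```python
import os

# Hypothetical settings loader; keys match the .env variables above,
# but the function itself is an illustration, not app.py's real code.
def load_settings() -> dict:
    return {
        "hf_token": os.getenv("HF_TOKEN", ""),
        "mcp_server_url": os.getenv(
            "MCP_SERVER_URL",
            "https://kshitijthakkar-tracemind-mcp-server.hf.space/gradio_api/mcp/",
        ),
        "leaderboard_repo": os.getenv(
            "LEADERBOARD_REPO", "kshitijthakkar/smoltrace-leaderboard"
        ),
        # OAuth stays enabled unless DISABLE_OAUTH is explicitly "true"
        "disable_oauth": os.getenv("DISABLE_OAUTH", "false").lower() == "true",
    }
```

Loading the `.env` file itself (e.g. with `python-dotenv`) would happen before this; `os.getenv` then picks the values up from the process environment.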

## Data Sources

TraceMind-AI loads evaluation data from HuggingFace datasets:

- **Leaderboard**: Aggregate statistics for all evaluation runs
- **Results**: Individual test case results
- **Traces**: OpenTelemetry trace data
- **Metrics**: GPU metrics and performance data
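To make the leaderboard data concrete, here is a hedged sketch of what a single leaderboard row might look like once loaded; the field names are assumptions for illustration and may not match the actual columns of `kshitijthakkar/smoltrace-leaderboard`:

```python
from dataclasses import dataclass

# Hypothetical row shape for an evaluation run in the leaderboard dataset.
@dataclass
class LeaderboardRow:
    model: str             # e.g. "openai/gpt-4"
    provider: str          # inference provider
    success_rate: float    # fraction of test cases passed, 0.0-1.0
    total_cost_usd: float  # estimated cost of the run

def best_by_success(rows: list[LeaderboardRow]) -> LeaderboardRow:
    """Pick the run with the highest success rate."""
    return max(rows, key=lambda r: r.success_rate)
```

In practice the dataset would be fetched via the HuggingFace Datasets API (see `data_loader.py`); this sketch only shows the kind of per-run record the leaderboard aggregates.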

## Architecture

### Project Structure

```
TraceMind-AI/
β”œβ”€β”€ app.py                 # Main Gradio application
β”œβ”€β”€ data_loader.py         # HuggingFace dataset integration
β”œβ”€β”€ mcp_client/            # MCP client implementation
β”‚   β”œβ”€β”€ client.py          # Async MCP client
β”‚   └── sync_wrapper.py    # Synchronous wrapper
β”œβ”€β”€ utils/                 # Utilities
β”‚   β”œβ”€β”€ auth.py            # HuggingFace OAuth
β”‚   └── navigation.py      # Screen navigation
β”œβ”€β”€ screens/               # UI screens
β”œβ”€β”€ components/            # Reusable components
└── styles/                # Custom CSS
```
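The split between `client.py` (async) and `sync_wrapper.py` exists because Gradio event handlers are ordinarily synchronous, while the MCP SDK is async. A common pattern for bridging the two is a background event loop thread; the sketch below illustrates that pattern under stated assumptions (the class and `fake_tool_call` are hypothetical, not the repository's actual implementation):

```python
import asyncio
import threading

class SyncWrapper:
    """Run coroutines from synchronous callbacks.

    Illustrative sketch of the pattern a sync wrapper like
    mcp_client/sync_wrapper.py might use: a dedicated event loop
    running on a daemon thread, driven via run_coroutine_threadsafe.
    """

    def __init__(self):
        self._loop = asyncio.new_event_loop()
        self._thread = threading.Thread(
            target=self._loop.run_forever, daemon=True
        )
        self._thread.start()

    def run(self, coro):
        # Submit the coroutine to the background loop and block for the result
        return asyncio.run_coroutine_threadsafe(coro, self._loop).result()

async def fake_tool_call(name: str) -> str:
    """Stand-in for a real async MCP tool invocation."""
    await asyncio.sleep(0)
    return f"called {name}"
```

With this in place, a Gradio button handler can simply call `wrapper.run(client.some_tool(...))` without being async itself.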

### MCP Client Integration

TraceMind-AI uses the MCP Python SDK to connect to remote MCP servers:

```python
from mcp_client.sync_wrapper import get_sync_mcp_client

# Initialize MCP client
mcp_client = get_sync_mcp_client()
mcp_client.initialize()

# Call MCP tools
insights = mcp_client.analyze_leaderboard(
    metric_focus="overall",
    time_range="last_week",
    top_n=5
)
```

## Usage

### Viewing the Leaderboard

1. Log in with your HuggingFace account
2. Navigate to the "Leaderboard" tab
3. Click "Load Leaderboard" to fetch the latest data
4. View AI-powered insights generated by the MCP server

### Estimating Costs

1. Navigate to the "Cost Estimator" tab
2. Enter the model name (e.g., `openai/gpt-4`)
3. Select agent type and number of tests
4. Click "Estimate Cost" for AI-powered analysis

### Viewing Trace Details

1. Select an evaluation run from the leaderboard
2. Click on a specific test case
3. View detailed OpenTelemetry trace visualization
4. Ask questions about the trace using MCP-powered analysis

## Technology Stack

- **UI Framework**: Gradio 5.49.1
- **MCP Client**: MCP Python SDK, connecting to a Gradio-hosted MCP server
- **Data**: HuggingFace Datasets API
- **Authentication**: HuggingFace OAuth
- **AI**: Google Gemini 2.5 Flash (via MCP server)

## Development

### Running Locally

```bash
# Install dependencies
pip install -r requirements.txt

# Set development mode (optional - disables OAuth)
export DISABLE_OAUTH=true

# Run the app
python app.py
```

### Running on HuggingFace Spaces

This application is configured for deployment on HuggingFace Spaces using the Gradio SDK. The `app.py` file serves as the entry point.

## Documentation

For detailed implementation documentation, see:
- [Data Loader API](data_loader.py) - Dataset loading and caching
- [MCP Client API](mcp_client/client.py) - MCP protocol integration
- [Authentication](utils/auth.py) - HuggingFace OAuth integration

## Demo Video

[Link to demo video showing the application in action]

## Social Media

[Link to social media post about this project]

## License

MIT License - See LICENSE file for details

## Contributing

Contributions are welcome! Please open an issue or submit a pull request.

## Acknowledgments

- **MCP Team** - For the Model Context Protocol specification
- **Gradio Team** - For Gradio's built-in MCP integration
- **HuggingFace** - For Spaces hosting and dataset infrastructure
- **Google** - For Gemini API access

## Links

- **Live Demo**: https://huggingface.co/spaces/kshitijthakkar/TraceMind-AI
- **MCP Server**: https://huggingface.co/spaces/kshitijthakkar/TraceMind-mcp-server
- **GitHub**: https://github.com/Mandark-droid/TraceMind-AI
- **MCP Specification**: https://modelcontextprotocol.io

---

**MCP's 1st Birthday Hackathon Submission**
*Track: MCP in Action - Enterprise*