Mandark-droid committed on
Commit 942ce50 · 1 Parent(s): 98dc4d3

Implement Run Detail and Trace Detail screens with full navigation

Day 3 (Guide): Run Detail + Trace Detail screens completed

Run Detail Screen (Screen 3):
- Enhanced with 3 tabs: Overview, Test Cases, Performance
- Overview tab: Run metadata with gradient card styling
- Test Cases tab: Interactive dataframe with click-to-trace navigation
- Performance tab: 4-chart dashboard (response time histogram, token usage, cost, success/failure pie)
- Added create_performance_charts() function for performance visualizations

Trace Detail Screen (Screen 4):
- Created complete screen with 5 tabs plus a Q&A accordion:
* Thought Graph: Network visualization of agent reasoning flow
* Waterfall: Interactive timeline diagram of span execution
* GPU Metrics: Time series dashboard + raw metrics data (2 sub-tabs)
* Span Details: Detailed table with tokens, cost, duration per span
* Raw Data: JSON view of OpenTelemetry trace data
* Ask About This Trace: Accordion with Q&A placeholder (for MCP integration)

Components Added:
- components/thought_graph.py: Network graph visualization of agent reasoning
- screens/trace_detail.py: All trace visualization functions
* create_span_visualization(): Waterfall chart with color-coded spans
  * create_gpu_metrics_dashboard(): Multi-panel GPU metrics time series (see the sketch after this list)
* create_gpu_summary_cards(): HTML summary cards for GPU metrics
* process_trace_data(): Trace data processor with timestamp handling
* create_span_table(): JSON view of span details
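
Note: the GPU dashboard/summary implementations land in screens/trace_detail.py below, but that file's diff is truncated before they appear. A minimal sketch of the multi-panel time-series approach — the column names `timestamp`, `gpu_utilization_pct`, and `gpu_memory_mb` are illustrative assumptions, not the actual TraceMind schema:

    # Sketch only: assumed metric columns, not the committed implementation
    import pandas as pd
    import plotly.graph_objects as go
    from plotly.subplots import make_subplots

    def sketch_gpu_dashboard(metrics_df: pd.DataFrame) -> go.Figure:
        # One stacked panel per metric, sharing the time axis
        fig = make_subplots(rows=2, cols=1,
                            subplot_titles=("GPU Utilization (%)", "GPU Memory (MB)"))
        fig.add_trace(go.Scatter(x=metrics_df["timestamp"],
                                 y=metrics_df["gpu_utilization_pct"],
                                 mode="lines", name="Utilization"), row=1, col=1)
        fig.add_trace(go.Scatter(x=metrics_df["timestamp"],
                                 y=metrics_df["gpu_memory_mb"],
                                 mode="lines", name="Memory"), row=2, col=1)
        fig.update_layout(height=500, title_text="GPU Metrics Over Time", title_x=0.5)
        return fig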

Navigation Handlers:
- on_test_case_select(): Navigate from Run Detail to Trace Detail
- go_back_to_run_detail(): Back button from Trace Detail to Run Detail
- create_trace_metadata_html(): Trace metadata HTML generator
- create_span_details_table(): Span details dataframe generator

Event Wiring:
- test_cases_table.select → on_test_case_select (loads trace, switches screens)
- back_to_run_detail_btn.click → go_back_to_run_detail (returns to run detail)
- Integrated all 11 trace detail outputs (graphs, tables, JSON); see the wiring sketch below
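
The select wiring relies on Gradio's pattern of returning a dict keyed by output components, so a single handler can update all 11 outputs at once. A minimal, self-contained sketch of that pattern (component names here are illustrative, not the app's):

    # Sketch of the dict-of-updates pattern, not the committed app code
    import gradio as gr

    with gr.Blocks() as demo:
        table = gr.Dataframe(value=[["t1"], ["t2"]], headers=["Task ID"])
        detail = gr.Markdown(visible=False)

        def on_select(evt: gr.SelectData):
            # evt.index[0] is the clicked row; return updates keyed by component
            return {detail: gr.update(visible=True, value=f"Row {evt.index[0]} selected")}

        table.select(fn=on_select, outputs=[detail])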

Navigation Flow:
Leaderboard (Screen 1) → Run Detail (Screen 3) → Trace Detail (Screen 4)
- Click DrillDown row → navigate to Run Detail with 3 tabs
- Click Test Case row → navigate to Trace Detail with 5 tabs
- Back buttons work correctly between all screens (visibility-toggle pattern sketched below)
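
Screens are sibling gr.Column blocks whose visibility gets toggled, so "navigation" is just flipping visible flags. A minimal sketch of the pattern (names illustrative):

    # Sketch of the visibility-toggle navigation pattern
    import gradio as gr

    with gr.Blocks() as demo:
        with gr.Column(visible=True) as screen_a:
            fwd = gr.Button("Open detail")
        with gr.Column(visible=False) as screen_b:
            back = gr.Button("Back")

        fwd.click(lambda: (gr.update(visible=False), gr.update(visible=True)),
                  outputs=[screen_a, screen_b])
        back.click(lambda: (gr.update(visible=True), gr.update(visible=False)),
                   outputs=[screen_a, screen_b])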

File Stats:
- app.py: 832 → 1193 lines (+361 lines)
- New files: components/thought_graph.py, screens/trace_detail.py
- All functions compile and type-check successfully

Files changed (3)
  1. app.py +447 -85
  2. components/thought_graph.py +398 -0
  3. screens/trace_detail.py +721 -0
app.py CHANGED
@@ -21,8 +21,339 @@ from components.analytics_charts import (
     create_cost_efficiency_scatter
 )
 from components.report_cards import generate_leaderboard_summary_card
 from utils.navigation import Navigator, Screen
 
 # Initialize data loader
 data_loader = create_data_loader_from_env()
 navigator = Navigator()
@@ -265,30 +596,8 @@ def on_html_table_row_click(row_index_str):
 
     results_df = data_loader.load_results(results_dataset)
 
-    # Create metadata HTML
-    metadata_html = f"""
-    <div style="background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);
-                padding: 20px; border-radius: 10px; color: white; margin-bottom: 20px;">
-        <h2 style="margin: 0 0 10px 0;">📊 Run Detail: {run_data.get('model', 'Unknown')}</h2>
-        <div style="display: grid; grid-template-columns: 1fr 1fr 1fr; gap: 20px; margin-top: 15px;">
-            <div>
-                <strong>Agent Type:</strong> {run_data.get('agent_type', 'N/A')}<br>
-                <strong>Provider:</strong> {run_data.get('provider', 'N/A')}<br>
-                <strong>Success Rate:</strong> {run_data.get('success_rate', 0):.1f}%
-            </div>
-            <div>
-                <strong>Total Tests:</strong> {run_data.get('total_tests', 0)}<br>
-                <strong>Successful:</strong> {run_data.get('successful_tests', 0)}<br>
-                <strong>Failed:</strong> {run_data.get('failed_tests', 0)}
-            </div>
-            <div>
-                <strong>Total Cost:</strong> ${run_data.get('total_cost_usd', 0):.4f}<br>
-                <strong>Avg Duration:</strong> {run_data.get('avg_duration_ms', 0):.0f}ms<br>
-                <strong>Submitted By:</strong> {run_data.get('submitted_by', 'Unknown')}
-            </div>
-        </div>
-    </div>
-    """
 
     # Format results for display
     display_df = results_df.copy()
@@ -358,30 +667,8 @@ def load_run_detail(run_id):
 
     results_df = data_loader.load_results(results_dataset)
 
-    # Create metadata HTML
-    metadata_html = f"""
-    <div style="background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);
-                padding: 20px; border-radius: 10px; color: white; margin-bottom: 20px;">
-        <h2 style="margin: 0 0 10px 0;">📊 Run Detail: {run_data.get('model', 'Unknown')}</h2>
-        <div style="display: grid; grid-template-columns: 1fr 1fr 1fr; gap: 20px; margin-top: 15px;">
-            <div>
-                <strong>Agent Type:</strong> {run_data.get('agent_type', 'N/A')}<br>
-                <strong>Provider:</strong> {run_data.get('provider', 'N/A')}<br>
-                <strong>Success Rate:</strong> {run_data.get('success_rate', 0):.1f}%
-            </div>
-            <div>
-                <strong>Total Tests:</strong> {run_data.get('total_tests', 0)}<br>
-                <strong>Successful:</strong> {run_data.get('successful_tests', 0)}<br>
-                <strong>Failed:</strong> {run_data.get('failed_tests', 0)}
-            </div>
-            <div>
-                <strong>Total Cost:</strong> ${run_data.get('total_cost_usd', 0):.4f}<br>
-                <strong>Avg Duration:</strong> {run_data.get('avg_duration_ms', 0):.0f}ms<br>
-                <strong>Submitted By:</strong> {run_data.get('submitted_by', 'Unknown')}
-            </div>
-        </div>
-    </div>
-    """
 
     # Format results for display
     display_df = results_df.copy()
@@ -458,30 +745,8 @@ def on_drilldown_select(evt: gr.SelectData, df):
 
     results_df = data_loader.load_results(results_dataset)
 
-    # Create metadata HTML
-    metadata_html = f"""
-    <div style="background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);
-                padding: 20px; border-radius: 10px; color: white; margin-bottom: 20px;">
-        <h2 style="margin: 0 0 10px 0;">📊 Run Detail: {run_data.get('model', 'Unknown')}</h2>
-        <div style="display: grid; grid-template-columns: 1fr 1fr 1fr; gap: 20px; margin-top: 15px;">
-            <div>
-                <strong>Agent Type:</strong> {run_data.get('agent_type', 'N/A')}<br>
-                <strong>Provider:</strong> {run_data.get('provider', 'N/A')}<br>
-                <strong>Success Rate:</strong> {run_data.get('success_rate', 0):.1f}%
-            </div>
-            <div>
-                <strong>Total Tests:</strong> {run_data.get('total_tests', 0)}<br>
-                <strong>Successful:</strong> {run_data.get('successful_tests', 0)}<br>
-                <strong>Failed:</strong> {run_data.get('failed_tests', 0)}
-            </div>
-            <div>
-                <strong>Total Cost:</strong> ${run_data.get('total_cost_usd', 0):.4f}<br>
-                <strong>Avg Duration:</strong> {run_data.get('avg_duration_ms', 0):.0f}ms<br>
-                <strong>Submitted By:</strong> {run_data.get('submitted_by', 'Unknown')}
-            </div>
-        </div>
-    </div>
-    """
 
     # Format results for display
     display_df = results_df.copy()
@@ -697,23 +962,95 @@ with gr.Blocks(title="TraceMind-AI", theme=theme) as app:
     # Hidden textbox for row selection (JavaScript bridge)
     selected_row_index = gr.Textbox(visible=False, elem_id="selected_row_index")
 
-    # Screen 3: Run Detail
     with gr.Column(visible=False) as run_detail_screen:
         # Navigation
         with gr.Row():
             back_to_leaderboard_btn = gr.Button("⬅️ Back to Leaderboard", variant="secondary", size="sm")
-
-        # Run metadata display
-        run_metadata_html = gr.HTML()
-
-        # Test cases table
-        gr.Markdown("## 📋 Test Cases")
-        test_cases_table = gr.Dataframe(
-            headers=["Task ID", "Status", "Tool", "Duration", "Tokens", "Cost", "Trace ID"],
-            interactive=False,
-            wrap=True
-        )
-
     # Event handlers
     app.load(
        fn=load_leaderboard,
@@ -812,6 +1149,31 @@ with gr.Blocks(title="TraceMind-AI", theme=theme) as app:
         outputs=[leaderboard_screen, run_detail_screen]
     )
 
     # HTML table row click handler (JavaScript bridge via hidden textbox)
    selected_row_index.change(
        fn=on_html_table_row_click,
     create_cost_efficiency_scatter
 )
 from components.report_cards import generate_leaderboard_summary_card
+from screens.trace_detail import (
+    create_span_visualization,
+    create_span_table,
+    create_gpu_metrics_dashboard,
+    create_gpu_summary_cards
+)
 from utils.navigation import Navigator, Screen
 
+
+
+# Trace Detail handlers and helpers
+
+def create_span_details_table(spans):
+    """
+    Create table view of span details
+
+    Args:
+        spans: List of span dictionaries
+
+    Returns:
+        DataFrame with span details
+    """
+    try:
+        if not spans:
+            return pd.DataFrame(columns=["Span Name", "Kind", "Duration (ms)", "Tokens", "Cost (USD)", "Status"])
+
+        rows = []
+        for span in spans:
+            name = span.get('name', 'Unknown')
+            kind = span.get('kind', 'INTERNAL')
+
+            # Get attributes
+            attributes = span.get('attributes', {})
+            if isinstance(attributes, dict) and 'openinference.span.kind' in attributes:
+                kind = attributes.get('openinference.span.kind', kind)
+
+            # Calculate duration
+            start = span.get('startTime') or span.get('startTimeUnixNano', 0)
+            end = span.get('endTime') or span.get('endTimeUnixNano', 0)
+            duration = (end - start) / 1000000 if start and end else 0  # Convert to ms
+
+            status = span.get('status', {}).get('code', 'OK') if isinstance(span.get('status'), dict) else 'OK'
+
+            # Extract tokens and cost information
+            tokens_str = "-"
+            cost_str = "-"
+
+            if isinstance(attributes, dict):
+                # Check for token usage
+                prompt_tokens = attributes.get('gen_ai.usage.prompt_tokens') or attributes.get('llm.token_count.prompt')
+                completion_tokens = attributes.get('gen_ai.usage.completion_tokens') or attributes.get('llm.token_count.completion')
+                total_tokens = attributes.get('llm.usage.total_tokens')
+
+                # Build tokens string
+                if prompt_tokens is not None and completion_tokens is not None:
+                    total = int(prompt_tokens) + int(completion_tokens)
+                    tokens_str = f"{total} ({int(prompt_tokens)}+{int(completion_tokens)})"
+                elif total_tokens is not None:
+                    tokens_str = str(int(total_tokens))
+
+                # Check for cost
+                cost = attributes.get('gen_ai.usage.cost.total') or attributes.get('llm.usage.cost')
+                if cost is not None:
+                    cost_str = f"${float(cost):.6f}"
+
+            rows.append({
+                "Span Name": name,
+                "Kind": kind,
+                "Duration (ms)": round(duration, 2),
+                "Tokens": tokens_str,
+                "Cost (USD)": cost_str,
+                "Status": status
+            })
+
+        return pd.DataFrame(rows)
+
+    except Exception as e:
+        print(f"[ERROR] create_span_details_table: {e}")
+        import traceback
+        traceback.print_exc()
+        return pd.DataFrame(columns=["Span Name", "Kind", "Duration (ms)", "Tokens", "Cost (USD)", "Status"])
+
+
+def create_trace_metadata_html(trace_data: dict) -> str:
+    """Create HTML for trace metadata display"""
+    trace_id = trace_data.get('trace_id', 'Unknown')
+    spans = trace_data.get('spans', [])
+    if hasattr(spans, 'tolist'):
+        spans = spans.tolist()
+    elif not isinstance(spans, list):
+        spans = list(spans) if spans is not None else []
+
+    metadata_html = f"""
+    <div style="background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);
+                padding: 20px; border-radius: 10px; color: white; margin-bottom: 20px;">
+        <h3 style="margin: 0 0 10px 0;">Trace Information</h3>
+        <div style="display: grid; grid-template-columns: 1fr 1fr; gap: 15px;">
+            <div>
+                <strong>Trace ID:</strong> {trace_id}<br>
+                <strong>Total Spans:</strong> {len(spans)}
+            </div>
+        </div>
+    </div>
+    """
+    return metadata_html
+
+
+def on_test_case_select(evt: gr.SelectData, df):
+    """Handle test case selection in run detail - navigate to trace detail"""
+    global current_selected_run, current_selected_trace
+
+    print(f"[DEBUG] on_test_case_select called with index: {evt.index}")
+
+    # Check if we have a selected run
+    if current_selected_run is None:
+        print("[ERROR] No run selected - current_selected_run is None")
+        gr.Warning("Please select a run from the leaderboard first")
+        return {}
+
+    try:
+        # Get selected test case
+        selected_idx = evt.index[0]
+        if df is None or df.empty or selected_idx >= len(df):
+            gr.Warning("Invalid test case selection")
+            return {}
+
+        test_case = df.iloc[selected_idx].to_dict()
+        trace_id = test_case.get('trace_id')
+
+        print(f"[DEBUG] Selected test case: {test_case.get('task_id', 'Unknown')} (trace_id: {trace_id})")
+
+        # Load trace data
+        traces_dataset = current_selected_run.get('traces_dataset')
+        if not traces_dataset:
+            gr.Warning("No traces dataset found in current run")
+            return {}
+
+        trace_data = data_loader.get_trace_by_id(traces_dataset, trace_id)
+
+        if not trace_data:
+            gr.Warning(f"Trace not found: {trace_id}")
+            return {}
+
+        current_selected_trace = trace_data
+
+        # Get spans and ensure it's a list
+        spans = trace_data.get('spans', [])
+        if hasattr(spans, 'tolist'):
+            spans = spans.tolist()
+        elif not isinstance(spans, list):
+            spans = list(spans) if spans is not None else []
+
+        print(f"[DEBUG] Loaded trace with {len(spans)} spans")
+
+        # Create visualizations
+        span_viz_plot = create_span_visualization(spans, trace_id)
+        span_details_json = create_span_table(spans).value
+
+        # Create thought graph
+        from components.thought_graph import create_thought_graph as create_network_graph
+        thought_graph_plot = create_network_graph(spans, trace_id)
+
+        # Create span details table
+        span_table_df = create_span_details_table(spans)
+
+        # Load GPU metrics (if available)
+        gpu_summary_html = "<div style='padding: 20px; text-align: center;'>⚠️ No GPU metrics available (expected for API models)</div>"
+        gpu_plot = None
+        gpu_json_data = {}
+
+        try:
+            if 'metrics_dataset' in current_selected_run and current_selected_run['metrics_dataset']:
+                metrics_dataset = current_selected_run['metrics_dataset']
+                gpu_metrics_data = data_loader.load_metrics(metrics_dataset)
+
+                if gpu_metrics_data is not None and not gpu_metrics_data.empty:
+                    gpu_plot = create_gpu_metrics_dashboard(gpu_metrics_data)
+                    gpu_summary_html = create_gpu_summary_cards(gpu_metrics_data)
+                    gpu_json_data = gpu_metrics_data.to_dict('records')
+        except Exception as e:
+            print(f"[WARNING] Could not load GPU metrics: {e}")
+
+        # Return dictionary with visibility updates and data
+        return {
+            run_detail_screen: gr.update(visible=False),
+            trace_detail_screen: gr.update(visible=True),
+            trace_title: gr.update(value=f"# 🔍 Trace Detail: {trace_id}"),
+            trace_metadata_html: gr.update(value=create_trace_metadata_html(trace_data)),
+            trace_thought_graph: gr.update(value=thought_graph_plot),
+            span_visualization: gr.update(value=span_viz_plot),
+            span_details_table: gr.update(value=span_table_df),
+            span_details_json: gr.update(value=span_details_json),
+            gpu_summary_cards_html: gr.update(value=gpu_summary_html),
+            gpu_metrics_plot: gr.update(value=gpu_plot),
+            gpu_metrics_json: gr.update(value=gpu_json_data)
+        }
+
+    except Exception as e:
+        print(f"[ERROR] on_test_case_select failed: {e}")
+        import traceback
+        traceback.print_exc()
+        gr.Warning(f"Error loading trace: {e}")
+        return {}
+
+
+def create_performance_charts(results_df):
+    """
+    Create performance analysis charts for the Performance tab
+
+    Args:
+        results_df: DataFrame with test results
+
+    Returns:
+        Plotly figure with performance metrics
+    """
+    import plotly.graph_objects as go
+    from plotly.subplots import make_subplots
+
+    try:
+        if results_df.empty:
+            fig = go.Figure()
+            fig.add_annotation(text="No performance data available", showarrow=False)
+            return fig
+
+        # Create 2x2 subplots
+        fig = make_subplots(
+            rows=2, cols=2,
+            subplot_titles=(
+                "Response Time Distribution",
+                "Token Usage per Test",
+                "Cost per Test",
+                "Success vs Failure"
+            ),
+            specs=[[{"type": "histogram"}, {"type": "bar"}],
+                   [{"type": "bar"}, {"type": "pie"}]]
+        )
+
+        # 1. Response Time Distribution (Histogram)
+        if 'execution_time_ms' in results_df.columns:
+            fig.add_trace(
+                go.Histogram(
+                    x=results_df['execution_time_ms'],
+                    nbinsx=20,
+                    marker_color='#3498DB',
+                    name='Response Time',
+                    showlegend=False
+                ),
+                row=1, col=1
+            )
+            fig.update_xaxes(title_text="Time (ms)", row=1, col=1)
+            fig.update_yaxes(title_text="Count", row=1, col=1)
+
+        # 2. Token Usage per Test (Bar)
+        if 'total_tokens' in results_df.columns:
+            test_indices = list(range(len(results_df)))
+            fig.add_trace(
+                go.Bar(
+                    x=test_indices,
+                    y=results_df['total_tokens'],
+                    marker_color='#9B59B6',
+                    name='Tokens',
+                    showlegend=False
+                ),
+                row=1, col=2
+            )
+            fig.update_xaxes(title_text="Test Index", row=1, col=2)
+            fig.update_yaxes(title_text="Tokens", row=1, col=2)
+
+        # 3. Cost per Test (Bar)
+        if 'cost_usd' in results_df.columns:
+            test_indices = list(range(len(results_df)))
+            fig.add_trace(
+                go.Bar(
+                    x=test_indices,
+                    y=results_df['cost_usd'],
+                    marker_color='#E67E22',
+                    name='Cost',
+                    showlegend=False
+                ),
+                row=2, col=1
+            )
+            fig.update_xaxes(title_text="Test Index", row=2, col=1)
+            fig.update_yaxes(title_text="Cost (USD)", row=2, col=1)
+
+        # 4. Success vs Failure (Pie)
+        if 'success' in results_df.columns:
+            # Convert to boolean if needed
+            success_series = results_df['success']
+            if success_series.dtype == object:
+                success_series = success_series == "✅"
+
+            success_count = int(success_series.sum())
+            failure_count = len(results_df) - success_count
+
+            fig.add_trace(
+                go.Pie(
+                    labels=['Success', 'Failure'],
+                    values=[success_count, failure_count],
+                    marker_colors=['#2ECC71', '#E74C3C'],
+                    showlegend=True
+                ),
+                row=2, col=2
+            )
+
+        # Update layout
+        fig.update_layout(
+            height=700,
+            showlegend=False,
+            title_text="Performance Analysis Dashboard",
+            title_x=0.5
+        )
+
+        return fig
+
+    except Exception as e:
+        print(f"[ERROR] create_performance_charts: {e}")
+        import traceback
+        traceback.print_exc()
+        fig = go.Figure()
+        fig.add_annotation(text=f"Error creating charts: {str(e)}", showarrow=False)
+        return fig
+
+
+def go_back_to_run_detail():
+    """Navigate from trace detail back to run detail"""
+    return {
+        run_detail_screen: gr.update(visible=True),
+        trace_detail_screen: gr.update(visible=False)
+    }
+
+
 # Initialize data loader
 data_loader = create_data_loader_from_env()
 navigator = Navigator()
596
 
597
  results_df = data_loader.load_results(results_dataset)
598
 
599
+ # Generate performance chart
600
+ perf_chart = create_performance_charts(results_df)
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
601
 
602
  # Format results for display
603
  display_df = results_df.copy()
 
 
     results_df = data_loader.load_results(results_dataset)
 
+    # Generate performance chart
+    perf_chart = create_performance_charts(results_df)
 
     # Format results for display
     display_df = results_df.copy()
 
 
     results_df = data_loader.load_results(results_dataset)
 
+    # Generate performance chart
+    perf_chart = create_performance_charts(results_df)
 
     # Format results for display
     display_df = results_df.copy()
 
     # Hidden textbox for row selection (JavaScript bridge)
     selected_row_index = gr.Textbox(visible=False, elem_id="selected_row_index")
 
+    # Screen 3: Run Detail (Enhanced with Tabs)
     with gr.Column(visible=False) as run_detail_screen:
         # Navigation
         with gr.Row():
             back_to_leaderboard_btn = gr.Button("⬅️ Back to Leaderboard", variant="secondary", size="sm")
+
+        run_detail_title = gr.Markdown("# 📊 Run Detail")
+
+        with gr.Tabs():
+            with gr.TabItem("📋 Overview"):
+                gr.Markdown("*Run metadata and summary*")
+                run_metadata_html = gr.HTML("")
+
+            with gr.TabItem("✅ Test Cases"):
+                gr.Markdown("*Individual test case results*")
+                test_cases_table = gr.Dataframe(
+                    headers=["Task ID", "Status", "Tool", "Duration", "Tokens", "Cost", "Trace ID"],
+                    interactive=False,
+                    wrap=True
+                )
+                gr.Markdown("*Click a test case to view detailed trace (including Thought Graph)*")
+
+            with gr.TabItem("⚡ Performance"):
+                gr.Markdown("*Performance metrics and charts*")
+                performance_charts = gr.Plot(label="Performance Analysis", show_label=False)
+
+    # Screen 4: Trace Detail with Sub-tabs
+    with gr.Column(visible=False) as trace_detail_screen:
+        with gr.Row():
+            back_to_run_detail_btn = gr.Button("⬅️ Back to Run Detail", variant="secondary", size="sm")
+
+        trace_title = gr.Markdown("# 🔍 Trace Detail")
+        trace_metadata_html = gr.HTML("")
+
+        with gr.Tabs():
+            with gr.TabItem("🧠 Thought Graph"):
+                gr.Markdown("""
+                ### Agent Reasoning Flow
+
+                This interactive network graph shows **how your agent thinks** - the logical flow of reasoning steps,
+                tool calls, and LLM interactions.
+
+                **How to read it:**
+                - 🟣 **Purple nodes** = LLM reasoning steps
+                - 🟠 **Orange nodes** = Tool calls
+                - 🔵 **Blue nodes** = Chains/Agents
+                - **Arrows** = Flow from one step to the next
+                - **Hover** = See tokens, costs, and timing details
+                """)
+                trace_thought_graph = gr.Plot(label="Thought Graph", show_label=False)
+
+            with gr.TabItem("📊 Waterfall"):
+                gr.Markdown("*Interactive waterfall diagram showing span execution timeline*")
+                gr.Markdown("*Hover over spans for details. Drag to zoom, double-click to reset.*")
+                span_visualization = gr.Plot(label="Trace Waterfall", show_label=False)
+
+            with gr.TabItem("🖥️ GPU Metrics"):
+                gr.Markdown("*Performance metrics for GPU-based models (not available for API models)*")
+                gpu_summary_cards_html = gr.HTML(label="GPU Summary", show_label=False)
+
+                with gr.Tabs():
+                    with gr.TabItem("📈 Time Series Dashboard"):
+                        gpu_metrics_plot = gr.Plot(label="GPU Metrics Over Time", show_label=False)
+
+                    with gr.TabItem("📋 Raw Metrics Data"):
+                        gpu_metrics_json = gr.JSON(label="GPU Metrics Data")
+
+            with gr.TabItem("📝 Span Details"):
+                gr.Markdown("*Detailed span information with token and cost data*")
+                span_details_table = gr.Dataframe(
+                    headers=["Span Name", "Kind", "Duration (ms)", "Tokens", "Cost (USD)", "Status"],
+                    interactive=False,
+                    wrap=True,
+                    label="Span Breakdown"
+                )
+
+            with gr.TabItem("🔍 Raw Data"):
+                gr.Markdown("*Raw OpenTelemetry trace data (JSON)*")
+                span_details_json = gr.JSON()
+
+        with gr.Accordion("🤖 Ask About This Trace", open=False):
+            trace_question = gr.Textbox(
+                label="Question",
+                placeholder="e.g., Why was the tool called twice?",
+                lines=2
+            )
+            trace_ask_btn = gr.Button("Ask", variant="primary")
+            trace_answer = gr.Markdown("*Ask a question to get AI-powered insights*")
+
     # Event handlers
     app.load(
         fn=load_leaderboard,
 
         outputs=[leaderboard_screen, run_detail_screen]
     )
 
+    # Trace detail navigation
+    test_cases_table.select(
+        fn=on_test_case_select,
+        inputs=[test_cases_table],
+        outputs=[
+            run_detail_screen,
+            trace_detail_screen,
+            trace_title,
+            trace_metadata_html,
+            trace_thought_graph,
+            span_visualization,
+            span_details_table,
+            span_details_json,
+            gpu_summary_cards_html,
+            gpu_metrics_plot,
+            gpu_metrics_json
+        ]
+    )
+
+    back_to_run_detail_btn.click(
+        fn=go_back_to_run_detail,
+        outputs=[run_detail_screen, trace_detail_screen]
+    )
+
+
     # HTML table row click handler (JavaScript bridge via hidden textbox)
     selected_row_index.change(
         fn=on_html_table_row_click,
components/thought_graph.py ADDED
@@ -0,0 +1,398 @@
+"""
+Thought Graph Visualization Component
+Visualizes agent reasoning flow as an interactive network graph
+"""
+
+import plotly.graph_objects as go
+import networkx as nx
+from typing import List, Dict, Any, Tuple
+import colorsys
+
+
+def create_thought_graph(spans: List[Dict[str, Any]], trace_id: str = "Unknown") -> go.Figure:
+    """
+    Create an interactive thought graph showing agent reasoning flow
+
+    This is different from the waterfall chart - it shows the logical flow
+    of the agent's thinking process (LLM calls, Tool calls, etc.) as a
+    directed graph rather than a timeline.
+
+    Args:
+        spans: List of OpenTelemetry span dictionaries
+        trace_id: Trace identifier
+
+    Returns:
+        Plotly figure with interactive network graph
+    """
+
+    # Ensure spans is a list
+    if hasattr(spans, 'tolist'):
+        spans = spans.tolist()
+    elif not isinstance(spans, list):
+        spans = list(spans) if spans is not None else []
+
+    if not spans:
+        # Return empty figure with message
+        fig = go.Figure()
+        fig.add_annotation(
+            text="No reasoning steps to display",
+            xref="paper", yref="paper",
+            x=0.5, y=0.5, xanchor='center', yanchor='middle',
+            showarrow=False,
+            font=dict(size=20)
+        )
+        return fig
+
+    # Build graph from spans
+    G = nx.DiGraph()
+
+    # First pass: Add all nodes and build span_map
+    span_map = {}
+    for span in spans:
+        span_id = span.get('spanId') or span.get('span_id') or span.get('spanID')
+        if not span_id:
+            continue
+
+        # Get span details
+        name = span.get('name', 'Unknown')
+        kind = span.get('kind', 'INTERNAL')
+        attributes = span.get('attributes', {})
+
+        # Check for OpenInference span kind
+        if isinstance(attributes, dict) and 'openinference.span.kind' in attributes:
+            openinference_kind = attributes.get('openinference.span.kind', kind)
+            if openinference_kind:  # Only call .upper() if not None
+                kind = openinference_kind.upper()
+
+        # Extract metadata for node
+        node_data = {
+            'span_id': span_id,
+            'name': name,
+            'kind': kind,
+            'attributes': attributes,
+            'status': span.get('status', {}).get('code', 'OK')
+        }
+
+        # Add token and cost info if available
+        if isinstance(attributes, dict):
+            # Token info
+            if 'gen_ai.usage.prompt_tokens' in attributes:
+                node_data['prompt_tokens'] = attributes['gen_ai.usage.prompt_tokens']
+            if 'gen_ai.usage.completion_tokens' in attributes:
+                node_data['completion_tokens'] = attributes['gen_ai.usage.completion_tokens']
+
+            # Cost info
+            if 'gen_ai.usage.cost.total' in attributes:
+                node_data['cost'] = attributes['gen_ai.usage.cost.total']
+            elif 'llm.usage.cost' in attributes:
+                node_data['cost'] = attributes['llm.usage.cost']
+
+            # Model info
+            if 'gen_ai.request.model' in attributes:
+                node_data['model'] = attributes['gen_ai.request.model']
+            elif 'llm.model' in attributes:
+                node_data['model'] = attributes['llm.model']
+
+            # Tool info
+            if 'tool.name' in attributes:
+                node_data['tool_name'] = attributes['tool.name']
+
+        # Add node to graph
+        G.add_node(span_id, **node_data)
+        span_map[span_id] = span
+
+    # Second pass: Add all edges (now all nodes exist in span_map)
+    for span in spans:
+        span_id = span.get('spanId') or span.get('span_id') or span.get('spanID')
+        if not span_id:
+            continue
+
+        parent_id = span.get('parentSpanId') or span.get('parent_span_id') or span.get('parentSpanID')
+        if parent_id and parent_id in span_map:
+            G.add_edge(parent_id, span_id)
+            print(f"[DEBUG] Added edge: {parent_id} → {span_id}")
+
+    print(f"[DEBUG] Graph created: {G.number_of_nodes()} nodes, {G.number_of_edges()} edges")
+
+    if G.number_of_nodes() == 0:
+        # Return empty figure with message
+        fig = go.Figure()
+        fig.add_annotation(
+            text="No valid spans to display",
+            xref="paper", yref="paper",
+            x=0.5, y=0.5, xanchor='center', yanchor='middle',
+            showarrow=False,
+            font=dict(size=20)
+        )
+        return fig
+
+    # Calculate layout using hierarchical layout
+    try:
+        # Try to use hierarchical layout (for DAGs)
+        pos = nx.spring_layout(G, k=2, iterations=50, seed=42)
+
+        # If graph is a DAG, use hierarchical layout
+        if nx.is_directed_acyclic_graph(G):
+            # Get levels using longest_path_length
+            levels = {}
+            for node in G.nodes():
+                # Find longest path from any root to this node
+                try:
+                    # Get all paths from roots to this node
+                    roots = [n for n in G.nodes() if G.in_degree(n) == 0]
+                    max_depth = 0
+                    for root in roots:
+                        if nx.has_path(G, root, node):
+                            paths = list(nx.all_simple_paths(G, root, node))
+                            max_depth = max(max_depth, max(len(p) for p in paths) if paths else 0)
+                    levels[node] = max_depth
+                except:
+                    levels[node] = 0
+
+            # Create hierarchical layout
+            pos = create_hierarchical_layout(G, levels)
+    except Exception as e:
+        print(f"[DEBUG] Layout calculation error: {e}")
+        # Fallback to circular layout
+        pos = nx.circular_layout(G)
+
+    # Extract node positions
+    node_x = []
+    node_y = []
+    node_text = []
+    node_colors = []
+    node_sizes = []
+    hover_text = []
+
+    for node in G.nodes():
+        x, y = pos[node]
+        node_x.append(x)
+        node_y.append(y)
+
+        # Get node data
+        node_data = G.nodes[node]
+        name = node_data.get('name', 'Unknown')
+        kind = node_data.get('kind', 'INTERNAL')
+
+        # Create label (shortened)
+        label = shorten_label(name, max_length=20)
+        node_text.append(label)
+
+        # Assign color based on kind
+        color = get_node_color(kind, node_data.get('status', 'OK'))
+        node_colors.append(color)
+
+        # Size based on importance (LLM and AGENT nodes are larger)
+        size = 40 if kind in ['LLM', 'AGENT', 'CHAIN'] else 30
+        node_sizes.append(size)
+
+        # Create detailed hover text
+        hover = f"<b>{name}</b><br>"
+        hover += f"Type: {kind}<br>"
+        hover += f"Status: {node_data.get('status', 'OK')}<br>"
+
+        if 'model' in node_data:
+            hover += f"Model: {node_data['model']}<br>"
+        if 'tool_name' in node_data:
+            hover += f"Tool: {node_data['tool_name']}<br>"
+        if 'prompt_tokens' in node_data or 'completion_tokens' in node_data:
+            prompt = node_data.get('prompt_tokens', 0)
+            completion = node_data.get('completion_tokens', 0)
+            hover += f"Tokens: {prompt + completion} (p:{prompt}, c:{completion})<br>"
+        if 'cost' in node_data and node_data['cost'] is not None:
+            hover += f"Cost: ${node_data['cost']:.6f}<br>"
+
+        hover_text.append(hover)
+
+    # Extract edges
+    edge_x = []
+    edge_y = []
+    edge_traces = []
+
+    print(f"[DEBUG] Drawing {G.number_of_edges()} edges")
+    for edge in G.edges():
+        x0, y0 = pos[edge[0]]
+        x1, y1 = pos[edge[1]]
+        print(f"[DEBUG] Edge from ({x0:.2f}, {y0:.2f}) to ({x1:.2f}, {y1:.2f})")
+
+        # Create edge line (make it thicker and darker for visibility)
+        edge_trace = go.Scatter(
+            x=[x0, x1, None],
+            y=[y0, y1, None],
+            mode='lines',
+            line=dict(width=3, color='#555'),  # Increased width from 2 to 3, darker color
+            hoverinfo='none',
+            showlegend=False
+        )
+        edge_traces.append(edge_trace)
+
+        # Add arrow annotation
+        edge_traces.append(create_arrow_annotation(x0, y0, x1, y1))
+
+    # Create node trace
+    node_trace = go.Scatter(
+        x=node_x,
+        y=node_y,
+        mode='markers+text',
+        marker=dict(
+            size=node_sizes,
+            color=node_colors,
+            line=dict(width=2, color='white')
+        ),
+        text=node_text,
+        textposition='bottom center',
+        textfont=dict(size=10, color='#333'),
+        hovertext=hover_text,
+        hoverinfo='text',
+        showlegend=False
+    )
+
+    # Create figure
+    fig = go.Figure(data=edge_traces + [node_trace])
+
+    # Update layout with better visibility settings
+    fig.update_layout(
+        title={
+            'text': f"🧠 Agent Thought Graph: {trace_id}",
+            'x': 0.5,
+            'xanchor': 'center',
+            'font': {'size': 20}
+        },
+        showlegend=False,
+        hovermode='closest',
+        margin=dict(t=100, b=40, l=40, r=40),
+        height=600,
+        xaxis=dict(
+            showgrid=False,
+            zeroline=False,
+            showticklabels=False,
+            range=[-0.1, 1.1]  # Add padding to see edges at boundaries
+        ),
+        yaxis=dict(
+            showgrid=False,
+            zeroline=False,
+            showticklabels=False,
+            range=[-0.1, 1.1]  # Add padding to see edges at boundaries
+        ),
+        plot_bgcolor='white',  # Pure white background for maximum contrast
+        paper_bgcolor='#f8f9fa',  # Light gray paper
+        annotations=[
+            dict(
+                text="💡 Hover over nodes to see details | Arrows show execution flow",
+                xref="paper", yref="paper",
+                x=0.5, y=-0.05, xanchor='center', yanchor='top',
+                showarrow=False,
+                font=dict(size=11, color='#666')
+            )
+        ]
+    )
+
+    # Add legend for node types
+    legend_items = create_legend_items()
+    fig.add_annotation(
+        text=legend_items,
+        xref="paper", yref="paper",
+        x=1.0, y=1.0, xanchor='right', yanchor='top',
+        showarrow=False,
+        font=dict(size=10),
+        align='left',
+        bgcolor='white',
+        bordercolor='#ccc',
+        borderwidth=1,
+        borderpad=8
+    )
+
+    return fig
+
+
+def create_hierarchical_layout(G: nx.DiGraph, levels: Dict[str, int]) -> Dict[str, Tuple[float, float]]:
+    """Create a hierarchical layout for the graph"""
+    pos = {}
+
+    # Group nodes by level
+    level_nodes = {}
+    for node, level in levels.items():
+        if level not in level_nodes:
+            level_nodes[level] = []
+        level_nodes[level].append(node)
+
+    # Assign positions
+    max_level = max(levels.values()) if levels else 0
+    for level, nodes in level_nodes.items():
+        y = 1.0 - (level / max(max_level, 1))  # Top to bottom
+        num_nodes = len(nodes)
+        for i, node in enumerate(nodes):
+            x = (i + 1) / (num_nodes + 1)  # Spread evenly
+            pos[node] = (x, y)
+
+    return pos
+
+
+def get_node_color(kind: str, status: str) -> str:
+    """Get color for node based on kind and status"""
+
+    # Error status overrides kind color
+    if status == 'ERROR':
+        return '#DC143C'  # Crimson
+
+    # Color by kind
+    color_map = {
+        'LLM': '#9B59B6',        # Purple
+        'AGENT': '#1ABC9C',      # Turquoise
+        'CHAIN': '#3498DB',      # Light Blue
+        'TOOL': '#E67E22',       # Orange
+        'RETRIEVER': '#F39C12',  # Yellow-Orange
+        'EMBEDDING': '#8E44AD',  # Dark Purple
+        'CLIENT': '#4169E1',     # Royal Blue
+        'SERVER': '#2E8B57',     # Sea Green
+        'INTERNAL': '#95A5A6',   # Gray
+    }
+
+    return color_map.get(kind, '#4682B4')  # Steel Blue default
+
+
+def shorten_label(text: str, max_length: int = 20) -> str:
+    """Shorten label for display"""
+    if len(text) <= max_length:
+        return text
+    return text[:max_length-3] + '...'
+
+
+def create_arrow_annotation(x0: float, y0: float, x1: float, y1: float) -> go.Scatter:
+    """Create an arrow annotation between two points"""
+    # Calculate arrow position (70% along the line, closer to end)
+    arrow_x = x0 + 0.7 * (x1 - x0)
+    arrow_y = y0 + 0.7 * (y1 - y0)
+
+    # Calculate angle for arrow direction
+    import math
+    angle = math.atan2(y1 - y0, x1 - x0)
+
+    # Create arrow head (larger and more visible)
+    arrow_size = 0.03  # Increased from 0.02
+    arrow_dx = arrow_size * math.cos(angle + 2.8)
+    arrow_dy = arrow_size * math.sin(angle + 2.8)
+
+    arrow_trace = go.Scatter(
+        x=[arrow_x - arrow_dx, arrow_x, arrow_x + arrow_size * math.cos(angle - 2.8)],
+        y=[arrow_y - arrow_dy, arrow_y, arrow_y + arrow_size * math.sin(angle - 2.8)],
+        mode='lines',
+        line=dict(width=2, color='#555'),  # Match edge color
+        fill='toself',
+        fillcolor='#555',  # Darker fill color
+        hoverinfo='none',
+        showlegend=False
+    )
+
+    return arrow_trace
+
+
+def create_legend_items() -> str:
+    """Create HTML legend for node types"""
+    legend = "<b>Node Types:</b><br>"
+    legend += "🟣 LLM Call<br>"
+    legend += "🟠 Tool Call<br>"
+    legend += "🔵 Chain/Agent<br>"
+    legend += "⚪ Other<br>"
+    legend += "🔴 Error"
+    return legend
screens/trace_detail.py ADDED
@@ -0,0 +1,721 @@
1
+ """
2
+ Screen 4: Trace Detail View
3
+ Shows detailed OpenTelemetry trace visualization
4
+ """
5
+
6
+ import gradio as gr
7
+ import plotly.graph_objects as go
8
+ from plotly.subplots import make_subplots
9
+ from datetime import datetime
10
+ import pandas as pd
11
+ from typing import Optional, Callable, Dict, Any, List
12
+ from components.thought_graph import create_thought_graph
13
+
14
+
15
+ def create_trace_detail_screen(
16
+ trace_data: dict,
17
+ on_back: Optional[Callable] = None,
18
+ mcp_qa_enabled: bool = True
19
+ ) -> gr.Blocks:
20
+ """
21
+ Create the trace detail screen UI
22
+
23
+ Args:
24
+ trace_data: OpenTelemetry trace data
25
+ on_back: Callback for back button
26
+ mcp_qa_enabled: Enable MCP Q&A tool
27
+
28
+ Returns:
29
+ Gradio Blocks for trace detail screen
30
+ """
31
+
32
+ with gr.Blocks() as trace_detail:
33
+ with gr.Row():
34
+ if on_back:
35
+ back_btn = gr.Button("⬅️ Back to Run Detail", variant="secondary", size="sm")
36
+
37
+ gr.Markdown(f"# 🔍 Trace Detail: {trace_data.get('trace_id', 'Unknown')}")
38
+
39
+ # Safely extract spans
40
+ spans = trace_data.get('spans', [])
41
+ if hasattr(spans, 'tolist'):
42
+ spans = spans.tolist()
43
+ elif not isinstance(spans, list):
44
+ spans = list(spans) if spans is not None else []
45
+
46
+ # Trace metadata
47
+ with gr.Row():
48
+ gr.Markdown(f"""
49
+ **Trace ID:** `{trace_data.get('trace_id', 'N/A')}`
50
+ **Total Spans:** {len(spans)}
51
+ """)
52
+
53
+ # Tabs for different visualizations
54
+ with gr.Tabs() as tabs:
55
+ # Tab 1: Thought Graph (STAR FEATURE!)
56
+ with gr.Tab("🧠 Thought Graph"):
57
+ gr.Markdown("""
58
+ ### Agent Reasoning Flow
59
+ This graph visualizes how your agent thinks - showing the flow of reasoning steps,
60
+ tool calls, and LLM interactions as a network.
61
+
62
+ **Node Colors:**
63
+ - 🟣 Purple: LLM reasoning steps
64
+ - 🟠 Orange: Tool calls
65
+ - 🔵 Blue: Chains/Agents
66
+ - 🔴 Red: Errors
67
+ """)
68
+
69
+ # Create and display thought graph
70
+ thought_graph_plot = gr.Plot(
71
+ value=create_thought_graph(spans, trace_data.get('trace_id', 'Unknown')),
72
+ label=""
73
+ )
74
+
75
+ # Tab 2: Execution Timeline (Waterfall)
76
+ with gr.Tab("⏱️ Execution Timeline"):
77
+ gr.Markdown("""
78
+ ### Waterfall Chart
79
+ Timeline view showing when each span executed and for how long.
80
+ """)
81
+
82
+ # Span visualization
83
+ span_viz = gr.Plot(
84
+ value=create_span_visualization(spans, trace_data.get('trace_id', 'Unknown')),
85
+ label=""
86
+ )
87
+
88
+ # Tab 3: Span Details
89
+ with gr.Tab("📋 Span Details"):
90
+ gr.Markdown("""
91
+ ### Detailed Span Information
92
+ Raw span data with attributes, status, and metadata.
93
+ """)
94
+
95
+ # Span details table
96
+ span_table = create_span_table(spans)
97
+
98
+ # MCP Q&A Tool (below tabs)
99
+ gr.Markdown("---")
100
+ if mcp_qa_enabled:
101
+ with gr.Accordion("🤖 Ask About This Trace", open=False):
102
+ question_input = gr.Textbox(
103
+ label="Question",
104
+ placeholder="e.g., Why was the tool called twice? What tool did the agent use first?",
105
+ lines=2
106
+ )
107
+ ask_btn = gr.Button("Ask", variant="primary")
108
+ answer_output = gr.Markdown("*Ask a question to get AI-powered insights*")
109
+
110
+ # Wire up MCP Q&A (placeholder for now)
111
+ ask_btn.click(
112
+ fn=lambda q: f"**Answer:** This is a placeholder. MCP integration coming soon.\n\n**Your question:** {q}",
113
+ inputs=[question_input],
114
+ outputs=[answer_output]
115
+ )
116
+
117
+ # Wire up events
118
+ if on_back:
119
+ back_btn.click(fn=on_back, inputs=[], outputs=[])
120
+
121
+ return trace_detail
122
+
123
+
124
+ def process_trace_data(spans: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
125
+ """Process trace spans for waterfall visualization"""
126
+ # Ensure spans is a list
127
+ if hasattr(spans, 'tolist'):
128
+ spans = spans.tolist()
129
+ elif not isinstance(spans, list):
130
+ spans = list(spans) if spans is not None else []
131
+
132
+ if not spans:
133
+ return []
134
+
135
+ # Helper function to get timestamp from span (handles different field names)
136
+ def get_timestamp(span, field_name):
137
+ """Get timestamp handling different OpenTelemetry field name variations"""
138
+ # Try different variations of field names
139
+ variations = [
140
+ field_name, # e.g., 'startTime'
141
+ field_name.lower(), # e.g., 'starttime'
142
+ field_name.replace('Time', 'TimeUnixNano'), # e.g., 'startTimeUnixNano'
143
+ field_name[0].lower() + field_name[1:], # e.g., 'startTime'
144
+ # Add snake_case variations (start_time, end_time)
145
+ field_name.replace('Time', '_time').lower(), # e.g., 'start_time'
146
+ field_name.replace('Time', '_time_unix_nano').lower(), # e.g., 'start_time_unix_nano'
147
+ ]
148
+
149
+ for var in variations:
150
+ if var in span:
151
+ value = span[var]
152
+ # Handle both string and numeric timestamps
153
+ if isinstance(value, str):
154
+ return int(value)
155
+ return value
156
+
157
+ # If not found, return 0
158
+ return 0
159
+
160
+ # Calculate relative times
161
+ start_times = [get_timestamp(span, 'startTime') for span in spans]
162
+ min_start = min(start_times) if start_times else 0
163
+ max_start = max(start_times) if start_times else 0
164
+
165
+ # Check if we have any actual timing data
166
+ has_timing_data = min_start > 0 or max_start > 0
167
+
168
+ # Debug: Print first span's raw timestamps
169
+ if spans:
170
+ first_span = spans[0]
171
+ print(f"[DEBUG] First span raw data sample:")
172
+ print(f" startTime field: {first_span.get('startTime', 'NOT FOUND')}")
173
+ print(f" endTime field: {first_span.get('endTime', 'NOT FOUND')}")
174
+ print(f" startTimeUnixNano field: {first_span.get('startTimeUnixNano', 'NOT FOUND')}")
175
+ print(f" endTimeUnixNano field: {first_span.get('endTimeUnixNano', 'NOT FOUND')}")
176
+ print(f" HAS_TIMING_DATA: {has_timing_data}")
177
+ if 'attributes' in first_span:
178
+ attrs = first_span['attributes']
179
+ print(f" Sample attributes: {list(attrs.keys())[:5] if isinstance(attrs, dict) else 'N/A'}")
180
+ if isinstance(attrs, dict):
181
+ # Check for cost fields
182
+ cost_fields = [k for k in attrs.keys() if 'cost' in k.lower() or 'price' in k.lower()]
183
+ if cost_fields:
184
+ print(f" Cost-related fields found: {cost_fields}")
185
+
186
+ # Auto-detect timestamp unit based on magnitude
187
+ time_divisor = 1000000 # Default: assume nanoseconds, convert to milliseconds
188
+ if start_times and min_start > 0:
189
+ # If timestamp is > 1e15, it's likely nanoseconds
190
+ # If timestamp is > 1e12, it's likely microseconds
191
+ # If timestamp is > 1e9, it's likely milliseconds
192
+ # If timestamp is < 1e9, it's likely seconds
193
+ if min_start > 1e15:
194
+ time_divisor = 1000000 # nanoseconds to milliseconds
195
+ time_unit = "nanoseconds"
196
+ elif min_start > 1e12:
197
+ time_divisor = 1000 # microseconds to milliseconds
198
+ time_unit = "microseconds"
199
+ elif min_start > 1e9:
200
+ time_divisor = 1 # already in milliseconds
201
+ time_unit = "milliseconds"
202
+ else:
203
+ time_divisor = 0.001 # seconds to milliseconds
204
+ time_unit = "seconds"
205
+ print(f"[DEBUG] Auto-detected timestamp unit: {time_unit} (min_start={min_start}, divisor={time_divisor})")
206
+
207
+ processed_spans = []
208
+ for idx, span in enumerate(spans):
209
+ start_time = get_timestamp(span, 'startTime')
210
+ end_time = get_timestamp(span, 'endTime')
211
+
212
+ # Calculate relative start
213
+ relative_start = (start_time - min_start) / time_divisor if has_timing_data else 0
214
+
215
+ # Calculate duration - prefer duration_ms if available
216
+ if 'duration_ms' in span and span['duration_ms'] is not None:
217
+ actual_duration = float(span['duration_ms'])
218
+ else:
219
+ actual_duration = (end_time - start_time) / time_divisor
220
+
221
+ # Debug: Print first few durations
222
+ if idx < 3:
223
+ duration_source = 'duration_ms' if 'duration_ms' in span else 'calculated'
224
+ print(f"[DEBUG] Span {idx}: start={start_time}, end={end_time}, duration={actual_duration:.3f}ms ({duration_source})")
225
+
226
+ # Handle span ID variations
227
+ span_id = span.get('spanId') or span.get('span_id') or span.get('spanID') or f'span_{idx}'
228
+ parent_id = span.get('parentSpanId') or span.get('parent_span_id') or span.get('parentSpanID')
229
+
230
+ # Get span kind - check both top-level and OpenInference attributes
231
+ span_kind = span.get('kind', 'INTERNAL')
232
+ attributes = span.get('attributes', {})
233
+
234
+ # Check for OpenInference span kind in attributes
235
+ if isinstance(attributes, dict) and 'openinference.span.kind' in attributes:
236
+ openinference_kind = attributes.get('openinference.span.kind')
237
+ # Map OpenInference kinds to OpenTelemetry kinds for consistency
238
+ # OpenInference kinds: CHAIN, TOOL, LLM, RETRIEVER, EMBEDDING, AGENT, etc.
239
+ if openinference_kind:
240
+ span_kind = openinference_kind.upper()
241
+
242
+ # Extract token and cost information from attributes
243
+ token_info = {}
244
+ cost_info = {}
245
+ if isinstance(attributes, dict):
246
+ # Helper to safely extract numeric values
247
+ def safe_numeric(value):
248
+ """Safely convert to numeric, return None if invalid"""
249
+ if value is None:
250
+ return None
251
+ try:
252
+ if isinstance(value, (int, float)):
253
+ return value
254
+ return float(value)
255
+ except (ValueError, TypeError):
256
+ return None
257
+
258
+ # Check for token usage (various formats)
259
+ prompt_tokens = None
260
+ completion_tokens = None
261
+
262
+ if 'gen_ai.usage.prompt_tokens' in attributes:
263
+ prompt_tokens = safe_numeric(attributes['gen_ai.usage.prompt_tokens'])
264
+ if 'gen_ai.usage.completion_tokens' in attributes:
265
+ completion_tokens = safe_numeric(attributes['gen_ai.usage.completion_tokens'])
266
+ if 'llm.token_count.prompt' in attributes and prompt_tokens is None:
267
+ prompt_tokens = safe_numeric(attributes['llm.token_count.prompt'])
268
+ if 'llm.token_count.completion' in attributes and completion_tokens is None:
269
+ completion_tokens = safe_numeric(attributes['llm.token_count.completion'])
270
+
271
+ # Store valid token counts
272
+ if prompt_tokens is not None:
273
+ token_info['prompt_tokens'] = int(prompt_tokens)
274
+ if completion_tokens is not None:
275
+ token_info['completion_tokens'] = int(completion_tokens)
276
+
277
+ # Calculate total tokens
278
+ if 'prompt_tokens' in token_info and 'completion_tokens' in token_info:
279
+ token_info['total_tokens'] = token_info['prompt_tokens'] + token_info['completion_tokens']
280
+ elif 'llm.usage.total_tokens' in attributes:
281
+ total = safe_numeric(attributes['llm.usage.total_tokens'])
282
+ if total is not None:
283
+ token_info['total_tokens'] = int(total)
284
+
285
+ # Check for cost information (various formats)
286
+ if 'gen_ai.usage.cost.total' in attributes:
287
+ cost = safe_numeric(attributes['gen_ai.usage.cost.total'])
288
+ if cost is not None:
289
+ cost_info['total_cost'] = cost
290
+ elif 'llm.usage.cost' in attributes:
291
+ cost = safe_numeric(attributes['llm.usage.cost'])
292
+ if cost is not None:
293
+ cost_info['total_cost'] = cost
294
+
295
+ # Debug: Print cost info for LLM spans
296
+ if idx < 2 and span_kind == 'LLM':
297
+ print(f"[DEBUG] LLM Span {idx} cost extraction:")
298
+ print(f" gen_ai.usage.cost.total: {attributes.get('gen_ai.usage.cost.total', 'NOT FOUND')}")
299
+ print(f" llm.usage.cost: {attributes.get('llm.usage.cost', 'NOT FOUND')}")
300
+ print(f" cost_info: {cost_info}")
301
+
302
+ # Store actual duration for tooltip, use minimum for visualization
303
+ display_duration = max(actual_duration, 0.1) # Minimum width for visibility
304
+
305
+ processed_spans.append({
306
+ 'span_id': span_id,
307
+ 'parent_id': parent_id,
308
+ 'name': span.get('name', 'Unknown'),
309
+ 'kind': span_kind,
310
+ 'start_time': relative_start,
311
+ 'duration': display_duration, # For bar width
312
+ 'actual_duration': actual_duration, # For tooltip
313
+ 'end_time': relative_start + actual_duration, # Use actual for end time
314
+ 'attributes': attributes,
315
+ 'status': span.get('status', {}).get('code', 'UNKNOWN'),
316
+ 'tokens': token_info,
317
+ 'cost': cost_info
318
+ })
319
+
320
+ print(f"[DEBUG] Total spans in input: {len(spans)}")
321
+ print(f"[DEBUG] Processed spans: {len(processed_spans)}")
322
+
323
+ # Debug: Show span kinds and statuses detected
324
+ span_kinds = {}
325
+ span_statuses = {}
326
+ durations = []
327
+ spans_with_tokens = 0
328
+ spans_with_cost = 0
329
+ for span in processed_spans:
330
+ kind = span['kind']
331
+ status = span['status']
332
+ span_kinds[kind] = span_kinds.get(kind, 0) + 1
333
+ span_statuses[status] = span_statuses.get(status, 0) + 1
334
+ durations.append(span['actual_duration'])
335
+ if span['tokens']:
336
+ spans_with_tokens += 1
337
+ if span['cost']:
338
+ spans_with_cost += 1
339
+
340
+ print(f"[DEBUG] Span kinds detected: {span_kinds}")
341
+ print(f"[DEBUG] Span statuses detected: {span_statuses}")
342
+ if durations:
343
+ print(f"[DEBUG] Duration range: {min(durations):.3f}ms - {max(durations):.3f}ms")
344
+ print(f"[DEBUG] Spans with token info: {spans_with_tokens}/{len(processed_spans)}")
345
+ print(f"[DEBUG] Spans with cost info: {spans_with_cost}/{len(processed_spans)}")
346
+
347
+ return processed_spans
348
+
349
+
350
+ def create_span_visualization(spans: List[Dict[str, Any]], trace_id: str = "Unknown") -> go.Figure:
+     """Create an interactive Plotly waterfall visualization of spans"""
+     processed_spans = process_trace_data(spans)
+
+     print(f"[DEBUG] create_span_visualization - Received {len(spans)} spans")
+     print(f"[DEBUG] create_span_visualization - Processed {len(processed_spans)} spans")
+
+     if not processed_spans:
+         # Return an empty figure with an explanatory message
+         fig = go.Figure()
+         fig.add_annotation(
+             text="No spans to display",
+             xref="paper", yref="paper",
+             x=0.5, y=0.5, xanchor='center', yanchor='middle',
+             showarrow=False,
+             font=dict(size=20)
+         )
+         return fig
+
+     # Sort spans by start time for better visualization
+     processed_spans.sort(key=lambda x: x['start_time'])
+
+     # Create unique labels for each span (append the index to guarantee uniqueness)
+     for idx, span in enumerate(processed_spans):
+         span['display_name'] = f"{span['name']} [{idx}]"
+
+     # Color each span by status and kind; error status always wins.
+     # Kind colors cover both OpenTelemetry and OpenInference span kinds.
+     kind_colors = {
+         'SERVER': '#2E8B57',     # Sea Green
+         'CLIENT': '#4169E1',     # Royal Blue
+         'LLM': '#9B59B6',        # Purple for LLM calls
+         'TOOL': '#E67E22',       # Orange for tool calls
+         'CHAIN': '#3498DB',      # Light Blue for chains
+         'AGENT': '#1ABC9C',      # Turquoise for agents
+         'RETRIEVER': '#F39C12',  # Yellow-orange for retrievers
+         'EMBEDDING': '#8E44AD',  # Dark Purple for embeddings
+     }
+     colors = []
+     color_map = {}  # Track which color is assigned to each kind
+     for span in processed_spans:
+         status = span['status']
+         kind = span['kind']
+
+         # Only show red for actual errors (ERROR status)
+         if status == 'ERROR':
+             color = '#DC143C'  # Crimson for errors
+         else:
+             color = kind_colors.get(kind, '#4682B4')  # Steel Blue for INTERNAL/unknown
+
+         colors.append(color)
+         if kind not in color_map:
+             color_map[kind] = color
+
+     print(f"[DEBUG] Color assignments: {color_map}")
+
+     # Create the waterfall chart
+     fig = go.Figure()
+
+     # Prepare custom data for hover tooltips
+     customdata = []
+     for span in processed_spans:
+         # Build the token info string
+         token_str = ""
+         if span['tokens']:
+             tokens = span['tokens']
+             if 'total_tokens' in tokens:
+                 token_str = f"<br>Tokens: {tokens['total_tokens']}"
+                 if 'prompt_tokens' in tokens and 'completion_tokens' in tokens:
+                     token_str += f" (prompt: {tokens['prompt_tokens']}, completion: {tokens['completion_tokens']})"
+             elif 'prompt_tokens' in tokens or 'completion_tokens' in tokens:
+                 parts = []
+                 if 'prompt_tokens' in tokens:
+                     parts.append(f"prompt: {tokens['prompt_tokens']}")
+                 if 'completion_tokens' in tokens:
+                     parts.append(f"completion: {tokens['completion_tokens']}")
+                 token_str = f"<br>Tokens: {', '.join(parts)}"
+
+         # Build the cost info string
+         cost_str = ""
+         if span['cost'] and 'total_cost' in span['cost']:
+             cost_str = f"<br>Cost: ${span['cost']['total_cost']:.6f}"
+
+         customdata.append([
+             span['name'],
+             span['kind'],
+             span['span_id'],
+             span['end_time'],
+             span['actual_duration'],  # Show the actual duration, not the display duration
+             token_str,
+             cost_str
+         ])
+
+     # Add bars for each span (use display_name for unique y-axis labels)
+     fig.add_trace(go.Bar(
+         y=[span['display_name'] for span in processed_spans],
+         x=[span['duration'] for span in processed_spans],  # Display duration (min 0.1 ms)
+         base=[span['start_time'] for span in processed_spans],
+         orientation='h',
+         marker_color=colors,
+         hovertemplate=(
+             "<b>%{customdata[0]}</b><br>" +
+             "Type: %{customdata[1]}<br>" +
+             "Span ID: %{customdata[2]}<br>" +
+             "Duration: %{customdata[4]:.3f} ms<br>" +  # Actual duration, 3 decimal places
+             "Start: %{base:.2f} ms<br>" +
+             "End: %{customdata[3]:.2f} ms" +
+             "%{customdata[5]}" +  # Token info (already formatted)
+             "%{customdata[6]}" +  # Cost info (already formatted)
+             "<extra></extra>"
+         ),
+         customdata=customdata,
+         name="Spans"
+     ))
+
+     # Update the layout for better readability
+     fig.update_layout(
+         title={
+             'text': f"OpenTelemetry Trace: {trace_id}",
+             'x': 0.5,
+             'xanchor': 'center'
+         },
+         xaxis_title="Time (milliseconds)",
+         yaxis_title="Spans",
+         showlegend=False,
+         height=400 + len(processed_spans) * 30,  # Dynamic height based on span count
+         bargap=0.2,
+         hovermode='closest'
+     )
+
+     return fig
+
+
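For reference, a minimal smoke test for the waterfall chart. The field names (`spanId`, `startTime`/`endTime` as Unix-epoch nanoseconds, `status.code`) follow the naming variations handled by `process_trace_data` and `create_span_table`; real spans from an OpenTelemetry exporter may differ, so treat this as an illustrative sketch only:

```python
# Hypothetical raw spans; timestamps are Unix-epoch nanoseconds
sample_spans = [
    {"spanId": "a1", "parentSpanId": None, "name": "agent.run",
     "kind": "AGENT", "startTime": 1_700_000_000_000_000_000,
     "endTime": 1_700_000_000_050_000_000,
     "status": {"code": "OK"}, "attributes": {}},
    {"spanId": "b2", "parentSpanId": "a1", "name": "llm.call",
     "kind": "LLM", "startTime": 1_700_000_000_010_000_000,
     "endTime": 1_700_000_000_040_000_000,
     "status": {"code": "OK"}, "attributes": {}},
]

fig = create_span_visualization(sample_spans, trace_id="demo-trace")
fig.show()  # in the app, the figure is routed to a gr.Plot output instead
```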
+ def create_span_table(spans: List[Dict[str, Any]]) -> gr.JSON:
+     """Create a simplified span details view as a Gradio JSON component"""
+
+     # Ensure spans is a list
+     if hasattr(spans, 'tolist'):
+         spans = spans.tolist()
+     elif not isinstance(spans, list):
+         spans = list(spans) if spans is not None else []
+
+     # Helper to read a timestamp under several naming conventions
+     # (same approach as in process_trace_data)
+     def get_timestamp(span, field_name):
+         variations = [
+             field_name,
+             field_name.lower(),
+             field_name.replace('Time', 'TimeUnixNano'),
+             field_name[0].lower() + field_name[1:],
+         ]
+         for var in variations:
+             if var in span:
+                 value = span[var]
+                 if isinstance(value, str):
+                     return int(value)
+                 return value
+         return 0
+
+     # Simplify span data for display
+     simplified_spans = []
+     for span in spans:
+         start_time = get_timestamp(span, 'startTime')
+         end_time = get_timestamp(span, 'endTime')
+         # Timestamps are nanoseconds; convert the difference to milliseconds
+         duration_ms = (end_time - start_time) / 1_000_000 if (end_time and start_time) else 0
+
+         # Handle span ID naming variations
+         span_id = span.get('spanId') or span.get('span_id') or span.get('spanID') or 'N/A'
+         parent_id = span.get('parentSpanId') or span.get('parent_span_id') or span.get('parentSpanID') or 'root'
+
+         simplified_spans.append({
+             "Span ID": span_id,
+             "Parent": parent_id,
+             "Name": span.get('name', 'N/A'),
+             "Kind": span.get('kind', 'N/A'),
+             "Duration (ms)": round(duration_ms, 2),
+             "Attributes": span.get('attributes', {}),
+             "Status": span.get('status', {}).get('code', 'UNKNOWN')
+         })
+
+     return gr.JSON(value=simplified_spans, label="Span Details")
+
+
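The same hypothetical spans can exercise the JSON view:

```python
span_json = create_span_table(sample_spans)
# span_json is a gr.JSON component; in the app it backs the
# "Span Details" tab rather than being shown standalone
```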
+ # GPU Metrics Visualization Functions
+
+ def extract_metrics_data(metrics_df):
+     """
+     Extract and prepare GPU metrics data for visualization
+
+     Args:
+         metrics_df: DataFrame with a flat metrics structure (from the HuggingFace dataset)
+                     Expected columns: timestamp, gpu_utilization_percent, gpu_memory_used_mib,
+                     gpu_temperature_celsius, gpu_power_watts, co2_emissions_gco2e
+
+     Returns:
+         DataFrame ready for visualization
+     """
+     if metrics_df is None or metrics_df.empty:
+         return pd.DataFrame()
+
+     # Work on a copy so the caller's DataFrame is not mutated
+     metrics_df = metrics_df.copy()
+
+     # Ensure timestamp is datetime
+     if 'timestamp' in metrics_df.columns:
+         if not pd.api.types.is_datetime64_any_dtype(metrics_df['timestamp']):
+             metrics_df['timestamp'] = pd.to_datetime(metrics_df['timestamp'])
+
+     # Sort by timestamp
+     metrics_df = metrics_df.sort_values('timestamp')
+
+     return metrics_df
+
+
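A hypothetical metrics frame matching the flat column names above (real data comes from the HuggingFace dataset) can be used to exercise the GPU visualizations:

```python
import pandas as pd

metrics_df = pd.DataFrame({
    "timestamp": pd.date_range("2024-01-01 12:00:00", periods=5, freq="s"),
    "gpu_utilization_percent": [55.0, 72.5, 88.0, 90.5, 84.0],
    "gpu_memory_used_mib": [10240, 11264, 12288, 12300, 12100],
    "gpu_memory_total_mib": [24576] * 5,
    "gpu_temperature_celsius": [61, 64, 68, 70, 69],
    "gpu_power_watts": [180.0, 220.5, 260.0, 265.5, 240.0],
    "co2_emissions_gco2e": [0.0012, 0.0025, 0.0039, 0.0054, 0.0068],
})

df = extract_metrics_data(metrics_df)  # timestamps parsed and sorted
```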
+ def create_gpu_summary_cards(df):
+     """
+     Create summary cards for GPU metrics
+
+     Args:
+         df: DataFrame with a flat metrics structure (columns: gpu_utilization_percent, etc.)
+
+     Returns:
+         HTML string with summary cards
+     """
+     if df is None or df.empty:
+         return "<div style='padding: 20px; text-align: center;'>⚠️ No GPU metrics available (expected for API models)</div>"
+
+     # Get the latest row (assumes df is sorted by timestamp)
+     latest = df.iloc[-1]
+
+     # Extract values (with safe fallbacks)
+     utilization = latest.get('gpu_utilization_percent', 0)
+     memory_used = latest.get('gpu_memory_used_mib', 0)
+     temperature = latest.get('gpu_temperature_celsius', 0)
+     co2_emissions = latest.get('co2_emissions_gco2e', 0)
+     power = latest.get('gpu_power_watts', 0)
+
+     # Also get total memory, if available, to compute a percentage
+     memory_total = latest.get('gpu_memory_total_mib', 0)
+     memory_percent = (memory_used / memory_total * 100) if memory_total > 0 else 0
+
+     cards_html = f"""
+     <div style="display: grid; grid-template-columns: repeat(4, 1fr); gap: 15px; margin: 20px 0;">
+         <div style="background: linear-gradient(135deg, #667eea 0%, #764ba2 100%); padding: 20px; border-radius: 10px; color: white; text-align: center;">
+             <h3 style="margin: 0 0 10px 0; font-size: 1em;">GPU Utilization</h3>
+             <h2 style="margin: 0; font-size: 2em;">{utilization:.1f}%</h2>
+         </div>
+         <div style="background: linear-gradient(135deg, #f093fb 0%, #f5576c 100%); padding: 20px; border-radius: 10px; color: white; text-align: center;">
+             <h3 style="margin: 0 0 10px 0; font-size: 1em;">GPU Memory</h3>
+             <h2 style="margin: 0; font-size: 2em;">{memory_used:.0f} MiB</h2>
+             <p style="margin: 5px 0 0 0; font-size: 0.8em; opacity: 0.9;">{memory_percent:.1f}% of {memory_total:.0f} MiB</p>
+         </div>
+         <div style="background: linear-gradient(135deg, #4facfe 0%, #00f2fe 100%); padding: 20px; border-radius: 10px; color: white; text-align: center;">
+             <h3 style="margin: 0 0 10px 0; font-size: 1em;">GPU Temperature</h3>
+             <h2 style="margin: 0; font-size: 2em;">{temperature:.0f}°C</h2>
+         </div>
+         <div style="background: linear-gradient(135deg, #43e97b 0%, #38f9d7 100%); padding: 20px; border-radius: 10px; color: white; text-align: center;">
+             <h3 style="margin: 0 0 10px 0; font-size: 1em;">CO2 Emissions</h3>
+             <h2 style="margin: 0; font-size: 2em;">{co2_emissions:.4f} g</h2>
+             <p style="margin: 5px 0 0 0; font-size: 0.8em; opacity: 0.9;">Power: {power:.1f} W</p>
+         </div>
+     </div>
+     """
+
+     return cards_html
+
+
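Given the `df` built above, the cards render as a single HTML string:

```python
cards_html = create_gpu_summary_cards(df)
# Rendered via a gr.HTML component in the GPU Metrics tab, e.g.:
# gpu_summary = gr.HTML(value=cards_html)
```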
+ def create_gpu_metrics_dashboard(metrics_df):
+     """
+     Create a combined dashboard of GPU metric charts
+
+     Args:
+         metrics_df: DataFrame with a flat metrics structure (from the HuggingFace dataset)
+
+     Returns:
+         Plotly figure with GPU metrics time series
+     """
+     # Prepare data (handles None/empty input and timestamp normalization)
+     df = extract_metrics_data(metrics_df)
+
+     if df is None or df.empty:
+         # Return an empty annotated figure rather than None, so downstream
+         # gr.Plot components always receive a figure
+         fig = go.Figure()
+         fig.add_annotation(
+             text="No GPU metrics available (expected for API models)",
+             xref="paper", yref="paper",
+             x=0.5, y=0.5, xanchor='center', yanchor='middle',
+             showarrow=False,
+             font=dict(size=16)
+         )
+         return fig
+
+     # Create subplots for the GPU metrics:
+     # utilization, memory, temperature, power, CO2 emissions, and power cost
+     fig = make_subplots(
+         rows=3, cols=2,
+         subplot_titles=[
+             'GPU Utilization (%)',
+             'GPU Memory (MiB)',
+             'GPU Temperature (°C)',
+             'GPU Power (W)',
+             'CO2 Emissions (g)',
+             'Power Cost (USD)'
+         ],
+         vertical_spacing=0.10,
+         horizontal_spacing=0.12,
+         specs=[[{}, {}], [{}, {}], [{}, {}]]
+     )
+
+     colors = ['#667eea', '#f093fb', '#4facfe', '#FFE66D', '#43e97b', '#FF6B6B']
+
+     # Map each metric column to its subplot position and color
+     metrics_config = [
+         ('gpu_utilization_percent', 'GPU Utilization (%)', 1, 1, colors[0]),
+         ('gpu_memory_used_mib', 'GPU Memory (MiB)', 1, 2, colors[1]),
+         ('gpu_temperature_celsius', 'GPU Temperature (°C)', 2, 1, colors[2]),
+         ('gpu_power_watts', 'GPU Power (W)', 2, 2, colors[3]),
+         ('co2_emissions_gco2e', 'CO2 Emissions (g)', 3, 1, colors[4]),
+         ('power_cost_usd', 'Power Cost (USD)', 3, 2, colors[5]),
+     ]
+
+     for col_name, title, row, col, color in metrics_config:
+         if col_name in df.columns:
+             fig.add_trace(
+                 go.Scatter(
+                     x=df['timestamp'],
+                     y=df[col_name],
+                     mode='lines+markers',
+                     name=title,
+                     line=dict(color=color, width=3),
+                     marker=dict(size=6, color=color),
+                     hovertemplate=(
+                         f"<b>{title}</b><br>" +
+                         "Time: %{x}<br>" +
+                         "Value: %{y:.2f}<br>" +
+                         "<extra></extra>"
+                     )
+                 ),
+                 row=row, col=col
+             )
+
+     # Add total memory as a dashed reference line, if available
+     if 'gpu_memory_total_mib' in df.columns:
+         total_memory = df['gpu_memory_total_mib'].iloc[0]
+         fig.add_hline(
+             y=total_memory,
+             line_dash="dash",
+             line_color="gray",
+             annotation_text=f"Total: {total_memory:.0f} MiB",
+             annotation_position="right",
+             row=1, col=2
+         )
+
+     fig.update_layout(
+         title_text="GPU Metrics Over Time",
+         height=900,
+         template="plotly_white",
+         showlegend=False,
+         hovermode='x unified'
+     )
+
+     # Show timestamps as HH:MM:SS on all x-axes
+     fig.update_xaxes(tickformat='%H:%M:%S')
+
+     return fig
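Usage mirrors the summary cards; columns absent from the frame (e.g. power_cost_usd in the sketch above) simply leave their subplot empty:

```python
fig = create_gpu_metrics_dashboard(metrics_df)
fig.show()  # in the app, the figure feeds a gr.Plot in the GPU Metrics tab
```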