2 20

xuhuang

xuhuang87

AI & ML interests

None yet

Recent Activity

upvoted a paper about 2 months ago

InteractScience: Programmatic and Visually-Grounded Evaluation of Interactive Scientific Demonstration Code Generation

upvoted a paper about 2 months ago

OS-Sentinel: Towards Safety-Enhanced Mobile GUI Agents via Hybrid Validation in Realistic Workflows

upvoted a paper 2 months ago

JanusCoder: Towards a Foundational Visual-Programmatic Interface for Code Intelligence

View all activity

Organizations

None yet

upvoted 2 papers about 2 months ago

InteractScience: Programmatic and Visually-Grounded Evaluation of Interactive Scientific Demonstration Code Generation

Paper • 2510.09724 • Published Oct 10, 2025 • 10

OS-Sentinel: Towards Safety-Enhanced Mobile GUI Agents via Hybrid Validation in Realistic Workflows

Paper • 2510.24411 • Published Oct 28, 2025 • 71

upvoted 2 papers 2 months ago

JanusCoder: Towards a Foundational Visual-Programmatic Interface for Code Intelligence

Paper • 2510.23538 • Published Oct 27, 2025 • 96

QueST: Incentivizing LLMs to Generate Difficult Problems

Paper • 2510.17715 • Published Oct 20, 2025 • 33

upvoted a paper 3 months ago

Making Mathematical Reasoning Adaptive

Paper • 2510.04617 • Published Oct 6, 2025 • 22

upvoted a paper 4 months ago

ScaleCUA: Scaling Open-Source Computer Use Agents with Cross-Platform Data

Paper • 2509.15221 • Published Sep 18, 2025 • 111

authored a paper 4 months ago

Intern-S1: A Scientific Multimodal Foundation Model

Paper • 2508.15763 • Published Aug 21, 2025 • 259

upvoted 2 papers 4 months ago

Intern-S1: A Scientific Multimodal Foundation Model

Paper • 2508.15763 • Published Aug 21, 2025 • 259

DuPO: Enabling Reliable LLM Self-Verification via Dual Preference Optimization

Paper • 2508.14460 • Published Aug 20, 2025 • 85

New activity in jon-tow/okapi_arc_challenge 5 months ago

Error with load_dataset()

👍 1

#2 opened 5 months ago by

xuhuang87

upvoted a paper 5 months ago

Seed-X: Building Strong Multilingual Translation LLM with 7B Parameters

Paper • 2507.13618 • Published Jul 18, 2025 • 16

upvoted 2 papers 7 months ago

A Controllable Examination for Long-Context Language Models

Paper • 2506.02921 • Published Jun 3, 2025 • 33

ScienceBoard: Evaluating Multimodal Autonomous Agents in Realistic Scientific Workflows

Paper • 2505.19897 • Published May 26, 2025 • 104

upvoted 2 papers 9 months ago

Could Thinking Multilingually Empower LLM Reasoning?

Paper • 2504.11833 • Published Apr 16, 2025 • 29

Genius: A Generalizable and Purely Unsupervised Self-Training Framework For Advanced Reasoning

Paper • 2504.08672 • Published Apr 11, 2025 • 55

upvoted 2 papers 10 months ago

CapArena: Benchmarking and Analyzing Detailed Image Captioning in the LLM Era

Paper • 2503.12329 • Published Mar 16, 2025 • 27

Process-based Self-Rewarding Language Models

Paper • 2503.03746 • Published Mar 5, 2025 • 39

upvoted a collection 11 months ago

BenchMAX

Collection

11 items • Updated Aug 25, 2025 • 10

upvoted a paper 11 months ago

BenchMAX: A Comprehensive Multilingual Evaluation Suite for Large Language Models

Paper • 2502.07346 • Published Feb 11, 2025 • 53

commented a paper 11 months ago