GEM benchmark

https://gem-benchmark.com

AI & ML interests

We develop infrastructure for the evaluation of generated text.

Recent Activity

fladhak authored a paper 2 days ago

SpaceTools: Tool-Augmented Spatial Reasoning via Double Interactive RL

yjernite authored a paper 4 months ago

The Responsible Foundation Model Development Cheatsheet: A Review of Tools & Resources

yjernite authored a paper 4 months ago

In-House Evaluation Is Not Enough: Towards Robust Third-Party Flaw Disclosure for General-Purpose AI

GEM's models

None public yet