To assess AI capabilities across these cognitive abilities, we propose a three-stage evaluation protocol that benchmarks system performance relative to human capabilities:
- Evaluate AI systems across a broad suite of cognitive tasks covering each ability, using held-out test sets to prevent data contamination
- Collect human baselines for the same tasks from a demographically representative sample of adults
- Map each AI system’s performance relative to the distribution of human performance in each ability
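The third stage, mapping an AI system's score onto the distribution of human performance, can be sketched as a simple percentile-rank calculation. The function and the baseline scores below are hypothetical illustrations, not the protocol's actual data or implementation:

```python
def percentile_rank(ai_score, human_scores):
    """Fraction of human baseline scores at or below the AI system's score."""
    below = sum(1 for s in human_scores if s <= ai_score)
    return below / len(human_scores)

# Hypothetical human baseline scores for one ability (e.g., attention)
human_scores = [0.42, 0.55, 0.61, 0.63, 0.70, 0.74, 0.78, 0.81, 0.85, 0.92]
ai_score = 0.72

rank = percentile_rank(ai_score, human_scores)
print(f"AI system at the {rank:.0%} percentile of human performance")
```

A real protocol would use far larger baseline samples and report uncertainty, but the core idea is the same: the AI's score is meaningful only in relation to where it falls in the human distribution.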
Going from theory to practice
Defining these cognitive abilities is a vital first step, but we need more than a framework to measure progress. To put this framework into practice, we're launching a new Kaggle hackathon — "Measuring progress toward AGI: Cognitive abilities". The hackathon invites the community to design evaluations for the five cognitive abilities where the evaluation gap is largest: learning, metacognition, attention, executive functions and social cognition.
Participants can use Kaggle's newly launched Community Benchmarks platform to build and test their evaluations against a lineup of frontier models.
We're offering a total prize pool of $200,000: $10,000 awards for the top two submissions in each of the five tracks, and $25,000 grand prizes for the four best overall submissions. Submissions are open March 17 through April 16, and we'll announce the results on June 1. Head over to the Kaggle website to start building.
