InnovatorBench: Evaluating Agents for Innovative LLM Research
Introduces the InnovatorBench benchmark and the ResearchGym execution environment.
Highlights multi-stage research tasks, standardized scoring, and baseline agent implementations.
Explore the foundational and companion benchmarks behind InnovatorBench.
Explores asynchronous human-agent collaboration for training on long-horizon tasks.
Introduces novel rollout strategies and training methodologies for extended task sequences.