Related Papers

Explore the foundational and companion papers behind InnovatorBench. Use the quick navigation to jump to each paper's details.

InnovatorBench: Evaluating Agents for Innovative LLM Research

Introduces the InnovatorBench benchmark and the ResearchGym execution environment.

Highlights multi-stage research tasks, standardized scoring, and baseline agent implementations.

Interaction As Intelligence Part 2: Asynchronous Human-Agent Rollout for Long-Horizon Task Training

Explores asynchronous human-agent collaboration for training on long-horizon tasks.

Introduces novel rollout strategies and training methodologies for extended task sequences.