ResearchGym

Overview

ResearchGym is an environment designed to approximate real-world LLM research. It provides a set of tools that the agent can use to interact with the environment.

Actions and Observations

There are 42 primitive tools in ResearchGym, which are grouped into five families:

  • Command execution: Manage execution sessions, run commands within a session, and retrieve session output.
  • File operations: Edit files, create files, search files, list files, and query file metadata.
  • Parsing operations: Extract and preview content from multi-modal sources (e.g., images, audio, and video) for the agent to analyze.
  • Search actions: Search the web for information.
  • Browse actions: Browse and retrieve content from web pages.

Each tool family is paired with an observation type that normalizes raw tool outputs into a structured, agent-readable result.
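To make the family/observation pairing concrete, here is a minimal sketch of how raw outputs from different tools might be normalized into one structured return type. The enum, the Observation fields, the tool names used in the example, and the MAX_CHARS truncation limit are all hypothetical illustrations, not ResearchGym's actual schema.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Dict, Optional


class ToolFamily(Enum):
    """The five tool families listed above (enum names are hypothetical)."""
    COMMAND = "command_execution"
    FILE = "file_operations"
    PARSING = "parsing_operations"
    SEARCH = "search"
    BROWSE = "browse"


@dataclass
class Observation:
    """A uniform, agent-readable result shared by every tool family (hypothetical schema)."""
    family: ToolFamily
    tool: str
    ok: bool
    content: str
    metadata: Dict[str, Any] = field(default_factory=dict)


MAX_CHARS = 2000  # assumed cap so one observation cannot flood the agent's context


def normalize(family: ToolFamily, tool: str, raw: Any,
              error: Optional[str] = None) -> Observation:
    """Turn a raw tool output (bytes, list, or any object) into an Observation."""
    if error is not None:
        return Observation(family, tool, ok=False, content=f"error: {error}")
    if isinstance(raw, bytes):
        text = raw.decode("utf-8", errors="replace")
    elif isinstance(raw, (list, tuple)):
        text = "\n".join(str(item) for item in raw)  # e.g. a directory listing
    else:
        text = str(raw)
    return Observation(
        family, tool, ok=True, content=text[:MAX_CHARS],
        metadata={"truncated": len(text) > MAX_CHARS, "original_length": len(text)},
    )
```

For example, normalize(ToolFamily.FILE, "list_files", ["a.py", "b.py"]) yields an Observation whose content is the two file names on separate lines. The design choice shown is that every family returns the same shape, so the agent can parse success, content, and truncation the same way regardless of which tool produced them.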

Multi-Computer Control

ResearchGym allows agents to control multiple machines (or Docker containers) concurrently via HTTP. Each computer runs an HTTP server that receives and executes terminal commands, allowing an agent initialized on a single machine to orchestrate long-horizon, distributed experiments across a cluster.
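The per-machine HTTP server idea can be sketched with the Python standard library alone. This is a minimal illustration under stated assumptions: the /run path, the JSON payload fields ("cmd", "timeout"), and the helper names start_server, run_remote, and run_on_all are hypothetical and not ResearchGym's actual protocol.

```python
import json
import subprocess
import threading
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class CommandHandler(BaseHTTPRequestHandler):
    """Per-machine endpoint (hypothetical): POST /run with {"cmd": ..., "timeout": ...}."""

    def do_POST(self):
        if self.path != "/run":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length))
        try:
            # Execute the terminal command on this machine and capture its output.
            proc = subprocess.run(
                payload["cmd"], shell=True, capture_output=True, text=True,
                timeout=payload.get("timeout", 60),
            )
            result = {"returncode": proc.returncode,
                      "stdout": proc.stdout, "stderr": proc.stderr}
        except subprocess.TimeoutExpired:
            result = {"returncode": None, "stdout": "", "stderr": "timed out"}
        body = json.dumps(result).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence per-request logging in this sketch


def start_server(host="127.0.0.1", port=0):
    """Run one command server in a background thread; port=0 picks a free port."""
    server = ThreadingHTTPServer((host, port), CommandHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def run_remote(base_url, cmd, timeout=60):
    """Agent side: send one command to one machine and return its JSON result."""
    data = json.dumps({"cmd": cmd, "timeout": timeout}).encode()
    req = urllib.request.Request(
        f"{base_url}/run", data=data, method="POST",
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=timeout + 5) as resp:
        return json.loads(resp.read())


def run_on_all(base_urls, cmd):
    """Fan the same command out to several machines concurrently."""
    with ThreadPoolExecutor(max_workers=len(base_urls)) as pool:
        return list(pool.map(lambda url: run_remote(url, cmd), base_urls))
```

For a local trial, two calls to start_server() can stand in for two machines, and run_on_all with their URLs sends a command to both at once. The sketch has no authentication, so it is only suitable for trusted networks.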

Asynchronous Command Execution

ResearchGym decouples action execution from action selection so that long-running commands do not block the agent's decisions. Agents can bind commands to specific sessions or let ResearchGym create new ones. Ongoing jobs therefore continue uninterrupted while the agent immediately plans its next steps, and it can later retrieve the results asynchronously via get_session_output. To avoid nonsensical actions during model training, ResearchGym also provides a dedicated sleep action.
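The session model described above can be sketched as follows. Only the name get_session_output comes from the text; SessionManager, run_command, sleep, and the shape of their return values are hypothetical, and a "session" here is simply a named output buffer rather than a full interactive shell.

```python
import subprocess
import threading
import time
import uuid


class SessionManager:
    """Hypothetical sketch of asynchronous, session-bound command execution."""

    def __init__(self):
        # session_id -> {"buffer": [str], "lock": Lock, "readers": [Thread]}
        self._sessions = {}

    def run_command(self, cmd, session_id=None):
        """Start cmd in the background and return its session id immediately."""
        if session_id is None:
            session_id = uuid.uuid4().hex[:8]  # let the environment create a session
        session = self._sessions.setdefault(
            session_id, {"buffer": [], "lock": threading.Lock(), "readers": []}
        )
        proc = subprocess.Popen(
            cmd, shell=True, text=True,
            stdout=subprocess.PIPE, stderr=subprocess.STDOUT,
        )
        reader = threading.Thread(target=self._drain, args=(proc, session), daemon=True)
        reader.start()
        session["readers"].append(reader)
        return session_id  # the agent can keep planning while the job runs

    @staticmethod
    def _drain(proc, session):
        """Copy the command's output into the session buffer as it arrives."""
        for line in proc.stdout:
            with session["lock"]:
                session["buffer"].append(line)
        proc.stdout.close()
        proc.wait()

    def get_session_output(self, session_id):
        """Return output produced since the last call, plus whether work is still running."""
        session = self._sessions[session_id]
        # Check liveness first: if no reader is alive, all output is already buffered.
        running = any(t.is_alive() for t in session["readers"])
        with session["lock"]:
            text = "".join(session["buffer"])
            session["buffer"].clear()
        return {"output": text, "running": running}

    def sleep(self, seconds):
        """Explicit wait action, instead of issuing filler commands while jobs run."""
        time.sleep(seconds)
        return {"slept": seconds}
```

A typical loop starts a job with run_command, does other work, calls sleep while nothing useful remains, and polls get_session_output until running is False. Passing an existing session_id binds a later command to the same session's output buffer.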

Snapshot Saving and Loading

A snapshot records the task specification, the agent’s context, the final state of the workspace, and the remaining time budget. ResearchGym can periodically save the full state as a snapshot and can restore the system from any snapshot. Snapshots also support branching: experiments can resume from different points or proceed along multiple branches.
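Branching follows naturally if each snapshot records its parent. The sketch below stores the four items named above in memory; the field names, the message-list representation of the agent's context, the path-to-contents workspace dictionary, and the parent_id link are assumptions for illustration, and persistence to disk is left out.

```python
import copy
import uuid
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Snapshot:
    """The four recorded items plus a parent link (hypothetical layout)."""
    task_spec: str
    agent_context: List[Dict[str, str]]  # assumed: a list of chat messages
    workspace: Dict[str, str]            # assumed: file path -> contents
    remaining_budget_s: float
    parent_id: Optional[str] = None
    snapshot_id: str = field(default_factory=lambda: uuid.uuid4().hex[:8])


class SnapshotStore:
    """In-memory store; snapshots sharing a parent_id are sibling branches."""

    def __init__(self):
        self._snapshots: Dict[str, Snapshot] = {}

    def save(self, task_spec, agent_context, workspace, remaining_budget_s,
             parent_id=None):
        # Deep-copy so later edits by the agent cannot mutate a saved snapshot.
        snap = Snapshot(task_spec, copy.deepcopy(agent_context),
                        copy.deepcopy(workspace), remaining_budget_s, parent_id)
        self._snapshots[snap.snapshot_id] = snap
        return snap.snapshot_id

    def load(self, snapshot_id):
        """Return an independent copy of the saved state to resume from."""
        return copy.deepcopy(self._snapshots[snapshot_id])

    def children(self, snapshot_id):
        """List the branches that were saved from a given snapshot."""
        return [s.snapshot_id for s in self._snapshots.values()
                if s.parent_id == snapshot_id]
```

To branch, load the same snapshot twice, modify each copy differently, and save each with parent_id set to the original snapshot's id; the original stays untouched, so further branches can start from it later.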