Robots Sim

Strands Robots Sim is a Python library for controlling robots in simulated environments with natural language through Strands Agents. It lets you develop and test robot control strategies without physical hardware, using the same policy abstraction as Strands Robots.

The library provides two execution modes as Strands agent tools: SimEnv for full episode execution where the agent specifies a task and the policy runs to completion, and SteppedSimEnv for iterative control where the agent observes camera feedback after each batch of steps and adapts its instructions accordingly. This enables a dual-system pattern where the agent handles high-level reasoning and planning while a VLA policy handles low-level motor control.

pip install strands-robots-sim
# For simulation environment dependencies (e.g. Libero)
pip install "strands-robots-sim[sim]"
from strands import Agent
from strands_robots_sim import SimEnv, gr00t_inference

sim_env = SimEnv(
    tool_name="my_sim",
    env_type="libero",
    task_suite="libero_10",
    data_config="libero_10",
)
agent = Agent(tools=[sim_env, gr00t_inference])

# Start the inference service
agent.tool.gr00t_inference(
    action="start",
    checkpoint_path="/data/checkpoints/model",
    port=8000,
    data_config="examples.Libero.custom_data_config:LiberoDataConfig",
)

# Run a task
agent("Run the task 'pick up the red block' for 5 episodes with video recording")

graph TD
    A[Natural Language<br/>'Pick up the red block'] --> B[Strands Agent]
    B --> C[SimEnv / SteppedSimEnv]
    C --> D[Policy Provider]
    C --> G[Simulation Environment]
    D --> F[Action Chunk]
    F --> G
    G -.->|Observation| C
    G -.->|Visual Feedback + State<br/>SteppedSimEnv only| B

    classDef input fill:#2ea44f,stroke:#1b7735,color:#fff
    classDef agent fill:#0969da,stroke:#044289,color:#fff
    classDef policy fill:#8250df,stroke:#5a32a3,color:#fff
    classDef simulation fill:#bf8700,stroke:#875e00,color:#fff
    class A input
    class B,C agent
    class D,F policy
    class G simulation

The agent receives a natural language instruction and routes it to a simulation tool. The tool coordinates with a policy provider to generate action chunks, which are executed in the simulation environment. Observations flow back for the next inference cycle. In SteppedSimEnv mode, camera images and state are also returned to the agent so it can reason about progress and adapt.
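This inference cycle can be sketched with stand-in classes. The `get_action_chunk`/`step` interface below is illustrative only; the real `Policy` and `SimulationEnvironment` classes in `strands_robots_sim` may differ in method names and return shapes.

```python
class MockPolicy:
    """Stand-in policy: returns a fixed-size chunk of actions per inference call."""
    def get_action_chunk(self, observation, instruction, chunk_size=8):
        # A real VLA policy would condition on camera images and robot state here.
        return [[0.0] * 7 for _ in range(chunk_size)]  # 7-DoF action vectors

class MockEnv:
    """Stand-in environment: counts executed low-level steps."""
    def __init__(self):
        self.steps = 0
    def reset(self):
        self.steps = 0
        return {"image": None, "state": [0.0] * 7}
    def step(self, action):
        self.steps += 1
        done = self.steps >= 16  # pretend the task finishes after 16 steps
        return {"image": None, "state": [0.0] * 7}, 0.0, done

policy, env = MockPolicy(), MockEnv()
obs = env.reset()
done = False
while not done:
    # One inference cycle: observation in, chunk of actions out.
    chunk = policy.get_action_chunk(obs, "pick up the red block")
    for action in chunk:
        obs, reward, done = env.step(action)
        if done:
            break

print(env.steps)  # 16
```

Executing actions in chunks amortizes inference latency: the policy is queried once per chunk, not once per control step.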

flowchart TB
    subgraph Agent["🤖 Strands Agent"]
        NL[Natural Language Input]
        Tools[Tool Registry]
    end
    subgraph SimTool["🦾 Simulation Tool"]
        direction TB
        SE[SimEnv:<br/>Full Episode Execution]
        SSE[SteppedSimEnv:<br/>Iterative Control]
        TM[Task Manager]
        AS[Async Executor]
    end
    subgraph Policy["🧠 Policy Layer"]
        direction TB
        PA[Policy Abstraction]
        GP[GR00T Policy]
        MP[Mock Policy]
        CP[Custom Policy]
    end
    subgraph SimLayer["🔧 Simulation Layer"]
        direction TB
        ENV[Environment Abstraction]
        SUITES[Task Suites]
        CAM[Camera Interfaces]
        STATE[State Management]
    end

    NL --> Tools
    Tools --> SE
    Tools --> SSE
    SE --> TM
    SSE --> TM
    TM --> AS
    AS --> PA
    PA --> GP
    PA --> MP
    PA --> CP
    AS --> ENV
    ENV --> SUITES
    ENV --> CAM
    ENV --> STATE

    classDef agentStyle fill:#0969da,stroke:#044289,color:#fff
    classDef toolStyle fill:#2ea44f,stroke:#1b7735,color:#fff
    classDef policyStyle fill:#8250df,stroke:#5a32a3,color:#fff
    classDef simStyle fill:#d73a49,stroke:#a72b3a,color:#fff
    class NL,Tools agentStyle
    class SE,SSE,TM,AS toolStyle
    class PA,GP,MP,CP policyStyle
    class ENV,SUITES,CAM,STATE simStyle

The agent specifies a task once and the policy runs the full episode autonomously. This is the simpler mode, suited for benchmarking and well-defined tasks.

from strands import Agent
from strands_robots_sim import SimEnv, gr00t_inference

sim_env = SimEnv(
    tool_name="my_sim",
    env_type="libero",
    task_suite="libero_10",
    data_config="libero_10",
)
agent = Agent(tools=[sim_env, gr00t_inference])

# Blocking execution
agent.tool.my_sim(
    action="execute",
    instruction="pick up the red block",
    policy_port=8000,
    max_episodes=5,
    max_steps_per_episode=200,
    record_video=True,
)

# Or async execution with status monitoring
agent.tool.my_sim(
    action="start",
    instruction="stack the blocks",
    policy_port=8000,
    max_episodes=10,
)
agent.tool.my_sim(action="status")
agent.tool.my_sim(action="stop")

The agent acts as a planner, executing a limited number of steps per call and receiving camera images and state back. It can then reason about progress, decompose complex tasks into subtasks, and adapt instructions based on what it observes.

from strands import Agent
from strands_robots_sim import SteppedSimEnv, gr00t_inference

stepped_sim = SteppedSimEnv(
    tool_name="my_stepped_sim",
    env_type="libero",
    task_suite="libero_10",
    data_config="libero_10",
    steps_per_call=10,
    max_steps_per_episode=500,
)
agent = Agent(tools=[stepped_sim, gr00t_inference])

# Reset to a specific task
agent.tool.my_stepped_sim(
    action="reset_episode",
    task_name="KITCHEN_SCENE1_put_the_black_bowl_on_top_of_the_cabinet",
)

# Execute steps - returns camera images, state, reward, and done status
agent.tool.my_stepped_sim(
    action="execute_steps",
    instruction="move gripper toward the bowl",
    policy_port=8000,
    num_steps=10,
)

# Agent observes the result and decides what to do next
agent.tool.my_stepped_sim(action="get_state")

In practice, you hand the full loop to the agent with a planning prompt. The agent decomposes a complex task like “pick up the block and place it in the drawer” into subtasks (locate block, grasp, lift, move to drawer, place), executes each with execute_steps, observes camera feedback, and adapts if something goes wrong.
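That planning loop can be sketched in plain Python. The `execute_steps` function below is a stand-in for the `my_stepped_sim` tool call, and the subtask names and return dictionary are illustrative, not the library's actual schema.

```python
def execute_steps(instruction, num_steps):
    # Stand-in for agent.tool.my_stepped_sim(action="execute_steps", ...).
    # Here we pretend only the final subtask completes the task.
    done = instruction == "place block in drawer"
    return {"reward": 1.0 if done else 0.0, "done": done, "images": []}

# Subtasks a planner agent might produce for the drawer example.
subtasks = ["locate block", "grasp block", "lift block",
            "move to drawer", "place block in drawer"]

for subtask in subtasks:
    result = execute_steps(subtask, num_steps=10)
    # A real agent would inspect result["images"] here and retry the
    # subtask with a rephrased instruction if progress stalled.
    if result["done"]:
        break

print(result["reward"])  # 1.0
```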

| Feature | SimEnv | SteppedSimEnv |
| --- | --- | --- |
| Control flow | One-shot execution | Step-by-step iteration |
| Agent feedback | Final reward only | Camera images + state per batch |
| Use case | Known tasks, benchmarking | Complex tasks requiring adaptation |
| Error recovery | None | Agent can retry with different instructions |

The framework implements a pattern inspired by System 1 / System 2 thinking. The Strands Agent serves as the deliberate planner (System 2) - it reasons about goals, decomposes tasks, and adapts strategy based on observations. The VLA policy serves as the fast executor (System 1) - it maps visual observations and language instructions to motor actions with low latency.

In SimEnv mode, System 2 fires once to specify the task and System 1 handles the rest. In SteppedSimEnv mode, the two systems collaborate iteratively: System 2 observes, plans, and issues instructions every N steps while System 1 executes the low-level control between each planning cycle.
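The cadence of that collaboration is easy to see in a toy loop (not library code): with `steps_per_call=N`, System 2 intervenes once per batch of N low-level steps.

```python
N = 10            # steps_per_call: System 1 steps per planning cycle
total_steps = 35  # steps until the episode ends (illustrative)

plan_calls = 0
step = 0
while step < total_steps:
    plan_calls += 1  # System 2: observe, reason, issue an instruction
    for _ in range(min(N, total_steps - step)):
        step += 1    # System 1: one low-level control step

print(plan_calls)  # 4
```

Larger N means cheaper, less reactive planning; smaller N gives the agent tighter feedback at the cost of more reasoning calls.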

The library uses the same Policy abstract class as Strands Robots. It ships with GR00T and mock providers, and you can add custom VLA models by subclassing Policy.

from strands_robots_sim import create_policy
policy = create_policy(provider="groot", data_config="libero", host="localhost", port=8000)
policy = create_policy(provider="mock")
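Subclassing might look like the sketch below. The `Policy` base class here is a self-contained stand-in with an assumed single-method interface; consult the actual Strands Robots `Policy` ABC for the real method names and signatures.

```python
from abc import ABC, abstractmethod

class Policy(ABC):
    """Stand-in for the strands-robots Policy ABC (interface assumed)."""
    @abstractmethod
    def get_action(self, observation: dict, instruction: str) -> list:
        """Map an observation + instruction to a chunk of actions."""

class MyVLAPolicy(Policy):
    """Hypothetical custom VLA backend served over HTTP."""
    def __init__(self, host="localhost", port=9000):
        self.endpoint = f"http://{host}:{port}"  # hypothetical inference server
    def get_action(self, observation, instruction):
        # A real implementation would send the observation to the server;
        # here we return a dummy one-action chunk of 7-DoF vectors.
        return [[0.0] * 7]

policy = MyVLAPolicy()
print(len(policy.get_action({"image": None}, "pick up the block")))  # 1
```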

Simulation environments are similarly abstracted through a SimulationEnvironment base class. The library ships with a Libero integration, and the factory supports adding new backends:

from strands_robots_sim.envs import create_simulation_environment
env = create_simulation_environment(env_type="libero", task_suite="libero_10")
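A new backend would subclass the base class along these lines. `SimulationEnvironment` below is a stand-in with an assumed `reset`/`step`/`close` interface; the real base class in `strands_robots_sim.envs` may define additional hooks for cameras, task suites, and state access.

```python
from abc import ABC, abstractmethod

class SimulationEnvironment(ABC):
    """Stand-in for the library's environment base class (interface assumed)."""
    @abstractmethod
    def reset(self, task_name: str) -> dict: ...
    @abstractmethod
    def step(self, action) -> tuple: ...
    @abstractmethod
    def close(self) -> None: ...

class MyBackendEnv(SimulationEnvironment):
    """Hypothetical backend for a new simulator."""
    def reset(self, task_name):
        self.task = task_name
        return {"image": None, "state": [0.0] * 7}
    def step(self, action):
        obs = {"image": None, "state": [0.0] * 7}
        return obs, 0.0, False  # observation, reward, done
    def close(self):
        pass  # release simulator resources

env = MyBackendEnv()
obs = env.reset("pick_up_the_red_block")
print(sorted(obs))  # ['image', 'state']
```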

The current Libero integration includes:

| Suite | Tasks | Description |
| --- | --- | --- |
| libero_spatial | 10 | Spatial reasoning tasks |
| libero_object | 10 | Object-centric tasks |
| libero_goal | 10 | Goal-conditioned manipulation |
| libero_10 | 10 | Standard benchmark |
| libero_90 | 90 | Extended benchmark for comprehensive evaluation |

This example shows the stepped execution mode where the agent plans and adapts:

from strands import Agent
from strands_robots_sim import SteppedSimEnv, gr00t_inference

stepped_sim = SteppedSimEnv(
    tool_name="my_stepped_sim",
    env_type="libero",
    task_suite="libero_10",
    data_config="libero_10",
    steps_per_call=10,
    max_steps_per_episode=500,
)
agent = Agent(tools=[stepped_sim, gr00t_inference])

agent.tool.gr00t_inference(
    action="start",
    checkpoint_path="/data/checkpoints/model",
    port=8000,
    data_config="examples.Libero.custom_data_config:LiberoDataConfig",
)

agent("""
Task: open the top drawer

You are a robot task planner. Decompose this task into subtasks and execute
them step-by-step using the my_stepped_sim tool.

1. Reset the episode with action="reset_episode"
2. For each subtask, call action="execute_steps" with the subtask as instruction
3. Observe camera images and state after each batch
4. Adapt your approach based on what you see
5. Continue until reward reaches 1.0 or the episode ends
""")

agent.tool.gr00t_inference(action="stop", port=8000)