
NVIDIA Open-Source Project GR00T-WholeBodyControl: Simulation Installation Guide (headless servers supported)

Time: 2026-03-23 17:08

Author: admin

Overview: An installation guide for running NVIDIA's open-source GR00T-WholeBodyControl project in simulation (supports servers and other environments without a display), covering TensorRT setup, environment configuration, and the run workflow. First install TensorRT 10.13 and configure its environment variables, then clone the project repo...

Official tutorial

GR00T-WholeBodyControl
Paper title: SONIC: Supersizing Motion Tracking for Natural Humanoid Whole-Body Control
Institution: NVIDIA
Paper: https://arxiv.org/abs/2511.07820
Project page: https://nvlabs.github.io/SONIC/


Install

0. Install TensorRT first

Go to the NVIDIA Developer site and download a matching package. I chose version 10.13: pick the tar archive matching your Linux and CUDA versions, copy its link, then run in a terminal:

# wget + the link you copied
wget https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/10.13.0/tars/TensorRT-10.13.0.35.Linux.x86_64-gnu.cuda-12.9.tar.gz

Download it into a local directory.

Then, following the official tutorial, install pv and extract the tar archive:

sudo apt-get install -y pv
# pv TensorRT-*.tar.gz | tar -xz -f -   (substitute the tar archive you downloaded)
pv TensorRT-10.13.0.35.Linux.x86_64-gnu.cuda-12.9.tar.gz | tar -xz -f -

After extraction the files live at path/to/your/packages/TensorRT-10.13.0.35.
The environment variables need to go into ~/.bashrc. To survive machines that get reset daily, I wrote this as a fixed script; remember to change the path to your own:

TENSORRT_ROOT=path/to/your/packages/TensorRT-10.13.0.35

grep -q "export TensorRT_ROOT=$TENSORRT_ROOT" ~/.bashrc || cat >> ~/.bashrc <<EOF

# TensorRT
export TensorRT_ROOT=$TENSORRT_ROOT
export LD_LIBRARY_PATH=\$TensorRT_ROOT/lib:\${LD_LIBRARY_PATH}
EOF

source ~/.bashrc

# verify
echo $TensorRT_ROOT
echo $LD_LIBRARY_PATH

If both values print, this step is configured correctly.
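As an extra sanity check (a minimal sketch, not part of the official guide), you can verify that some directory on LD_LIBRARY_PATH actually contains the TensorRT runtime library (libnvinfer), rather than just trusting the echoed variables:

```python
import os
from pathlib import Path

def find_tensorrt_lib(ld_library_path):
    """Return the first LD_LIBRARY_PATH entry containing libnvinfer, else None."""
    for entry in ld_library_path.split(os.pathsep):
        if entry and list(Path(entry).glob("libnvinfer.so*")):
            return entry
    return None

if __name__ == "__main__":
    hit = find_tensorrt_lib(os.environ.get("LD_LIBRARY_PATH", ""))
    print("libnvinfer found in:" if hit else "libnvinfer not on LD_LIBRARY_PATH", hit or "")
```

If this prints "libnvinfer not on LD_LIBRARY_PATH", re-check the path you wrote into ~/.bashrc.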


1. Clone the repository

git clone https://github.com/NVlabs/GR00T-WholeBodyControl.git
cd GR00T-WholeBodyControl
git lfs pull          # make sure all large files are fetched

2. Setup

The official Native Development route is recommended here:

2.1 Install system dependencies

cd gear_sonic_deploy
chmod +x scripts/install_deps.sh
./scripts/install_deps.sh
2.2 Environment setup

source scripts/setup_env.sh
2.3 Build the project

just build
2.4 Download checkpoints

First set up a simple environment (this step is optional):

conda create -n sonic python=3.10 -y
conda activate sonic

Install the dependency:

pip install huggingface_hub

Run the download script:

cd ..
python download_from_hf.py

3. One-command MuJoCo install

bash install_scripts/install_mujoco_sim.sh

Quick Start

For servers and other environments without a display.

The idea is to first run headless and record the intermediate state, then render that recording into a video with a separate visualization script.
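The hand-off between the two phases is just an NPZ file holding per-frame timestamps and qpos snapshots. A minimal sketch of that contract (illustrative only; the real scripts below store additional metadata such as the reference-motion window):

```python
import numpy as np

def save_sim_states(path, times, qpos):
    """Phase 1: dump recorded timestamps and qpos snapshots to a compressed NPZ."""
    np.savez_compressed(
        path,
        times=np.asarray(times, dtype=np.float64),
        qpos=np.asarray(qpos, dtype=np.float64),
    )

def load_sim_states(path):
    """Phase 2: reload the dump and report its duration for rendering."""
    with np.load(path) as data:
        times = np.asarray(data["times"], dtype=np.float64)
        qpos = np.asarray(data["qpos"], dtype=np.float64)
    duration = float(times[-1]) if times.size else 0.0
    return times, qpos, duration
```

Because the file is a plain NPZ, the two phases can run on different machines, or Phase 2 can be re-run with different camera settings without repeating inference.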

On every environment startup:
cd gear_sonic_deploy
source scripts/setup_env.sh
just build

cd ..
source .venv_sim/bin/activate
Inference:

I wrote scripts that launch inference headless and then render the result to a video, executing a preset humanoid motion sequence.

Phase 1: run inference headless
bash ./gear_sonic/scripts/run_headless_reference_motion_phase1.sh
Phase 2: render to video
bash ./gear_sonic/scripts/run_headless_reference_motion_phase2.sh \
  ./outputs/headless_test/sim_states.npz

The scripts and code are attached below; put them all under the ./gear_sonic/scripts directory:

run_headless_reference_motion_phase1.sh
#!/usr/bin/env bash
set -euo pipefail

SCRIPT_DIR=$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)
REPO_ROOT=$(cd "$SCRIPT_DIR/../.." && pwd)

MOTION_PATH="${1:-$REPO_ROOT/gear_sonic_deploy/reference/example/squat_001__A359}"

OUTPUT_DIR="${2:-$REPO_ROOT/outputs/headless_test}"
STATE_NAME="${3:-sim_states.npz}"

EXTRA_ARGS=()
if [[ $# -gt 3 ]]; then
  EXTRA_ARGS=("${@:4}")
fi

START_TIME=$(date +%s)
finish() {
  local exit_code=$?
  local end_time elapsed
  end_time=$(date +%s)
  elapsed=$((end_time - START_TIME))
  echo "[INFO] Phase 1 shell elapsed: ${elapsed}s (exit=${exit_code})"
}
trap finish EXIT

if [[ ! -d "$REPO_ROOT/.venv_sim" ]]; then
  echo ".venv_sim was not found under $REPO_ROOT" >&2
  echo "Run: bash install_scripts/install_mujoco_sim.sh" >&2
  exit 1
fi

source "$REPO_ROOT/.venv_sim/bin/activate"

python "$SCRIPT_DIR/run_headless_reference_motion_phase1.py" \
  --motion-path "$MOTION_PATH" \
  --output-dir "$OUTPUT_DIR" \
  --state-name "$STATE_NAME" \
  --video-width 1440 \
  --video-height 832 \
  "${EXTRA_ARGS[@]}"

run_headless_reference_motion_phase1.py
#!/usr/bin/env python3
"""Run SONIC headless reference motion Phase 1 only and save recorded qpos."""

from __future__ import annotations

import argparse
import os
import subprocess
import threading
import time
from pathlib import Path

import numpy as np

import run_headless_reference_motion as combined


def parse_args() -> argparse.Namespace:
    parser = argparse.ArgumentParser(
        description="Run SONIC reference motion Phase 1 only and save times/qpos snapshots."
    )
    parser.add_argument(
        "--motion-path",
        type=Path,
        default=combined.DEPLOY_ROOT / "reference" / "example",
        help=(
            "Either a motion dataset directory containing subfolders, or a single motion folder "
            "containing joint_pos.csv/body_pos.csv/etc."
        ),
    )
    parser.add_argument(
        "--motion-index",
        type=int,
        default=0,
        help="Motion index inside the dataset directory. Ignored for a single motion folder.",
    )
    parser.add_argument(
        "--output-dir",
        type=Path,
        default=combined.DEFAULT_OUTPUT_DIR,
        help="Directory for the recorded qpos dump and deploy log.",
    )
    parser.add_argument(
        "--state-name",
        type=str,
        default=combined.DEFAULT_STATE_DUMP_NAME,
        help="Output NPZ filename. The file stores both `times` and `qpos`.",
    )
    parser.add_argument(
        "--log-name",
        type=str,
        default=combined.DEFAULT_LOG_NAME,
        help="Deploy log filename.",
    )
    parser.add_argument("--video-width", type=int, default=1280)
    parser.add_argument("--video-height", type=int, default=720)
    parser.add_argument(
        "--warmup-seconds",
        type=float,
        default=2.0,
        help="Wait time after starting the simulator before launching deploy.",
    )
    parser.add_argument(
        "--control-ready-seconds",
        type=float,
        default=0.5,
        help="Extra wait time after deploy init completes before sending the start key.",
    )
    parser.add_argument(
        "--play-delay-seconds",
        type=float,
        default=2.0,
        help="Wait time between starting control and dropping the robot to the ground.",
    )
    parser.add_argument(
        "--drop-settle-seconds",
        type=float,
        default=1.0,
        help="Wait time after dropping the robot before recording and starting playback.",
    )
    parser.add_argument(
        "--post-complete-seconds",
        type=float,
        default=1.0,
        help="Extra simulation time after the motion completes before stopping.",
    )
    parser.add_argument(
        "--timeout-seconds",
        type=float,
        default=300.0,
        help="Fail if the motion does not complete within this many seconds.",
    )
    parser.add_argument(
        "--sim-frequency",
        type=int,
        default=200,
        help="Simulation frequency for the MuJoCo loop.",
    )
    parser.add_argument(
        "--camera-distance",
        type=float,
        default=2.8,
        help="Unused in Phase 1, kept for compatibility with the shared simulator builder.",
    )
    parser.add_argument(
        "--camera-azimuth",
        type=float,
        default=120.0,
        help="Unused in Phase 1, kept for compatibility with the shared simulator builder.",
    )
    parser.add_argument(
        "--camera-elevation",
        type=float,
        default=-25.0,
        help="Unused in Phase 1, kept for compatibility with the shared simulator builder.",
    )
    parser.add_argument(
        "--camera-lookat",
        type=float,
        nargs=3,
        default=[0.0, 0.0, 0.7],
        metavar=("X", "Y", "Z"),
        help="Unused in Phase 1, kept for compatibility with the shared simulator builder.",
    )
    parser.add_argument(
        "--decoder",
        type=Path,
        default=combined.DEPLOY_ROOT / "policy" / "release" / "model_decoder.onnx",
        help="Decoder model path.",
    )
    parser.add_argument(
        "--encoder",
        type=Path,
        default=combined.DEPLOY_ROOT / "policy" / "release" / "model_encoder.onnx",
        help="Encoder model path.",
    )
    parser.add_argument(
        "--planner",
        type=Path,
        default=None,
        help="Optional planner model path. Leave unset for fixed reference-motion playback.",
    )
    parser.add_argument(
        "--obs-config",
        type=Path,
        default=combined.DEPLOY_ROOT / "policy" / "release" / "observation_config.yaml",
        help="Observation config path.",
    )
    parser.add_argument(
        "--skip-build",
        action="store_true",
        help="Do not auto-build the deploy binary when missing.",
    )
    return parser.parse_args()


def format_elapsed(seconds: float) -> str:
    minutes, seconds = divmod(seconds, 60.0)
    hours, minutes = divmod(int(minutes), 60)
    if hours > 0:
        return f"{hours}h {minutes}m {seconds:.2f}s"
    if minutes > 0:
        return f"{minutes}m {seconds:.2f}s"
    return f"{seconds:.2f}s"


def resolve_optional_path(path: Path | None) -> Path | None:
    if path is None:
        return None
    return path.expanduser().resolve()


def ensure_phase1_paths(args: argparse.Namespace) -> None:
    required_paths = [
        args.decoder,
        args.encoder,
        args.obs_config,
        args.motion_path,
    ]
    if args.planner is not None:
        required_paths.append(args.planner)
    missing = [str(path) for path in required_paths if not path.exists()]
    if missing:
        raise FileNotFoundError("Missing required paths:\n  " + "\n  ".join(missing))


def list_motion_folders(dataset_root: Path) -> list[Path]:
    return sorted(
        [
            path
            for path in dataset_root.iterdir()
            if path.is_dir() and (path / "joint_pos.csv").exists()
        ],
        key=lambda path: path.name,
    )


def read_reference_timing(motion_dir: Path) -> dict[str, float | int | str]:
    joint_pos_path = motion_dir / "joint_pos.csv"
    metadata_path = motion_dir / "metadata.txt"
    info_path = motion_dir / "info.txt"

    if not joint_pos_path.exists():
        raise FileNotFoundError(f"joint_pos.csv not found under motion folder: {motion_dir}")

    with joint_pos_path.open("r", encoding="utf-8", errors="ignore") as f:
        frame_count = max(0, sum(1 for _ in f) - 1)

    target_fps = 50.0

    if info_path.exists():
        for line in info_path.read_text(encoding="utf-8", errors="ignore").splitlines():
            if line.startswith("target_fps:"):
                target_fps = float(line.split(":", 1)[1].strip())
                break
            if line.startswith("Target FPS:"):
                target_fps = float(line.split(":", 1)[1].strip())
                break
    elif metadata_path.exists():
        for line in metadata_path.read_text(encoding="utf-8", errors="ignore").splitlines():
            if line.startswith("Total timesteps:"):
                metadata_frames = int(line.split(":", 1)[1].strip())
                if metadata_frames > 0 and frame_count <= 0:
                    frame_count = metadata_frames
                break

    if target_fps <= 0:
        target_fps = 50.0

    return {
        "motion_name": motion_dir.name,
        "reference_frames": int(frame_count),
        "reference_fps": float(target_fps),
        "reference_duration_seconds": float(frame_count) / float(target_fps),
    }


def write_state_metadata(state_path: Path, reference_timing: dict[str, float | int | str]) -> dict[str, float]:
    with np.load(state_path) as data:
        payload = {name: np.asarray(data[name]) for name in data.files}

    times = np.asarray(payload["times"], dtype=np.float64)
    recorded_duration = float(times[-1]) if times.size > 0 else 0.0
    reference_duration = float(reference_timing["reference_duration_seconds"])
    render_end_time = min(reference_duration, recorded_duration) if recorded_duration > 0.0 else 0.0

    payload["reference_frames"] = np.asarray(int(reference_timing["reference_frames"]), dtype=np.int64)
    payload["reference_fps"] = np.asarray(float(reference_timing["reference_fps"]), dtype=np.float64)
    payload["reference_duration_seconds"] = np.asarray(reference_duration, dtype=np.float64)
    payload["render_start_time_seconds"] = np.asarray(0.0, dtype=np.float64)
    payload["render_end_time_seconds"] = np.asarray(render_end_time, dtype=np.float64)

    tmp_path = state_path.with_name(f"{state_path.stem}.tmp.npz")
    np.savez_compressed(tmp_path, **payload)
    tmp_path.replace(state_path)

    return {
        "recorded_duration_seconds": recorded_duration,
        "render_end_time_seconds": render_end_time,
    }


def main() -> int:
    args = parse_args()
    start_time = time.perf_counter()

    args.motion_path = args.motion_path.expanduser().resolve()
    args.output_dir = args.output_dir.expanduser().resolve()
    args.decoder = args.decoder.expanduser().resolve()
    args.encoder = args.encoder.expanduser().resolve()
    args.obs_config = args.obs_config.expanduser().resolve()
    args.planner = resolve_optional_path(args.planner)

    try:
        ensure_phase1_paths(args)
        combined.ensure_deploy_binary(skip_build=args.skip_build)

        args.output_dir.mkdir(parents=True, exist_ok=True)
        state_path = args.output_dir / args.state_name
        log_path = args.output_dir / args.log_name

        dataset_root, dataset_motion_index, temp_dir = combined.prepare_motion_dataset(args.motion_path)
        motion_index = dataset_motion_index if temp_dir is not None else args.motion_index
        motion_folders = list_motion_folders(dataset_root)
        if not motion_folders:
            raise FileNotFoundError(f"No motion folders with joint_pos.csv found under: {dataset_root}")
        if motion_index < 0 or motion_index >= len(motion_folders):
            raise IndexError(
                f"Motion index {motion_index} is out of range for dataset {dataset_root} "
                f"(found {len(motion_folders)} motions)."
            )
        selected_motion_dir = motion_folders[motion_index]
        reference_timing = read_reference_timing(selected_motion_dir)

        stop_event = threading.Event()
        record_event = threading.Event()
        sim_ready_event = threading.Event()
        sim_status: dict[str, object] = {}
        sim_thread = threading.Thread(
            target=combined.run_headless_simulation,
            args=(args, state_path, stop_event, record_event, sim_ready_event, sim_status),
            daemon=True,
        )

        process: subprocess.Popen[bytes] | None = None
        master_fd: int | None = None
        monitor: combined.DeployMonitor | None = None

        try:
            sim_thread.start()
            if not sim_ready_event.wait(timeout=30.0):
                raise TimeoutError("Timed out waiting for the headless simulator to start.")
            if "error" in sim_status:
                raise RuntimeError(f"Simulator thread failed during startup: {sim_status['error']}")

            time.sleep(args.warmup_seconds)

            process, master_fd, monitor = combined.launch_deploy(
                dataset_root=dataset_root,
                args=args,
                log_path=log_path,
            )
            combined.automate_deploy(
                process=process,
                master_fd=master_fd,
                monitor=monitor,
                motion_index=motion_index,
                control_ready_seconds=args.control_ready_seconds,
                play_delay_seconds=args.play_delay_seconds,
                drop_settle_seconds=args.drop_settle_seconds,
                post_complete_seconds=args.post_complete_seconds,
                timeout_seconds=args.timeout_seconds,
                sim_status=sim_status,
                record_event=record_event,
            )
            combined.wait_for_process_exit(process)
        finally:
            stop_event.set()
            sim_thread.join(timeout=10.0)

            if monitor is not None:
                monitor.close()
            if master_fd is not None:
                try:
                    os.close(master_fd)
                except OSError:
                    pass
            if process is not None and process.poll() is None:
                combined.wait_for_process_exit(process, timeout=5.0)
            if temp_dir is not None:
                temp_dir.cleanup()

        if "error" in sim_status:
            raise RuntimeError(f"Simulator thread failed: {sim_status['error']}")
        if not state_path.exists():
            raise FileNotFoundError(f"Expected recorded state dump was not created: {state_path}")

        state_timing = write_state_metadata(state_path, reference_timing)
        with np.load(state_path) as data:
            times = np.asarray(data["times"], dtype=np.float64)
        duration_seconds = float(times[-1]) if times.size > 0 else 0.0

        print(f"\nPhase 1 state dump saved to: {state_path}")
        print(f"Deploy log saved to: {log_path}")
        print(f"Recorded frames: {int(times.size)}")
        print(f"Recorded motion duration: {duration_seconds:.2f}s")
        print(
            "Reference motion window: "
            f"{reference_timing['reference_duration_seconds']:.2f}s "
            f"({reference_timing['reference_frames']} frames @ {reference_timing['reference_fps']:.2f}Hz)"
        )
        print(f"Auto-trim render end time: {state_timing['render_end_time_seconds']:.2f}s")
        return 0
    finally:
        elapsed = time.perf_counter() - start_time
        print(f"Phase 1 elapsed time: {format_elapsed(elapsed)}")


if __name__ == "__main__":
    raise SystemExit(main())

run_headless_reference_motion_phase2.sh
#!/usr/bin/env bash
set -euo pipefail

SCRIPT_DIR=$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)
REPO_ROOT=$(cd "$SCRIPT_DIR/../.." && pwd)

STATE_PATH="${1:-$REPO_ROOT/outputs/headless_test/sim_states.npz}"
OUTPUT_DIR="${2:-$(dirname "$STATE_PATH")}"
VIDEO_NAME="${3:-headless_motion.mp4}"

EXTRA_ARGS=()
if [[ $# -gt 3 ]]; then
  EXTRA_ARGS=("${@:4}")
fi

START_TIME=$(date +%s)
finish() {
  local exit_code=$?
  local end_time elapsed
  end_time=$(date +%s)
  elapsed=$((end_time - START_TIME))
  echo "[INFO] Phase 2 shell elapsed: ${elapsed}s (exit=${exit_code})"
}
trap finish EXIT

if [[ ! -d "$REPO_ROOT/.venv_sim" ]]; then
  echo ".venv_sim was not found under $REPO_ROOT" >&2
  echo "Run: bash install_scripts/install_mujoco_sim.sh" >&2
  exit 1
fi

source "$REPO_ROOT/.venv_sim/bin/activate"

python "$SCRIPT_DIR/run_headless_reference_motion_phase2.py" \
  --state-path "$STATE_PATH" \
  --output-dir "$OUTPUT_DIR" \
  --video-name "$VIDEO_NAME" \
  --video-width 1440 \
  --video-height 832 \
  --video-fps 15 \
  "${EXTRA_ARGS[@]}"

run_headless_reference_motion_phase2.py
#!/usr/bin/env python3
"""Run SONIC headless reference motion Phase 2 only and render a video from qpos."""

from __future__ import annotations

import argparse
import re
import tempfile
import time
from pathlib import Path

import numpy as np

import run_headless_reference_motion as combined


def parse_args() -> argparse.Namespace:
    parser = argparse.ArgumentParser(
        description="Render a recorded SONIC qpos dump into an MP4 video."
    )
    parser.add_argument(
        "--state-path",
        type=Path,
        default=combined.DEFAULT_OUTPUT_DIR / combined.DEFAULT_STATE_DUMP_NAME,
        help="Input NPZ path produced by Phase 1. The file must contain `times` and `qpos`.",
    )
    parser.add_argument(
        "--output-dir",
        type=Path,
        default=None,
        help="Directory for the rendered video. Defaults to the state file parent directory.",
    )
    parser.add_argument(
        "--video-name",
        type=str,
        default="headless_motion.mp4",
        help="Output video filename.",
    )
    parser.add_argument("--video-width", type=int, default=1280)
    parser.add_argument("--video-height", type=int, default=720)
    parser.add_argument("--video-fps", type=int, default=15)
    parser.add_argument(
        "--no-auto-trim",
        action="store_true",
        help="Render the full recorded qpos dump instead of trimming to the saved reference-motion window.",
    )
    parser.add_argument(
        "--trim-end-seconds",
        type=float,
        default=None,
        help="Optional manual render end time in seconds. Overrides the auto-trim window.",
    )
    parser.add_argument(
        "--sim-frequency",
        type=int,
        default=200,
        help="Simulation frequency used to build the offscreen simulator.",
    )
    parser.add_argument(
        "--camera-distance",
        type=float,
        default=2.8,
        help="Tracking camera distance.",
    )
    parser.add_argument(
        "--camera-azimuth",
        type=float,
        default=120.0,
        help="Tracking camera azimuth in degrees.",
    )
    parser.add_argument(
        "--camera-elevation",
        type=float,
        default=-25.0,
        help="Tracking camera elevation in degrees.",
    )
    parser.add_argument(
        "--camera-lookat",
        type=float,
        nargs=3,
        default=[0.0, 0.0, 0.7],
        metavar=("X", "Y", "Z"),
        help="Tracking camera lookat point.",
    )
    return parser.parse_args()


def format_elapsed(seconds: float) -> str:
    minutes, seconds = divmod(seconds, 60.0)
    hours, minutes = divmod(int(minutes), 60)
    if hours > 0:
        return f"{hours}h {minutes}m {seconds:.2f}s"
    if minutes > 0:
        return f"{minutes}m {seconds:.2f}s"
    return f"{seconds:.2f}s"


def ensure_phase2_inputs(args: argparse.Namespace) -> None:
    if not args.state_path.exists():
        raise FileNotFoundError(f"Phase 1 state file not found: {args.state_path}")
    if args.video_width % 2 != 0 or args.video_height % 2 != 0:
        raise ValueError("H.264 yuv420p output requires even video dimensions.")
    combined.ensure_ffmpeg_available()


def infer_trim_end_from_deploy_log(state_path: Path) -> float | None:
    log_path = state_path.with_name("deploy.log")
    if not log_path.exists():
        return None

    text = log_path.read_text(encoding="utf-8", errors="ignore")
    matches = re.findall(r"Loaded .* \((\d+) timesteps\)", text)
    if not matches:
        return None

    timesteps = int(matches[-1])
    if timesteps <= 0:
        return None
    return float(timesteps) / 50.0


def build_trimmed_state_file(args: argparse.Namespace) -> tuple[Path, dict[str, float]]:
    with np.load(args.state_path) as data:
        payload = {name: np.asarray(data[name]) for name in data.files}

    times = np.asarray(payload["times"], dtype=np.float64)
    qpos = np.asarray(payload["qpos"], dtype=np.float64)
    if times.size == 0:
        raise RuntimeError("The input state dump is empty.")

    trim_start = 0.0
    trim_end: float | None = None

    if args.trim_end_seconds is not None:
        trim_end = float(args.trim_end_seconds)
    elif not args.no_auto_trim and "render_end_time_seconds" in payload:
        trim_end = float(np.asarray(payload["render_end_time_seconds"]).item())
        if "render_start_time_seconds" in payload:
            trim_start = float(np.asarray(payload["render_start_time_seconds"]).item())
    elif not args.no_auto_trim:
        trim_end = infer_trim_end_from_deploy_log(args.state_path)

    if trim_end is None:
        return args.state_path, {
            "trim_applied": 0.0,
            "render_duration_seconds": float(times[-1]) if times.size > 0 else 0.0,
        }

    trim_start = max(0.0, trim_start)
    trim_end = max(trim_start, min(trim_end, float(times[-1])))

    mask = (times >= trim_start - 1e-9) & (times <= trim_end + 1e-9)
    if not np.any(mask):
        nearest = int(np.argmin(np.abs(times - trim_end)))
        mask = np.zeros(times.shape[0], dtype=bool)
        mask[nearest] = True

    trimmed_times = times[mask].copy()
    trimmed_qpos = qpos[mask].copy()
    trimmed_times -= trimmed_times[0]

    payload["times"] = trimmed_times
    payload["qpos"] = trimmed_qpos
    payload["render_start_time_seconds"] = np.asarray(0.0, dtype=np.float64)
    payload["render_end_time_seconds"] = np.asarray(
        float(trimmed_times[-1]) if trimmed_times.size > 0 else 0.0,
        dtype=np.float64,
    )

    with tempfile.NamedTemporaryFile(prefix="phase2_trimmed_", suffix=".npz", delete=False) as tmp:
        temp_state_path = Path(tmp.name)
    np.savez_compressed(temp_state_path, **payload)
    return temp_state_path, {
        "trim_applied": float(times[-1]) - float(trimmed_times[-1]),
        "render_duration_seconds": float(trimmed_times[-1]) if trimmed_times.size > 0 else 0.0,
    }


def main() -> int:
    args = parse_args()
    start_time = time.perf_counter()

    args.state_path = args.state_path.expanduser().resolve()
    args.output_dir = (
        args.state_path.parent if args.output_dir is None else args.output_dir.expanduser().resolve()
    )

    try:
        ensure_phase2_inputs(args)
        args.output_dir.mkdir(parents=True, exist_ok=True)
        video_path = args.output_dir / args.video_name

        with np.load(args.state_path) as data:
            times = np.asarray(data["times"], dtype=np.float64)
        duration_seconds = float(times[-1]) if times.size > 0 else 0.0

        print(f"Input state file: {args.state_path}")
        print(f"Input frames: {int(times.size)}")
        print(f"Input motion duration: {duration_seconds:.2f}s")

        state_for_render, trim_info = build_trimmed_state_file(args)
        try:
            if state_for_render != args.state_path:
                print(f"Trimmed render duration: {trim_info['render_duration_seconds']:.2f}s")
            combined.render_recorded_video(
                args=args,
                state_path=state_for_render,
                video_path=video_path,
            )
        finally:
            if state_for_render != args.state_path:
                state_for_render.unlink(missing_ok=True)

        if not video_path.exists():
            raise FileNotFoundError(f"Expected output video was not created: {video_path}")

        print(f"\nPhase 2 video saved to: {video_path}")
        return 0
    finally:
        elapsed = time.perf_counter() - start_time
        print(f"Phase 2 elapsed time: {format_elapsed(elapsed)}")


if __name__ == "__main__":
    raise SystemExit(main())

Finally, if you see the generated NPZ file and video under ./outputs/headless_test, the pipeline ran end to end.
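To sanity-check the Phase 1 output without rendering anything, you can list what the NPZ actually contains (a small illustrative helper, not part of the project):

```python
import numpy as np

def summarize_npz(path):
    """Map each array stored in an NPZ file to its shape."""
    with np.load(path) as data:
        return {name: np.asarray(data[name]).shape for name in data.files}

# Example: summarize_npz("./outputs/headless_test/sim_states.npz")
# should show at least `times` with shape (N,) and `qpos` with shape (N, nq),
# plus the reference_* / render_* metadata scalars written by Phase 1.
```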


If you want interactive visualization instead, follow the official tutorial:

Run sim2sim

Terminal 1 — MuJoCo simulator (host, from the repo root):

source .venv_sim/bin/activate
python gear_sonic/scripts/run_sim_loop.py  # on a server this errors out immediately, because it tries to open a visualization window

Terminal 2 — deploy (host or Docker, from gear_sonic_deploy/):

bash deploy.sh sim