
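In autonomous driving, the World Model has become one of the most central research directions of the past 2–3 years. In essence, it means:

Learning a model that can predict "how the world evolves", so that the driving system not only sees the present but can also "imagine the future".

Put simply:
Perception → World Model → Predict the future → Plan behavior

instead of the traditional pipeline:
Perception → Prediction → Planning

A world model unifies these stages. (Source: emergentmind.com)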

I. Mainstream autonomous-driving world models today (key papers / systems)

Below is a fairly complete list covering both industry and academia.

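1 Wayve: GAIA-1 (industry flagship)

Company: Wayve

Features:

9-billion-parameter generative world model
Input: video + text + action
Output: future driving scenes
Can generate complete driving videos

Core capabilities:

Predicting future traffic
Generating new driving scenarios
Serving as a simulator for training driving systems

Training data:

4,700 hours of real-world driving data

(Source: Wayve)

Core idea:

scene_t + action_t → scene_{t+1}

Analogous to:

LLM: token → next token
World model: scene → next scene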
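The "scene → next scene" analogy can be made concrete with a short autoregressive rollout. This is only an illustrative sketch under stated assumptions: a "scene" here is a plain list of numbers and `toy_step` is a hypothetical hand-written transition, whereas GAIA-1 itself predicts learned video/text/action tokens with a Transformer.

```python
from typing import Callable, List, Sequence

# Hypothetical types: a "scene" is a list of floats (e.g. 1D positions of a
# few agents, ego first). A real world model would use video tokens or latents.
Scene = List[float]
Action = float


def rollout(step: Callable[[Scene, Action], Scene],
            scene0: Scene,
            actions: Sequence[Action]) -> List[Scene]:
    """Autoregressively apply scene_{t+1} = step(scene_t, action_t).

    As with next-token prediction in an LLM, each prediction is fed back
    in as the input of the following step.
    """
    scenes = [scene0]
    for a in actions:
        scenes.append(step(scenes[-1], a))
    return scenes


def toy_step(scene: Scene, action: Action) -> Scene:
    # Toy dynamics (assumption): every agent moves forward by 1.0 per step,
    # and the ego vehicle (first entry) additionally moves by its action.
    ego, *others = scene
    return [ego + 1.0 + action] + [x + 1.0 for x in others]


if __name__ == "__main__":
    traj = rollout(toy_step, [0.0, 10.0], [0.5, 0.0, -0.5])
    print(traj)  # [[0.0, 10.0], [1.5, 11.0], [2.5, 12.0], [3.0, 13.0]]
```

Because every step consumes the previous prediction, errors can compound over long horizons, which is one reason long-horizon consistency is hard for generative world models.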

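2 Waymo: Genie-based World Model

Companies: Waymo + Google DeepMind

Capabilities:

Generates interactive 3D driving worlds from text / images
Automatically generates extreme scenarios (tornadoes, fires, etc.)
Used for large-scale simulation training

Purpose:

Covering rare edge cases

(Source: The Verge)

In other words:

Real data + generated worlds → effectively unlimited training data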

3 OccWorld (3D Occupancy World Model)

Paper: OccWorld

Features:

Represents the world as a 3D occupancy grid
A Transformer predicts future occupancy
Jointly predicts

ego trajectory
+
future scene

(Source: arXiv)

Core structure:

3D Occupancy → Scene Tokens → Transformer → Future Occupancy
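A minimal sketch of the "3D Occupancy → Scene Tokens" step, assuming NumPy. OccWorld learns its tokenizer (a VQ-VAE); here, as a simplifying assumption, a fixed `codebook` stands in for the learned one, and `tokenize_occupancy` and `patch` are hypothetical names. Each column of voxels is replaced by the index of its nearest codebook vector, which is what turns a dense grid into discrete tokens a Transformer can predict.

```python
import numpy as np


def tokenize_occupancy(occ: np.ndarray, codebook: np.ndarray, patch: int) -> np.ndarray:
    """Map a 3D occupancy grid (X, Y, Z) to a 2D grid of discrete scene tokens.

    Each (patch x patch) column of voxels, including all Z height levels, is
    flattened into a vector and replaced by the index of its nearest codebook
    entry (vector quantization). codebook has shape (K, patch * patch * Z).
    """
    X, Y, Z = occ.shape
    assert X % patch == 0 and Y % patch == 0
    gx, gy = X // patch, Y // patch
    # (gx, patch, gy, patch, Z) -> (gx, gy, patch, patch, Z) -> (gx, gy, patch*patch*Z)
    patches = (occ.reshape(gx, patch, gy, patch, Z)
                  .transpose(0, 2, 1, 3, 4)
                  .reshape(gx, gy, patch * patch * Z)
                  .astype(np.float32))
    # Squared Euclidean distance from every patch to every codebook vector.
    d = ((patches[:, :, None, :] - codebook[None, None, :, :]) ** 2).sum(-1)
    return d.argmin(-1)  # (gx, gy) integer token ids


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    occ = (rng.random((16, 16, 4)) > 0.8).astype(np.uint8)      # toy binary occupancy
    codebook = rng.random((32, 4 * 4 * 4)).astype(np.float32)   # 32 hypothetical codes
    tokens = tokenize_occupancy(occ, codebook, patch=4)
    print(tokens.shape)  # (4, 4)
```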

Advantages:

Finer-grained than bounding boxes
Can represent the full spatial structure of the scene

4 MUVO (Multimodal World Model)

Paper: MUVO

Features:

Camera + LiDAR fusion
Voxel-based spatial representation
Predicts future sensor data

(Source: arXiv)

Core idea:

sensor → unified world representation → future prediction
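A unified world representation starts by placing sensor data into a shared voxel grid. The NumPy sketch below (`voxelize` and its parameters are hypothetical) covers only the LiDAR half: it counts points per voxel. MUVO also lifts camera features into the same voxel space, which is not shown here.

```python
import numpy as np


def voxelize(points, bounds_min, bounds_max, voxel_size: float) -> np.ndarray:
    """Count LiDAR points per voxel.

    points: (N, 3) array of x, y, z coordinates in metres.
    Points outside [bounds_min, bounds_max) are discarded.
    Assumes the bounds span an integer number of voxels on each axis.
    """
    lo = np.asarray(bounds_min, dtype=float)
    hi = np.asarray(bounds_max, dtype=float)
    dims = np.round((hi - lo) / voxel_size).astype(int)          # voxels per axis
    idx = np.floor((np.asarray(points, dtype=float) - lo) / voxel_size).astype(int)
    inside = np.all((idx >= 0) & (idx < dims), axis=1)           # drop out-of-range points
    idx = idx[inside]
    grid = np.zeros(tuple(dims), dtype=np.int32)
    np.add.at(grid, (idx[:, 0], idx[:, 1], idx[:, 2]), 1)        # repeated indices accumulate
    return grid


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    cloud = rng.uniform(-5.0, 45.0, size=(1000, 3))              # toy point cloud
    grid = voxelize(cloud, (0, 0, 0), (40, 40, 40), 0.5)
    print(grid.shape, grid.sum())
```

`np.add.at` is used instead of `grid[...] += 1` because fancy-index assignment would count several points in the same voxel only once.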

5 DrivingWorld (Video GPT World Model)

Features:

Uses a Video GPT architecture
Predicts future driving video
Can generate scenes conditioned on vehicle actions

Structure:

video frames + action
→ autoregressive transformer
→ future frames

(Source: Xiaotao Hu)
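To make "video frames + action → autoregressive transformer" concrete, here is a hypothetical sketch of how frames and actions can be flattened into one token sequence for next-token training. It assumes each frame has already been tokenized into integer ids; `FRAME_VOCAB`, `ACTION_BINS` and the function names are illustrative assumptions, and this is not DrivingWorld's actual tokenization scheme.

```python
from typing import List, Sequence, Tuple

FRAME_VOCAB = 1024   # hypothetical size of the image-token vocabulary
ACTION_BINS = 16     # hypothetical number of discretized action bins


def build_sequence(frames: Sequence[Sequence[int]],
                   actions: Sequence[int]) -> List[int]:
    """Interleave per-frame image tokens with one action token per step:
    [frame_0 tokens, action_0, frame_1 tokens, action_1, ...].

    Action ids are offset by FRAME_VOCAB so they share one vocabulary
    with image tokens without colliding.
    """
    assert len(frames) == len(actions)
    seq: List[int] = []
    for frame, a in zip(frames, actions):
        assert all(0 <= t < FRAME_VOCAB for t in frame)
        assert 0 <= a < ACTION_BINS
        seq.extend(frame)
        seq.append(FRAME_VOCAB + a)
    return seq


def next_token_pairs(seq: Sequence[int]) -> Tuple[List[int], List[int]]:
    """Teacher-forcing inputs/targets: predict token i+1 from tokens 0..i."""
    return list(seq[:-1]), list(seq[1:])


if __name__ == "__main__":
    seq = build_sequence([[5, 7], [6, 8]], [3, 0])
    print(seq)  # [5, 7, 1027, 6, 8, 1024]
    x, y = next_token_pairs(seq)
    print(x)  # [5, 7, 1027, 6, 8]
    print(y)  # [7, 1027, 6, 8, 1024]
```

Placing the action token right after each frame means the tokens of the next frame are always predicted with the chosen action already in context, which is what makes the generation action-conditioned.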

6 HERMES (new model, 2025)

Features:

BEV world model
Performs both

scene understanding
+
scene generation

and additionally introduces

LLM + causal attention

for reasoning with world knowledge.

(Source: arXiv)
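Causal attention means each position may attend only to itself and earlier positions, so a prediction never sees future tokens. Below is a minimal single-head NumPy sketch of generic masked attention; it is not HERMES's actual implementation, and the projection matrices are assumed inputs.

```python
import numpy as np


def causal_self_attention(x: np.ndarray, wq: np.ndarray,
                          wk: np.ndarray, wv: np.ndarray) -> np.ndarray:
    """Single-head scaled dot-product attention with a causal mask.

    x: (T, D) token features; wq / wk / wv: (D, Dh) projection matrices.
    Position t may attend only to positions <= t.
    """
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(q.shape[-1])               # (T, T)
    T = x.shape[0]
    future = np.triu(np.ones((T, T), dtype=bool), k=1)    # True strictly above diagonal
    scores = np.where(future, -np.inf, scores)            # block attention to the future
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)                              # exp(-inf) == 0
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v                                    # (T, Dh)
```

A quick way to check the mask: changing the last token must leave every earlier output unchanged.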

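II. Technical routes for world models (a key distinction)

Autonomous-driving world models currently fall roughly into four technical routes:

Type	How the world is represented
Image world model	Future images
BEV world model	Bird's-eye view (BEV)
Occupancy world model	3D occupancy
Latent world model	Latent space

(Source: emergentmind.com)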
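The four routes differ mainly in the tensor that represents the world at each timestep. The snippet below compares element counts for illustrative shapes; every shape is a hypothetical example chosen for this comparison, not taken from a specific system.

```python
from math import prod

# Illustrative (hypothetical) per-timestep tensor shapes for each route.
REPRESENTATIONS = {
    "image":     {"shape": (6, 3, 256, 448), "axes": "cameras, RGB, height, width"},
    "bev":       {"shape": (256, 200, 200),  "axes": "feature channels, x, y"},
    "occupancy": {"shape": (200, 200, 16),   "axes": "x, y, z voxels (class id each)"},
    "latent":    {"shape": (512,),           "axes": "latent feature vector"},
}


def num_elements(kind: str) -> int:
    """Number of values needed to store one timestep of this representation."""
    return prod(REPRESENTATIONS[kind]["shape"])


if __name__ == "__main__":
    for kind, info in REPRESENTATIONS.items():
        print(f"{kind:10s} {num_elements(kind):>12,d}  ({info['axes']})")
```

The gap of several orders of magnitude between the latent vector and the dense grids is the intuition behind calling latent world models efficient.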

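1 Image world model

Directly predicts future images

Examples:

DrivingWorld
GAIA-1

Pros:

Generates realistic video

Cons:

Not well suited to planning (pixels must still be converted into geometry and agent states)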

2 BEV world model

Predicts in BEV space

Examples:

HERMES
Tesla FSD (internal system)

Pros:

Planning-friendly

3 Occupancy world model

3D voxels

Examples:

OccWorld
OccNet series

Pros:

High geometric accuracy

4 Latent world model (the frontier)

Similar to MuZero / Dreamer

Pipeline:

sensor → latent state

latent + action
→ future latent

latent → decode scene
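The latent pipeline above (encode, step forward in latent space, decode) can be sketched with linear maps in NumPy. This is a toy under strong assumptions: Dreamer-style models use learned neural networks, often with stochastic latents, while the hypothetical `LinearLatentWorldModel` uses fixed random matrices. It still shows where the efficiency comes from: the rollout manipulates only small latent vectors and decodes on demand.

```python
import numpy as np


class LinearLatentWorldModel:
    """Toy latent world model built from fixed linear maps (hypothetical):

        z_t     = E @ o_t              sensor -> latent state
        z_{t+1} = A @ z_t + B @ a_t    latent + action -> future latent
        o_hat   = D @ z                latent -> decoded scene
    """

    def __init__(self, obs_dim: int, latent_dim: int, act_dim: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.E = rng.normal(size=(latent_dim, obs_dim)) / np.sqrt(obs_dim)
        self.A = 0.9 * np.eye(latent_dim)                 # stable, decaying dynamics
        self.B = rng.normal(size=(latent_dim, act_dim))
        self.D = rng.normal(size=(obs_dim, latent_dim)) / np.sqrt(latent_dim)

    def encode(self, obs: np.ndarray) -> np.ndarray:
        return self.E @ obs

    def step(self, z: np.ndarray, a: np.ndarray) -> np.ndarray:
        return self.A @ z + self.B @ a

    def decode(self, z: np.ndarray) -> np.ndarray:
        return self.D @ z

    def imagine(self, obs0: np.ndarray, actions):
        """Encode once, roll forward purely in latent space, then decode each step."""
        z = self.encode(obs0)
        latents = []
        for a in actions:
            z = self.step(z, a)
            latents.append(z)
        latents = np.stack(latents)                            # (T, latent_dim)
        decoded = np.stack([self.decode(z) for z in latents])  # (T, obs_dim)
        return latents, decoded
```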

Pros:

Efficient
Enables reinforcement learning

III. Full world-model architecture (the future autonomous-driving stack)

The ideal architecture is:

Sensors
↓
World Encoder
↓
Latent World State
↓
World Model (predict future)
↓
Policy / Planner
↓
Control

Mathematically:

z_t = Encoder(o_t)

z_{t+1} = WorldModel(z_t , a_t)

a_t = Policy(z_t)
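One common way to realize a_t = Policy(z_t) with a world model is to plan in imagination: try candidate action sequences inside WorldModel and execute only the first action of the best one. The sketch below uses random-shooting model-predictive control, a swapped-in technique chosen for simplicity, with a hypothetical 1D latent state `[position, velocity]` standing in for a learned Encoder output; `DT`, the cost weights and all function names are assumptions.

```python
import numpy as np

DT = 0.1  # hypothetical time step in seconds


def world_model(z: np.ndarray, a: float) -> np.ndarray:
    """Hypothetical latent dynamics: z = [position, velocity], a = acceleration."""
    pos, vel = z
    return np.array([pos + vel * DT, vel + a * DT])


def imagined_cost(z0, actions, target: float) -> float:
    """Roll a candidate action sequence through the world model and score it:
    squared distance to the target position plus a small effort penalty."""
    z, cost = np.asarray(z0, dtype=float), 0.0
    for a in actions:
        z = world_model(z, a)
        cost += (z[0] - target) ** 2 + 0.01 * a ** 2
    return float(cost)


def plan(z0, target: float, horizon: int = 10, n_samples: int = 256,
         a_max: float = 2.0, seed: int = 0) -> float:
    """Random-shooting MPC: sample action sequences, imagine each one,
    and return the first action of the lowest-cost sequence."""
    rng = np.random.default_rng(seed)
    candidates = rng.uniform(-a_max, a_max, size=(n_samples, horizon))
    costs = [imagined_cost(z0, seq, target) for seq in candidates]
    return float(candidates[int(np.argmin(costs)), 0])


if __name__ == "__main__":
    z = np.array([0.0, 0.0])          # start at rest at position 0
    for t in range(50):               # closed loop: re-plan at every step
        a = plan(z, target=1.0, seed=t)
        z = world_model(z, a)         # assumption: the model also plays the "real world"
    print(z)
```

Re-planning at every step lets the controller correct for model errors; a learned policy network can later be distilled from such plans so that inference does not need sampling.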

This is the same formulation as the world models used in model-based reinforcement learning.

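IV. Why world models matter

Traditional autonomous driving:

perception
prediction
planning

Problems:

Information is lost between modules
Modules are inconsistent with each other
Many edge cases

World model:

Unified world representation
+
Unified future prediction

Advantages:

1️⃣ Can predict the future state of the world
2️⃣ Can generate training data
3️⃣ Can unify perception + prediction + planning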

V. Who leads today

Here is my candid assessment:

Company	World-model progress
Wayve	Most aggressive
Waymo	Strongest in simulation
Tesla	Latent BEV
Nvidia	DriveSim
OpenDriveLab	Academic research

VI. Future autonomous-driving architecture (most likely direction)

Autonomous driving is likely to converge on a:

Foundation Model for Driving

Structure:

Multimodal Transformer

camera
lidar
map
language
actions

→ unified world model
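A single multimodal Transformer needs all modalities in one token sequence. A common pattern, sketched here in NumPy under assumptions (random projections and random modality embeddings stand in for learned ones; all names are hypothetical), is to project each modality to a shared width, add a per-modality type embedding, and concatenate.

```python
import numpy as np

MODALITIES = ["camera", "lidar", "map", "language", "actions"]


def build_multimodal_sequence(features: dict, d_model: int, seed: int = 0) -> np.ndarray:
    """features maps a modality name to an (n_i, d_i) array of tokens.

    Each modality is projected to d_model, tagged with a modality-type
    embedding, and all tokens are concatenated into one (sum n_i, d_model)
    sequence that a single Transformer can attend over.
    """
    rng = np.random.default_rng(seed)
    type_emb = {m: rng.normal(size=d_model) for m in MODALITIES}  # stand-in for learned
    parts = []
    for m in MODALITIES:                        # fixed order keeps the layout stable
        if m not in features:
            continue                            # missing modalities are simply skipped
        x = np.asarray(features[m], dtype=float)
        proj = rng.normal(size=(x.shape[1], d_model)) / np.sqrt(x.shape[1])
        parts.append(x @ proj + type_emb[m])
    return np.concatenate(parts, axis=0)
```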

Analogous to:

GPT + Dreamer + MuZero

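VII. If you work on robotics algorithms (you are currently working on wheeled mobile manipulators)

In fact, the world model is arguably the ultimate architecture for robots.

The future of robotics:

Perception → World Model → Policy

This would unify:

autonomous driving
robotics
embodied AI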

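If you'd like, I can walk you through something very important that many people haven't really understood:

"What does Tesla FSD's world model actually look like?"

I can also draw a diagram comparing the FSD / Wayve / Waymo world-model architectures, which would be very useful for understanding autonomous-driving architecture.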


