SAGCI-System paper review

Problem

Real-life tasks take place in unstructured environments.

Robots need to learn new tasks quickly in such environments.

Target

Sample-Efficient, Generalizable, Compositional, and Incremental

File Format

Unified Robot Description Format (URDF)

Related works

Interactive Perception

Differentiable Simulation

Model-Based Reinforcement Learning

Model Establishment

Building the Initial Environment Model

At step $t$, the observed point cloud is $\{\mathscr{P}_t^i\}_{i=1}^N \in \mathbb{R}^{N \times 3}$. A segmentation model predicts per-point labels $\{M_t^i\}_{i=1}^N$ with $M_t^i \in \{1, 2, \cdots, K\}$, which assign the raw points to the $K$ links of the URDF; these links are denoted $\{\mathscr{G}_t^j\}_{j=1}^K$. ManifoldPlus is then used to build a mesh for each link. Joints come in 4 types, so the joint type between every pair of links is represented by $\mathscr{J} \in \mathbb{R}^{K \times K \times 4}$. The corresponding joint spatial parameters are $\mathscr{C} \in \mathbb{R}^{K \times K \times 9}$, covering the joint axis, origin, and orientation. Finally, a spanning tree ($ST$) is formed over the links to compose the URDF.
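A minimal Python sketch of the composition step, under loud assumptions: the channel order of the 4 joint types (`JOINT_TYPES`), the layout of the 9 values in $\mathscr{C}$ (axis, origin xyz, origin rpy), rooting the tree at link 0, and using the best joint-type score of each link pair as a maximum-spanning-tree edge weight are all hypothetical choices for illustration, not the paper's method. `split_links`, `max_spanning_tree`, and `compose_urdf` are hypothetical helpers. The output is skeletal: a usable URDF would also need link geometry (e.g. the ManifoldPlus meshes) and `<limit>` tags for revolute and prismatic joints.

```python
import xml.etree.ElementTree as ET

import numpy as np

# Hypothetical channel order for the 4 joint types in J (the paper's order may differ).
JOINT_TYPES = ["fixed", "revolute", "prismatic", "continuous"]


def split_links(points, labels, K):
    """Group an (N, 3) point cloud into K links using per-point labels in 1..K."""
    return [points[labels == k] for k in range(1, K + 1)]


def max_spanning_tree(J):
    """Prim's algorithm rooted at link 0; edge weight = best joint-type score of a link pair."""
    K = J.shape[0]
    weight = J.max(axis=2)
    weight = np.maximum(weight, weight.T)  # treat each link pair as undirected
    in_tree, edges = {0}, []
    while len(in_tree) < K:
        best = None
        for i in in_tree:
            for j in range(K):
                if j not in in_tree and (best is None or weight[i, j] > weight[best]):
                    best = (i, j)
        edges.append(best)
        in_tree.add(best[1])
    return edges


def fmt(vec):
    """Format a vector as the space-separated string URDF attributes expect."""
    return " ".join(f"{v:g}" for v in vec)


def compose_urdf(J, C, name="object"):
    """Emit a skeletal URDF string from J (K, K, 4) joint-type scores and C (K, K, 9) parameters."""
    K = J.shape[0]
    robot = ET.Element("robot", name=name)
    for k in range(K):
        ET.SubElement(robot, "link", name=f"link_{k}")  # geometry (meshes) omitted
    for parent, child in max_spanning_tree(J):
        scores = np.maximum(J[parent, child], J[child, parent])
        jtype = JOINT_TYPES[int(np.argmax(scores))]
        # Hypothetical layout of the 9 values: axis (3), origin xyz (3), origin rpy (3).
        axis, xyz, rpy = C[parent, child, 0:3], C[parent, child, 3:6], C[parent, child, 6:9]
        joint = ET.SubElement(robot, "joint", name=f"joint_{parent}_{child}", type=jtype)
        ET.SubElement(joint, "parent", link=f"link_{parent}")
        ET.SubElement(joint, "child", link=f"link_{child}")
        ET.SubElement(joint, "origin", xyz=fmt(xyz), rpy=fmt(rpy))
        ET.SubElement(joint, "axis", xyz=fmt(axis))
    return ET.tostring(robot, encoding="unicode")


rng = np.random.default_rng(0)
K = 3
print(compose_urdf(rng.random((K, K, 4)), rng.random((K, K, 9))))
```

A spanning tree gives every link except the root exactly one parent, which is the kinematic-tree structure URDF requires.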

Interactive Perception

The parameters to optimize are $\mathscr{Z} = \{\mathscr{J}, \mathscr{C}, \mathscr{M}, \alpha^{sim}\}$, where $\alpha^{sim}$ denotes the physical attributes of the object, and $a_t^{IP}$ denotes the interaction action at step $t$.

In simulation, the predicted next point cloud is:

$$ \bar{\mathscr{P}}_{t+1}^i = \mathscr{F}^{sim}(\mathscr{P}_t^i, \mathscr{Z}, \varepsilon, a_t^{IP}) $$

In the real world, the corresponding target is:

$$ \widetilde{\mathscr{P}}_{t+1}^i = \mathscr{F}^{real}(\mathscr{P}_t^i, \mathscr{U}_t^i, \{\mathscr{P}_{t+1}^i\}_{i=1}^N) $$

Here $\{\mathscr{U}_t^i\}_{i=1}^N$ is the scene flow estimated from the consecutive point clouds $\{\mathscr{P}_t^i\}_{i=1}^N$ and $\{\mathscr{P}_{t+1}^i\}_{i=1}^N$, and $\widetilde{\mathscr{P}}_{t+1}^i$ is the point in $\{\mathscr{P}_{t+1}^j\}_{j=1}^N$ nearest to the flowed point $\mathscr{P}_t^i + \mathscr{U}_t^i$.

To optimize $\mathscr{Z}$, the discrepancy between simulation and the real world is measured as:

$$ \mathscr{L}_{t+1} = \frac{1}{N} \sum_{i=1}^N \|\bar{\mathscr{P}}_{t+1}^i - \widetilde{\mathscr{P}}_{t+1}^i\|^2 $$

Because $\mathscr{F}^{sim}$ is differentiable, $\mathscr{Z}$ can be updated by gradient descent with step size $\lambda$:

$$ \mathscr{Z}' = \mathscr{Z} - \lambda \frac{\partial \mathscr{L}_{t+1}}{\partial \mathscr{Z}} $$

Since $\mathscr{Z} = \{\mathscr{J}, \mathscr{C}, \mathscr{M}, \alpha^{sim}\}$, this update optimizes all four components. Note that $\alpha^{sim}$ is combined with $a_t^{IP}$ to compute the joint state change $\delta q$. The improvement brought by the update is:

$$ r_t = \mathscr{L}_{t+1}(\mathscr{Z}) - \mathscr{L}_{t+1}(\mathscr{Z}') $$

The differentiable simulator used is Nimble. Rather than taking Nimble's prediction $s_{t+1}^{Nim}$ directly, the model computes a correction $\delta_{s_{t+1}}$ from $s_t^{sim}$, $a_t$, and $s_{t+1}^{Nim}$, and the final simulated state is $s_{t+1}^{sim} = s_{t+1}^{Nim} + \delta_{s_{t+1}}$. Back-propagation through this composition uses the ordinary chain rule.
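The scene-flow target, the loss $\mathscr{L}_{t+1}$, the gradient step, and the improvement $r_t$ can be illustrated on a toy problem. This is a hedged NumPy sketch, not the paper's implementation: `toy_sim` is a hypothetical stand-in for Nimble in which one prismatic link moves by $\alpha^{sim} a_t^{IP}$ along a known axis, only $\alpha^{sim}$ is optimized (the other components of $\mathscr{Z}$ are held fixed), the gradient is derived by hand rather than by automatic differentiation, and the scene flow is assumed perfect. The residual correction $\delta_{s_{t+1}}$ is not modeled.

```python
import numpy as np


def nearest_targets(P_t, U_t, P_next):
    """For each i, the point of P_next nearest to the flowed point P_t[i] + U_t[i]."""
    flowed = P_t + U_t
    d2 = ((flowed[:, None, :] - P_next[None, :, :]) ** 2).sum(axis=-1)  # (N, N) squared distances
    return P_next[d2.argmin(axis=1)]


def toy_sim(P_t, mask, axis, alpha, a):
    """Hypothetical stand-in for F^sim: masked points translate by alpha * a along axis."""
    return P_t + mask[:, None] * (alpha * a) * axis


def loss_and_grad(P_t, mask, axis, alpha, a, P_tilde):
    """Mean squared point distance L and its analytic derivative dL/dalpha."""
    diff = toy_sim(P_t, mask, axis, alpha, a) - P_tilde
    N = len(P_t)
    loss = float((diff ** 2).sum() / N)
    # Chain rule: d(P_bar_i)/d(alpha) = mask_i * a * axis.
    grad = float(2.0 / N * ((diff @ (a * axis)) * mask).sum())
    return loss, grad


rng = np.random.default_rng(0)
N = 200
P_t = rng.random((N, 3))
mask = (P_t[:, 0] > 0.5).astype(float)        # points belonging to the moving link
axis = np.array([0.0, 0.0, 1.0])
a, alpha_true, alpha = 1.0, 0.3, 0.1           # action, "real" attribute, current estimate

P_next = toy_sim(P_t, mask, axis, alpha_true, a)  # observed real-world point cloud
U_t = P_next - P_t                                # perfect scene flow in this toy setting
P_tilde = nearest_targets(P_t, U_t, P_next)

lam = 0.5                                         # step size lambda
loss, grad = loss_and_grad(P_t, mask, axis, alpha, a, P_tilde)
alpha_new = alpha - lam * grad                    # Z' = Z - lambda * dL/dZ
loss_new, _ = loss_and_grad(P_t, mask, axis, alpha_new, a, P_tilde)
r_t = loss - loss_new                             # improvement L(Z) - L(Z')
print(f"alpha: {alpha:.3f} -> {alpha_new:.3f}, r_t = {r_t:.5f}")
```

Because the toy loss is quadratic in $\alpha^{sim}$, one gradient step moves the estimate toward the true value and $r_t$ comes out positive.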

Experiment

The evaluation metrics are mIoU, segmentation accuracy, rotation error (Rot), and translation error (Tran).
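The notes do not spell out how each metric is computed, so the following Python sketch shows one common set of definitions as an assumption: per-point segmentation accuracy, mIoU averaged over link labels, the geodesic angle between rotation matrices for the Rot error, and Euclidean distance for the Tran error. The paper may instead measure these errors on joint axes or origins, or in different units.

```python
import numpy as np


def segmentation_metrics(pred, gt, K):
    """Per-point accuracy and mean IoU over link labels 1..K."""
    acc = float((pred == gt).mean())
    ious = []
    for k in range(1, K + 1):
        inter = np.logical_and(pred == k, gt == k).sum()
        union = np.logical_or(pred == k, gt == k).sum()
        if union > 0:  # skip labels absent from both prediction and ground truth
            ious.append(inter / union)
    return acc, float(np.mean(ious))


def rotation_error_deg(R_pred, R_gt):
    """Geodesic angle between two 3x3 rotation matrices, in degrees."""
    cos = (np.trace(R_pred.T @ R_gt) - 1.0) / 2.0
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))


def translation_error(t_pred, t_gt):
    """Euclidean distance between predicted and ground-truth translations."""
    return float(np.linalg.norm(np.asarray(t_pred, dtype=float) - np.asarray(t_gt, dtype=float)))


pred = np.array([1, 1, 2, 2, 2, 3])
gt = np.array([1, 1, 2, 2, 3, 3])
acc, miou = segmentation_metrics(pred, gt, K=3)
Rz90 = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
print(acc, miou)
print(rotation_error_deg(np.eye(3), Rz90), translation_error([0, 0, 0], [3, 4, 0]))
```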

Both sf + noise and sf outperform the initial model, and sf + ground truth performs best (sf denotes scene flow).