Transporter Networks
Abstract
- A simple model that learns to attend to a local region and predict its spatial displacement
- 10 unique tabletop manipulation tasks
- The model achieves more than 90% success on new configurations using only 100 expert demos
- Open-sources Ravens, a simulated benchmark suite of tabletop manipulation tasks
- Basically deep template matching: crop the observation around the picked object and use it as a template to search for where to place it
Method
The problem setting is as follows:
- $a_t = (\mathcal{T}_\text{pick}, \mathcal{T}_\text{place})$: the poses of the end effector used to pick / place an object
Learning to Transport
Assumptions:
- Immobilizing grasp (i.e., suction gripper)

These assumptions provide the following setting:
- $\mathcal{T}_\text{pick}$ is sampled from a distribution of successful pick poses
- For each successful pick pose, there's a corresponding distribution of successful place poses $p(\mathcal{T}_\text{place} \mid o_t, \mathcal{T}_\text{pick})$

In equation, $p(a_t \mid o_t) = p(\mathcal{T}_\text{pick} \mid o_t)\, p(\mathcal{T}_\text{place} \mid o_t, \mathcal{T}_\text{pick})$
Learning picking
- $\mathcal{T}_\text{pick} \sim (u, v)$: a pixel location --> We can map each pixel to a pick action :)
- $f_\text{pick}$ is a Fully Convolutional Network (FCN)
- Translationally equivariant (i.e., $f_\text{pick}(g \circ o_t) = g \circ f_\text{pick}(o_t)$)
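A minimal sketch of the equivariance property, using a single hand-crafted correlation kernel as a stand-in for the learned FCN (the kernel, image size, and object location are all illustrative assumptions):

```python
import numpy as np
from scipy.ndimage import correlate

# A convolution/correlation is translationally equivariant: shifting the
# input shifts the output by the same amount. Hand-crafted kernel below
# is a toy stand-in for the learned f_pick.
kernel = np.array([[0., 1., 0.],
                   [1., 2., 1.],
                   [0., 1., 0.]])

o_t = np.zeros((16, 16))
o_t[5, 7] = 1.0  # a single "object" pixel

q = correlate(o_t, kernel, mode="constant")  # f_pick(o_t)
q_shifted = correlate(np.roll(o_t, (2, 3), axis=(0, 1)),
                      kernel, mode="constant")  # f_pick(g . o_t)

# Equivariance check: g . f_pick(o_t) == f_pick(g . o_t)
assert np.allclose(np.roll(q, (2, 3), axis=(0, 1)), q_shifted)

# The pick action is the argmax pixel of the dense value map.
u, v = np.unravel_index(np.argmax(q), q.shape)
print(u, v)  # prints "5 7" -- the pick lands on the object pixel
```

The argmax-over-pixels step is what lets every pixel correspond to a candidate pick action.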
Spatially consistent visual representations
What is spatially consistent?: The appearance of an object remains constant across different camera views
They convert RGB-D images into a spatially consistent form by unprojecting to a 3D point cloud and then rendering it into an orthographic projection.
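A rough sketch of that conversion under a simple pinhole camera model; the intrinsics, workspace bounds, and max-height aggregation here are my assumptions, not the paper's exact pipeline:

```python
import numpy as np

def unproject(depth, fx, fy, cx, cy):
    """Pixel (u, v) with depth z -> camera-frame 3D point (pinhole model)."""
    v, u = np.indices(depth.shape)
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=-1).reshape(-1, 3)

def heightmap(points, bounds, pixel_size):
    """Orthographic top-down render: keep the max height per (x, y) cell."""
    (x0, x1), (y0, y1) = bounds
    w = int(round((x1 - x0) / pixel_size))
    h = int(round((y1 - y0) / pixel_size))
    hm = np.zeros((h, w))
    ix = ((points[:, 0] - x0) / pixel_size).astype(int)
    iy = ((points[:, 1] - y0) / pixel_size).astype(int)
    valid = (ix >= 0) & (ix < w) & (iy >= 0) & (iy < h)
    for x, y, z in zip(ix[valid], iy[valid], points[valid][:, 2]):
        hm[y, x] = max(hm[y, x], z)  # treating z as "height" for brevity
    return hm

depth = np.full((4, 4), 0.5)  # toy flat depth image
pts = unproject(depth, fx=1., fy=1., cx=2., cy=2.)
hm = heightmap(pts, bounds=((-1., 1.), (-1., 1.)), pixel_size=0.5)
```

Because the heightmap is a fixed top-down grid over the workspace, the same object produces the same appearance regardless of which camera captured it.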
- $\psi(o_t[\mathcal{T}_\text{pick}])$: a dense feature of a cropped observation (template)
- $\phi(o_t)[\tau]$: a dense feature of a crop at pose $\tau$. A pose here is a pixel location (search area)
- The place values are their cross-correlation: $\mathcal{Q}_\text{place}(\tau \mid o_t, \mathcal{T}_\text{pick}) = \psi(o_t[\mathcal{T}_\text{pick}]) * \phi(o_t)[\tau]$
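The transport step can be sketched as cross-correlating the template features against the search-area features; single-channel toy "features" stand in here for the deep features the networks would actually produce:

```python
import numpy as np
from scipy.signal import correlate2d

# Search-area "features": a 12x12 map with a distinctive pattern at the
# target location. (Values are illustrative, not learned features.)
phi = np.zeros((12, 12))
phi[8:11, 2:5] = np.array([[0., 1., 0.],
                           [1., 2., 1.],
                           [0., 1., 0.]])

# Template "features": the crop around the picked object.
psi = np.array([[0., 1., 0.],
                [1., 2., 1.],
                [0., 1., 0.]])

# Q_place(tau | o_t, T_pick): cross-correlate template over search area;
# the argmax pixel is the predicted place pose.
q_place = correlate2d(phi, psi, mode="same")
tau = tuple(int(i) for i in np.unravel_index(np.argmax(q_place), q_place.shape))
print(tau)  # prints "(9, 3)" -- the center of the matching region
```

This is exactly the "deep template matching" intuition from the abstract: matching is done in feature space instead of raw pixel space.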
:::messages
Learning with Planar Rotations: SE(2)?
They discretize the rotation into $k$ bins, then rotate the observation accordingly. A trick is to apply the FCN $k$ times in parallel, once for each rotated $o_t$.
:::
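A sketch of the rotation trick; `k`, the angle grid, and the identity `fcn` stand-in are illustrative assumptions, not the paper's settings:

```python
import numpy as np
from scipy.ndimage import rotate

# Discretize SE(2) rotation into k bins and evaluate the (shared) FCN on
# each rotated copy of the observation in parallel.
k = 8
angles = np.arange(k) * (360.0 / k)

def fcn(obs):
    # Hypothetical dense value network; identity stand-in for illustration.
    return obs

obs = np.random.rand(16, 16)
q = np.stack([fcn(rotate(obs, a, reshape=False, order=1))
              for a in angles])  # shape (k, H, W)

# Argmax over (theta, u, v) yields a place pose with a rotation angle.
theta_bin, u, v = np.unravel_index(np.argmax(q), q.shape)
theta = angles[theta_bin]
```

Because the network weights are shared across all $k$ rotated inputs, this adds rotation handling without adding parameters.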