Reliable robot grasping across many objects is challenging due to sensor noise and occlusions that lead to uncertainty about the precise shape, position, and mass of objects. The Dexterity Network (Dex-Net) 2.0 is a project centered on using physics-based models of robust robot grasping to generate massive datasets of parallel-jaw grasps across thousands of 3D CAD object models. These datasets are used to train deep neural networks to plan grasps from point clouds on a physical robot that can lift and transport a wide variety of objects.

To facilitate reproducibility and future research, this blog post announces the release of:

  1. Dexterity Network (Dex-Net) 2.0 dataset: 6.7 million pairs of synthetic point clouds and grasps with robustness labels. [link to data folder]
  2. Grasp Quality CNN (GQ-CNN) model: 18 million parameters trained on the Dex-Net 2.0 dataset. [link to our models]
  3. GQ-CNN Python Package: Code to replicate our GQ-CNN training results on synthetic data (note System Requirements below). [link to code].

In this post, we also summarize the methods behind Dex-Net 2.0 (1), our experimental results on a real robot, and details on the datasets, models, and code.

Research papers and additional information on the Dexterity Network can be found on the project website: https://berkeleyautomation.github.io/dex-net.

Dex-Net is a project in the AUTOLAB at UC Berkeley that is advised by Prof. Ken Goldberg.

Background on Grasping

Robot grasping across many objects is difficult due to sensor noise and occlusions, which make it challenging to precisely infer object shape, pose, material properties, mass, and the locations of contact points between the fingers and object. Recent results suggest that deep neural networks trained on large datasets of human grasp labels (2) or trials of grasping on a physical system (3) can be used to plan successful grasps across a wide variety of objects directly from images (4) with no explicit modeling of physics, similar to generalization results seen in computer vision. However, the training datasets may be time-consuming to generate.

To reduce training time, one alternative is to use Cloud Computing to rapidly compute grasps across a large dataset of object mesh models (5) using physics-based models of grasping (6). These methods rank grasps by a quantity called grasp robustness: the probability of grasp success predicted by models from mechanics, such as whether the grasp can resist arbitrary forces and torques, evaluated under probability distributions over properties such as object pose and surface friction (7). However, these methods make the strong assumption of a perception system that estimates these properties either perfectly or according to known Gaussian distributions. In practice, these perception systems are slow, prone to errors, and may not generalize well to new objects. Despite over 30 years of research, it is still common to plan grasps using heuristics such as detecting cylinders, for example in home decluttering (8) and the Amazon Picking Challenge (9).
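
To make the notion of robustness concrete, here is a minimal Monte Carlo sketch of how a robustness score could be estimated: sample perturbations of object pose and friction from assumed distributions and count how often a caller-supplied grasp quality check (for example, a force closure test) succeeds. The helper names and noise parameters below are illustrative assumptions, not the implementation used in Dex-Net.

```python
import numpy as np

def estimate_robustness(grasp, obj, quality_fn, n_samples=1000,
                        pose_sigma=0.005, friction_mean=0.5,
                        friction_sigma=0.1, seed=0):
    """Monte Carlo estimate of grasp robustness: the probability that a grasp
    succeeds under perturbations in object pose and surface friction.

    quality_fn(grasp, obj, pose_offset, friction) -> bool is supplied by the
    caller, e.g. a wrench-space force closure check. All names and noise
    parameters here are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    successes = 0
    for _ in range(n_samples):
        # Perturb the object position (translation-only for brevity).
        pose_offset = rng.normal(0.0, pose_sigma, size=3)
        # Sample a friction coefficient, clipped so it stays physical.
        friction = max(rng.normal(friction_mean, friction_sigma), 0.0)
        if quality_fn(grasp, obj, pose_offset, friction):
            successes += 1
    return successes / n_samples
```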

The Dexterity Network (Dex-Net) 2.0

Rather than attempt to estimate 3D object shape and pose from images, Dex-Net 2.0 uses a probabilistic model to generate synthetic point clouds, grasps, and grasp robustness labels from datasets of 3D object meshes (10) using physics-based models of grasping, image rendering, and camera noise. The main insight behind the method is that robust parallel-jaw grasps of an object are strongly correlated with the shape of the object. These geometric affordances for grasping, such as handles and cylinders, are visible in partial point clouds, and their correlation with grasp robustness is evident in samples from the model. We hypothesize that deep CNNs are able to learn these correlations using a hierarchical set of filters that recognize geometric primitives, similar to the Gabor-like filters learned by CNNs for image classification (11).
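
As a rough sketch of this generative pipeline (not the released implementation), the dataset can be thought of as the output of a loop over object meshes that samples stable tabletop poses, renders noisy synthetic depth images, samples parallel-jaw grasp candidates, and labels each candidate with a robustness estimate computed from the full object model. The callables below are hypothetical stand-ins for those components.

```python
def generate_dataset(meshes, sample_pose, render_depth, sample_grasps,
                     label_robustness, n_poses=10, n_grasps=100):
    """Sketch of a Dex-Net 2.0-style generative pipeline.

    The four callables are hypothetical stand-ins:
      sample_pose(mesh)                   -> stable resting pose on the table
      render_depth(mesh, pose)            -> synthetic depth image with camera noise
      sample_grasps(mesh, pose, n)        -> n candidate parallel-jaw grasps
      label_robustness(grasp, mesh, pose) -> robustness in [0, 1]
    """
    dataset = []
    for mesh in meshes:
        for _ in range(n_poses):
            pose = sample_pose(mesh)
            depth_im = render_depth(mesh, pose)
            for grasp in sample_grasps(mesh, pose, n_grasps):
                robustness = label_robustness(grasp, mesh, pose)
                # Each training example pairs a rendered depth image with a
                # grasp and its robustness label.
                dataset.append((depth_im, grasp, robustness))
    return dataset
```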

We formalize and study this approach in our paper, “Dex-Net 2.0: Deep Learning to Plan Robust Grasps with Synthetic Point Clouds and Analytic Grasp Metrics.” In the paper we detail the Dexterity Network (Dex-Net) 2.0, a dataset of 6.7 million synthetic point clouds and grasps with robustness labels, generated from our probabilistic model of grasping rigid objects on a tabletop with a parallel-jaw gripper. We develop a deep Grasp Quality Convolutional Neural Network (GQ-CNN) model and train it on Dex-Net 2.0 to estimate grasp robustness from a candidate grasp and point cloud. We use the GQ-CNN to plan grasps on a physical robot by sampling a set of grasp candidates from an input point cloud with edge detection and executing the most robust grasp estimated by the GQ-CNN.
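
A simplified sketch of this planning loop is below: candidate grasp centers are drawn near depth edges, a fixed-size crop around each candidate is scored by a robustness estimator (a trained GQ-CNN in the paper), and the highest-scoring candidate is returned for execution. The edge threshold, crop size, and score_fn interface are illustrative assumptions; the actual planner samples antipodal grasp axes and rotates the crop so the grasp axis is aligned with the image.

```python
import numpy as np

def plan_grasp(depth_im, score_fn, n_candidates=200, crop_size=32, seed=0):
    """Sketch of GQ-CNN-style grasp planning from a single depth image.

    score_fn(crop, depth) -> float is supplied by the caller, e.g. a trained
    GQ-CNN that maps a depth crop and the depth of the grasp center to an
    estimated robustness.
    """
    rng = np.random.default_rng(seed)
    gy, gx = np.gradient(depth_im)
    ys, xs = np.nonzero(np.hypot(gx, gy) > 0.01)  # crude depth-edge detector
    if len(ys) == 0:
        return None, -np.inf
    half = crop_size // 2
    best_score, best_grasp = -np.inf, None
    for _ in range(n_candidates):
        i = rng.integers(len(ys))
        y, x = int(ys[i]), int(xs[i])
        crop = depth_im[y - half:y + half, x - half:x + half]
        if crop.shape != (crop_size, crop_size):
            continue  # skip candidates too close to the image border
        score = score_fn(crop, depth_im[y, x])
        if score > best_score:
            best_score, best_grasp = score, (x, y)
    return best_grasp, best_score
```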

When trained on Dex-Net 2.0, the GQ-CNN learns a set of low-level filters that appear to detect image gradients at various scales. The filters can be organized into two classes: coarse oriented gradient filters that may be useful for estimating collisions between the gripper and object, and fine vertical filters that may be useful for estimating surface normals at the locations of contact between the fingers and object.

Experiments with the ABB YuMi

To evaluate GQ-CNN-based grasp planning on a physical robot, we ran over 1,000 trials of grasping on an ABB YuMi to investigate:

  1. Model Performance: Can a GQ-CNN trained entirely on synthetic data for a set of known objects be used to successfully grasp the objects on a physical robot?
  2. Generalization: Can the GQ-CNN be used to successfully grasp novel objects that were not seen in training?

Model Performance

We first measured the ability of our method to plan grasps that the robot could maintain while lifting, transporting, and shaking the object within the gripper. We used a set of eight 3D printed objects with known shape, center of mass, and frictional properties to highlight differences between our physical models and grasping on the physical robot. To explore failure modes, we chose objects with geometry that is adversarial for two-finger grippers, such as smooth, curved surfaces and narrow openings.

We found that the Dex-Net 2.0 grasp planner could achieve up to 93% success on the physical robot and was 3x faster than a method that matched the exact object shape to the point cloud. The results suggest that our physics-based model is a useful proxy for grasp outcomes on a physical robot when object properties are known and that the GQ-CNN can be used to plan highly precise grasps. Here’s an example:

[Animation: the Dex-Net 2.0 grasp planner executing a grasp on the ABB YuMi]

Generalization

We also evaluated the ability to generalize to previously unseen objects by testing on a set of 40 novel objects, including articulated and deformable objects such as a can opener and a washcloth. After analyzing the data further, we found a surprising result: the GQ-CNN had just one false positive out of the 69 grasps it predicted would succeed, a precision of 99%. This high precision is important because it suggests that the robot could anticipate failures from its own robustness estimates and perform recovery actions such as poking objects or asking a human for help.

Limitations

The results of grasp planning with Dex-Net 2.0 suggest that it is possible to achieve highly reliable grasping across a wide variety of objects by training neural networks on only synthetic data that is generated using physical models of grasping and image formation. However, there are several limitations of the current method:

  1. Sensor Capabilities. Some sources of noise on the physical depth camera, such as missing data, are not accounted for by the Dex-Net 2.0 model. Furthermore, depth cameras cannot see objects that are transparent or flat on a table.
  2. Model Limitations. The physical model of grasping used by Dex-Net 2.0 considers fingertip grasps of rigid objects. We do not account for grasping strategies such as pinching a flat piece of paper into the gripper or hooking an object with a finger.
  3. Single Objects. The method is designed to only grasp objects in isolation. We are currently working on extending the Dex-Net 2.0 model to grasping objects from a pile.
  4. Task-Independence. The method plans grasps that can be used to robustly lift and transport an object, but does not consider downstream uses of the object such as exact placement, stacking, or connecting it to another object in assembly, which may require more precise grasps. We are researching possible extensions with task-based grasp quality metrics, dynamic simulation, and learning from demonstration.

Dataset and Code Release

Over summer 2017, we are releasing a subset of our code, datasets, and the trained GQ-CNN weights, which we hope will facilitate further research and comparisons.

Today we’re releasing the Dex-Net 2.0 Training Dataset and Code, which includes the Dex-Net 2.0 dataset with 6.7 million synthetic datapoints, pretrained GQ-CNN models from the paper, and the gqcnn Python package for replicating our experiments on classifying robust grasps on synthetic data with GQ-CNNs. We hope this will facilitate development of new GQ-CNN architectures and training methods that perform better on both synthetic datasets and datasets collected with our robot. You can access the release with these links: [datasets] [models] [code]
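
As a starting point, a minimal training sketch might look like the following. It assumes the dataset has been unpacked into arrays of 32x32 depth crops and robustness labels (the file names, array shapes, and threshold here are assumptions; see the dataset documentation for the actual layout), and it uses a deliberately small Keras CNN rather than the GQ-CNN architecture from the paper.

```python
import numpy as np
import tensorflow as tf

# Hypothetical file names and array layout -- consult the dataset
# documentation for the actual format.
depth_ims = np.load("depth_images.npz")["arr_0"]    # (N, 32, 32, 1) depth crops
robustness = np.load("robustness.npz")["arr_0"]     # (N,) robustness scores
labels = (robustness > 0.5).astype(np.float32)      # illustrative threshold

# A deliberately small classifier, NOT the GQ-CNN architecture from the paper
# (which also conditions on the gripper depth); this is just a baseline to
# confirm the data pipeline works end to end.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(32, 32, 1)),
    tf.keras.layers.Conv2D(16, 5, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(32, 3, activation="relu"),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
model.fit(depth_ims, labels, validation_split=0.2, epochs=5, batch_size=64)
```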

System Requirements

Please note that strong performance on this particular dataset may not be indicative of performance on other robots, because the dataset is specific to:

  1. The ABB YuMi gripper, due to its collision geometry.
  2. A Primesense Carmine 1.08 sensor, due to the camera parameters used in rendering.
  3. The set of poses of the camera relative to the table: 50-70 centimeters directly above the table, looking straight down.

Nonetheless, the algorithms behind the dataset can be used to generate datasets for other two-finger grippers, cameras, and camera poses relative to the robot. We hypothesize that GQ-CNN-based grasp planning will perform best if the training datasets are generated using the gripper geometry, camera intrinsics, and camera location specific to the hardware setup.

ABB YuMi Benchmark

We plan to keep a leaderboard of performance on the Dex-Net 2.0 dataset to investigate improvements to the GQ-CNN architecture, since our best models achieve only 93% classification accuracy on synthetic data. Since datasets are specific to a hardware setup, we volunteer to benchmark performance on the physical robot for models that, in our judgment, significantly outperform other methods on synthetic data. We invite researchers from any discipline or background to participate.

Python Package

To aid in training GQ-CNNs, we developed the gqcnn Python package. Using gqcnn, you can quickly get started training GQ-CNNs on datasets generated with Dex-Net 2.0. There are tutorials to replicate the results from our RSS paper, and we invite researchers to try to improve classification performance on synthetic datasets as well as datasets of grasps collected with our physical ABB YuMi robot.

We’re also working on a ROS service for grasp planning with GQ-CNNs. The ROS package will enable users to see the results of grasp planning with GQ-CNNs on custom point clouds. We encourage interested parties to set up a Primesense Carmine 1.08 or Microsoft Kinect for Xbox 360 roughly 50-70 cm above a table and attempt grasps planned by a GQ-CNN-based grasp planner. While our dataset may not generalize to other hardware setups as noted above, we hope that with further research it may be possible to use GQ-CNNs for lifting and transporting objects with other robots. If you are interested in a research collaboration on such a project, please email Jeff Mahler (jmahler@berkeley.edu).

Future Releases

We are also aiming for the following releases and dates of additional data and functionality from Dex-Net over summer and fall 2017:

  • Dex-Net Object Mesh Dataset v1.1: The subset of 1,500 3D object models from Dex-Net 1.0 used in the RSS paper, labeled with parallel-jaw grasps for the ABB YuMi (14). July 12, 2017.
  • Dex-Net as a Service: HTTP web API to create new databases with custom 3D models and compute grasp robustness metrics. Fall 2017.

Contact

See the project website for updates and progress.

For more information please contact Jeff Mahler or Prof. Ken Goldberg of the Berkeley AUTOLAB.

Acknowledgments

This research was performed at the AUTOLAB at UC Berkeley in affiliation with the Berkeley AI Research (BAIR) Lab, the Real-Time Intelligent Secure Execution (RISE) Lab, and the CITRIS People and Robots (CPAR) Initiative. The authors were supported in part by the U.S. National Science Foundation under NRI Award IIS-1227536: Multilateral Manipulation by Human-Robot Collaborative Systems, the Department of Defense (DoD) through the National Defense Science & Engineering Graduate Fellowship (NDSEG) Program, the Berkeley Deep Drive (BDD) Program, and by donations from Siemens, Google, Cisco, Autodesk, IBM, Amazon Robotics, and Toyota Robotics Institute. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the Sponsors.

References

(1): Mahler, Jeffrey, Jacky Liang, Sherdil Niyaz, Michael Laskey, Richard Doan, Xinyu Liu, Juan Aparicio Ojea, and Ken Goldberg. “Dex-Net 2.0: Deep Learning to Plan Robust Grasps with Synthetic Point Clouds and Analytic Grasp Metrics.” arXiv preprint arXiv:1703.09312 (2017). (Paper) (Website)

(2): Kappler, Daniel, Jeannette Bohg, and Stefan Schaal. “Leveraging Big Data for Grasp Planning.” In Robotics and Automation (ICRA), 2015 IEEE International Conference on, pp. 4304-4311. IEEE, 2015.

(3): Levine, Sergey, Peter Pastor, Alex Krizhevsky, and Deirdre Quillen. “Learning Hand-Eye Coordination for Robotic Grasping with Deep Learning and Large-Scale Data Collection.” arXiv preprint arXiv:1603.02199 (2016).

(4): Johns, Edward, Stefan Leutenegger, and Andrew J. Davison. “Deep Learning a Grasp Function for Grasping under Gripper Pose Uncertainty.” In Intelligent Robots and Systems (IROS), 2016 IEEE/RSJ International Conference on, pp. 4461-4468. IEEE, 2016.

(5): Goldfeder, Corey, Matei Ciocarlie, Hao Dang, and Peter K. Allen. “The Columbia Grasp Database.” In Robotics and Automation, 2009. ICRA’09. IEEE International Conference on, pp. 1710-1716. IEEE, 2009.

(6): Prattichizzo, Domenico, and Jeffrey C. Trinkle. “Grasping.” In Springer Handbook of Robotics, pp. 955-988. Springer International Publishing, 2016.

(7): Weisz, Jonathan, and Peter K. Allen. “Pose Error Robust Grasping from Contact Wrench Space Metrics.” In Robotics and Automation (ICRA), 2012 IEEE International Conference on, pp. 557-562. IEEE, 2012.

(8): Ciocarlie, Matei, Kaijen Hsiao, Edward Gil Jones, Sachin Chitta, Radu Bogdan Rusu, and Ioan A. Şucan. “Towards Reliable Grasping and Manipulation in Household Environments.” In Experimental Robotics, pp. 241-252. Springer Berlin Heidelberg, 2014.

(9): Hernandez, Carlos, Mukunda Bharatheesha, Wilson Ko, Hans Gaiser, Jethro Tan, Kanter van Deurzen, Maarten de Vries et al. “Team Delft’s Robot Winner of the Amazon Picking Challenge 2016.” arXiv preprint arXiv:1610.05514 (2016).

(10): Mahler, Jeffrey, Florian T. Pokorny, Brian Hou, Melrose Roderick, Michael Laskey, Mathieu Aubry, Kai Kohlhoff, Torsten Kröger, James Kuffner, and Ken Goldberg. “Dex-Net 1.0: A Cloud-Based Network of 3D Objects for Robust Grasp Planning using a Multi-Armed Bandit Model with Correlated Rewards.” In Robotics and Automation (ICRA), 2016 IEEE International Conference on, pp. 1957-1964. IEEE, 2016.