Real Robot Challenge 2020 Dataset

Manuel Wüthrich*1, Felix Widmaier*1, Stefan Bauer*2, Niklas Funk†3, Julen Urain De Jesus†3, Jan Peters†3, Joe Watson†3, Claire Chen†4, Krishnan Srinivasan†4, Junwu Zhang†4, Jeffrey Zhang†4, Matthew R. Walter†5, Rishabh Madan†9, Charles Schaff†5, Takahiro Maeda†5, Takuma Yoneda†5, Denis Yarats†6, Arthur Allshire†7, Ethan K. Gordon †8, Tapomayukh Bhattacharjee†9, Siddhartha S. Srinivasa†8, Animesh Garg†7, Annika Buchholz1, Sebastian Stark1, Thomas Steinbrenner1, Joel Akpo1, Shruti Joshi1, Vaibhav Agrawal1, Bernhard Schölkopf1

* Equal contribution   † Challenge participant   1 Max Planck Institute for Intelligent Systems   2 KTH Stockholm   3 TU Darmstadt   4 Stanford University   5 TTI Chicago   6 New York University   7 University of Toronto   8 University of Washington   9 Cornell University


The RRC 2020 Dataset contains the recorded data of the Real Robot Challenge 2020.

The dataset consists of the individual runs that were executed by the challenge participants as well as the runs from the weekly evaluations. For each run, it includes the actions sent to the robot, all observations provided by the robot and cameras, and additional information such as the goal that was pursued and the reward that was achieved.

The challenge was split into three phases: phase 1 was carried out in simulation, while phases 2 and 3 were run on the real robots, using a cube and a cuboid as manipulation object, respectively.

The dataset contains 2856 runs from phase 2 and 7422 runs from phase 3. Each run can be downloaded individually, so you do not need to download the full dataset if you are only interested in a specific subset. The compressed size of a single run is around 250 MB on average.

To filter the runs based on various parameters, we provide a SQLite database listing all runs with some meta information and metrics, as well as a Python script to easily run basic queries on this database. Since it is a standard SQLite database, you can also open it with other tools if you want to perform more complex queries.
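
If you prefer to query the database directly, a minimal sketch using Python's built-in sqlite3 module is shown below; the table and column layout is not listed here, so the sketch simply inspects the schema first.

import sqlite3

# Open the index database and print its schema to see which tables
# and columns are available for more complex queries.
con = sqlite3.connect("rrc2020_dataset_index.db")
for name, sql in con.execute(
    "SELECT name, sql FROM sqlite_master WHERE type = 'table'"
):
    print(name)
    print(sql)
con.close()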

See also our paper "A Robot Cluster for Reproducible Research in Dexterous Manipulation" [1] for more details about the challenge and the dataset.

Downloads

The recorded data of the individual runs can be downloaded via the following URL patterns:

Requirements of rrc_dataset_query.py:

You can easily generate scripts that download specific subsets of the dataset. For example, if you are only interested in runs using the cube (phase 2) in which the cube was lifted at least 5 cm:

$ ./rrc_dataset_query.py query rrc2020_dataset_index.db \
    --format "wget -N {url_zarr}" \
    -w challenge_phase = 2 -w max_height ">" 0.05 > download_script.sh

Then execute the generated script to actually download the data:

$ bash ./download_script.sh

Singularity Images

A Singularity image with our software installed (i.e. everything you need to read the log files) can be downloaded here.

Note: In this image, the object is assumed to be a cube. While it is possible to read and view camera logs of phase 3 with this version of the software, the visualised object pose will use the cube model and thus will not match the cuboid that was actually used in that phase.

Old Images of RRC 2020

For legacy support, the images that were used during the RRC 2020 are also still available:

Database Fields

Note that cumulative_reward depends on the distance between the goal and the initial pose of the object, so it is not an ideal metric for comparing runs with different goals. We therefore provide the other metrics as well, to give a better picture of what happened in a run.

(*) For the metrics min_distance_to_goal, max_height and furthest_from_start, there are additional fields suffixed with _10 and _30. While the original field contains the maximum/minimum value over the whole run, these fields contain the 10th/30th largest (or smallest) value over all observations. They serve as a simple filter that rejects short-lived peak values. The numbers refer to camera observations, which are provided at 10 Hz, so, for example, max_height_10 indicates that the object was at least that high for a total duration of around one second (possibly with interruptions).
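
For example, to keep only runs in which the cube was at least 5 cm high for roughly one second in total, you can filter on max_height_10 instead of max_height, using the same query syntax as in the download example above:

$ ./rrc_dataset_query.py query rrc2020_dataset_index.db \
    --format "wget -N {url_zarr}" \
    -w challenge_phase = 2 -w max_height_10 ">" 0.05 > download_script.sh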

Magic Fields

There are two "magic fields" supported by the query script: url_orig and url_zarr. They do not actually exist inside the database but the query script recognises them and replaces them with the download URL to the original or zarr file of the corresponding run (see the example above).

Data Formats

The data is available in two different formats: the original robot log files (gzip-compressed tarballs) and Zarr zip files. Both are described below.

Robot Log Files

The logs of each run are provided as a gzip-compressed tarball which contains the following files:

The robot and camera data files are in a custom binary format. See our software documentation on how to read them.
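
To unpack a downloaded run before reading it with our software, a standard tool like tar or Python's tarfile module is sufficient; the file name "12345.tar.gz" in the sketch below is hypothetical, use the name of the run you actually downloaded.

import tarfile

# Unpack one run into a directory named after the run.
# "12345.tar.gz" is a placeholder for the actual file name.
with tarfile.open("12345.tar.gz", "r:gz") as tar:
    print(tar.getnames())
    tar.extractall("12345")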

Zarr Zip Files

The Zarr storages contain the same data as the tarballs mentioned above but in a format that can be easily read in Python, with Zarr being the only dependency.

The data of each run is provided as a zip file that can directly be read by Zarr:

import zarr
data = zarr.open_group("12345.zip", mode="r")
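
To get an overview of what a run contains, you can list the arrays and sub-groups of the opened group:

# Print the names of the top-level arrays and sub-groups of the run.
print(list(data.array_keys()))
print(list(data.group_keys()))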

Metadata such as the robot name, the goal and the metrics are stored as attributes:

print("Timestamp:", data.attrs["timestamps"])
print("Robot:", data.attrs["robot_name"])
print("Goal:", data.attrs["goal"])
print("Metrics:", data.attrs["metrics"])

Camera calibration parameters are stored in dedicated arrays, for example camera_matrices.

The first axis of each of these arrays corresponds to the three cameras "camera60", "camera180" and "camera300", in this order:

camera_matrix_180 = data.camera_matrices[1]

Robot data:

The arrays time_index and timestamp contain the time indices and timestamps of all robot steps.

Robot observations, desired actions, applied actions and status messages are organised in sub-groups with arrays for the different fields. The arrays are all aligned, containing one entry per time step. Example:

i = 42
print("Applied torque at t = {}:  {}".format(
    data.time_index[i],
    data.applied_action.torque[i],
))

Camera data:

Important

The cameras run at a lower frequency than the robot, so the arrays of robot and camera observations are not aligned! Instead, use the additional array map_robot_to_camera_index to map from the index of a robot-related array to the index of a camera-related array. Example:

i_rob = 42
i_cam = data.map_robot_to_camera_index[i_rob]
print("Object position at robot time index t = {}:  {}".format(
    data.time_index[i_rob],
    data.object_pose.position[i_cam],
))

The array image_timestamps contains the time stamps of the camera observations.

The array images contains the images from the cameras. The images of the three cameras are merged on the second axis. They are provided in raw (Bayer) format to save space, so they need to be debayered first (e.g. with OpenCV):

import cv2

i_cam = 23
raw_image_cam300 = data.images[i_cam][2]  # index 2 on the camera axis corresponds to "camera300"

bgr_image = cv2.cvtColor(raw_image_cam300, cv2.COLOR_BAYER_BG2BGR)
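
Building on the snippet above, the following sketch converts and saves the images of all three cameras for one time step (the output file names are arbitrary):

for cam_idx, cam_name in enumerate(["camera60", "camera180", "camera300"]):
    # The second axis of `images` is the camera index (see above).
    raw = data.images[i_cam][cam_idx]
    bgr = cv2.cvtColor(raw, cv2.COLOR_BAYER_BG2BGR)
    cv2.imwrite("{}_{:04d}.png".format(cam_name, i_cam), bgr)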

The raw and filtered object poses are provided in the sub-groups object_pose and filtered_object_pose, which contain the arrays position, orientation (quaternion in (x, y, z, w) order) and confidence. These arrays are aligned with the image arrays.
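
As a short example, the pose estimates belonging to a given robot time step can be accessed via the index mapping introduced above:

i_rob = 42
i_cam = data.map_robot_to_camera_index[i_rob]
print("Position:   ", data.object_pose.position[i_cam])
print("Orientation:", data.object_pose.orientation[i_cam])  # (x, y, z, w)
print("Confidence: ", data.object_pose.confidence[i_cam])
print("Filtered:   ", data.filtered_object_pose.position[i_cam])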

Acknowledgements

The challenge would not have been possible without the help of numerous people who worked on building the robots, implementing the software, setting up the infrastructure, handling administrative tasks, and much more. For a list of the people involved, see the "Organizers" tab on https://real-robot-challenge.com/2020.

And of course the challenge would have been meaningless without its participants who put a lot of work into developing policies and sending jobs to the robots. See the "Results" tab on https://real-robot-challenge.com/2020 for the final leaderboards as well as links to the source code and reports of the winning teams.

License

The dataset is provided under the Creative Commons BY-NC-SA 4.0 license.

Citation

If you are using the dataset in your academic work, please consider citing the corresponding paper:

@misc{bauer2021robot,
      title={A Robot Cluster for Reproducible Research in Dexterous Manipulation}, 
      author={Stefan Bauer and Felix Widmaier and Manuel Wüthrich and Niklas Funk and Julen Urain De Jesus and Jan Peters and Joe Watson and Claire Chen and Krishnan Srinivasan and Junwu Zhang and Jeffrey Zhang and Matthew R. Walter and Rishabh Madan and Charles Schaff and Takahiro Maeda and Takuma Yoneda and Denis Yarats and Arthur Allshire and Ethan K. Gordon and Tapomayukh Bhattacharjee and Siddhartha S. Srinivasa and Animesh Garg and Annika Buchholz and Sebastian Stark and Thomas Steinbrenner and Joel Akpo and Shruti Joshi and Vaibhav Agrawal and Bernhard Schölkopf},
      year={2021},
      eprint={2109.10957},
      archivePrefix={arXiv},
      primaryClass={cs.RO}
}

References

  1. S. Bauer et al. "A Robot Cluster for Reproducible Research in Dexterous Manipulation". arXiv:2109.10957. PDF on arXiv.
  2. M. Wüthrich, F. Widmaier, F. Grimminger, J. Akpo, S. Joshi, V. Agrawal, B. Hammoud, M. Khadiv, M. Bogdanovic, V. Berenz, J. Viereck, M. Naveau, L. Righetti, B. Schölkopf and S. Bauer. "TriFinger: An Open-Source Robot for Learning Dexterity". In: Conference on Robot Learning (CoRL), 2020. PDF on arXiv.
  3. N. Funk, C. Schaff, R. Madan, T. Yoneda, J. De Jesus, J. Watson, E. Gordon, F. Widmaier, S. Bauer, S. Srinivasa, T. Bhattacharjee, M. Walter and J. Peters. "Benchmarking Structured Policies and Policy Optimization for Real-World Dexterous Object Manipulation". PDF on arXiv.