Real Robot Challenge 2020 Dataset

Manuel Wüthrich*1, Felix Widmaier*1, Stefan Bauer*2, Niklas Funk†3, Julen Urain De Jesus†3, Jan Peters†3, Joe Watson†3, Claire Chen†4, Krishnan Srinivasan†4, Junwu Zhang†4, Jeffrey Zhang†4, Matthew R. Walter†5, Rishabh Madan†9, Charles Schaff†5, Takahiro Maeda†5, Takuma Yoneda†5, Denis Yarats†6, Arthur Allshire†7, Ethan K. Gordon †8, Tapomayukh Bhattacharjee†9, Siddhartha S. Srinivasa†8, Animesh Garg†7, Annika Buchholz1, Sebastian Stark1, Thomas Steinbrenner1, Joel Akpo1, Shruti Joshi1, Vaibhav Agrawal1, Bernhard Schölkopf1

* Equal contribution   † Challenge participant   1 Max Planck Institute for Intelligent Systems   2 KTH Stockholm   3 TU Darmstadt   4 Stanford University   5 TTI Chicago   6 New York University   7 University of Toronto   8 University of Washington   9 Cornell University


The RRC 2020 Dataset contains the recorded data of the Real Robot Challenge 2020.

The dataset consists of the individual runs that were executed by the challenge participants as well as the runs from the weekly evaluations. For each run, it includes the actions sent to the robot, all observations provided by the robot and cameras, and additional information such as the goal that was pursued and the reward that was achieved.

The challenge was split into three phases: phase 1 was carried out in simulation, while phases 2 and 3 were run on the real robots, using a cube and a cuboid as manipulation object, respectively.

The dataset contains 2856 runs from phase 2 and 7422 runs from phase 3. Each run can be downloaded individually, so you do not need to download the full dataset if you are only interested in a specific subset. The compressed size of a single run is around 250 MB on average.

To filter the runs based on various parameters, we provide a SQLite database listing all runs with some meta information and metrics, as well as a Python script to easily run basic queries on this database. Since it is a standard SQLite database, you can also open it with other tools if you want to perform more complex queries.
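
If you prefer to query the database directly, a minimal sketch using Python's built-in sqlite3 module is shown below; the table and column layout is not listed here, so the sketch simply inspects the schema first.

import sqlite3

# Open the index database and print its schema to see which tables
# and columns are available for more complex queries.
con = sqlite3.connect("rrc2020_dataset_index.db")
for name, sql in con.execute(
    "SELECT name, sql FROM sqlite_master WHERE type = 'table'"
):
    print(name)
    print(sql)
con.close()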

See also our paper "A Robot Cluster for Reproducible Research in Dexterous Manipulation" [1] for more details about the challenge and the dataset.

Downloads

The recorded data of the individual runs can be downloaded via the following URL patterns:

Requirements of rrc_dataset_query.py:

You can easily generate scripts that download specific subsets of the dataset. For example, if you are only interested in runs using the cube (phase 2) in which the cube was lifted at least 5 cm:

$ ./rrc_dataset_query.py query rrc2020_dataset_index.db \
    --format "wget -N {url_zarr}" \
    -w challenge_phase = 2 -w max_height ">" 0.05 > download_script.sh

Then execute the generated script to actually download the data:

$ bash ./download_script.sh

Singularity Images

A Singularity image with our software installed (i.e. everything you need to read the log files) can be downloaded here.

Note: In this image, the object is assumed to be a cube. While it is possible to read and view camera logs of phase 3 with this version of the software, the visualised object pose will use the cube model and thus will not match the cuboid that was actually used in that phase.

Old Images of RRC 2020

For legacy support, the images that were used during the RRC 2020 are also still available:

Database Fields

Note that cumulative_reward depends on the distance between the goal and the initial pose of the object, so it is not an ideal metric for comparing runs with different goals. We therefore provide the other metrics as well, to give a better picture of what happened in a run.

(*) For the metrics min_distance_to_goal, max_height and furthest_from_start, there are additional fields suffixed with _10 and _30. While the original field contains the maximum/minimum value over the whole run, these fields contain the 10th/30th largest (or smallest) value over all observations. They serve as a simple filter that rejects short-lived peak values. The numbers refer to camera observations, which are provided at 10 Hz, so, for example, max_height_10 indicates that the object was at least that high for a total duration of around one second (possibly with interruptions).
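
For example, to keep only runs in which the cube was at least 5 cm high for roughly one second in total, you can filter on max_height_10 instead of max_height, using the same query syntax as in the download example above:

$ ./rrc_dataset_query.py query rrc2020_dataset_index.db \
    --format "wget -N {url_zarr}" \
    -w challenge_phase = 2 -w max_height_10 ">" 0.05 > download_script.sh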

Magic Fields

There are two "magic fields" supported by the query script: url_orig and url_zarr. They do not actually exist inside the database but the query script recognises them and replaces them with the download URL to the original or zarr file of the corresponding run (see the example above).

Data Formats

The data is available in two different formats: the original robot log files (gzip-compressed tarballs) and Zarr zip files. Both are described below.

Robot Log Files

The logs of each run are provided as a gzip-compressed tarball which contains the following files:

The robot and camera data files are in a custom binary format. See our software documentation on how to read them.
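
To unpack a downloaded run before reading it with our software, a standard tool like tar or Python's tarfile module is sufficient; the file name "12345.tar.gz" in the sketch below is hypothetical, use the name of the run you actually downloaded.

import tarfile

# Unpack one run into a directory named after the run.
# "12345.tar.gz" is a placeholder for the actual file name.
with tarfile.open("12345.tar.gz", "r:gz") as tar:
    print(tar.getnames())
    tar.extractall("12345")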

Zarr Zip Files

The Zarr storages contain the same data as the tarballs mentioned above but in a format that can be easily read in Python, with Zarr being the only dependency.

The data of each run is provided as a zip file that can directly be read by Zarr:

import zarr
data = zarr.open_group("12345.zip", mode="r")
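
To get an overview of what a run contains, you can list the arrays and sub-groups of the opened group:

# Print the names of the top-level arrays and sub-groups of the run.
print(list(data.array_keys()))
print(list(data.group_keys()))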

Metadata such as the robot name, the goal and the metrics are stored as attributes:

print("Timestamp:", data.attrs["timestamps"])
print("Robot:", data.attrs["robot_name"])
print("Goal:", data.attrs["goal"])
print("Metrics:", data.attrs["metrics"])

Camera calibration parameters are stored in dedicated arrays, for example camera_matrices.

The first axis of each of these arrays corresponds to the three cameras "camera60", "camera180" and "camera300", in this order:

camera_matrix_180 = data.camera_matrices[1]

Robot data:

The arrays time_index and timestamp contain the time indices and timestamps of all robot steps.

Robot observations, desired actions, applied actions and status messages are organised in sub-groups with arrays for the different fields. The arrays are all aligned, containing one entry per time step. Example:

i = 42
print("Applied torque at t = {}:  {}".format(
    data.time_index[i],
    data.applied_action.torque[i],
))

Camera data:

Important

The cameras run at a lower frequency than the robot, so the arrays of robot and camera observations are not aligned! Instead, use the additional array map_robot_to_camera_index to map from the index of a robot-related array to the index of a camera-related array. Example:

i_rob = 42
i_cam = data.map_robot_to_camera_index[i_rob]
print("Object position at robot time index t = {}:  {}".format(
    data.time_index[i_rob],
    data.object_pose.position[i_cam],
))

The array image_timestamps contains the time stamps of the camera observations.

The array images contains the images from the cameras. The images of the three cameras are merged on the second axis. They are provided in raw (Bayer) format to save space, so they need to be debayered first (e.g. with OpenCV):

import cv2

i_cam = 23
raw_image_cam300 = data.images[i_cam][2]  # index 2 on the camera axis corresponds to "camera300"

bgr_image = cv2.cvtColor(raw_image_cam300, cv2.COLOR_BAYER_BG2BGR)
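
Building on the snippet above, the following sketch converts and saves the images of all three cameras for one time step (the output file names are arbitrary):

for cam_idx, cam_name in enumerate(["camera60", "camera180", "camera300"]):
    # The second axis of `images` is the camera index (see above).
    raw = data.images[i_cam][cam_idx]
    bgr = cv2.cvtColor(raw, cv2.COLOR_BAYER_BG2BGR)
    cv2.imwrite("{}_{:04d}.png".format(cam_name, i_cam), bgr)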

The raw and filtered object poses are provided in the sub-groups object_pose and filtered_object_pose, which contain the arrays position, orientation (quaternion in (x, y, z, w) order) and confidence. These arrays are aligned with the image arrays.
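
As a short example, the pose estimates belonging to a given robot time step can be accessed via the index mapping introduced above:

i_rob = 42
i_cam = data.map_robot_to_camera_index[i_rob]
print("Position:   ", data.object_pose.position[i_cam])
print("Orientation:", data.object_pose.orientation[i_cam])  # (x, y, z, w)
print("Confidence: ", data.object_pose.confidence[i_cam])
print("Filtered:   ", data.filtered_object_pose.position[i_cam])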

Acknowledgements

The challenge would not have been possible without the help of numerous people who worked on building the robots, implementing the software, setting up the infrastructure, handling administrative tasks, and much more. For a list of the people involved, see the "Organizers" tab on https://real-robot-challenge.com/2020.

And of course the challenge would have been meaningless without its participants who put a lot of work into developing policies and sending jobs to the robots. See the "Results" tab on https://real-robot-challenge.com/2020 for the final leaderboards as well as links to the source code and reports of the winning teams.

License

The dataset is provided under the Creative Commons BY-NC-SA 4.0 license.

Citation

If you are using the dataset in your academic work, please consider citing the corresponding paper:

@misc{bauer2021robot,
      title={A Robot Cluster for Reproducible Research in Dexterous Manipulation}, 
      author={Stefan Bauer and Felix Widmaier and Manuel Wüthrich and Niklas Funk and Julen Urain De Jesus and Jan Peters and Joe Watson and Claire Chen and Krishnan Srinivasan and Junwu Zhang and Jeffrey Zhang and Matthew R. Walter and Rishabh Madan and Charles Schaff and Takahiro Maeda and Takuma Yoneda and Denis Yarats and Arthur Allshire and Ethan K. Gordon and Tapomayukh Bhattacharjee and Siddhartha S. Srinivasa and Animesh Garg and Annika Buchholz and Sebastian Stark and Thomas Steinbrenner and Joel Akpo and Shruti Joshi and Vaibhav Agrawal and Bernhard Schölkopf},
      year={2021},
      eprint={2109.10957},
      archivePrefix={arXiv},
      primaryClass={cs.RO}
}

References

  1. S. Bauer et al. "A Robot Cluster for Reproducible Research in Dexterous Manipulation". arXiv:2109.10957. PDF on arXiv.
  2. M. Wüthrich, F. Widmaier, F. Grimminger, J. Akpo, S. Joshi, V. Agrawal, B. Hammoud, M. Khadiv, M. Bogdanovic, V. Berenz, J. Viereck, M. Naveau, L. Righetti, B. Schölkopf and S. Bauer. "TriFinger: An Open-Source Robot for Learning Dexterity". In: Conference on Robot Learning (CoRL), 2020. PDF on arXiv.
  3. N. Funk, C. Schaff, R. Madan, T. Yoneda, J. De Jesus, J. Watson, E. Gordon, F. Widmaier, S. Bauer, S. Srinivasa, T. Bhattacharjee, M. Walter and J. Peters. "Benchmarking Structured Policies and Policy Optimization for Real-World Dexterous Object Manipulation". PDF on arXiv.