This challenge has ended!
This documentation is only for the Real Robot Challenge 2020, which has ended. Subsequent challenges have their own documentation; see the challenge website for more information.
Details of the Tasks (Phase 3)¶
The task in this phase is to move a cuboid from a random initial position on the ground to some goal position. There are four levels of increasing difficulty, all of which need to be solved:
Level 1: The goal is randomly sampled somewhere on the table. Orientation is not considered. For this level it is not necessary to lift the cube, so it can be solved by pushing.

Level 2: The cube has to be lifted to the fixed goal position (0, 0, 0.0825). Orientation is not considered.

Level 3: The goal is randomly sampled somewhere within the arena at a height of up to 10 cm. Orientation is not considered.

Level 4: Like level 3, but in addition to the position, a goal orientation is sampled uniformly.
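The per-level goal sampling can be sketched as follows. This is an illustration only: the workspace constants (arena radius, cuboid half-width) are assumed placeholder values, and the real sampling is implemented in trifinger_simulation.tasks.move_cube.

```python
import math
import random

# Assumed constants for illustration only -- the actual values are
# defined in trifinger_simulation.tasks.move_cube.
ARENA_RADIUS = 0.19       # assumed arena radius in metres
MAX_HEIGHT = 0.1          # level 3 samples heights up to 10 cm
CUBE_HALF_WIDTH = 0.0325  # assumed half-width of the cuboid


def sample_goal(level):
    """Sketch of the per-level goal sampling described above."""
    # Sample an x/y position uniformly within the circular arena.
    angle = random.uniform(0.0, 2.0 * math.pi)
    radius = ARENA_RADIUS * math.sqrt(random.random())
    x, y = radius * math.cos(angle), radius * math.sin(angle)

    if level == 2:
        # Fixed goal position, identity orientation.
        return (0.0, 0.0, 0.0825), (0, 0, 0, 1)

    if level == 1:
        z = CUBE_HALF_WIDTH  # on the ground: cube centre sits above it
    else:
        z = random.uniform(CUBE_HALF_WIDTH, MAX_HEIGHT)

    if level == 4:
        # Uniformly random unit quaternion (Shoemake's method).
        u1, u2, u3 = (random.random() for _ in range(3))
        q = (math.sqrt(1 - u1) * math.sin(2 * math.pi * u2),
             math.sqrt(1 - u1) * math.cos(2 * math.pi * u2),
             math.sqrt(u1) * math.sin(2 * math.pi * u3),
             math.sqrt(u1) * math.cos(2 * math.pi * u3))
    else:
        q = (0, 0, 0, 1)  # identity quaternion for levels 1-3
    return (x, y, z), q
```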
To keep the interface the same for all levels, the goal is always given as a
pose consisting of position and orientation. However, the orientation is
only considered by the reward function for level 4. For levels 1-3 it is
always set to the identity quaternion (0, 0, 0, 1) and may simply be
ignored.
The goal position is given with respect to the center of the cube. Therefore, it will have a non-zero z-value even when the goal is on the ground.
The episode length is fixed to 120000 steps for all levels. This is also
trifinger_simulation.tasks.move_cube.episode_length, so you
can access the value through this variable in your code.
One step corresponds to ~1 ms, so a full episode takes 2 minutes.
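The relation between steps and wall-clock time is simple arithmetic:

```python
# Episode length as stated above (also available as
# trifinger_simulation.tasks.move_cube.episode_length).
EPISODE_LENGTH = 120_000  # steps
STEP_DURATION_S = 0.001   # one step corresponds to ~1 ms

episode_duration_s = EPISODE_LENGTH * STEP_DURATION_S
print(episode_duration_s / 60)  # -> 2 minutes
```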
The reward function differs for the different levels. For levels 1-3 only the position is considered while for level 4 also the orientation is taken into account.
The reward functions of the different levels are described below. For the exact implementation, see the code of the trifinger_simulation.tasks.move_cube module.
Reward for Levels 1-3¶
For the position error, we use a weighted sum of the Euclidean distance on the
x/y-plane and the absolute distance on the z-axis. Both components are scaled
based on their expected range. Since the z-range is smaller, this means that
the height has a higher weight. The sum is again rescaled so that the total
error lies in the interval [0, 1]:
Input: goal position and actual position

err_xy = ||goal[xy] - actual[xy]||
err_z = |goal[z] - actual[z]|

position_error = (err_xy / arena_diameter + err_z / height_range) / 2

reward = -position_error
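The pseudocode above can be translated into a short Python sketch. The scaling constants here are assumed placeholder values, not the official ones from trifinger_simulation.tasks.move_cube:

```python
import numpy as np

# Assumed scaling constants for illustration; the actual expected ranges
# are defined in trifinger_simulation.tasks.move_cube.
ARENA_DIAMETER = 0.39  # assumed expected range of the x/y error (metres)
HEIGHT_RANGE = 0.1     # assumed expected range of the z error (metres)


def position_reward(goal, actual):
    """Reward for levels 1-3: weighted sum of x/y and z distance."""
    goal, actual = np.asarray(goal, float), np.asarray(actual, float)
    err_xy = np.linalg.norm(goal[:2] - actual[:2])  # Euclidean on x/y-plane
    err_z = abs(goal[2] - actual[2])                # absolute on z-axis
    # Scale each component by its expected range, then rescale to [0, 1].
    position_error = (err_xy / ARENA_DIAMETER + err_z / HEIGHT_RANGE) / 2
    return -position_error
```

Because the z error is divided by a much smaller range than the x/y error, a given height error is penalized more strongly than the same error on the plane.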
Reward for Level 4¶
The position error is computed using the same function as above. Additionally, the orientation error is computed as the angle between the long axes of the cuboid (the y-axis in the object frame) in the actual and the goal pose. Rotation around the long axis is not considered, to compensate for some inaccuracy of our tracking method.
Both components are scaled with their expected range and then summed. The sum
is again rescaled to lie in the interval [0, 1]:
position_error  # same as for levels 1-3

y_goal = Rot_goal * (0, 1, 0)
y_actual = Rot_actual * (0, 1, 0)
orientation_error = angle between y_goal and y_actual

total_error = (position_error + orientation_error / pi) / 2

reward = -total_error
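The orientation term can be sketched in plain NumPy. The quaternion-to-matrix conversion and the helper names are our own for illustration; the official implementation lives in trifinger_simulation.tasks.move_cube:

```python
import numpy as np


def quat_to_matrix(q):
    """Rotation matrix from a unit quaternion (x, y, z, w)."""
    x, y, z, w = q
    return np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - z * w),     2 * (x * z + y * w)],
        [2 * (x * y + z * w),     1 - 2 * (x * x + z * z), 2 * (y * z - x * w)],
        [2 * (x * z - y * w),     2 * (y * z + x * w),     1 - 2 * (x * x + y * y)],
    ])


def orientation_error(q_goal, q_actual):
    """Angle between the long (y) axes of the goal and actual pose."""
    y_axis = np.array([0.0, 1.0, 0.0])
    y_goal = quat_to_matrix(q_goal) @ y_axis
    y_actual = quat_to_matrix(q_actual) @ y_axis
    cos_angle = np.clip(np.dot(y_goal, y_actual), -1.0, 1.0)
    return np.arccos(cos_angle)


def level4_reward(position_error, q_goal, q_actual):
    """Level 4: average of position error and scaled orientation error."""
    # The angle is at most pi, so dividing by pi scales it to [0, 1].
    total_error = (position_error + orientation_error(q_goal, q_actual) / np.pi) / 2
    return -total_error
```

Note that a rotation purely around the long axis maps the y-axis onto itself, so it produces zero orientation error, matching the invariance described above.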
Cumulative Reward of an Episode¶
The cumulative reward, used to evaluate an episode, is computed as the sum of the rewards of each time step of the episode.
Weighted Reward over all Levels¶
The overall score used to compare the submissions of all participants is computed as follows:
For each level a number of goal poses are randomly sampled (from the appropriate distribution for that task, as described above) and for each of them the policy is executed for one episode. For each episode the cumulative reward is computed as described above, and then the median is taken across episodes, which yields the reward for the given level. The overall score is then computed as a weighted sum of these level-wise rewards, where each level is weighted by its number (level 1 has weight 1, level 2 has weight 2, etc.).
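The scoring procedure above can be summarized in a few lines; this is a sketch of the described computation, not the official evaluation code:

```python
import statistics


def overall_score(rewards_per_level):
    """Weighted sum of per-level median cumulative rewards.

    rewards_per_level: dict mapping level number (1-4) to a list of
    cumulative episode rewards collected for that level.
    """
    score = 0.0
    for level, episode_rewards in rewards_per_level.items():
        # Median across episodes gives the representative reward per level,
        level_reward = statistics.median(episode_rewards)
        # weighted by the level number (level 1 -> weight 1, etc.).
        score += level * level_reward
    return score
```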