Attention

**This challenge has ended!**

This documentation is only for the Real Robot Challenge 2020, which has ended. Subsequent challenges have their own documentation; see the challenge website for more information.

# Details of the Tasks

## Difficulty Levels

The task in phase 1 is to move a cube from a random initial position on the ground to a goal position. There are four levels of increasing difficulty, all of which need to be solved:

1. The goal is randomly sampled somewhere on the table. Orientation is not considered. For this level it is not necessary to lift the cube, so it can be solved by pushing.
2. The cube has to be lifted to a fixed goal position `(0, 0, 0.0825)`. Orientation is not considered.
3. The goal is randomly sampled somewhere within the arena with a height of up to 10 cm. Orientation is not considered.
4. Like level 3, but in addition to the position, a goal orientation is sampled uniformly.

Note

To keep the interface the same for all levels, the goal is always given as a pose consisting of position and orientation. However, the orientation is only considered by the reward function for level 4. For levels 1-3 it is always set to the identity quaternion `(0, 0, 0, 1)` and may simply be ignored.

Note

The goal position is given with respect to the center of the cube. Therefore, it will have a non-zero z-value even when the goal is on the ground.

## Episode Length

The episode length is fixed to 3750 steps for all levels. This is also specified in `rrc_simulation.tasks.move_cube.episode_length`, so you can access the value through this variable in your code.

One step corresponds to 0.004 seconds in the simulation, so a full episode corresponds to 15 seconds.
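Putting these numbers together is a one-liner; a minimal sketch (the constant names here are illustrative — only `episode_length` is stated to exist in `rrc_simulation.tasks.move_cube`):

```python
# Illustrative constants; in the challenge package the episode length is
# available as rrc_simulation.tasks.move_cube.episode_length.
EPISODE_LENGTH = 3750   # steps per episode (all levels)
STEP_DURATION = 0.004   # simulated seconds per step

# Total simulated time per episode: 3750 * 0.004 s = 15 s.
episode_seconds = EPISODE_LENGTH * STEP_DURATION
```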

## Reward Functions

The reward function differs for the different levels. For levels 1-3 only the position is considered while for level 4 also the orientation is taken into account.

The reward functions of the different levels are described below. For the exact implementation, see `rrc_simulation.tasks.move_cube.evaluate_state()`.

### Reward for Levels 1-3

For the position error, we use a weighted sum of the Euclidean distance in the x/y-plane and the absolute distance along the z-axis. Both components are scaled based on their expected range. Since the z-range is smaller, this means the height gets a higher weight. The sum is then rescaled so that the total error lies in the interval `[0, 1]`.

```
Input: goal position and actual position
err_xy = ||goal[xy] - actual[xy]||
err_z = |goal[z] - actual[z]|
position_error = (err_xy / arena_diameter + err_z / height_range) / 2
reward = -position_error
```
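The pseudocode above can be sketched in Python as follows. The scaling constants are assumptions for illustration (the actual values live in `rrc_simulation.tasks.move_cube`), and `position_reward` is a hypothetical name, not the package's API:

```python
import numpy as np

# Assumed ranges for illustration; the actual constants are defined in
# rrc_simulation.tasks.move_cube.
ARENA_DIAMETER = 0.39   # assumed arena diameter in metres
HEIGHT_RANGE = 0.1      # assumed maximum goal height in metres

def position_reward(goal, actual):
    """Negative, range-scaled position error (levels 1-3 sketch)."""
    goal = np.asarray(goal, dtype=float)
    actual = np.asarray(actual, dtype=float)
    err_xy = np.linalg.norm(goal[:2] - actual[:2])  # Euclidean distance in x/y-plane
    err_z = abs(goal[2] - actual[2])                # absolute distance along z
    # Each component is scaled by its expected range; averaging the two
    # scaled terms keeps the total error in [0, 1].
    position_error = (err_xy / ARENA_DIAMETER + err_z / HEIGHT_RANGE) / 2
    return -position_error
```

Since the z-term is divided by a much smaller range than the x/y-term, the same absolute error in height costs more reward than the same error on the table plane.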

### Reward for Level 4

The position error is computed using the same function as for levels 1-3. In addition, the orientation error is computed as the magnitude of the rotation from the actual orientation to the goal orientation.

Both components are scaled by their expected range and then summed. The sum is again rescaled so that the total error lies in the interval `[0, 1]`.

```
position_error same as for levels 1-3
err_rot = rotation from actual to goal orientation (as quaternion)
orientation_error = 2 * atan2(||err_rot[x,y,z]||, |err_rot[w]|)
total_error = (position_error + orientation_error / pi) / 2
reward = -total_error
```
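The orientation term can be sketched as follows, using the same `(x, y, z, w)` quaternion convention as the identity `(0, 0, 0, 1)` mentioned above. The function names are hypothetical; the authoritative implementation is `rrc_simulation.tasks.move_cube.evaluate_state()`:

```python
import math
import numpy as np

def quat_mult(q1, q2):
    """Hamilton product of unit quaternions given as (x, y, z, w)."""
    x1, y1, z1, w1 = q1
    x2, y2, z2, w2 = q2
    return np.array([
        w1 * x2 + x1 * w2 + y1 * z2 - z1 * y2,
        w1 * y2 + y1 * w2 + z1 * x2 - x1 * z2,
        w1 * z2 + z1 * w2 + x1 * y2 - y1 * x2,
        w1 * w2 - x1 * x2 - y1 * y2 - z1 * z2,
    ])

def orientation_error(goal_quat, actual_quat):
    """Rotation magnitude from actual to goal, scaled to [0, 1]."""
    # The error rotation taking `actual` to `goal` is goal * actual^-1;
    # for a unit quaternion the inverse is the conjugate.
    x, y, z, w = actual_quat
    actual_inv = np.array([-x, -y, -z, w])
    err_rot = quat_mult(np.asarray(goal_quat, dtype=float), actual_inv)
    # Rotation angle of err_rot, in [0, pi]; taking |w| picks the
    # shorter of the two equivalent rotations.
    angle = 2 * math.atan2(np.linalg.norm(err_rot[:3]), abs(err_rot[3]))
    return angle / math.pi
```

Dividing by pi scales the angle to `[0, 1]`, so the orientation term and the position term contribute on the same scale before they are averaged.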

Note

In phase 3 of the challenge the reward function for level 4 is slightly different; see Phase 3: Moving a long cuboid.

### Accumulated Reward of an Episode

The accumulated reward, used to evaluate an episode, is computed as the sum of the rewards of each time step of the episode.

### Weighted Reward over all Levels

The overall score used to compare the submissions of all participants is computed as follows:

For each level 10 random pairs of initial and goal pose are generated (from the appropriate distribution for that task, as described above) and for each of them the policy is executed for one episode. For each episode the accumulated reward is computed as described above, and then the mean is taken across episodes, which yields an average reward for the given level. The overall score is then computed as a weighted sum of these level-wise rewards, where each level is weighted by its number (level 1 has weight 1, level 2 has weight 2, etc.).
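As a sketch of this scoring scheme (the reward numbers below are made up, and `overall_score` is a hypothetical helper, not part of the challenge package):

```python
import statistics

def overall_score(episode_rewards):
    """Weighted sum of per-level mean accumulated rewards.

    `episode_rewards` maps level number -> list of accumulated rewards,
    one entry per episode; each level is weighted by its number.
    """
    return sum(
        level * statistics.mean(rewards)
        for level, rewards in episode_rewards.items()
    )

# Made-up accumulated rewards for 10 episodes per level:
rewards = {
    1: [-120.0] * 10,
    2: [-200.0] * 10,
    3: [-250.0] * 10,
    4: [-300.0] * 10,
}
# 1*(-120) + 2*(-200) + 3*(-250) + 4*(-300) = -2470
score = overall_score(rewards)
```

Because the weights grow with the level number, a given improvement on level 4 moves the overall score four times as much as the same improvement on level 1.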