********************
Details of the Tasks
********************


Difficulty Levels
=================

The task in phase 1 is to move a cube from a random initial position on the
ground to some goal position.  There are four levels of increasing difficulty
which all need to be solved:

1. The goal is randomly sampled somewhere on the table.  Orientation is not
   considered.  The cube does not need to be lifted for this level, so it can
   be solved by pushing.
2. The cube has to be lifted to a fixed goal position ``(0, 0, 0.0825)``.
   Orientation is not considered.
3. The goal is randomly sampled somewhere within the arena with a height of up
   to 10 cm.  Orientation is not considered.
4. Like level 3, but in addition to the position, a goal orientation is sampled
   uniformly.


.. note::

    To keep the interface the same for all levels, the goal is always given as a
    pose consisting of position and orientation.  However, the orientation is
    only considered by the reward function for level 4.  For levels 1-3 it is
    always set to an identity quaternion ``(0, 0, 0, 1)`` and may simply be
    ignored.

.. note::

    The goal position is given with respect to the center of the cube.
    Therefore, it will have a non-zero z-value even when the goal is on the
    ground.
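
The goal convention from the notes above can be illustrated with a minimal
sketch.  The ``GoalPose`` container and the helper ``orientation_matters`` are
hypothetical names used only for this illustration; they are not part of
``rrc_simulation``.  The quaternion is assumed to be ordered ``(x, y, z, w)``,
matching the identity quaternion given above.

.. code-block:: python

    from typing import NamedTuple, Tuple


    class GoalPose(NamedTuple):
        """Hypothetical container for a goal pose (illustration only)."""

        position: Tuple[float, float, float]
        orientation: Tuple[float, float, float, float]  # assumed (x, y, z, w)


    # Identity quaternion used as goal orientation for levels 1-3.
    IDENTITY_QUATERNION = (0.0, 0.0, 0.0, 1.0)

    # Fixed goal of level 2.  The position refers to the cube center.
    LEVEL_2_GOAL = GoalPose((0.0, 0.0, 0.0825), IDENTITY_QUATERNION)


    def orientation_matters(level: int) -> bool:
        """Return True if the reward of the given level uses the orientation."""
        if level not in (1, 2, 3, 4):
            raise ValueError("level must be 1, 2, 3 or 4")
        return level == 4

A policy for levels 1-3 can therefore read ``goal.position`` and ignore
``goal.orientation`` entirely.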


Episode Length
==============

The episode length is fixed to 3750 steps for all levels.  This is also
specified in :data:`rrc_simulation.tasks.move_cube.episode_length` so you can
access the value through this variable in your code.

One step corresponds to 0.004 seconds in the simulation, so a full episode
corresponds to 15 seconds.
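
The relation between steps and simulated time can be checked with a short
sketch.  The constants below simply repeat the numbers stated above and the
helper ``steps_for_duration`` is a hypothetical name; in real code the episode
length should be read from ``rrc_simulation.tasks.move_cube.episode_length``
instead of being hard-coded.

.. code-block:: python

    import math

    # Values copied from the text above (illustration only).
    EPISODE_LENGTH = 3750  # steps per episode
    STEP_DURATION_S = 0.004  # simulated seconds per step

    # Total simulated time of one episode, approximately 15 seconds.
    episode_duration_s = EPISODE_LENGTH * STEP_DURATION_S


    def steps_for_duration(seconds: float) -> int:
        """Convert a simulated duration to a number of steps (rounded)."""
        return round(seconds / STEP_DURATION_S)


    # Floating-point arithmetic is inexact, so compare with a tolerance.
    assert math.isclose(episode_duration_s, 15.0)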


.. _reward_functions:

Reward Functions
================

The reward function differs between the levels.  For levels 1-3 only the
position is considered, while for level 4 the orientation is also taken into
account.

The reward functions of the different levels are described below.  For the exact
implementation see :func:`rrc_simulation.tasks.move_cube.evaluate_state`.


.. _reward_functions_1_to_3:

Reward for Levels 1-3
---------------------

For the position error, we use a weighted sum of the Euclidean distance on the
x/y-plane and the absolute distance on the z-axis.  Both components are scaled
based on their expected range.  Since the z-range is smaller, this means that
the height has a higher weight.  The sum is again rescaled so that the total
error is in the interval ``[0, 1]``.

::

    Input: goal position and actual position

    err_xy = ||goal[xy] - actual[xy]||
    err_z  = |goal[z] - actual[z]|

    position_error = (err_xy / arena_diameter + err_z / height_range) / 2

    reward = -position_error
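
The pseudocode above could be written in Python roughly as follows.  This is
only a sketch, not the implementation used for evaluation (see
:func:`rrc_simulation.tasks.move_cube.evaluate_state` for that).  The function
names are hypothetical, and ``arena_diameter`` and ``height_range`` are passed
as arguments because their exact values are not stated in this section.

.. code-block:: python

    import math


    def position_error(goal, actual, arena_diameter, height_range):
        """Scaled position error between two (x, y, z) positions.

        arena_diameter and height_range are the expected ranges of the
        x/y- and z-distance.  Their values are assumptions of the caller.
        """
        # Euclidean distance in the x/y-plane.
        err_xy = math.hypot(goal[0] - actual[0], goal[1] - actual[1])
        # Absolute distance along the z-axis.
        err_z = abs(goal[2] - actual[2])
        # Scale both components by their range and average them.
        return (err_xy / arena_diameter + err_z / height_range) / 2


    def reward_levels_1_to_3(goal, actual, arena_diameter, height_range):
        """Reward for levels 1-3: the negative position error."""
        return -position_error(goal, actual, arena_diameter, height_range)

For example, with the purely illustrative ranges ``arena_diameter=0.4`` and
``height_range=0.1``, a cube that is 5 cm away from the goal in the x/y-plane
and 5 cm below it yields an error of ``(0.125 + 0.5) / 2 = 0.3125``.  The
z-component contributes four times as much as the x/y-component here, which
shows the higher weight of the height.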


Reward for Level 4
------------------

The position error is computed using the same function as above.  In addition,
the orientation error is computed as the magnitude of the rotation from the
actual orientation to the goal orientation.

Both components are scaled with their expected range and then summed.  The sum
is again rescaled to be in the interval ``[0, 1]``.

::

    position_error same as for levels 1-3

    err_rot = rotation from actual to goal orientation (as quaternion)
    orientation_error = 2 * atan2(||err_rot[x,y,z]||, |err_rot[w]|)

    total_error = (position_error + orientation_error / pi) / 2

    reward = -total_error
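
A pure-Python sketch of the orientation part is given below.  It assumes unit
quaternions ordered ``(x, y, z, w)`` and computes the error rotation as
``goal * actual^-1``.  All function names are hypothetical, and the actual
implementation in ``rrc_simulation`` may differ in its details.

.. code-block:: python

    import math


    def quat_conjugate(q):
        """Conjugate of q = (x, y, z, w); the inverse for unit quaternions."""
        x, y, z, w = q
        return (-x, -y, -z, w)


    def quat_multiply(a, b):
        """Hamilton product a * b of two (x, y, z, w) quaternions."""
        ax, ay, az, aw = a
        bx, by, bz, bw = b
        return (
            aw * bx + ax * bw + ay * bz - az * by,
            aw * by - ax * bz + ay * bw + az * bx,
            aw * bz + ax * by - ay * bx + az * bw,
            aw * bw - ax * bx - ay * by - az * bz,
        )


    def orientation_error(goal_quat, actual_quat):
        """Rotation angle in [0, pi] between actual and goal orientation."""
        x, y, z, w = quat_multiply(goal_quat, quat_conjugate(actual_quat))
        # abs(w) makes q and -q (the same rotation) give the same error.
        return 2 * math.atan2(math.sqrt(x * x + y * y + z * z), abs(w))


    def total_error(position_error, goal_quat, actual_quat):
        """Average of position error and orientation error scaled by pi."""
        rot_error = orientation_error(goal_quat, actual_quat)
        return (position_error + rot_error / math.pi) / 2

As a sanity check, a goal that is rotated by 180 degrees about the z-axis
relative to the actual orientation gives an orientation error of ``pi``.  With
a position error of zero, the total error is then 0.5 and the reward is -0.5.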


.. note::

    In phase 3 of the challenge the reward function for level 4 is a bit
    different, see :doc:`../robot_phase/phase3`.


Accumulated Reward of an Episode
--------------------------------

The accumulated reward, used to evaluate an episode, is computed as the sum of
the rewards of each time step of the episode.


Weighted Reward over all Levels
-------------------------------

The overall score used to compare the submissions of all participants is
computed as follows:

For each level, 10 random pairs of initial and goal pose are generated (from
the appropriate distribution for that level, as described above) and the policy
is executed for one episode on each of them.  For each episode the accumulated
reward is computed as described above, and then the mean is taken across
episodes, which yields an average reward for the given level.  The overall score
is then computed as a weighted sum of these level-wise rewards, where each level
is weighted by its number (level 1 has weight 1, level 2 has weight 2, etc.).
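
The scoring procedure can be summarised in a few lines of Python.  This is a
sketch under the assumption that the per-step rewards of every episode are
already available; the function names and the dictionary layout are
hypothetical and not part of the official evaluation code.

.. code-block:: python

    from statistics import mean


    def accumulated_reward(step_rewards):
        """Sum of the rewards of all time steps of one episode."""
        return sum(step_rewards)


    def overall_score(episode_rewards_per_level):
        """Weighted sum of the mean accumulated reward of each level.

        episode_rewards_per_level maps the level number (1-4) to the list
        of accumulated rewards of the episodes run for that level.
        """
        return sum(
            level * mean(rewards)
            for level, rewards in episode_rewards_per_level.items()
        )

Since all rewards are negative or zero, a score closer to zero is better.
Because of the weighting, an improvement of the mean reward on level 4 changes
the score four times as much as the same improvement on level 1.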


API of Task-Related Functions
=============================

See :doc:`api_move_cube`.