******************************
Details of the Tasks (Phase 3)
******************************


Difficulty Levels
=================

The task in phase 3 is to move a cuboid from a random initial position on the
ground to some goal position.  There are four levels of increasing difficulty,
all of which need to be solved:

1. The goal is randomly sampled somewhere on the table.  Orientation is not
   considered.  For this level it is not necessary to lift the cube so it can be
   solved by pushing.
2. The cube has to be lifted to a fixed goal position ``(0, 0, 0.0825)``.
   Orientation is not considered.
3. The goal is randomly sampled somewhere within the arena with a height of up
   to 10 cm.  Orientation is not considered.
4. Like level 3, but in addition to the position a goal orientation is sampled
   uniformly.


.. note::

    To keep the interface the same for all levels, the goal is always given as a
    pose consisting of position and orientation.  However, the orientation is
    only considered by the reward function for level 4.  For levels 1-3 it is
    always set to an identity quaternion ``(0, 0, 0, 1)`` and may simply be
    ignored.

.. note::

    The goal position is given with respect to the center of the cube.
    Therefore, it will have a non-zero z-value even when the goal is on the
    ground.


Episode Length
==============

The episode length is fixed to 120000 steps for all levels.  This is also
specified in :data:`trifinger_simulation.tasks.move_cube.episode_length`, so you
can access the value through this variable in your code.

One step corresponds to ~1 ms, so a full episode takes 2 minutes.
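
The arithmetic behind this duration can be checked with a few lines of Python.
This is only a sketch: the constant names are hypothetical and the step
duration of 1 ms is the approximate value stated above, not an exact timing
guarantee.

.. code-block:: python

    # Hypothetical names; the values are taken from the text above.
    EPISODE_LENGTH_STEPS = 120000
    STEP_DURATION_S = 0.001  # approximate: one step is ~1 ms


    def episode_duration_s(steps=EPISODE_LENGTH_STEPS,
                           step_duration_s=STEP_DURATION_S):
        """Return the approximate wall-clock duration of an episode in seconds."""
        return steps * step_duration_s


    duration_s = episode_duration_s()
    duration_min = duration_s / 60  # roughly 2 minutes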


Reward Functions
================

The reward function differs for the different levels.  For levels 1-3 only the
position is considered while for level 4 also the orientation is taken into
account.

The reward functions of the different levels are described below.  For the exact
implementation see :func:`trifinger_simulation.tasks.move_cube.evaluate_state`.


Reward for Levels 1-3
---------------------

For the position error, we use a weighted sum of the Euclidean distance on the
x/y-plane and the absolute distance on the z-axis.  Both components are scaled
based on their expected range.  Since the z-range is smaller, this means that
the height has a higher weight.  The sum is again rescaled so that the total
error is in the interval ``[0, 1]``.

::

    Input: goal position and actual position

    err_xy = ||goal[xy] - actual[xy]||
    err_z  = |goal[z] - actual[z]|

    position_error = (err_xy / arena_diameter + err_z / height_range) / 2

    reward = -position_error
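
The pseudocode above could be written in plain Python roughly as follows.  This
is a minimal sketch, not the implementation used for evaluation: the function
names are hypothetical, and ``arena_diameter`` and ``height_range`` are passed
in as parameters because their actual values are defined in the ``move_cube``
module.  The numbers in the usage example are illustrative assumptions only.

.. code-block:: python

    import math


    def position_error(goal, actual, arena_diameter, height_range):
        """Scaled position error between two (x, y, z) positions."""
        # Euclidean distance in the x/y-plane
        err_xy = math.hypot(goal[0] - actual[0], goal[1] - actual[1])
        # absolute distance along the z-axis
        err_z = abs(goal[2] - actual[2])
        # scale each component by its expected range, then average
        return (err_xy / arena_diameter + err_z / height_range) / 2


    def position_reward(goal, actual, arena_diameter, height_range):
        """Reward for levels 1-3: the negative position error."""
        return -position_error(goal, actual, arena_diameter, height_range)


    # Illustrative values (assumptions, not the constants used by move_cube).
    ARENA_DIAMETER = 0.4
    HEIGHT_RANGE = 0.1

    example_reward = position_reward(
        goal=(0.0, 0.0, 0.0825),
        actual=(0.1, 0.0, 0.0325),
        arena_diameter=ARENA_DIAMETER,
        height_range=HEIGHT_RANGE,
    )

With these illustrative ranges, a height error of 5 cm contributes twice as
much to the total as a planar error of 10 cm, which reflects the higher weight
of the height mentioned above.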


Reward for Level 4
------------------

The position error is computed using the same function as above.  In addition,
the orientation error is computed as the angle between the long axis of the
cuboid (y-axis in the object frame) in the actual pose and in the goal pose.
The rotation around the long axis is not considered, to compensate for some
inaccuracy of our tracking method.

Both components are scaled with their expected range and then summed.  The sum
is again rescaled to be in the interval ``[0, 1]``.

::

    position_error same as for levels 1-3

    y_goal = Rot_goal * (0, 1, 0)
    y_actual = Rot_actual * (0, 1, 0)
    orientation_error = angle between y_goal and y_actual

    total_error = (position_error + orientation_error / pi) / 2

    reward = -total_error
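
The orientation part of the pseudocode above can be sketched in plain Python as
shown below.  The sketch assumes unit quaternions in ``(x, y, z, w)`` order,
matching the identity quaternion ``(0, 0, 0, 1)`` mentioned earlier.  All
function names are hypothetical, and the position error is passed in as a
number rather than recomputed.

.. code-block:: python

    import math


    def _cross(a, b):
        """Cross product of two 3D vectors."""
        return (
            a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0],
        )


    def rotate(quat, vec):
        """Rotate vec by the unit quaternion quat = (x, y, z, w)."""
        x, y, z, w = quat
        q = (x, y, z)
        t = tuple(2.0 * c for c in _cross(q, vec))
        qt = _cross(q, t)
        return tuple(v + w * ti + qti for v, ti, qti in zip(vec, t, qt))


    def orientation_error(goal_quat, actual_quat):
        """Angle (in radians) between the long axes of goal and actual pose."""
        y_goal = rotate(goal_quat, (0.0, 1.0, 0.0))
        y_actual = rotate(actual_quat, (0.0, 1.0, 0.0))
        dot = sum(g * a for g, a in zip(y_goal, y_actual))
        # clamp to guard against rounding slightly outside [-1, 1]
        return math.acos(max(-1.0, min(1.0, dot)))


    def total_error(position_error, goal_quat, actual_quat):
        """Combined error for level 4, scaled to the interval [0, 1]."""
        return (position_error
                + orientation_error(goal_quat, actual_quat) / math.pi) / 2


    identity = (0.0, 0.0, 0.0, 1.0)
    s = math.sqrt(0.5)
    quarter_turn_z = (0.0, 0.0, s, s)  # 90 degrees around the z-axis
    quarter_turn_y = (0.0, s, 0.0, s)  # 90 degrees around the long axis

    err_z_turn = orientation_error(identity, quarter_turn_z)
    err_y_turn = orientation_error(identity, quarter_turn_y)

A rotation around the long axis leaves the y-axis unchanged, so
``err_y_turn`` is zero, which illustrates why that rotation does not affect
the reward.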


Cumulative Reward of an Episode
-------------------------------

The cumulative reward, used to evaluate an episode, is computed as the sum of
the rewards of each time step of the episode.
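
As a minimal illustration, with a hypothetical function name and made-up
per-step rewards:

.. code-block:: python

    def cumulative_reward(step_rewards):
        """Sum the per-step rewards of one episode."""
        return sum(step_rewards)


    # Made-up rewards of a three-step episode, for illustration only.
    episode_return = cumulative_reward([-0.5, -0.25, -0.25])

Since every per-step reward is at most zero, the cumulative reward is also at
most zero, and values closer to zero are better.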


Weighted Reward over all Levels
-------------------------------

The overall score used to compare the submissions of all participants is
computed as follows:

For each level a number of goal poses are randomly sampled (from the
appropriate distribution for that level, as described above) and for each of
them the policy is executed for one episode.  For each episode the cumulative
reward is computed as described above, and then the median is taken across
episodes, which yields a single reward for the given level.  The overall score
is then computed as a weighted sum of these level-wise rewards, where each
level is weighted by its number (level 1 has weight 1, level 2 has weight 2,
etc.).
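
The aggregation described above might look as follows in Python.  This is a
sketch based only on the description in this section, not the actual
evaluation script: the function name and the input format (a dictionary that
maps each level number to the list of cumulative episode rewards) are
assumptions, and the numbers are made up.

.. code-block:: python

    from statistics import median


    def overall_score(returns_per_level):
        """Weighted sum of the per-level median cumulative rewards.

        returns_per_level maps the level number (1-4) to a list with the
        cumulative reward of each episode run for that level.
        """
        return sum(
            level * median(returns)
            for level, returns in returns_per_level.items()
        )


    # Made-up cumulative rewards, three episodes per level.
    example_returns = {
        1: [-10.0, -20.0, -30.0],
        2: [-5.0, -15.0, -10.0],
        3: [-40.0, -20.0, -30.0],
        4: [-50.0, -60.0, -70.0],
    }
    score = overall_score(example_returns)

Because of the weights, an improvement in level 4 changes the overall score
four times as much as the same improvement in level 1.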


API of Task-Related Functions
=============================

The ``move_cube`` module implements functions to sample and validate goals and
compute the reward of a single state for each of the four levels.  It is still
called "cube" there for compatibility, despite the object being a cuboid now.

These functions are used in our evaluation scripts but you may also use them in
your code, e.g. to sample more goals or to compute the reward of a specific pair
of achieved and goal state.


.. autodata:: trifinger_simulation.tasks.move_cube.episode_length

.. autoclass:: trifinger_simulation.tasks.move_cube.Pose

.. autofunction:: trifinger_simulation.tasks.move_cube.sample_goal

.. autofunction:: trifinger_simulation.tasks.move_cube.validate_goal

.. autofunction:: trifinger_simulation.tasks.move_cube.evaluate_state