******************************
Details of the Tasks (Phase 3)
******************************

Difficulty Levels
=================

The task in phase 3 is to move a cuboid from a random initial position on the
ground to a goal position. There are four levels of increasing difficulty, all
of which need to be solved:

1. The goal is randomly sampled somewhere on the table. Orientation is not
   considered. For this level it is not necessary to lift the cuboid, so it can
   be solved by pushing.
2. The cuboid has to be lifted to a fixed goal position ``(0, 0, 0.0825)``.
   Orientation is not considered.
3. The goal is randomly sampled somewhere within the arena with a height of up
   to 10 cm. Orientation is not considered.
4. Like level 3, but in addition to the position, a goal orientation is sampled
   uniformly.

.. note::

   To keep the interface the same for all levels, the goal is always given as
   a pose consisting of position and orientation. However, the orientation is
   only considered by the reward function for level 4. For levels 1-3 it is
   always set to the identity quaternion ``(0, 0, 0, 1)`` and may simply be
   ignored.

.. note::

   The goal position is given with respect to the center of the cuboid.
   Therefore, it has a non-zero z-value even when the goal is on the ground.


Episode Length
==============

The episode length is fixed to 120000 steps for all levels. This value is also
stored in :data:`trifinger_simulation.tasks.move_cube.episode_length`, so you
can access it through this variable in your code. One step corresponds to
~1 ms, so a full episode takes 2 minutes.


Reward Functions
================

The reward function differs between levels. For levels 1-3 only the position
is considered, while for level 4 the orientation is taken into account as
well. The reward functions of the different levels are described below. For
the exact implementation see
:func:`trifinger_simulation.tasks.move_cube.evaluate_state`.
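
The reward formulas of the following subsections can be sketched in plain
Python as shown below. This is an illustration under stated assumptions, not
the implementation of
:func:`~trifinger_simulation.tasks.move_cube.evaluate_state`: the values of
``ARENA_DIAMETER`` and ``HEIGHT_RANGE`` are hypothetical placeholders (the text
names these ranges but does not give their values), quaternions are assumed to
be in ``(x, y, z, w)`` order (consistent with the identity ``(0, 0, 0, 1)``
mentioned above), and all function names are made up for this example.

.. code-block:: python

   import math

   # Hypothetical normalisation constants. The text names "arena_diameter" and
   # "height_range" but does not state their values; these are placeholders.
   ARENA_DIAMETER = 0.39  # metres (assumed)
   HEIGHT_RANGE = 0.1  # metres (assumed)


   def position_error(goal_pos, actual_pos):
       """Position error for levels 1-3, in [0, 1] for errors within the ranges."""
       err_xy = math.hypot(goal_pos[0] - actual_pos[0], goal_pos[1] - actual_pos[1])
       err_z = abs(goal_pos[2] - actual_pos[2])
       return (err_xy / ARENA_DIAMETER + err_z / HEIGHT_RANGE) / 2


   def long_axis(quat):
       """Rotate the object-frame y-axis (0, 1, 0) by a unit quaternion (x, y, z, w)."""
       x, y, z, w = quat
       # This is the second column of the rotation matrix of the quaternion.
       return (2 * (x * y - z * w), 1 - 2 * (x * x + z * z), 2 * (y * z + x * w))


   def orientation_error(goal_quat, actual_quat):
       """Angle in [0, pi] between the long (y) axes of the goal and actual pose."""
       dot = sum(g * a for g, a in zip(long_axis(goal_quat), long_axis(actual_quat)))
       # Clamp to [-1, 1] so rounding errors cannot make acos fail.
       return math.acos(max(-1.0, min(1.0, dot)))


   def reward(level, goal_pos, goal_quat, actual_pos, actual_quat):
       """Per-step reward: negative position error (levels 1-3) or total error (level 4)."""
       pos_err = position_error(goal_pos, actual_pos)
       if level in (1, 2, 3):
           return -pos_err  # orientation is ignored
       if level == 4:
           ori_err = orientation_error(goal_quat, actual_quat)
           return -(pos_err + ori_err / math.pi) / 2
       raise ValueError(f"unknown level: {level}")

For example, with the goal ``(0, 0, 0.0825)`` and an actual position
``(0, 0, 0.0325)``, ``reward(2, ...)`` gives about -0.25 under the assumed
``HEIGHT_RANGE``. A rotation of 180 degrees about the long axis, i.e. the
quaternion ``(0, 1, 0, 0)``, yields an orientation error of 0, which reflects
that rotations about the long axis are ignored.
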

Reward for Levels 1-3
---------------------

For the position error, we use a weighted sum of the Euclidean distance in the
x/y-plane and the absolute distance along the z-axis. Both components are
scaled by their expected range. Since the z-range is smaller than the x/y-range,
the height error effectively has a higher weight. The sum is then rescaled so
that the total error lies in the interval ``[0, 1]``.

::

    Input: goal position and actual position

    err_xy = ||goal[xy] - actual[xy]||
    err_z = |goal[z] - actual[z]|

    position_error = (err_xy / arena_diameter + err_z / height_range) / 2

    reward = -position_error


Reward for Level 4
------------------

The position error is computed using the same function as above. Furthermore,
the orientation error is computed as the angle between the long axes of the
cuboid (the y-axis in the object frame) in the actual and the goal pose. The
rotation about the long axis is not considered, to compensate for some
inaccuracy of our tracking method. Both components are scaled by their expected
range and then summed. The sum is again rescaled to lie in the interval
``[0, 1]``.

::

    position_error: same as for levels 1-3

    y_goal = Rot_goal * (0, 1, 0)
    y_actual = Rot_actual * (0, 1, 0)
    orientation_error = angle between y_goal and y_actual

    total_error = (position_error + orientation_error / pi) / 2

    reward = -total_error


Cumulative Reward of an Episode
-------------------------------

The cumulative reward, used to evaluate an episode, is computed as the sum of
the rewards of all time steps of the episode.


Weighted Reward over all Levels
-------------------------------

The overall score used to compare the submissions of all participants is
computed as follows: For each level, a number of goal poses are randomly
sampled (from the appropriate distribution for that level, as described above)
and for each of them the policy is executed for one episode.
For each episode, the cumulative reward is computed as described above, and
then the median is taken across episodes, which yields a single reward for the
given level. The overall score is then computed as a weighted sum of these
level-wise rewards, where each level is weighted by its number (level 1 has
weight 1, level 2 has weight 2, etc.).


API of Task-Related Functions
=============================

The ``move_cube`` module implements functions to sample and validate goals and
to compute the reward of a single state for each of the four levels. For
compatibility, the module is still called "cube", even though the object is now
a cuboid.

These functions are used in our evaluation scripts, but you may also use them
in your code, e.g. to sample more goals or to compute the reward of a specific
pair of achieved and goal state.

.. autodata:: trifinger_simulation.tasks.move_cube.episode_length

.. autoclass:: trifinger_simulation.tasks.move_cube.Pose

.. autofunction:: trifinger_simulation.tasks.move_cube.sample_goal

.. autofunction:: trifinger_simulation.tasks.move_cube.validate_goal

.. autofunction:: trifinger_simulation.tasks.move_cube.evaluate_state
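
The scoring procedure described under "Weighted Reward over all Levels" can be
sketched in a few lines of plain Python. The sketch assumes that the per-step
rewards of every evaluated episode are already available as lists of floats,
grouped by level; the function names are hypothetical and are not part of
``move_cube`` or of the evaluation scripts.

.. code-block:: python

   import statistics


   def cumulative_reward(step_rewards):
       """Cumulative reward of one episode: the sum of its per-step rewards."""
       return sum(step_rewards)


   def level_score(episodes):
       """Median of the cumulative rewards of all episodes run for one level."""
       return statistics.median([cumulative_reward(ep) for ep in episodes])


   def overall_score(episodes_per_level):
       """Sum of the level scores, each weighted by its level number (1 to 4)."""
       return sum(
           level * level_score(episodes)
           for level, episodes in episodes_per_level.items()
       )


   # Toy data: two levels with made-up per-step rewards (not real results).
   episodes_per_level = {
       1: [[-0.5, -0.5], [-1.0, 0.0], [-0.2, -0.2]],  # cumulative: -1.0, -1.0, -0.4
       2: [[-1.0], [-3.0]],  # cumulative: -1.0, -3.0
   }
   print(overall_score(episodes_per_level))  # -5.0

Real episodes have 120000 steps each; the short lists only keep the arithmetic
easy to follow. One effect of taking the median rather than the mean is that a
few outlier episodes have little influence on a level's score.
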