PS has been shown to perform well in advanced robotics applications, specifically in learning complex haptic manipulation skills. In a so-called playing phase, PS autonomously learns to select the best sensing action and the best preparatory skill for a given perceptual state. Watch how PS learns to generalize complex skills demonstrated by kinesthetic teaching.

You can find more details in the following publication and on the website of Intelligent and Interactive Systems.
S. Hangl, E. Ugur, S. Szedmak, and J. Piater. Robotic playing for hierarchical complex skill learning. In Proc. IEEE/RSJ Int. Conf. Intell. Robots Syst., pp. 2799–2804, 2016.


Grid world

In this classic benchmark task, the agent must navigate a maze to reach a goal. This is challenging because the reward is delayed — that is, the agent must make a long sequence of correct choices before reaching the goal. Watch how quickly a PS agent finds its way after being trained:
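For readers who want to experiment, here is a minimal sketch of such an agent: a two-layer PS model whose edge "glow" spreads the delayed goal reward back over the steps that led there. All names (`PSAgent`, `run_episode`) and parameter values are our own illustrative choices, not the implementation behind the demo.

```python
import random

class PSAgent:
    """Minimal two-layer Projective Simulation agent with edge glow."""

    def __init__(self, actions, gamma=0.001, eta=0.2):
        self.actions = actions
        self.gamma = gamma   # forgetting: h-values slowly decay back toward 1
        self.eta = eta       # glow damping: earlier steps get less credit
        self.h = {}          # hopping values h(percept, action)
        self.g = {}          # glow values g(percept, action)

    def act(self, percept):
        # Sample an action with probability proportional to its h-value.
        hs = [self.h.get((percept, a), 1.0) for a in self.actions]
        r = random.random() * sum(hs)
        choice = self.actions[-1]
        for a, hv in zip(self.actions, hs):
            r -= hv
            if r <= 0:
                choice = a
                break
        # Dampen old glows, then mark the edge just traversed.
        for k in self.g:
            self.g[k] *= 1.0 - self.eta
        self.g[(percept, choice)] = 1.0
        return choice

    def learn(self, reward):
        # Credit every glowing edge in proportion to its glow, then
        # start the next episode with fresh glow.
        for k, glow in self.g.items():
            hv = self.h.get(k, 1.0)
            self.h[k] = hv - self.gamma * (hv - 1.0) + reward * glow
        self.g.clear()


def run_episode(agent, size=4, max_steps=50):
    """One maze episode: start in one corner, reward only at the far corner."""
    moves = {"up": (0, 1), "down": (0, -1), "left": (-1, 0), "right": (1, 0)}
    x = y = 0
    for step in range(1, max_steps + 1):
        dx, dy = moves[agent.act((x, y))]
        x = min(max(x + dx, 0), size - 1)
        y = min(max(y + dy, 0), size - 1)
        if (x, y) == (size - 1, size - 1):
            agent.learn(1.0)   # the delayed reward arrives only here
            return step
    agent.learn(0.0)
    return max_steps
```

Training amounts to calling `run_episode` repeatedly; as the rewarded edges accumulate h-value, the sampled paths get shorter.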


Mountain car

This is another popular reinforcement learning problem. The agent steers a car, which can accelerate to the left or to the right, and must drive it up the mountain. However, the car's engine is not strong enough to climb directly, so the agent has to swing back and forth to build momentum, a solution that is not easy to discover. Here is what a PS agent learned to do:
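The physics can be made concrete with the standard textbook formulation of the task (engine term 0.001, gravity term 0.0025, goal at x ≥ 0.5). The two hand-written policies below are our own illustration of why momentum is needed; they are not the PS agent from the video.

```python
import math

def step(x, v, a):
    """Classic mountain-car dynamics: throttle a in {-1, 0, +1}."""
    v = min(max(v + 0.001 * a - 0.0025 * math.cos(3 * x), -0.07), 0.07)
    x = min(max(x + v, -1.2), 0.6)
    if x == -1.2:
        v = 0.0   # inelastic collision with the left wall
    return x, v

def steps_to_goal(policy, max_steps=1000):
    """How many steps a policy needs to reach x >= 0.5, else None."""
    x, v = -0.5, 0.0   # start at rest near the valley floor
    for t in range(1, max_steps + 1):
        x, v = step(x, v, policy(x, v))
        if x >= 0.5:
            return t
    return None

def full_throttle(x, v):
    return 1                    # always push right

def pumping(x, v):
    return 1 if v >= 0 else -1  # push in the direction of motion

print(steps_to_goal(full_throttle))   # None: the engine alone is too weak
print(steps_to_goal(pumping))         # succeeds by building momentum
```

Full throttle stalls in a bounded oscillation well below the goal, while pushing along the direction of motion pumps energy into the swing until the car escapes, which is exactly the behaviour the agent has to discover.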


Collective motion

PS agents are also used to study collective motion. In this task, agents were rewarded for staying close to their neighbours in a cohesive swarm, and had to learn how to respond to their neighbours' movements in order to achieve this. After some training, they learn to stay together most of the time. The curve on the right tracks the rewards they earn for this.
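As a much-reduced illustration of this kind of reward scheme, here is a two-agent toy on a line using the basic PS update (no glow, since the reward is immediate). The setup, names, and parameters are our own sketch, not those of the actual study.

```python
import random

def ps_choose(h, percept, actions):
    """Sample an action with probability proportional to its h-value."""
    hs = [h.get((percept, a), 1.0) for a in actions]
    r = random.random() * sum(hs)
    for a, hv in zip(actions, hs):
        r -= hv
        if r <= 0:
            return a
    return actions[-1]

def train(trials=2000, reward=1.0):
    """Two agents on a line; each perceives where the other is and steps
    left (-1) or right (+1); both are rewarded when they end up close."""
    h1, h2 = {}, {}
    x1, x2 = 0, 10
    for _ in range(trials):
        p1 = 1 if x2 > x1 else -1   # percept: direction of the neighbour
        p2 = 1 if x1 > x2 else -1
        a1 = ps_choose(h1, p1, [-1, 1])
        a2 = ps_choose(h2, p2, [-1, 1])
        x1, x2 = x1 + a1, x2 + a2
        if abs(x1 - x2) <= 2:       # cohesion reward: they stayed close
            for h, p, a in ((h1, p1, a1), (h2, p2, a2)):
                h[(p, a)] = h.get((p, a), 1.0) + reward
    return h1, h2, abs(x1 - x2)
```

Because moves toward the neighbour are rewarded more often than moves away, the toward-neighbour edges accumulate more h-value, and the two agents come to spend most of their time within the cohesion distance.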