Parameter Space Noise for Exploration

Matthias Plappert, OpenAI

May 15, 2018

Abstract

All decision-making problems fundamentally aim at optimizing a score that measures success. Very often, however, the outcome of a decision does not depend immediately on an action, making the score non-differentiable with respect to that action. Broadly speaking, the field of reinforcement learning (RL) studies a particular way to overcome this problem. Matthias Plappert, a KIT graduate who wrote his Master's thesis at OpenAI and has since joined them full-time, will first introduce us to the field of deep reinforcement learning and then dive straight into very current research questions.

One part of Matthias' current research concerns an intrinsic problem in RL: how agents explore the action space to find new solutions while still making use of learned behavior (exploitation). Deep reinforcement learning methods generally engage in exploratory behavior through noise injection in the action space. An alternative is to add noise directly to the agent's parameters, which can lead to more consistent exploration and a richer set of behaviors. Methods such as evolutionary strategies use parameter perturbations, but discard all temporal structure in the process and require significantly more samples. Combining parameter noise with traditional RL methods offers the best of both worlds. In his latest work at OpenAI, Matthias was able to demonstrate that both off- and on-policy methods benefit from this approach, through an experimental comparison of DQN, DDPG, and TRPO on high-dimensional discrete action environments as well as continuous control tasks.
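To make the distinction concrete, here is a minimal sketch (not taken from the paper) contrasting the two kinds of noise for a toy linear policy. The dimensions, noise scales, and the idea of resampling the parameter perturbation once per episode are illustrative assumptions, not the paper's exact setup.

```python
# Minimal sketch: action-space noise vs. parameter-space noise for exploration.
# Toy linear policy, numpy only; all sizes and noise scales are illustrative.
import numpy as np

rng = np.random.default_rng(0)
obs_dim, act_dim = 4, 2
theta = rng.normal(size=(act_dim, obs_dim))  # policy parameters

def policy(obs, params):
    # Deterministic linear policy: action = params @ obs
    return params @ obs

def act_with_action_noise(obs, sigma=0.2):
    # Action-space noise: perturb the chosen action independently at every step,
    # so the same observation can yield a different action each time it is seen.
    return policy(obs, theta) + sigma * rng.normal(size=act_dim)

def perturbed_params(sigma=0.1):
    # Parameter-space noise: perturb the parameters once (e.g. per episode) and
    # act with the perturbed policy, giving state-dependent, temporally
    # consistent exploration.
    return theta + sigma * rng.normal(size=theta.shape)

obs = rng.normal(size=obs_dim)
theta_tilde = perturbed_params()
print("action noise:    ", act_with_action_noise(obs))
print("parameter noise: ", policy(obs, theta_tilde))
```

The key difference is that the parameter-noise policy remains a deterministic function of the state for the duration of the perturbation, whereas action-space noise re-randomizes at every step.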

Besides a general introduction to RL, Matthias' talk will present the major findings of his aforementioned recent paper, which will be presented at ICLR shortly. Here's a link to a preprint: openreview

More of his work, including an extension to Keras specifically for RL, can be found here: matthiasplappert.com

Slides