Learning to act in complex environments remains among the most challenging tasks facing the machine learning and computer vision communities. In this setting, rewards are typically sparse and action selection is non-differentiable. The most prominent technique for handling these difficulties is reinforcement learning, which aims to maximize an expected reward, typically over millions of episodes. An alternative that has received less attention is the family of evolutionary algorithms. As the name suggests, such algorithms breed different versions of a model and select the most successful ones according to a fitness function.
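To make the breed-and-select idea concrete, here is a minimal sketch of a generic evolutionary loop. It is not taken from the talks or from lychee.js: the fitness function, the genome encoding as a list of floats, and all names and parameter values (`evolve`, `elite`, `sigma`, and so on) are assumptions chosen purely for illustration.

```python
import random


def fitness(genome):
    # Hypothetical fitness (an assumption): higher is better,
    # and the optimum is a genome whose genes are all 0.5.
    return -sum((g - 0.5) ** 2 for g in genome)


def crossover(a, b, rng):
    # Uniform crossover: each gene comes from one of the two parents.
    return [x if rng.random() < 0.5 else y for x, y in zip(a, b)]


def mutate(genome, rng, sigma=0.1):
    # Gaussian mutation: perturb every gene slightly.
    return [g + rng.gauss(0.0, sigma) for g in genome]


def evolve(pop_size=20, genome_len=4, generations=50, elite=5, seed=0):
    rng = random.Random(seed)
    population = [[rng.uniform(-1.0, 1.0) for _ in range(genome_len)]
                  for _ in range(pop_size)]
    history = []  # best fitness seen at the start of each generation
    for _ in range(generations):
        # Selection: rank by fitness and keep the fittest unchanged.
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))
        parents = population[:elite]
        # Breeding: fill the rest of the population with mutated offspring.
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(pop_size - elite)
        ]
        population = parents + children
    population.sort(key=fitness, reverse=True)
    history.append(fitness(population[0]))
    return population[0], history
```

Calling `evolve()` returns the fittest genome and the best fitness per generation. Because the fittest individuals are carried over unchanged, the best fitness never decreases from one generation to the next. In a game-playing setting the genome would typically encode a bot's parameters and the fitness would come from a simulated game, but that mapping is not shown here.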
Christoph Martens, who goes by Cookie Engineer, employs such techniques to code software bots. He runs a company called artificial.engineering, where he develops the application engine lychee.js with "the goal to deliver Total Automation through Artificial Intelligence". Check out some of his demos showcasing evolutionary AI applications here:
Flappy Plane: Demo
Pong: Demo
The first talk of our two-part series on evolutionary AI introduces the basics behind the algorithms. The concepts of adaptive neural networks (ANNs) and genetic programming are presented. We learn how to parallelize simulation runs and when to use Bayesian learning for strategic objective systems. NEAT (NeuroEvolution of Augmenting Topologies), a method that evolves neural network structures by means of a genetic algorithm, is also discussed.
Slides: Part I
This time we'll get our hands dirty and code an evolutionary AI simulation that parallelizes the reinforced Pong demo. It will be a practical workshop in which we build everything up step by step and look at the implementation in code. If we have enough time, we'll also try to build some of the concepts behind NEAT into the demo.
While the last talk developed the theory of the human brain as an approximately Bayesian prediction machine from a neuroscientific perspective, this time we see how the resulting optimization objective can be used directly to train a deep agent to solve the mountain car problem. We also discuss possible advantages over "classic" reinforcement learning approaches. Along the way, the machine learning tools used, such as variational auto-encoders and evolution strategies for optimizing non-differentiable objectives, will be explained.
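As a rough illustration of how evolution strategies can optimize an objective that offers no useful gradient, the sketch below uses one common variant: Gaussian perturbations of the parameters, weighted by their normalized returns. It is not the method from the talk. The objective `episode_return` is a hypothetical piecewise-constant stand-in for a simulated episode, not the mountain car problem, and every name and hyperparameter is an assumption.

```python
import math
import random


def episode_return(theta):
    # Hypothetical stand-in for a simulated episode (an assumption).
    # It is piecewise constant in theta, so its gradient is zero
    # almost everywhere and plain gradient ascent gets no signal.
    return -sum(abs(math.floor(4 * t) / 4 - 1.0) for t in theta)


def evolution_strategy(dim=3, iterations=100, samples=20,
                       sigma=0.3, lr=0.1, seed=0):
    rng = random.Random(seed)
    theta = [0.0] * dim
    initial = episode_return(theta)
    best_theta, best_score = list(theta), initial
    for _ in range(iterations):
        # Sample Gaussian perturbations and evaluate each perturbed candidate.
        noise = [[rng.gauss(0.0, 1.0) for _ in range(dim)]
                 for _ in range(samples)]
        returns = [
            episode_return([t + sigma * e for t, e in zip(theta, eps)])
            for eps in noise
        ]
        mean = sum(returns) / samples
        std = math.sqrt(sum((r - mean) ** 2 for r in returns) / samples)
        if std == 0.0:
            continue  # all candidates scored the same: no search direction
        # Move theta toward perturbations with above-average returns.
        for j in range(dim):
            step = sum((r - mean) / std * eps[j]
                       for r, eps in zip(returns, noise)) / (samples * sigma)
            theta[j] += lr * step
        score = episode_return(theta)
        if score > best_score:
            best_theta, best_score = list(theta), score
    return best_theta, best_score, initial
```

`evolution_strategy()` returns the best parameters found, their score, and the starting score for comparison. Only evaluations of `episode_return` are needed, never its derivative, which is what makes this family of methods applicable to non-differentiable objectives.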
Slides: Part II