Fighting Overfitting in Deep Reinforcement Learning

Namaste!

Travelling is always an opportunity for me to reflect, study, and relax. On the road to Frankfurt I decided to come back to a company called OpenAI. OpenAI is a non-profit focused on AI research. You might remember it as one of the many endeavours that Elon Musk sponsors. It is very renowned in the Deep Reinforcement Learning community, partly because of OpenAI Gym. Gym is a large pool of environments and a quasi-standard for environment interfaces.

One of their articles caught my eye today: Quantifying Generalization in Reinforcement Learning. You will remember that overfitting is a well-known issue in Deep Learning and traditional Machine Learning. Overfitting can be roughly translated to: the degree to which your model learns the training data by heart. A high degree of overfitting corresponds to a lack of generalization. Your model simply does not work well with data it has not seen yet. Usually, this results in your model working poorly, or not at all, in production.

Deep Learning and Deep Reinforcement Learning have a lot in common.

Deep Reinforcement Learning takes the data-driven approach of Deep Learning and lifts it to the goal-driven level. It moves away from dealing with data and puts the emphasis on creating agents that act, that act well, and that act out optimal strategies. It helps to think of a Deep Reinforcement Learning agent as an artificial entity that learns to play a computer game. Although computer games do not look that serious at first glance, they are ideal test-beds for AI. An approach that works well on a computer game has a good chance of working well in, say, data-centre cooling control. With a little training of course. And this is just one example.

Deep Reinforcement Learning suffers from overfitting as well. Here the lack of generalization shows up as the inability of an agent to perform well in scenarios it has not seen yet. In computer games, it is easy to create an agent that learns to play a fixed set of levels. But this agent could, and most probably will, fail on levels it has never played. This matters most once you try to put it into production.

An experiment: How many levels do you need to avoid overfitting?

OpenAI did some excellent experiments and published the results in their article. They introduce a novel environment called CoinRun. This new environment has a striking resemblance to classic platformer games. The goal is to make your way through a level and collect the single coin that lies at its end. The most interesting aspect of CoinRun is: All levels are procedurally generated.

Keeping in mind the game's ability to generate an unlimited number of levels, the researchers ran experiments with several fixed subsets of levels. As expected, there was a huge degree of overfitting when training on only 4K levels. "Only" is a challenging word because 4K levels are already quite a lot. Especially for human players. Surprisingly, overfitting was still visible with 16K training levels. And yes, the best agents were those trained on an unrestricted set of levels.
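
To make the setup concrete, here is a minimal sketch of how such an experiment could be organized: draw a fixed set of level seeds for training, keep a disjoint set for testing, and compare the agent's average return on both. The function names and the idea of identifying a level by an integer seed are my own assumptions for illustration, not the code OpenAI used.

```python
import random


def split_levels(num_train, num_test, seed=0):
    """Draw disjoint sets of level seeds for training and for testing.

    Assumption: every procedurally generated level is identified by an
    integer seed, so a "fixed set of levels" is just a fixed set of seeds.
    """
    rng = random.Random(seed)
    # sample() draws without replacement, so the two sets cannot overlap.
    seeds = rng.sample(range(2**31), num_train + num_test)
    return seeds[:num_train], seeds[num_train:]


def generalization_gap(train_returns, test_returns):
    """Mean return on training levels minus mean return on unseen levels.

    A large positive gap suggests the agent memorized its training levels.
    """
    train_mean = sum(train_returns) / len(train_returns)
    test_mean = sum(test_returns) / len(test_returns)
    return train_mean - test_mean
```

You would train only on the first set of seeds, evaluate on both sets, and watch how the gap shrinks as the number of training levels grows.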

Another experiment: How do you get out of overfitting when your number of levels is fixed to a very small number?

Yes, in some cases it is not feasible, or even possible, to train on an unrestricted number of levels. You will remember a similar problem from Deep Learning. Getting more data is equivalent to getting more levels to train on. Both are sometimes equally impossible.

OpenAI ran several sub-experiments, all focused on the task at hand: How to reduce overfitting? The results were as follows. Dropout and L2 regularization are both strategies for overcoming overfitting. You know both from Deep Learning. It turned out that they worked well, with L2 regularization performing better than dropout. Data augmentation is also a well-known strategy against overfitting. It is about changing your data slightly and randomly in order to increase the size of your data set artificially. Data augmentation worked better than both. Even better was batch normalization, a process that normalizes the activations at specific layers of your underlying neural network, batch by batch. And finally, environmental stochasticity performed best. Stochasticity is very interesting: at the environment level, the agent's chosen action is occasionally replaced with a random one. This trains the agent to cope with uncertainty and thus increases generalization.
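
The stochasticity idea is simple enough to sketch in a few lines. The wrapper below replaces the agent's action with a uniformly random one with probability epsilon. It is an illustration under my own assumptions: the class names are hypothetical, the reset/step interface is only Gym-like, and EchoEnv is a toy stand-in rather than CoinRun. It is not the implementation from the paper.

```python
import random


class EchoEnv:
    """Toy stand-in environment (hypothetical): reports the action it received."""

    def reset(self):
        return 0

    def step(self, action):
        # Gym-like tuple: observation, reward, done, info.
        return action, 0.0, False, {}


class RandomActionOverride:
    """With probability epsilon, swap the agent's action for a random one."""

    def __init__(self, env, num_actions, epsilon=0.1, seed=None):
        self.env = env
        self.num_actions = num_actions
        self.epsilon = epsilon
        self.rng = random.Random(seed)

    def reset(self):
        return self.env.reset()

    def step(self, action):
        # The override happens outside the agent, at the environment level.
        if self.rng.random() < self.epsilon:
            action = self.rng.randrange(self.num_actions)
        return self.env.step(action)
```

Because the override lives in the wrapper, the agent itself does not need to change. It simply experiences a world in which its intended action does not always go through.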

In summary…

In summary, it was exciting for me to see the issue of overfitting being addressed in the field of Deep Reinforcement Learning. The problem is similar to the one in Deep Learning in most aspects, but it differs in the details. Some issues you do inherit from Deep Learning, some are new. This is also true for mitigation strategies. Some you can re-use, some you have to come up with. And finally, please also read their paper. And do not miss the Gym-compatible CoinRun environment.

Stay in touch.

I hope you liked the article. Why not stay in touch? You will find me on LinkedIn, XING and Facebook. Please add me if you like and feel free to like, comment and share my humble contributions to the world of AI. Thank you!

If you want to become a part of my mission of spreading Artificial Intelligence globally, feel free to become one of my Patrons.

A quick note about me. I am a computer scientist with a love for art, music and yoga. I am an Artificial Intelligence expert with a focus on Deep Learning. As a freelancer I offer training, mentoring and prototyping. If you are interested in working with me, let me know. My email address is tristan@ai-guru.de - I am looking forward to talking to you!