Four things you would love to know about Deep Reinforcement Learning

Namaste! As you might have noticed, I am heavily into Deep Reinforcement Learning. It is always good to proceed in a goal-oriented way. Set a couple of goals, follow them until reached, repeat. One of my biggest recent goals was to be able to teach Deep Reinforcement Learning in my future courses. I have now reached that goal. While doing some heavy experiments and research, I came up with a small list of things you would love to know about Deep Reinforcement Learning. And, as always, I am happy to share my thoughts with you.

What is Deep Reinforcement Learning? In one of my recent blog posts, I gave a very technical introduction, providing a simulation and solving a problem. Seen from far enough away, Deep Reinforcement Learning is Reinforcement Learning plus Deep Learning. The first is about training Artificial Intelligences via an action-reward system: positive and negative reinforcement of different behaviours in order to come up with the best one. The second is about designing and training Deep Neural Networks, the topic that has been really, really trending in the last couple of years. For very solid reasons.

Deep Reinforcement Learning is all about Agent Systems Engineering.

What is Artificial Intelligence? When was the last time you discussed that question? It is a non-trivial one. Mainly because Artificial Intelligence is something like an umbrella-term. It is very far-reaching. It even has roots in philosophy. I try to not involve myself in such discussions, because I always fear that I would be asked to discuss an upcoming robot uprising…

There is a quite established group of experts who claim the following: Artificial Intelligence is all about intelligent agents. I tend to support that standpoint. What is an intelligent agent, you might ask? An intelligent agent is an (artificial) entity inside an environment. It can perceive the environment via sensors and interact with it via actuators. Think about robots with cameras and arms. When is an agent intelligent? Well… When it maximises its performance measure. This means that there is some way to assess its performance. For example, miles without a crash in autonomous driving.

Deep Reinforcement Learning is all about training intelligent agents. Those are assumed to be situated in environments, usually connected via an interface. And their task is to maximise their rewards. This is fairly simple. Every time the agent performs an action, the state of the environment changes and the agent gets feedback in the form of a reward. The goal of training is to find a policy, a rule for choosing actions, that yields the highest cumulative reward over time.
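To make the loop of action, new state and reward a bit more tangible, here is a minimal sketch in plain Python. Everything in it is an assumption made for illustration: the `CorridorEnv` class, its reward values and the two policies are hypothetical and not taken from any particular library.

```python
import random


class CorridorEnv:
    """Hypothetical toy environment: cells 0..length-1, goal at the right end."""

    def __init__(self, length=5):
        self.length = length
        self.position = 0

    def reset(self):
        """Put the agent back at the left end and return the start state."""
        self.position = 0
        return self.position

    def step(self, action):
        """Apply an action (-1 = left, +1 = right) and return (state, reward, done)."""
        # Walls clamp the position to the valid range.
        self.position = max(0, min(self.length - 1, self.position + action))
        done = self.position == self.length - 1
        # Assumed reward scheme: small cost per step, bonus for reaching the goal.
        reward = 1.0 if done else -0.1
        return self.position, reward, done


def run_episode(env, policy, max_steps=100):
    """Let the policy act until the episode ends; return the summed reward."""
    state = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = policy(state)                  # agent decides
        state, reward, done = env.step(action)  # environment responds
        total_reward += reward
        if done:
            break
    return total_reward


def random_policy(state):
    """Ignores the state and moves left or right at random."""
    return random.choice([-1, +1])


def always_right(state):
    """Ignores the state and always moves right."""
    return +1


print("random policy:", run_episode(CorridorEnv(), random_policy))
print("always right :", run_episode(CorridorEnv(), always_right))
```

In this toy setting, training would mean searching for a policy that collects more reward than `random_policy` does on average.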

Deep Reinforcement Learning is very different from Deep Learning.

The core component of an agent in Deep Reinforcement Learning is a Deep Neural Network. This network adapts during training in order to solve the problem at hand. One of the most common ways of doing this is called Q-learning: coming up with, and refining over time, a table that tells you which action to take in which state in order to get the highest reward. And the good thing is: Deep Neural Networks are well suited to approximating such tables. Why? Because they do an excellent job of extracting features from the data.
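The idea of refining a table over time can be sketched with plain tabular Q-learning. Note that this is the table-based variant, not the deep one: in Deep Reinforcement Learning a neural network would take the place of the dictionary `q`. The corridor problem, the constants and all function names are assumptions chosen for illustration.

```python
import random

N_STATES = 5        # assumed toy problem: states 0..4, state 4 is the goal
ACTIONS = [-1, +1]  # move left or move right
ALPHA = 0.5         # learning rate
GAMMA = 0.9         # discount factor for future rewards
EPSILON = 0.2       # probability of taking a random exploratory action


def step(state, action):
    """Deterministic corridor dynamics: returns (next_state, reward, done)."""
    next_state = max(0, min(N_STATES - 1, state + action))
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def greedy_action(q, state, rng):
    """Pick the action with the highest table entry, breaking ties at random."""
    best = max(q[(state, a)] for a in ACTIONS)
    return rng.choice([a for a in ACTIONS if q[(state, a)] == best])


def train(episodes=500, seed=0):
    """Fill the Q-table by repeatedly playing the toy problem."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy: mostly exploit the table, sometimes explore.
            if rng.random() < EPSILON:
                action = rng.choice(ACTIONS)
            else:
                action = greedy_action(q, state, rng)
            next_state, reward, done = step(state, action)
            # Q-learning update: move the entry towards
            # reward + discounted value of the best next action.
            best_next = 0.0 if done else max(q[(next_state, a)] for a in ACTIONS)
            target = reward + GAMMA * best_next
            q[(state, action)] += ALPHA * (target - q[(state, action)])
            state = next_state
    return q


q_table = train()
# Read the learned behaviour off the table: best action per non-goal state.
greedy = [max(ACTIONS, key=lambda a: q_table[(s, a)]) for s in range(N_STATES - 1)]
print(greedy)
```

With only five states a dictionary is enough. The table stops being practical once the number of states explodes, which is where a neural network comes in as an approximation.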

In practice, a lot of Deep Learning work revolves around a concept called hyperparameter optimization. Hyperparameters are parameters of your entire solution to a problem. They include, but are not limited to, your Neural Network architecture. They also include, amongst other things, how long your training will take and how your learning algorithm is configured. You, as a Deep Learning expert, will mostly be busy tweaking your hyperparameters in order to get the Neural Network that generalises best on your problem. And what does Deep Reinforcement Learning add to that picture? More hyperparameters and a completely different point of view.
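As a rough illustration of "more hyperparameters", the following sketch puts a typical Deep Learning configuration next to one for Deep Reinforcement Learning. The field names and default values are assumptions picked for the example, not recommendations and not the settings of any specific framework.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class DeepLearningHyperparameters:
    """Hypothetical settings for training a neural network."""
    hidden_layers: tuple = (64, 64)  # network architecture
    learning_rate: float = 1e-3      # configuration of the learning algorithm
    batch_size: int = 32
    epochs: int = 10                 # how long training runs


@dataclass(frozen=True)
class DeepRLHyperparameters(DeepLearningHyperparameters):
    """The same settings plus hypothetical reinforcement-learning ones."""
    discount_factor: float = 0.99    # how much future rewards count
    exploration_rate: float = 0.1    # how often the agent tries random actions
    replay_buffer_size: int = 10_000 # how many past experiences are kept


dl = DeepLearningHyperparameters()
rl = DeepRLHyperparameters(learning_rate=5e-4)

# The hyperparameters that only exist on the reinforcement-learning side.
extra = sorted(set(asdict(rl)) - set(asdict(dl)))
print(extra)
```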

The moment you know Deep Learning, you have one of multiple building blocks for Deep Reinforcement Learning in your hands. But this alone does not help you to understand the complete picture of Deep Reinforcement Learning. There is more to it. Again, the Deep Neural Network is only one component in Deep Reinforcement Learning: a data structure that helps you to find the best actions. How to train such a network is the crucial part. Here you need a lot of knowledge about probability theory and stochastics. Why, you may ask? You must be able to model your problem in a formal way in order to understand and solve it. What do you need the most? Markov Decision Processes.

Deep Reinforcement Learning has a lot to do with Markov Decision Processes.

Markov Decision Processes are a very cool thing. Ask yourself: What is the main task of an agent in an environment? What is it doing all the time in order to find the best action in a given situation? Right, decision making.

Markov Decision Processes are a tool for modelling decision making. You assume that the world consists of a set of states. A state is usually an interpretation of the world as perceived through an agent’s sensors. In each state, the agent has to decide which action to perform next. This decision rule, the policy, is usually represented as a probability distribution over the possible actions. Every time an agent performs an action, it immediately receives two things. First of all, a new state. The agent might have moved in the environment and changed its point of view, which means different sensor data. And second of all, the agent gets a reward. States, actions, transitions between states and rewards: these are exactly the components of a Markov Decision Process.
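The components just listed fit into a few plain Python dictionaries. The sketch below is an assumed toy example: the two states, the two actions and all probabilities and rewards are invented for illustration.

```python
import random

STATES = ["cold", "warm"]   # hypothetical states of the world
ACTIONS = ["heat", "wait"]  # hypothetical actions of the agent

# TRANSITIONS[state][action] -> list of (probability, next_state, reward)
TRANSITIONS = {
    "cold": {
        "heat": [(0.9, "warm", 1.0), (0.1, "cold", -1.0)],
        "wait": [(1.0, "cold", -1.0)],
    },
    "warm": {
        "heat": [(1.0, "warm", 0.5)],
        "wait": [(0.7, "warm", 1.0), (0.3, "cold", -1.0)],
    },
}

# A stochastic policy: POLICY[state][action] -> probability of choosing it.
POLICY = {
    "cold": {"heat": 0.8, "wait": 0.2},
    "warm": {"heat": 0.1, "wait": 0.9},
}


def sample_action(state, rng):
    """Draw an action from the policy's distribution for this state."""
    actions = list(POLICY[state])
    weights = [POLICY[state][a] for a in actions]
    return rng.choices(actions, weights=weights, k=1)[0]


def sample_transition(state, action, rng):
    """Draw the next state and reward for taking `action` in `state`."""
    outcomes = TRANSITIONS[state][action]
    weights = [probability for probability, _, _ in outcomes]
    _, next_state, reward = rng.choices(outcomes, weights=weights, k=1)[0]
    return next_state, reward


rng = random.Random(42)
state = "cold"
trajectory = []
for _ in range(5):
    action = sample_action(state, rng)
    next_state, reward = sample_transition(state, action, rng)
    trajectory.append((state, action, reward, next_state))
    state = next_state

for entry in trajectory:
    print(entry)
```

Each printed tuple is one decision: the state the agent was in, the action it picked, the reward it received and the state it ended up in.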

It has become clear that Deep Neural Networks are a great tool for solving such processes, especially in the case where you have a huge set of possible states. Deep Neural Networks are excellent at learning a feature extraction that allows them to find patterns in what they perceive. Remember the game Go? Mastered by AlphaGo Zero? This is a good thing to have a look at. Go is horrible because it has a huge number of states. Yet Deep Reinforcement Learning has reached superhuman play in it.

Deep Reinforcement Learning is heavily goal oriented.

Deep Reinforcement Learning is awesome because it shows a historical trend in engineering. What do I mean by this riddle? Long story short: After quite some time of algorithm-driven development, progress was fueled by data-oriented approaches. And now goal-oriented approaches are upon us.

Back in the old days, solutions were driven by algorithms. Highly skilled engineers were busy for a long time coming up with algorithms to solve problems. Designing algorithms, combining multiple algorithms… The whole roundtrip. Here, the source code was of primary importance. Data had secondary importance.

This changed some time ago. Especially with the advent of Machine Learning, the perspective shifted heavily towards the data itself. If you remember, Machine Learning is all about finding patterns in data in order to solve problems. How to find those patterns boils down to fitting a Machine Learning algorithm on training data. This is especially true for Deep Learning.

Deep Reinforcement Learning goes even a step further. Algorithms are almost off the table. And you are not really involved in doing tricks with data. Data is something that your agent gets at any time from its sensors. You would rarely touch that. Your main task, on the other hand, is designing a proper reward function and connecting your agent to the environment. Once this is up and running, you just watch your agent exploring and exploiting the environment in order to solve its given task. Fascinating!
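What designing a reward function can look like is sketched below for a hypothetical navigation-style task. The event names, the reward values and the use of a discount factor are assumptions for illustration. Finding good values is exactly the design work described above.

```python
def reward(crashed, reached_goal):
    """Hypothetical reward function: maps what just happened to a number."""
    if crashed:
        return -1.0   # strongly discourage failure
    if reached_goal:
        return 1.0    # encourage solving the task
    return -0.01      # small cost per step so the agent does not dawdle


def discounted_return(rewards, gamma=0.99):
    """Sum up an episode's rewards, counting later rewards a bit less.

    gamma is an assumed discount factor between 0 and 1.
    """
    total = 0.0
    # Walk backwards so each step adds its reward to the discounted future.
    for r in reversed(rewards):
        total = r + gamma * total
    return total


# Two uneventful steps followed by reaching the goal.
episode = [reward(False, False), reward(False, False), reward(False, True)]
print(discounted_return(episode, gamma=0.9))
```

Changing a single number here, for example the step cost, can change what the agent ends up learning, which is why the reward function deserves most of your attention.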

A couple of words at the end.

I strongly believe that Deep Reinforcement Learning is going to be one of the biggest game changers in Artificial Intelligence in the near future. And it requires a completely different mindset. In order to master Deep Reinforcement Learning you have to arrive at a goal-oriented standpoint. You have to model proper rewards in order to solve problems. Data itself is only secondary. And algorithms… Well, not so important.

In the near future, I expect to see more and more people training Intelligent Agents using one of many Deep Reinforcement Learning approaches. And playing Flappy Bird is only one example, which I solved during a very cold weekend… Thanks for reading!

Stay in touch.

I hope you liked the article. Why not stay in touch? You will find me at LinkedIn, XING and Facebook. Please add me if you like and feel free to like, comment and share my humble contributions to the world of AI. Thank you!

If you want to become a part of my mission of spreading Artificial Intelligence globally, feel free to become one of my Patrons.

A quick word about me. I am a computer scientist with a love for art, music and yoga. I am an Artificial Intelligence expert with a focus on Deep Learning. As a freelancer I offer training, mentoring and prototyping. If you are interested in working with me, let me know. My email address is tristan@ai-guru.de - I am looking forward to talking to you!