Deep Reinforcement Learning and DOOM

Namaste! A while ago… In one of my recent blog-posts… I told the story of me attending the Stuttgart.AI meetup. A detail of that story was inspiration. Somehow I got motivated to dive again into the Deep Reinforcement Learning topic. And as you know me… Inspiration leads to code! And code leads to me telling you about it!

Last week I held the third Würzburg Deep Learning Meetup. Flyeralarm was the dear host and we had a lot of fun. Discussing Deep Learning and DOOM.

What is Deep Reinforcement Learning?

Deep Reinforcement Learning is Deep Learning plus Reinforcement Learning. Very simple. The first is training Deep Artificial Neural Networks. And the second is training intelligent systems via rewards and penalties. You know, Deep Learning is often in happy land. The place where you can do Supervised Learning. The place where you have a lot of labeled data. Everything is sunny there… But sometimes you do not have the luxury of labeled data…

With Reinforcement Learning you can create your training data on the fly. Remember AlphaZero? The Artificial Intelligence by Google, that learned to play and master Go? The company behind that is DeepMind. It rose to fame when the people showed that you can use Reinforcement Learning in combination with Deep Learning in order to let an AI learn to master a lot of Atari games. By playing those classics!

But how does Deep Reinforcement Learning work? Let me cut the chase… You start with an untrained Neural Network. And you start with an arbitrary simulation. You assume that the simulation evolves in a discrete manner. That is, step by step. In each step, two things can happen. Either it performing a random action, or it is predicting an action that maximizes the reward. No matter what happens, you train the Neural Network while it is happening.

How does DOOM fit into that picture?

DOOM is great! I come from a world where DOOM was almost forbidden. This is the „beauty“ of Germany. Here we have an organization which decides which media is available to the public an which ones are not. Do not worry. This is not censorship, is it? Anyway… Not deviating… For quite some time you could not buy DOOM easily in stores here.

These days are long gone… DOOM was released in 1993, which makes it fairly old. Over the years it turned into some kind of Hello, world!. This is basically due to the fact that the developers decided to release the source-codes very early. Since then you saw DOOM running on digital cameras and even in/on sportscars.

VizDoom is awesome! It is a DOOM-port that opens the game for AI engineering. It allows for any AI to perceive the world as the human player does – seeing what is on screen. And it allows for interacting with the world as the player does – performing actions like turn left, turn right, shoot straight.

See for yourself:

In Deep Reinforcement Learning, you usually use an exploration rate.

The Deep Reinforcement Learning approach is usually between two extremes: Either performing random actions or performing predicted actions. No matter what, the Neural Network always will always learn from the actions. Here the exploration rate is your magic number. A value of 1.0 means that all actions are random. A value of 0.0 means that all actions are predicted by our Neural Network. Usually you start with a rate of 1.0 and then you decrease it to some minimum value over time. Not really to absolute zero. You want to keep some randomness.

This is the whole charm of Deep Reinforcement Learning. The learning system usually starts exploring the simulation and finds out what happens when random actions are executed. Over time it gains more and more confidence and knowledge. Thus, after a couple of time steps, the Neural Network takes over and owns the place.

Let us have a look at the code. Now!

That is enough about Deep Reinforcement Learning theory. Let us dive headfirst into practice… You know, everytime we discuss some Python source-code, we start with the imports! There you go:

from vizdoom import *
import random
import time
import keras
from keras import models
from keras import layers
from keras import optimizers
import numpy as np
import cv2
from collections import deque

This is everything you need including Keras and NumPy. This is our holy duality. You need Keras for the Deep Learning side of our project. And you need NumPy for the Data Science magic. Now, let us start DOOM. Fortunately ViZDoom makes that very easy for us:

# Create an instance of the Doom game.
game = DoomGame()

Most of the lines are self-explanatory. A DOOM-game is started. It is configured using a config-file. And the screen-format is set to 8bit grayscale, which usually works really well. We are about to train a Neural Network. And that is all about hyper-parameter tweaking. These are the hyper-parameters:

# Hyper parameters.
epochs = 20 # Number of epochs to train.
learning_steps_per_epoch = 2000 # Number of learning steps per epoch.
screen_shape = (30, 45, 1) # The target shape of the screen.
#model_type = "dense" # A fully connected network.
model_type = "conv2d" # A convolutional neural network.

What does that express? Well, we train the Neural Network over 20 epochs. In each epoch we perform 2000 learning steps on the Neural Network. We scale the screen to a size of 30 times 45 pixels, which might be enough for training. And we use a Convolutional Neural Network.

Remember that the simulation evolves by executing actions. In our case we only use four actions: Doing nothing, shooting, moving left and moving right:

# These ere the actions.
action_none = [0, 0, 0]
action_shoot = [0, 0, 1]
action_left = [1, 0, 0]
action_right = [0, 1, 0]
actions = [action_shoot, action_left, action_right]
action_length = len(actions)

Got some problems? Use an agent!

Well, the parameters are set. The next thing would be defining our agent. An agent is an intelligent entity that is situated in an environment. In our case a player in the DOOM world. Here it makes sense to go fully OOP:

class DeepLearningAgent:
    """ This is the Deep Learning agent. """

    def __init__(self, model_type, screen_shape, action_length, actions):
        """ Initializes an instance of the agent. """

        # Set all parameters.
        self.screen_shape = screen_shape # The target screen shape.
        self.action_length = action_length # The length of the actions.
        self.actions = actions # The actions themselves.

        # Set the independent parameters.
        self.memory = deque(maxlen=2000) # Size of memory.
        self.gamma = 0.95    # Discount rate for learning.
        self.epsilon = 1.0  # Exploration rate, 1.0 is random, 0.0 is prediction.
        self.epsilon_min = 0.01 # Minimum the exploration rate cools down to.
        self.epsilon_decay = 0.9995 # Decay of the exploraion rate per step.
        self.learning_rate = 0.001 # Learning rate of the optimizer.

        # Create the model.
        self.model = self.build_model(model_type)

Now comes the good part. We are all interested in the Neural Network. I usually make sure that the model type is a parameter. When it comes to predicting something, most of the time it makes sense to start with a dense network first. If this does not perform well, it might make sense to go fully convolutional:

    def build_model(self, model_type):
        """ Builds the model according to a given type. """

        # Builds a simple fully connected network.
        if model_type == "dense":
            model = models.Sequential()
            model.add(layers.Dense(128, activation='relu'))
            model.add(layers.Dense(32, activation='relu'))
            model.add(layers.Dense(self.action_length, activation='linear'))

        # Builds a convolutional neural network.
        elif model_type == "conv2d":
            model = models.Sequential()
            model.add(layers.Conv2D(8, (6, 6), strides=(3, 3), input_shape=self.screen_shape))
            model.add(layers.Conv2D(8, (3, 3), strides=(2, 2)))
            model.add(layers.Dense(125, activation='relu'))
            model.add(layers.Dense(self.action_length, activation='linear'))

        # Compile and render the model.

        return model

A core functionality is to keep an explicit memory of state-transitions. This is good for the so called replay. The algorithm regularly repeats sequences stored in memory to boost its Neural Network:

    def remember(self, screen, action, reward, next_screen, done):
        """ Stores a state transition into memory. """
        assert screen.shape == self.screen_shape
        assert next_screen.shape == self.screen_shape
        self.memory.append((screen, action, reward, next_screen, done))

Also, we need the functionality of either acting randomly or predicting an action. Note, this code is in dire need of a dynamic exploration rate:

    def act(self, screen):
        """ Yields an action. Either a random or a predicted one. """

        # Return a random action.
        if np.random.rand() <= self.epsilon: 
            return random.choice(self.actions)

        # Predict an action.
            screen = np.expand_dims(screen, axis=0)
            act_values = self.model.predict(screen)
            max_index = np.argmax(act_values[0])
            return self.actions[max_index]

Now comes the replay. It is always good to exercise the Neural Network:

    def replay(self, batch_size):
        """ Replays from memory and trains network. """

        # Train a mini batch.
        mini_batch = random.sample(self.memory, batch_size)

        for screen, action, reward, next_screen, done in mini_batch:
            screen = np.expand_dims(screen, axis=0)
            next_screen = np.expand_dims(next_screen, axis=0)

            target = reward
            if not done:
                prediction = self.model.predict(next_screen)[0]
                target = (reward + self.gamma * np.amax(prediction))
            target_f = self.model.predict(screen)
            target_f[0][action] = target
  , target_f, epochs=1, verbose=0)

        # Let the exploration-rate decay.
        if self.epsilon > self.epsilon_min:
            self.epsilon *= self.epsilon_decay

Of course… Loading and saving is a must:

    def load(self, path):
        """ Loads the model from a path. """

    def save(self, path):
        """ Saves the model to a path. """

Going down the main()-road.

This concludes the Neural-Network-Agent that uses Deep Reinforcement Learning to learn DOOM. How dies training look like? Yes, we need an instance of the agent and then we train it for a couple of epochs:

def main():
    """ This is the main method. """

    # Create the agent with its parameters.
    agent = DeepLearningAgent(model_type, screen_shape, action_length, actions)

    # Do the training.
    batch_size = 32 #  Size of the mini-batches for training
    done = True # Is the game episode done?
    for epoch in range(epochs):

        for _ in range(learning_steps_per_epoch):

            # The game episode is done. Proceed properly.
            if done == True:
                done = False
                state = game.get_state()
                screen = state.screen_buffer
                screen = transform_screen_buffer(screen, screen_shape)

            # Perform one step. Get an action, execute it, get the reward. Get
            # the next state.
            action = agent.act(screen)
            reward = game.make_action(action)
            next_state = game.get_state()

            # Make sure that the reset works.
            if game.is_episode_finished():
                done = True
                next_screen = screen
                next_screen = next_state.screen_buffer
                next_screen = transform_screen_buffer(next_screen, screen_shape)

            # Let the agent remember the trainsition.
            agent.remember(screen, action, reward, next_screen, done)
            screen = next_screen

            # Do the training.
            if len(agent.memory) > batch_size:

We need a little utility function. The screen buffer needs some massaging in order to fit perfectly into our Neural Network:

def transform_screen_buffer(screen_buffer, target_shape):
    """ Transforms the screen buffer for the neural network. """

    # If it is RGB, swap the axes.
    if screen_buffer.ndim == 3:
        screen_buffer = np.swapaxes(screen_buffer, 0, 2)
        screen_buffer = np.swapaxes(screen_buffer, 0, 1)

    # Resize.
    screen_buffer = cv2.resize(screen_buffer, (target_shape[1], target_shape[0]))

    # If it is grayscale, add another dimension.
    if screen_buffer.ndim == 2:
        screen_buffer = np.expand_dims(screen_buffer, axis=2)

    screen_buffer = screen_buffer.astype("float32") / 255.0

    return screen_buffer

Almost done:

if __name__ == "__main__":

Well that is it! A very nice fundament for Deep Reinforcement Learning experiments with DOOM!

Call for your teaching!

As you might know, I am very happy about sharing all of my knowledge to you! It is my pleasure. Today I would like to ask you for your knowledge! You see, the agent above is only a sketch. It is a rudimentary building block. It basically does the trick but needs some more coarse- and fine-tuning! Would you like to join? Check out the Meetup’s GitHub repository!

Stay in touch.

I hope you liked the article. Why not stay in touch? You will find me at LinkedIn, XING and Facebook. Please add me if you like and feel free to like, comment and share my humble contributions to the world of AI. Thank you!

If you want to become a part of my mission of spreading Artificial Intelligence globally, feel free to become one of my Patrons. Become a Patron!

A quick about me. I am a computer scientist with a love for art, music and yoga. I am a Artificial Intelligence expert with a focus on Deep Learning. As a freelancer I offer training, mentoring and prototyping. If you are interested in working with me, let me know. My email-address is - I am looking forward to talking to you!

Subscribe to the AI-Guru mailing list!

* indicates required
Privacy Policy

AI Guru will use the information you provide on this form to be in touch with you and to provide updates and marketing.

By clicking on the subscribe button you accept the Privacy Policy of this website.

We use Mailchimp as our marketing platform. By clicking below to subscribe, you acknowledge that your information will be transferred to Mailchimp for processing. Learn more about Mailchimp's privacy practices here.

Hinterlasse einen Kommentar

Deine E-Mail-Adresse wird nicht veröffentlicht. Erforderliche Felder sind mit * markiert.