ChainerRL Visualizer: Deep RL Agent Visualization Library

Tag

Kei Akita

Engineer

This post is contributed by Mr. Takahiro Ishikawa, who was an intern last year and now works as a part-time engineer at PFN.
We have released ChainerRL Visualizer, which visualizes the behaviors of agents trained by ChainerRL on the Web browser.

Github

My name is Takahiro Ishikawa, who participated in PFN 2018 internship and currently work as a part-time engineer.
This library is developed in the aim of “making debugging of deep RL implementations easier” and “contributing to understanding of how deep RL agents work”.
It enables to interactively observe the behaviors of trained deep RL agents on the Web browser.
This library is easy to use. All you have to do is to pass the `agent` object implemented in ChainerRL and the `env` object that satisfies a specific interface to the `launch visualizer` function provided by this library, along with a few options.

from chainerrl_visualizer import launch_visualizer
# Prepare agent and env object here
#
# Prepare dictionary which explains meanings of each action
ACTION_MEANINGS = {
  0: 'hoge',
  1: 'fuga',
  ...
}
launch_visualizer(
    agent,                           # required
    env,                             # required
    ACTION_MEANINGS,                 # required
    port=5002,                       # optional (default: 5002)
    log_dir='log_space',             # optional (default: 'log_space')
    raw_image_input=False,           # optional (default: False)
    contains_rnn=False,              # optional (default: False)
)

After executing this script, a local Web server will be launched and the following features will be provided on the Web browser.

1. Roll-out one episode (or specified steps)

You can tell the agent to run one episode (or specified steps) from the UI, then the outputs of the agent model will be visualized in chronological order.
In the following video, the probabilities of the next action and the state value of the agent trained with A3C are visualized.

2. Tick timestep and visualize the behaviors of environment and agent

In the following video, the agent can be moved back and forth and the outputs in each step are visualized along with the behavior of the environment. The pie chart on bottom-right shows the probabilities of the next action of each step.

3. Saliency map

If the input of the model is raw pixels, the UI can visualize saliency map, which shows the specific sub-area to which the agent pays attention. This feature is implemented based on the paper Visualizing and Understanding Atari Agents.
In the following video, saliency maps of the agent trained with CategoricalDQN are visualized over the image of the environment.
For now, this feature allows us to specify the number of steps for which saliency maps are created because the computational cost of creating saliency maps is very expensive.

4. Miscellaneous visualizations

Various ways of visualization for each type of agent are supported.
For example, the value distributions of the agent trained with CategoricalDQN are visualized in the following video.

Quickstart guides are provided.

For now, almost all of the visualization tools in deep learning have focused on visualizing scores and other metrics along with the progress of the model training. ChainerRL Visualizer is the original visualizer among them as it can interactively and dynamically visualize the behaviors of the deep RL agent and environment themselves.
Atari Zoo, which was released by Uber Research, is developed for a similar purpose. Atari Zoo aims at accelerating research for understanding deep RL agent, providing trained models, and analyzing tools for the frozen models. It enables researchers to participate in the research for understanding deep RL agent even if they don’t have enough computing resources.
ChainerRL Visualizer is different from Atari Zoo in the sense that “all” kinds of agents in ChainerRL can be dynamically analyzed during training by the visualizer while Atari Zoo is only for visualizing already-trained models in the repository and those models are limited to the ALE environments and specific algorithms where the architecture looks like ( raw image => Conv => Conv .. => FC .. ).
There are also other visualization tools with similar motivations, such as DQNViz that visualizes the behaviors of DQN agents and its various metrics during the training.
Though much effort has been dedicated to improve the performance of deep RL algorithms on benchmark tasks, less effort has been paid for understanding what deep RL agents learn by deep RL algorithm, and analyzing how the trained agents behave. However, the research for understanding deep RL agent as seen in the visualizations above will expand in the future.
ChainerRL Visualizer is now in beta version and still does not have sufficient features for deeply analyzing deep RL agents. So continuous development is needed in order to contribute to the emerging research area of understanding deep RL agent. We welcome you to participate in the development of ChainerRL Visualizer to add new features or to improve existing features through OSS collaboration.

Tag

# Internship