We are excited to announce Maze, a new framework for applied reinforcement learning (RL). Beyond introducing Maze, this blog post also outlines the motivation behind it, what distinguishes it from other RL frameworks, and how Maze can support you when applying RL — and hopefully prevent some headaches along the way.
Maze is a framework for applied reinforcement learning. It focuses on solutions for practical problems arising when dealing with use cases outside of academic and toy settings. To this end, Maze provides tooling and support for:
This breadth of features reflects a holistic approach to the RL life-cycle and development process. In contrast, many other RL frameworks take a narrower approach, prioritizing algorithms over other, potentially just as crucial, aspects of building RL-driven applications.
Maze offers both a CLI and an API. Thorough documentation is available. To get you started, we provide Jupyter notebooks you can run locally or in Colab (and further examples in the documentation). Maze utilizes PyTorch for its networks.
We aim to achieve a high degree of configurability while maintaining an intuitive interface and resorting to sane defaults whenever appropriate. Running Maze can be as easy as this:
from maze.api.run_context import RunContext
from maze.core.wrappers.maze_gym_env_wrapper import GymMazeEnv

rc = RunContext(env=lambda: GymMazeEnv('CartPole-v0'))
rc.train(n_epochs=50)

# Run trained policy.
env = GymMazeEnv('CartPole-v0')
obs = env.reset()
done = False
while not done:
    action = rc.compute_action(obs)
    obs, reward, done, info = env.step(action)
However, every realistic, practical RL use case will be considerably more complex than CartPole-v0. Problems will need to be debugged, behavior recorded and interpreted, components (like policies) configured and customized. This is where Maze shines: It offers several features that support you in mitigating and resolving those issues. Among others:
Maze is completely independent of other RL frameworks, but plays nicely with some of them, such as RLlib: agents can be trained with RLlib's algorithm implementations while still operating within Maze, letting you leverage the full extent of Maze's features while complementing its native trainers with RLlib's algorithms.
If this has caught your interest, check out the links mentioned in the introduction and, of course, read on; there is more to come.
Reinforcement learning (RL) is, together with supervised and unsupervised learning, one of the three branches of modern machine learning. Set apart by its ability to learn from approximate reward signals and to devise long-term strategies, it is a great fit for many complex, challenging real-world problems. These include supply chain management, design optimization, chip design, and plenty of others, not least many popular games.
Why, then, haven't we seen more widespread adoption of RL in industry so far? Compared to the more established supervised and unsupervised learning, traditional RL is rather sample-inefficient and in almost all cases requires interaction with a simulation of the problem domain. However, simulators that are both accurate and fast enough are often unavailable, and creating them is usually not trivial. On top of that, RL has a reputation for being difficult to implement, train, and debug; after all, more moving components have to work in harmony than in an isolated supervised model.
We have worked on several reinforcement learning projects in recent years and experienced just that: due to their complexity, real-world problems often require sophisticated tooling to develop, evaluate, and debug solutions successfully. We are big fans of existing frameworks like RLlib and Stable Baselines and draw a lot of inspiration from them. However, we encountered enough hiccups and limitations during our projects to start wondering how to design a framework that avoids those recurring issues in the first place.
This motivated us to start working on Maze: a reinforcement learning framework that puts practical concerns in the development and productionisation of RL applications front and center.
Maze emphasizes being an applied, practical RL framework that supports users in building RL solutions to real-world problems. In our opinion, this necessitates adherence to some guiding principles and software engineering best practices:
You've seen the minimal code setup to get Maze up and running. What does a realistic project with all the nuts and bolts look like? As a case study, we tried our hand at the “Learning to run a power network” (L2RPN) challenge, in which a given power grid has to be kept operational while avoiding blackouts. We re-implemented last year's winning solution with Maze and have published the code on GitHub along with a write-up on Medium.
Another use case is supply chain management: we use Maze to optimize the stock replenishment process for the steel producer voestalpine. As this is not an open-source project, we cannot share its source code. However, if this is up your alley, we strongly recommend giving the joint talk by voestalpine and us on this topic at the applied AI conference for logistics a listen.
More open-source projects are in our pipeline and will be announced shortly.
Discussing every one of Maze’s features is not within the scope of this article, hence we focus on a few of our favorites here. The title of each paragraph links to the corresponding page in the documentation.
Maze offers a CLI via the maze-run command-line script. As maze-run is built on Hydra, it accepts configuration both directly on the command line and via .yaml files. A simple training run on CartPole-v0 might look like this:
maze-run -cn conf_train env.name=CartPole-v0
If you’d like to train with PPO:
maze-run -cn conf_train env.name=CartPole-v0 algorithm=ppo
While a CLI can be convenient, many use cases require more direct interaction with the framework and its components. To this end, we provide a high-level API in Python handling all of the complexity involved in configuring and running training and rollout, so that you don’t need to worry about any of it. We aim to keep API and CLI usage as congruent as possible — if you know how to use one, you should be good to go with the other.
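For instance, the API counterpart of the PPO training run above could look roughly like the following sketch; note that the algorithm argument here is an assumption mirroring the CLI's algorithm=ppo override, so check the API reference for the exact signature:

from maze.api.run_context import RunContext
from maze.core.wrappers.maze_gym_env_wrapper import GymMazeEnv

# Train CartPole-v0 with PPO via the Python API instead of the CLI.
rc = RunContext(env=lambda: GymMazeEnv('CartPole-v0'), algorithm='ppo')
rc.train(n_epochs=50)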
Debugging code is hard. Debugging RL systems is even more so. With Maze, you can fire events whenever something notable happens in your environment or with your agent. Events can then be aggregated and inspected in TensorBoard to get an overview across many episodes. You can also log them to a CSV file and inspect them with a tool of your choice.
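To sketch the pattern (the class, method, and helper names below are hypothetical and only illustrate the idea, not Maze's exact event API; see the event system documentation for the real interfaces):

from abc import ABC

# Illustrative event interface: one method per notable occurrence in the environment.
class InventoryEvents(ABC):
    def item_discarded(self, item_id: int):
        """Fired whenever the environment discards an item from stock."""

# Inside the environment, an event topic created from such an interface would be
# notified whenever the situation occurs, e.g. events.item_discarded(item_id=7);
# Maze then aggregates these events per step, episode, and epoch for TensorBoard
# and CSV logging.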
The setup for an RL-centered application can be quite complex, especially when experimenting with and comparing different configurations. Hydra is designed with such use cases in mind. We leverage its abilities to enable users to quickly compose and modify their configurations without having to change a single line of code.
To learn an effective policy, an agent first has to be able to make sense of its environment, e.g. by utilizing neural networks to learn useful feature representations. That's why Maze's perception module provides extendable, configurable building blocks for conveniently assembling PyTorch-based neural networks. It also allows visualizing the assembled networks.
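As a rough illustration of the underlying idea, here is a plain PyTorch sketch; it does not use Maze's actual perception blocks, whose names and signatures are described in the documentation:

import torch
from torch import nn

# A tiny policy network assembled from simple, reusable pieces; Maze's perception
# module offers higher-level, configurable blocks for composing such networks and
# for visualizing the result.
feature_extractor = nn.Sequential(nn.Linear(4, 32), nn.ReLU(), nn.Linear(32, 32), nn.ReLU())
action_head = nn.Linear(32, 2)

obs = torch.zeros(1, 4)                       # e.g. a CartPole-v0 observation
logits = action_head(feature_extractor(obs))  # action logits for a discrete policy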
The OpenAI Gym spaces interface is the unofficial standard for environments in the RL community. Maze adheres to this standard but allows a larger degree of freedom with its structured environments. This is done by decoupling the environment from the converter, which transforms the environment's state into a gym-compatible observation space object. This allows for easier experimentation with different state/observation representations: you can write and configure different converters that change how the state is presented to the policy without touching the environment's core logic.
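Conceptually, a converter is a small object that exposes a gym space and maps the environment's state into it. The sketch below is illustrative only; the class and method names, as well as the state attributes, are assumptions rather than Maze's exact converter interface:

import gym
import numpy as np

class VectorObservationConversion:
    """Illustrative converter mapping a domain-specific state to a gym-compatible observation."""

    def space(self) -> gym.spaces.Box:
        # The observation space the converted observations live in.
        return gym.spaces.Box(low=-np.inf, high=np.inf, shape=(4,), dtype=np.float32)

    def maze_to_space(self, state) -> np.ndarray:
        # Flatten the (hypothetical) domain state into the observation fed to the policy;
        # swapping in a different converter changes the representation without touching
        # the environment's core logic.
        return np.array([state.position, state.velocity, state.angle, state.angular_velocity],
                        dtype=np.float32)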
Structured environments also define all interfaces necessary to support more complex scenarios such as multi-step/auto-regressive, multi-agent, and hierarchical RL.
TensorBoard is widely used for monitoring the training and evaluation of deep learning models, even independently of TensorFlow (many libraries, e.g. PyTorch, integrate TensorBoard). Maze hooks up its event logging system with TensorBoard: everything that is logged will be visible there, including the run configuration, your custom events, action sampling statistics, and the observation distribution.
We aim for Maze to include state-of-the-art, scalable, and practically relevant algorithms. As of now, the following algorithms are supported: A2C, PPO, IMPALA, SAC, behavioral cloning, and evolution strategies. More are in the pipeline, e.g. SACfD, AlphaZero, and MuZero. Maze also offers out-of-the-box support for advanced training workflows such as imitation learning from teacher policies and policy fine-tuning.
If you want to explore Maze, we recommend giving our “Getting Started” notebooks a shot — you can spin up a Google Colab notebook by clicking the Colab button at the top of the notebook. This way you can try out Maze without having to install anything locally. Alternatively, you can install Maze with pip install maze-rl.
If you enjoyed this article — stay tuned! We plan to regularly release articles revolving around RL and featuring Maze from now on. Some of the upcoming articles will introduce the soon-to-be-released model zoo. Others will be part of a series dedicated to exploring challenging aspects of using RL in practice and how to tackle them with Maze.
We are continuously developing Maze, making it more mature and adding new features. We love community contributions, whether they are new features, bug reports, feature requests, or simply questions. Don't hesitate to open an issue on GitHub, drop us a line on GitHub Discussions, or ask a question tagged maze-rl on Stack Overflow. And if you decide to use Maze in one of your projects, tell us; we might feature your project in our documentation.
For further information, we encourage you to check out Maze on GitHub, its documentation on readthedocs.io, and learning resources such as Jupyter notebooks runnable on Google Colab. We hope that Maze is as useful to others as it is to us, and we look forward to seeing what the RL community at large will build with it. Give it a try and let us know what you think!