
Add RPC interface for Reinforcement Learning
Needs ReviewPublic

Authored by irishninja on Aug 20 2019, 2:46 AM.
"Love" token, awarded by Stan.



This revision adds an RPC interface (using gRPC) which enables training reinforcement learning agents in 0 A.D. It also includes a Python wrapper for conveniently interacting with 0 A.D., including setting up scenarios and controlling players in lock step.
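The lock-step loop the wrapper enables can be sketched roughly as follows. The `StubClient` below stands in for the real gRPC channel, and the command field names are illustrative assumptions, not the engine's actual schema:

```python
import json

class StubClient:
    """Stand-in for the gRPC channel (illustrative only).
    A real client would send the commands to a running
    pyrogenesis instance and read back the resulting game state."""
    def step(self, commands_json):
        commands = json.loads(commands_json)
        # Echo a minimal "game state" so the loop shape is visible.
        return {"turn_commands": commands, "entities": {}}

# One batch of JSON-encoded engine commands per engine step.
client = StubClient()
state = client.step(json.dumps([{"type": "walk", "x": 100, "z": 100}]))
assert state["turn_commands"][0]["type"] == "walk"
```

In the actual wrapper, a `step`-like call would block until the engine has advanced one turn, which is what makes lock-step training possible.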

Notes about remaining features, etc:

  • Windows support for premake
  • As this adds a new dependency (gRPC), should it only be available behind a feature flag? Or should we update the installation instructions? Or should we add a copy of gRPC/protobuf to the libraries directory?
  • There are currently some outstanding features:
    • I have not implemented all the features that may be desirable for configuring scenarios (such as ceasefire durations or custom victory conditions). The outstanding options are commented out in source/rlinterface/proto/RLAPI.proto.
    • The python wrapper includes a wrapper for the game engine actions (in source/tools/clients/python/zero_ad/). I have not added support for all the game engine commands yet and have mostly been using movement and attack commands myself.

That being said, I wanted to submit a revision with what I have completed so far in order to get feedback. It could also be useful to have the foundational components (the features in this revision) integrated first and then add other features in subsequent revisions.

Let me know what you think!

Test Plan

There is currently an example script demonstrating many of the supported actions, which can be used to aid manual testing. It may still be useful to add more test scripts covering unit health queries and other capabilities of the interface and wrapper.

Diff Detail

rP 0 A.D. Public Repository
Lint OK
No Unit Test Coverage
Build Status
Buildable 9353
Build 15503: arc lint + arc unit

Event Timeline

irishninja created this revision.Aug 20 2019, 2:46 AM
Stan awarded a token.Aug 20 2019, 7:21 AM
Stan edited reviewers, added: Restricted Owners Package; removed: Stan.Aug 20 2019, 7:25 AM

Hello and thanks for the patch! I'm not a programmer so I won't be able to review it fully, but I will make some general comments. I'll have a closer look today :)


You might want to make that a config option :)


Remove dead code.

After a second look at it, the RL code should probably be moved to libraries


Any reason for this type and not u32/u16?


range based for loop ?


No braces for single-line statements; see the coding conventions


avoid auto when you can :)


range based for loop ?




we have a weird convention for this. -1 indent.




Avoid eval calls, we have a way to make empty objects @elexis or @Itms will tell you more.


Use C++-style casts like you did above.


same here about auto.


Anything you can forward declare here ?


I guess it's pretty straightforward, but if you want to add such comments you can use the doxygen style :)

Hey, thanks for contributing this on Phabricator :)

Few things first:

  • yes, this will need to be behind a feature flag I think.
  • I wouldn't version control the generated files but rather write a script to generate them when building workspaces. (also it would be easier to read on phabricator)

Implementation-wise, I have one high-level comment for now: I don't think we want game configs to be hardcoded in C++. It's not performance-sensitive, and I would much rather pass it as JSON and keep the engine mostly agnostic of it. Did you have a particular reason not to pass it as a string, like you did with template data?
(template data is one thing where a gRPC-specific channel might matter enough...)

What about the test results from the patch, and the feasibility of machine learning in 0ad?
The example AI can move an army but not much more? What is its incentive (how is it trained)?
What is the prospect of the feature?

Stan added a comment.Aug 20 2019, 11:35 PM

Maybe some more insights can be found here

I personally think this can be a great feature, and it could be interesting outside the scope of 0 A.D. as well. It's the second time people have tried to develop such a feature. It would probably make things like OpenAI integrations easier to implement.

Thanks for the feedback! I can move it to be behind a feature flag and remove the generated files.

I hardcoded the game configs because I started from the code that starts a scenario from the command line and then refactored it so I could reuse essentially the same logic when receiving a protobuf message (so rather than copying the CLI's game-config creation code, the logic for creating the JSON is shared). It also ensured the defaults were the same, so there should be minimal surprises.

I can certainly move this logic to the python side (that is, wrap the game config like I did with actions and send them as a string). On the positive side, other games using pyrogenesis would also be able to use the python wrapper as long as they provided their own ScenarioConfig class. Let me know if you prefer this approach and I can update the diff.
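A Python-side config wrapper along those lines might look like the sketch below. The class and field names are assumptions for illustration, not the actual zero_ad API or the engine's real settings schema:

```python
import json

class ScenarioConfig:
    """Illustrative Python-side scenario config (names are assumptions).
    It serializes to the JSON the engine expects, keeping the engine
    itself agnostic of the individual settings."""
    def __init__(self, map_name, player_civs=("athen", "spart"), ceasefire=0):
        self.map_name = map_name
        self.player_civs = player_civs
        self.ceasefire = ceasefire

    def to_json(self):
        # Build the JSON string sent over the RPC channel,
        # analogous to how actions are already sent as strings.
        return json.dumps({
            "map": self.map_name,
            "settings": {
                "CeasefireTime": self.ceasefire,
                "PlayerData": [{"Civ": c} for c in self.player_civs],
            },
        })

config = ScenarioConfig("Arcadia", ceasefire=5)
parsed = json.loads(config.to_json())
assert parsed["settings"]["CeasefireTime"] == 5
```

Other games built on pyrogenesis could then supply their own ScenarioConfig class without any engine changes.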


So far, I have made it configurable via the command line (like --rpc-server=). Is that what you meant, or did you have something else in mind?


I have to remove these curly braces to match the coding conventions (note to self). I am used to a different style guide :)


Not particularly. I can update it.


I don't think it is possible, since this is using the autogenerated C++ files from the protobuf (it uses commands->actions(i) rather than commands->actions[i], so the "array" of actions is actually a function which deserializes the action at the given index). Otherwise, I would definitely prefer it!


Good point. I can update it.

Stan added inline comments.Aug 21 2019, 8:00 AM

I had something else in mind. All the default options are in a config file called default.cfg in the binaries/data/config folder.
This way you can override them in your user.cfg.
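For illustration, such an entry in binaries/data/config/default.cfg might look like the sketch below; the key name is a guess, not necessarily what the patch ends up using:

```
; Address the RPC server binds to (hypothetical key name).
rpcserver.address = "127.0.0.1:50050"
```

A user could then override it by adding the same key to their user.cfg.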

See this example for loading the variable in js code and adding an option in the game menu

Here is an example (likely not the best) for doing it in cpp


Ah right, I always suggest it, then look at the code afterwards XD

As @Stan said, there is more discussion on the PR on GitHub. The goal of this contribution is to provide the fundamental underlying capabilities for exploring RL/ML agents in 0 A.D. (such as an OpenAI gym environment, which can be implemented on top of this quite easily). I am intentionally not adding any action/observation spaces or rewards as expected by OpenAI gym, as these are not obvious and there are many different candidates for each (this is discussed more on the GitHub thread).

That said, defining an OpenAI gym environment is pretty straightforward and can be done entirely in Python (no need to change 0 A.D. at all once this is integrated!). A gym environment needs:

  • a conversion from the symbolic game state to a fixed size observation tensor
  • an action space and a mapping to 0 A.D. JSON commands. For example, there could be a discrete action space with 2 actions which could be (semantically) mapped to "attack" and "retreat". When the RL agent takes the first action, it would need to be converted to a valid JSON command which makes the player's units attack (or whatever we want the action to actually do).
  • a function to compute the reward given the interaction (i.e., previous state, action, and current state)

It is worth mentioning that there are many different levels of abstraction which could be used for defining both the observation and action spaces. There are also many different types of reward functions which could be used, including reward shaping techniques such as using damage dealt, units killed, etc. There has also been some interesting work on companion AI in games trained with a reward based on what the other player was able to accomplish (it learned to enable the player to be successful rather than to win itself). More information about OpenAI gym spaces can be found in the gym documentation.
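As a concrete sketch of such an action mapping, a discrete two-action space ("attack"/"retreat") could be translated into engine commands along these lines; the command field names here are assumptions for illustration, not the engine's actual JSON schema:

```python
# Illustrative mapping from a discrete gym-style action to a JSON
# engine command. Field names ("type", "entities", ...) are assumed
# for the sketch; the real engine command schema may differ.
def to_engine_command(action, my_units, enemy_units, base_position):
    if action == 0:  # "attack": send all units at a chosen enemy
        return {"type": "attack", "entities": my_units,
                "target": enemy_units[0]}
    else:  # "retreat": walk all units back to the base position
        x, z = base_position
        return {"type": "walk", "entities": my_units, "x": x, "z": z}

cmd = to_engine_command(0, [1, 2, 3], [9], (100.0, 100.0))
assert cmd["type"] == "attack" and cmd["target"] == 9
```

The reward function would then be a plain Python function of the previous state, the chosen action, and the resulting state, computed entirely on the wrapper side.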

Anyway, I was hoping to integrate the fundamental (unopinionated) capabilities so that 0 AD could be used to explore the myriad of possibilities that are out there. I am hoping that this will both help researchers as well as make 0 AD more awesome :)

More specific answers to @elexis 's comments are below:

What about the test results from the patch, and the feasibility of machine learning in 0ad?

About the test results, this is likely a mistake on my part due to a lack of familiarity with Phabricator (and the included tools). I am not sure why they were skipped but I can try to find out how to run them.

I am not exactly sure whether you are asking about the feasibility of 0 A.D. as an environment (i.e., can 0 A.D. be used to train an RL agent) or about existing algorithms/compute. As far as feasibility as an environment goes, 0 A.D. is great imo as it can be run headlessly and on Linux (convenient since Linux is such a common platform for ML). This also makes deployment/installation easy, as it can be run in a Docker container and only requires Docker to be installed. Some of the other complex OpenAI gym environments require tricks with xvfb and other X11 workarounds to run in a headless Linux environment.

As far as the feasibility of existing algorithms, there have been some promising results with AlphaStar and OpenAI Five, but it is certainly challenging to train an RL agent to play the game completely end-to-end. However, there are simpler integrations of AI that could be fruitful, such as learning a high-level controller that decides when to attack or what build order to follow.

The example AI can move an army but not much more? What is it's incentive (how is it training?)?

Currently, I have mostly been moving units and attacking with them. Commands are sent to the game engine as JSON, so anything supported by the game engine should be supported by the RPC interface. That said, I haven't added all of the game engine commands to the Python wrapper just yet (though they can certainly be sent regardless). As for the reward signal used during training, this is expected to be implemented on the Python side, as mentioned above.

irishninja updated this revision to Diff 9459.Aug 23 2019, 10:34 PM
irishninja marked 7 inline comments as done and an inline comment as not done.
  • Added --without-rlinterface compile flag
  • Removed generated files
  • Added config option for the rpc server address (both to the default.cfg and to the options.json used in-game)

Some remaining steps:

  • Move the grpc/protobuf code to the libraries/ directory

A couple questions:

  • Should 0 AD be built with the RPC interface by default? Currently, I have the RPC interface enabled by default (but it can be disabled with a flag to premake).
  • Is the general consensus to remove the GameConfig class? I just want to confirm this before I get started on it. If so, I will just handle the scenario configurations on the python side (same as the actions) and send the JSON.
Owners added subscribers: Restricted Owners Package, Restricted Owners Package, Restricted Owners Package.Aug 23 2019, 10:34 PM
Stan added inline comments.Aug 24 2019, 2:41 AM

I wonder if the option should be visible to the end user. I guess it's linked to the answer to the other question: should the RPC server be enabled by default in pyrogenesis?

irishninja added inline comments.Aug 24 2019, 6:20 AM

Agreed. I had some doubts, but I saw that the options for some of the lobby settings (which can also be disabled via a flag to premake) were available here too, so I thought it might be analogous. However, it might also make more sense not to expose this feature via the in-game menu, since it is a developer/researcher feature anyway.

Regardless, I figured I would give it a try and get feedback :)


haha, no worries :)

Stan added a comment.Sep 10 2019, 10:25 AM

@wraitii @elexis any changes you'd like to be made for that patch ?

Stan added a comment.Sep 10 2019, 1:19 PM

Maybe you could split the gamesetup → gameconfig change to another patch ?


Should use debug_printf like other calls in that file :)


Still there :)

See discussion on


  • There are two tickets about Gamesetup.cpp autostart options needing refactoring
  • There exist map-specific options, so hardcoding game settings in C++ is not so ideal (for example, the random map biome is supported by about 20 maps and not supported by about 60). The idea was that one could specify one JS file with the JS GUI logic for that setting. There could be a Python handler for this specific setting as well, and then the entire thing could be passed as JSON from Python. That would mean the *.proto file doesn't have to hardcode the specific settings and is extensible too, incidentally making this patch shorter (Gamesetup.cpp would still hardcode the settings; if you want to clean up that file, it can be done independently in the course of the tickets referred to in the irclogs).
  • If the ToJSValue function remains, it could become ScriptInterface::ToJSVal (it doesn't have to, since the thing is big and grouping it in a class doesn't sound wrong).
  • Adding the feature itself still seems OK, even if there won't be a functional AI at first, since it provides an interface for people interested in ML to implement one. There were the other ML people on the forums; perhaps they could be convinced that this approach is what they want as well (bundling the forces, or having them point out features that this diff misses).
  • Maintaining the library is a mess, and distributing it with the game even more so - it sounded OK if it is excluded by default and only tested once before each release to check that it still works (refs longer release process).
Stan added a comment.Sep 12 2019, 1:57 PM

I wonder if one could hook petra to it as a separate node.js server.

irishninja updated this revision to Diff 9746.Sep 14 2019, 12:29 AM
irishninja edited the test plan for this revision. (Show Details)

Added more actions (construct, gather, train, attack-walk) and an example script showing a lot of these basic actions.