
Add RPC interface for Reinforcement Learning
Needs Review · Public

Authored by irishninja on Aug 20 2019, 2:46 AM.
Tags
None
Subscribers
Restricted Owners Package, Restricted Owners Package, Restricted Owners Package and 4 others
Tokens
"Like" token, awarded by nani."Love" token, awarded by Stan.

Details

Reviewers
wraitii
Group Reviewers
Restricted Owners Package (Owns No Changed Paths)
Trac Tickets
#5548
Summary

This revision adds an RPC interface (using gRPC) that enables training reinforcement learning agents in 0 AD. It also includes a Python wrapper for conveniently interacting with 0 AD, including setting up scenarios and controlling players in lock step.
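
Roughly, the intended lock-step usage from Python looks something like the sketch below. The class and method names here (ZeroAD, reset, step) and the structure of the returned state are illustrative assumptions rather than the wrapper's documented API; only the actions helpers (zero_ad/actions.py) and the lock-step model are described in this revision.

```python
# Hypothetical usage sketch of the zero_ad Python wrapper added in this revision.
# The names ZeroAD, reset, step and the state layout are assumptions for illustration.
import zero_ad

game = zero_ad.ZeroAD('localhost:50050')   # address of the RPC server inside pyrogenesis

with open('scenario_config.json') as f:    # scenario description sent to the engine
    config = f.read()

state = game.reset(config)                 # start the scenario and get the initial game state
for _ in range(100):                       # lock step: the engine advances once per step()
    my_units = [e['id'] for e in state['entities'] if e.get('owner') == 1]
    # zero_ad.actions wraps engine commands (movement, attack, ...) as JSON
    order = zero_ad.actions.walk(my_units, 128, 128)
    state = game.step([order])
```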

Notes about remaining features, etc:

  • Windows support for premake
  • As this adds a new dependency (gRPC), should it only be available behind a feature flag? Or should we update the installation instructions? Or should we add a copy of grpc/protobuf to the libraries directory?
  • There are currently some outstanding features:
    • I have not implemented all the features that may be desirable for configuring scenarios (such as ceasefire duration or custom victory conditions). The outstanding options are commented out in source/rlinterface/proto/RLAPI.proto.
    • The python wrapper includes a wrapper for the game engine actions (in source/tools/clients/python/zero_ad/actions.py). I have not added support for all the game engine commands yet and have mostly been using movement and attack commands, myself.

That being said, I wanted to submit a revision with what I have completed so far in order to get feedback. It could also be useful to have the foundational components (the features in this revision) integrated first and then add other features in subsequent revisions.

Let me know what you think!

Test Plan

There is currently an example script demonstrating many of the supported actions, which can be used to aid manual testing. It may still be useful to add more test scripts covering unit health queries and some of the other capabilities of the interface and wrapper.

Diff Detail

Repository
rP 0 A.D. Public Repository
Branch
arcpatch-D2199
Lint
Lint OK
Unit
No Unit Test Coverage
Build Status
Buildable 10189
Build 17284: arc lint + arc unit

Event Timeline

There are a very large number of changes, so older changes are hidden.
Stan awarded a token. Aug 20 2019, 7:21 AM
Stan edited reviewers, added: Restricted Owners Package; removed: Stan. Aug 20 2019, 7:25 AM

Hello and thanks for the patch! I'm not a programmer so I won't be able to review it fully. I will make some general comments though; I'll have a closer look today :)

source/main.cpp
475

You might want to make that a config option :)

source/rlinterface/proto/RLAPI.proto
74

Remove dead code.

After a second look at it, the RL code should probably be moved to libraries.

source/ps/GameSetup/GameConfig.h
83 ↗(On Diff #9411)

Any reason for this type and not u32/u16?

source/rlinterface/RLInterface.cpp
34

range-based for loop?

52

No braces for single-line statements; see https://trac.wildfiregames.com/wiki/Coding_Conventions

74

avoid auto when you can :)

78

range-based for loop?

85

nullptr

120

we have a weird convention for this. -1 indent.

122

braces

145

Avoid eval calls; we have a way to make empty objects. @elexis or @Itms will tell you more.

186

Use C++-style casts like you did above.

202

same here about auto.

source/rlinterface/RLInterface.h
45

Anything you can forward declare here?

67

I guess it's pretty straightforward, but if you want to add such comments you can use the doxygen style :)

Hey, thanks for contributing this on Phabricator :)

Few things first:

  • yes, this will need to be behind a feature flag I think.
  • I wouldn't version-control the generated files but rather write a script to generate them when building workspaces. (Also, it would be easier to read on Phabricator.)

Implementation-wise, I have one high-level comment for now: I don't think we want game configs to be hardcoded in C++. It's not performance-sensitive, and I would much rather pass it as JSON and keep the engine mostly agnostic of it. Did you have a particular reason not to pass it as a string, like you did with the template data?
(template data is one thing where a gRPC-specific channel might matter enough...)

What about the test results from the patch, and the feasibility of machine learning in 0ad?
The example AI can move an army but not much more? What is its incentive (how is it training)?
What is the prospect of the feature?

Stan added a comment. Aug 20 2019, 11:35 PM

Maybe some more insights can be found here https://github.com/0ad/0ad/pull/25

I personally think this can be a great feature, and it could maybe be interesting outside the scope of 0 A.D. It's the second time people have tried to develop such a feature. It would probably make things like OpenAI easier to implement.

Thanks for the feedback! I can move it to be behind a feature flag and remove the generated files.

I hardcoded the game configs because I started from the code that starts a scenario from the command line and then refactored it so I could use basically the same logic when I receive a protobuf message (that way I wasn't copying the code that creates the game config from the CLI, but sharing the logic that creates the JSON). It also ensured the defaults were the same, so there would hopefully be minimal surprises.

I can certainly move this logic to the Python side (that is, wrap the game config like I did with actions and send it as a string). On the positive side, other games using pyrogenesis would also be able to use the Python wrapper as long as they provided their own ScenarioConfig class. Let me know if you prefer this approach and I can update the diff.

source/main.cpp
475

So far, I have made it configurable via the command line (like --rpc-server=0.0.0.0:50050). Is that what you meant or did you have something else in mind?

477

I have to remove these curly braces to match the coding conventions (note to self). I am used to a different style guide :)

source/ps/GameSetup/GameConfig.h
83 ↗(On Diff #9411)

Not particularly. I can update it.

source/rlinterface/RLInterface.cpp
34

I don't think it is possible since this is using the autogenerated cpp files from the protobuf (it uses commands->actions(i) rather than commands->actions[i], so the "array" of actions is actually a function which deserializes the action at the given index). Otherwise, I would definitely prefer it!

source/rlinterface/RLInterface.h
67

Good point. I can update it

Stan added inline comments. Aug 21 2019, 8:00 AM
source/main.cpp
475

I had something else in mind. All the default options are in a config file called default.cfg in the binaries/data/config folder.
This way you can override them in your user.cfg.

See this example for loading the variable in JS code and adding an option in the game menu: https://code.wildfiregames.com/D2176

Here is an example (likely not the best) for doing it in C++: https://code.wildfiregames.com/D1217

source/rlinterface/RLInterface.cpp
34

Ah right, I always suggest it, then look at the code afterwards XD

As @Stan said, there is more discussion on the PR on GitHub. The goal of this contribution is to provide the fundamental underlying capabilities for exploring RL/ML agents in 0 AD (such as an OpenAI Gym environment, which can actually be implemented on top of this quite easily). I am intentionally not adding any action/observation spaces or reward function as expected by OpenAI Gym, as these are not obvious and there are many different candidates for each of them (this is discussed more in the GitHub thread).

That said, defining an OpenAI Gym environment is pretty straightforward and can be done entirely in Python (no need to change 0 AD at all once this is integrated!). A Gym environment needs:

  • a conversion from the symbolic game state to a fixed-size observation tensor
  • an action space and a mapping to 0 AD JSON commands. For example, there could be a discrete action space with 2 actions which are (semantically) mapped to "attack" and "retreat". When the RL agent takes the first action, it would need to be converted to a valid JSON command which makes the player's units attack (or whatever we want the action to actually do).
  • a function to compute the reward given the interaction (i.e., previous state, action, and current state)

It is worth mentioning that there are many different levels of abstraction which could be used for defining both the observation and action spaces. There are also many different types of reward functions which could be used, including reward-shaping techniques such as using damage dealt, units killed, etc. There has also been some interesting work on companion AI in games that was trained using a reward based on what the other player was able to do (it learned to enable the player to be successful rather than to win itself). More information about OpenAI Gym spaces can be found at http://gym.openai.com/docs/#spaces.
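
To make the components listed above concrete, here is a minimal sketch of what such a Gym environment could look like on top of the wrapper. Everything below is illustrative: the zero_ad calls (ZeroAD, reset, step), the observation layout, and the reward are placeholder assumptions, not part of this diff.

```python
import gym
import numpy as np
import zero_ad  # Python wrapper from this diff; the calls below assume a hypothetical API

class AttackRetreatEnv(gym.Env):
    """Toy 2-action environment: action 0 = attack, action 1 = retreat."""

    def __init__(self, address='localhost:50050', config=''):
        self.game = zero_ad.ZeroAD(address)          # assumed constructor
        self.config = config                          # scenario config (assumed reset argument)
        self.action_space = gym.spaces.Discrete(2)
        # e.g. [own unit count, own mean health, enemy unit count, enemy mean health]
        self.observation_space = gym.spaces.Box(0.0, np.inf, shape=(4,), dtype=np.float32)

    def reset(self):
        self.state = self.game.reset(self.config)     # assumed method
        return self._observe(self.state)

    def step(self, action):
        command = self._attack() if action == 0 else self._retreat()
        next_state = self.game.step([command])        # assumed method (lock-step advance)
        reward = self._reward(self.state, next_state)
        self.state = next_state
        return self._observe(next_state), reward, self._done(next_state), {}

    # --- placeholder helpers; the real mappings are up to the environment author ---
    def _observe(self, state):
        return np.zeros(4, dtype=np.float32)          # convert symbolic state -> fixed-size tensor

    def _attack(self):
        return {}                                     # build a JSON attack command for own units

    def _retreat(self):
        return {}                                     # build a JSON walk command away from the enemy

    def _reward(self, prev_state, state):
        return 0.0                                    # e.g. damage dealt minus damage taken

    def _done(self, state):
        return False                                  # e.g. one army destroyed
```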

Anyway, I was hoping to integrate the fundamental (unopinionated) capabilities so that 0 AD could be used to explore the myriad of possibilities that are out there. I am hoping that this will both help researchers and make 0 AD more awesome :)

More specific answers to @elexis's comments are below:

What about the test results from the patch, and the feasibility of machine learning in 0ad?

About the test results, this is likely a mistake on my part due to a lack of familiarity with Phabricator (and the included tools). I am not sure why they were skipped but I can try to find out how to run them.

I am not exactly sure if you are asking about the feasibility of 0 AD as an environment (i.e., can 0 AD be used to train an RL agent) or about existing algorithms/compute. As far as feasibility of 0 AD as an environment goes, 0 AD is great imo as it can be run headlessly and on Linux (convenient since Linux is such a common platform for ML). This also makes deployment/installation easy as it can be run in a Docker container and only requires Docker to be installed. Some of the other complex OpenAI Gym environments require tricks with xvfb and other X11 workarounds to get them running in a headless Linux environment.

As far as the feasibility of existing algorithms goes, there have been some promising results with AlphaStar and OpenAI Five, but it is certainly challenging to train an RL agent to play the game completely end-to-end. However, there are simpler integrations of AI that could be fruitful, such as learning a high-level controller that decides when to attack or what build order to follow.

The example AI can move an army but not much more? What is it's incentive (how is it training?)?

Currently, I have mostly been moving units and attacking others. Commands are sent to the game engine as JSON, so anything supported by the game engine should be supported by the RPC interface. That said, I haven't added all of the game engine commands to the Python wrapper just yet (though they can certainly be sent regardless). As far as the reward signal used during training, this is expected to be implemented on the Python side, as mentioned above.
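
For reference, a command sent over the interface is just an engine simulation command serialized as JSON; roughly like the sketch below. The field names are illustrative and should be checked against the engine's command handlers; they are not guaranteed to match this diff exactly.

```python
# Illustrative only: the exact fields are defined by the engine's simulation commands.
walk_command = {
    "type": "walk",            # command type handled by the engine
    "entities": [150, 151],    # ids of the units being ordered
    "x": 512.0,                # target position on the map
    "z": 512.0,
    "queued": False,           # whether to queue after current orders
}

attack_command = {
    "type": "attack",
    "entities": [150, 151],
    "target": 230,             # id of the entity to attack
    "queued": False,
}
```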

irishninja updated this revision to Diff 9459. Aug 23 2019, 10:34 PM
irishninja marked 7 inline comments as done and an inline comment as not done.
  • Added --without-rlinterface compile flag
  • Removed generated files
  • Added config option for the rpc server address (both to the default.cfg and to the options.json used in-game)

Some remaining steps:

  • Move the grpc/protobuf code to the libraries/ directory

A couple questions:

  • Should 0 AD be built with the RPC interface by default? Currently, I have the RPC interface enabled by default (but it can be disabled with a flag to premake).
  • Is the general consensus to remove the GameConfig class? I just want to confirm this before I get started on it. If so, I will just handle the scenario configurations on the python side (same as the actions) and send the JSON.
Owners added subscribers: Restricted Owners Package, Restricted Owners Package, Restricted Owners Package. Aug 23 2019, 10:34 PM
Stan added inline comments. Aug 24 2019, 2:41 AM
binaries/data/mods/public/gui/options/options.json
546 ↗(On Diff #9459)

I wonder if the option should be visible to the end user. I guess it's linked to the other question: should the RPC interface be enabled by default in pyrogenesis?

irishninja added inline comments. Aug 24 2019, 6:20 AM
binaries/data/mods/public/gui/options/options.json
546 ↗(On Diff #9459)

Agreed. I had some doubts but saw that the options for some of the lobby settings (which also can be disabled via a flag to premake) were available here, too, so I thought it might be analogous. However, it also might make more sense not to expose this feature via the in-game menu since it is kind of a developer/researcher feature anyway.

Regardless, I figured I would give it a try and get feedback :)

source/rlinterface/RLInterface.cpp
34

haha, no worries :)

Stan added a comment. Sep 10 2019, 10:25 AM

@wraitii @elexis any changes you'd like to be made to that patch?

Stan added a comment. Sep 10 2019, 1:19 PM

Maybe you could split the gamesetup → gameconfig change into another patch?

source/main.cpp
484

Should use debug_printf like other calls in that file :)

source/ps/GameSetup/GameConfig.h
87 ↗(On Diff #9459)

Still there :)

See discussion on http://irclogs.wildfiregames.com/2019-09/2019-09-11-QuakeNet-%230ad-dev.log

Recap:

  • There are two tickets about Gamesetup.cpp autostart options needing refactoring
  • There exist map-specific options, so hardcoding gamesettings in C++ is not so ideal (for example, the random map biome is supported by about 20 maps and not supported by about 60). The idea was that one could specify one JS file with the JS GUI logic for that setting. There could be a python handler for this specific setting as well. And then the entire thing could be passed as JSON from python. That would mean the *.proto file doesn't have to hardcode the specific settings and is extensible too. Incidentally, this makes the patch shorter (but Gamesetup.cpp still hardcodes the settings; if you want to clean up that file, it can be done independently in the course of the tickets referred to in the IRC logs).
  • If the ToJSValue function remains, it could become ScriptInterface::ToJSVal (it doesn't have to, actually, since the thing is big and grouping it in a class doesn't sound wrong).
  • Adding the feature itself still seems OK, even if there won't be a functional AI at first, since it provides an interface for people interested in ML to implement one. There were other ML people on the forums; perhaps they could be convinced that this approach is what they want as well? (Bundling the forces, or having them point out features that this diff misses.)
  • Maintaining the library is a mess, and distributing it even more so - it sounded okay if it is excluded by default and only tested once before the release to confirm it still works (refs the longer release process).
Stan added a comment. Sep 12 2019, 1:57 PM

I wonder if one could hook Petra up to it as a separate Node.js server.

irishninja updated this revision to Diff 9746. Sep 14 2019, 12:29 AM
irishninja edited the test plan for this revision.

Added more actions (construct, gather, train, attack-walk) and an example script showing a lot of these basic actions.

irishninja updated this revision to Diff 10196. Oct 23 2019, 1:46 AM

Updates:

  • Defined ScenarioConfig in Python. Added basic helpers for defining victory conditions as well as more generic setters (set_map_setting, set_game_setting); see the sketch after this list.
  • Refactored the RL server main loop into its own method
  • Changed default build to not include RL interface (uses --with-rlinterface)
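
A hedged sketch of what using the new ScenarioConfig setters might look like. The specific setting keys and values, and the ZeroAD/reset calls, are illustrative assumptions rather than documented names; only ScenarioConfig, set_map_setting, and set_game_setting are named in this revision.

```python
import zero_ad

config = zero_ad.ScenarioConfig()
config.set_map_setting('Size', 192)        # example key/value; real setting names may differ
config.set_game_setting('gameSpeed', 10)   # e.g. run the simulation faster while training
# helpers for victory conditions also exist (not shown here)

game = zero_ad.ZeroAD('localhost:50050')   # assumed wrapper entry point, as in the earlier sketch
state = game.reset(config)
```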

Next Steps:

  • Make another example demonstrating usage with OpenAI gym. I am not sure if this should be included here or in its own repository...
  • Add Windows support (should only be needed for the code-generation step in premake). Any help here would be appreciated as I am a Linux user...
  • Write more tests in python for features not covered by the example script. This includes:
    • setting game speed
    • setting map size
    • configuring built-in AI opponent
Stan added a comment. Oct 23 2019, 7:41 AM

Updates:

  • Defined ScenarioConfig in Python. Added basic helpers for defining victory conditions as well as more generic setters (set_map_setting, set_game_setting).
  • Refactored the RL server main loop into its own method
  • Changed default build to not include RL interface (uses --with-rlinterface)

Nice! I had a chat with Itms and we agreed it's better that way since most people won't use it :)

Next Steps:

  • Make another example demonstrating usage with OpenAI gym. I am not sure if this should be included here or in its own repository...

I guess the external code should be provided as a git link or something, like all the alternative AIs.

  • Add Windows support (should only be needed for the code-generation step in premake). Any help here would be appreciated as I am a Linux user...

Tell me what you need; you can ping me on IRC any time, or use @tell in case I'm not here :)

  • Write more tests in python for features not covered by the example script. This includes:
    • setting game speed
    • setting map size
    • configuring built-in AI opponent
Stan added inline comments. Oct 23 2019, 1:16 PM
source/main.cpp
326

I guess "using the interface" would be when service != nullptr, which would remove the extra param?

689

Maybe you could display a nice error message as well?

Something along the lines of "rpc-server is not available". I believe there is a similar message for Atlas.

source/rlinterface/RLInterface.cpp
37

I think we have a specific type for player ids, player_id_t or something.

70

Error?

103

Doxygen.

124

Might need JSAutoRequest rq(cx); @elexis might know.

136

Comment on top is missing the final '.'

164

.

177

.

199

AutoRequest here too.

source/rlinterface/RLInterface.h
45

Still accurate :)

Usually <> libs are below.

67

Still missing :)

source/simulation2/system/LocalTurnManager.cpp
27

player_id_t?

source/simulation2/system/LocalTurnManager.h
34

player_id_t?

irishninja marked 8 inline comments as done.

Miscellaneous code cleanup per Stan's feedback.

Stan added inline comments. Oct 24 2019, 7:38 AM
binaries/data/mods/public/gui/options/options.json
546 ↗(On Diff #9459)

I guess the option can be removed, because otherwise it will confuse players if it does nothing.

build/premake/premake5.lua
41

Do you need me to translate the shell commands below into Windows bash?

source/rlinterface/RLInterface.cpp
52

Not done :)

irishninja marked an inline comment as not done. Mon, Nov 11, 3:24 PM

As an update, I am currently working on adding tests for the actions and scenario config (using pytest). I have tests for all the actions in zero_ad/actions.py and a number of the different scenario settings, and am now testing random map usage. After finishing the tests, I am planning on training a simple agent as an example demonstrating how this can be used to expose an OpenAI Gym API for 0 AD :) (I will probably do this in an external repo and then just post a link to it.)

Also, thanks again, @Stan for helping out with windows support :)

build/premake/premake5.lua
41

Yeah, that would be awesome and help me out a lot :)

Essentially, I am running the protobuf compiler (protoc) using the C++ plugin and then changing the file extensions from .cc to .cpp so they are detected during the build.

Generating the Python files is probably not necessary but it would be nice if that also worked on Windows (they are only needed for building the Python client). For the Python files, I copy the .proto file to the Python project and then build it using Python's grpc_tools.protoc.
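
The Python-side generation step boils down to an invocation of grpc_tools.protoc, roughly as sketched below; the include path and output directories are illustrative, not the exact paths used in this diff.

```python
# Roughly the Python-side code generation described above; paths are illustrative.
from grpc_tools import protoc

protoc.main([
    'grpc_tools.protoc',      # argv[0]; ignored by protoc itself
    '-I.',                    # directory containing the copied RLAPI.proto
    '--python_out=.',         # generated protobuf message classes
    '--grpc_python_out=.',    # generated gRPC client stubs
    'RLAPI.proto',
])
```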

irishninja added inline comments. Mon, Nov 11, 3:27 PM
source/rlinterface/RLInterface.cpp
52

My bad. I will fix this :)

185

I am guessing I should also remove the curly braces here, too. Right?

Stan added inline comments. Mon, Nov 11, 4:21 PM
source/rlinterface/RLInterface.cpp
185

Yup

irishninja updated this revision to Diff 10306. Tue, Nov 12, 3:42 AM

Added tests for actions and for configuring scenarios. I also removed the extra example since it didn't contain anything new compared to the better one ;)

irishninja updated this revision to Diff 10307. Tue, Nov 12, 4:06 AM

Updated installation/testing instructions in the README for the Python client and removed the in-game user option for the RL interface address, as it must be set on startup and may be confusing for users.

nani awarded a token. Tue, Nov 12, 4:21 AM
irishninja updated this revision to Diff 10477. Thu, Dec 5, 10:45 PM

Updated Python dependencies and fixed a bug when building without the RL interface.