Page MenuHomeWildfire Games

Add RL interface for Reinforcement Learning
Needs ReviewPublic

Authored by irishninja on Aug 20 2019, 2:46 AM.

Details

Reviewers
wraitii
Group Reviewers
Restricted Owners Package(Owns No Changed Paths)
Trac Tickets
#5548
Summary

This revision adds an interface for training reinforcement learning agents in 0 AD. This also includes a python wrapper for interacting with 0 AD including setting scenarios and controlling players in lock step.

The revision was originally written using gRPC but has since been updated to use mongoose (greatly simplifying the build process - thanks, @wraitii!).

An example using this to train RL agents with a variety of different state, action spaces can be found at https://github.com/brollb/simple-0ad-example.

Note: This does not contain an actual OpenAI gym environment as they require a clearly defined action/state space (which it is not obvious what the optimal representation is - or even if there is a single optimal representation). That said, it is easy to create a gym using the code in this revision and there are multiple examples in the beforementioned repository. A longer discussion about this can be found at https://github.com/0ad/0ad/pull/25#pullrequestreview-262056969

Test Plan

There is currently an example script demonstrating a lot of different supported actions which can be used to aid in manual testing. It still may be useful to add more test scripts for getting unit health and some of the other capabilities of the interface and wrapper.

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
wraitii added inline comments.May 22 2020, 8:29 PM
source/main.cpp
324

I'd duplicate a little more and put just one #if.

400

As above, I'd duplicate a little more and put juste one #if

549

This seems un-necessary?

Thanks, @wraitii ! I am happy to have a review (and someone else who shares my excitement for this! :)

I have responded to your comments below. Thanks for all the feedback; I will update the revision as soon as I have made the updates!

You've made the point above, but as a data scientist myself, I can say this is a _great_ tool to have for RL and machine learning in general. It could even be useful for us, as I can envision several use-case for making tools in an external interface that contact the RPC server.
The summary seems outdated.

Yeah, it definitely is! I will update it.

My main remark is about the dependencies.
Boost fiber.
It seems used only for the 2 channels classes, which if I understand correctly are merely blocking and non-blocking "pipes".
However, from what I can tell your code is actually mostly synchronous (Step is called, you push commands, the game runs for a frame, state is popped, repeat).
So, from what I can tell, you can quite easily rework this to use a mutex and regular data structures, removing the dependency.

Good idea - that would be a bit simpler!

grpc
While I like the idea of GRPC and it can be convenient, its strength lies mostly in passing structured data. Yet most of what you pass around is strings (and it will likely remain that way forever). So I don't think you actually gain that much.
Further, we already ship Mongoose, an HTTP server, which is used for Profiler2. It's simple to setup and it should be good enough to pass JSON around (in fact Profiler2 does that).
So I think you should:

  • Separate the server routes from the implementation (even with gRPC, it's a good idea).
  • Try to implement a quick rest-like API with mongoose instead of gRPC.

This sounds great. I actually didn't know there was already an HTTP server in 0 AD. This will also make life easier in terms of cross platform support :)


On the python code (I admit it's a somewhat lesser concern):

So, this is actually a bit more complicated because there isn't an obvious choice for the state and action space. So far, I have been working more with a sort of meta-gym which can be used as soon as an action space and state space are defined here: https://github.com/brollb/simple-0ad-example (specifically, here). Admittedly, the repository should probably be renamed since it has more than one simple example :) Anyway, there is a bigger discussion about this on the 0ad PR which is probably worth a read: https://github.com/0ad/0ad/pull/25#pullrequestreview-262043850

It's also tricky when we have the perspective not of trying to reach human performance given the same input (ie, pixels for the game state and keyboard/mouse inputs for actions) but when we are trying to create an agent to use in the game. In the latter case, we would likely want to exploit intelligent state/action representations to make the problem easier for the actual underlying algorithm.

  • The separation of concern isn't great ("closest" in GameState for example). It's not a huge concern and can be addressed later.

Definitely. I got a bit caught up addressing the other comments on this and haven't done a good job of polishing this interface. It is a bit clunky now since it would be nicer to be able to vectorize these sorts of operations and use a library like numpy.


Closing remarks
I'd like to see something as plug-and-play as possible, so I do think we should try and do something clean, and ideally if it can end up being pip-installable for use in OpenAI Gym, it'd be awesome.
Ideally, the install instructions would be reduced to:

apt-get install 0ad
pip install aigym_0ad (or something)

I 100% agree. This would be awesome. Perhaps one way this could be done would be providing a package filled with different environments with given state/action spaces? It would be nice to have more examples/demos before defining a fixed set for a package like this. Either way, I am open to ideas (and collaboration :))!

irishninja updated this revision to Diff 12186.Sat, Jun 6, 7:34 PM
irishninja edited the summary of this revision. (Show Details)

Updated the diff to use mongoose instead of gRPC and cleaned up python API.

irishninja retitled this revision from Add RPC interface for Reinforcement Learning to Add RL interface for Reinforcement Learning.Sat, Jun 6, 7:34 PM
irishninja updated this revision to Diff 12188.Sat, Jun 6, 7:36 PM

Remove empty requirements.txt file

irishninja updated this revision to Diff 12189.Sat, Jun 6, 7:45 PM

Remove old build option, reorder python args, and minor update to the python readme. Python arguments for reset were reordered from reset(config, player_id=1, save_replay=False) to reset(config, save_replay=False, player_id=1) as I suspect it will be more common for users to want to save the replay rather than change the player's ID.

Vulcan added a comment.Sat, Jun 6, 7:46 PM

Build failure - The Moirai have given mortals hearts that can endure.

Link to build: https://jenkins.wildfiregames.com/job/docker-differential/2351/display/redirect

Vulcan added a comment.Sat, Jun 6, 7:47 PM

Build failure - The Moirai have given mortals hearts that can endure.

Link to build: https://jenkins.wildfiregames.com/job/vs2015-differential/1820/display/redirect

Stan added inline comments.Sat, Jun 6, 7:52 PM
build/premake/extern_libs5.lua
206 β†—(On Diff #12188)

I guess that file doesn't need any more changes now :)

build/premake/premake5.lua
593

Do you need the SDL2 ? Also indent :)

source/main.cpp
320

Missing spaces beside =

source/rlinterface/RLInterface.cpp
56

const method?

63

const &?

source/rlinterface/RLInterface.h
23

Probably don't need that anymore

irishninja updated this revision to Diff 12190.Sat, Jun 6, 8:01 PM

Removed some grpc includes that I missed

Vulcan added a comment.Sat, Jun 6, 8:01 PM

Build failure - The Moirai have given mortals hearts that can endure.

Link to build: https://jenkins.wildfiregames.com/job/docker-differential/2352/display/redirect

Vulcan added a comment.Sat, Jun 6, 8:02 PM

Build failure - The Moirai have given mortals hearts that can endure.

Link to build: https://jenkins.wildfiregames.com/job/vs2015-differential/1821/display/redirect

BTW, you've finally done what Ykkrosh described here 10 years ago:

(If it's not much extra effort, it'd be nice to support people doing AI research in other languages (as opposed to making AIs for real players to use). I think that should be fairly easy with IPC (like with the Broodwar proxy): run the AI in a separate process, have it connect to the game engine over a TCP socket, then send and receive data with some kind of JSON-based API that's equivalent to the normal internal JS<->engine interface. That means we just need to document the API and expose a socket, and people can use whatever language they fancy, and we can stay focused on JS.)

Good job on removing the dependencies πŸ‘
I think this makes this much more of a no-brainer to merge. With that being said, I want to get @Itms' opinion on this before merging, and he won't be back before June 20, so we'll still have to wait a little bit.

In the meantime, I've actually tested this :)
I had to apply the following patch:

.

I think you need a rebase, and the ComponentManager changes seem un-necessary.
The test scenario doesn't boot (it's arcadia, not Arcadia now), but with that change it worked nicely.
I quite like the non-intrusiveness of this, as you can actually play a pretty normal game.


Mongoose implementation is quite string-annoying, as you've probably noticed. It seems we already ship a JSON library for atlas: jsonspirit. Maybe you could use that? Starting a spider monkey context just to read/write JSON feels overkill, but the manual implementation is rather annoying.
This would probably let you drop the boost dependency entirely.


I've left a few other comments inline. I think you can cleanup the header file a bit, and add some comments.

As a final remark, I'm wondering if we can find a better name than "RLInterface", since technically it doesn't have to be used for RL. But I don't think it's too bad either.

build/premake/premake5.lua
596

Indentation seems wrong

source/main.cpp
322

I think you can check for the pointer directly below.
Spaces around =

source/rlinterface/RLInterface.cpp
331

Depending on the AI interface for this isn't great.

You can probably mimic the function from C++ however, by calling Simulation2's GetEntitiesWithInterface, and then making GUIInterfaceCalls using the ComponentManager.

source/simulation2/system/ComponentManager.cpp
34 β†—(On Diff #12190)

unnecessary also

73 β†—(On Diff #12190)

not needed from what I can tell

Stan added inline comments.Fri, Jun 19, 8:54 AM
source/main.cpp
485

Can the other params be made const?

488

Can you explain why we need such a loop? Code below uses if

496

I wonder if we need to support installing mods and launching the rl interface at the same time. I guess we do.

source/rlinterface/RLInterface.cpp
56

?

63

const ?

145

const & ?

191

nullptr here and below I guess

258

Missing jsautorequest here and below? @wraitii might know more

source/rlinterface/RLInterface.h
20

I think the std headers go below the game ones

rogermushroom added a subscriber: rogermushroom.EditedSat, Jun 20, 10:17 PM

I pulled from the D2199 branch at commit 19cb13a0fcb186cf79f5055421440e7f0486f27b and when I compiled I see:

../../../source/rlinterface/RLInterface.cpp:109:22: error: variable-sized object may not be initialized
            char buf[bufSize]{};

I added the CXXFLAGS="-Wvla-extension" argument and memset which resolved it

int bufSize = conn->buf_size;
char buf[bufSize];
memset(buf, 0, bufSize*sizeof(char));
mg_read(conn, &buf[0], bufSize);
std::string content(buf);

The compiler version is

Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk/usr/include/c++/4.2.1
Apple clang version 11.0.3 (clang-1103.0.32.62)
Target: x86_64-apple-darwin19.5.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin

I've attached the diff

irishninja planned changes to this revision.Sun, Jun 21, 3:42 PM
irishninja marked 34 inline comments as done.

I have made most of the changes and will be updating the revision soon (arcanist is hanging a bit after the rebase). I still need to update the GetGameState function and make it a little less string heavy as mentioned by @wraitii but most other changes should have been applied.

source/main.cpp
324

I removed the compile flag and then removed the #if entirely

549

This is required for running headless and not specifying an initial map (as this is configured by the agent itself).

source/rlinterface/RLInterface.cpp
56

Unfortunately, GetTemplates cannot be const since it acquires m_lock :/

source/rlinterface/RLInterface.h
20

updated for the next update to the revision.

67

Ah, yeah it is missing. I figured that the comment wasn't adding that much and just removed it. If you would like a comment instead, I can certainly add one.

irishninja updated this revision to Diff 12406.Sun, Jun 21, 5:08 PM
irishninja marked 3 inline comments as done.

Fixed various linting/style issues. Rebased to the latest version of 0 AD.

irishninja updated this revision to Diff 12408.Sun, Jun 21, 5:15 PM

Revert extern_libs5 file.

Build failure - The Moirai have given mortals hearts that can endure.

Link to build: https://jenkins.wildfiregames.com/job/vs2015-differential/1961/display/redirect

irishninja updated this revision to Diff 12409.Sun, Jun 21, 5:17 PM

Build failure - The Moirai have given mortals hearts that can endure.

Link to build: https://jenkins.wildfiregames.com/job/vs2015-differential/1962/display/redirect

Build failure - The Moirai have given mortals hearts that can endure.

Link to build: https://jenkins.wildfiregames.com/job/vs2015-differential/1963/display/redirect

Stan added a comment.Sun, Jun 21, 5:25 PM

Some style comments, you might want to wait for @Itms or @wraitii before adressing those.

source/main.cpp
83

Sort it in Alphabetical order :)

473

Do you need it to be a std::string? I'm not sure whether we shouldn't use Cstr instead

672

{ on newline

680

Nuke

source/rlinterface/RLInterface.cpp
222

nullptr

321

no braces

source/rlinterface/RLInterface.h
28

order alphabetically, + newline bewteen special stl headers

76

Is this one used?

irishninja marked 8 inline comments as done.Wed, Jun 24, 11:38 PM

Made the miscellaneous code fixes. Pushing them momentarily!

source/rlinterface/RLInterface.h
76

Nope. Good catch.

irishninja marked an inline comment as done.

Miscellaneous code cleanup. Still need to address the GetGameState comments and the use of jsonspirit rather than strings by @wraitii.

Build failure - The Moirai have given mortals hearts that can endure.

Link to build: https://jenkins.wildfiregames.com/job/vs2015-differential/1986/display/redirect

Build failure - The Moirai have given mortals hearts that can endure.

Link to build: https://jenkins.wildfiregames.com/job/docker-differential/2519/display/redirect

Build failure - The Moirai have given mortals hearts that can endure.

Link to build: https://jenkins.wildfiregames.com/job/macos-differential/887/display/redirect

wraitii added inline comments.Thu, Jun 25, 4:00 PM
source/rlinterface/RLInterface.cpp
56

You could make the lock mutable. This is a common well defined pattern exactly for mutexes. See https://en.cppreference.com/w/cpp/language/cv

irishninja updated this revision to Diff 12470.Sat, Jun 27, 5:30 PM

GetTemplates is now const and removed unused include

Build failure - The Moirai have given mortals hearts that can endure.

Link to build: https://jenkins.wildfiregames.com/job/vs2015-differential/2004/display/redirect

Build failure - The Moirai have given mortals hearts that can endure.

Link to build: https://jenkins.wildfiregames.com/job/docker-differential/2537/display/redirect

Build failure - The Moirai have given mortals hearts that can endure.

Link to build: https://jenkins.wildfiregames.com/job/macos-differential/905/display/redirect

irishninja updated this revision to Diff 12493.Mon, Jun 29, 8:56 PM

Removed sdl linking from premake (only removed the include last time!)

Build failure - The Moirai have given mortals hearts that can endure.

Link to build: https://jenkins.wildfiregames.com/job/vs2015-differential/2017/display/redirect

Build failure - The Moirai have given mortals hearts that can endure.

Link to build: https://jenkins.wildfiregames.com/job/macos-differential/918/display/redirect

Build failure - The Moirai have given mortals hearts that can endure.

Link to build: https://jenkins.wildfiregames.com/job/docker-differential/2550/display/redirect