Page MenuHomeWildfire Games

Rewrite the pathfinder dirtiness information system
ClosedPublic

Authored by Itms on Jun 24 2017, 12:09 AM.

Details

Summary

The dirtiness information, used to perform clever and efficient updates of the pathfinder, was handled in a very hacky way. Three components work together: the obstruction manager keeps track of small changes that don't warrant a full update, the pathfinder keeps track of terrain updates and performs the clever updates, and the AI manager must perform the same updates on its own pathfinder asynchronously with the pathfinder (the AI manager passes data to its worker once per turn).

The previous system was obfuscated, badly commented and unpractical to maintain. Additionally it included a bug that was triggered during water rising in Extinct Volcano, resulting in an OOS on rejoin (the rejoining player would have the correct passability map whereas the host would suffer from incomplete pathfinder updates).

The new code is cleaner, much easier to understand and to fix in case of bugs (notably this one OOS, which is fixed here). The obstruction manager <-> pathfinder and the pathfinder <-> AI interaction are better compartmentalized and should work regardless of what the modders do with their terrain updates...

Test Plan

Test that the OOS is fixed and no new one appears.
Test the AI pathfinding.
Test saving and loading a game (ideally save at a moment when many buildings are built/destroyed).

Check that the patch doesn't worsen the performance in any situation.

Diff Detail

Repository
rP 0 A.D. Public Repository
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

Itms created this revision.Jun 24 2017, 12:09 AM
Itms updated this revision to Diff 2671.Jun 24 2017, 12:39 AM

Improve a bit by removing the useless persistent state leftover from the old code.
Proper initialization of the AI information state.

Vulcan added a subscriber: Vulcan.Jun 24 2017, 12:58 AM

Build is green

Updating workspaces.
Build (release)...
Build (debug)...
Running release tests...
Running cxxtest tests (306 tests)..................................................................................................................................................................................................................................................................................................................OK!
Running debug tests...
Running cxxtest tests (306 tests)..................................................................................................................................................................................................................................................................................................................OK!
Checking XML files...

http://jw:8080/job/phabricator/1615/ for more details.

Executing section Default...
Executing section Source...
Executing section JS...
Executing section XML GUI...
Executing section Python...
Executing section Perl...

http://jw:8080/job/phabricator_lint/232/ for more details.

Build is green

Updating workspaces.
Build (release)...
Build (debug)...
Running release tests...
Running cxxtest tests (306 tests)..................................................................................................................................................................................................................................................................................................................OK!
Running debug tests...
Running cxxtest tests (306 tests)..................................................................................................................................................................................................................................................................................................................OK!
Checking XML files...

http://jw:8080/job/phabricator/1616/ for more details.

Executing section Default...
Executing section Source...
Executing section JS...
Executing section XML GUI...
Executing section Python...
Executing section Perl...

http://jw:8080/job/phabricator_lint/233/ for more details.

mimo edited edge metadata.Jun 24 2017, 10:49 AM
In D675#26947, @Itms wrote:

Improve a bit by removing the useless persistent state leftover from the old code.
Proper initialization of the AI information state.

What is done exactly? up to now, the AI uses the passability maps "default-terrain-only" and "ship-terrain-only" only at initialization (or when reloading a saved game) to determine its connectivity grids. So it assumes that these maps never change! Is this the case except of course for extinct volcano (but that one has a comment saying that AI won't work on it). If other maps also have changes in these two passability maps, that would break reload of saved games in SP.

Allowing connectivity grids changes inside a game would require a few changes in the AI code as currently every movement in based on them (to know if a point is accessible, if a ship should be used, ...) but nothing impossible (only somebody motivated to do the changes). The biggest problem would be on the performance side: we could think to recompute these grids from time to time (for example if we add bridges in the future), but i think doing it on each turn as would be needed for extinct volcanoes will take too much time (although i've never tried to measure how much time is needed for that).

wraitii edited edge metadata.EditedJun 24 2017, 11:07 AM

Mh, correct me if I'm wrong, but this looks like a performance regression compared to D70.

First updateInformation now needs to performa bitwise or of the whole dirtiness grid vs a single pointer swap, which is bound to be much slower, but you also appear to again call a big "New" every turn (l460 of CCmpPathfinder.cpp).

Might be necessary to fix the OOS, but one should be careful here imo.

Edit: also AIPathfinder is a bit of a misnomer I think, and @mimo we might be able to use the hierarchical pathfinder global regions in the future but yes.

elexis edited edge metadata.Jun 24 2017, 8:03 PM

(If nothing else changed the affected code, he code could be tested by applying it to alpha 21 and running rejointests on replays (of which we have more available than for the most recent svn version).
However after a21, rP19145 and rP19149 changed the code, probably some more commits, so that'll either require some digging or some proper svn replays to test with instead.)

Itms planned changes to this revision.Jun 24 2017, 11:27 PM
In D675#26966, @mimo wrote:

What is done exactly? up to now, the AI uses the passability maps "default-terrain-only" and "ship-terrain-only" only at initialization (or when reloading a saved game) to determine its connectivity grids. So it assumes that these maps never change! Is this the case except of course for extinct volcano (but that one has a comment saying that AI won't work on it). If other maps also have changes in these two passability maps, that would break reload of saved games in SP.

Allowing connectivity grids changes inside a game would require a few changes in the AI code as currently every movement in based on them (to know if a point is accessible, if a ship should be used, ...) but nothing impossible (only somebody motivated to do the changes). The biggest problem would be on the performance side: we could think to recompute these grids from time to time (for example if we add bridges in the future), but i think doing it on each turn as would be needed for extinct volcanoes will take too much time (although i've never tried to measure how much time is needed for that).

The patch doesn't change the data exposed to the AI (by the way if it does, the patch has a bug). Since the new pathfinder is in, the AI does receive connectivity grids updates! You have absolutely nothing to do to recompute the grids, it's already done by the AI pathfinder in the engine. So the performance is not a problem here, the only difficult thing is improving Petra so it handles mid-game changes to the terrain.

In D675#26973, @wraitii wrote:

Mh, correct me if I'm wrong, but this looks like a performance regression compared to D70.

First updateInformation now needs to performa bitwise or of the whole dirtiness grid vs a single pointer swap, which is bound to be much slower, but you also appear to again call a big "New" every turn (l460 of CCmpPathfinder.cpp).

Might be necessary to fix the OOS, but one should be careful here imo.

You're totally right, I must be much more careful. I kind of missed D70. I will reintroduce swaps and cleverer memory handling where it's needed, I don't think it's difficult to get rid of the perf regression.

In D675#26973, @wraitii wrote:

Edit: also AIPathfinder is a bit of a misnomer I think, and @mimo we might be able to use the hierarchical pathfinder global regions in the future but yes.

Why is it a misnomer? The AI manager has a LongPathfinder instance.

Itms updated this revision to Diff 2766.Jun 29 2017, 11:01 PM

This version is slightly cleverer ?

Executing section Default...
Executing section Source...
Executing section JS...
Executing section XML GUI...
Executing section Python...
Executing section Perl...

http://jw:8080/job/phabricator_lint/271/ for more details.

Build is green

Updating workspaces.
Build (release)...
Build (debug)...
Running release tests...
Running cxxtest tests (306 tests)..................................................................................................................................................................................................................................................................................................................OK!
Running debug tests...
Running cxxtest tests (306 tests)..................................................................................................................................................................................................................................................................................................................OK!
Checking XML files...

http://jw:8080/job/phabricator/1671/ for more details.

Blackbox test:
The reported OOS is gone.
Tried a game on a regular map with arbitrary rejoins and got no OOS.
Tried the rejointest on the reported OOS with all kinds of turn numbers and got no OOS.
Got no hash mismatches either with the rejointest on a current (playerless) multiplayer test replay of that scenario map.
Would like to test it in actual svn game (svn lobby empty recently), but from the testing I expect it to work correctly.
(The years in the header of the files are correct)

elexis added a comment.Jul 5 2017, 2:33 PM

Besides, it crashes atlas when starting:

#0  0x00007ffff36c7428 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54
#1  0x00007ffff36c902a in __GI_abort () at abort.c:89
#2  0x00000000009e9246 in sys_display_error (
    text=text@entry=0x7fff7d7c0000 L"Assertion failed: \"dirtinessGrid.compare_sizes(&b.dirtinessGrid)\"\r\nLocation: Grid.h:247 (MergeAndClear)\r\n\r\nCall stack:\r\n\r\n(0x9de32e) /media/elexis/Externe/0ad_svn/dev/trunk/binaries/system/pyrogenesis"..., flags=flags@entry=6) at ../../../source/lib/sysdep/os/unix/unix.cpp:267
#3  0x0000000000971a9b in CallDisplayError (flags=<optimized out>, 
    text=0x7fff7d7c0000 L"Assertion failed: \"dirtinessGrid.compare_sizes(&b.dirtinessGrid)\"\r\nLocation: Grid.h:247 (MergeAndClear)\r\n\r\nCall stack:\r\n\r\n(0x9de32e) /media/elexis/Externe/0ad_svn/dev/trunk/binaries/system/pyrogenesis"...) at ../../../source/lib/debug.cpp:383
#4  debug_DisplayError (suppress=0xdbc000 <GridUpdateInformation::MergeAndClear(GridUpdateInformation&)::suppress__>, func=0xaa5078 <_ZZN21GridUpdateInformation13MergeAndClearERS_E8__func__> "MergeAndClear", 
    line=247, pathname=<optimized out>, lastFuncToSkip=<optimized out>, context=0x7fffe053a580, flags=<optimized out>, 
    description=0x7fffe0539f40 L"Assertion failed: \"dirtinessGrid.compare_sizes(&b.dirtinessGrid)\"") at ../../../source/lib/debug.cpp:474
#5  debug_OnAssertionFailure (expr=expr@entry=0xaa23e8 L"dirtinessGrid.compare_sizes(&b.dirtinessGrid)", 
    suppress=suppress@entry=0xdbc000 <GridUpdateInformation::MergeAndClear(GridUpdateInformation&)::suppress__>, file=file@entry=0xaa2338 L"../../../source/simulation2/helpers/Grid.h", line=line@entry=247, 
    func=func@entry=0xaa5078 <_ZZN21GridUpdateInformation13MergeAndClearERS_E8__func__> "MergeAndClear") at ../../../source/lib/debug.cpp:563
#6  0x00000000004fdaaa in GridUpdateInformation::MergeAndClear (b=..., this=0x7fffd89896a0) at ../../../source/simulation2/helpers/Grid.h:247
#7  CCmpObstructionManager::UpdateInformations (this=0x7fffd8990fe0, informations=...) at ../../../source/simulation2/components/CCmpObstructionManager.cpp:503
#8  0x00000000004d3611 in CCmpPathfinder::UpdateGrid (this=0x7fffd89895f0) at ../../../source/simulation2/components/CCmpPathfinder.cpp:480
#9  0x00000000004b2387 in CComponentManager::BroadcastMessage (this=0x7fffd840add8, msg=...) at ../../../source/simulation2/system/ComponentManager.cpp:1049
#10 0x00000000005260a8 in CCmpWaterManager::SetWaterLevel (this=this@entry=0x7fffd8685240, h=...) at ../../../source/simulation2/components/CCmpWaterManager.cpp:120
#11 0x0000000000740731 in CXMLReader::ReadEnvironment (this=this@entry=0x7fff60a87d20, parent=...) at ../../../source/graphics/MapReader.cpp:727
#12 0x0000000000744476 in CXMLReader::ReadXML (this=0x7fff60a87d20) at ../../../source/graphics/MapReader.cpp:1093
#13 0x000000000074534e in CMapReader::ReadXML (this=0x7fffd84a8300) at ../../../source/graphics/MapReader.cpp:1195
#14 0x000000000073ddc4 in MemFunThunk<CMapReader> (param=0x7fffd85c1f10) at ../../../source/ps/LoaderThunks.h:60
#15 0x000000000060b7c9 in LDR_ProgressiveLoad (time_budget=time_budget@entry=100, description=description@entry=0x7fffe053b580 L"Simulation init", max_chars=max_chars@entry=100, 
    progress_percent=progress_percent@entry=0x7fffe053b57c) at ../../../source/ps/Loader.cpp:228
#16 0x000000000060bb85 in LDR_NonprogressiveLoad () at ../../../source/ps/Loader.cpp:318
#17 0x00000000008a041f in (anonymous namespace)::StartGame (attrs=..., attrs@entry=...) at ../../../source/tools/atlas/GameInterface/Handlers/MapHandlers.cpp:70
#18 0x00000000008a1298 in AtlasMessage::fLoadMap (msg=<optimized out>) at ../../../source/tools/atlas/GameInterface/Handlers/MapHandlers.cpp:165
#19 0x0000000000881b97 in RunEngine (data=<optimized out>) at ../../../source/tools/atlas/GameInterface/GameLoop.cpp:173
#20 0x00007ffff3a636ba in start_thread (arg=0x7fffe053c700) at pthread_create.c:333
#21 0x00007ffff37993dd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
elexis added a comment.Jul 6 2017, 3:56 AM

5 r19873 replays, one of which was an extinct volcano game.

Itms updated this revision to Diff 2832.Jul 6 2017, 9:18 PM

Fix elexis' bug in Atlas by adding a check.

Executing section Default...
Executing section Source...
Executing section JS...
Executing section XML GUI...
Executing section Python...
Executing section Perl...

http://jw:8080/job/phabricator_lint/292/ for more details.

Vulcan added a comment.Jul 7 2017, 1:24 AM

Build is green

Updating workspaces.
Build (release)...
Build (debug)...
Running release tests...
Running cxxtest tests (306 tests)..................................................................................................................................................................................................................................................................................................................OK!
Running debug tests...
Running cxxtest tests (306 tests)..................................................................................................................................................................................................................................................................................................................OK!
Checking XML files...

http://jw:8080/job/phabricator/1704/ for more details.

elexis added a comment.Jul 7 2017, 2:46 PM

In case someone wants to test the patch with the actual-game replays with the current (rP19880) svn and without reverting rP19877, revert only rP19874 by using svn merge -c -19874 .
and create the scenario-equivalent map using atlas.
The two replays in that folder from 2017-07-06 are on extinct volcano.

Where did the context hide?

source/simulation2/components/CCmpPathfinder_Common.h
155 ↗(On Diff #2832)

trailing whitespace

Itms added a subscriber: leper.

Hello @leper, I'm sorry for not putting you in the loop earlier, I wrongfully thought you had commented here before.

Would you accept taking a look at the code and comment on the memory usage and the things wraitii pointed out before going awol? That would be great and lift the last block to a commit freeze. If you don't have time please tell me asap. Sorry again for the rushy review request.

leper edited edge metadata.Jul 10 2017, 2:34 AM
In D675#28796, @Itms wrote:

Hello @leper, I'm sorry for not putting you in the loop earlier, I wrongfully thought you had commented here before.

Would you accept taking a look at the code and comment on the memory usage and the things wraitii pointed out before going awol? That would be great and lift the last block to a commit freeze. If you don't have time please tell me asap. Sorry again for the rushy review request.

A diff with context would be nice, apart from that dunno.

Itms updated this revision to Diff 2885.Jul 10 2017, 2:12 PM

Without trailing whitespace and with context (which doesn't seem to work properly on Windows).

Executing section Default...
Executing section Source...
Executing section JS...
Executing section XML GUI...
Executing section Python...
Executing section Perl...

http://jw:8080/job/phabricator_lint/307/ for more details.

Build is green

Updating workspaces.
Build (release)...
Build (debug)...
Running release tests...
Running cxxtest tests (306 tests)..................................................................................................................................................................................................................................................................................................................OK!
Running debug tests...
Running cxxtest tests (306 tests)..................................................................................................................................................................................................................................................................................................................OK!
Checking XML files...

http://jw:8080/job/phabricator/1730/ for more details.

I'm getting someone to test this on the affected map with the patch applied.

svn (19902) 1v1 with Grugnas on this map WITH this patch applied:

ffffffff added a subscriber: ffffffff.EditedJul 10 2017, 11:30 PM

Patch applied fpre vs Player (NO AI/NO Player) vs Bot vs Bot (no Human players)

good movement

ram good

AI good

Map Mainland

Shouldn't be a lot worse and has (according to others) the benefit of being correct regarding memory usage, apart from that one question regarding the AI interaction which seems slightly wasteful and unnecessarily duplicated.

Might want to profile this, but then again the old code is broken, so I guess there isn't much we can do.

source/simulation2/components/CCmpPathfinder.cpp
525 ↗(On Diff #2885)

This sounds a lot like we should be able to just swap them.

I assume this line is here because we cannot rely on m_DirtinessInformation to still exist like this when we want it in the AIManager. But since we just get this there, then discard it (aka FlushAIPathfinderDirt...) why do we even store it explicitly? If the dirtiness information does not change while the AI is updated with the current state of it then we could just pass m_DirtinessInformation to it and after the AI used it it will probably be invalidated upon the next grid update.

source/simulation2/helpers/Grid.h
98 ↗(On Diff #2885)

Why not memcmp?

105 ↗(On Diff #2885)

Not sure if that is the best name, though empty would also sound somewhat wrong.

129 ↗(On Diff #2885)

Does this actually happen a lot? (Just curious.)

248 ↗(On Diff #2885)

Wouldn't call that move sementics, which is a slightly different thing.

256 ↗(On Diff #2885)

|=

257 ↗(On Diff #2885)

|=

(Trivia:
Setting Map Seed 2924282261, 2 Players, Medium Mapsize, Player 1 Athen Team 1, Player Brit Team 2 in Atlas produces the correct map.
However the TriggerScripts are executed in the inverse order for scenario/skirmish maps for reason I don't know.
I've swapped the order of the TriggerScripts entry in the commands.txt and the contents of the two files (Regicide.js and extinct_volcano_triggers.js), but the heroes are always spawned after those 15 gaia units in the gaia towers on that map, so entity IDs are off.
Changing the scripts itself to do some code earlier or later fails because the RNG changes when executing things.
I can't get it to replay the actual game on a scenario map and spent almost the same time on trying to fix it than playing that game, so I guess pretend the replay has never been uploaded


)

Itms added a comment.Jul 12 2017, 2:51 PM

Thanks a lot for taking a look! I'll address your remarks and if I am right about your comment about swapping, I will commit.

source/simulation2/components/CCmpPathfinder.cpp
525 ↗(On Diff #2885)

I'm not sure I follow your reasoning. The issue is that there can be several grid updates between two AI updates (when the water rises and the static entities are destroyed, typically, but that can happen every time the mods do something funky). So swapping twice in this situation will erase the information from the first grid update and the AI will be partially updated.

That's why we need to merge, but in most situations (actually, each time the AI information is flushed, i.e. each first update in a simulation turn), the code will be clever and perform a swap.

source/simulation2/helpers/Grid.h
98 ↗(On Diff #2885)

Good idea.

105 ↗(On Diff #2885)

I thought about void, empty, null, all of them have other meanings that make them somewhat wrong. What about flat? It sounds like being 2D instead of 3D to me, but it conveys the right meaning.

129 ↗(On Diff #2885)

I don't think it does. I just added the inexpensive check because it was in operator= and it looked sensible.

248 ↗(On Diff #2885)

Ok. "optimized through a careful memory management" ?

256 ↗(On Diff #2885)

Ha, I first tried ||=, which doesn't exist, and I didn't realize |= would work.

leper added inline comments.Jul 12 2017, 10:17 PM
source/simulation2/components/CCmpPathfinder.cpp
525 ↗(On Diff #2885)

Ah, so the issue is that the AI (or rather the AIManager) isn't updated every time the grid is updated.

Seems fine, might be interesting to figure out how often the non-swap case occurs (since that seems potentially somewhat slower).

source/simulation2/helpers/Grid.h
105 ↗(On Diff #2885)

Not sure if that is better than the current one. Also I just noticed that this is the only two-word function name for this class.

Dunno, pick something or leave it like this.

256 ↗(On Diff #2885)

It's a boolean, so | and || (apart from the former not really being allowed to short-circuit (which in this case doesn't matter) are equivalent ;-)

This revision was automatically updated to reflect the committed changes.

The patch was tested in a 1v1v1 after the commit and after the autobuild in rP19918 with borg-, ffffffff and mapkoc.
The latter was OOS (hence some "market exchange warnings in the replay), but that was a missing build.
After doing a clean compile, he could replay the match without errors.
borg- and mapkoc did not rejoin and were not OOS and borg- could replay it entirely without OOS.

I could create the equivalent scenario map (but only after commenting out the GarrisonWoodenTowers call in the triggers script! We should have some convention to not execute any code when loading the script but only after processing the simulation, so that we can always reproduce equivalent scneario maps without changing the code!!)
Could run several rejointests and they all succeeded: