Page MenuHomeWildfire Games

Hierarchical Pathfinder out of bounds bug
ClosedPublic

Authored by temple on Sep 22 2017, 9:33 PM.

Details

Summary

As I was watching the replay from #4787 a second time, I experienced lag at a certain point and thought that might've happened the first time too. So I watched it a third time and indeed there was a big lag spike:

There didn't seem to be anything too unusual happening at that point in the game:

turn 1376 500
cmd 2 {"type":"walk","entities":[1771,1792,1800,1824,1857,1890,1894,1895,1902,1903,1917,1918,1922,1964,1965,1981,1982,1992,2003,2004,2006,2046,2047,2048,2067,2068],"x":732.886474609375,"z":344.46600341796875,"queued":false}
cmd 2 {"type":"walk","entities":[1771,1792,1800,1824,1857,1890,1894,1895,1902,1903,1917,1918,1922,1964,1965,1981,1982,1992,2003,2004,2006,2046,2047,2048,2067,2068],"x":768,"z":437.0575866699219,"queued":false}
end
hash-quick 4202c46eaf6fe44c65077769c0484f0c
turn 1377 500
cmd 1 {"type":"gather","entities":[1790,2034,2035,2036,2037,2038,2039,2040,2041],"target":177,"queued":false}
cmd 2 {"type":"walk","entities":[1771,1792,1800,1824,1857,1890,1894,1895,1902,1903,1917,1918,1922,1964,1965,1981,1982,1992,2003,2004,2006,2046,2047,2048,2067,2068],"x":768,"z":458.6866760253906,"queued":false}
cmd 2 {"type":"walk","entities":[1771,1792,1800,1824,1857,1890,1894,1895,1902,1903,1917,1918,1922,1964,1965,1981,1982,1992,2003,2004,2006,2046,2047,2048,2067,2068],"x":764.3375854492188,"z":437.4991760253906,"queued":false}
cmd 2 {"type":"walk","entities":[1771,1792,1800,1824,1857,1890,1894,1895,1902,1903,1917,1918,1922,1964,1965,1981,1982,1992,2003,2004,2006,2046,2047,2048,2067,2068],"x":757.325927734375,"z":436.5977478027344,"queued":false}
end
hash-quick f1acad3b8598f8306513a3e47ccc3683

But actually on both turns there was a command to walk to the very right edge of the map, x = 768. And this turned out to be the problem.

When we're exactly on the top edge or right edge of a region, the computed navcell in RegionNearestNavcellInGoal() is outside the bounds of m_Regions:

	// Calculate the navcell that contains the center of the goal.
	int gi = (goal.x >> Pathfinding::NAVCELL_SIZE_LOG2).ToInt_RoundToNegInfinity();
	int gj = (goal.z >> Pathfinding::NAVCELL_SIZE_LOG2).ToInt_RoundToNegInfinity();

The x and z coordinates have a 16-bit fractional part, and the chunk size is 96, so on average this will only happen one out of every 96 * 2**16 = 6 million times. But on small and large maps, the map size is an exact multiple of 96, so it happens whenever you click beyond the top or right edge and it sends the max coordinate = map size. (It's possible to get coordinates less than 0 or greater than the map size if you aren't too greedy; I haven't explored why it doesn't send 0 or the map size in those cases.)

When you go out of bounds on the right side of the array it wraps around to the left, so you get sane-looking values. And when you go out of bounds on the top it continues into the next region (since the regions are created right after each other), so you again can get sane-looking values rather than randomness. Thus sometimes it will say you can reach this navcell when actually you can't.

Then MakeGoalReachable() will say you can reach the goal, but the while loop in ComputeJPSPath in the long pathfinder will take a long time looking at a lot of different tiles and won't actually get there. Thus the huge lag spike.

Once units get to the edge of the map after the initial lag spike, they repeat the impossible calculation over and over again so it can easily bring a game to a grinding halt or crash.

(There might be similar bugs in PathGoal::NavcellContainsGoal() and PathGoal::NavcellRectContainsGoal() which are used in the long pathfinder, occurring when you're exactly on the top edge or right edge of a navcell, but I don't know how much trouble they can cause, if any.)

Test Plan

I checked the other instances of m_Regions[j][i] and they all seemed to have sane inputs.

Diff Detail

Repository
rP 0 A.D. Public Repository
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

temple created this revision.Sep 22 2017, 9:33 PM

To clarify a little, there's already a check if (!goal.RectContainsGoal(x0, z0, x1, z1)) before this is called, so we already know 0 <= i,j <= CHUNK_SIZE. It's just the rare occasion when i or j = CHUNK_SIZE that there's a problem. The max part in my patch is unnecessary but I guess I was being extra cautious.

wraitii wanna try to find the hidden bug or commit it if you don't find any but tried hard?

I have some changes there in D53... Not sure if that was to address the same bug.

Lionkanzen added a subscriber: Lionkanzen.
mimo accepted this revision.Oct 2 2017, 7:22 PM
mimo added a subscriber: mimo.

Thanks for the patch. Tested that indeed we have this systematic out-of-bounds on small maps, which is fixed by the patch.
But as the max part is not needed, i'll remove it while committing.

This revision is now accepted and ready to land.Oct 2 2017, 7:22 PM
This revision was automatically updated to reflect the committed changes.