As I was watching the replay from #4787 a second time, I experienced lag at a certain point and thought that might've happened the first time too. So I watched it a third time and indeed there was a big lag spike:
There didn't seem to be anything too unusual happening at that point in the game:
turn 1376 500 cmd 2 {"type":"walk","entities":[1771,1792,1800,1824,1857,1890,1894,1895,1902,1903,1917,1918,1922,1964,1965,1981,1982,1992,2003,2004,2006,2046,2047,2048,2067,2068],"x":732.886474609375,"z":344.46600341796875,"queued":false} cmd 2 {"type":"walk","entities":[1771,1792,1800,1824,1857,1890,1894,1895,1902,1903,1917,1918,1922,1964,1965,1981,1982,1992,2003,2004,2006,2046,2047,2048,2067,2068],"x":768,"z":437.0575866699219,"queued":false} end hash-quick 4202c46eaf6fe44c65077769c0484f0c turn 1377 500 cmd 1 {"type":"gather","entities":[1790,2034,2035,2036,2037,2038,2039,2040,2041],"target":177,"queued":false} cmd 2 {"type":"walk","entities":[1771,1792,1800,1824,1857,1890,1894,1895,1902,1903,1917,1918,1922,1964,1965,1981,1982,1992,2003,2004,2006,2046,2047,2048,2067,2068],"x":768,"z":458.6866760253906,"queued":false} cmd 2 {"type":"walk","entities":[1771,1792,1800,1824,1857,1890,1894,1895,1902,1903,1917,1918,1922,1964,1965,1981,1982,1992,2003,2004,2006,2046,2047,2048,2067,2068],"x":764.3375854492188,"z":437.4991760253906,"queued":false} cmd 2 {"type":"walk","entities":[1771,1792,1800,1824,1857,1890,1894,1895,1902,1903,1917,1918,1922,1964,1965,1981,1982,1992,2003,2004,2006,2046,2047,2048,2067,2068],"x":757.325927734375,"z":436.5977478027344,"queued":false} end hash-quick f1acad3b8598f8306513a3e47ccc3683
But actually on both turns there was a command to walk to the very right edge of the map, x = 768. And this turned out to be the problem.
When we're exactly on the top edge or right edge of a region, the computed navcell in RegionNearestNavcellInGoal() is outside the bounds of m_Regions:
// Calculate the navcell that contains the center of the goal. int gi = (goal.x >> Pathfinding::NAVCELL_SIZE_LOG2).ToInt_RoundToNegInfinity(); int gj = (goal.z >> Pathfinding::NAVCELL_SIZE_LOG2).ToInt_RoundToNegInfinity();
The x and z coordinates have a 16-bit fractional part, and the chunk size is 96, so on average this will only happen one out of every 96 * 2**16 = 6 million times. But on small and large maps, the map size is an exact multiple of 96, so it happens whenever you click beyond the top or right edge and it sends the max coordinate = map size. (It's possible to get coordinates less than 0 or greater than the map size if you aren't too greedy; I haven't explored why it doesn't send 0 or the map size in those cases.)
When you go out of bounds on the right side of the array it wraps around to the left, so you get sane-looking values. And when you go out of bounds on the top it continues into the next region (since the regions are created right after each other), so you again can get sane-looking values rather than randomness. Thus sometimes it will say you can reach this navcell when actually you can't.
Then MakeGoalReachable() will say you can reach the goal, but the while loop in ComputeJPSPath in the long pathfinder will take a long time looking at a lot of different tiles and won't actually get there. Thus the huge lag spike.
Once units get to the edge of the map after the initial lag spike, they repeat the impossible calculation over and over again so it can easily bring a game to a grinding halt or crash.
(There might be similar bugs in PathGoal::NavcellContainsGoal() and PathGoal::NavcellRectContainsGoal() which are used in the long pathfinder, occurring when you're exactly on the top edge or right edge of a navcell, but I don't know how much trouble they can cause, if any.)