Page MenuHomeWildfire Games

Fix PercentMapExplored OOS
ClosedPublic

Authored by elexis on Jun 11 2017, 9:34 PM.
Tags
None
Referenced Files
Unknown Object (File)
Fri, Sep 13, 5:01 PM
Unknown Object (File)
Fri, Sep 6, 10:53 AM
Unknown Object (File)
Sat, Aug 31, 6:21 AM
F154213: commands_danubius.txt
Jun 12 2017, 5:19 AM
F154219: debugpatch
Jun 12 2017, 5:19 AM
Subscribers

Details

Summary

As reported in #4598, there is an OOS on rejoin on a tiny anatolian plateau map.
The reason for that seems to be that ExploreTerritories() doesn't exclude tiles that are out of world, whereas all the other places do exclude these tiles.

Test Plan

Add the following debug output to observe that the values are in sync before but not after this function:

		debug_printf("GetPercentMapExplored %d\n", GetPercentMapExplored(1));
		debug_printf("GetUnionPercentMapExplored %d\n", GetUnionPercentMapExplored({1}));

Start a networked game on anatolian plateau on a tiny mapsize.
Rejoin before second 30 and wait until second 30 passed. Without the patch there is an OOS, with the patch it should not occur.

Diff Detail

Repository
rP 0 A.D. Public Repository
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

Build is green

Updating workspaces.
Build (release)...
Build (debug)...
Running release tests...
Running cxxtest tests (306 tests)..................................................................................................................................................................................................................................................................................................................OK!
Running debug tests...
Running cxxtest tests (306 tests)..................................................................................................................................................................................................................................................................................................................OK!
Checking XML files...

http://jw:8080/job/phabricator/1539/ for more details.

Executing section Default...
Executing section Source...
Executing section JS...
Executing section XML GUI...
Executing section Python...
Executing section Perl...

http://jw:8080/job/phabricator_lint/179/ for more details.

How the OOS occurs:

The OOS on rejoin comes from the fact that ExploreTerritories is called on init and marks tiles outside of the world as explored (in both m_LosState and the cached number of explored tiles m_ExploredVertices) and
that a rejoined client calls ResetDerivedData upon deserialization which will init the cache only with explored tiles inside the world.
Since m_ExploredVertices is only used for the statistics tracker and hadn't been serialized until then, it never became OOS before the summary screen graph patch.
Since m_LosState is serialized, it contained wrong data for both rejoined and non-rejoined client in sync.

Why the attached patch is not complete:
Imarok has shown that with the attached replay that there is still a discrepancy between what GetPercentMapExplored and GetUnionPercentMapExplored returns:

This is because GetPercentMapExplored includes explored tiles outside of the world whereas GetUnionPercentMapExplored counts only those inside of the world.
Because ResetDerivedData equally ignores tiles out of the world, the OOS can't be fixed with the proposed patch.
The discrepancy simply arises from the inner loop of the ExploreTerritories function which was an oversight.

Why the updated patch is correct and complete:
By printing out the number of explored tiles per player m_ExploredVertices and the non-cached counts in m_LosState we can find the place where the discrepancy occurs and
it's the inner loop as mentioned above. See this debug patch:


Since every other write access to m_LosState checks for the location validity, the patch is complete (with regards to these two variables at least).

Blame:
rP9951 implements player territory exploration in a function named UpdateTerritoriesLos. It was forgotton to add a check, but it didn't cause bugs.
rP13576 implements caching of explored vertices. It didn't add the missing check from the previous commit either and caused different map exploration ratios to be seen for a rejoined client.
(rP15612 implements fog of war and removes the call to UpdateTerritoriesLos temporarily.)
(rP15681 implements ExploreMap option, renamed the function to ExploreTerritories and used it again.)

Add the missing check in the inner loop.

Build is green

Updating workspaces.
Build (release)...
Build (debug)...
Running release tests...
Running cxxtest tests (306 tests)..................................................................................................................................................................................................................................................................................................................OK!
Running debug tests...
Running cxxtest tests (306 tests)..................................................................................................................................................................................................................................................................................................................OK!
Checking XML files...

http://jw:8080/job/phabricator/1540/ for more details.

Executing section Default...
Executing section Source...
Executing section JS...
Executing section XML GUI...
Executing section Python...
Executing section Perl...

http://jw:8080/job/phabricator_lint/180/ for more details.

This looks good, I can test it home tonight. Nice catch!

I believe the off-world-ness is coupled either with the territory tiles, either the vision tiles, so one of the two added checks is either redundant either wrong (depending on whether the correct type of tile to check is bigger or smaller).

In D630#25748, @Itms wrote:

I believe the off-world-ness is coupled either with the territory tiles, either the vision tiles, so one of the two added checks is either redundant either wrong (depending on whether the correct type of tile to check is bigger or smaller).

The first if statement

  • alone isn't sufficient as proven by the replay.
  • is already included in the second if. But if it's logically correct, skipping those nested loops would be good for performance.
  • were wrong if one of these tiles would be partially in the world and partially out of the world.
  • LosIsOffWorld loops over m_TerrainVerticesPerSide and i and j loop over cmpTerritoryManager->GetTerritoryGrid() but I don't understand their units yet. So no hypothesis based upon code.
  • The correctness of the first if-statement can be disproven by showing coordinates that would not be skipped with the second if-statement but in the first.
							if (LosIsOffWorld(i, j) && !LosIsOffWorld(ti, tj))
								LOGERROR("Correctness of the first loop is disproven");
This error was shown 4045 times with Imaroks danubius replay.
So the first if statement is wrong.

The second loop alone is still demonstrably correct and sufficient by the same means as before:
The above debug patch complains if some tile out of world is explored. This is the case with the current svn code if there is a CC close to the map border. It does not throw an error anymore if we keep only the second loop.

Also inline those 2 variables.

Build is green

Updating workspaces.
Build (release)...
Build (debug)...
Running release tests...
Running cxxtest tests (306 tests)..................................................................................................................................................................................................................................................................................................................OK!
Running debug tests...
Running cxxtest tests (306 tests)..................................................................................................................................................................................................................................................................................................................OK!
Checking XML files...

http://jw:8080/job/phabricator/1541/ for more details.

Executing section Default...
Executing section Source...
Executing section JS...
Executing section XML GUI...
Executing section Python...
Executing section Perl...

http://jw:8080/job/phabricator_lint/181/ for more details.

This looks good and it builds and runs on Windows. I didn't look into reproducing the OOS though, I can do that later unless @Imarok beats me to it ?

OOS is fixed.
I tested it on some maps, and it seems, the cache works now.

(By calling CheckExploredVertices from the debug patch we can verify that no tile off world has been explored (correctness) and we can check for completeness by reading the code looking for m_LosState write access, checking whether those coordinates have been tested against LosIsOffWorld prior)

This revision was automatically updated to reflect the committed changes.