Page MenuHomeWildfire Games

Fix PercentMapExplored OOS
ClosedPublic

Authored by elexis on Jun 11 2017, 9:34 PM.

Details

Summary

As reported in #4598, there is an OOS on rejoin on a tiny anatolian plateau map.
The reason for that seems to be that ExploreTerritories() doesn't exclude tiles that are out of world, whereas all the other places do exclude these tiles.

Test Plan

Add the following debug output to observe that the values are in sync before but not after this function:

		debug_printf("GetPercentMapExplored %d\n", GetPercentMapExplored(1));
		debug_printf("GetUnionPercentMapExplored %d\n", GetUnionPercentMapExplored({1}));

Start a networked game on anatolian plateau on a tiny mapsize.
Rejoin before second 30 and wait until second 30 passed. Without the patch there is an OOS, with the patch it should not occur.

Diff Detail

Repository
rP 0 A.D. Public Repository
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

elexis created this revision.Jun 11 2017, 9:34 PM
Vulcan added a subscriber: Vulcan.Jun 11 2017, 10:20 PM

Build is green

Updating workspaces.
Build (release)...
Build (debug)...
Running release tests...
Running cxxtest tests (306 tests)..................................................................................................................................................................................................................................................................................................................OK!
Running debug tests...
Running cxxtest tests (306 tests)..................................................................................................................................................................................................................................................................................................................OK!
Checking XML files...

http://jw:8080/job/phabricator/1539/ for more details.

Executing section Default...
Executing section Source...
Executing section JS...
Executing section XML GUI...
Executing section Python...
Executing section Perl...

http://jw:8080/job/phabricator_lint/179/ for more details.

How the OOS occurs:

The OOS on rejoin comes from the fact that ExploreTerritories is called on init and marks tiles outside of the world as explored (in both m_LosState and the cached number of explored tiles m_ExploredVertices) and
that a rejoined client calls ResetDerivedData upon deserialization which will init the cache only with explored tiles inside the world.
Since m_ExploredVertices is only used for the statistics tracker and hadn't been serialized until then, it never became OOS before the summary screen graph patch.
Since m_LosState is serialized, it contained wrong data for both rejoined and non-rejoined client in sync.

Why the attached patch is not complete:
Imarok has shown that with the attached replay that there is still a discrepancy between what GetPercentMapExplored and GetUnionPercentMapExplored returns:

This is because GetPercentMapExplored includes explored tiles outside of the world whereas GetUnionPercentMapExplored counts only those inside of the world.
Because ResetDerivedData equally ignores tiles out of the world, the OOS can't be fixed with the proposed patch.
The discrepancy simply arises from the inner loop of the ExploreTerritories function which was an oversight.

Why the updated patch is correct and complete:
By printing out the number of explored tiles per player m_ExploredVertices and the non-cached counts in m_LosState we can find the place where the discrepancy occurs and
it's the inner loop as mentioned above. See this debug patch:


Since every other write access to m_LosState checks for the location validity, the patch is complete (with regards to these two variables at least).

Blame:
rP9951 implements player territory exploration in a function named UpdateTerritoriesLos. It was forgotton to add a check, but it didn't cause bugs.
rP13576 implements caching of explored vertices. It didn't add the missing check from the previous commit either and caused different map exploration ratios to be seen for a rejoined client.
(rP15612 implements fog of war and removes the call to UpdateTerritoriesLos temporarily.)
(rP15681 implements ExploreMap option, renamed the function to ExploreTerritories and used it again.)

elexis updated this revision to Diff 2532.Jun 12 2017, 5:20 AM

Add the missing check in the inner loop.

Build is green

Updating workspaces.
Build (release)...
Build (debug)...
Running release tests...
Running cxxtest tests (306 tests)..................................................................................................................................................................................................................................................................................................................OK!
Running debug tests...
Running cxxtest tests (306 tests)..................................................................................................................................................................................................................................................................................................................OK!
Checking XML files...

http://jw:8080/job/phabricator/1540/ for more details.

Executing section Default...
Executing section Source...
Executing section JS...
Executing section XML GUI...
Executing section Python...
Executing section Perl...

http://jw:8080/job/phabricator_lint/180/ for more details.

Itms edited edge metadata.Jun 12 2017, 2:21 PM

This looks good, I can test it home tonight. Nice catch!

I believe the off-world-ness is coupled either with the territory tiles, either the vision tiles, so one of the two added checks is either redundant either wrong (depending on whether the correct type of tile to check is bigger or smaller).

elexis planned changes to this revision.Jun 12 2017, 3:26 PM
In D630#25748, @Itms wrote:

I believe the off-world-ness is coupled either with the territory tiles, either the vision tiles, so one of the two added checks is either redundant either wrong (depending on whether the correct type of tile to check is bigger or smaller).

The first if statement

  • alone isn't sufficient as proven by the replay.
  • is already included in the second if. But if it's logically correct, skipping those nested loops would be good for performance.
  • were wrong if one of these tiles would be partially in the world and partially out of the world.
  • LosIsOffWorld loops over m_TerrainVerticesPerSide and i and j loop over cmpTerritoryManager->GetTerritoryGrid() but I don't understand their units yet. So no hypothesis based upon code.
  • The correctness of the first if-statement can be disproven by showing coordinates that would not be skipped with the second if-statement but in the first.
							if (LosIsOffWorld(i, j) && !LosIsOffWorld(ti, tj))
								LOGERROR("Correctness of the first loop is disproven");
This error was shown 4045 times with Imaroks danubius replay.
So the first if statement is wrong.
elexis updated this revision to Diff 2533.Jun 12 2017, 3:27 PM

The second loop alone is still demonstrably correct and sufficient by the same means as before:
The above debug patch complains if some tile out of world is explored. This is the case with the current svn code if there is a CC close to the map border. It does not throw an error anymore if we keep only the second loop.

Also inline those 2 variables.

Build is green

Updating workspaces.
Build (release)...
Build (debug)...
Running release tests...
Running cxxtest tests (306 tests)..................................................................................................................................................................................................................................................................................................................OK!
Running debug tests...
Running cxxtest tests (306 tests)..................................................................................................................................................................................................................................................................................................................OK!
Checking XML files...

http://jw:8080/job/phabricator/1541/ for more details.

Executing section Default...
Executing section Source...
Executing section JS...
Executing section XML GUI...
Executing section Python...
Executing section Perl...

http://jw:8080/job/phabricator_lint/181/ for more details.

Itms added a comment.Jun 12 2017, 8:29 PM

This looks good and it builds and runs on Windows. I didn't look into reproducing the OOS though, I can do that later unless @Imarok beats me to it ?

Imarok edited edge metadata.Jun 14 2017, 4:06 PM

OOS is fixed.
I tested it on some maps, and it seems, the cache works now.

(By calling CheckExploredVertices from the debug patch we can verify that no tile off world has been explored (correctness) and we can check for completeness by reading the code looking for m_LosState write access, checking whether those coordinates have been tested against LosIsOffWorld prior)

This revision was automatically updated to reflect the committed changes.