Page MenuHomeWildfire Games

Unknown Player has rejoined (Broadcast calls should mention their adressed client states explicitly)
ClosedPublic

Authored by elexis on Dec 27 2016, 4:10 PM.

Details

Summary

The Broadcast function of the NetServer sends to all clients that are in the gamesetup or ingame state, avoiding those in the loading screen, resulting in missing player-assignment updates when another client starts a rejoin while the current client is generating the map.

The first proposed patch extends the Broadcast function by an argument specifying the addressed client states.
The second diff solves the reported defect by accounting for clients in the loading state when refreshing the player assignments.

Test Plan
  1. Get a unix distribution (so that simultaneous file reads are possible)
  2. Start hosting a game with late observers enabled (see options).
  3. Open the application in two more windows.
  4. Click on "Multiplayer -> Join Game" and fill in two different playernames and the correct IP.
  5. Click on join in one window, then quickly in the other window.

Result: One window will end up with a "Unknown Player has rejoined" message and throw errors each time that client sends a chat message.
Expected Result: The playername should be contained in the rejoin message. Chat mesages should be presented with the correct username and not error out.

test_Net.h could be extended to cover such a case, but would have to control the client state (via something like the non-existant GetState in the commented out code of test_Net.h), because we can't rely on the map loading time (race condition).

Diff Detail

Repository
rP 0 A.D. Public Repository
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

elexis updated this revision to Diff 46.Dec 27 2016, 4:10 PM
elexis retitled this revision from to Unknown Player has rejoined (Broadcast calls should mention their adressed client states explicitly).
elexis updated this object.
elexis edited the test plan for this revision. (Show Details)
elexis set the repository for this revision to rP 0 A.D. Public Repository.
Vulcan added a subscriber: Vulcan.Dec 27 2016, 4:28 PM

Build has FAILED

Build (release)...
../../../source/gui/CChart.cpp:38:41: warning: unused parameter ‘Message’ [-Wunused-parameter]
 void CChart::HandleMessage(SGUIMessage& Message)
                                         ^
../../../source/lib/tex/tex_png.cpp: In member function ‘virtual Status TexCodecPng::encode(Tex*, DynArray*) const’:
../../../source/lib/tex/tex_png.cpp:309:9: warning: variable ‘ret’ might be clobbered by ‘longjmp’ or ‘vfork’ [-Wclobbered]
  Status ret = ERR::FAIL;
         ^
Build (debug)...
../../../source/gui/CChart.cpp:38:41: warning: unused parameter ‘Message’ [-Wunused-parameter]
 void CChart::HandleMessage(SGUIMessage& Message)
                                         ^
Running debug tests...
Build (release)...
../../../source/simulation2/components/CCmpUnitMotion.cpp:350:45: warning: unused parameter ‘paramNode’ [-Wunused-parameter]
  virtual void Deserialize(const CParamNode& paramNode, IDeserializer& deserialize)
                                             ^
../../../source/simulation2/components/CCmpUnitMotion.cpp:350:71: warning: unused parameter ‘deserialize’ [-Wunused-parameter]
  virtual void Deserialize(const CParamNode& paramNode, IDeserializer& deserialize)
                                                                       ^
../../../source/simulation2/components/CCmpUnitMotion.cpp: In instantiation of ‘void CCmpUnitMotion::SerializeCommon(S&) [with S = ISerializer]’:
../../../source/simulation2/components/CCmpUnitMotion.cpp:347:28:   required from here
../../../source/simulation2/components/CCmpUnitMotion.cpp:314:26: warning: unused parameter ‘serialize’ [-Wunused-parameter]
  void SerializeCommon(S& serialize)
                          ^
Build (debug)...
../../../source/simulation2/components/CCmpUnitMotion.cpp:350:45: warning: unused parameter ‘paramNode’ [-Wunused-parameter]
  virtual void Deserialize(const CParamNode& paramNode, IDeserializer& deserialize)
                                             ^
../../../source/simulation2/components/CCmpUnitMotion.cpp:350:71: warning: unused parameter ‘deserialize’ [-Wunused-parameter]
  virtual void Deserialize(const CParamNode& paramNode, IDeserializer& deserialize)
                                                                       ^
../../../source/simulation2/components/CCmpUnitMotion.cpp: In instantiation of ‘void CCmpUnitMotion::SerializeCommon(S&) [with S = ISerializer]’:
../../../source/simulation2/components/CCmpUnitMotion.cpp:347:28:   required from here
../../../source/simulation2/components/CCmpUnitMotion.cpp:314:26: warning: unused parameter ‘serialize’ [-Wunused-parameter]
  void SerializeCommon(S& serialize)
                          ^
Running debug tests...
./test_dbg: error while loading shared libraries: libmozjs38-ps-debug.so: cannot open shared object file: No such file or directory
Build (release)...
In file included from ../../../source/simulation2/components/CCmpPathfinder_Common.h:39:0,
                 from ../../../source/simulation2/components/CCmpPathfinder.cpp:25:
../../../source/simulation2/helpers/LongPathfinder.h:266:15: error: ‘mutex’ in namespace ‘std’ does not name a type
  mutable std::mutex m_Mutex;
               ^
make[1]: *** [obj/simulation2_Release/CCmpPathfinder.o] Error 1
make[1]: *** Waiting for unfinished jobs....
make: *** [simulation2] Error 2
make: *** Waiting for unfinished jobs....
Build (release)...
Build (debug)...
Running debug tests...
./test_dbg: error while loading shared libraries: libmozjs38-ps-debug.so: cannot open shared object file: No such file or directory
Build (release)...
Build (debug)...
Running debug tests...
./test_dbg: error while loading shared libraries: libmozjs38-ps-debug.so: cannot open shared object file: No such file or directory

Link to build: http://jw:8080/job/phabricator/25/
See console output for more information: http://jw:8080/job/phabricator/25/console

elexis updated this revision to Diff 47.Dec 27 2016, 5:44 PM
elexis removed rP 0 A.D. Public Repository as the repository for this revision.

Reupload to test the compilation again.

Imarok accepted this revision.Dec 28 2016, 11:27 PM
Imarok edited edge metadata.

But don't forget the actual fix for the issue

This revision is now accepted and ready to land.Dec 28 2016, 11:27 PM
elexis planned changes to this revision.Dec 29 2016, 3:51 PM

As stated on trac,

  • The Kick part should use Broadcast too.

As stated in irc,

  • I should lookup whether im doing unneeded copies there by not adding the ampersand.
  • When testing, Imarok could only reproduce the error when sending chat messages, but never received the "has rejoined" message with the affected client. That might be a race condition (some client rejoining too fast), might be explored next time when looking closely at that snippet.

Also

  • Wanted to ensure that D16 doesn't need a rebase, since echotangoecho is currently unavailable and if I rebase the patch, one (respectively two) other reviewers were needed, so might commit D16 prior to this one.
source/network/NetServer.cpp
352 ↗(On Diff #47)

Such sequential If-statements should be merged.

With an increasing number of clients, Broadcast induces repeated serialization and might be performance-optimized as pointed out by Sandarac.

source/network/NetServer.cpp
347 ↗(On Diff #47)
elexis edited edge metadata.Dec 29 2016, 8:07 PM
elexis set the repository for this revision to rP 0 A.D. Public Repository.
elexis updated this revision to Diff 83.Dec 30 2016, 5:55 PM
elexis marked 2 inline comments as done.
elexis updated this object.
elexis edited the test plan for this revision. (Show Details)

Use const ref, Broadcast for kick information messages, merge two If statements.

This revision is now accepted and ready to land.Dec 30 2016, 5:55 PM
elexis requested a review of this revision.Dec 30 2016, 5:56 PM
elexis edited edge metadata.

Thanks for the review and testing.

Luckily D16 and this patch have no merge conflicts, independent of the order they are applied.

In D17#482, @elexis wrote:
  • The Kick part should use Broadcast too.

Done

  • I should lookup whether im doing unneeded copies there by not adding the ampersand.

Const ref added.

Merged those two if's.

The proposal to send rejoin, chat and OOS messages to clients in the loading screen (as is done with the kick/ban messages already) can be done separately, to be easily reviewable after this cleanup commit. The pause message could be fixed in the same go (as it seems pausing while someone is in the loading screen doesn't set the state correctly, and there was this pausing FSM error too #4261.

source/network/NetServer.cpp
852 ↗(On Diff #47)

Actual bugfix to the originally reported "Unknown Player has rejoined" / broken chat message issue is going to be inserted here and won't be part of this proposed cleanup.

1090 ↗(On Diff #47)

Missing space, ha!

1090 ↗(On Diff #47)

Proposing the introduction of NSS_JOIN_SYNCING so that players receive chat messages that were sent while the player was in the loading screen. Needs new FSM transition.

1106 ↗(On Diff #47)

Looks like this change would fix #3199 Clicking "not ready" just before the host starts the game causes a network fsm error, but actually doesn't as the previous code also avoided NSS_JOIN_SYNCING.

1243 ↗(On Diff #47)

This silently changed entry fixes an actual error, refs #3199:

ERROR: Net client: Error running FSM update (type=17(NMT_REJOINED) state=7(NCS_JOIN_SYNCING))

Reproduce:

  1. Start a game in a networked host with late observers enabled
  2. Open two more instances of 0 A.D., open the join tab, enter the local ip and two distinct playernames
  3. Click on join in the first window. Wait until the progress bar has reached > 90%.
  4. Join in the second window.

The first client will have finished the rejoin when the second client still syncrhonizing, throwing the mentioned error.

Proposing to add NCS_JOIN_SYNCING to the broadcast call and supplement the missing FSM transition to display rejoin messages to players who just rejoined - in a separate commit and to present our thorough thought xd


When testing, Imarok could only reproduce the error when sending chat messages, but never received the "has rejoined" message with the affected client. That might be a race condition (some client rejoining too fast), might be explored next time when looking closely at that snippet.

It might be the second client finishing the rejoin before the first client does, instead of the first one (who started rejoining a bit sooner), thus the Server would not send the broken rejoin message to the first client.

1332 ↗(On Diff #47)

Consider a player rejoining while each existing client is waiting for everyone to finish the loading screen.

The fact that joining while the game is loading was silently ruled out in r16945 late observers may or may not be considered relevant. Perhaps we do want to support that again, so that late observers don't need to wait for the loading screen of those that just started the game.

source/network/NetServer.h
173 ↗(On Diff #47)

Added const ref.

source/network/NetTurnManager.cpp
679 ↗(On Diff #47)

We should likely send the OOS message to clients that are in the loading screen, so the GUI message arrives when they finish the load, informing the player that the game is messed up.

(In fact that message could be sent too when players rejoin a game that is already OOS, but out of scope of this differential)

Build has FAILED

Link to build: http://jw:8080/job/phabricator/54/
See console output for more information: http://jw:8080/job/phabricator/54/console

elexis updated this revision to Diff 87.Dec 30 2016, 7:41 PM
elexis edited edge metadata.

Reupload of the first patch and update.

elexis updated this revision to Diff 88.Dec 30 2016, 7:48 PM
elexis marked 2 inline comments as done and an inline comment as not done.

Two whitespace fixes.

Build is green

Updating workspaces.
Build (release)...
../../../source/gui/CChart.cpp:38:41: warning: unused parameter ‘Message’ [-Wunused-parameter]
 void CChart::HandleMessage(SGUIMessage& Message)
                                         ^
../../../source/lib/tex/tex_png.cpp: In member function ‘virtual Status TexCodecPng::encode(Tex*, DynArray*) const’:
../../../source/lib/tex/tex_png.cpp:309:9: warning: variable ‘ret’ might be clobbered by ‘longjmp’ or ‘vfork’ [-Wclobbered]
  Status ret = ERR::FAIL;
         ^
Build (debug)...
../../../source/gui/CChart.cpp:38:41: warning: unused parameter ‘Message’ [-Wunused-parameter]
 void CChart::HandleMessage(SGUIMessage& Message)
                                         ^
Running debug tests...
Running cxxtest tests (302 tests)..............................................................................................................................................................................................................................................................................................................OK!

http://jw:8080/job/phabricator/58/ for more details.

Build is green

Updating workspaces.
Build (release)...
../../../source/gui/CChart.cpp:38:41: warning: unused parameter ‘Message’ [-Wunused-parameter]
 void CChart::HandleMessage(SGUIMessage& Message)
                                         ^
../../../source/lib/tex/tex_png.cpp: In member function ‘virtual Status TexCodecPng::encode(Tex*, DynArray*) const’:
../../../source/lib/tex/tex_png.cpp:309:9: warning: variable ‘ret’ might be clobbered by ‘longjmp’ or ‘vfork’ [-Wclobbered]
  Status ret = ERR::FAIL;
         ^
Build (debug)...
../../../source/gui/CChart.cpp:38:41: warning: unused parameter ‘Message’ [-Wunused-parameter]
 void CChart::HandleMessage(SGUIMessage& Message)
                                         ^
Running debug tests...
Running cxxtest tests (302 tests)..............................................................................................................................................................................................................................................................................................................OK!

http://jw:8080/job/phabricator/59/ for more details.

Imarok accepted this revision.Jan 5 2017, 1:38 PM
Imarok edited edge metadata.
This revision is now accepted and ready to land.Jan 5 2017, 1:38 PM
Imarok updated this object.Jan 5 2017, 4:05 PM
Imarok edited edge metadata.
Imarok updated the Trac tickets for this revision.
elexis added inline comments.Jan 25 2017, 6:19 PM
source/network/NetServer.cpp
1243 ↗(On Diff #47)

Correction, the described error only occurs when broadcasting to NSS_JOIN_SYNCING, which wasn't the case before. At least I can't reproduce it otherwise.
It would have to be a very unlikely edge case where a client in the loading screen NSS_PREGAME is targetted by the broadcast, but the player switches its state to NCS_JOIN_SYNCING just before receiving the message.
This could only happen if two clients finish the rejoin simultaneously within a margin of a second, I guess.

Adding the state should still be done.

elexis added inline comments.Jan 25 2017, 8:06 PM
source/network/NetServer.cpp
1053 ↗(On Diff #88)

Removal of NSS_PREGAME correct, since rejoiners are skip that state, as seen in CNetServerWorker::OnAuthenticate.

1243 ↗(On Diff #47)

Correction, the rare edge can't occur either as clients in the loading screen are only in NSS_PREGAME if they didn't rejoin, if they did, they are transfered to NSS_JOIN_SYNCING in the if (isRejoining) part of CNetServerWorker::OnAuthenticate.

elexis changed the visibility from "All Users" to "Public (No Login Required)".Dec 20 2018, 9:56 AM