- rP21837: Fix network FSM errors when a client closes the game during the loading screen…
- Trac Tickets
Notice that this message is useful in the ingame state but not elsewhere.
(I know, TLDR, but we wanted reviews and prevent broken code, so forcefeeding proof of correctness to everyone x) )
The NetServer runs in a separate thread, the NetClient in the main thread.
So the FSM error occurs iff. the main thread is destroyed before the NetServer thread I believe.
Maybe the circumstances are more than that.
Reproducing the issue is entirely coupled to luck (it seems to work better when selecting mainland and pressing alt+f4 when it's at 6%). But even when killing at the exact same time of the mapgen process, sometimes it's triggered, other times not.
But this condition is missing whether we like it or not, so we have to add it and adding it makes our clients not trigger FSM errors in the NetServer anymore.
As reported in the ticket, a second way to trigger the error message on the NetServer is to Alt+F4 the client while it rejoins:
ERROR: Net server: Error running FSM update (type=25 state=5) NMT_END_COMMAND_BATCH while NSS_JOIN_SYNCING
NSS_JOIN_SYNCING state specification is here D1556#62860
The server started to or finished transfering the simulation state to deserialize and the client has not finished loading the map and deserializing the state yet.
I couldn't reproduce the issue however.
This leads me to believe that the NetClient just doesn't have chance to send out his message before it's choked off.
Afte 10 attempts I finally could reproduce it.
Generally, I wonder if these messages should be shown like that.
At some point we should also think about the possibility of evil clients killing games by sending broken packets.
The FSM errors could be used to spam errors on the host machine.
(So maybe for that scenario, rather than not showing these error messsages by default, we might think about adding an (possibly configurable) automatic kick if clients cause too many errors, or kick button link on the LOGERROR as proposed for the lagwarnings in D1461.)
Thanks for the review Imarok, good to put this case to a close years later.
- Missing comment: The statement should explanain why it's necessary to begin with. When one assumes that the code works, it's not clear why it's needed. Only archeology has shown that it was introduced by rP14732 to try to flush out the resign just before calling EndGame after clicking on Exit and clicking on Resign in the resulting question (from rP14772). So if this code would work, we should add // Send resign before exiting the game or so.
- Delete code instead? If we can only trigger the FSM error 1 out of 10 times it means that the code only does what it should 1 out of 10 times... Testint to call leaveGame instantly after resignGame in alpha 23 proves that the resign is also sent only 1 out of 10 times. Calling EndGame() and destroying the g_NetClient after posting the network command was the case prior to rP15106. So the latter commit is the third piece of evidence that the patched code doesn't work and can be nuked. See also https://trac.wildfiregames.com/ticket/2420#comment:6.
So are we going for a straight revert of rP14732 rather than fixing a bug in code that is almost never effective?
|99 ↗||(On Diff #6712)|
I suspect that this +2 when not ready yet may even have triggered the Client %d (%s) is ready for turn %d, but expected %d breakpoint in CNetServerTurnManager::NotifyFinishedClientCommands that is reported in #3643 and replaced with a LOGERROR in rP21023 but never understood how it could be triggered.