Page MenuHomeWildfire Games

NAT Traversal / STUN UDP Punching
ClosedPublic

Authored by fcxSanya on Apr 20 2017, 8:25 AM.

Details

Reviewers
elexis
Commits
rP19703: STUN + XMPP ICE implementation.
Trac Tickets
#2305
Summary

Uploading the patch here per elexis' request (https://trac.wildfiregames.com/ticket/2305#comment:40), but still proceeding to work on it.

Current state: the patch works with some network configurations, see details here: https://trac.wildfiregames.com/ticket/2305#comment:39

TODO:

  • re-test with multiple 0ad instances on the same machine (host + client and multiple clients) (previously crashed, but should be fixed now)
  • split STK-based code into a separate file
  • change the STUN-server setting (once our own server would be configured)
Test Plan

[1] Windows build note:
glooxwrapper was changed in the patch and it's linked via DLL (--use-shared-glooxwrapper is set in update-workspaces.bat), so either:
a) it can be linked as a static library by removing --use-shared-glooxwrapper
or
b) DLL has to be rebuilt by adding --build-shared-glooxwrapper

Diff Detail

Repository
rP 0 A.D. Public Repository
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
leper added inline comments.Apr 20 2017, 5:59 PM
source/network/StunClient.cpp
1–4 ↗(On Diff #1371)

No + to be seen there...

1–4 ↗(On Diff #1371)

Depending on how much you expect to change there we could treat it roughly similar to a few of the source/third_party files and keep the different style, which might make sharing fixes with STK easier.

46 ↗(On Diff #1371)

STK code style

87 ↗(On Diff #1371)

Defining pointers to a type, and variables of that type on the same line tends to be confusing to some readers.

264 ↗(On Diff #1371)

rather explain why that is skipped

337 ↗(On Diff #1371)

These 4 lines seem to be used quite a few times, might make sense to unify that.

source/network/StunClient.h
1 ↗(On Diff #1371)

Missing copyright header.

24 ↗(On Diff #1371)

not really, the strange double space before the comment starts should however be replaced by one space.

elexis retitled this revision from [WIP] NAT Traversal to NAT Traversal / STUN UDP Punch though.May 3 2017, 11:26 PM
  • Licensing issue solved for the most part, since all supertuxkart contributors gave their ACK. It should be committed to their repository still.
  • Should be checked whether it compiles --without-lobby

@fcxSanya can you update the patch so we can test it? Getting failed hunks for gamesetup_mp.js (file moved), ScriptFunctions.cpp since the NetTurnManager split in D16 (gamesetup.js still succeeds). It will make it easier for people to test it, which should be priority number one now (besides fixing the bugs found). Here a fixed diff for those who want to test:

Tested this thing with @Grugnas and @user1 and it worked (everyone could host on non-forwarded ports).

@vladislavbelov had compile issues on windows

, however

binaries\system\pyrogenesis.exe : fatal error LNK1120: 11 unresolved externals, i.e.: engine.lib(GameSetup.obj) : error LNK2019: unresolved external symbol "public: bool __thiscall CNetClient::SetupConnection(class CStr8 const &,unsigned short,struct _ENetHost *)" (?SetupConnection@CNetClient@@QAE_NABVCStr8@@GPAU_ENetHost@@@Z) referenced in function "bool __cdecl Autostart(class CmdLineArgs const &)" (?Autostart@@YA_NABVCmdLineArgs@@@Z)

(looks like NetClient.cpp misses the = NULL default argument. Gamesetup.cpp or something)

(23:37:21) Vladislav: elexis: I found what's the problem, it's because windows includes and so on
(23:37:29) Vladislav: i.e. windows has own min/max, etc`

I have identified two SEGFAULTs:
1. MUST FIX: Host with a client that has its port forwarded and join in another window. Reproduced by @user1.

Thread 1 "pyrogenesis" received signal SIGSEGV, Segmentation fault.
parseStunResponse[abi:cxx11](_ENetHost*) (transactionHost=transactionHost@entry=0x0) at ../../../source/network/StunClient.cpp:180
180		int len = recvfrom(transactionHost->socket, buffer, LEN, 0,

Full stack:

  1. May fix later: Join a host, join the same host with another instance:
Thread 1 "pyrogenesis" received signal SIGSEGV, Segmentation fault.
0x0000000000474782 in recvfrom (__addr_len=0x7fffffff88dc, __addr=0x7fffffff8990, __flags=0, __n=2048, __buf=0x7fffffff8ec0, __fd=<error reading variable: Cannot access memory at address 0x0>)
    at /usr/include/x86_64-linux-gnu/bits/socket2.h:76
76	  return __recvfrom_alias (__fd, __buf, __n, __flags, __addr, __addr_len);

source/lobby/XmppClient.cpp
1129 ↗(On Diff #1371)

UNUSED() session

source/lobby/glooxwrapper/glooxwrapper.cpp
288 ↗(On Diff #1371)

make that session UNUSED()

839 ↗(On Diff #1371)

probably make those errors LOGERROR and the others LOGMESSAGE after including CLogger,
so that the user at least sees something while the window appears to be frozen

Also

  • It's totally frozen while trying to connect to the STUN server, would be great to execute some callback or wait to let the renderer thread render
  • Can this become optional? (Can be done in a separate patch) For example Host -> "Do you want to use STUN?" (or something more user friendly)" and a config option dropdown ("Never STUN", "Ask", "Always STUN") where "Ask" is the default
  • Had this unreproducible segfault. Perhaps we can guess how it occured and then reproduce it, for example an unreachable STUN server?
fcxSanya updated this revision to Diff 1654.May 4 2017, 11:46 PM

Merged the latest master from the github repo (rP19507) and fixed Windows build on it (one more #undef was needed: bc492f1)

The corresponding commit is here: 26a631d

fcxSanya added inline comments.May 5 2017, 1:16 AM
build/premake/premake4.lua
598 ↗(On Diff #1371)

It's for SDL_Delay (the only cross-platform sleep method I was able to find in our codebase) in StunClient.cpp and NetServer.cpp, there may be a better alternative, but I didn't find it yet

source/gui/scripting/ScriptFunctions.cpp
400 ↗(On Diff #1371)

It was one of the test changes I made while trying to get the procedure working (see 2ed4795 and #2305#comment:36) and I expected to be able to revert it later. But when I got a reliable working version (where I was able to host on both my mobile connections) and reverted this change I wasn't able to host on one of the connections anymore (while still was able to reliable host on another).
So I'm not sure how it helps (maybe by binding client to the same port as host or by not allowing the client to be bound to some unacceptable port or something else). I will try to change it to serverPort and re-test.

vladislavbelov added inline comments.May 5 2017, 3:24 AM
build/premake/premake4.lua
598 ↗(On Diff #1371)

boost and C++11 have this_thread::sleep_for, but I'm not sure that VS2013 compiles the last one.

fcxSanya edited the summary of this revision. (Show Details)May 7 2017, 8:16 AM
fcxSanya edited the test plan for this revision. (Show Details)
fcxSanya updated this revision to Diff 1768.May 8 2017, 4:17 PM

Refactoring:

  • merge duplicate hole-punching-messages-sending code (1c0666d + 936d3b8)
  • replace duplicate code blocks with a loop (29d5116)
  • replace SDL_Delay with std::this_thread::sleep_for in StunClient.cpp, get rid of sdl dependency for the 'network' project (ee4e8b1)

and a fix:

  • send hole-punching messages on client only when stun is enabled (660cba1)

Note: I didn't test this version yet, since proper testing with two devices / connections is time-consuming, I plan to accumulate more changes and then re-test everything together.

fcxSanya added inline comments.May 8 2017, 4:40 PM
build/premake/premake4.lua
598 ↗(On Diff #1371)

boost and C++11 have this_thread::sleep_for, but I'm not sure that VS2013 compiles the last one

Changed it to C++11'th this_thread::sleep_for in ee4e8b1, seems to compile fine in VS2013. Thanks for the hint!

source/gui/scripting/ScriptFunctions.cpp
421–426 ↗(On Diff #1371)

going to replace the copy-pasted statements with a loop

Fixed in 29d5116

source/network/NetServer.cpp
1430 ↗(On Diff #1371)

We do have quite similar code in the scriptinterface, is that duplicated

Yes, the duplicate code in ScriptFunctions.cpp was one of the expiremental changes (see #2305#comment:37) which I done in the fastest way possible, since I wasn't sure whether they will stay or I will revert them later.

it would be nice to put all of that in one place so we don't have the same include/undef combination multiple times

Done in 1c0666d + 936d3b8.

1436–1441 ↗(On Diff #1371)

going to replace the copy-pasted statements with a loop

Fixed in 29d5116

echotangoecho added inline comments.
source/network/StunClient.cpp
61 ↗(On Diff #1768)

should be GetFromBuffer.

89 ↗(On Diff #1768)

struct isn't needed here in c++, is it?

91 ↗(On Diff #1768)

for consistency, sizeof(hints) might be better.

96 ↗(On Diff #1768)

use nullptr here instead of NULL? (and in the other cases below).

99 ↗(On Diff #1768)

Use debug_warn instead of printf? or what do we use generally in those cases?

143 ↗(On Diff #1768)

spaces after commas

42 ↗(On Diff #1371)

should be AddUInt16 instead of addUint16

Since Feature Freez is May 28th, i.e. in 14 days, we must get that segfault, unused warnings, potentially the appearance freeze fixed and the patch committed asap (assuming we want player feedback and other devs to work on improvements until CF).

Still getting those unused warnings when compiling, fix them by adding UNUSED() to the header, remove the result variable:

../../../source/lobby/XmppClient.cpp: In member function ‘virtual void XmppClient::SendStunEndpointToHost(StunClient::StunEndpoint, const string&)’:
../../../source/lobby/XmppClient.cpp:1111:7: warning: unused variable ‘result’ [-Wunused-variable]
  bool result = session.sessionInitiate(ipStr, stunEndpoint.port);
       ^~~~~~
../../../source/lobby/XmppClient.cpp: In member function ‘virtual void XmppClient::handleSessionAction(gloox::Jingle::Action, glooxwrapper::Jingle::Session*, const glooxwrapper::Jingle::Session::Jingle*)’:
../../../source/lobby/XmppClient.cpp:1114:99: warning: unused parameter ‘session’ [-Wunused-parameter]
 void XmppClient::handleSessionAction(gloox::Jingle::Action action, glooxwrapper::Jingle::Session *session, const glooxwrapper::Jingle::Session::Jingle *jingle)
                                                                                                   ^~~~~~~
JSInterface_Lobby.cpp
NetServerTurnManager.cpp
Linking lobby
StunClient.cpp
==== Building glooxwrapper (release) ====
glooxwrapper.cpp
NetClientTurnManager.cpp
../../../source/lobby/glooxwrapper/glooxwrapper.cpp: In member function ‘virtual void glooxwrapper::SessionHandlerWrapper::handleSessionActionError(gloox::Jingle::Action, gloox::Jingle::Session*, const gloox::Error*)’:
../../../source/lobby/glooxwrapper/glooxwrapper.cpp:293:62: warning: unused parameter ‘action’ [-Wunused-parameter]
  virtual void handleSessionActionError(gloox::Jingle::Action action, gloox::Jingle::Session* session, const gloox::Error* error)
                                                              ^~~~~~
../../../source/lobby/glooxwrapper/glooxwrapper.cpp:293:94: warning: unused parameter ‘session’ [-Wunused-parameter]
  virtual void handleSessionActionError(gloox::Jingle::Action action, gloox::Jingle::Session* session, const gloox::Error* error)
                                                                                              ^~~~~~~
../../../source/lobby/glooxwrapper/glooxwrapper.cpp:293:123: warning: unused parameter ‘error’ [-Wunused-parameter]
  virtual void handleSessionActionError(gloox::Jingle::Action action, gloox::Jingle::Session* session, const gloox::Error* error)
                                                                                                                           ^~~~~
../../../source/lobby/glooxwrapper/glooxwrapper.cpp: In member function ‘virtual void glooxwrapper::SessionHandlerWrapper::handleIncomingSession(gloox::Jingle::Session*)’:
../../../source/lobby/glooxwrapper/glooxwrapper.cpp:298:61: warning: unused parameter ‘session’ [-Wunused-parameter]
  virtual void handleIncomingSession(gloox::Jingle::Session* session )
                                                             ^~~~~~~

I've addressed a number of remarks and going to add more right now:
https://github.com/elexis1/0ad/tree/stk-stun-2-a22

For example there should be a checkbox in gamesetup_mp.xml that enables / disables STUN usage.

binaries/data/config/default.cfg
381 ↗(On Diff #1768)

The description should explain the non-technical people (those who can't setup their router) what this setting does.

binaries/data/mods/public/gui/gamesetup/gamesetup.js
2000 ↗(On Diff #1371)

)\n {

binaries/data/mods/public/gui/gamesetup_mp/gamesetup_mp.js
27 ↗(On Diff #1768)
/**
  * IP address of the
334 ↗(On Diff #1768)

let

binaries/data/mods/public/gui/lobby/lobby.js
733 ↗(On Diff #1371)

removeRatingFromNick(hostPlayerName)

source/gui/scripting/ScriptFunctions.cpp
400 ↗(On Diff #1371)

Perhaps the segfault I got when trying to join a game on port 20595 if that port is forwarded is related to this place.

All remarks besides the segfaults and htonl have been addressed in https://github.com/elexis1/0ad/tree/stk-stun-2-a22 / https://github.com/AlexanderOlkhovskiy/0ad/pull/1

source/gui/scripting/ScriptFunctions.cpp
380 ↗(On Diff #1768)

Does that even work in theory and practice independent of the host using STUN?

27 ↗(On Diff #1371)

JSInterface_Network and JSInterface_Game would definitely make this nicer to read.

Using to_be32 is nice to dislodge those includes, but further complication such support isn't the cleanest thing either

source/lobby/glooxwrapper/glooxwrapper.cpp
323 ↗(On Diff #1371)

404

source/network/NetServer.h
137 ↗(On Diff #1371)

and most occurances in the file are CStr

source/network/StunClient.cpp
1–4 ↗(On Diff #1371)

When comparing it to the original, we see that a lot of logic has changed, so can't really keep it as is (which would have had the advantage of pulling stk changes directly)
https://github.com/supertuxkart/stk-code/blob/6a578e374fc2fe45191111ad0d81a45262b473e3/src/network/protocols/get_public_address.cpp

85 ↗(On Diff #1371)

debug_printf -> LOGERROR / LOGMESSAGERENDER

87 ↗(On Diff #1371)

and that asterisk goes to the struct

fcxSanya updated this revision to Diff 1925.May 14 2017, 4:55 PM
fcxSanya edited the summary of this revision. (Show Details)
fcxSanya added inline comments.May 14 2017, 5:48 PM
source/gui/scripting/ScriptFunctions.cpp
380 ↗(On Diff #1768)

Does that even work in theory and practice independent of the host using STUN?

In theory if STUN is disabled on either client or host (or on both), the client should connect to the 'standard' lobby-detected host IP + host-reported port just like it's doing without the patch (see joinSelectedGame in lobby.js).

I didn't test test it on practice though.

source/lobby/glooxwrapper/glooxwrapper.cpp
867 ↗(On Diff #1371)

If those are arbitrary identifiers, "component_1" might be nicer to read than "1"

I guess I used the example values from XEP-0176 (see Table 2 in section 5.3), which are in turn based on SDP attributes from RFC 5245 (section 15.1), where component-id is defined as 1*5DIGIT

fcxSanya updated this revision to Diff 1932.May 14 2017, 9:34 PM

Attempt to bind STUN-enabled client to a couple of subsequent ports if the default one is occupied (see d3eb265).

Presumable should fix (but not tested yet) the crashes reported by @elexis in D364#17065 and D364#17075.

found some more (minor) style issues.

source/network/StunClient.cpp
151 ↗(On Diff #1932)

unneeded struct

187 ↗(On Diff #1932)

another unneeded struct

191 ↗(On Diff #1932)

unneeded struct

199 ↗(On Diff #1932)

unneeded struct

205 ↗(On Diff #1932)

can merge these if statements here

Lionkanzen rescinded a token.
fcxSanya updated this revision to Diff 2181.EditedMay 25 2017, 12:44 PM
fcxSanya edited the summary of this revision. (Show Details)

Add an explicit "hostUsername" param into the lobby game record instead of parsing the first player name (see ed7ccf8)

Edit: also, I tested this version with my two mobile connections and I'm still able to connect, so no regressions due to recent changes

elexis retitled this revision from NAT Traversal / STUN UDP Punch though to NAT Traversal / STUN UDP Punching.May 31 2017, 5:13 AM

Add an explicit "hostUsername" param into the lobby game record instead of parsing the first player name (see ed7ccf8)

Good, that was a hacky workaround, likely wrong in some case.

Edit: also, I tested this version with my two mobile connections and I'm still able to connect, so no regressions due to recent changes

That commit was also good because it incidentally fixed the wrong [0] I had added there before the `splitRatingFromNick' function was added that needed it :P

source/lobby/XmppClient.h
126 ↗(On Diff #2181)

Notice half of the member functions start with lowercase.

135 ↗(On Diff #2181)

virtual. Renamed to handleSessionInitiation since this only handles that event.

source/lobby/glooxwrapper/glooxwrapper.cpp
295 ↗(On Diff #2181)

(This could become useful once we implement asynchroneous calls)

298 ↗(On Diff #2181)

(This occurs for the host if someone attempts to join the game via ICE+STUN but we have nothing to do here as the connection to the STUN server is already registered)

315 ↗(On Diff #2181)

The engine should be agnostic of 0 A.D. (it should be usable for other games, f.e. for 500 AD). I could find no use or need for this Plugin and propose to delete it.

816 ↗(On Diff #2181)

Appears correct to own the wrapped object as it's not used otherwise and needs to be deleted after usage.

874 ↗(On Diff #2181)

network should be a number

source/network/StunClient.cpp
1–2 ↗(On Diff #2181)

Guess we can't do much about this out-of-place header.

elexis accepted this revision.May 31 2017, 7:24 AM

Patch ready to be committed from my side if https://github.com/AlexanderOlkhovskiy/0ad/pull/2 implementing the changes mentioned below is merged. Request review if you want to discuss something.

Previous work:
The patch was developed within the last 18 months (3 release cycles).
On trac we see reviews since from 8 months ago.
Withing the last 4 months the code by fcxSanya was tested by him and his collegue, me, user1, Vladislav, Grugnas, javiergodas and/or reviewed by echotangoecho with the STUN server stun.ekiga.net.
The relicensing of the StunClient of supertuxkart was authorized by all eight relevant supertuxkart developers.

Recent developments:

Infrastructure:
lobby.wildfiregames.com using the ejabberd STUN listen module up and running (thanks implodedok for helping)
The ejabberd XMPP server comes with native STUN support implementing rfc5245 and is configured by adding one line in ejabberd config, forwarding the port and adding a DNS record.

Code:
The code was developed to work with rfc3489 (the draft of rfc5245) and tested with the Vovida.org 0.96 STUN server at stun.ekiga.net that is complient with that draft too.
Hence a handful of bytes in the ejabberd message are not supported by the StunClient from the supertuxkart Google Summer of Code 2013 project.
This is fixed by three tiny commits:

rfc5245 support (2 lines of code each):

  • Refactoring to support iteration over multple attributes (required because ejabberd sends the SOFTWARE version first) by simply skipping irrelevant attributes.
  • IP & port "deobfuscation" (XOR-MAPPED-ADDRESS, which just applies a XOR with the magic cookie).
  • Parsing of two bits was added (comprehension-required and IETF Review, because XOR-MAPPED-ADDRESS becomes comprehension-required with rfc5245)

Documentation and Verification:

  • Proof of correctness of the STUN client is offered by documenting protocol constants and protocol logic with specific references to the RFCs (f.e. previous code skipped bytes for no transparent reason).
  • Hardcoded numbers were replaced with protocol constants and variables were renamed for readability.
  • With the network analysis tool Wireshark, STUN messages can bit dissected bit per bit with description refering to the RFCs, making the correctness observable. This screenshot shows the three differences between the two server responses mentioned above:

Bugfixes:

  • GUI error instead of a segfault if the STUN server is unreachable.
  • GUI error instead of silently hosting without STUN if the STUN connection failed (by returning a pointer or null upon error)
  • Don't freeze for exactly 33 minutes if the servername exists but doesn't respond
  • Show a human readable error strings instead of error codes. Removed debug and user IP address output.

Small improvements:

  • As it requires XMPP-ICE, hosting without the lobby can't be supported and hence isn't offered anymore.
  • To prevent confusion of non-technical users, offer the choice to use or not use STUN only to the host and let clients connect via STUN exactly if the host uses STUN.
  • Make STUN server port as configurable as the STUN server hostname

Things to consider in a review:

  • network/hostbyte order changes in the deobfuscation
  • Memory leaks

Future work:

  • IPv6 support
  • Asynchroneous calls instead of freezing the GUI until the connection was established or rejected

Thanks a lot for the project!

This revision is now accepted and ready to land.May 31 2017, 7:24 AM
elexis added inline comments.May 31 2017, 6:04 PM
source/gui/scripting/ScriptFunctions.cpp
399 ↗(On Diff #2181)

(100ms works well for me and reduces the impact of the freeze-join problem drastically, can become a config number trivially, got some 5 line commit for that)

Stan added a subscriber: Stan.May 31 2017, 6:08 PM
Stan added inline comments.
source/network/StunClient.cpp
1–2 ↗(On Diff #2181)

Update the date maybe ?

This revision was automatically updated to reflect the committed changes.