Page MenuHomeWildfire Games

Fix lobby utf8 conversion following rP22856 / D2264
ClosedPublic

Authored by elexis on Mon, Sep 9, 3:45 PM.

Details

Summary

In rP22856 / D2264, the struct was removed, so that one has the opportunity to create JS::Values directly with arbitrary properties as GUI messages rather than converting properties into a struct with 2 string properties.

This however missed to apply the wstring_from_utf8 conversion, so strings like "ä" are displayed incorrectly on the lobby.

Test Plan

Try removing the utf8_from_wstring from g_L10n.Tranlsate for a string that is known to contain an UTF8 symbol in the pology file and see that the utf8_from_wstring is necessary for translated strings.
See gloox specs to see that gloox in fact uses UTF8 for strings

Make sure that every utf8 conversion is applied where necessary.
Make sure that every string copy is avoided that can be avoided, or find a good reason to copy.
Test all of the GUI messages with two player accounts and a mod account.

Diff Detail

Repository
rP 0 A.D. Public Repository
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

elexis created this revision.Mon, Sep 9, 3:45 PM
elexis planned changes to this revision.Mon, Sep 9, 3:46 PM

This version includes D2128 and some other unrelated changes that I now may split.

Vulcan added a comment.Mon, Sep 9, 3:49 PM

Build failure - The Moirai have given mortals hearts that can endure.

Link to build: https://jenkins.wildfiregames.com/job/docker-differential/618/display/redirect

elexis added a comment.Mon, Sep 9, 3:49 PM

(+ forgotton unrevisioned file)

Vulcan added a comment.Mon, Sep 9, 3:51 PM

Build failure - The Moirai have given mortals hearts that can endure.

Link to build: https://jenkins.wildfiregames.com/job/vs2015-differential/109/display/redirect

elexis updated this revision to Diff 9699.Tue, Sep 10, 12:11 AM

Update following rP22880, rP22875.

Successful build - Chance fights ever on the side of the prudent.

Link to build: https://jenkins.wildfiregames.com/job/vs2015-differential/121/display/redirect

Successful build - Chance fights ever on the side of the prudent.

Linter detected issues:
Executing section Source...

source/lobby/XmppClient.h
|  24| 
|    | [MAJOR] CPPCheckBear (syntaxError):
|    | Code 'namespaceStunClient{' is invalid C code. Use --std or --language to configure the language.

source/lobby/glooxwrapper/glooxwrapper.h
|  88| namespace·glooxwrapper
|    | [MAJOR] CPPCheckBear (syntaxError):
|    | Code 'namespaceglooxwrapper{' is invalid C code. Use --std or --language to configure the language.
Executing section JS...
Executing section cli...

Link to build: https://jenkins.wildfiregames.com/job/docker-differential/630/display/redirect

The player struct was mentioned during the quick talk with Josh on http://irclogs.wildfiregames.com/2019-09/2019-09-06-QuakeNet-%230ad-dev.log

22:46 < elexis> std::map<std::string, std::vector<std::string> > m_PlayerMap;
22:46 < elexis> I guess that might be more readable as a struct intead of string vector
22:46 < Josh`> Maybe add a flag "sentToGui" or something alongside each player in m_PlayerMap
22:46 < Josh`> Yeah

With regards to using map[foo] vs. map.at(foo), the former inserts an item if it doesn't exist, the latter throws an error if it doesn't exist.
So I would recommend to not insert an item unless it is explicitly what is intended (and that is the case for the m_PlayerMap[nick] lines, but not for the .at lines).

There is also the question whether the new struct should store gloox enums, glooxwrapper strings, std::strings, or translated strings.
Notice that it can't store the translated strings, because the translation is meant to be able to change during the course of running the program.

So the data conversion chain is:

gloox enum -> std::string -> wstring_from_utf8 -> JS::Value

So we see there are four steps involved,
and one has the choice between storing the data in four different formats in fact.

One of these four types has to be used for SPlayer.m_Presence and SPlayer.m_Role.

For performance reasons one could argue that one should do as many conversions once upon setting the value before using it (i.e. rightmost format chosen).
I.e. perhaps storing the JS::Value on the Heap, or if that is slower than converting to a new JS::Value every call, store the std::wstring rather than the std::string!

The other choice is saving the data as close as possible to the original source (i.e. leftmost format chosen).
I.e. when receiving enums, one saves the enums.
Usually this is a good pattern to prefer data integrity (as some conversions may be lossy or for example yield different comparisons, for example two JS::Values storing the same string but not being == due to being different instances might or might not be a problem).

So in this iteration of the patch I had decided for storing enums in this patch.

But remembering how much hassle there is about performance, one should not neglect storing intermediary converted strings instead of creating them often.

Then if we take performance optimization as an actual goal of the patch (primary goal or not), then one has to go down to the nitty gritty stuff, write benchmark tests, and analyze exactly which operations will be produced by the code. -.-

Then one might consider further refactoring the data structures so that it is optimized for looking up the data while still storing the enum instead of the string.

Notice that the data in the SPlayer struct is not translated, those are string identifiers.

So in fact the GetRoleString and GetPresenceString wstring_from_utf8 conversion was not necessary, so we/I can avoid reintroducing it in this patch.

Secondly, if we want to optimize performance, I wonder why every player would have to store a separate std::string instance, each allocating memory.
The string literals are compiled as const char[N], then they are COPIED!!111oneeleven to the output std::string` of GetRoleString`.

But if the function would return a char*, then the std::string conversion can be avoided as well, and one can lookup the already defined char pointer from the stored enum and create the JS::Value from that char* directly.

So it seems like this was a good decision to store the gloox enum, and we can delete two conversions in the chain.

.....

Implemented that, works.

We can also observe that returning const string literals is legal:

String literals have static storage duration, and thus exist in memory for the life of the program.

https://en.cppreference.com/w/cpp/language/string_literal

There was also D1668 proposeding to make the Getters return std::string, but returning const char* for the immutable string literals is even better (skipping std::string and std::wstring creation).

So the only question remaining is whether g_L10n.Translate calls really need the utf8_to_wstring decoding, or whether we can copy directly from the std::string.

To test that, I chose my native language german for the UI and created a GUI message with the following existing translated string:

#: ps/ModIo.cpp:531
#, c-format
msgid "Invalid file. Expected md5 %s, got %s."
msgstr "Ungültige Datei. Erwartete MD5-Prüfsumme %s, jedoch %s erhalten."

That translation contains an ü, so it should break if UTF decoding is necessary.

And certainly, the UTF8 decoding is necessary!!

CreateGUIMessage("system", "error", std::time(nullptr), "text", (g_L10n.Translate("Failed to compute final hash.")));
CreateGUIMessage("system", "error", std::time(nullptr), "text", wstring_from_utf8(g_L10n.Translate("Failed to compute final hash.")));

So seems like </verification>, I will go through the entire thing only three more times.

source/lobby/scripting/GlooxScriptConversions.cpp
26 ↗(On Diff #9699)

ToJSVal<glooxwrapper::string> is the one that is used in many places, the other ones are used in 1-2 places.
(It was considered to add ToJSVal<GlooxTagList> in D1667, but the implementation was unfinished.)

m_Ratingtype:
There's also the question whether glooxwrapper::string is the best type for m_Rating, since it's a number in the average case. But:

  • It can be empty string if Ratings is offline
  • JS code will/would have to be changed, which I would prfer to not do in the same diff (after all this is a regression fix and already changing something)

m_PlayerMap key type: std::map<glooxwrapper::string, SPlayer>:
The question whether std::string or glooxwrapper::string is better.

(First we obtain the nickname from gloox as a glooxwrapper::string.
Then its stored to the map.
Then JS interface look for it after having received some JS string, and GUI messages are created.
Since the conversion from glooxwrapper string to JSString is via wstring_from_utf8(val.to_string()), it would be cheaper to use std::string in the map.
The case where JS looks up data for a nickname, it would either have to create a std::string or a glooxwrapper::string from the given JS string.

So unless we want to specialize FromJSVal<glooxwrapper::string> by duplicating FromJSVal<std::string> or FromJSVal<std::wstring> and calling some utf8 codec,
perhaps we're better of with std::string.

Looking at the code again there are two calls from GetRole(const std::string& nick), GetPresence(const std::string& nick) that don't provide the glooxwrapper::string,
there are more calls in handleMUCParticipantPresence where the glooxwrapper string is given.
GUIGetPlayerList iterates over the map as is and converts a JSVal from that.
So in any case there will be the conversion from JSString <-> utf8_from_wstring <-> std::string <-> glooxwrapper::string.

The advantage of having glooxwrapper::string as map keys intead of std::string:
"name", p.first, can be called without utf8_from_wstring(to_string()) and can rely on ToJSVal<glooxwrapper::string> introduced in this patch.
This is more failsafe, because in every other instance glooxwrapper::strings are also passed and the ToJSVal conversion for that type used.
(Also consistently the map values are glooxwrapper or gloox values)
My stomach wants the glooxwrapper::string.
The right to reconsider is not waived.)

I will go through the entire thing only three more times

Make it 30

source/lobby/XmppClient.cpp
862 ↗(On Diff #9699)
  1. s/nick/newNick
  2. Piecewise construct in place and post a LOGERROR if the item already exists, so that if the item already exists for some unexpected reason, we dont kill the game but also dont let it go unnoticed like was before.
  3. Copying the rating of the previous list item instead of empty string.
940 ↗(On Diff #9699)

Can avoid the second [] operator which compares each key of the map until it found the correct one by saving a reference after the first [] operator call!

(Also the debug string is to be improved)

1047 ↗(On Diff #9699)

comments outdated.

1052 ↗(On Diff #9699)

Here it finds the same item in the map twice, it could just store the result of the first iterator.

source/lobby/scripting/GlooxScriptConversions.cpp
33 ↗(On Diff #9699)

(As mentioned wstring_from_utf8 unneeded, std::string(char*) unneeded as well)

Successful build - Chance fights ever on the side of the prudent.

Link to build: https://jenkins.wildfiregames.com/job/vs2015-differential/129/display/redirect

Successful build - Chance fights ever on the side of the prudent.

Linter detected issues:
Executing section Source...

source/lobby/IXmppClient.h
|  24| namespace·StunClient·{
|    | [MAJOR] CPPCheckBear (syntaxError):
|    | Code 'namespaceStunClient{' is invalid C code. Use --std or --language to configure the language.

source/lobby/glooxwrapper/glooxwrapper.h
|  88| namespace·glooxwrapper
|    | [MAJOR] CPPCheckBear (syntaxError):
|    | Code 'namespaceglooxwrapper{' is invalid C code. Use --std or --language to configure the language.

source/lobby/scripting/JSInterface_Lobby.h
|  26| namespace·JSI_Lobby
|    | [MAJOR] CPPCheckBear (syntaxError):
|    | Code 'namespaceJSI_Lobby{' is invalid C code. Use --std or --language to configure the language.

source/lobby/XmppClient.h
|  24| 
|    | [MAJOR] CPPCheckBear (syntaxError):
|    | Code 'namespaceStunClient{' is invalid C code. Use --std or --language to configure the language.
Executing section JS...
Executing section cli...

Link to build: https://jenkins.wildfiregames.com/job/docker-differential/638/display/redirect

Stan added a subscriber: Stan.Wed, Sep 11, 10:30 AM
Stan added inline comments.
source/lobby/IXmppClient.h
50 ↗(On Diff #9708)

why not a std::string / std::wstring ?

source/lobby/XmppClient.cpp
41 ↗(On Diff #9708)

Intended ?

elexis added inline comments.Wed, Sep 11, 10:37 AM
source/lobby/IXmppClient.h
50 ↗(On Diff #9708)
source/lobby/XmppClient.cpp
41 ↗(On Diff #9708)

Yes, because I uploaded it as a backup and because the DbgXMPP calls must be tested as they are changed too.

elexis added inline comments.Wed, Sep 11, 3:07 PM
source/lobby/scripting/GlooxScriptConversions.cpp
17 ↗(On Diff #9699)

#if CONFIG2_LOBBY (that flag should exclude even more code in fact, but OT)

elexis updated this revision to Diff 9713.Wed, Sep 11, 3:07 PM

Rebase following 22887, 22888

Successful build - Chance fights ever on the side of the prudent.

Link to build: https://jenkins.wildfiregames.com/job/vs2015-differential/132/display/redirect

Successful build - Chance fights ever on the side of the prudent.

Linter detected issues:
Executing section Source...

source/lobby/IXmppClient.h
|  24| namespace·StunClient·{
|    | [MAJOR] CPPCheckBear (syntaxError):
|    | Code 'namespaceStunClient{' is invalid C code. Use --std or --language to configure the language.

source/lobby/glooxwrapper/glooxwrapper.h
|  88| namespace·glooxwrapper
|    | [MAJOR] CPPCheckBear (syntaxError):
|    | Code 'namespaceglooxwrapper{' is invalid C code. Use --std or --language to configure the language.

source/lobby/XmppClient.h
|  24| 
|    | [MAJOR] CPPCheckBear (syntaxError):
|    | Code 'namespaceStunClient{' is invalid C code. Use --std or --language to configure the language.

source/lobby/scripting/JSInterface_Lobby.h
|  26| namespace·JSI_Lobby
|    | [MAJOR] CPPCheckBear (syntaxError):
|    | Code 'namespaceJSI_Lobby{' is invalid C code. Use --std or --language to configure the language.
Executing section JS...
Executing section cli...

Link to build: https://jenkins.wildfiregames.com/job/docker-differential/641/display/redirect

elexis added inline comments.Thu, Sep 12, 11:01 AM
source/lobby/scripting/GlooxScriptConversions.cpp
17 ↗(On Diff #9699)

-> D2284

Notice in the current committed code (prior to that patch) source/lobby/scripting/ is built in even with --without-lobby, so the macro must actually be here until then.

If source/lobby/ becomes dropped on --without-lobby, then this macro will have to be removed again.

elexis edited the test plan for this revision. (Show Details)Thu, Sep 12, 4:01 PM
elexis updated this revision to Diff 9723.Thu, Sep 12, 5:01 PM
elexis edited the test plan for this revision. (Show Details)

I don't even know what I updated, but it's better, and I found #5587.

Hypothetical commit message:

Fix missing wstring_from_utf8 for multi-user-chat messages and translated strings following rP22856 / D2264.

The commit introduced arbitrary lobby CreateGUIMessage JS::Value arguments and falsly removed wstring_from_utf8.
Instead of enlengthening the code by reintroducing utf8_from_wstring everyhwere, shorten the code and specialize ToJSVal for glooxwrapper::string and gloox enums.

Avoid string copies for XmppClient getters, refs D1668:
Change GetPresence() and GetRole() to return the pointer to the string literal instead of copy constructing a std::string each call.
Don't reintroduce the unneeded utf8_from_wstring conversions for the untranslatable ASCII identifiers returned by GetPresence() and GetRole().
Change GetSubject() to return a const ref to the member instead of copy assigning the std::string to an output value.
Change m_Subject to std::wstring to avoid constructing a copy using wstring_from_utf8 each GetSubject() call.
Change gloox enum to translatable string functions to be static, so as to use them in the new ToJSVal functions (enabled by not calling CertificateErrorToString from ConnectionErrorToString anyymore in rP22880/D2274).

Avoid per-player string copies in m_PlayerMap:
Change m_PlayerMap value type from std::vector<std::string> to a new PlayerMap struct with named members for readability.
Use gloox enums as PlayerMap values to avoid constructing (performance) and storing (memory footprint) std::strings.
The JS String is created from the pointer to the ASCII string literal without intermediaries.
Use glooxwrapper::string for nickname and m_Rating since it's the data source type received in relevant places, but that might be improved for performance with std::wstring.

Construct map values in place where possible and post error instead of silently inserting if an existing value is to be modified.
Avoid repeated std::map lookups on string keys by caching the PlayerMap::iterator where available.

Remove some glooxwrapper::string to_string() calls redundant with its operator<< and improve DbgXMPP gloox enum output.
Transfer rating too upon nickchange.

Differential Revision: https://code.wildfiregames.com/D2271
Tested on: clang 8.0.1, Jenkins

Successful build - Chance fights ever on the side of the prudent.

Link to build: https://jenkins.wildfiregames.com/job/vs2015-differential/141/display/redirect

Successful build - Chance fights ever on the side of the prudent.

Linter detected issues:
Executing section Source...

source/lobby/IXmppClient.h
|  24| namespace·StunClient·{
|    | [MAJOR] CPPCheckBear (syntaxError):
|    | Code 'namespaceStunClient{' is invalid C code. Use --std or --language to configure the language.

source/lobby/XmppClient.h
|  24| 
|    | [MAJOR] CPPCheckBear (syntaxError):
|    | Code 'namespaceStunClient{' is invalid C code. Use --std or --language to configure the language.

source/lobby/scripting/JSInterface_Lobby.h
|  26| namespace·JSI_Lobby
|    | [MAJOR] CPPCheckBear (syntaxError):
|    | Code 'namespaceJSI_Lobby{' is invalid C code. Use --std or --language to configure the language.

source/lobby/glooxwrapper/glooxwrapper.h
|  88| namespace·glooxwrapper
|    | [MAJOR] CPPCheckBear (syntaxError):
|    | Code 'namespaceglooxwrapper{' is invalid C code. Use --std or --language to configure the language.
Executing section JS...
Executing section cli...

Link to build: https://jenkins.wildfiregames.com/job/docker-differential/651/display/redirect

So I found an UTF8 dependent STUN segfault described in #5587 while testing this diff. I can reproduce it in my a23 copy, and svn without this commit. Others reported no segfault but that it simply doesnt connect after nickchange.
It seems all the JIDs used in the XmppClient misss proper UTF8 encoding / decoding.
And that issue should be orthogonal to this patch, since this patch only changes the CreateGUIMessage calls and PlayerMap, but nothing relating to the STUN stack or hostJID member.

This revision was not accepted when it landed; it landed in state Needs Review.Thu, Sep 12, 7:24 PM
This revision was automatically updated to reflect the committed changes.