Page MenuHomeWildfire Games

Network Dialog / Troll removal device
Changes PlannedPublic

Authored by elexis on Jan 14 2019, 1:34 AM.

Details

Reviewers
wraitii
Group Reviewers
Restricted Owners Package(Owns No Changed Paths)
Restricted Owners Package(Owns No Changed Paths)
Restricted Owners Package(Owns No Changed Paths)
Trac Tickets
#3787
Summary

The network dialog gives hosts full control over the clients they are connected to, for the purpose of moderation, performance control and getting to know each other.

It supports the GeoLite2 database (countryname, optional city and ISP name), comes with country flags, shows hostname, IP, meanRTT, packet loss ratio, adds a button to kick/ban.
There are many functions that can be added on top of it.

This patch is released as a reward for fpre's donation to Wildfire Games. (Symbolic, as writing this patch and the dependent GUI bugfixes in the other revision proposals took about 3 weeks.)

This is the first patch to demonstrate almost full object oriented code in the JS GUI.

Test Plan

A backport to alpha 23 has been tested since some weeks.

To test this revision, either try the github branch directly https://github.com/0ad/0ad/compare/master...elexis1:network_dialog_rebased or

  1. apply the dependent patches
  2. apply this patch
  3. download the GeoLite2 csv files and unzip to public/geolite2/. Only the Country file is needed, the others are optional and memory sinks.
  4. download the country flags from https://github.com/stefangabos/world_countries/ and put them into art/textures/ui/global/icon/flags/

Diff Detail

Repository
rP 0 A.D. Public Repository
Branch
/ps/trunk
Lint
Lint OK
Unit
No Unit Test Coverage
Build Status
Buildable 6794
Build 11173: Vulcan BuildJenkins
Build 11172: arc lint + arc unit

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
elexis added inline comments.Jan 14 2019, 2:08 AM
binaries/data/mods/public/gui/networkreport/CountryFlags.js
7

To be tested against archived mods (public.zip).
See Engine.TextureExists which tests for png and dds variant.

elexis marked an inline comment as done.Jan 14 2019, 2:11 AM
elexis added inline comments.
source/network/GeoLite2.cpp
404

lat/lon currently unused, so can be removed for the commit and added again when I added the uses I have in my mind.

Lefo added inline comments.Jan 14 2019, 9:02 AM
source/network/IPTools.cpp
34

Definitely not. You can simply use getaddrinfo to convert ip address to binary representation. See CreateStunRequest in source/network/StunClient.cpp for reference. It might be a good idea to pass AI_NUMERICHOST to hints.ai_flags. If you want to enforce parsing IPv4 only, pass AF_INET to hints.ai_family.
See also man 3 getaddrinfo and man 7 ip.

Stan added a subscriber: Stan.Jan 14 2019, 9:59 AM

Nice work more pop sound nice and I guess thanks to fpre for the money.
I'm wondering if instead of showing you localhost up it shouldn't show you your public IP.

binaries/data/mods/public/gui/common/network.js
72

Shouldn't that be a constant or better got from the engine itself ?

binaries/data/mods/public/gui/gamesetup/gamesetup.xml
206

Can I help you with that ?

binaries/data/mods/public/gui/networkreport/ClientList.js
5

Config value Maybe ? No strong feeling.

binaries/data/mods/public/gui/networkreport/networkreport.js
19

Can't we inline them ? Or is better to keep all cpp interface calls in one place ?

source/network/GeoLite2.cpp
30

Can't we use forward declaration here ? also isn't that already defined in the header file?

278

Constant maybe.

285

Remove deadcode.

source/network/NetServer.cpp
366

I wonder if we can't use function in <algorithm> for this. Would have to check cpp version.

source/network/NetSession.cpp
229

Maybe 0.0 to mark it as a double.

I'm guessing the "OOP-ness" is the fact you define interactions in the JS instead of inside the XML?

source/network/NetServer.cpp
366

std::find_if probably

wraitii added reviewers: Restricted Owners Package, Restricted Owners Package, Restricted Owners Package, wraitii.Jan 14 2019, 12:07 PM
elexis planned changes to this revision.Jan 15 2019, 2:29 PM
elexis marked 5 inline comments as done.

As mentioned I didn't intend this patch to be reviewed or committed, just wanted to publish what I have.

Must have:

  • Resize columns: There is too much text for the cells, on 1024x768 and even on 1920x1080. Should check if D1717 makes this problem harmless, or whether some columns should be optional, whether to use different XML widths (some columns absolute others relative?), or whether even to implement COList resizing with mouseclicks.
  • New packet types: JS Interface access to g_NetServer means it won't work with a dedicated server. In that case there only were moderators. Therefore, the IP addresses should be sent to moderators using a new packet for the purpose of the moderators being able to implement ip-based or hostname-based banmasks.
  • IPv6 support.
  • Terms and Conditions of hosting / Privacy Policy: Should inform the clients that by the decentral network design of pyrogenesis, the host will be able to see the IP address, hostname and can infer the location data whether he likes it or not, so this patch is only displaying what was already available, but obscured and only available to technical persons. Also according to history, there is significant legitimate interest in banning people based on their name, ISP, subnet or location. I recall some GDPR article said that someone who doesn't host the service but still develops a product that allows clients to use a service still have to make sure the GDPR is met. It should be easy to just reuse the terms dialog that we have in lobby/userreporter/modio for multiplayer-joins. This would only be accepted once, as the policy should be the same for all hosts, at least for now. (Player chatlog is the other part of data that has some expectation of privacy, while the chatlogs are stored in mainlog.html for every player and the chat is not being encrypted currently, nor the connection itself).

Should have:

  • *VPN detection:* Notice the Blocks file contains which clients are using proxies / VPN networks. This is parsed alread in C++. It should be shown, so host can kick people who they suspect to be trolls.
  • Offline players should also be kept in the dialog (gray color or italic font), so that disconnected clients can be banned without the host having to look exactly at the time when the offending client is connected.
    • Filters: So that one can display only players, only spectators, only laggers, or filter by an arbitrary search term that checks for playernames, locations and maybe more.

Nice to have / bloat:

  • The latitude/longitude data is unused and should either be removed (lots of memory for the optional City file), or used. Doubleclicking on a client in the list could open a dialog about that client, which could display a cross on an recent or ancient worldmap), and buttons to handle that client.

I'm guessing the "OOP-ness" is the fact you define interactions in the JS instead of inside the XML?

It's the fact that the new JS files aren't an intermixed spaghetti swash of global procedures and global state variables, but everything grouped in prototypes with member variables.

source/network/IPTools.cpp
34

Thanks!

source/network/NetMessages.h
187

(This packet was called Performance instead of RTT in rP17730 so that this patch can save 5 changed lines...)

source/network/scripting/JSInterface_GeoLite2.h
25

(now unused include)

source/network/scripting/JSInterface_Network.cpp
172

(obsolete TODO)

source/ps/GameSetup/GameSetup.cpp
974

f.e. pt-BR

*VPN detection:* Notice the Blocks file contains which clients are using proxies / VPN networks.

It's not possible for all proxies/VPN.

I don't understand why do we need the geoip coordinates? We can't trust to them, I suppose we don't need to know arbitrary untrusted coordinates, only IP/host name/ping/etc.

*VPN detection:* Notice the Blocks file contains which clients are using proxies / VPN networks.

It's not possible for all proxies/VPN.

Dunno, it's in the database and it can be displayed. The database is incomplete here and there with some entries and that should be caught already. Just a matter of displaying it in JS.

I don't understand why do we need the geoip coordinates? We can't trust to them, I suppose we don't need to know arbitrary untrusted coordinates, only IP/host name/ping/etc.

I guess you don't mean trust but accuracy.

In D1746#70499, @elexis wrote:

Dunno, it's in the database and it can be displayed. The database is incomplete here and there with some entries and that should be caught already. Just a matter of displaying it in JS.

The problem is that the database can not be complete.

I guess you don't mean trust but accuracy.

Probably. I meant that IP can change the location, and we'd see the old value until the database receives updates.

The problem is that the database can not be complete.

Where is the problem? It just displays nothing or "?" then. It can be the case for any column and any row. It displays something if there is data, otherwise it doesn't.

I meant that IP can change the location, and we'd see the old value until the database receives updates.

The accuracy value is very important, it has some fixed values 5km, 25km, ..., 1000km.
Should add a script to download the most recent database, but that won't be used by most players.
But if this feature was committed, it would only add the Country database anyhow, because the City database consumes 500MB ram (150mb unzipped csv files).
I ran out of optimizations to reduce the memory footprint of the GeoLite2.h members. Perhaps there are some alternatives to std::vector, std::map.

Lefo added a comment.Jan 19 2019, 1:06 AM

Instead of adding some database wouldn't it be easier to just do RDNS lookup. And perhaps if it fails fall back to whois service?

In D1746#70502, @elexis wrote:

Where is the problem? It just displays nothing or "?" then. It can be the case for any column and any row. It displays something if there is data, otherwise it doesn't.

So the status of proxy is useless, because it does guarantee anything.

The accuracy value is very important, it has some fixed values 5km, 25km, ..., 1000km.

I'm talking about situation when some group of IP is changed location. But your current DB still contains old values.

Should add a script to download the most recent database, but that won't be used by most players.
But if this feature was committed, it would only add the Country database anyhow, because the City database consumes 500MB ram (150mb unzipped csv files).
I ran out of optimizations to reduce the memory footprint of the GeoLite2.h members. Perhaps there are some alternatives to std::vector, std::map.

500MB is too much, geoip can't spend more memory than the whole game.

In D1746#70502, @elexis wrote:

Where is the problem? It just displays nothing or "?" then. It can be the case for any column and any row. It displays something if there is data, otherwise it doesn't.

So the status of proxy is useless, because it does guarantee anything.

I don't understand why you say that. If there is a 1 in the database for that field, then it's a VPN. If it's a VPN and people want to rule out anonymous users / smurfs, it's very useful to them.
If there was any entry in the GeoLite2 database that was useless, it wouldn't be in the database.

Should add a script to download the most recent database, but that won't be used by most players.
But if this feature was committed, it would only add the Country database anyhow, because the City database consumes 500MB ram (150mb unzipped csv files).
I ran out of optimizations to reduce the memory footprint of the GeoLite2.h members. Perhaps there are some alternatives to std::vector, std::map.

500MB is too much, geoip can't spend more memory than the whole game.

That's what I said above. The City database will not be committed. The country database would be if anything, which is few mb. People who want to use it still could however by just downloading and unzipping it. (If I write a parser for GeoLite2, I don't write a parser for one or two of the three files, but for all three files, as they are very simple to parse.)
I may repeat, that's also why I was wondering if you know some alternatives to std::map that has a smaller memory footprint (or any other method to reduce the footprint. Just not storing some information like lat/lon is the only thing remaining, otherwise I had applied all optimziations I could come up with).

In D1746#70503, @Lefo wrote:

Instead of adding some database wouldn't it be easier to just do RDNS lookup. And perhaps if it fails fall back to whois service?

Looking up the hostname from the IP address is included in the patch.
But that doesn't show the country/cityname/ISP/VPN/Satellite data that GeoLite2 is about.
One can lookup these entries online, but these are usually attached to payment for some reason that isn't clear to me (downloads might yield much more traffic). https://www.maxmind.com/en/geoip2-precision-city-service?pkit_lang=en

Lefo added a comment.Jan 19 2019, 2:43 AM

Looking up the hostname from the IP address is included in the patch.
But that doesn't show the country/cityname/ISP/VPN/Satellite data that GeoLite2 is about.
One can lookup these entries online, but these are usually attached to payment for some reason that isn't clear to me (downloads might yield much more traffic). https://www.maxmind.com/en/geoip2-precision-city-service?pkit_lang=en

Well I ment to just use the TLD of hostname. It usually gives you country. If it doesn't you can use whois. Whois is free service, but I'm not sure if it is designed (and allowed to use) for (a lot of / automatic) queries. But if you really need some VPN hints then I guess there's no way to get it with just DNS and whois,

ffffffff added a comment.EditedJan 19 2019, 3:53 AM

remove Hostname IP Address and Internet Provider

add player list with checkbox for banned, local muted, perma banned

keep also players that were in game connected in list below maybe grey or something to act to them

add auto kick specs option for ping over 500ms for 5 seconds or loosing connection

remove Hostname IP Address and Internet Provider

Hostname and IP will be used for banmasks. For example one could add the banmask *belgacom* after seeing that this is a troll spawnpoint.
The ISP can help distinguishing users when the hostname lookup failed or isn't an intelligible string.
I suppose you ask to remove them because the list is too crowded. That's where the optionality of columns comes in.
Also, as mentioned, doublecliking on a client could display a window that displays information that doesn't fit into the table.

keep also players that were in game connected in list below maybe grey or something to act to them

See my comments above, already marked as a use case.

add auto kick specs option for ping over 500ms for 5 seconds or loosing connection

There is a ticket for that. I'm skeptical about it, I think it will do more harm than benefit, because it is very easy to have a sudden lagspike or one simulation turn that takes 5+ seconds to compute, and then you disconnect the one or even every client and everyone has to rejoin which can take beyond 2 minutes...
There are also many more TODOs of possible features in the code and comment above for you to read.

In D1746#70506, @Lefo wrote:

Well I ment to just use the TLD of hostname. It usually gives you country. If it doesn't you can use whois. Whois is free service, but I'm not sure if it is designed (and allowed to use) for (a lot of / automatic) queries. But if you really need some VPN hints then I guess there's no way to get it with just DNS and whois,

GeoLite2 is very common to use when one wants to lookup information on an IP address. We have used it for years when someone was smurfing or even doing criminal offenses like DDOSing. In the last two years we have added it as an automatic tool for lobby moderation too (as mentioned in its Privacy Policy).
As the database is really simple to parse and provides a lot of information, in particular provides information that is already publicly accessible to the technically not incapable people, it seems logical to me to make it accessible to everyone.
I'm actually not saying that displaying the whois data is bad, just that we want to put in all that is already accessible and I went straight for GeoIP data, because the whois data seemed much less useful. For example the ISP name for 81.190.58.105 is in one of multiple descr lines whereas for 84.185.76.249 it is in the org-name field.
A whois reader could be added later, for example if the client dialog (doubleclicking on a client) would be added.
It would for instance provide the @abuse email address which could be useful for players in case someone uses DDOS or exploits on them again.

memory footprint

Perhaps the next step to reduce the memory footprint would be to not use std::map but straight binary data. Mostly I disagree with C++ using 500mb to store 150mb of commaseparated string values, even after parsing half of them to u16 and bool.

ffffffff added a comment.EditedJan 19 2019, 1:02 PM

There is a ticket for that. I'm skeptical about it, I think it will do more harm than benefit, because it is very easy to have a sudden lagspike or one simulation turn that takes 5+ seconds to compute, and then you disconnect the one or even every client and everyone has to rejoin which can take beyond 2 minutes...
There are also many more TODOs of possible features in the code and comment above for you to read.

im not its only for specs they nerf the players so kick... eae only only optional as checkbox eae

Hostname and IP will be used for banmasks

u can use them but dont show them very private information

Location id go only for country no city to much private information

one could also add a banned/kicked/muted counter just for curious eae

u can use them but dont show them very private information

It's not private information but information that the user knowingly sends to the server. As mentioned already, theres a difference between showing it to the server / server moderators and showing it to every client. The latter don't have a reason to see the IP address.
How do you add a banmask without knowing which IPs to ban. (no questionmark)

Location id go only for country no city to much private information

Again, it's not private information but publicly accessible to anyone. It only makes it more accessible to everyone, in fact educating the players what they already reveal to every host when they join.
It's like walking around nude in the city and then asking people not to look.

How do you add a banmask without knowing which IPs to ban. (no questionmark)

questionmark pls

add mask internal dont show

smiley added a subscriber: smiley.Jan 19 2019, 2:21 PM

Lat/lon and distance is probably too much. An individual possessing the skills necessary to do it doesn't warrant making it available for the masses.
Although, I guess everyone knows how to look at a map and find the flag of the country in the list. Which raises the same "concern" about city.

reply to the reply of this:

K, I guess that's true as well.

In D1746#70505, @elexis wrote:
In D1746#70502, @elexis wrote:

Where is the problem? It just displays nothing or "?" then. It can be the case for any column and any row. It displays something if there is data, otherwise it doesn't.

So the status of proxy is useless, because it does guarantee anything.

I don't understand why you say that. If there is a 1 in the database for that field, then it's a VPN. If it's a VPN and people want to rule out anonymous users / smurfs, it's very useful to them.
If there was any entry in the GeoLite2 database that was useless, it wouldn't be in the database.

Because a VPN service can use a random IP, you can't predict which IP is proxy (because of botnet, etc). Some people use custom VPN (dedicated server with a simple tunnel). So I suppose the GeoLite2 can detect only popular VPN for usual users.

Should add a script to download the most recent database, but that won't be used by most players.
But if this feature was committed, it would only add the Country database anyhow, because the City database consumes 500MB ram (150mb unzipped csv files).
I ran out of optimizations to reduce the memory footprint of the GeoLite2.h members. Perhaps there are some alternatives to std::vector, std::map.

500MB is too much, geoip can't spend more memory than the whole game.

That's what I said above. The City database will not be committed. The country database would be if anything, which is few mb. People who want to use it still could however by just downloading and unzipping it. (If I write a parser for GeoLite2, I don't write a parser for one or two of the three files, but for all three files, as they are very simple to parse.)
I may repeat, that's also why I was wondering if you know some alternatives to std::map that has a smaller memory footprint (or any other method to reduce the footprint. Just not storing some information like lat/lon is the only thing remaining, otherwise I had applied all optimziations I could come up with).

I'd prefer to not bake the static data in the game, but use up-to-date data from our server (we already authorize users for lobby, so it shouldn't be a problem).

Privacy stuff:

As already mentioned the absence of reasonable expectation of privacy of IP addresses to the hosting player should be explained to the user in a Privacy Policy before he joins any games.
The IP address of the client is already being processed by the server, every server can display it trivially:
grep 'Received connection from' /home/user/.config/0ad/logs/mainlog.html
Typing "geoip" into google and copy&pasting that IP address there is also trivial.
It's also not a secret that geoip exists.

The term reasonable expectation of privacy is present in both EU legislation (GDPR) and US courts.
For example this one was specifically about processing IP addresses and a person complaining that this can reveal their particular home address:

a person has no legitimate expectation of privacy in information he voluntarily turns over to third parties . . . even if the information is revealed on the assumption that it will be used only for a limited purpose and the confidence placed in the third party will not be betrayed.” United States v. Miller, 425 U.S. 435, 442-44 (1976).

https://www.drinkerbiddle.com/insights/publications/2016/09/reasonable-expectations-of-data-privacy

The problem is that some people are unaware of how the internet protocol works and the consequences for their own privacy.
Which data is publicly accessible (here under Creative Commons license) to everyone for free, and legally so, is nothing that I can decide, but the people that keep track of which IP subnets are located where and by which ISP (in this case https://dev.maxmind.com/geoip/geoip2/geolite2/ and I don't know where they have it from.)
Players are used to the current implementation and then assume only what is currently displayed is possible and legitimate (when it is trivial and purposeful to display the data that the client voluntarily turns over to the server).

Secondly, it's contradictory to say that the City database (with accuracy from +5km to +1000km) is illegal personal data processing because it has no purpose, but processing the Country data would be okay (because the datasets cover the exact same purpose). The purpose is identifying smurfs, trolls, unwanted people and ultimately allowing the service provider (host) to enforce the service, and only the service that he wants to provide to the clients. For an example a server where there are no anonymous people, for example to be able to predict the skill of the participating players, or for example to have a game where people don't do jokes or statements that are considered particularly offensive by the host).
There have been many precedents for these things, including people who workaround bans in the latest release by reseting their router to get a new IP and changing the nickname (rating part of the nickname) or second (third, n'th) account to work around the current ban implementation (specific nick bans, specific IP bans). Creating new accounts is not always detectable by the lobby server. More than one troll for example plays only one month per year, but creates a new account every time and is only reported by players after a week or two of playing. If the host would be aware of the data that he already sees, he could identify him much sooner.
Notice that it's not only toxic chat and team-skill imbalance issues, but also possible denial of service. A rejoin will drain a lot of memory, and if someone is dedicated to troll that way, they can start one rejoin with a different nick and IP per minute (already happened) - and every time they do this triggers the creation of a savegame of the current gamestate (20mb zipped for maps with many entities?), that can actually make the hosting server freeze.
Those are three purposes for data processing, and it is by no means uncommon for servers to check IP addresses to defend from "cyberattacks":

The operator of a website may have a legitimate interest in storing certain personal data (IP address) relating to visitors to that website in order to protect itself against cyberattacks

If one wants to conceal ones IP address, one has to use a VPN or satellite, whether one likes it or not, whether the IP is displayed in the game or not.
Notice that one can be removed from the GeoLite2 database upon request if some conditions are met (Opt-Out of Database Sharing) https://www.maxmind.com/de/privacy-policy but that will at most work for their database, there are many GeoIP services, government databases and so forth.
The dialog does not make this information accessible (it already is), but the difference is that this dialog does is giving technically unaware players to (1) demonstrate what data they (if they didn't read but accept the privacy policy unknowingly) share and (2) the ability to control who can use the server, and technically capable players less time-consming control over banning, even if these users can change their IP address and/or nickname.

So there are two legal bases for processing of the IP address, ISP and geolocation of the IP address (GDPR 6, should be equivalent in the US):

  • performance of a contract (terms and conditions)
  • legitimate interests (defending from cyberattacks)

The other data that can be inferred from the geolocation is the predominant language spoken in that country (which doesn't have to mean anything to the player) and the time in the timezone of that place.
These two facts have purpose to the players, they can now find out who speaks the same language, which they due to the language barrier might not easily able to do otherwise, and can use google translate to help with the most crucial things (For example I tried to explain for 20 minutes to a spanish player where to research the wall armor upgrade. He said thanks everytime and researched all upgraedes in the tower, but never the one in the wall tower. After I saw that his IP is from spain and putting the same chat into google translate, the tech was researched instantly.).
The timezone can help players to identify players who are close to their timezone (so they can see who will be likely online at the same time and who they only rarely see because they are on the other side of the earth), or players who should rather be sent to bed.

If the legal basis of processing of IP addresses of servers hadn't been proven in courts already and the IP protocol could be changed by us to be address free,
GDPR, depending on some conditions, require the controller to do an assessment of legitimate interests and weigh the risks of data processing against the 'compellingness' of the purposes of processing the individual user.
That would mean creating an attacker model. If the data was not accessible to the server, there might be a stalker, or someone who tells you that he doesn't like the predominent religion of the country your IP is located in (at least the latter was the only thing that triggered reservations so far).
The first case (stalker, cyberattacker, physical attacker) would be very important to prevent. But doing so fails, because specifically these attacker models already have trivial access to that information and in fact do use it already, while with the dialog it gives people more control to keep them at distance.
The second case (I don't like you based on your location) is something that poses no risk for physical harm to the player, and is in (hopefully most) countries covered by free speech laws, and thirdly poses no risk to the player because these players decide for themselves whether they share that information, or whether they expose themselves to players who don't like their properties of the IP address. For the lobby it is the moderators decision how far chat tolerance goes (for example whether to have a zero-tolerance policy for some chat messages while expecting a mandatory full acceptance policy with regards to their own chat messages).

A template for the Legitimate Interest Assessment can be found below. We did that for the lobby too this summer for the lobby too. The more I read about these questions the more solid the case becomes for me, but feel free to provide me new information.
https://ico.org.uk/media/for-organisations/guide-to-the-general-data-protection-regulation-gdpr/lawful-basis-for-processing/legitimate-interests-1-0.pdf

How many space the list of only countries costs?

GeoLite2 can detect only popular VPN for usual users.

Surely. Still useful to display it where it is known. But I noticed the patch doesn't distinguish between absence of VPN information and knowledge of an IP not being a VPN provider. I didn't check whether the database distinguishes between 0 and emptystring for that field. That goes into the TODO list.

I'd prefer to not bake the static data in the game, but use up-to-date data from our server (we already authorize users for lobby, so it shouldn't be a problem).

Doing it remotely means the memory footprint problem and the currentness of data problem would be solved.
The downside however is that it would only work with the lobby server, the current patch works with hosting via IP too (IP isn't really a password, for other games people implemented custom gamelists with published gamehost IPs), adds traffic and code complexity to the lobby server (it would require a new bot).
I think the location of IPs doesn't change countries that often that it would invalidate so many entries within a release cycle that it would be a dealbreaker. (One could even do the http download and launch of the zip file with CURL if one wanted to, at least I didn't see something in their terms that rules out weekly downloads with curl.)
Only the memory footprint of the City level db would be an argument for doing it remotely. But I'm not sure that we want to implement so much networking infrastructure for City grade resolution or the currentness of the data. Parsing csv files is very simple and stupid, and independent of other computers, no?

How many space the list of only countries costs?

As a preface, we can't measure that exactly as far as I know, because programs can keep previously allocated memory to them. But the taskmanager numbers are still highly relevant of course.

MainMenu + no GeoLite2 = 75MB
MainMenu + Country + 90MB
MainMenu + Country + ASN = 140MB memory
MainMenu + City + ASN = 500MB memory
Game running with 8 players, possibly 7 AIs, many units > 4GB (that's why we needed the windows 64bit patch)

Also I need to say that the last number was 2500MB when I started...
It could be that this can be compressed further with some brain sweat.
For example the country names could be gathered with ICU? Then they were even translated into more than 8 languages.
(I wanted to avoid thus at first because ICU can't provide citynames I think.)

At last, notice that the entire story is optional, and if it is considered memory intense, it should be disabled by default, while leaving the feature to most of the players that have so much memory that 15 or 150MB don't make a difference (while having the feature does make a difference).

Perhaps the memory footprint problem could be addressed by just remembering the position inside the file for each IP subnet and then only parsing that line on demand? Hmm.

Lefo added a comment.Jan 19 2019, 5:56 PM

I didn't get why would this function affect memory consumption at all and now I noticed the intention is to use CSV version of GeoLite2 and as I understand it this whole database gets parsed and stored to memory. To me this seems to be definitely wrong decision. MaxMind format of this database contains binary search tree which allows to quickly search for any IP by descending only the necessary parts of search tree with no need to read the whole file. Compared to some parsing of CSV this is much more efficient and does not require any significant memory resources.

The std::map has logarithmic lookup time as well.
Reading the whole file is necessary unless we want to keep the file handle open or re-read the file.
They offer csv files for a reason, they don't need a complicated format.
I think if one can keep the filehandle open, one can just parse the IP address range (pair of u32, u8) and map that to the position inside the file (u32).
Then if a client connects, one could read that one line of the file. Sounds quicker than implementing their custom database format and as memory efficient? (Maybe due to the different arrangement it may need a second or two more to load the initial table)

Lefo added a comment.Jan 19 2019, 10:27 PM

The std::map has logarithmic lookup time as well.

Not when you use it the way you use it. You just linearly traverse all the data in std::map and call a check whether it matches. You can't really use map or dict type object to efficiently look up for ip/mask when you don't know the mask. That's why we generally can't use hash tables for routing (though we can cache addresses to hash tables). You could try all possible masks which should give you constant or logartihmic time complexity depending on whether dict type object is binary tree or hash table (std::map is binary tree). But of course this would still be much slower than single native lookup.

Reading the whole file is necessary unless we want to keep the file handle open or re-read the file.

Keeping file handle or reopening file is certainly not a problem.

They offer csv files for a reason, they don't need a complicated format.

Well you're saying just the opposite of what you should say here. They offer MaxMind DB format for a reason. So that you can use it to quickly search specific record and not buffer all the data in memory. I guess the main reason they offer CSV is to provide you with some well known format which you can read and convert to your own format which you then use or import it to some database. I doubt CSV format was ever intended to be used directly by some app. CSV is not designed for search of any kind. It just holds linear data.
Anyway I doubt looking up some ip in this format would need more work than parsing CSV and implementing some new search algorithm. In MaxMind DB you're doing the search while you're reading the file letting the database lead you to the record you search.
But you don't need to reimplement wheel, there is library libmaxminddb which does this for you.

The point is that MaxMind is database which you can quickly search in specifically designed for IPv4/IPv6 networks. It was designed for exactly this use case and I don't see a point in using anything else.
I understand using MaxMind DB requires to drop some code you've alerady done. But it's definitely the right approach.

Yes, it has to iterate once through the map, because the subnets have different bitmasks, but the other std::map's can lookup directly.
Anyhow, the lookup time is kind of irrelevant, there is also a cache for the few IPs looked up, the memory footprint is the dealbreaker.

But you don't need to reimplement wheel, there is library libmaxminddb which does this for you.

Importing such a library is something should be avoided.

Keeping file handle or reopening file is certainly not a problem.

I will try this.

The point is that MaxMind is database which you can quickly search in specifically designed for IPv4/IPv6 networks. It was designed for exactly this use case and I don't see a point in using anything else.
I understand using MaxMind DB requires to drop some code you've alerady done. But it's definitely the right approach.

Dropping code isn't the problem, importing a huge third party code for something that is a small side feature is.

(Obligatory comment about num_clients/num_cities * 100 and how the current approach seems so wasteful even after ignoring player distributions and whatnot.)

Maybe you should get the cities by continent databases and only load whats necessary. I think I recall having seen something like that. If not, it should be somewhere on the internet.

Lefo added a comment.Jan 20 2019, 1:26 PM

Anyhow, the lookup time is kind of irrelevant, there is also a cache for the few IPs looked up, the memory footprint is the dealbreaker.

Yes, I agree I was just replying to your previous statement.
Though if the game really should start for 2 seconds more it would seem dumb. Since some minor function would make the longest delay in game start process.

Keeping file handle or reopening file is certainly not a problem.

I will try this.

Yeah I encourage you to do so, but note that you probably can't do this with CSV. If you reparse the whole file every time you need to lookup some record, it will be probably too slow. You'd need to use some searchable format. Or at least linear binary format, which should give you pretty much the same speed as you have now with linear lookup since the file content gets cached (unless your system is short on memory).

Dropping code isn't the problem, importing a huge third party code for something that is a small side feature is.

Doesn't seem too huge to me. Compared to your code for geolite parsing and lookup it doesn't seem much bigger.
I'm not sure what you mean by importing third party code. But this library is provided by most distributions including necessary header files for compilation. Seems to me to be just fine to include this as optional dependency. Certainly better approach than reimplementing some suboptimal algorithms all by yourself introducing some unnecessary code complexity to game itself.

From https://dev.maxmind.com/geoip/geoip2/geolite2/:

The GeoLite2 Country and City databases are updated on the first Tuesday of each month. The GeoLite2 ASN database is updated every Tuesday.

It means that our packed IP list will be outdated for the next month after each release.

In D1746#70585, @elexis wrote:

MainMenu + no GeoLite2 = 75MB
MainMenu + Country + 90MB
MainMenu + Country + ASN = 140MB memory
MainMenu + City + ASN = 500MB memory
Game running with 8 players, possibly 7 AIs, many units > 4GB (that's why we needed the windows 64bit patch)
...
At last, notice that the entire story is optional, and if it is considered memory intense, it should be disabled by default, while leaving the feature to most of the players that have so much memory that 15 or 150MB don't make a difference (while having the feature does make a difference).

I'd agree with 15MB. 100-500MB isn't a lot, but it's pretty noticeable, especially on loading from HDD (80-160MB/s) or on processing on low-end processors.

I don't think that such efforts and resources are worth to spend on the small though nice feature.

elexis added a subscriber: smiley.Jan 20 2019, 4:22 PM
In D1746#70616, @smiley wrote:

Yes, that's what we are talking about.

In D1746#70621, @Lefo wrote:

CSV

Not parsing the whole file, parsing only the IP ranges on init and remembering where the IP range data is located inside the file, i.e. map from (u32, u8) -> u32.
Then when looking up one new IP, it would seek to that position inside the file and parse until the first \n.
(I suspect (= lazy because didn't check) that the MaxMind reader would actually have to seek more throughout this file (read the first bit of the IP, jump somewhere, read the next bit, jump somewhere), which means if the user has a HDD, the head of the HDD would have to jump more as well?)
This would also speed up the parsing on init a lot as most parsing and storing would be avoided altogether.

I will try this.

(s/will/might, after all I'm the only one interested in this feature and maybe two lobby enthusiasts)

library is provided by most distributions

It sounds like PITA for macOS, the one outdated linux distribution and the one too recent linux distribution, although I didn't check.
Current GeoLite2.cpp might be 500 lines, but at least its all C++11 that is guaranteed to work everywhere without any shenanigans.
Adding new dependencies in general is something that is frowned upon, people prefer to drop features to have less dependencies, at least from what I recall about libJPEG, which is why I stayed away from that.

The problem with not storing only the parsed contents but keeping an open file handle (std::istream) and parsing only the entries that are requested is that the file contents will be in the virtual filesystem VFS, and that only supports reading an entire file at once.
It must be the VFS because 0ad should be distributable with the game data files being stored in a possibly compressed zipfile.
The same problem happened with the visual replay menu (that wants to read only the first and last few lines of the commands.txt, which increases the performance by something like an order of magnitude.

I had also taken a look at the fileformat

Current GeoLite2.cpp might be 500 lines

Actually it's less than 250 lines + empty lines + comments.
Something like 10 functions with 25 lines each?

The c reference implementation thing has much more and it would likely be a PITA to maintain.
For example for windows we'd always had to build a new version manually, like we do with enet, gloox and so forth.
For osX we'd have to keep the version number up to date in a script, the URL where to download it.
Then different platforms will use different versions and we get bugreports for software that isn't ours (which we in the worst case might find out only after finding the bug in that software, like with gloox #4705).
When we can rely on C++11 features, it should work according to the specs on every platform.
I didn't read much of that c reference implementation, but it looked like beyond 2500 lines (including whitespace and everything)?
I didn't determine whether or not that implementation does not read the entire file into the memory.
There's also another GeoLite2 c++ parser, but that was BSD licensed, i.e. not GPL v2+ compatible.
The binary search tree to lookup IPs described in the file format document is a nice concept, but it does add a lot of code complexity in comparison to csv parsing. Might still be worth it (as in faster to implement than to weigh the cost of not doing it to the possible benefits of the implementation).
But before going either way, the VFS filehandle problem would have to be solved, most likely even a third party library would be used. For compressed, uncompressed and unzipped files each.

Stan added a comment.Jan 21 2019, 1:00 PM

This feature looks nice to me even though people are way too concerned about IPs . Do you want me to test it on Windows ?

smiley added inline comments.Jan 21 2019, 3:30 PM
binaries/data/mods/public/gui/networkreport/CountryFlags.js
8

Adding things that are sure to be used is probably better. Especially when there are so many. It might be a negligible memory usage, but useless things are useless things. But that’s just my opinion.

(At first, I thought that was what the dynamic loading thing was actually added for.)

binaries/data/mods/public/gui/networkreport/NetworkDialog.js
96

Is this from one of the dependent proposals?

source/network/GeoLite2.cpp
278

I would say it’s fine without one. This particular check was present two times above. And it’s purpose is clear IMO.

Adjust for commented out code though.

303

Probably not. But I am not sure.
I think file.Load(... would just return something other than the OK code.

source/network/NetServer.cpp
1599

Maybe I am misunderstanding it, but this is in anticipation of controller server split right?

Lefo added a comment.Jan 21 2019, 3:33 PM

It must be the VFS because 0ad should be distributable with the game data files being stored in a possibly compressed zipfile.

I don't think it's a good idea to put this file in zip. It would be better outside so user can easily update it. Apps usually don't even bundle the GeoLite2 DB and rather let users to download it themselves. Also it would be good to have path to the database configurable so user can download the new database once to some location and make all apps use it without need to taking special care of each app (well on unix like systems this could be also acomplished using symlinks). Though while there are already some apps using MaxMind DB version if you make 0ad require the CSV format I doubt any other app would use it too.

I didn't determine whether or not that implementation does not read the entire file into the memory.

You can't mean this seriously? What would be the point of the database in file if it was implemented the way so that the whole database would be read to memory?
It's using mmap (or that windows file mapping whatever when running on windows). This is a good approach since there are no syscalls needed when reading the file.

For example for windows we'd always had to build a new version manually, like we do with enet, gloox and so forth.
For osX we'd have to keep the version number up to date in a script, the URL where to download it.

Or you could just simply make it an optional dependency. When user does not have it, it would simply compile without GeoLite2 function.

There's also another GeoLite2 c++ parser, but that was BSD licensed, i.e. not GPL v2+ compatible.

BSD is GPL v2+ compatible.

after all I'm the only one interested in this feature and maybe two lobby enthusiasts

I'd like this feature too. I rarely notice anyone playing 0ad is able to speak my language and when I do it's a coincidence. This feature could increase the chances a lot. It would be also nice to share country information to other players, not just keeping it for host. Also it would be nice to have country flags in the lobby. I think many people would find it useful.

Stan added a comment.Jan 21 2019, 4:23 PM
In D1746#70720, @Lefo wrote:

You can't mean this seriously? What would be the point of the database in file if it was implemented the way so that the whole database would be read to memory?
It's using mmap (or that windows file mapping whatever when running on windows). This is a good approach since there are no syscalls needed when reading the file.

Likely pagefile.sys, wonder what happens if you disabled it.

For example for windows we'd always had to build a new version manually, like we do with enet, gloox and so forth.
For Mac Os we'd have to keep the version number up to date in a script, the URL where to download it.

Or you could just simply make it an optional dependency. When user does not have it, it would simply compile without GeoLite2 function.

As a mac packager I'd like not to have go through figuring what to enable to get a proper bundle given making one takes about 4-6 hours. The main case is releasing the game with all dependencies, for compatibility and sanity sake. We usually don't ask people to compile the game themselves at least not on Mac and Windows. The goal is to make the build process as straightforward as possible to simplify potential contributor's life. While the dependencies should be there, there should definitely be a flag to enable/disable it in the game.

AFAIK the only dependency we do not bundle in SVN is WxWidgets on Windows. That's because most of the time you don't need to build the map editor as it's his own dll on windowS.

I don't think it's a good idea to put this file in zip. It would be better outside so user can easily update it. Apps usually don't even bundle the GeoLite2 DB and rather let users to download it themselves. Also it would be good to have path to the database configurable so user can download the new database once to some location and make all apps use it without need to taking special care of each app (well on unix like systems this could be also acomplished using symlinks). Though while there are already some apps using MaxMind DB version if you make 0ad require the CSV format I doubt any other app would use it too.

Well regardless even if we had the user download it at run-time, it should be loaded through the VFS. So the problem stands. If it is such a problem, then we must update the VFS to handle not reading a while file at once, and that's really the end of it.

An alternative would be to bundle a separate executable that starts a server of some kind with which 0 A.D. could communicate. That server could be however simple or complicated we want. It would introduce latency, but would extinguish other problems.


I agree with bundling as few libraries as possible in C++ projects because it's a PITA to maintain.

smiley added inline comments.Jan 21 2019, 5:32 PM
binaries/data/mods/public/gui/common/NetworkDialogManager.js
15

Somehow I didn't quite catch this when I posted those...
And I suppose the line below is that way to avoid a double negation.

elexis marked an inline comment as done.Jan 21 2019, 5:42 PM
In D1746#70720, @Lefo wrote:

It must be the VFS because 0ad should be distributable with the game data files being stored in a possibly compressed zipfile.

I don't think it's a good idea to put this file in zip. It would be better outside so user can easily update it.

The user can update it easily already by downloading it to .local/share/0ad/mods/geolite2updates/geolite2/*.csv.

Zip redistribution advantages:

  • Everyone will have access to the Country database.
  • The intended way of treating files in 0ad: It's a design choice that the mod data that pyrogenesis uses (such as 0ad) is distributed in a zip file. This was done for several reasons, including file lookup times if I recall correctly, but also for consistency (leaving only a fixed zip file instead of many lose directory names and files).

Unzipped redistribution advantages:

  • Everyone will have access to the Country database.
  • Enables third party library use, at least of the c reference implementation.
  • Enables reading from the file without reading it entirely, when not using a third party lib.

Unzipped, no-redistribution advantages:

  • Currentness: The data in a release zip file is supposed to be valid for one release, but GeoLite2 updates are released monthly (is it terribly or tolerably outdated?)

I started looking into providing a VFS zip stream reader, but I don't know if it's actually possible with the compression and not sure if this is a worthwhile endeavour.

Apps usually don't even bundle the GeoLite2 DB and rather let users to download it themselves

Attractive for maintenace and distribution, but I suspect that would reduce the feature use from 1 to 20, and one of the goal was to make it universally accessible without requiring the user to download and setup something.

Currentness

I don't know how often IP ranges are sold from one country to the other. But I would expect that if the data is still correct 90% of the time after an entire release cycle, it would be better to have it than to not have it (based on the argument of the use cases).

Also it would be good to have path to the database configurable

That's already implemented.

I didn't determine whether or not that implementation does not read the entire file into the memory.

What would be the point of the database in file if it was implemented the way so that the whole database would be read to memory?

I don't know, I just said that I didn't read anything that actually states that this isn't read into memory entirely (I spent very limited time though).

It's using mmap (or that windows file mapping whatever when running on windows). This is a good approach since there are no syscalls needed when reading the file.

Which makes it incompatible with geolite2 being contained in a zip file, which implies that 0ad would have to either unzip it to somewhere at program start (costs time), or the user would have to be instructed to do something with his zip tool in the right folder (which means 5-9 in 10 players won't bother and half of the rest will fail to comprehend what is wanted or fail at some random detail).

Or you could just simply make it an optional dependency. When user does not have it, it would simply compile without GeoLite2 function.

People who compile are a tiny fraction of the community.
99.8% of the windows users, > 90% of the macOS users, even > 50% of the ubuntu users from what I gather on the forums and lobby don't compile but use a precompiled package, so the argument about either the maintenance cost of adding the dependency to windows and macOS and in principle (version differences, upstream bugs), or the feature being used by two orders of magnitude less users stands?

There's also another GeoLite2 c++ parser, but that was BSD licensed, i.e. not GPL v2+ compatible.

It was this:
https://www.ccoderun.ca/GeoLite2++/api/usage.html

License:
https://www.ccoderun.ca/GeoLite2PP/download/license.txt
Which is the https://opensource.org/licenses/BSD-2-Clause

I don't see the requirement that distributing the binary implies distributing the source, which GPL v2.5+ mandates last time I checked? (only one example of differences)
But this says BSD 2 Clause is GPL compliant however https://en.wikipedia.org/wiki/BSD_licenses#2-clause_license_(%22Simplified_BSD_License%22_or_%22FreeBSD_License%22)
Which refers to https://www.gnu.org/licenses/license-list.html#FreeBSD
So I guess its GPL compliant because the BSD 2 clause doesn't rule out that the distributor makes less permissive licensing requirements.

In D1746#70713, @Stan wrote:

This feature looks nice to me even though people are way too concerned about IPs .

Some people are also concerned about their nickname being shown and them not being permitted to create multiple accounts per user to conceal their rating/skill indicator and chat personality in the multiplayer lobby and multiplayer matches.
If people really insist on their IP being treated as private, they would either argue for them to use a VPN/satellite or for Wildfire Games to host all games and keep the IPs from any player. This was thought about for rated games, but not for non-rated games. If Wildfire Games hosts the games, they will become responsible about the data that is sent there. Right now players have the ability to play without the eyes from Wildfire Games, which is at least an attractive alternative (much less traffic for WFG, players being able to do their own thing without all the data passing through a WFG server).
The IP address is something like the home address on the postcard or letter you send, and the sender needs to know that address, and it's exactly the same address that will be blacklisted if the sender is unwanted.

Actually that gives me an idea:

add mask internal dont show

New Use case: GeoLite2 ASN Subnet Ban: The player could ban the entire subnet (or prepare the according banmask) that this IP address is part of with a click on a new button.
Same can be done with country/city bans. The entire thing sounds like belgacom, but there are many other anti-users.
The feature can mean players can more selectively chose with whom to play.
At one point there was an "anti-buddy" feature proposed too, I guess the persistent banmasks would be that.

Do you want me to test it on Windows ?

Not before deployment stage if it would come to that.

source/network/IPTools.cpp
66

@ffffffff hardcoded, is that good?

elexis marked 4 inline comments as done.Jan 21 2019, 5:48 PM
elexis added inline comments.
binaries/data/mods/public/gui/common/NetworkDialogManager.js
15
binaries/data/mods/public/gui/networkreport/CountryFlags.js
8

but useless things are useless things

I read that before.

binaries/data/mods/public/gui/networkreport/NetworkDialog.js
96

Not at all that works already (and would be renamed to PopGuiPage in D1684 but nothing else)

source/network/NetServer.cpp
1599

dedicated server requirement, the GUI should be incapable of accessing the NetServer unless going through the network via the NetClient

Stan added a comment.Jan 21 2019, 6:34 PM

If you significantly increase the size of the bundle like by 500MB to add that file, we will have to find workarounds for the windows installer as it has a size limit (IIRC 2GB) bigger than that it cannot handle. This is I believe one of the reason we don't have the Asian Fonts as part of the main bundle.

Another thought. I've been helping a few users that have limited bandwidth or pay for MB. Increasing the size of the bundle might be troublesome for those users. Unfortunately I don't have any statistics on such users.

In D1746#70734, @Stan wrote:

If you significantly increase the size of the bundle like by 500MB to add that file, we will have to find workarounds for the windows installer as it has a size limit (IIRC 2GB) bigger than that it cannot handle. This is I believe one of the reason we don't have the Asian Fonts as part of the main bundle.

If 1/5 of the game is one file that will be outdated each next month and that used only for a small feature, it's pretty strange for me.

Another thought. I've been helping a few users that have limited bandwidth or pay for MB. Increasing the size of the bundle might be troublesome for those users. Unfortunately I don't have any statistics on such users.

I agree, also it's not only about users bandwidth, but about their space, our server bandwidth too. Especially if he needs to download these 500MB each month.

In D1746#70724, @elexis wrote:

New Use case: GeoLite2 ASN Subnet Ban: The player could ban the entire subnet (or prepare the according banmask) that this IP address is part of with a click on a new button.

Which will be outdated to the next month.

Game running with 8 players, possibly 7 AIs, many units > 4GB (that's why we needed the windows 64bit patch)

I have preliminary stats results about total RAM (it means a user can use less than this value): <=1GB (1%), <=2GB (16%), <=3GB (7%), <=4GB (15%). So ~39% of our players can't play with your setup. It means that additional 100-500MB could be noticeable for them.

P.S.: I think it's the nice feature, but it's too expensive within a full pack.

source/network/IPTools.cpp
22

Windows doesn't have this include.

Lefo marked an inline comment as done.Jan 21 2019, 10:14 PM
Lefo added inline comments.
source/network/IPTools.cpp
22

Let's drop windows support entirely and bundle 0ad for windows with cygwin.

source/network/IPTools.cpp
22

I'm not sure that it's possible currently. We use Win API directly in some cases. We need someone who knows cygwin well.

elexis marked an inline comment as done.Jan 22 2019, 12:45 PM

! In D1746#70734, @Stan wrote:
increase the size of the bundle like by 500MB

Then you misread, 500mb is the number of memory allocated to load all 3 files in the city-grade resolution, but 15mb memory use with only the country resolution database and no ASN/ISP database.

As posted already, you can download the files here https://dev.maxmind.com/geoip/geoip2/geolite2/ to see for yourself which file has which size.

The country database is 1.7MB zipped and 15MB unzipped...

If 1/5 of the game is one file that will be outdated each next month and that used only for a small feature, it's pretty strange for me.

If 1/5 of the things I post in a revision proposal are repetitions of something that I already repeated about 3 or 4 times and still not being understood, that's pretty strange for me. Perhaps I misphrased something?

Which will be outdated to the next month.

I have posted some replies to that that would be nice to be taken into account, otherwise there is no new information.

So ~39% of our players can't play with your setup

Seriously, how often did I tell that the City level data base will not, not, not, not be committed (unless there were literally no memory or space issues), but only the Country level database which is about 15 MB memory use.
And we didn't even rule out yet that we can bring down the memory use to an order of magnitude of <1MB.
I repeat, the filesize considerable for any commit would be 10MB, but not 500MB, and 500MB is a memory = RAM = C++ variables, and hte number relates to the optional City db that the user can download if the likes the feature. The filesize number of the City db 150MB unzipped and 40MB zipped! Whether you judge a patch as good or bad, get the facts right before doing so.
Anyone who is interested in more current data or higher resoluted (City instead of Country) data can just download the file and possibly edit the directory name in the options dialog, and possibly use a script to download the more recent data.
There are different classes of users here, as I mentioned above:

  1. the 99% of windows users who never compile, the 90% of users who never customize anything with mods, configs, or here downloads.
  2. the more involved and skilled users who know how to configure a program, who look at tooltips, who have read a README file before, who might google and find one of our wiki pages, or people who ask on the forums or lobby before doing anything, who may want to be interested in getting additional benefits for additional efforts if there is a hint informing them that there are more options.

The Country database is merely to provide country flags/language/timezone, that are accurate something like 90% of the time at the end of a release cycle (? one could try to get actual numbers if there were historic revisions) while the cost would be the size of one scenario map.

Since the other TODOs were not discussed (they don't really need to be discussed though) - they exist and they make this patch inacceptable anyhow until they are addressed. (Many improvement TODOs like banmask dialog, but some code defects too.)

source/network/IPTools.cpp
22

Wasn't the conclusion to use getaddrinfo, making this header obsolete?

Let's drop windows support entirely and bundle 0ad for windows with cygwin.

That escalated quickly :P

I guess the lesson here is "everything you say can be used against you", as per usual :p

I think the best course of action would be a dialog letting the user download the city dataset if he wants to, and let the user auto-download the freshest dataset automatically (which would be put in the .local directory). We only bundle the country dataset, so he does get something by default.

Haven't looked at other potential issues yet.

vladislavbelov added a comment.EditedJan 22 2019, 2:27 PM
In D1746#70762, @elexis wrote:

If 1/5 of the things I post in a revision proposal are repetitions of something that I already repeated about 3 or 4 times and still not being understood, that's pretty strange for me. Perhaps I misphrased something?

1/5 of RAM too. Ok, it's the minor problem. I don't mind.

Which will be outdated to the next month.

I have posted some replies to that that would be nice to be taken into account, otherwise there is no new information.

Where? I only see your suggestion that users have to download this file. But the package will be still outdated, because you don't notify users the original package was changed or update period was changed.

So ~39% of our players can't play with your setup

Seriously, how often did I tell that the City level data base will not, not, not, not be committed (unless there were literally no memory or space issues), but only the Country level database which is about 15 MB memory use.

Where I said that it's related to the committing things? You suggest a downloadable feature to users, but only 61% of them probably can support it. So people won't know do they really support or not. Because they may don't know their free RAM space.

Which will be outdated to the next month.

I have posted some replies to that that would be nice to be taken into account, otherwise there is no new information.

Where? I only see your suggestion that users have to download this file. But the package will be still outdated, because you don't notify users the original package was changed or update period was changed.

ctrl+f currentness.
The argument that for country data, the data, at least so I expect, will not be so outdated to the point of being useless. I am not refering to the update the files are released but to the number of changes between the individual releases.
You can upload an update every day or every month, but that doesn't tell us how many changes there were in the meantime.
I couldn't find historic downloads of the database, otherwise we could easily tell how many entries changed in the timeframe of one release cycle and see whether the currentness is really a big problem or not.
Noone denied that there are the issues, but I'm not sure what your conclusion about them is. It's not automatically updated, therefore we shouldn't use it, or therefore we should inform the user, or therefore only the hackers should be able to use it? 100% or nothing?
The question is not what defects we face but how to remove them. At least the memory part seems solveable unless the deflate algorithm prevents that, and the currentness problem is (by me expected) to not be a dealbreaker.

because you don't notify users the original package was changed or update period was changed

It's easy to add a label displaying the version / date and hints about how to update.

<=1GB (1%), <=2GB (16%), <=3GB (7%), <=4GB (15%).

I wouldn't say that < 4GB can't use the city resolution db but 0ad, because it mostly depends on which gamesettings (mapsize, number of players, population limit, number of AIs) they chose how many gigabytes of memory will be eaten away by 0ad.
It seems people with < 4GB are likely to not be able to use the game already if they chose settings where the OS will get to that order of magnitude of memory use, otherwise the 64bit windows patch wouldn't have resolved so many OOM crashes.
That's even more true for the 2GB folks. A simple message should be able to guide users, such as "You can use city level resolution if you (1) download, (2) can afford X memory, (3) are patient to wait some seconds".
The ones who will be able to download and put it into a directory should also be able to tell whether they have enough memory.
It's the same with other settings such as shadow quality for example.
Also currently available memory is something that pyrogenesis can detect automatically I suppose.
But what we should actually would prefer is writing it in such a way that it doesn't have to be read into memory to begin with, which is the VFS stream issue (where I'm currently exploring many new forms of compile errors).

In D1746#70770, @elexis wrote:

ctrl+f currentness.

Which word or sentence?

The argument that for country data, the data, at least so I expect, will not be so outdated to the point of being useless. I am not refering to the update the files are released but to the number of changes between the individual releases.

It'd be useful only for decent users. Because local user ban list will be outdated even sooner than the geolite release.

It's easy to add a label displaying the version / date and hints about how to update.

What shall we do if the the update period was changed?

<=1GB (1%), <=2GB (16%), <=3GB (7%), <=4GB (15%).

I wouldn't say that < 4GB can't use the city resolution db but 0ad, because it mostly depends on which gamesettings (mapsize, number of players, population limit, number of AIs) they chose how many gigabytes of memory will be eaten away by 0ad.

I agree, that many people can use it without any memory problem. I just have doubts on the ratio: usefulness/cost. Could you try to find a size of diff between months?

memory size / load time / stream

I made a map from IP address to filestream position. (u32 -> u32)
But that map by itself already costs 200MB memory for the city-resolution db, more than the entire filecontents!

So I rewrote it to use an istream.
This brings down the memory use from 15MB/500MB to actually 0MB.
This also allows returning the entire dataset of the given IP instead of dropping data selectively to improve the memory footprint.

However, since I didn't implement a ZIP stream (yet?), it means that the entire file is read into memory as one big string every time an IP is looked up.
Since the csv files are sorted by key, binary search was implemented.
These steps combined take one second for the City-resoluted db with my disk and CPU (when it took 13 seconds when storing and parsing everything).

So only implementing a (non-stringstream-workaround) zip-filestream is remaining. That will be an adventurous.
(Even if that would be unfeasible or impossible, the VFS istream using the stringstream of the filecontents and fstream for unzipped files could still be useful for the commands.txt replay parser (which can't use the VFS until it has access to streams, for performance reasons too).)

elexis added a comment.EditedFeb 2 2019, 1:42 PM

Update:
Added VFS::GetFileStream(const VfsPath& pathname, shared_ptr<std::istream>& stream) that returns

  • a std::ifstream from RealDirectory::GetStream(const OsPath& name, shared_ptr<std::istream>& stream) and
  • an ArchiveFile_Zip::GetStream(const OsPath& name, shared_ptr<std::istream>& stream) for files in uncompressed or deflate compressed zip archives.

For now the archive stream is provided by a std::stringstream after decompressing the zipped file, satisfying the zipped replay use case even though with unoptimized performance.
That is until I succeeded to implement a sequential or random access std::basic_streambuf<char> aka streambuf named ArchiveReader_ZipStream to satisfy any avoidable cpu or memory investment for zipped databases such as replaymenu, geolite2.

Links:
std::streambuf:
http://www.voidcn.com/article/p-vjnlygmc-gy.html
https://artofcode.wordpress.com/2010/12/12/deriving-from-stdstreambuf/

Deflate / Zip:
https://www.zlib.net/manual.html
https://en.wikipedia.org/wiki/DEFLATE
https://www.w3.org/Graphics/PNG/RFC-1951
https://github.com/mateidavid/zstr/blob/master/src/zstr.hpp
http://lh3.github.io/2014/07/05/random-access-to-zlib-compressed-files
https://github.com/madler/zlib/blob/master/examples/zran.c

elexis retitled this revision from Network Dialog to Network Dialog / Troll removal device.Feb 9 2019, 2:31 PM

(done, will upload some day)