In D4366#185970, @vladislavbelov wrote: Why not 320Hz or 360Hz?
Dec 5 2021
OptimusShepard retitled D4366: Increase the frame limiter maximum up to 360fps from Increase the frame limiter maximum up to 240fps to Increase the frame limiter maximum up to 360fps.
Changed the maximum to 360, as current top-end monitors support 360 Hz.
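To illustrate what a 360 fps ceiling means for the frame limiter, here is a minimal sketch. It assumes the limiter clamps a requested rate and converts it to a per-frame time budget; `kMinFps`, `ClampFps` and `FrameBudgetMs` are hypothetical names, not taken from the patch, and only the 360 value comes from the change above.

```cpp
#include <algorithm>

// 360 is the new maximum named in D4366.
constexpr int kMaxFps = 360;
// Hypothetical lower bound, chosen only for this sketch.
constexpr int kMinFps = 20;

// Clamp a requested frame rate into the supported range.
int ClampFps(int requestedFps)
{
    return std::max(kMinFps, std::min(requestedFps, kMaxFps));
}

// Time budget per frame in milliseconds for the clamped rate.
double FrameBudgetMs(int requestedFps)
{
    return 1000.0 / ClampFps(requestedFps);
}
```

At the maximum of 360 fps, the budget works out to roughly 2.8 ms per frame.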
Feb 27 2021
OptimusShepard added a comment to D2936: Allow limiting the max number of corpses simultaneously visible in the game.
In D2936#157405, @nani wrote: X axis: Is this for every turn or every frame?
I'm not sure
Y axis: Is this the total render+simulation+gui+etc?
Yep.
If you want to see a difference, you must split the data by category; for this diff, the improvements are in the render category. I kind of see a difference between the two, but they don't look much different, which means that the noise from the other categories is drowning out the render values. I tested this with the in-game profiler using the AutoCiv mod's corpse limit implementation, so I can guarantee that there is a big improvement. In case you are still not sold on it :) you can do a quick test with AutoCiv (a23 or a24, doesn't matter) and compare; there is not even a need for profiling, as it is very obvious.
Renderer: Red vanilla, green 100, blue 50
OptimusShepard added a comment to D2936: Allow limiting the max number of corpses simultaneously visible in the game.
Feb 19 2021
In D3554#156273, @wraitii wrote: I feel like you might have an easier time with "Enable"/"Disable" instead of Toggle, but otherwise this looks OK :)
Feb 18 2021
OptimusShepard accepted D3578: Do not generate render data in case Decal calculated wrong coordinates.
Patch works for me, can't reproduce the bug anymore. Thx
Feb 14 2021
In D3505#156053, @Stan wrote: Missing the XML file?
Adding triggers for start, save and stop recording. Shortening the cinematic path.
Feb 13 2021
Apply Imarok's comment.
Adding a new version only for testing. Work in progress.
The functionality can be tested by running the attached map.
skirmishes.zip (677 KB)
Feb 10 2021
In D3554#155633, @Stan wrote: Where do you try to call them? You need to register them differently depending on whether you want them on the gui or the simulation
Feb 7 2021
Feb 5 2021
In D3522#154824, @vladislavbelov wrote: Could you attach the sources?
I took the map from D3505; the profile starts at 2 s and ends at 120 s.
data.zip (6 MB)
I don't notice a performance improvement, but I only checked profiler2.
red vanilla, green patched
Feb 3 2021
I did some profiling with my new benchmark map.
red vanilla, green deque, blue vector
profile_data.zip (10 MB)
Feb 2 2021
In D3505#154314, @wraitii wrote: I don't think it's that necessary to add pathfinding calls and such, to be honest, except perhaps as a way to profile animation code.
Jan 31 2021
Jan 28 2021
OptimusShepard added a comment to D3454: Modified CFixedVector2D/3D that cache length to reduce calls to isqrt64(), plus a few other fixes.
In D3454#153884, @DanW58 wrote: Ah, wait! Did you profile this patch here or the one I submitted in the forum?
I profiled this patch here.
Thanks again. When my present work is finished I will submit a new patch, from the sources folder, and submit it here.
Thank you. I'm excited to test your new patch :)
Jan 27 2021
OptimusShepard added a comment to D3454: Modified CFixedVector2D/3D that cache length to reduce calls to isqrt64(), plus a few other fixes.
Can you please remove the "0ad" at the beginning of your file paths and start at "source"?
I did some performance profiling, once with a heavy graphics load and once with a heavy movement load. I didn't see any performance improvements. Are there any other scenarios I should test?
Jan 25 2021
It would have been nice if you had explained that this is the Old Persian form of "Xerxes". Xšayaṛša isn't really well known ;)
Jan 24 2021
Seems like a more useful place. But maybe you should rename the files to simd.cpp/h for more general use and preparation for future usage of AVX/AVX2?
Jan 21 2021
Jan 18 2021
Update float values 0 -> 0.f (See comment)
Removes the pointer switch and replaces it with a macro condition.
First, Linux loses performance by using the SSE implementation. Therefore, this implementation is now only for Windows.
Second, we lose too much performance by using pointers. We already use SSE2 flags for Windows. For this reason, the pointers have been removed and replaced by Windows + SSE macro condition.
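The switch from a pointer to a macro condition can be pictured roughly as follows. This is only a sketch under assumptions, not the patch itself: `EXAMPLE_USE_SSE`, `Mat4` and `Multiply` are hypothetical names, and the matrix layout (column-major, 16 floats) is assumed for illustration. The SSE path is selected at compile time for Windows builds only; every other target gets the scalar loop.

```cpp
#include <array>
#include <cstddef>

// EXAMPLE_USE_SSE is a hypothetical macro for this sketch.
// _WIN32, _M_X64 and _M_IX86_FP are predefined by MSVC.
#if defined(_WIN32) && (defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1))
#include <xmmintrin.h>
#define EXAMPLE_USE_SSE 1
#else
#define EXAMPLE_USE_SSE 0
#endif

// Column-major 4x4 matrix: element (row, col) lives at index 4 * col + row.
using Mat4 = std::array<float, 16>;

// Returns a * b. The implementation is fixed at compile time,
// so there is no function pointer to go through at the call site.
Mat4 Multiply(const Mat4& a, const Mat4& b)
{
    Mat4 r{};
#if EXAMPLE_USE_SSE
    for (std::size_t col = 0; col < 4; ++col)
    {
        __m128 acc = _mm_setzero_ps();
        for (std::size_t k = 0; k < 4; ++k)
        {
            // Column k of a, scaled by the scalar b(k, col).
            const __m128 aCol = _mm_loadu_ps(&a[4 * k]);
            const __m128 bVal = _mm_set1_ps(b[4 * col + k]);
            acc = _mm_add_ps(acc, _mm_mul_ps(aCol, bVal));
        }
        _mm_storeu_ps(&r[4 * col], acc);
    }
#else
    for (std::size_t col = 0; col < 4; ++col)
    {
        for (std::size_t row = 0; row < 4; ++row)
        {
            float sum = 0.f;
            for (std::size_t k = 0; k < 4; ++k)
                sum += a[4 * k + row] * b[4 * col + k];
            r[4 * col + row] = sum;
        }
    }
#endif
    return r;
}
```

The trade-off is the one described above: the choice is baked into the binary, so it costs nothing per call, but it cannot adapt to the CPU the game is actually running on.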
Jan 17 2021
In D3400#150863, @Imarok wrote: Tried setting three waypoints? first to somewhere, second to the other market, third to somewhere else?
(That gives quite strange behaviour.)
Jan 16 2021
OptimusShepard raised a concern with rP24215: Let players remap hotkeys in-game, fix default hotkeys being qwerty-specific..
Causes #5922
Jan 13 2021
@vladislavbelov any plans to commit this improvement?
Jan 11 2021
Solved in rP24550.
No longer needed as rP24550 has been committed.
Dec 20 2020
In D2857#143713, @Stan wrote: You need to compare generated assembly too :)
So here is the generated assembly for matrix multiplication, SSE2 flag enabled, MSVC2017.
Dec 19 2020
In D2857#137872, @wraitii wrote: I'm not convinced TBH. If this is hardcoded at compile-time, either we drop support or it's pretty much useless for releases. SIMD-capable compilers seem able to vectorise these functions, so custom versions don't seem particularly useful.
If there was a runtime switch that actually increased performance, might be more interesting.
I will wait until @Stan has finished D3212. I plan to implement only the matrix multiplication, the way the other SSE implementations are done, so the SSE path can be chosen at runtime. Would that be an option?
Did some performance profiling on MSVC2017, SSE2 flag enabled, Ryzen 3700X.
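A runtime-selected SSE path could look roughly like this. It is a sketch under assumptions, not the engine's code: `Dot4Scalar`, `Dot4Sse`, `SelectDot4` and `EXAMPLE_HAVE_SSE` are hypothetical names, a 4-element dot product stands in for the matrix multiplication, and the `cpuHasSse` flag is assumed to come from whatever hardware detection the game already performs.

```cpp
#include <cstddef>

// EXAMPLE_HAVE_SSE is a hypothetical macro: it only says whether the
// compiler can build the SSE variant at all, not whether the CPU has SSE.
#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>
#define EXAMPLE_HAVE_SSE 1
#else
#define EXAMPLE_HAVE_SSE 0
#endif

using Dot4Fn = float (*)(const float*, const float*);

// Portable fallback.
float Dot4Scalar(const float* a, const float* b)
{
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3];
}

#if EXAMPLE_HAVE_SSE
// SSE variant: one packed multiply, then a horizontal sum of the lanes.
float Dot4Sse(const float* a, const float* b)
{
    const __m128 p = _mm_mul_ps(_mm_loadu_ps(a), _mm_loadu_ps(b));
    float lanes[4];
    _mm_storeu_ps(lanes, p);
    return lanes[0] + lanes[1] + lanes[2] + lanes[3];
}
#endif

// Pick the implementation once, e.g. at startup.
// cpuHasSse is assumed to come from existing hardware detection.
Dot4Fn SelectDot4(bool cpuHasSse)
{
#if EXAMPLE_HAVE_SSE
    if (cpuHasSse)
        return &Dot4Sse;
#else
    (void)cpuHasSse;
#endif
    return &Dot4Scalar;
}
```

The caller stores the returned pointer and calls through it, which is the indirection whose cost has to be weighed against the benefit of choosing the path per machine.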
Dec 18 2020
In D3212#143216, @Stan wrote: It seems sadly that gcc on Optimus' computer doesn't take full advantage of it either
GCC 10.2.0
I despise the OS solution however because it leads to unnecessary macro complexity.
Maybe it's less complex if we only care about the architecture? So always use the SSE functions and flag on x86. As you already said, Windows doesn't support non-SSE2 CPUs. If someone is using such a 20-year-old, non-SSE Linux PC, the computer is already too slow to run the game. So ignore them.
Dec 13 2020
In D3212#142196, @Stan wrote: It's like 1ms slower?
In D3212#142191, @Stan wrote: I don't understand does the current patch make things slower for you?
Yep, the patch slows down performance for me.
By default the flag is already set to SSE2. (Visual Studio says so)
I don't know why, but it makes a difference whether I set the flag in Premake or in Visual Studio. Setting flags in VS doesn't change the performance in any way. Setting the SSE2, AVX, or AVX2 flag in Premake improves the performance.
Also your profile data always have that weird offset which makes it hard to see anything
This is because you can only save the profile data at the end, not at the beginning of a session. Different frame rates then lead to an offset.
Here is the performance comparison for your patch.
frame: red unpatched, green your patch.
In D3212#142027, @Stan wrote: @OptimusShepard is right, as shown in P149, the code is now good (it's possible it wasn't on old MSVC);
Dec 12 2020
Maybe I'm a bit off topic, but do we really need those SSE functions? Can't the compiler do that itself via compiler flags? I think we should always compile with SSE on x86. SSE was introduced with the Pentium III in 1999; I guess nobody uses 21-year-old hardware?
It would also be nice if Wildfire Games could offer an AVX or AVX2 release version besides the SSE version. I guess replacing the SSE compiler flags with AVX/AVX2 flags shouldn't take that much effort?
Dec 6 2020
In D14#140491, @OptimusShepard wrote: I did some new profiling.
Dec 5 2020
I did some new profiling.
Green patched - 16 threads, red without patch.
See below for the replay and profiling data.
profiling.zip (6 MB)
Nov 24 2020
OptimusShepard added a comment to D3138: Fix AA / Sharpness not being correctly enabled at the start..
In D3138#138391, @vladislavbelov wrote: That breaks/not fixes MSAA.
Everything works correctly again, thx.
The sharpening and anti-aliasing options fail every time the game starts. To get them to work, I always have to turn them off and on again.
Nov 13 2020
In D2812#135846, @OptimusShepard wrote: I tested it on my notebook Windows 10 Intel 8250U, Intel UHD 620 and on my desktop Windows AMD 3700X, AMD RX 5700. Everything works fine.
I also didn't notice any performance impact on my desktop. Is the whole calculation done by the GPU?
I tested it on my notebook Windows 10 Intel 8250U, Intel UHD 620 and on my desktop Windows AMD 3700X, AMD RX 5700. Everything works fine.
I also didn't notice any performance impact on my desktop. Is the whole calculation done by the GPU?
Nov 11 2020
In D2440#135338, @Stan wrote: I'm gonna push some change, can you try again afterwards ?
In D2440#135335, @Stan wrote: @OptimusShepard is it the same if you disable the props?
I did a new profiling run with the props deactivated instead of the patch; the result is exactly the same.
I suspect <prop actor="props/units/quiver_greek_back.xml" attachpoint="back"/> to be particularly slow.
You're right, this is the main point.
profiler2 frame: red patched, green only quiver greek back deactivated, blue old
I guess, trees could be interesting too.
Nov 10 2020
I did some profiling on the new map Oceanside using Athenian archers. I noticed really big performance improvements when zooming out.
Test conditions: Oceanside map, Athenian archers, units not moving, saved at 4:30 min, completely zoomed out, camera not moved from its start position (only zoomed out).
profiler2 frame: red old, green patched.
Nov 7 2020
Game works for me. (Ryzen 3700X, Windows)
It seems that the patch is needed for some AMD Threadripper CPUs.
Nov 5 2020
The game still runs on Windows with an AMD Ryzen 3700X.
Nov 4 2020
I have noticed neither issues nor performance improvements. Windows, AMD Radeon
Oct 11 2020
Reupload of the newest version.
Vendor check before skipping the validation.
The workaround is too specific, as it seems to work only for the Ryzens. The Threadrippers still fail. A correct implementation, or a more general solution/workaround, is needed.
Aug 22 2020
Patch works for me (Windows 10, RX 5700).
@vladislavbelov your former patch had AA for trees too, without a big performance hit. Does the new performance impact come from the water bugfix?
Also, I don't think we should care about performance hits from trees, as we have FXAA for lower-end hardware. I see no need for the implementation in its current state, as FXAA + sharpening looks better than MSAA as long as trees have aliasing, especially as FXAA and sharpening have no measurable performance impact compared to MSAA.
Aug 8 2020
OptimusShepard added a comment to D2936: Allow limiting the max number of corpses simultaneously visible in the game.
In D2936#128281, @Stan wrote: Dying animation is the corpse ^^
OptimusShepard added a comment to D2936: Allow limiting the max number of corpses simultaneously visible in the game.
A dying animation is not possible here? Or at least a simple visual plop effect?
Aug 7 2020
OptimusShepard awarded D2938: GL_ARB instancing to reduce draw calls a Like token.
In D2938#128125, @wraitii wrote: gpuskinning is an option in default.cfg that you have to activate using the config file. You could try (de)/activating that
Setting it disabled doesn't change anything.
What are you reporting as being 10/20% slower exactly?
The whole frame.
Edit: I feel like your GC might be too good. Can you try capping max_matrix_uniform to maybe 64 in hwdetect.js ?
That's it. Thx
Render: red non patch, green patched.
In D2938#128120, @wraitii wrote: It should be a straight improvement on svn, so this seems odd. How did you replay exactly? Were you using GPU skinning? Were you using prefer glsl? What map?
I always use a 200-unit replay on Sahyadri Buttes; the camera is not moved or zoomed. What is GPU skinning? GLSL is enabled.
Edit -> Also what is your graphics card?
Radeon RX 5700
I also got these compiler warnings:
\source\renderer/ModelVertexRenderer.h(158): warning C4100: "model": unreferenced formal parameter (compiling source file ..\..\..\source\renderer\InstancingModelRenderer.cpp)
\source\renderer/ModelVertexRenderer.h(158): warning C4100: "shader": unreferenced formal parameter (compiling source file ..\..\..\source\renderer\InstancingModelRenderer.cpp)
\source\renderer/ModelVertexRenderer.h(158): warning C4100: "model": unreferenced formal parameter (compiling source file ..\..\..\source\renderer\HWLightingModelRenderer.cpp)
\source\renderer/ModelVertexRenderer.h(158): warning C4100: "shader": unreferenced formal parameter (compiling source file ..\..\..\source\renderer\HWLightingModelRenderer.cpp)
\source\renderer/ModelVertexRenderer.h(158): warning C4100: "model": unreferenced formal parameter (compiling source file ..\..\..\source\renderer\ModelRenderer.cpp)
\source\renderer/ModelVertexRenderer.h(158): warning C4100: "shader": unreferenced formal parameter (compiling source file ..\..\..\source\renderer\ModelRenderer.cpp)
I did some quick profiling. I noticed that the number of draw calls is halved. I also tested value 1 against 64; 64 is much faster. But compared to the unpatched replay, there was a performance hit of ~10-20%.
Does this hit come from your new profiler option "saved draw calls"? I think some profiler calculations are active even if the profiler panel isn't open. Do you have a non-profiler version to test?
Aug 6 2020
OptimusShepard added inline comments to D2936: Allow limiting the max number of corpses simultaneously visible in the game.
OptimusShepard added inline comments to D2936: Allow limiting the max number of corpses simultaneously visible in the game.
Jul 25 2020
Jul 24 2020
Edited the sharpness initialization. Disables sharpening if post-processing or GLSL is disabled.
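The condition described here can be sketched as a single predicate. The struct and field names below are hypothetical and only mirror the wording of the change (post-processing, GLSL, sharpening); they are not the engine's actual option names.

```cpp
// Hypothetical option set for this sketch; not the engine's config keys.
struct RenderOptions
{
    bool postProcessing;
    bool preferGlsl;
    bool sharpeningRequested;
};

// Sharpening is only active when the user asked for it and both
// post-processing and GLSL are enabled.
bool IsSharpeningActive(const RenderOptions& options)
{
    return options.sharpeningRequested
        && options.postProcessing
        && options.preferGlsl;
}
```

Evaluating this once during initialization avoids the state where the option is shown as enabled but has no effect until it is toggled.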
Jul 23 2020
In D14#125536, @wraitii wrote: The smallest "unit" of computation is a single path, so to get an improvement with 16 threads you'd need more than 16 paths per turn. This is possible in intense combat situations (e.g. combat demo huge), but not very likely.
In D14#125537, @Stan wrote: He has 8 cores 8 threads :)
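The point about a single path being the smallest unit of work can be made concrete with a little arithmetic. This is a simplified sketch that assumes every path request costs the same and that requests are spread evenly over the workers; `UsefulWorkers` and `Rounds` are hypothetical helper names.

```cpp
#include <algorithm>
#include <cstddef>

// A path cannot be split, so no more workers than path requests can be busy.
std::size_t UsefulWorkers(std::size_t pathRequests, std::size_t threads)
{
    return std::min(pathRequests, threads);
}

// Number of sequential "rounds" needed to finish all requests in a turn,
// assuming equal cost per path. Zero threads is treated as one thread.
std::size_t Rounds(std::size_t pathRequests, std::size_t threads)
{
    const std::size_t workers = std::max<std::size_t>(threads, 1);
    return (pathRequests + workers - 1) / workers;
}
```

With 10 path requests in a turn, 8 threads need two rounds and 16 threads need one, but going from 16 to 32 threads changes nothing; with 40 requests, 16 threads still need three rounds.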
Jul 22 2020
Updated the versioning of the used GLSL version.
I did some profiling with different numbers of threads (1, 2, 3, 4, 5, 6, 8, 12, 16). Comparing the results, I guess that more than 16 threads won't improve the performance further.
pathfinder.zip (32 MB)
Jul 18 2020
In D2857#123243, @Stan wrote: Can you figure out what's causing the spikes?
I haven't found the cause yet. But I noticed that the profiler amplifies these spikes considerably. Without it, the frame drops were much smaller.
Jul 10 2020
In D2857#123243, @Stan wrote: I like your approach better but interestingly it's done slightly differently in https://github.com/0ad/0ad/blob/d3e68a99e7f715ad7921a81e959f8ac51dfa1248/source/graphics/Color.cpp
Oh, I think they are switching the instructions at runtime, aren't they? A bit ugly, I think, but we could use FMA :)
Can you figure out what's causing the spikes?
I will try, but at first glance the profiler doesn't show me anything useful.
Updated the year of the license header.
Jul 9 2020
Including the SSE header, because Vulcan fails on the non-Windows tests.
Removed the AVX and FMA versions, as we aren't able to change instructions at runtime. Furthermore, the AVX instructions aren't faster than SSE here.
I have rewritten the patch so that it uses only SSE. That is the version I used for the profiling. I will upload it later today.
I tested the build flags; SSE seems to be the only flag with a positive impact. AVX2 makes everything worse. I also did some profiling.
Current version:
Jul 3 2020
The hardware request doesn't work as it should.
Jun 18 2020
In D2780#119381, @vladislavbelov wrote: Also isn't PostProc locked by GLSL? Since we don't have effects for ARB (iirc).
No, it isn't. It's currently independent. I also tried locking postproc in order to lock anti-aliasing, but that doesn't work; it only locks postproc. A bit strange, I think.
Edit some comments.
Jun 14 2020
Krinkle awarded D2789: SSE Vector3D an Orange Medal token.
Wildfire Games · Phabricator