Using CFixedVector3D as example, it has 3 member, X, Y and Z. A member function Length() computes length as sqrt(x*x+y*y+z*z). The sqrt is computed in 64-bit precision, in code, and I see no way to optimize it off the bat. What I did was to add a fourth data member, L, that caches the computed length. A value of 0 for L means it probably needs to be updated. If it is zero when a call of Length() comes, I first check that X, Y and Z are not all zero; if they are, then L=0 is correct; otherwise L needs to be computed. If L is not zero I return it. Any non-const functions clear L to zero, to indicate that the cached value needs updating.
Details
- Reviewers
- None
I'm new here, and fairly new to testing. However, it does compile, link and pass the standard 351 tests suite; plus it plays quite well.
Lint may give a warning of member L being mutable. It has to be, as the function Length() is declared const, but recalculation of L happens in Length().
Other than that, what this needs is profiling. The profiling could be done counting calls to isqrt64() over a minute. However, I'm not sure if a game playback would be a good way, as probably pathfinding is turned off during playbacks. I'm just guessing. Profiling of general performance of the engine might also be called for. FixedVector3D had no alignment padding in it; the new member, L, makes its size power of two, which should improve performance.
Diff Detail
- Lint
Lint Skipped - Unit
Unit Tests Skipped
Event Timeline
Congrats on your first patch :)
A few general remarks:
- Not everything is specified, but we have some general conventions here: https://trac.wildfiregames.com/wiki/Coding_Conventions
- We write comments capitalised + dotted
- from reading your forum posts, I believe you're used to 'older' C++, this has no real impact here but we're using C++17
- we don't really do the "author: some comment" thing, though I guess you might intend those for review only. In that case, you can also leave inline comments on Phabricator.
- you appear to have a few whitespace issues with the patch.
Now onto the patch, I think the idea has potential, but I don't think we'll merge it as is, it's a large diff that might have negative performance impacts because of the bigger size of the vector, and the performance improvements are muddied by that + the getters are just visual noise in many cases.
What I think you should try is to make another Fixed class that does this length maintenance, and use that one where your optimisation would bring benefits (of course, you'd have to find likely candidates).
To profile the game, you have a few options. The first thing to note is that replaying is deterministic and does run all the engine stuff that the game usually runs (replays are just a list of commands).
Because of that, you can use pyrogenesis -replay='' or pyrogenesis -replay-visual='' and a link to a replay file (see https://trac.wildfiregames.com/wiki/GameDataPaths).
The first will do the replay non-visually, i.e. without GUI/graphics, concentrating only on the engine itself, which is usually what you need for profiling these kind of changes.
Note that AI replays work, so you can use that, they're just slower because the AI is also recomputed.
Then on to actually profiling. You can of course use your perf of choice, there are also bundled profilers. Profiler2 is the one I'd recommend using: https://trac.wildfiregames.com/wiki/Profiler2
It comes with some functions already profiled, and then you can add your own calls if you need to.
0ad/source/maths/FixedVector2D.h | ||
---|---|---|
53 | We're using c++17 now, so you can alignas to specify this: https://docs.microsoft.com/fr-fr/cpp/cpp/alignment-cpp-declarations?view=msvc-160 |
- Removed comments.
- Fixed the tabs that were spaces issue, courtesy of CodeLite
- Added alignment directives
- Removed padding member
- svn diff'ed from the instructions in the wiki
Can you please remove the "0ad" at the beginning of your file paths and start at "source" please?
I made some performance profiling, one time with heavy graphics load and one time with heavy movement load. I didn't see any performance improvements. Are there any other scenarios I should test?
Optimus, I didn't even know I could use svn inside subfolders. I used svn before, for two years, but
it was through a gui, and all I did was check out and commit.
So, I tried making a diff, but I forgot that my code's current state is far from compiling.
I'm turning all the vector classes (AND my point classes below them) into template classes, where the
template parameter is a "Frame of Reference" abstract type, so as to prevent, for example, subtracting a
screen-coordinate vector from a terrain point and interpreting the result as a mini-map coord. Mixing
frames of reference won't be allowed except through deliberate channels. Here's a sneak preview:
////////////
"Frames Of Reference" common abstract type:
////////////
struct FOR // --Frames Of Reference' common abstract type
{
protected:
virtual ~FOR() = 0;
}
////////////
/ Abstract Frames of Reference to be used as template params
////////////
The first FOR is a null stub for the second to reference.
struct C____NO____FOR : public FOR //The "NO" frame of reference
{
typedef C____NO____FOR prevFOR_t; typedef C____NO____FOR representation_t; // --none-- typedef C____NO____FOR fwdXform_t; //NO transform typedef C____NO____FOR bakXform_t; //NO transform
private: virtual ~C____NO____FOR() = 0; can't be instanced or inherited
}
struct CModelSpaceFOR : public FOR No need to id models; only one used at a time
{
typedef C____NO____FOR prevFOR_t; // preceded by nothing typedef CVector3D<CModelSpaceFOR> representation_t; //float vec 3 typedef CTransform<C____NO____FOR,CModelSpaceFOR> fwdXform_t; //NO transform typedef CTransform<CModelSpaceFOR,C____NO____FOR> bakXform_t; //NO transform
private: virtual ~CModelSpaceFOR() = 0;
}
struct CWorldSpaceFOR : public FOR //This is the virtual world's 3D space
{
typedef CModelSpaceFOR prevFOR_t; // preceded by Models' space typedef CFixedVector3D<CWorldSpaceFOR> representation_t; //fixed-point vec 3 typedef CTransform<CModelSpaceFOR,CWorldSpaceFOR> fwdXform_t; //one for each entity typedef CTransform<CWorldSpaceFOR,CModelSpaceFOR> bakXform_t; //what's at location queries, // or what part of a model is hit by a mouse-pointer ray
private: virtual ~CWorldSpaceFOR() = 0;
}
struct CViewFrustrFOR : public FOR //This is view frustrum space, CPU-GPU
{
typedef CWorldSpaceFOR prevFOR_t; // preceded by world space typedef CVector3D<CViewFrustrFOR> representation_t; //float vec 3 typedef CTransform<CWorldSpaceFOR,CViewFrustrFOR> fwdXform_t; //camera/water-reflect typedef CTransform<CViewFrustrFOR,CWorldSpaceFOR> bakXform_t; //mouse cursor query
private: virtual ~CMiniMapCoorFOR() = 0;
}
struct CMiniMapCoorFOR : public FOR
{
typedef CWorldSpaceFOR prevFOR_t; typedef CFixedVector2D<CMiniMapCoorFOR> representation_t; //fixed-point vec 2 typedef CTransform<CWorldSpaceFOR,CMiniMapCoorFOR> fwdXform_t; //Scaling from world X,Z typedef CTransform<CMiniMapCoorFOR,CWorldSpaceFOR> bakXform_t; //from minimap-to-world
private: virtual ~CMiniMapCoorFOR() = 0;
}
struct CScreenPerspFOR : public FOR
{
typedef CViewFrustrFOR prevFOR_t; typedef CFixedVector2D<CScreenPerspFOR> representation_t; //fixed-point vec 2 typedef CTransform<CViewFrustrFOR,CScreenPerspFOR> fwdXform_t; //mostly GPU work typedef CTransform<CScreenPerspFOR,CViewFrustrFOR> bakXform_t; //mouse cursor query
private: virtual ~CScreenPerspFOR() = 0;
}
struct CScrnOverlayFOR : public FOR
{
typedef CMiniMapCoorFOR prevFOR_t; typedef CFixedVector2D<CScrnOverlayFOR> representation_t; //fixed-point vec 2 typedef CTransform<CScrnOverlayFOR,CScrnOverlayFOR> fwdXform_t; //2D scale/translate typedef CTransform<CScrnOverlayFOR,CScrnOverlayFOR> bakXform_t; //mouse-to-minimap
private: virtual ~CScrnOverlayFOR() = 0;
}
.....
Thanks for profiling it, I really appreciate the info, even if it's bad news. I have no idea what scenarios.
Most of what I did in regards to performance relates to a time-consuming integer square root, isqrt64().
It is used to compute length of CFixedVector vectors, which I believe are used for game world coords.
That's all I know. If the cacheing of vector length did nothing, at least my using a double sqrt() should
have... But maybe the function doesn't get called as much as I was led to believe. What a pity.
Ah, wait! Did you profile this patch here or the one I submitted in the forum? The one here only has
the length cacheing feature. The one I attached to a forum reply (2.1 Megabytes) has the trick of
using the hardware, double precision sqrt() function inside isqrt64(). Sorry I did not update the
patch here; I was looking for this place and couldn't find it. Now I got the link in the email. But
that other patch is also done from the 0ad folder; and now I can't make another.
Thanks again. When my present work is finished I will submit a new patch, from the sources
folder, and submit it here.
I profiled this patch here.
Thanks again. When my present work is finished I will submit a new patch, from the sources
folder, and submit it here.
Thank you. I'm exited to test your new patch :)