Page MenuHomeWildfire Games

Modified CFixedVector2D/3D that cache length to reduce calls to isqrt64(), plus a few other fixes
Needs ReviewPublic

Authored by DanW58 on Jan 23 2021, 8:17 AM.
This revision needs review, but there are no reviewers specified.



Using CFixedVector3D as example, it has 3 member, X, Y and Z. A member function Length() computes length as sqrt(x*x+y*y+z*z). The sqrt is computed in 64-bit precision, in code, and I see no way to optimize it off the bat. What I did was to add a fourth data member, L, that caches the computed length. A value of 0 for L means it probably needs to be updated. If it is zero when a call of Length() comes, I first check that X, Y and Z are not all zero; if they are, then L=0 is correct; otherwise L needs to be computed. If L is not zero I return it. Any non-const functions clear L to zero, to indicate that the cached value needs updating.

Test Plan

I'm new here, and fairly new to testing. However, it does compile, link and pass the standard 351 tests suite; plus it plays quite well.
Lint may give a warning of member L being mutable. It has to be, as the function Length() is declared const, but recalculation of L happens in Length().
Other than that, what this needs is profiling. The profiling could be done counting calls to isqrt64() over a minute. However, I'm not sure if a game playback would be a good way, as probably pathfinding is turned off during playbacks. I'm just guessing. Profiling of general performance of the engine might also be called for. FixedVector3D had no alignment padding in it; the new member, L, makes its size power of two, which should improve performance.

Diff Detail

Lint Skipped
Unit Tests Skipped

Event Timeline

DanW58 requested review of this revision.Jan 23 2021, 8:17 AM
DanW58 created this revision.
Freagarach set the repository for this revision to rP 0 A.D. Public Repository.Jan 23 2021, 8:18 AM
DanW58 updated this revision to Diff 15634.Jan 23 2021, 8:32 AM

Congrats on your first patch :)
A few general remarks:

  • Not everything is specified, but we have some general conventions here:
    • We write comments capitalised + dotted
  • from reading your forum posts, I believe you're used to 'older' C++, this has no real impact here but we're using C++17
  • we don't really do the "author: some comment" thing, though I guess you might intend those for review only. In that case, you can also leave inline comments on Phabricator.
  • you appear to have a few whitespace issues with the patch.

Now onto the patch, I think the idea has potential, but I don't think we'll merge it as is, it's a large diff that might have negative performance impacts because of the bigger size of the vector, and the performance improvements are muddied by that + the getters are just visual noise in many cases.
What I think you should try is to make another Fixed class that does this length maintenance, and use that one where your optimisation would bring benefits (of course, you'd have to find likely candidates).

To profile the game, you have a few options. The first thing to note is that replaying is deterministic and does run all the engine stuff that the game usually runs (replays are just a list of commands).
Because of that, you can use pyrogenesis -replay='' or pyrogenesis -replay-visual='' and a link to a replay file (see
The first will do the replay non-visually, i.e. without GUI/graphics, concentrating only on the engine itself, which is usually what you need for profiling these kind of changes.
Note that AI replays work, so you can use that, they're just slower because the AI is also recomputed.

Then on to actually profiling. You can of course use your perf of choice, there are also bundled profilers. Profiler2 is the one I'd recommend using:
It comes with some functions already profiled, and then you can add your own calls if you need to.


We're using c++17 now, so you can alignas to specify this:

Stan set the repository for this revision to rP 0 A.D. Public Repository.Jan 23 2021, 11:44 AM
DanW58 updated this revision to Diff 15639.Jan 23 2021, 1:26 PM
  1. Removed comments.
  2. Fixed the tabs that were spaces issue, courtesy of CodeLite
  3. Added alignment directives
  4. Removed padding member
  5. svn diff'ed from the instructions in the wiki

Can you please remove the "0ad" at the beginning of your file paths and start at "source" please?
I made some performance profiling, one time with heavy graphics load and one time with heavy movement load. I didn't see any performance improvements. Are there any other scenarios I should test?

Optimus, I didn't even know I could use svn inside subfolders. I used svn before, for two years, but
it was through a gui, and all I did was check out and commit.
So, I tried making a diff, but I forgot that my code's current state is far from compiling.
I'm turning all the vector classes (AND my point classes below them) into template classes, where the
template parameter is a "Frame of Reference" abstract type, so as to prevent, for example, subtracting a
screen-coordinate vector from a terrain point and interpreting the result as a mini-map coord. Mixing
frames of reference won't be allowed except through deliberate channels. Here's a sneak preview:

"Frames Of Reference" common abstract type:
struct FOR // --Frames Of Reference' common abstract type

virtual ~FOR() = 0;

/ Abstract Frames of Reference to be used as template params
The first FOR is a null stub for the second to reference.
struct C____NO____FOR : public FOR //The "NO" frame of reference

typedef            C____NO____FOR                     prevFOR_t;
typedef            C____NO____FOR              representation_t;  // --none--
typedef            C____NO____FOR                    fwdXform_t;  //NO transform
typedef            C____NO____FOR                    bakXform_t;  //NO transform

private: virtual ~C____NO____FOR() = 0; can't be instanced or inherited
struct CModelSpaceFOR : public FOR
No need to id models; only one used at a time

typedef C____NO____FOR                               prevFOR_t;  // preceded by nothing
typedef CVector3D<CModelSpaceFOR>             representation_t;  //float vec 3
typedef CTransform<C____NO____FOR,CModelSpaceFOR>   fwdXform_t;  //NO transform
typedef CTransform<CModelSpaceFOR,C____NO____FOR>   bakXform_t;  //NO transform

private: virtual ~CModelSpaceFOR() = 0;
struct CWorldSpaceFOR : public FOR //This is the virtual world's 3D space

typedef CModelSpaceFOR                               prevFOR_t;  // preceded by Models' space
typedef CFixedVector3D<CWorldSpaceFOR>        representation_t;  //fixed-point vec 3
typedef CTransform<CModelSpaceFOR,CWorldSpaceFOR>   fwdXform_t;  //one for each entity
typedef CTransform<CWorldSpaceFOR,CModelSpaceFOR>   bakXform_t;  //what's at location queries,
    // or what part of a model is hit by a mouse-pointer ray

private: virtual ~CWorldSpaceFOR() = 0;
struct CViewFrustrFOR : public FOR //This is view frustrum space, CPU-GPU

typedef CWorldSpaceFOR                               prevFOR_t;  // preceded by world space
typedef CVector3D<CViewFrustrFOR>             representation_t;  //float vec 3
typedef CTransform<CWorldSpaceFOR,CViewFrustrFOR>   fwdXform_t;  //camera/water-reflect
typedef CTransform<CViewFrustrFOR,CWorldSpaceFOR>   bakXform_t;  //mouse cursor query

private: virtual ~CMiniMapCoorFOR() = 0;
struct CMiniMapCoorFOR : public FOR

typedef CWorldSpaceFOR                               prevFOR_t;
typedef CFixedVector2D<CMiniMapCoorFOR>       representation_t;  //fixed-point vec 2
typedef CTransform<CWorldSpaceFOR,CMiniMapCoorFOR>  fwdXform_t;  //Scaling from world X,Z
typedef CTransform<CMiniMapCoorFOR,CWorldSpaceFOR>  bakXform_t;  //from minimap-to-world

private: virtual ~CMiniMapCoorFOR() = 0;
struct CScreenPerspFOR : public FOR

typedef CViewFrustrFOR                               prevFOR_t;
typedef CFixedVector2D<CScreenPerspFOR>       representation_t;  //fixed-point vec 2
typedef CTransform<CViewFrustrFOR,CScreenPerspFOR>  fwdXform_t;  //mostly GPU work
typedef CTransform<CScreenPerspFOR,CViewFrustrFOR>  bakXform_t;  //mouse cursor query

private: virtual ~CScreenPerspFOR() = 0;
struct CScrnOverlayFOR : public FOR

typedef CMiniMapCoorFOR                              prevFOR_t;
typedef CFixedVector2D<CScrnOverlayFOR>       representation_t;  //fixed-point vec 2
typedef CTransform<CScrnOverlayFOR,CScrnOverlayFOR> fwdXform_t;  //2D scale/translate
typedef CTransform<CScrnOverlayFOR,CScrnOverlayFOR> bakXform_t;  //mouse-to-minimap

private: virtual ~CScrnOverlayFOR() = 0;

Thanks for profiling it, I really appreciate the info, even if it's bad news. I have no idea what scenarios.
Most of what I did in regards to performance relates to a time-consuming integer square root, isqrt64().
It is used to compute length of CFixedVector vectors, which I believe are used for game world coords.
That's all I know. If the cacheing of vector length did nothing, at least my using a double sqrt() should
have... But maybe the function doesn't get called as much as I was led to believe. What a pity.

Ah, wait! Did you profile this patch here or the one I submitted in the forum? The one here only has
the length cacheing feature. The one I attached to a forum reply (2.1 Megabytes) has the trick of
using the hardware, double precision sqrt() function inside isqrt64(). Sorry I did not update the
patch here; I was looking for this place and couldn't find it. Now I got the link in the email. But
that other patch is also done from the 0ad folder; and now I can't make another.

Thanks again. When my present work is finished I will submit a new patch, from the sources
folder, and submit it here.

Ah, wait! Did you profile this patch here or the one I submitted in the forum?

I profiled this patch here.

Thanks again. When my present work is finished I will submit a new patch, from the sources
folder, and submit it here.

Thank you. I'm exited to test your new patch :)