Page MenuHomeWildfire Games

[RC] Thread the pathfinder computations
Needs ReviewPublic

Authored by wraitii on Dec 27 2016, 1:51 PM.

Details

Reviewers
Kuba386
Group Reviewers
Restricted Owners Package(Owns No Changed Paths)
Trac Tickets
#4324
Summary

This implements threading of the pathfinder workers.

Details of the architecture:

  • Workers are dispatched paths in a round-robin-ish fashion. It's naive, but it works well enough for now.
  • Paths are then fetched in reverse order and messages are sent.

The Long pathfinder grid is owned by the pathfinder and so safe to use across turns. The obstructions manager' shapes aren't, so protected behind a mutex.

Test Plan

May review code and architecture. Am interested in performance improvements that people would report. An MP game would be great.

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
source/simulation2/components/CCmpPathfinder.cpp
921

Too unsafe for me. I'd replace the function by:

template <typename T, typename U>
void CCmpPathfinder::PushRequestsToWorkers(std::vector<T>& from, std::deque<U>& into)

With probably a type validation like std::is_same<U, Async*PathRequest>::value.

source/simulation2/components/CCmpPathfinder_Common.h
183

It'd be good to fix names (to follow CC). Also as it'd be good to use "cache" word as the variables are used inside const methods.

wraitii added inline comments.Apr 29 2019, 10:24 AM
source/simulation2/components/CCmpPathfinder.cpp
98

Sorry, I don't understand what you mean here. You're saying we should implement a "Thread Manager" class at a 'top-level' global that tells us how many threads we want for pathfinding?
I'm not clear how that's much better for now.

Also I somewhat intend to remove this as if we had proper mutex protection it wouldn't be needed.

107

Indeed.

751

You can't use a condition_variable alone to wait on multiple threads. Adding the proper synchronisation would require more code, which has a higher chance of mistakes now and down the line.
I agree that this spins at 100% which is a bit of a waste. What I hope is that using std::this_thread::yield() fixes this issue with an acceptable latency.

921

One can't actually, because I need to call either the longRequest or the shortRequest of the worker in the inner loop. This is basically rewritable as

CCmpPathfinder::PushRequestsToWorkers(std::vector<T>& from, bool toLongRequests)
[...]
if (toLongRequests)
	worker->m_LongRequests.insert([...])
else
	worker->m_ShortRequests.insert([...])

If I pass the worker, I lose the whole point of the function as the for-loop has to be outside the function.

source/simulation2/components/CCmpPathfinder_Common.h
129

Earlier version of the code used a condition_variable, I've changed it since. The thing is I need a m_Computing predicate anyways for the condition-variable, and possibly a mutex (I had issues which look a bit like this).
Again, I probably need a std::this_thread::yield() here, but I think the busy-waiting on an atomic variable is simpler and safer.

183

These were simply moved, and I think I'll refactor this in a VertexPathfinder class anyways that this calls (like the LongPathfinder) instead of what I've done here.

Overall, I think going for the code that's simplest to reason about and most explicit is best. That code is imo 'lock-free atomic code' in this particular instance, and not std::condition_variable which honestly I find absurdly complicated and tricky.

Stan added a comment.Apr 29 2019, 10:58 AM

Sometimes I wonder if we could put examples of our code on stack overflow then watch them fight over it and get the best code :D

Angen added inline comments.Apr 29 2019, 2:47 PM
source/simulation2/components/CCmpPathfinder.cpp
98

well, in this part of code threads are going to be created, they do not exists at this point so no concurency here.

wantedThreads is local (in scope of function) variable - > no problem even with threads as every would have its own unique variable with no option to see or change the one from another thread

g_Atlas & & g_Atlas->running is only reading, as far as I know you can safely read from variable with as many threads as you wish and it is safe but again this function is running in main thread only

vladislavbelov added inline comments.Apr 29 2019, 6:12 PM
source/simulation2/components/CCmpPathfinder.cpp
98

Sorry, I don't understand what you mean here. You're saying we should implement a "Thread Manager" class at a 'top-level' global that tells us how many threads we want for pathfinding?

I mean that the component shouldn't decide about the number of threads and shouldn't communicate with the Atlas directly. We may use ThreadUtil to calculate right number of threads. Because we might have separate threads for other purposes, and we need to manage them somehow.

g_Atlas & & g_Atlas->running is only reading, as far as I know you can safely read from variable with as many threads as you wish and it is safe but again this function is running in main thread only

It's not the syntax error but the semantic for me.

751

Yes, in a direct way, but you can wrap it. Like https://stackoverflow.com/questions/24465533/implementing-boostbarrier-in-c11/24465624#24465624. Or use something like boost::barrier.

Adding the proper synchronisation would require more code, which has a higher chance of mistakes now and down the line.

It's not much more code and it's relatively simple. I'd prefer user's comfort instead of not saving few lines of code. In fact you may freeze the user machine (if it's not so good to predict busy-waiting). Also it may slow down other threads, I mean like renderer one, if it'd exist.

std::this_thread::yield

It's useful but it doesn't guarantee a good stable work. It may wake up every tick or sleep for a long time.

Could you measure how many percents it and threads do busy waiting?

921

If you wanna leave the function loop, I'd prefer to add more code but with an explicit access. Like:

template<typename T>
CCmpPathfinder::PushRequestsToWorkers(std::vector<T>& from)
{
    // ...
    worker->AddRequests<T>(
        std::make_move_iterator(from.end() - amount),
        std::make_move_iterator(from.end())
    );
    // ...
}

template<typename T>
void AsyncPathfinderWorkerThread::AddRequests(... it_begin, ... it_end)
{
    auto& container = RequestsContainer<T>();
    container.insert(container.begin(), std::make_move_iterator(it_begin), std::make_move_iterator(it_end));
}

template<typename T>
void AsyncPathfinderWorkerThread::RequestsContainer();
template<>
std::deque<AsyncShortPathRequest>& RequestsContainer() { return m_ShortRequests; }
// ...

It'd be the same by performance, but safer I think. As you can't pass an arbitrary container member.

wraitii added inline comments.Apr 29 2019, 11:00 PM
source/simulation2/components/CCmpPathfinder.cpp
98

OK, makes sense for ThreadUtil.

751

I've ran some benchmarks:

  • Yield() isn't useful on OS X, and when spin looping hard we are consuming 100% CPU which seems wasteful.
  • condition variables and barriers seem almost faster than spinlocking.

Overall I'll switch to using condition_variable. I think it's a complicated construct, but at the end of the day we can't occupy 100% of a process spin locking.

921

I suppose ultimately this would be if-constexpr-ed. I'll go with your way as it does seem more readable and I didn't think of it.

wraitii updated this revision to Diff 7886.Apr 30 2019, 11:41 PM
wraitii marked 14 inline comments as done.
wraitii retitled this revision from Thread the pathfinder computations to reduce latency to Thread the pathfinder computations.

Updated. This addressed most comments I had.

I've switched to using condition variables and a Barrier-like structure for the threading. As you can see in the following screenshot, this makes thread wakeup a little more random (on my system it appears to be up-to-a-few ms, which should not be a problem if we can do other things in the main thread).

If this is a problem, we could go use a more complicated binary spinlock/condition_variable system I suppose.

Also this moves the Vertex pathfinder into its own class. This is a cleanup that we should probably do anyways and that I will probably upload in a separate diff. It's pretty straightforward though.

wraitii added inline comments.Apr 30 2019, 11:45 PM
source/simulation2/components/CCmpPathfinder.cpp
921

Done in a slightly simpler manner.

source/simulation2/components/ICmpPathfinder.h
45 ↗(On Diff #7886)

Not sure this is actually needed there in this version of the patch.

Successful build - Chance fights ever on the side of the prudent.

Linter detected issues:
Executing section Source...

source/ps/ThreadUtil.h
|   1| /*·Copyright·(C)·2010·Wildfire·Games.
|    | [NORMAL] LicenseYearBear:
|    | License should have "2019" year instead of "2010"

source/simulation2/helpers/Pathfinding.h
|   1| /*·Copyright·(C)·2016·Wildfire·Games.
|    | [NORMAL] LicenseYearBear:
|    | License should have "2019" year instead of "2016"

source/simulation2/helpers/Render.cpp
|   1| /*·Copyright·(C)·2017·Wildfire·Games.
|    | [NORMAL] LicenseYearBear:
|    | License should have "2019" year instead of "2017"

source/simulation2/helpers/VertexPathfinder.cpp
|   1| /*·Copyright·(C)·2019·Wildfire·Games.
|    | [NORMAL] LicenseYearBear:
|    | License should have "2017" year instead of "2019"

source/simulation2/Simulation2.cpp
|   1| /*·Copyright·(C)·2018·Wildfire·Games.
|    | [NORMAL] LicenseYearBear:
|    | License should have "2019" year instead of "2018"

source/ps/ThreadUtil.cpp
|   1| /*·Copyright·(C)·2010·Wildfire·Games.
|    | [NORMAL] LicenseYearBear:
|    | License should have "2019" year instead of "2010"

source/simulation2/components/CCmpPathfinder.cpp
|   1| /*·Copyright·(C)·2018·Wildfire·Games.
|    | [NORMAL] LicenseYearBear:
|    | License should have "2019" year instead of "2018"
Executing section JS...
Executing section cli...

Link to build: https://jenkins.wildfiregames.com/job/differential/1309/display/redirect

Way better. Can't see any performance loss compared to svn anymore. 👍

Stan added inline comments.May 1 2019, 10:49 AM
source/ps/ThreadUtil.h
23

Capital F ?

25

Separate file ?

source/simulation2/Simulation2.cpp
543 ↗(On Diff #7142)

Maybe it deserves a comment ?

source/simulation2/components/CCmpPathfinder.cpp
725

Merge loops ?

741

What does this function do ? Name seems misleading.

763

Why the scope ?

source/simulation2/components/CCmpPathfinder_Common.h
129

Should or is ?

155

Initialization in Init ?

source/simulation2/components/tests/test_Pathfinder.h
194 ↗(On Diff #7142)

?

source/simulation2/helpers/HierarchicalPathfinder.cpp
402 ↗(On Diff #7886)

Shouldn't that have been added back ?

451 ↗(On Diff #7886)

Shouldn't that have been added back ?

source/simulation2/helpers/Render.cpp
38 ↗(On Diff #7886)

Shouldn't that have been added back ?

source/simulation2/helpers/VertexPathfinder.cpp
458

Couldn't we pass an object/ struct here instead ?

596–597

@vladislavbelov Good usage for clamp below, no ?

wraitii updated this revision to Diff 7891.May 1 2019, 10:52 AM

Cleaned-up of D1853, D1954 and D1855. This needs those patches to apply.

wraitii marked 5 inline comments as done.May 1 2019, 10:59 AM
wraitii added inline comments.
source/ps/ThreadUtil.h
25

I think ThreadUtil needs a complete refactoring (it redefines std::mutex, basically) and I'll do it then I suppose.

source/simulation2/components/CCmpPathfinder.cpp
741

It's just there to be specialised, and it pushes requests so I don't think so?

763

lock gets release automatically at the end of the scope.

source/simulation2/components/CCmpPathfinder_Common.h
129

Should. It also is, but it should be and remain so in the future.

155

I think it's better to init in the definition whenever possible going forward. Not sure if that's a convention we have or not?

source/simulation2/helpers/HierarchicalPathfinder.cpp
402 ↗(On Diff #7886)

nah, who cares about the time profiling debug overlay honestly.

source/simulation2/helpers/Render.cpp
38 ↗(On Diff #7886)

Likewise: I don't think it really matters to profile this unless you're actually looking at how long this takes, but it's a function used by debug code and profiling that seems useless.

Stan added inline comments.May 1 2019, 11:12 AM
source/simulation2/components/CCmpPathfinder.cpp
741

That line doesn't pushes request does it ?

source/simulation2/helpers/Render.cpp
38 ↗(On Diff #7886)

Ah okay :)

Build failure - The Moirai have given mortals hearts that can endure.

Link to build: https://jenkins.wildfiregames.com/job/differential/1314/display/redirect

wraitii marked 6 inline comments as done.May 1 2019, 11:35 AM
wraitii added inline comments.
source/simulation2/components/CCmpPathfinder.cpp
741

It inserts them at the end (by moving), so that's similar to pushing.

Stan added inline comments.May 1 2019, 11:40 AM
source/simulation2/components/CCmpPathfinder.cpp
741

static_assert(sizeof(T) == 0, "Only specializations can be used"); What part of that statement does that ? Oo There is only one line in that function

wraitii marked 2 inline comments as done.May 1 2019, 12:06 PM
wraitii added inline comments.
source/simulation2/components/CCmpPathfinder.cpp
741

As said above, this is just the template function which gets specialised by L709/714. This base template cannot be used (it's not implemented but there's no clean way to write that because of issues in XCode/VS, see https://stackoverflow.com/a/37593094).

Stan added inline comments.May 1 2019, 12:07 PM
source/simulation2/components/CCmpPathfinder.cpp
741

Ah thanks for the explanation :)

wraitii updated this revision to Diff 8148.May 25 2019, 10:56 PM
wraitii marked an inline comment as done.

Cleaned up of D1918.

I've switched to using static, thread-local variables for cached variables. This is a tad ugly until C++ 17 get around the corner (as we won't have to initialise the static members out of line then).

The pro is that it makes it irrelevant to the helpers whether we are threaded or not, making their implementation much easier and foolproof. The con is that one cannot use debug with the threading (well one can but it would never show anything), so you have to de-activate threading for that. Possibly this could be done at runtime, though this isn't implemented yet.

I suspect this has a small performance overhead - I'm assuming it's irrelevant here but it should be profiled.

Build failure - The Moirai have given mortals hearts that can endure.

Link to build: https://jenkins.wildfiregames.com/job/differential/1504/display/redirect

wraitii added inline comments.Jun 10 2019, 9:16 AM
source/simulation2/components/CCmpPathfinder.cpp
842

Realised last night that I need to make sure the results are ordered consistently for threaded/non-threaded/different # of threads here. I think they are by accident right now, but and "ENSURE (is_sorted)" is probably worth adding.

wraitii updated this revision to Diff 9019.Jul 20 2019, 7:06 PM
wraitii added a reviewer: Restricted Owners Package.

Rebased, cleaned up slightly.

Works, still need a mutex for Obstructions but it works, it's only for sanity.

So anybody willing - please test this.

Build failure - The Moirai have given mortals hearts that can endure.

Link to build: https://jenkins.wildfiregames.com/job/docker-differential/152/display/redirect

wraitii updated this revision to Diff 9042.Jul 21 2019, 7:23 PM

Mutex obstructions. This is RC, but we need to agree on whether or not this is acceptable.

wraitii added inline comments.Jul 21 2019, 7:25 PM
binaries/data/mods/public/gui/options/options.json
89

erf, also need to change this.

source/ps/ThreadUtil.cpp
50

well, RC except for this bit which I can now remove :p

Build failure - The Moirai have given mortals hearts that can endure.

Link to build: https://jenkins.wildfiregames.com/job/docker-differential/176/display/redirect

i tried doesnt work for me i watch pyrogenesis thread in top program of linux and have put general options number of thread to 3 but it doesnt show more than one thread at 100% cpu in top so i dont get it to work. any suggestions wut to do more to get them running? i put D1918 and D14 as diff on my a24 svn version r22642.

ffffffff added a comment.EditedMon, Aug 12, 10:18 AM

ok its working now but not as expected but i havent look deeply into code now just can say that one thread (main thread) is going when testing pathfinding with 6-16 threads with 1000-2000 units moving in a map very fast to 100% but the other threads are showing only small up usually 1-3% cpu usage some times 20% but very rare and though because main thread is at 100% exhausting and seem to not get more cpu time above 100% it starts lagging ofc and fps dropping even if non visual window (window hidden to taskbar). So i was expecting that the pathfinder work would be go well across all threads and especially not touching the main thread any more because there is also then graphics computation and other game logic stuff hitted which results again in fps droppage and lag game which should be avoided. So i think that would be the first step. then we have to look.

As an FYI: I confirm this is working on my machine, but don't expect all 4 cores to be 100% CPU, pathfinding is only a small part of it.

This screenshot of Profiler2 shows 3 pathfinder threads computing paths: as you can see, it's only a fraction of the total time.


This is the summary of the pathfinder thread activity.

This patch doesn't fix lag, nor does it magically make 0 A.D. multi-threaded, but it helps significantly against pathing lag spikes.

wraitii retitled this revision from Thread the pathfinder computations to [RC] Thread the pathfinder computations.Mon, Aug 12, 6:41 PM
wraitii edited the summary of this revision. (Show Details)
wraitii updated the Trac tickets for this revision.