Page MenuHomeWildfire Games

Thread the pathfinder computations
Needs ReviewPublic

Authored by wraitii on Dec 27 2016, 1:51 PM.

Details

Reviewers
Kuba386
Summary

Ported from #4324 .

See comments for warning.

Test Plan

May review code and architecture. Am interested in performance improvements that people would report. An MP game would be great.

Diff Detail

Repository
rP 0 A.D. Public Repository
Branch
D14_thread_pathfinder
Lint
Lint OK
Unit
No Unit Test Coverage
Build Status
Buildable 7610
Build 12412: Vulcan BuildJenkins
Build 12411: arc lint + arc unit

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

Build is unstable, some tests have failed - The Moirai have given mortals hearts that can endure.

Linter detected issues:
Executing section Source...

source/simulation2/helpers/Render.cpp
|   1| /*·Copyright·(C)·2017·Wildfire·Games.
|    | [NORMAL] LicenseYearBear:
|    | License should have "2019" year instead of "2017"

source/simulation2/components/CCmpPathfinder.cpp
|   1| /*·Copyright·(C)·2018·Wildfire·Games.
|    | [NORMAL] LicenseYearBear:
|    | License should have "2019" year instead of "2018"

source/simulation2/Simulation2.cpp
|   1| /*·Copyright·(C)·2018·Wildfire·Games.
|    | [NORMAL] LicenseYearBear:
|    | License should have "2019" year instead of "2018"
Executing section JS...
Executing section cli...

Link to build: https://jenkins.wildfiregames.com/job/differential/1307/display/redirect

In D14#76586, @Imarok wrote:

Hmm, seems like it takes far longer to load a map.
Just quickly tested with Corinthian Istmus (4):
Svn took about 7 seconds.
With D14 it took about 18 seconds and I got a notice from my OS that it seems like the program has hung up and if I want to go on or quit the program.

That's weird. Is it stuck at 100%? The only thing I can see taking more time is the thread creation, but that really shouldn't add 9 seconds

Edit: It also feels laggier than without the patch even though the patch utilizes all my cores at 100%

I need to check if I've reimplemented same-turn stuff correctly, I feel like paths are computed on the next turn from svn.

binaries/data/config/default.cfg
408

You can overwrite and have 20 threads or just 2 even on an 8-core machine if for whatever reason you prefer that.

source/simulation2/components/CCmpPathfinder.cpp
906

This is a bit of a weird syntax in C++, but it's basically worker[into].insert() in JS. The alternative in C++ would be a macro or an explicit IF, but I think that's fine to use.

source/simulation2/helpers/HierarchicalPathfinder.cpp
375 ↗(On Diff #7884)

forgot to remove this before uploading it seems :p

Itms added a comment.Apr 28 2019, 6:24 PM

UGH why can't it display the test results. Back to you.

Itms added inline comments.Apr 28 2019, 6:30 PM
source/simulation2/helpers/HierarchicalPathfinder.cpp
375 ↗(On Diff #7884)

For some reason Jenkins wasn't able to parse this: 😝

LANCELOT 1
LANCELOT 2
LANCELOT 4
LANCELOT 1
LANCELOT 2
LANCELOT 4
LANCELOT 1
LANCELOT 2
LANCELOT 4
<?xml version="1.0" encoding="UTF-8" ?>
<testsuite name="cxxtest" timestamp="Sun Apr 28 18:20:29 2019" tests="318" errors="0" failures="0" time="0" >
...
In D14#76593, @wraitii wrote:
In D14#76586, @Imarok wrote:

Hmm, seems like it takes far longer to load a map.
Just quickly tested with Corinthian Istmus (4):
Svn took about 7 seconds.
With D14 it took about 18 seconds and I got a notice from my OS that it seems like the program has hung up and if I want to go on or quit the program.

That's weird. Is it stuck at 100%? The only thing I can see taking more time is the thread creation, but that really shouldn't add 9 seconds

Yeah, sorry. I measured the time I was stuck at 100%.

In D14#76599, @Imarok wrote:

Yeah, sorry. I measured the time I was stuck at 100%.

Mh, I think it might be because the pathfinder threads are spin-waiting, which will take up a lot of CPU. I guess I should make them sleep for a few ms and/or initialise them later.

Stan added inline comments.Apr 28 2019, 6:52 PM
binaries/data/config/default.cfg
408

Well then the comment is lying :P If you say 0 is auto 1 is no thread which sounds weird but why not it doesn't imply 10 sets 10 threads :)

I didn't review a data flow yet. I'll try to do an analysis of possible locks/concurrency accesses later.

source/simulation2/components/CCmpPathfinder.cpp
96

It's not good to access atlas stuff from simulation. You need to use a common source, like a thread manager. The same for the number of threads.

105

Use std::unique_ptr instead, raw pointers are dangerous.

717

Use std::unique_ptr.

750

I don't think that the busy waiting is good here, why not to use std::condition_variable?

906

Too unsafe for me. I'd replace the function by:

template <typename T, typename U>
void CCmpPathfinder::PushRequestsToWorkers(std::vector<T>& from, std::deque<U>& into)

With probably a type validation like std::is_same<U, Async*PathRequest>::value.

source/simulation2/components/CCmpPathfinder_Common.h
115

Sections should be in order: public > protected > private.

126

Why comments states that the function uses conditional_variable but in fact it uses a simple busy waiting.

180

It'd be good to fix names (to follow CC). Also as it'd be good to use "cache" word as the variables are used inside const methods.

wraitii added inline comments.Apr 29 2019, 10:24 AM
source/simulation2/components/CCmpPathfinder.cpp
96

Sorry, I don't understand what you mean here. You're saying we should implement a "Thread Manager" class at a 'top-level' global that tells us how many threads we want for pathfinding?
I'm not clear how that's much better for now.

Also I somewhat intend to remove this as if we had proper mutex protection it wouldn't be needed.

105

Indeed.

750

You can't use a condition_variable alone to wait on multiple threads. Adding the proper synchronisation would require more code, which has a higher chance of mistakes now and down the line.
I agree that this spins at 100% which is a bit of a waste. What I hope is that using std::this_thread::yield() fixes this issue with an acceptable latency.

906

One can't actually, because I need to call either the longRequest or the shortRequest of the worker in the inner loop. This is basically rewritable as

CCmpPathfinder::PushRequestsToWorkers(std::vector<T>& from, bool toLongRequests)
[...]
if (toLongRequests)
	worker->m_LongRequests.insert([...])
else
	worker->m_ShortRequests.insert([...])

If I pass the worker, I lose the whole point of the function as the for-loop has to be outside the function.

source/simulation2/components/CCmpPathfinder_Common.h
126

Earlier version of the code used a condition_variable, I've changed it since. The thing is I need a m_Computing predicate anyways for the condition-variable, and possibly a mutex (I had issues which look a bit like this).
Again, I probably need a std::this_thread::yield() here, but I think the busy-waiting on an atomic variable is simpler and safer.

180

These were simply moved, and I think I'll refactor this in a VertexPathfinder class anyways that this calls (like the LongPathfinder) instead of what I've done here.

Overall, I think going for the code that's simplest to reason about and most explicit is best. That code is imo 'lock-free atomic code' in this particular instance, and not std::condition_variable which honestly I find absurdly complicated and tricky.

Stan added a comment.Apr 29 2019, 10:58 AM

Sometimes I wonder if we could put examples of our code on stack overflow then watch them fight over it and get the best code :D

Angen added inline comments.Apr 29 2019, 2:47 PM
source/simulation2/components/CCmpPathfinder.cpp
96

well, in this part of code threads are going to be created, they do not exists at this point so no concurency here.

wantedThreads is local (in scope of function) variable - > no problem even with threads as every would have its own unique variable with no option to see or change the one from another thread

g_Atlas & & g_Atlas->running is only reading, as far as I know you can safely read from variable with as many threads as you wish and it is safe but again this function is running in main thread only

vladislavbelov added inline comments.Apr 29 2019, 6:12 PM
source/simulation2/components/CCmpPathfinder.cpp
96

Sorry, I don't understand what you mean here. You're saying we should implement a "Thread Manager" class at a 'top-level' global that tells us how many threads we want for pathfinding?

I mean that the component shouldn't decide about the number of threads and shouldn't communicate with the Atlas directly. We may use ThreadUtil to calculate right number of threads. Because we might have separate threads for other purposes, and we need to manage them somehow.

g_Atlas & & g_Atlas->running is only reading, as far as I know you can safely read from variable with as many threads as you wish and it is safe but again this function is running in main thread only

It's not the syntax error but the semantic for me.

750

Yes, in a direct way, but you can wrap it. Like https://stackoverflow.com/questions/24465533/implementing-boostbarrier-in-c11/24465624#24465624. Or use something like boost::barrier.

Adding the proper synchronisation would require more code, which has a higher chance of mistakes now and down the line.

It's not much more code and it's relatively simple. I'd prefer user's comfort instead of not saving few lines of code. In fact you may freeze the user machine (if it's not so good to predict busy-waiting). Also it may slow down other threads, I mean like renderer one, if it'd exist.

std::this_thread::yield

It's useful but it doesn't guarantee a good stable work. It may wake up every tick or sleep for a long time.

Could you measure how many percents it and threads do busy waiting?

906

If you wanna leave the function loop, I'd prefer to add more code but with an explicit access. Like:

template<typename T>
CCmpPathfinder::PushRequestsToWorkers(std::vector<T>& from)
{
    // ...
    worker->AddRequests<T>(
        std::make_move_iterator(from.end() - amount),
        std::make_move_iterator(from.end())
    );
    // ...
}

template<typename T>
void AsyncPathfinderWorkerThread::AddRequests(... it_begin, ... it_end)
{
    auto& container = RequestsContainer<T>();
    container.insert(container.begin(), std::make_move_iterator(it_begin), std::make_move_iterator(it_end));
}

template<typename T>
void AsyncPathfinderWorkerThread::RequestsContainer();
template<>
std::deque<AsyncShortPathRequest>& RequestsContainer() { return m_ShortRequests; }
// ...

It'd be the same by performance, but safer I think. As you can't pass an arbitrary container member.

wraitii added inline comments.Apr 29 2019, 11:00 PM
source/simulation2/components/CCmpPathfinder.cpp
96

OK, makes sense for ThreadUtil.

750

I've ran some benchmarks:

  • Yield() isn't useful on OS X, and when spin looping hard we are consuming 100% CPU which seems wasteful.
  • condition variables and barriers seem almost faster than spinlocking.

Overall I'll switch to using condition_variable. I think it's a complicated construct, but at the end of the day we can't occupy 100% of a process spin locking.

906

I suppose ultimately this would be if-constexpr-ed. I'll go with your way as it does seem more readable and I didn't think of it.

wraitii updated this revision to Diff 7886.Apr 30 2019, 11:41 PM
wraitii marked 14 inline comments as done.
wraitii retitled this revision from Thread the pathfinder computations to reduce latency to Thread the pathfinder computations.

Updated. This addressed most comments I had.

I've switched to using condition variables and a Barrier-like structure for the threading. As you can see in the following screenshot, this makes thread wakeup a little more random (on my system it appears to be up-to-a-few ms, which should not be a problem if we can do other things in the main thread).

If this is a problem, we could go use a more complicated binary spinlock/condition_variable system I suppose.

Also this moves the Vertex pathfinder into its own class. This is a cleanup that we should probably do anyways and that I will probably upload in a separate diff. It's pretty straightforward though.

wraitii added inline comments.Apr 30 2019, 11:45 PM
source/simulation2/components/CCmpPathfinder.cpp
906

Done in a slightly simpler manner.

source/simulation2/components/ICmpPathfinder.h
45 ↗(On Diff #7886)

Not sure this is actually needed there in this version of the patch.

Successful build - Chance fights ever on the side of the prudent.

Linter detected issues:
Executing section Source...

source/ps/ThreadUtil.h
|   1| /*·Copyright·(C)·2010·Wildfire·Games.
|    | [NORMAL] LicenseYearBear:
|    | License should have "2019" year instead of "2010"

source/simulation2/helpers/Pathfinding.h
|   1| /*·Copyright·(C)·2016·Wildfire·Games.
|    | [NORMAL] LicenseYearBear:
|    | License should have "2019" year instead of "2016"

source/simulation2/helpers/Render.cpp
|   1| /*·Copyright·(C)·2017·Wildfire·Games.
|    | [NORMAL] LicenseYearBear:
|    | License should have "2019" year instead of "2017"

source/simulation2/helpers/VertexPathfinder.cpp
|   1| /*·Copyright·(C)·2019·Wildfire·Games.
|    | [NORMAL] LicenseYearBear:
|    | License should have "2017" year instead of "2019"

source/simulation2/Simulation2.cpp
|   1| /*·Copyright·(C)·2018·Wildfire·Games.
|    | [NORMAL] LicenseYearBear:
|    | License should have "2019" year instead of "2018"

source/ps/ThreadUtil.cpp
|   1| /*·Copyright·(C)·2010·Wildfire·Games.
|    | [NORMAL] LicenseYearBear:
|    | License should have "2019" year instead of "2010"

source/simulation2/components/CCmpPathfinder.cpp
|   1| /*·Copyright·(C)·2018·Wildfire·Games.
|    | [NORMAL] LicenseYearBear:
|    | License should have "2019" year instead of "2018"
Executing section JS...
Executing section cli...

Link to build: https://jenkins.wildfiregames.com/job/differential/1309/display/redirect

Way better. Can't see any performance loss compared to svn anymore. 👍

Stan added inline comments.May 1 2019, 10:49 AM
source/ps/ThreadUtil.h
105

Capital F ?

107

Separate file ?

source/simulation2/Simulation2.cpp
543 ↗(On Diff #7142)

Maybe it deserves a comment ?

source/simulation2/components/CCmpPathfinder.cpp
723

Merge loops ?

739

What does this function do ? Name seems misleading.

761

Why the scope ?

source/simulation2/components/CCmpPathfinder_Common.h
126

Should or is ?

151

Initialization in Init ?

source/simulation2/components/tests/test_Pathfinder.h
194 ↗(On Diff #7142)

?

source/simulation2/helpers/HierarchicalPathfinder.cpp
402 ↗(On Diff #7886)

Shouldn't that have been added back ?

451 ↗(On Diff #7886)

Shouldn't that have been added back ?

source/simulation2/helpers/Render.cpp
38 ↗(On Diff #7886)

Shouldn't that have been added back ?

source/simulation2/helpers/VertexPathfinder.cpp
458

Couldn't we pass an object/ struct here instead ?

593

@vladislavbelov Good usage for clamp below, no ?

wraitii updated this revision to Diff 7891.May 1 2019, 10:52 AM

Cleaned-up of D1853, D1954 and D1855. This needs those patches to apply.

wraitii marked 5 inline comments as done.May 1 2019, 10:59 AM
wraitii added inline comments.
source/ps/ThreadUtil.h
107

I think ThreadUtil needs a complete refactoring (it redefines std::mutex, basically) and I'll do it then I suppose.

source/simulation2/components/CCmpPathfinder.cpp
739

It's just there to be specialised, and it pushes requests so I don't think so?

761

lock gets release automatically at the end of the scope.

source/simulation2/components/CCmpPathfinder_Common.h
126

Should. It also is, but it should be and remain so in the future.

151

I think it's better to init in the definition whenever possible going forward. Not sure if that's a convention we have or not?

source/simulation2/helpers/HierarchicalPathfinder.cpp
402 ↗(On Diff #7886)

nah, who cares about the time profiling debug overlay honestly.

source/simulation2/helpers/Render.cpp
38 ↗(On Diff #7886)

Likewise: I don't think it really matters to profile this unless you're actually looking at how long this takes, but it's a function used by debug code and profiling that seems useless.

Stan added inline comments.May 1 2019, 11:12 AM
source/simulation2/components/CCmpPathfinder.cpp
739

That line doesn't pushes request does it ?

source/simulation2/helpers/Render.cpp
38 ↗(On Diff #7886)

Ah okay :)

Build failure - The Moirai have given mortals hearts that can endure.

Link to build: https://jenkins.wildfiregames.com/job/differential/1314/display/redirect

wraitii marked 6 inline comments as done.May 1 2019, 11:35 AM
wraitii added inline comments.
source/simulation2/components/CCmpPathfinder.cpp
739

It inserts them at the end (by moving), so that's similar to pushing.

Stan added inline comments.May 1 2019, 11:40 AM
source/simulation2/components/CCmpPathfinder.cpp
739

static_assert(sizeof(T) == 0, "Only specializations can be used"); What part of that statement does that ? Oo There is only one line in that function

wraitii marked 2 inline comments as done.May 1 2019, 12:06 PM
wraitii added inline comments.
source/simulation2/components/CCmpPathfinder.cpp
739

As said above, this is just the template function which gets specialised by L709/714. This base template cannot be used (it's not implemented but there's no clean way to write that because of issues in XCode/VS, see https://stackoverflow.com/a/37593094).

Stan added inline comments.May 1 2019, 12:07 PM
source/simulation2/components/CCmpPathfinder.cpp
739

Ah thanks for the explanation :)

wraitii updated this revision to Diff 8148.Sat, May 25, 10:56 PM
wraitii marked an inline comment as done.

Cleaned up of D1918.

I've switched to using static, thread-local variables for cached variables. This is a tad ugly until C++ 17 get around the corner (as we won't have to initialise the static members out of line then).

The pro is that it makes it irrelevant to the helpers whether we are threaded or not, making their implementation much easier and foolproof. The con is that one cannot use debug with the threading (well one can but it would never show anything), so you have to de-activate threading for that. Possibly this could be done at runtime, though this isn't implemented yet.

I suspect this has a small performance overhead - I'm assuming it's irrelevant here but it should be profiled.

Build failure - The Moirai have given mortals hearts that can endure.

Link to build: https://jenkins.wildfiregames.com/job/differential/1504/display/redirect

wraitii added inline comments.Mon, Jun 10, 9:16 AM
source/simulation2/components/CCmpPathfinder.cpp
840

Realised last night that I need to make sure the results are ordered consistently for threaded/non-threaded/different # of threads here. I think they are by accident right now, but and "ENSURE (is_sorted)" is probably worth adding.