[WIP] eventify: towards treating all threads uniformly in Eventify #18

Merged

i.kabadshow merged 10 commits from m.zych/fmsolvr:WIP/parallelization/intra-node/lgpl21+minimize into WIP/parallelization/intra-node/lgpl21+minimize

2022-04-26 01:23:00 +02:00

m.zych commented

2022-03-24 16:18:15 +01:00

Member

No description provided.

m.zych added 10 commits

2022-03-24 16:18:16 +01:00

[WIP] eventify: moved finalizing Executors to the IntraNodeContainer 985789b707

The ThreadingWrapper class, as its name suggests,
is meant to be a handle to an OS thread, following the RAII idiom.

Therefore,
the ThreadingWrapper should be decoupled from the Executor class and
do only one thing - manage an OS thread.

This change is the first step towards that goal, since
it frees ThreadingWrapper's destructor from finalizing an Executor.


Even though
the order of finalizing Executors and joining Threads has been altered:

 - before = [ Finalize() , Join() , ... , Finalize() , Join() ]

 - after  = [ Finalize() , ... , Finalize() , Join() , ... , Join() ]

this change is correct,
because the behavior of Eventify's thread_pool is still unchanged.

Moreover,
finalizing all Executors before joining all Threads is more efficient,

because invoking Executor::Finalize() causes
the corresponding worker Thread to exit from its main event loop
only after execution of its current task has finished.

With the new order, all worker Threads wind down concurrently,
so their shutdown delays overlap instead of adding up (latency hiding).


Note that the IntraNodeContainer's destructor assumes that
all tasks have already been executed,

that is, IntraNodeContainer::FinishAndSoftJoin() has been called,
and therefore no work will be dropped.
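
The new shutdown order can be sketched as below. This is a minimal
illustration, not Eventify's code: the Executor class here is a
hypothetical stand-in that only offers Run() and Finalize(), and
shutdown_finalize_all_then_join_all() is an invented helper name.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in for an Executor: Run() loops until Finalize()
// has been requested. Real task execution is replaced by a yield.
class Executor {
public:
    void Run() {
        while (!finalized_.load(std::memory_order_acquire)) {
            std::this_thread::yield();  // placeholder for "execute one task"
        }
    }
    void Finalize() { finalized_.store(true, std::memory_order_release); }

private:
    std::atomic<bool> finalized_{false};
};

// Order after this change: [ Finalize() ... Finalize(), Join() ... Join() ].
// Returns the number of worker threads that were joined.
std::size_t shutdown_finalize_all_then_join_all(std::size_t worker_count) {
    std::vector<Executor> executors(worker_count);
    std::vector<std::thread> threads;
    threads.reserve(worker_count);
    for (auto& executor : executors) {
        threads.emplace_back([&executor] { executor.Run(); });
    }

    // Ask every worker to leave its event loop first, so that all of
    // them finish their current work concurrently ...
    for (auto& executor : executors) {
        executor.Finalize();
    }
    // ... and only then block on each of them.
    std::size_t joined = 0;
    for (auto& thread : threads) {
        thread.join();
        ++joined;
    }
    return joined;
}
```

In the old order each Join() would have to wait for one worker before
the next Finalize() was even issued, so the delays would add up.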

[WIP] eventify: removed Executor reference from the ThreadingWrapper 1e02f82c6b

This change completes the process of
decoupling the ThreadingWrapper from the Executor class,

by shifting the responsibility for defining the thread's main function
to the ThreadingWrapper's client,

here Eventify's thread_pool, hidden in the IntraNodeContainer.


It is worth noting that
the ThreadingWrapper is not a class template anymore.

Instead, it defines a constructor template,
which accepts any callable value as the thread's main function,

mimicking std::thread's interface.


Therefore, the ThreadingWrapper, just like std::thread,
is a type-erased polymorphic value type,

which represents a handle to
a separate thread of execution, invoking any callable value.
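
A constructor template of this kind could look roughly as follows.
This is a sketch under assumptions, not the actual Eventify class:
the member names and the run_in_wrapper() helper are invented here,
and the joining destructor only mirrors the RAII behaviour described
for the ThreadingWrapper.

```cpp
#include <cassert>
#include <thread>
#include <utility>

// Hypothetical sketch: a non-template class whose constructor template
// accepts any callable, in the same way std::thread's constructor does.
class ThreadingWrapper {
public:
    template <typename Callable, typename... Args>
    explicit ThreadingWrapper(Callable&& main_function, Args&&... args)
        : thread_(std::forward<Callable>(main_function),
                  std::forward<Args>(args)...) {}

    ThreadingWrapper(const ThreadingWrapper&) = delete;
    ThreadingWrapper& operator=(const ThreadingWrapper&) = delete;

    // RAII: join the OS thread when the handle goes out of scope.
    ~ThreadingWrapper() {
        if (thread_.joinable()) {
            thread_.join();
        }
    }

private:
    std::thread thread_;  // the callable's type is erased inside std::thread
};

// The client, not the wrapper, decides what the thread's main function is.
int run_in_wrapper(int input) {
    int result = 0;
    {
        ThreadingWrapper worker([&result, input] { result = input * 2; });
    }  // the destructor joins here, so reading result afterwards is safe
    return result;
}
```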

[WIP] eventify: removed redundant ThreadingWrapperType type alias cd1900e1f7

[WIP] eventify: passed empty lambda as main_thread's main function 05555e805b

The ThreadingWrapper creates a std::thread object
which does not represent a thread of execution

whenever the 'primary' parameter of ThreadingWrapper's constructor
is set to true,

which is a special case, intended exclusively for the main_thread.


Therefore, the Executor::Run() function will always
be invoked only by the worker threads in the Eventify's thread_pool.

In fact,
the main_thread will invoke the Executor::RunAll() function instead,

while waiting, in the IntraNodeContainer::FinishAndSoftJoin() function,
for the FMM task_graph to finish executing.


To reflect this fact and to avoid misleading Eventify developers,

this change defines an empty lambda as a main function
for the main_thread's ThreadingWrapper.

[WIP] eventify: stopped creating ThreadingWrapper for the main_thread 26f92e87bd

This change goes even further than
"eventify: passed empty lambda as main_thread's main function",

since it no longer creates
a ThreadingWrapper instance for the main_thread at all.


At first glance, creating instances of the ThreadingWrapper class
exclusively for the worker threads seems less than ideal,

because treating all threads, including the main_thread, uniformly
would increase code reuse and regularity of the implementation.


However, in programming languages such as C++ or Rust,

there exists an inherent asymmetry
between the main_thread and threads spawned programmatically,

that is, the main_thread is created automatically at program startup,
by the operating system and the language runtime,
leaving relatively little control over that process to the programmer.


In contrast, in OpenGL and Vulkan APIs, shaders written in GLSL
are executed multiple times, for each vertex or fragment,

and therefore the main functions, defined by each shader,
are executed concurrently, per thread, in the SIMT execution model,

which makes this asymmetry between threads disappear.


Trying to hide this inherent asymmetry introduces serious problems:

 - How to define a std::thread object representing the main_thread?

 - Which thread should: submit a task_graph to the thread_pool and
                        receive computational results of its execution?

Unfortunately,
these issues have not been meaningfully resolved in Eventify.

[WIP] eventify: handle uniformly all threads in the ThreadingWrapper 258687beaf

This change removes
special treatment of the main thread from the ThreadingWrapper,

since Eventify creates instances of the ThreadingWrapper class
exclusively for the worker threads in the Eventify's thread_pool.


Note that a default-constructed std::thread object,
which does not represent a thread of execution, is not joinable().
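
The joinable() behaviour can be checked with a few lines of standard
C++. The two helper function names below are invented for this
illustration.

```cpp
#include <cassert>
#include <thread>

// A default-constructed std::thread has no associated thread of
// execution, so joinable() reports false.
bool default_constructed_is_joinable() {
    std::thread handle;
    return handle.joinable();
}

// A std::thread constructed with a callable is joinable until it has
// been joined (or detached).
bool spawned_is_joinable_only_until_joined() {
    std::thread handle([] {});
    const bool before_join = handle.joinable();
    handle.join();
    const bool after_join = handle.joinable();
    return before_join && !after_join;
}
```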

[WIP] eventify: assume FinishAndSoftJoin() is called from the main_thread dcdd611c2c

From static analysis of the fmmtest's call graph,
it is clear that the IntraNodeContainer::FinishAndSoftJoin() function

is invoked from the main_thread, by fmsolvr::pass12345().


Therefore,
it is perfectly legitimate to rely on FMSolvr's current code structure
to ensure that FinishAndSoftJoin() is always called from the main_thread,

that is, to transform this redundant run-time check into a precondition
and assume that the mentioned requirement has been met.


Again, as of right now, Eventify's implementation
expects that FinishAndSoftJoin() is always called from the main_thread

and worker threads, from Eventify's thread_pool, will never invoke it.


However, this assumption can be lifted in the future,

since Eventify could be made oblivious of the main_thread,
that is, allow any thread, including the main_thread and worker threads,

to schedule
asynchronous task_graph execution and wait for or query its completion.

[WIP] eventify: removed unused code detecting the primary/main thread f78dba98f1

[WIP] eventify: stopped using ThreadingWrapper in the IntraNodeContainer e4b17ebca8

The ThreadingWrapper is a handle to an OS thread,
which follows the RAII idiom,
that is, it implicitly calls .join() in its destructor.


However, this behaviour is problematic,

    [ISO C++] - Working Draft, C++20 Standard
     ~ http://open-std.org/jtc1/sc22/wg21/docs/papers/2020/n4868.pdf

    32.4.3.4 Thread support library: Class thread - Destructor

    [Note: (...) implicitly (...) joining
           a joinable() thread in its destructor could result in
           difficult to debug (...) performance (...) bugs
           encountered only when an exception is thrown.

           These bugs can be avoided by ensuring that
           the destructor is never executed
           while the thread is still joinable.]

unless cancellation (a stop request) is supported.

    [C++ reference] - std::jthread
     ~ https://en.cppreference.com/w/cpp/thread/jthread

    The class std::jthread represents a single thread of execution.
    It has the same general behavior as std::thread,

    except that std::jthread automatically joins on destruction,
    and can be cancelled/stopped in certain situations.
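
Why a stop request makes an implicitly joining destructor safe can be
sketched without C++20. The StoppableThread class below is hypothetical
and uses a plain atomic flag; std::jthread offers the same idea through
std::stop_token, requesting a stop before joining in its destructor.

```cpp
#include <atomic>
#include <cassert>
#include <stdexcept>
#include <thread>

// Hypothetical handle: the destructor first asks the thread to stop and
// only then joins, so unwinding never waits for an endless loop.
class StoppableThread {
public:
    StoppableThread()
        : thread_([this] {
              // Stand-in for an event loop that polls for a stop request.
              while (!stop_requested_.load(std::memory_order_acquire)) {
                  std::this_thread::yield();
              }
          }) {}

    ~StoppableThread() {
        stop_requested_.store(true, std::memory_order_release);
        thread_.join();
    }

private:
    // Declared before thread_, so the flag is initialized before the
    // new thread starts reading it.
    std::atomic<bool> stop_requested_{false};
    std::thread thread_;
};

// An exception thrown while the handle is alive still unwinds promptly,
// because the destructor cancels the loop before joining. Without the
// stop request, the join during unwinding would block indefinitely.
bool unwinds_promptly_with_stop_request() {
    bool caught = false;
    try {
        StoppableThread worker;
        throw std::runtime_error("simulated failure");
    } catch (const std::runtime_error&) {
        caught = true;
    }
    return caught;
}
```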


Therefore, it is reasonable to conclude that Eventify's thread_pool
suffers from exactly the same problem as the ThreadingWrapper class.

After all, its destructor blocks the current thread of execution
until the worker threads have finished executing all submitted tasks.


Although that conclusion is technically correct,

Eventify clients are usually expected to create only one thread_pool
per software system, to utilize compute resources efficiently.

This idiomatic usage of the Eventify's thread_pool

effectively ensures that its lifetime will strictly enclose
the duration of execution of all submitted tasks by the worker threads,

and therefore the mentioned performance problem will be avoided.


However, the fact that in the vast majority of cases
there will exist only one instance of the Eventify's thread_pool

does not mean that Eventify clients should not be able
to create multiple independent thread_pool instances.

[WIP] eventify: removed the ThreadingWrapper class 54ab047fdf