[WIP] eventify: towards treating all threads uniformly in Eventify #18

The ThreadingWrapper class, as its name suggests,
is meant to be a handle to an OS thread, following the RAII idiom.

Therefore,
the ThreadingWrapper should be decoupled from the Executor class and
do only one thing - manage an OS thread.

This change is the first step towards that goal, since
it frees ThreadingWrapper's destructor from finalizing an Executor.


Although
the order of finalizing Executors and joining Threads has been altered:

 - before = [ Finalize() , Join() , ... , Finalize() , Join() ]

 - after  = [ Finalize() , ... , Finalize() , Join() , ... , Join() ]

this change is correct,
because the behavior of Eventify's thread_pool is still unchanged.

Moreover,
finalizing all Executors before joining all Threads is more efficient,

because invoking Executor::Finalize() causes
the corresponding worker Thread to exit from its main event loop
only after execution of the current task has been finished,

and therefore the new order utilizes classical latency hiding.
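The new shutdown order can be illustrated with a minimal sketch.
DemoExecutor and ShutdownAllThenJoin are hypothetical stand-ins, not Eventify's code,
and the loop body in Run() is only a placeholder for executing tasks.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <memory>
#include <thread>
#include <vector>

// Hypothetical stand-in for an Executor: Run() loops until Finalize()
// has been requested, then returns.
class DemoExecutor {
public:
    void Run() {
        while (!done_.load(std::memory_order_acquire)) {
            std::this_thread::yield();  // placeholder for executing one task
        }
    }
    void Finalize() { done_.store(true, std::memory_order_release); }

private:
    std::atomic<bool> done_{false};
};

// Starts `count` workers and shuts them down in the new order:
// [ Finalize() , ... , Finalize() , Join() , ... , Join() ].
// Returns the number of threads that were joined.
std::size_t ShutdownAllThenJoin(std::size_t count) {
    std::vector<std::unique_ptr<DemoExecutor>> executors;
    std::vector<std::thread> threads;
    for (std::size_t i = 0; i < count; ++i) {
        executors.push_back(std::make_unique<DemoExecutor>());
        DemoExecutor* executor = executors.back().get();
        threads.emplace_back([executor] { executor->Run(); });
    }
    assert(threads.size() == executors.size());

    // Request every worker to stop first, so that all of them
    // wind down concurrently ...
    for (auto& executor : executors) {
        executor->Finalize();
    }
    // ... and only then block on each of them in turn.
    std::size_t joined = 0;
    for (auto& thread : threads) {
        thread.join();
        ++joined;
    }
    return joined;
}
```

With the old order, each Join() would wait for one worker to finish its current task
before the next worker was even asked to stop.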


Note that the IntraNodeContainer's destructor assumes that
all tasks have already been executed,

that is, IntraNodeContainer::FinishAndSoftJoin() has been called,
and therefore no work will be dropped.
This change completes the process of
decoupling the ThreadingWrapper from the Executor class,

by shifting the responsibility for defining the thread's main function
to the ThreadingWrapper's client,

here Eventify's thread_pool, hidden in the IntraNodeContainer.


It is worth noting that
the ThreadingWrapper is no longer a class template;

instead, it defines a constructor template,
which accepts any callable value as the thread's main function,

mimicking std::thread's interface.


Therefore, both the ThreadingWrapper and std::thread
are type-erased polymorphic value types,

each representing a handle to
a separate thread of execution that invokes any callable value.
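A minimal sketch of such a non-template class with a constructor template follows.
DemoThreadingWrapper, AddTo, AddHundred and RunThreeKindsOfCallables are hypothetical names,
and the sketch only assumes what is stated above: a std::thread-like constructor and a joining destructor.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <type_traits>
#include <utility>

// Hypothetical sketch, not Eventify's actual ThreadingWrapper.
class DemoThreadingWrapper {
public:
    // Constructor template: accepts any callable and its arguments,
    // like std::thread. The enable_if guard keeps this overload from
    // being selected when the argument is itself a DemoThreadingWrapper.
    template <typename F, typename... Args,
              typename = std::enable_if_t<!std::is_same<
                  std::decay_t<F>, DemoThreadingWrapper>::value>>
    explicit DemoThreadingWrapper(F&& f, Args&&... args)
        : thread_(std::forward<F>(f), std::forward<Args>(args)...) {}

    DemoThreadingWrapper(const DemoThreadingWrapper&) = delete;
    DemoThreadingWrapper& operator=(const DemoThreadingWrapper&) = delete;

    // RAII: join the OS thread when the handle goes out of scope.
    ~DemoThreadingWrapper() {
        if (thread_.joinable()) {
            thread_.join();
        }
    }

private:
    std::thread thread_;
};

// A free function and a function object, used as thread main functions.
void AddTo(std::atomic<int>* counter, int amount) {
    counter->fetch_add(amount);
}

struct AddHundred {
    std::atomic<int>* counter;
    void operator()() const { counter->fetch_add(100); }
};

// Runs a lambda, a function pointer and a function object through the
// same non-template class, and returns the accumulated sum.
int RunThreeKindsOfCallables() {
    std::atomic<int> counter{0};
    {
        DemoThreadingWrapper a([&counter] { counter.fetch_add(1); });
        DemoThreadingWrapper b(AddTo, &counter, 10);
        AddHundred functor{&counter};
        DemoThreadingWrapper c(functor);
    }  // all three destructors have joined their threads here
    assert(counter.load() == 111);
    return counter.load();
}
```

The type of the callable never appears in the type of the handle,
which is what allows a client, such as a thread_pool, to store the handles in one container.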
The ThreadingWrapper will create a std::thread object,
which does not represent a thread of execution,

provided that the 'primary' parameter in ThreadingWrapper's constructor
has been set to true,

which is a special case, intended exclusively for the main_thread.


Therefore, the Executor::Run() function will always
be invoked only by the worker threads in the Eventify's thread_pool.

In fact,
the main_thread will invoke the Executor::RunAll() function instead,

while waiting, in the IntraNodeContainer::FinishAndSoftJoin() function,
for the FMM task_graph to finish executing.


To reflect this fact and to avoid misleading Eventify developers,

this change defines an empty lambda as a main function
for the main_thread's ThreadingWrapper.
This change goes even further than
the "eventify: passed empty lambda as main_thread's main function" change,

since it completely removes
the creation of a ThreadingWrapper instance for the main_thread.


Superficially, creating instances of the ThreadingWrapper class
exclusively for the worker threads seems less than ideal,

because treating all threads, including the main_thread, uniformly
would increase code reuse and regularity of the implementation.


However, in programming languages such as C++ or Rust,

there exists an inherent asymmetry
between the main_thread and threads spawned programmatically,

that is, the main_thread is created automatically at program start-up,
by the operating system and the language runtime,
leaving relatively little control over that process to the programmer.


In contrast, in OpenGL and Vulkan APIs, shaders written in GLSL
are executed multiple times, for each vertex or fragment,

and therefore the main functions, defined by each shader,
are executed concurrently, per thread, in the SIMT execution model,

which makes this asymmetry between threads disappear.


Trying to hide this inherent asymmetry introduces serious problems:

 - How to define a std::thread object representing the main_thread?

 - Which thread should: submit a task_graph to the thread_pool and
                        receive computational results of its execution?

Unfortunately,
these issues have not been meaningfully resolved in Eventify.
This change removes
special treatment of the main thread from the ThreadingWrapper,

since Eventify creates instances of the ThreadingWrapper class
exclusively for the worker threads in the Eventify's thread_pool.


Note that a default-constructed std::thread object,
which does not represent a thread of execution, is not joinable().
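The joinable() behaviour mentioned above can be checked with a short sketch,
which uses only std::thread; the two function names are hypothetical.

```cpp
#include <cassert>
#include <thread>

// A default-constructed std::thread does not represent a thread of
// execution, so it is not joinable().
bool DefaultConstructedIsJoinable() {
    std::thread handle;
    assert(handle.get_id() == std::thread::id{});
    return handle.joinable();
}

// A std::thread that was started with a callable is joinable()
// until join() has been called on it.
bool StartedThreadIsJoinableUntilJoined() {
    std::thread handle([] {});
    const bool before = handle.joinable();
    handle.join();
    const bool after = handle.joinable();
    return before && !after;
}
```

A destructor that joins therefore has to guard the call with joinable(),
regardless of whether a 'primary' special case exists.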
From static analysis of the fmmtest's call graph,
it is clear that the IntraNodeContainer::FinishAndSoftJoin() function

is invoked from the main_thread, by fmsolvr::pass12345().


Therefore,
it is perfectly legitimate to rely on FMSolvr's current code structure,
to ensure that FinishAndSoftJoin() is always called from the main_thread,

that is, to transform this redundant run-time check into a precondition
and assume that the mentioned requirement has been met.
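One possible way to keep such a precondition visible in the code is a debug-only assertion.
The sketch below is an assumption for illustration: DemoContainer and its members are hypothetical,
and it treats the constructing thread as the main_thread, which the text above does not state.

```cpp
#include <cassert>
#include <thread>

// Hypothetical sketch, not the actual IntraNodeContainer.
class DemoContainer {
public:
    // Assumption: the container is constructed on the main_thread.
    DemoContainer() : owner_(std::this_thread::get_id()) {}

    bool CalledFromOwner() const {
        return std::this_thread::get_id() == owner_;
    }

    // Precondition: must be called from the owning thread.
    // The assert documents the precondition in debug builds and
    // compiles to nothing when NDEBUG is defined.
    void FinishAndSoftJoin() {
        assert(CalledFromOwner() && "must be called from the owning thread");
        finished_ = true;  // placeholder for waiting on the task_graph
    }

    bool finished() const { return finished_; }

private:
    std::thread::id owner_;
    bool finished_ = false;
};
```

Unlike a run-time branch, the assertion adds no cost to release builds
and states the contract instead of silently handling a violation.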


Again, as of right now, the Eventify's implementation
expects that FinishAndSoftJoin() is always called from the main_thread

and worker threads, from Eventify's thread_pool, will never invoke it.


However, this assumption can be lifted in the future,

since Eventify could be made oblivious of the main_thread,
that is, allow any thread, including the main_thread and worker threads,

to schedule
asynchronous task_graph execution and wait for or query its completion.
The ThreadingWrapper is a handle to an OS thread,
which follows the RAII idiom,
that is, it implicitly calls .join() in its destructor.


However, this behaviour is problematic,

    [ISO C++] - Working Draft, C++20 Standard
     ~ http://open-std.org/jtc1/sc22/wg21/docs/papers/2020/n4868.pdf

    32.4.3.4 Thread support library: Class thread - Destructor

    [Note: (...) implicitly (...) joining
           a joinable() thread in its destructor could result in
           difficult to debug (...) performance (...) bugs
           encountered only when an exception is thrown.

           These bugs can be avoided by ensuring that
           the destructor is never executed
           while the thread is still joinable.]

unless cancellation (a stop request) is supported.

    [C++ reference] - std::jthread
     ~ https://en.cppreference.com/w/cpp/thread/jthread

    The class std::jthread represents a single thread of execution.
    It has the same general behavior as std::thread,

    except that std::jthread automatically joins on destruction,
    and can be cancelled/stopped in certain situations.


Therefore, it is reasonable to conclude that Eventify's thread_pool
suffers from the same exact problem as the ThreadingWrapper class,

after all, its destructor blocks the current thread of execution
until the worker threads have finished executing all submitted tasks.


Although that conclusion is technically correct,

Eventify clients are expected to usually create only one thread_pool
per whole software system, to utilize compute resources efficiently.

This idiomatic usage of the Eventify's thread_pool

effectively ensures that its lifetime will strictly enclose
the duration of execution of all submitted tasks by the worker threads,

and therefore the mentioned performance problem will be avoided.


However, the fact that in the vast majority of cases
there will exist only one instance of the Eventify's thread_pool

does not mean that Eventify clients should not be able
to create multiple independent thread_pool instances.
i.kabadshow merged commit 54ab047fdf into WIP/parallelization/intra-node/lgpl21+minimize 2022-04-26 01:23:00 +02:00
Reference: ATML-CAP/fmsolvr#18