WIP [do not merge] test pull request ... #17

Closed
i.kabadshow wants to merge 87 commits from a.beckmann/fmsolvr:parallelization-v1 into WIP/parallelization/intra-node/lgpl21+minimize
Owner
No description provided.
fmmtest segfaults if linked with ld.gold, -pthread and -Wl,--as-needed
- fmsolvr/p2p/pthreads.hpp
 - simd/avx_complex_double2.h
Boost.Pool has supported C++11 move semantics since version 1.64,
which means FMSolvr's modified version is an unnecessary maintenance burden.

 ~ https://github.com/boostorg/pool/blob/boost-1.64.0/include/boost/pool/pool_alloc.hpp

FMSolvr's modified boost::pool_allocator does not differ significantly
from the mainline Boost version, aside from:

 - pool_allocator::construct() calls the placement new operator from the global namespace
 - the destroy() member function is a function template
 - the member functions construct() and destroy() are marked as static

Therefore, the removal of FMSolvr's modified version of the pool_alloc.hpp header is safe.
FMSolvr is a generic library implementing the Fast Multipole Method,
which doesn't rely on a particular allocation strategy,
threading facilities or representation of floating point numbers.

This change decouples default_allocator<> from FMSolvr
and thereby removes FMSolvr's dependency on the Boost library:
default_allocator<> exists only for the purpose of testing
the FMSolvr library using boost::fast_pool_allocator or std::allocator.
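
A minimal sketch of this kind of decoupling, under stated assumptions: only the name default_allocator<> comes from this change; coefficient_buffer and test_buffer are hypothetical names used for illustration and are not FMSolvr types.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Test-side choice of allocator. The alias lives with the tests,
// so the library headers never have to include an allocator header.
template <typename T>
using default_allocator = std::allocator<T>;

// Hypothetical library-side type: the allocator is a template parameter,
// so the library itself depends neither on Boost nor on std::allocator.
template <typename T, template <typename> class Allocator>
class coefficient_buffer {
public:
    explicit coefficient_buffer(std::size_t n) : data_(n, T{}) {}

    std::size_t size() const { return data_.size(); }
    T& operator[](std::size_t i) { return data_[i]; }

private:
    std::vector<T, Allocator<T>> data_;
};

// Only test or benchmark code binds the library type to a concrete allocator.
using test_buffer = coefficient_buffer<double, default_allocator>;
```

Switching the tests to a different allocator would then only require changing the alias, not the library type.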
This change introduces the USE_DOUBLE macro for consistency with
the existing USE_FLOAT, USE_LONGDOUBLE and USE_FLOAT128 macros.

In order for the Real type to be defined,
one of the above macros must be defined.

Also, this change allows microbench to be configured with
long double or float128 as the Real type.
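
The macro-based selection of the Real type might look roughly like the sketch below. The macro names and the Real type name are taken from this change; the exact preprocessor layout is an assumption, and the fallback definition of USE_DOUBLE exists only so that the sketch compiles on its own.

```cpp
// For this sketch only: normally exactly one macro is passed by the
// build system, e.g. -DUSE_DOUBLE on the compiler command line.
#if !defined(USE_FLOAT) && !defined(USE_DOUBLE) && \
    !defined(USE_LONGDOUBLE) && !defined(USE_FLOAT128)
#define USE_DOUBLE
#endif

#if defined(USE_FLOAT)
typedef float Real;
#elif defined(USE_DOUBLE)
typedef double Real;
#elif defined(USE_LONGDOUBLE)
typedef long double Real;
#elif defined(USE_FLOAT128)
typedef __float128 Real;  // GCC/Clang extension type
#else
// Without the fallback above, a missing macro would be reported here.
#error "Define one of USE_FLOAT, USE_DOUBLE, USE_LONGDOUBLE or USE_FLOAT128"
#endif

static_assert(sizeof(Real) >= sizeof(float),
              "Real must be at least as wide as float");
```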
The header file <fmsolvr/util/float128.hpp>
implements C++ facilities supporting __float128:
 - <cmath> functions
 - <iostream> operators
 - std::numeric_limits<> specialization

Unfortunately, float128.hpp opens the std namespace
and adds definitions to it, which is undefined behaviour.

As a temporary workaround, this change makes sure that
all C++ source files #include <fmsolvr/util/float128.hpp>
before C++ standard library headers <cmath>, <iostream> and <limits>,
which ensures that declarations of C++ facilities supporting __float128
are visible before client code has an opportunity to use them.
The function std::numeric_limits<__float128>::max()
returns the largest positive 128-bit floating point number,
which is (1 + 1/2 + 1/4 + 1/8 + ... + 1/2^112) * 2^16383.

[IEEE 754] - Quadruple-precision floating point number format

  1-bit
    |      (-1)^sign * 1.fraction * 2^(exponent-0x3FFF)
    V
  +----+------------+------------------------------------+
  |sign|  exponent  |              fraction              |
  +----+------------+------------------------------------+
       <-- 15-bit --><------------- 112-bit ------------->
  <---------------------- 128-bit ----------------------->
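
The layout above can be checked without any __float128 support by packing the three fields into two 64-bit words. This is an illustrative sketch: binary128_bits, pack_binary128 and quad_max are hypothetical names and are not part of float128.hpp.

```cpp
#include <cstdint>

// Two 64-bit halves of an IEEE 754 binary128 value, most significant first.
struct binary128_bits {
    std::uint64_t hi;  // sign (1 bit) | exponent (15 bits) | top 48 fraction bits
    std::uint64_t lo;  // low 64 fraction bits
};

// Packs the fields from the diagram; frac_hi holds the top 48 fraction bits.
constexpr binary128_bits pack_binary128(unsigned sign, unsigned exponent,
                                        std::uint64_t frac_hi,
                                        std::uint64_t frac_lo) {
    return binary128_bits{
        (static_cast<std::uint64_t>(sign & 1u) << 63) |
            (static_cast<std::uint64_t>(exponent & 0x7FFFu) << 48) |
            (frac_hi & 0xFFFFFFFFFFFFull),
        frac_lo};
}

// Largest finite value: biased exponent 0x7FFE (0x7FFF encodes inf/NaN),
// i.e. unbiased exponent 0x7FFE - 0x3FFF = 16383, with all 112 fraction bits set.
constexpr binary128_bits quad_max() {
    return pack_binary128(0, 0x7FFE, 0xFFFFFFFFFFFFull, 0xFFFFFFFFFFFFFFFFull);
}
```

With all fraction bits set, the significand equals 1 + 1/2 + ... + 1/2^112 = 2 - 2^-112, which matches the value quoted for max().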
 - marked unused variables in communicate-ff
 - removed unused #include of x86 SIMD header file in coulomb
 - added missing #include for POSIX threads in coulomb
 - removed pthreads.hpp leftovers in fmm_handle
 - removed inline keyword from functions marked as noinline
 - removed redundant fmsolver:: namespace prefix
 - removed usage of the register keyword, since it's deprecated
 - fixed misplaced __AVX__ #ifdef in super_accumulate
 - added an #include that was missing when AVX is disabled in microbench
 - made unit-test intrinsics_x86 always compilable
The compiler flag '-march=native' is not guaranteed to always be available
in GCC and Clang C++ compilers. In fact, when targeting ARM CPUs, support
for '-march=native' is available only in GCC on the GNU/Linux platform:

 - GCC 8.2 ARM64    -> https://godbolt.org/z/bq88n6
   error: unknown value 'native' for -march
   note: valid arguments are: armv8-a armv8.1-a armv8.2-a armv8.3-a armv8.4-a

 - GCC 6.4 ARM      -> https://godbolt.org/z/b6rco7
   error: unrecognized argument in option '-march=native'
   note: valid arguments to '-march=' are: ... armv6 ... armv7 ... armv8-a ...

 - Clang 11 ARMv7-a -> https://godbolt.org/z/Pdafcr
   error: the clang compiler does not support '-march=native'

This lack of '-march=native' on ARM
contrasts with its guaranteed availability on x86:

 - [GCC] x86 Options (https://gcc.gnu.org/onlinedocs/gcc/x86-Options.html)

   -march=cpu-type
      Generate instructions for the machine type 'cpu-type'.

      'native'
         This selects the CPU to generate code for at compilation time
         by determining the processor type of the compiling machine.
         Using '-march=native' enables all instruction subsets supported by
         the local machine (hence the result might not run on different machines).

 - [GCC] ARM Options (https://gcc.gnu.org/onlinedocs/gcc/ARM-Options.html)

   -march=name[+extension...]
      This specifies the name of the target ARM architecture.
      GCC uses this name to determine what kind of instructions
      it can emit when generating assembly code.

      -march=native
         Causes the compiler to auto-detect the architecture of the build computer.
         At present, this feature is only supported on GNU/Linux,
         and not all architectures are recognized.
         If the auto-detect is unsuccessful the option has no effect.

Moreover, when targeting ARM CPUs, unlike when targeting x86 CPUs,
it is recommended to use the "-mcpu" compiler flag instead of "-march":

 - [GCC] x86 Options (https://gcc.gnu.org/onlinedocs/gcc/x86-Options.html)

   -mcpu=cpu-type
      A deprecated synonym for -mtune.

 - [GCC] ARM Options (https://gcc.gnu.org/onlinedocs/gcc/ARM-Options.html)

   -mcpu=name[+extension...]
      This specifies the name of the target ARM processor.
      GCC uses this name to derive the name of the target ARM architecture
      (as if specified by '-march') and the ARM processor type
      for which to tune for performance (as if specified by '-mtune').
      Where this option is used in conjunction with '-march' or '-mtune',
      those options take precedence over the appropriate part of this option.

      -mcpu=native
         Causes the compiler to auto-detect the CPU of the build computer.
         At present, this feature is only supported on GNU/Linux,
         and not all architectures are recognized.
         If the auto-detect is unsuccessful the option has no effect.

 - [arm Community] John Linford - Compiler flags across architectures:
                                  '-march', '-mtune', and '-mcpu'.
   (https://community.arm.com/developer/tools-software/tools/b/tools-software-ides-blog/posts/compiler-flags-across-architectures-march-mtune-and-mcpu)

   Automatic Target Detection
      GCC and Clang support passing the special parameter "native" to these flags.
      The "native" value tells the compiler to detect the architecture of
      the machine on which the compiler is executing and use that architecture
      as the parameter to '-march', '-mtune', or '-mcpu' as appropriate.

      Assuming architecture detection works for your platform,
      passing "native" is usually the best choice,
      if you are not cross-compiling and all you care about is performance.

      With GCC, all three flags can accept "native" as a parameter
      so '-march=native', '-mtune=native', and '-mcpu=native' are all valid.

      Clang only supports "native" for the '-mcpu' and '-mtune' flags.
      You cannot use '-march=native' with Clang.

      If you are not cross-compiling, always use only '-mcpu=native',
      to maximize optimization and compatibility across compilers,
      and actively avoid using '-mtune' or '-march'.

   Summary
      -march=arch:
          Tells the compiler that 'arch' is
          the minimal architecture the binary must run on.
          The compiler is free to use architecture-specific instructions.
          This flag behaves differently on ARM and x86.
          On ARM, '-march' does not override '-mtune',
          but on x86 '-march' will override both '-mtune' and '-mcpu'.

      -mtune=arch:
          Tells the compiler to optimize for architecture 'arch',
          but does not allow the compiler to change the ABI
          or make assumptions about available instructions.
          This flag has the more-or-less the same meaning on Arm and x86.

      -mcpu=arch:
          On ARM, this flag is a combination of '-march' and '-mtune'.
          It simultaneously specifies the target architecture
          and optimizes for a given architecture.
          On x86 this flag is a deprecated synonym for '-mtune'.

To provide binary compatibility with a wide range of ARM CPUs,
this change could use the compiler flags '-march=armv6' and '-march=armv7'.

However, since FMSolvr developers value performance over binary compatibility,
this change uses the '-mcpu=native' compiler flag and assumes its availability.
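
The effect of the selected flag can be observed from within the code through macros that GCC and Clang predefine for the target. The sketch below is illustrative only; target_architecture and avx_enabled are hypothetical helper names, not FMSolvr functions.

```cpp
#include <string>

// Name of the architecture the compiler is generating code for,
// derived from macros predefined by GCC and Clang.
inline std::string target_architecture() {
#if defined(__x86_64__) || defined(__i386__)
    return "x86";
#elif defined(__aarch64__)
    return "arm64";
#elif defined(__arm__)
    return "arm";
#else
    return "unknown";
#endif
}

// True when the selected '-march' or '-mcpu' value enabled AVX code generation.
inline bool avx_enabled() {
#if defined(__AVX__)
    return true;
#else
    return false;
#endif
}
```

Printing both values from a small test program is one way to confirm what "native" resolved to on the build machine.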
Type punning in C++ requires obeying alignment rules,
which means the stack-allocated double[] array
must be aligned at least as strictly as a 256-bit vector of 4 doubles.

[C++ meetup PRAGUE, C++ Russia] Timur Doumler - Type punning in modern C++
 - https://www.youtube.com/watch?v=Nn7zugKc32Q
 - https://www.youtube.com/watch?v=5A9NZADhTwc
 - https://2019.cppconf-piter.ru/en/2019/spb/talks/ydewyakvq6nsvfm8o5ysx
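
A minimal sketch of the alignment requirement, assuming a 32-byte (256-bit) vector type. The vec4d struct is a hypothetical stand-in for a SIMD vector of 4 doubles, and std::memcpy is used here as a portable way to reinterpret the bytes; the actual FMSolvr code may use intrinsics instead.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for a 256-bit SIMD vector of 4 doubles.
struct alignas(32) vec4d {
    double lane[4];
};

// Checks whether a pointer is aligned to a 32-byte boundary.
inline bool is_aligned_32(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % 32 == 0;
}

// A stack array declared with alignas(32) is as strictly aligned as vec4d.
inline bool stack_buffer_is_aligned() {
    alignas(32) double buffer[4] = {0.0, 0.0, 0.0, 0.0};
    return is_aligned_32(buffer);
}

// Reinterprets the bytes of the aligned array as a vec4d and sums the lanes.
inline double sum_lanes() {
    alignas(32) double buffer[4] = {1.0, 2.0, 3.0, 4.0};
    vec4d v;
    std::memcpy(&v, buffer, sizeof v);
    return v.lane[0] + v.lane[1] + v.lane[2] + v.lane[3];
}
```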
Removal of overflowed lattice sums on ARM should be safe,
since the higher-order lattice sums are only required for highly precise calculations,
which the 64-bit long double type on 32-bit ARM CPUs does not support.
This change also removes printing COMPILER_FLAGS,
since CMAKE_EXPORT_COMPILE_COMMANDS already covers this functionality.
Support for the __float128 type in C++ RTTI facilities
(typeid() operator and std::type_info class)
was not implemented until GCC 5:
 - https://godbolt.org/z/bqcPj4

[GCC Bugzilla] - Incomplete C++ library support for __float128
 - https://gcc.gnu.org/bugzilla/show_bug.cgi?id=43622

This change drops support for GCC 4.9 and older versions, since it
removes the function template specialization get_name_of_type<__float128>().
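
A hedged reconstruction of what the remaining generic function might look like: only the name get_name_of_type comes from this change, while the RTTI-based body is an assumption. The sketch instantiates it with standard types only; according to the references above, GCC 5 and newer can instantiate the same template with __float128.

```cpp
#include <string>
#include <typeinfo>

// One generic definition based on RTTI. The returned string is
// implementation-defined and is often a mangled name.
template <typename T>
std::string get_name_of_type() {
    return typeid(T).name();
}
```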
The structs float4 and double4 do not utilize SIMD instructions optimally,
since the 4th vector element is unused, wasting memory and computation resources.
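
The memory cost can be illustrated with a small size comparison. The member layout below is an assumption (three coordinates plus one unused element), and double3 is a hypothetical type introduced only for comparison.

```cpp
#include <cstddef>

// Assumed layout: three coordinates plus one unused element.
struct double4 {
    double x, y, z, unused;
};

// Hypothetical tightly packed alternative, for comparison only.
struct double3 {
    double x, y, z;
};

static_assert(sizeof(double4) == 4 * sizeof(double), "no extra padding expected");
static_assert(sizeof(double3) == 3 * sizeof(double), "no extra padding expected");

// One element in four carries no data, so a quarter of each double4 is wasted.
constexpr std::size_t wasted_bytes_per_element = sizeof(double4) - sizeof(double3);
```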
This change enables execution of GNU Make directives specific to
a particular CPU architecture, which is required by upcoming changes.
This change introduces the COMPILER GNU Make variable,
enabling detection of C++ compiler in FMSolvr's GNU Make build system,
which is required by upcoming changes.
FMSolvr is a generic library implementing the Fast Multipole Method,
which doesn't rely on a particular allocation strategy,
threading facilities or representation of floating point numbers.

This change decouples default_allocator<> from FMSolvr,
because default_allocator<> exists only for the purpose of testing
the FMSolvr library using std::allocator.
This change introduces the USE_DOUBLE macro for consistency
with the existing USE_FLOAT and USE_LONGDOUBLE macros.

In order for the Real type to be defined,
one of the above macros must be defined.

Also, this change adds support for choosing the floating point type
used for FMM computations to FMSolvr's GNU Make build system.
Reviewed-on: ATML-SE/fmsolvr#2
I added the MIT license header to all files.
This implicitly accepts the publication under MIT license.
dummy merge to mark the branches as 'in sync'
i.kabadshow closed this pull request 2022-03-18 06:33:14 +01:00

Reference: ATML-CAP/fmsolvr#17