WIP [do not merge] test pull request ... #17

Closed

i.kabadshow wants to merge 87 commits from a.beckmann/fmsolvr:parallelization-v1 into WIP/parallelization/intra-node/lgpl21+minimize

i.kabadshow commented

2022-03-18 06:27:28 +01:00

Owner

No description provided.

i.kabadshow added 87 commits

2022-03-18 06:27:29 +01:00

add support for Makefile.$(HOSTNAME) and Makefile.$(SYSTEMNAME) fb00847371

override the linker to ld.bfd for juwels and jureca 7a6c464034

fmmtest segfaults if linked with ld.gold, -pthread and -Wl,--as-needed

do not omit -pthread in gcc build 138a768bbf

fix divide-by-zero for p=0 7b6ca57f7d

report "assert() is enabled" d161ceb2c2

add LGPL 2.1 header to tables/*.h 30938fff22

add error control constants for ws=1, open boundaries e48cdaf1bf

particle_handle: add method to compute ∑|q| 09aedc627a

compute ws, d, p according to "Analytic A Priori Error Bounds ..." 561e0b72a8

fmmtest: print a list of suggested ws/d/p 318a84b0cb

add error control constants for ws=2, open boundaries 01f83361f0

the lattice operator is not defined for ws=0 7531c7e365

compute_ws_d_p: never return ws=0 90a8342f09

report assert(enabled) 1fd2dfca35

[WIP] fmsolvr: removed outdated benchmarking results 1d601f1b4b

[WIP] fmsolvr: removed unused header files e1e5057b8a

 - fmsolvr/p2p/pthreads.hpp
 - simd/avx_complex_double2.h

[WIP] fmsolvr: removed modified Boost.Pool header file 51d961d2f0

Boost.Pool has supported C++11 move semantics since version 1.64,
which makes FMSolvr's modified copy an unnecessary maintenance burden.

 ~ https://github.com/boostorg/pool/blob/boost-1.64.0/include/boost/pool/pool_alloc.hpp

FMSolvr's modified boost::pool_allocator differs from
the mainline Boost version only in the following ways:

 - pool_allocator::construct() calls the placement new operator from the global namespace
 - the destroy() member function is a function template
 - the member functions construct() and destroy() are marked as static

Removing FMSolvr's modified version of the pool_alloc.hpp header is therefore safe.

[WIP] fmsolvr: excluded default_allocator<> from FMSolvr's interface 5de88cda5f

FMSolvr is a generic library implementing the Fast Multipole Method.
It doesn't rely on a particular allocation strategy,
threading facility or representation of floating point numbers.

This change decouples default_allocator<> from FMSolvr
and thereby removes FMSolvr's dependency on the Boost library:
default_allocator<> exists only for the purpose of testing
the FMSolvr library using boost::fast_pool_allocator or std::allocator.
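
The decoupling described above can be illustrated with a minimal sketch. The names coefficient_storage and sum_of_first are hypothetical and do not come from FMSolvr, and default_allocator is reduced here to a plain alias for std::allocator. The point is only that the library-side type takes the allocator as a template parameter, while the test-side code decides which allocator to use:

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical library-side type: the allocator is a template parameter,
// so the library itself names no concrete allocation strategy.
template <typename Real, typename Allocator>
class coefficient_storage {
public:
    explicit coefficient_storage(std::size_t n, const Allocator& alloc = Allocator())
        : data_(n, Real(0), alloc) {}

    std::size_t size() const { return data_.size(); }

    Real& operator[](std::size_t i) {
        assert(i < data_.size());
        return data_[i];
    }

private:
    std::vector<Real, Allocator> data_;
};

// Hypothetical test-side default: chosen outside the library.
// A test build could alias a pool allocator here instead.
template <typename T>
using default_allocator = std::allocator<T>;

// Test-side helper: stores 1, 2, ..., n and returns their sum.
inline double sum_of_first(std::size_t n) {
    coefficient_storage<double, default_allocator<double>> storage(n);
    double total = 0.0;
    for (std::size_t i = 0; i < storage.size(); ++i) {
        storage[i] = static_cast<double>(i + 1);
        total += storage[i];
    }
    return total;
}
```

With this layout, swapping the allocator only touches the alias in the test code, not the library type.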

[WIP] fmsolvr: disabled GCC warnings regarding ignored attributes 3bb46deea7

[WIP] fmsolvr: fixed some of compilation issues in benchmarks and tests 36f6eb4a2c

[WIP] fmsolvr: minor tweaks removing warnings 958d5365d5

[WIP] fmsolvr: created overload set fmsolvr::split_complex_mul_acc() 45ed6de2bb

[WIP] fmsolvr: converted double literals to Real temporaries a22b286d24

[WIP] fmsolvr: improved consistency of the Real type 40a58b9e53

This change introduces the USE_DOUBLE macro for consistency with
the existing USE_FLOAT, USE_LONGDOUBLE and USE_FLOAT128 macros.

In order for the Real type to be defined,
one of the above macros must be defined.

Also, this change allows microbench to be configured with
long double or float128 as the Real type.
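
A minimal sketch of how such a macro-driven selection might look is given below. Only the four macro names and the Real type name come from the commit message; the fallback to USE_DOUBLE and the reciprocal() helper are assumptions added so that the sketch compiles on its own, whereas in the real build one of the macros is expected to be passed in by the build system:

```cpp
#include <cassert>

// Assumption of this sketch only: fall back to double when the build
// system has not selected a type (normally done with e.g. -DUSE_DOUBLE).
#if !defined(USE_FLOAT) && !defined(USE_DOUBLE) && \
    !defined(USE_LONGDOUBLE) && !defined(USE_FLOAT128)
#define USE_DOUBLE
#endif

#if defined(USE_FLOAT)
using Real = float;
#elif defined(USE_DOUBLE)
using Real = double;
#elif defined(USE_LONGDOUBLE)
using Real = long double;
#elif defined(USE_FLOAT128)
using Real = __float128;  // GCC extension, not available on every target
#else
#error "One of USE_FLOAT, USE_DOUBLE, USE_LONGDOUBLE, USE_FLOAT128 must be defined"
#endif

// Hypothetical helper: literals are converted to Real explicitly,
// so the arithmetic is carried out in the selected precision.
inline Real reciprocal(Real x) {
    assert(x != Real(0));
    return Real(1) / x;
}
```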

[WIP] fmsolvr: replaced FP_TYPE macro with Real type in coulomb test 6a8d0b9290

[WIP] fmsolvr: forced #include of float128.hpp to be at the top f7b20d0d0a

The header file <fmsolvr/util/float128.hpp>
implements C++ facilities supporting __float128:
 - <cmath> functions
 - <iostream> operators
 - std::numeric_limits<> specialization

Unfortunately, float128.hpp opens the std namespace and
adds definitions to it, which is undefined behaviour.

As a temporary workaround, this change makes sure that
all C++ source files #include <fmsolvr/util/float128.hpp>
before the C++ standard library headers <cmath>, <iostream> and <limits>.
This ensures that the declarations of the C++ facilities supporting __float128
are visible before client code has an opportunity to use them.

[WIP] fmsolvr: added missing C++ functions supporting __float128 a3c77e26d4

The function std::numeric_limits<__float128>::max()
returns the largest positive 128-bit floating point number,
which is (1 + 1/2 + 1/4 + 1/8 + ... + 1/2^112) * 2^16383.

[IEEE 754] - Quadruple-precision floating point number format

  1-bit
    |      (-1)^sign * 1.fraction * 2^(exponent-0x3FFF)
    V
  +----+------------+------------------------------------+
  |sign|  exponent  |              fraction              |
  +----+------------+------------------------------------+
       <-- 15-bit --><------------- 112-bit ------------->
  <---------------------- 128-bit ----------------------->
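
The formula above can be checked on the standard floating point types, which use the same layout with fewer bits. The sketch below is not FMSolvr code, and largest_finite is a hypothetical name. It builds the all-ones significand (1 + 1/2 + ... + 1/2^f = 2 - 2^-f for f fraction bits) and scales it by the largest power of two; for binary128 the same construction would use f = 112 and the power 16383:

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Largest finite value of a binary floating point type T, built from
// the format parameters rather than taken from numeric_limits<T>::max().
template <typename T>
T largest_finite() {
    static_assert(std::numeric_limits<T>::radix == 2, "binary formats only");
    // Number of significand bits after the leading 1 (52 for double).
    const int fraction_bits = std::numeric_limits<T>::digits - 1;
    // Largest power of two of a finite value (1023 for double).
    const int max_power = std::numeric_limits<T>::max_exponent - 1;
    // 1 + 1/2 + ... + 1/2^fraction_bits == 2 - 2^-fraction_bits
    const T significand = T(2) - std::ldexp(T(1), -fraction_bits);
    const T result = std::ldexp(significand, max_power);
    assert(std::isfinite(result));
    return result;
}
```

For float and double the result is expected to compare equal to std::numeric_limits<T>::max().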

[WIP] fmsolvr: fixed SIMD specialized structs XYZ<float>, XYZ<double> 46433b1c74

[WIP] fmsolvr: made COULOMB_POTENTIAL, COULOMB_ENERGY independent 8475eb58dc

[WIP] fmsolvr: fixed further compilation and warning issues 2d492b1e94

 - marked unused variables in communicate-ff
 - removed unused #include of x86 SIMD header file in coulomb
 - added missing #include for POSIX threads in coulomb
 - removed pthreads.hpp leftovers in fmm_handle
 - removed inline keyword from functions marked as noinline
 - removed redundant fmsolver:: namespace prefix
 - removed usage of the register keyword, since it's deprecated
 - fixed misplaced __AVX__ #ifdef in super_accumulate
 - added an #include that was missing in microbench when AVX is disabled
 - made unit-test intrinsics_x86 always compilable

[WIP] fmsolvr: replaced '-march=native' with '-mcpu=native' on ARM 710fd4e049

The compiler flag '-march=native' is not guaranteed to always be available
in GCC and Clang C++ compilers. In fact, when targeting ARM CPUs, support
for '-march=native' is available only in GCC on the GNU/Linux platform:

- GCC 8.2 ARM64 -> https://godbolt.org/z/bq88n6
error: unknown value 'native' for -march
note: valid arguments are: armv8-a armv8.1-a armv8.2-a armv8.3-a armv8.4-a

- GCC 6.4 ARM -> https://godbolt.org/z/b6rco7
error: unrecognized argument in option '-march=native'
note: valid arguments to '-march=' are: ... armv6 ... armv7 ... armv8-a ...

- Clang 11 ARMv7-a -> https://godbolt.org/z/Pdafcr
error: the clang compiler does not support '-march=native'

This limited availability of the '-march=native' compiler flag on ARM
contrasts with its guaranteed availability on x86:

- [GCC] x86 Options (https://gcc.gnu.org/onlinedocs/gcc/x86-Options.html)

-march=cpu-type
Generate instructions for the machine type 'cpu-type'.

'native'
This selects the CPU to generate code for at compilation time
by determining the processor type of the compiling machine.
Using '-march=native' enables all instruction subsets supported by
the local machine (hence the result might not run on different machines).

- [GCC] ARM Options (https://gcc.gnu.org/onlinedocs/gcc/ARM-Options.html)

-march=name[+extension...]
This specifies the name of the target ARM architecture.
GCC uses this name to determine what kind of instructions
it can emit when generating assembly code.

-march=native
Causes the compiler to auto-detect the architecture of the build computer.
At present, this feature is only supported on GNU/Linux,
and not all architectures are recognized.
If the auto-detect is unsuccessful the option has no effect.

Moreover, when targeting ARM CPUs, unlike when targeting x86 CPUs,
the '-mcpu' compiler flag is recommended instead of '-march':

- [GCC] x86 Options (https://gcc.gnu.org/onlinedocs/gcc/x86-Options.html)

-mcpu=cpu-type
A deprecated synonym for -mtune.

- [GCC] ARM Options (https://gcc.gnu.org/onlinedocs/gcc/ARM-Options.html)

-mcpu=name[+extension...]
This specifies the name of the target ARM processor.
GCC uses this name to derive the name of the target ARM architecture
(as if specified by '-march') and the ARM processor type
for which to tune for performance (as if specified by '-mtune').
Where this option is used in conjunction with '-march' or '-mtune',
those options take precedence over the appropriate part of this option.

-mcpu=native
Causes the compiler to auto-detect the CPU of the build computer.
At present, this feature is only supported on GNU/Linux,
and not all architectures are recognized.
If the auto-detect is unsuccessful the option has no effect.

- [arm Community] John Linford - Compiler flags across architectures:
'-march', '-mtune', and '-mcpu'.
(https://community.arm.com/developer/tools-software/tools/b/tools-software-ides-blog/posts/compiler-flags-across-architectures-march-mtune-and-mcpu)

Automatic Target Detection
GCC and Clang support passing the special parameter "native" to these flags.
The "native" value tells the compiler to detect the architecture of
the machine on which the compiler is executing and use that architecture
as the parameter to '-march', '-mtune', or '-mcpu' as appropriate.

Assuming architecture detection works for your platform,
passing "native" is usually the best choice,
if you are not cross-compiling and all you care about is performance.

With GCC, all three flags can accept "native" as a parameter
so '-march=native', '-mtune=native', and '-mcpu=native' are all valid.

Clang only supports "native" for the '-mcpu' and '-mtune' flags.
You cannot use '-march=native' with Clang.

If you are not cross-compiling, always use only '-mcpu=native',
to maximize optimization and compatibility across compilers,
and actively avoid using '-mtune' or '-march'.

Summary
-march=arch:
Tells the compiler that 'arch' is
the minimal architecture the binary must run on.
The compiler is free to use architecture-specific instructions.
This flag behaves differently on ARM and x86.
On ARM, '-march' does not override '-mtune',
but on x86 '-march' will override both '-mtune' and '-mcpu'.

-mtune=arch:
Tells the compiler to optimize for architecture 'arch',
but does not allow the compiler to change the ABI
or make assumptions about available instructions.
This flag has more-or-less the same meaning on ARM and x86.

-mcpu=arch:
On ARM, this flag is a combination of '-march' and '-mtune'.
It simultaneously specifies the target architecture
and optimizes for a given architecture.
On x86 this flag is a deprecated synonym for '-mtune'.

To provide binary compatibility with a wide range of ARM CPUs,
this change could use the compiler flags '-march=armv6' and '-march=armv7'.

However, since FMSolvr developers value performance over binary compatibility,
this change uses the '-mcpu=native' compiler flag and assumes its availability.
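
The decision reached above can be summarized in a small sketch. It is written in C++ only to stay in the language of the code base; the actual change lives in the build system. The names cpu_family, native_tuning_flag and build_cpu_family are hypothetical, and treating every non-ARM target as x86 is a simplifying assumption:

```cpp
#include <cassert>
#include <string>

enum class cpu_family { x86, arm };

// Flag that requests tuning for the build machine, per CPU family:
// '-march=native' on x86, '-mcpu=native' on ARM.
inline std::string native_tuning_flag(cpu_family family) {
    switch (family) {
        case cpu_family::arm: return "-mcpu=native";
        case cpu_family::x86: return "-march=native";
    }
    assert(false && "unknown cpu_family");
    return std::string();
}

// CPU family of the target the translation unit is compiled for,
// based on commonly predefined compiler macros.
inline cpu_family build_cpu_family() {
#if defined(__arm__) || defined(__aarch64__) || defined(_M_ARM) || defined(_M_ARM64)
    return cpu_family::arm;
#else
    return cpu_family::x86;  // assumption of this sketch: everything else is x86
#endif
}
```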

[WIP] fmsolvr: added missing unaligned_load() for SSE fvec4, SSE2 dvec2 2b46846a5e

[WIP] fmsolvr: detect availability of float128 in GNU Make build system fb8d778f50

[WIP] fmsolvr: replaced x86 SIMD code by using SIMD through FMSolvr API 2cc63d20f5

[WIP] fmsolvr: added required array alignment in intrinsics_x86 f6548de1f1

Type punning in C++ requires obeying alignment rules,
which means the stack-allocated double[] array
must be aligned at least as strictly as a 256-bit vector of 4 doubles.
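
A minimal sketch of the alignment requirement, assuming a 32-byte (256-bit) vector and using only standard C++11 alignas. The helper names are hypothetical, and no SIMD intrinsics are used, so the sketch also compiles on targets without AVX:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// 256 bits = 32 bytes = 4 doubles.
constexpr std::size_t avx_vector_alignment = 32;

// True if the address p is a multiple of alignment (a power of two).
inline bool is_aligned(const void* p, std::size_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}

// A plain double[4] is only guaranteed alignof(double) alignment;
// alignas raises the requirement to that of the 256-bit vector.
inline bool stack_array_is_vector_aligned() {
    alignas(avx_vector_alignment) double values[4] = {1.0, 2.0, 3.0, 4.0};
    return is_aligned(values, avx_vector_alignment);
}
```

Without the alignas specifier the array may still happen to land on a 32-byte boundary, but nothing guarantees it.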

[C++ meetup PRAGUE, C++ Russia] Timur Doumler - Type punning in modern C++
 - https://www.youtube.com/watch?v=Nn7zugKc32Q
 - https://www.youtube.com/watch?v=5A9NZADhTwc
 - https://2019.cppconf-piter.ru/en/2019/spb/talks/ydewyakvq6nsvfm8o5ysx

[WIP] fmsolvr: renamed boost::pool_allocator macro defining default_allocator b0501095d0

[WIP] fmsolvr: renamed coulomb macros controlling parallelization facilities 8aa819e45f

[WIP] fmsolvr: implemented contracts and utility macros c3919cec20

[WIP] fmsolvr: refined benchmarks and tests using newly created macros d43ed642f5

[WIP] fmsolvr: disabled GCC warnings regarding psABI changes on ARM 1751aeabfb

[WIP] fmsolvr: implemented target CPU architecture macro 4a20c09de8

[WIP] fmsolvr: #ifdef'ed out overflowed lattice sums on ARM 8f3ddff624

Removing the overflowed lattice sums on ARM should be safe,
since the higher-order lattice sums are only required for highly precise calculations,
which the 64-bit floating point type used for long double on 32-bit ARM CPUs does not support.

[WIP] fmsolvr: disabled AVX assembly in split_complex_mul_acc() on x86 CPUs d028b295e9

[WIP] fmsolvr: allowed bench-rdtscp to compile for ARM CPUs 194c9b240c

[WIP] fmsolvr: replaced GITBRANCH:GITREVISION with FMSOLVR_BUILD_ID e2deb160f9

This change also removes printing COMPILER_FLAGS,
since CMAKE_EXPORT_COMPILE_COMMANDS already covers this functionality.

[WIP] fmsolvr: implemented CMake build system for FMSolvr 8ed618764b

[WIP] fmsolvr: added CMake toolchain files for GNU/Linux platform 22976023a1

[WIP] fmsolvr: implemented Continuous Integration script 48117b98f9

[WIP] fmsolvr: removed hack for missing float128 support in C++ RTTI 1de206b503

Support for the __float128 type in C++ RTTI facilities
(typeid() operator and std::type_info class)
was not implemented until GCC 5:
 - https://godbolt.org/z/bqcPj4

[GCC Bugzilla] - Incomplete C++ library support for __float128
 - https://gcc.gnu.org/bugzilla/show_bug.cgi?id=43622

This change drops support for GCC 4.9 and older versions, since it removes
the function template specialization get_name_of_type<__float128>().

[WIP] fmsolvr: reduced number of iterations in microbenchmarks 038373e4c1

[WIP] fmsolvr: fake successful runs of bench_rdtscp on ARM CPUs 12ee6f7f81

[WIP] fmsolvr: lowered required CMake version in FindGCCquadmath 2fe9bbf841

[WIP] fmsolvr: removed FMSolvr GNU Make build system efc103b6cc

[WIP] fmsolvr: moved tables text files to the tables directory ab975c28bd

[WIP] fmsolvr: removed obsolete mockup for FMSolvr SW architecture 3e09d13a09

[WIP] fmsolvr: removed float4 and double4 from coulomb 350115a6d5

Structs float4 and double4 do not utilize SIMD instructions optimally,
since the 4th vector element is unused, wasting memory and computation resources.
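
The cost can be quantified with a small sketch. The float4 layout below is an illustrative stand-in, not the definition removed from coulomb, and unused_fraction is a hypothetical helper:

```cpp
#include <cassert>
#include <cstddef>

// Illustrative stand-in: three coordinates padded to four lanes.
struct float4 {
    float x, y, z, w;  // w carries no data for a 3D vector
};

// Fraction of lanes (and of the memory behind them) that carries no data.
inline double unused_fraction(std::size_t used_lanes, std::size_t total_lanes) {
    assert(total_lanes > 0 && used_lanes <= total_lanes);
    return static_cast<double>(total_lanes - used_lanes) /
           static_cast<double>(total_lanes);
}
```

With three of four lanes in use, a quarter of every load, store and arithmetic instruction operates on padding.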

[WIP] fmsolvr: backported build-<cpu>.mk Makefiles from WIP/lgpl21 branch 981d7c6006

This change enables execution of GNU Make directives specific to
a particular CPU architecture, which is required by upcoming changes.

[WIP] fmsolvr: backported build-<cxx>.mk Makefiles from WIP/lgpl21 branch d1729d7120

This change introduces the COMPILER GNU Make variable,
enabling detection of the C++ compiler in FMSolvr's GNU Make build system,
which is required by upcoming changes.

[WIP] fmsolvr: excluded default_allocator<> from FMSolvr's interface 548e612cfb

FMSolvr is a generic library implementing the Fast Multipole Method.
It doesn't rely on a particular allocation strategy,
threading facility or representation of floating point numbers.

This change decouples default_allocator<> from FMSolvr,
because default_allocator<> exists only for the purpose of testing
the FMSolvr library using std::allocator.

[WIP] fmsolvr: minor tweaks removing warnings 15c86f277d

[WIP] fmsolvr: improved consistency of the Real type 5f9fa25f63

This change introduces the USE_DOUBLE macro for consistency
with the existing USE_FLOAT and USE_LONGDOUBLE macros.

In order for the Real type to be defined,
one of the above macros must be defined.

Also, this change adds support for choosing the floating point type
used for FMM computations to FMSolvr's GNU Make build system.

[WIP] fmsolvr: replaced '-march=native' with '-mcpu=native' on ARM bd010058b8