GPU_completeTree #1

Closed
k.dobrzycki wants to merge 112 commits from k.dobrzycki:GPU_completeTree into develop_completeTree
First-time contributor

Implemented GPU Octree (CUDA).
CPU / GPU Octree are differentiated by specialisation of the `target` template parameter (defaulted to `DEVICE::CPU`).
Merged the common tree functions with the existing CPU Octree into `OctreeImpl` using CRTP.
Note the changed interface: the CPU Octree now uses out-parameters for consistency with the GPU Octree. See the tests for example usage. This is the only interface change to the CPU Octree.
GPU Octree storage should be batch-accessed on the host via `View` to avoid latency bottlenecks.
GPU Octree box functions such as near/far field, parent and child are marked `__device__` and must be used through kernel launches. The kernels expect 1-dimensional launch parameters which correspond to SFC indices. An SFC index offset parameter is available to specify a subrange of a tree level. See the `cuda_kernels` namespace for the interface and the tests for usage.
Objects which are used in kernels, e.g. via member functions, should be used with `DeviceCopiedObject` for automatic resource lifetime management. Similarly, `DeviceMemory` provides an automatically managed device pointer.
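
For readers new to the layout described above, here is a minimal host-only sketch of a CRTP base combined with a device-tag specialisation and an out-parameter interface. Only `OctreeImpl`, the `target` parameter and `DEVICE::CPU` come from this PR. The `DEVICE::GPU` enumerator, the member names `parent` / `parent_impl`, and the rule "parent index = child index shifted right by 3 bits" (level-local 3D Morton indices) are assumptions made for illustration; the real interface is in the tests.

```cpp
#include <cassert>
#include <cstdint>

// Device tag. DEVICE::CPU is named in the PR; GPU is assumed here.
enum class DEVICE { CPU, GPU };

// CRTP base holding the functions shared by both trees.
// `parent` and `parent_impl` are hypothetical names.
template <class Derived>
class OctreeImpl {
public:
    // Out-parameter style: the result is written to `out`, not returned.
    void parent(std::uint64_t box, std::uint64_t& out) const {
        static_cast<const Derived&>(*this).parent_impl(box, out);
    }
};

// Primary template, defaulted to the CPU target.
template <DEVICE target = DEVICE::CPU>
class Octree;

// CPU specialisation. A GPU specialisation would need CUDA and is omitted.
template <>
class Octree<DEVICE::CPU> : public OctreeImpl<Octree<DEVICE::CPU>> {
public:
    // Assumption: level-local 3D Morton index, so the parent drops 3 bits.
    void parent_impl(std::uint64_t box, std::uint64_t& out) const {
        out = box >> 3;
    }
};

// Call pattern with the defaulted template argument.
inline void example_usage() {
    Octree<> tree;            // same type as Octree<DEVICE::CPU>
    std::uint64_t p = 0;
    tree.parent(15, p);       // writes the parent index of box 15 into p
    assert(p == 1);
}
```

The out-parameter form mirrors what a `__device__` function writing into device storage would look like, which is presumably why the CPU interface was aligned with it.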

Author

Maybe should also specialise all basic tree functions on DEVICE
Made the elements const, so that e.g. a const vector can be passed
New name fits the general naming better
duplicates are allowed and wanted. They define the point range.
A constexpr specifier used in a function declaration implies inline.

https://en.cppreference.com/w/cpp/language/constexpr
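
A small sketch of what the note above means in practice: a `constexpr` function can be defined in a header that is included from several translation units without an explicit `inline`. The function name `children_per_box` is hypothetical and not taken from the repository.

```cpp
// Header-style code. No explicit `inline` is needed: a constexpr function
// is implicitly inline, so defining it in a header included from several
// translation units does not violate the one-definition rule.
constexpr unsigned children_per_box(unsigned dim) {
    return 1u << dim;  // 2^dim children: 4 for a quadtree, 8 for an octree
}

// Evaluated at compile time.
static_assert(children_per_box(3) == 8, "an octree box has 8 children");
```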
splitting
  using uint  = typename sfc_type::sfcindex_type;
into
  using sfcidx_t = typename sfc_type::sfcindex_type;
  using dim_t    = typename sfc_type::dimension_type;
  using depth_t  = typename sfc_type::depth_type;

The user can now choose different types for the SFC index, dimension index and depth, for both packed and unpacked SFCs.
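
As an illustration of the split above, the sketch below shows an SFC type that exposes three differently sized member types and a class template that picks them up through the new aliases. `morton3d` and `tree_types` are hypothetical names, and the chosen widths are only an example.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical SFC type with three independent integer types.
struct morton3d {
    using sfcindex_type  = std::uint64_t;  // holds the full SFC index
    using dimension_type = std::uint32_t;  // holds one coordinate
    using depth_type     = std::uint8_t;   // holds a tree level
};

// The three aliases from the commit, in place of a single `uint`.
template <class sfc_type>
struct tree_types {
    using sfcidx_t = typename sfc_type::sfcindex_type;
    using dim_t    = typename sfc_type::dimension_type;
    using depth_t  = typename sfc_type::depth_type;
};

static_assert(std::is_same<tree_types<morton3d>::sfcidx_t, std::uint64_t>::value,
              "index type is taken from the SFC type");
static_assert(sizeof(tree_types<morton3d>::depth_t) == 1,
              "depth can be much narrower than the index");
```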
If sfc_type_in::depth_max() > sfc_type_out::depth_max() were allowed, we could run into undefined behaviour.

Note: e.g. if sfc_type_in::depth_max() = sfc_type_out::depth_max() = 21 and dim_t of sfc_type_in is 64 bit and dim_t of sfc_type_out is 32 bit, we would cast a 64-bit integer to 32 bits. But since sfc_type_in::depth_max() = 21, only the lowest 21 bits of each coordinate are used, so no information is lost.
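
The restriction above could be enforced at compile time roughly as follows. This is a sketch under assumptions: `sfc64`, `sfc32` and `convert_coordinate` are hypothetical names, and only `depth_max()` and the idea of comparing the two depths come from the commit message.

```cpp
#include <cstdint>

// Hypothetical SFC types: same maximum depth, different coordinate widths.
struct sfc64 {
    using dimension_type = std::uint64_t;
    static constexpr unsigned depth_max() { return 21; }
};
struct sfc32 {
    using dimension_type = std::uint32_t;
    static constexpr unsigned depth_max() { return 21; }
};

// Converts one coordinate between SFC types. The static_assert rejects
// conversions to a shallower SFC type, where coordinate bits could be lost.
template <class sfc_type_in, class sfc_type_out>
constexpr typename sfc_type_out::dimension_type
convert_coordinate(typename sfc_type_in::dimension_type c) {
    static_assert(sfc_type_in::depth_max() <= sfc_type_out::depth_max(),
                  "input depth must not exceed output depth");
    // Narrowing is safe here: a coordinate uses at most depth_max() bits.
    return static_cast<typename sfc_type_out::dimension_type>(c);
}

// Largest 21-bit coordinate survives the 64-bit to 32-bit cast unchanged.
static_assert(convert_coordinate<sfc64, sfc32>((std::uint64_t{1} << 21) - 1)
                  == 2097151u,
              "21-bit coordinate fits into 32 bits");
```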
The depth of the tree is limited by the sfc type.
The input parameter boxsfcindex_d of the tree constructor is a sorted array of particle SFC indices on depth d. Indices can occur multiple times, since several particles can be assigned to the same box of depth d. The length of the array is equal to the total number of particles in the system. The new name particle_sfc_indices is therefore more fitting.
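
To make the role of the duplicates concrete: in a sorted array, the run of equal indices for a box is exactly the range of particles in that box. A minimal sketch using the standard library follows. The function name `point_range` and the use of `std::vector` are assumptions; the tree may compute and store its ranges differently.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Returns the half-open range [first, last) of particle positions whose
// SFC index on depth d equals `box`. An empty box yields first == last.
inline std::pair<std::size_t, std::size_t>
point_range(const std::vector<std::uint64_t>& particle_sfc_indices,
            std::uint64_t box) {
    // Precondition from the commit message: the array is sorted.
    assert(std::is_sorted(particle_sfc_indices.begin(),
                          particle_sfc_indices.end()));
    const auto r = std::equal_range(particle_sfc_indices.begin(),
                                    particle_sfc_indices.end(), box);
    return {static_cast<std::size_t>(r.first - particle_sfc_indices.begin()),
            static_cast<std::size_t>(r.second - particle_sfc_indices.begin())};
}
```

For the array `{0, 0, 3, 3, 3, 7}`, box 3 owns particles 2 to 4 (range `[2, 5)`), while box 5 is empty.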
Construct index_storage with d_ + 1 copies of elements with value 1.
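
If `index_storage` is a `std::vector` (an assumption, the container type is not stated here), the count-and-value constructor has to be called with parentheses. Braces would select the initializer-list constructor and produce a two-element vector instead. `make_index_storage` is a hypothetical helper used only to show this.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Builds a vector with d + 1 elements, each equal to 1.
inline std::vector<std::size_t> make_index_storage(std::size_t d) {
    // Parentheses: (count, value). With braces, {d + 1, 1} would instead
    // create exactly two elements holding the values d + 1 and 1.
    std::vector<std::size_t> storage(d + 1, 1);
    assert(storage.size() == d + 1);
    return storage;
}
```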
const ref to point range. Previously it was possible to modify private data.
Use type aliases in some member functions to avoid repeating long type names.
see commit 3452da9da3
# Conflicts:
#	CMakeLists.txt
#	include/jsc-octree/octree.hpp
View has DeviceMemory cons
fixed 'not-found' index bug
FFS test box indices
k.dobrzycki closed this pull request 2024-10-01 15:24:30 +02:00

Reference: i.lilikakis/jsc-octree#1