Benchmarks¶
We conducted benchmarks to evaluate the performance of the \(\mathcal{H}\)-matrix implementation and identify areas for improvement, results are presented here. The benchmarks were conducted on different thread configurations and problem sizes, and we measured:
execution times, and performance of our algorithm.
resulting compression,
obtained using our in-house \(\mathcal{H}\)-matrix implementation.
In addition, we also provide a detailed explanation of how to use the benchmarks in Usage and how to customize them to suit user specific needs in Customization.
Overview¶
Our aim is to study the execution times and the resulting compression given by different functions with several sets of parameters. The studied functions correspond to the following type of benchmark <bench_type>:
The assembly of \(\mathcal{H}\)-matrices
bench_hmatrix_build,The product of \(\mathcal{H}\)-matrices and matrices
bench_hmatrix_matrix_product,The factorization of \(\mathcal{H}\)-matrices
bench_hmatrix_factorization.
Each feature is tested against at least one of the following test cases <test_case_name>:
The problem size
pbl_size,The number of thread
thread,The ratio of the problem size on the number of thread (weak scaling)
ratio.
The results are saved as CSV files. The above-mentioned names are used in the executables and CSV files names in the format <bench_type>_vs_<test_case_name>.*. For example: bench_hmatrix_build_vs_pbl_size, bench_hmatrix_matrix_product_vs_pbl_size, bench_hmatrix_matrix_product_vs_thread, etc…
To simplify our approach and avoid the complexities associated with solving boundary integral equations using \(\mathcal{H}\)-matrices, such as handling singular integrals and complex geometries, we focus on relatively simple problems for the time being. In the following, we introduce the kernel function and geometry used for all our tests.
Kernels¶
The kernel functions are functions \(G(\boldsymbol{x},\boldsymbol{y})\) where \(\boldsymbol{x}, \boldsymbol{y} \in \mathbb{R}^3\). The following are defined:
Laplace like
Helmholtz like
The user can find several implementation of these kernel functions in include/htool_benchmark/generator_types.hpp and use them in <bench_type>.cpp by replacing LaplaceLikeGenerator by the wanted kernel.
Geometry¶
The geometry is a cylinder defined in include/htool_benchmark/generator_fixture.hpp. For clarity’s sake, we define here the geometry in Python:
from math import pi, cos, sin, sqrt
import numpy as np
def createCylinder(n):
step = 1.75*pi/sqrt(n)
pts = np.empty((3,n))
length = 2*pi
pointsPerCircle = length/step
angleStep = 2*pi/pointsPerCircle
for i in range(0,n-1):
x = cos(angleStep*i)
y = sin(angleStep*i)
z = step*i/pointsPerCircle
pts[0,i] = x
pts[1,i] = y
pts[2,i] = z
return pts
Output format¶
In the result CSV files one can find the following columns:
The tolerance which controls the relative error on block approximation
epsilon,The size of the problem
size,The number of threads
number_of_threads,The type of implementation
policy_type(seq,paroromp_task, see remark below),The index of the repetition
id_repof the benchmark,The compression ratio (number of rows times number of columns divided by number of stored coefficients in low-rank and dense blocks)
compression_ratio,The storage saving ratio (1 - 1 /compression_ratio)
space_saving,The execution time
time. If several times are measured, several columns are added with appropriated names, for example:factorization_timeandsolve_timein \(\mathcal{H}\)-matrix factorization benchmarks.The type of clustering
clustering_type, see Available customizations,The type of low rank generator
low_rank_generator_type, see Available customizations,The symmetry
symmetry_type,The hardware used for the result
hardware_type, andThe version of Htool used
version.
Remark : The par execution policy corresponds to the use of shared parallelism with OpenMP, typically with # pragma omp for instructions if possible (products) or task-based parallelism (factorizations) with # pragma omp task instructions otherwise. The omp_task implementation only uses task-based parallelism with # pragma omp task instructions.
Results¶
The results can be found here.
Usage¶
The repository containing the benchmarks is github.com/PierreMarchand20/htool_benchmarks. Once the repository and its submodules are cloned, the user needs to build the library and make the executables with the following commands in htool_benchmark/ directory:
mkdir build
cd build
cmake ..
# you can also use cmake.. -DCMAKE_BUILD_TYPE=Release_w_vecto
# to enable vectorization
make
Then, to run a benchmark based on a specific test case, the user can run of the following commands:
./bench_hmatrix_build,./bench_hmatrix_matrix_product,./bench_hmatrix_factorization.
It should output all the possible options for each specific benchmark.
Customization¶
The benchmark parameters are accessible in the hpp files bench_hmatrix_build, bench_hmatrix_matrix_product and bench_hmatrix_factorization located in the include/htool_benchmark folder. The user will find the following parameters in the custom parameters section of the code :
number_of_repetitions: the number of repetitions of the benchmark (see remark below),List_policy_type: the set of implementations,List_epsilon: the set of epsilon values (the tolerance which controls the relative error on block approximation),eta: constant in the admissibility condition, see Hierarchical clustering,List_pbl_size: the set of problem size values,List_thread: the set of values for the number of threads,number_of_products: the number of timed \(\mathcal{H}\)-matrix products,number_of_solves: the number of timed linear system solves,trans: form of the system of equations \(A * X = B\) or \(A^{**}T * X = B\).