Benchmarks

We conducted benchmarks to evaluate the performance of the \(\mathcal{H}\)-matrix implementation and identify areas for improvement, results are presented here. The benchmarks were conducted on different thread configurations and problem sizes, and we measured:

  • execution times, and performance of our algorithm.

  • resulting compression,

obtained using our in-house \(\mathcal{H}\)-matrix implementation.

In addition, we also provide a detailed explanation of how to use the benchmarks in Usage and how to customize them to suit user specific needs in Customization.

Overview

Our aim is to study the execution times and the resulting compression given by different functions with several sets of parameters. The studied functions correspond to the following type of benchmark <bench_type>:

  • The assembly of \(\mathcal{H}\)-matrices bench_hmatrix_build,

  • The product of \(\mathcal{H}\)-matrices and matrices bench_hmatrix_matrix_product,

  • The factorization of \(\mathcal{H}\)-matrices bench_hmatrix_factorization.

Each feature is tested against at least one of the following test cases <test_case_name>:

  • The problem size pbl_size,

  • The number of thread thread,

  • The ratio of the problem size on the number of thread (weak scaling) ratio.

The results are saved as CSV files. The above-mentioned names are used in the executables and CSV files names in the format <bench_type>_vs_<test_case_name>.*. For example: bench_hmatrix_build_vs_pbl_size, bench_hmatrix_matrix_product_vs_pbl_size, bench_hmatrix_matrix_product_vs_thread, etc…

To simplify our approach and avoid the complexities associated with solving boundary integral equations using \(\mathcal{H}\)-matrices, such as handling singular integrals and complex geometries, we focus on relatively simple problems for the time being. In the following, we introduce the kernel function and geometry used for all our tests.

Kernels

The kernel functions are functions \(G(\boldsymbol{x},\boldsymbol{y})\) where \(\boldsymbol{x}, \boldsymbol{y} \in \mathbb{R}^3\). The following are defined:

  • Laplace like

\[G(\boldsymbol{x},\boldsymbol{y})=\frac{1}{|\boldsymbol{x} - \boldsymbol{y}| + 10^{-5}}\]
  • Helmholtz like

\[G(\boldsymbol{x},\boldsymbol{y})=\frac{e^{i10 |\boldsymbol{x} - \boldsymbol{y}|}}{|\boldsymbol{x} - \boldsymbol{y}| + 10^{-5}}\]

The user can find several implementation of these kernel functions in include/htool_benchmark/generator_types.hpp and use them in <bench_type>.cpp by replacing LaplaceLikeGenerator by the wanted kernel.

Geometry

The geometry is a cylinder defined in include/htool_benchmark/generator_fixture.hpp. For clarity’s sake, we define here the geometry in Python:

from math import pi, cos, sin, sqrt
import numpy as np

def createCylinder(n):
    step = 1.75*pi/sqrt(n)
    pts = np.empty((3,n))
    length          = 2*pi
    pointsPerCircle = length/step
    angleStep       = 2*pi/pointsPerCircle
    for i in range(0,n-1):
        x = cos(angleStep*i)
        y = sin(angleStep*i)
        z = step*i/pointsPerCircle
        pts[0,i] = x
        pts[1,i] = y
        pts[2,i] = z
    return pts

Output format

In the result CSV files one can find the following columns:

  • The tolerance which controls the relative error on block approximation epsilon,

  • The size of the problem size,

  • The number of threads number_of_threads,

  • The type of implementation policy_type (seq, par or omp_task, see remark below),

  • The index of the repetition id_rep of the benchmark,

  • The compression ratio (number of rows times number of columns divided by number of stored coefficients in low-rank and dense blocks) compression_ratio,

  • The storage saving ratio (1 - 1 /compression_ratio) space_saving,

  • The execution time time. If several times are measured, several columns are added with appropriated names, for example: factorization_time and solve_time in \(\mathcal{H}\)-matrix factorization benchmarks.

  • The type of clustering clustering_type, see Available customizations,

  • The type of low rank generator low_rank_generator_type, see Available customizations,

  • The symmetry symmetry_type,

  • The hardware used for the result hardware_type, and

  • The version of Htool used version.

Remark : The par execution policy corresponds to the use of shared parallelism with OpenMP, typically with # pragma omp for instructions if possible (products) or task-based parallelism (factorizations) with # pragma omp task instructions otherwise. The omp_task implementation only uses task-based parallelism with # pragma omp task instructions.

Results

The results can be found here.

Usage

The repository containing the benchmarks is github.com/PierreMarchand20/htool_benchmarks. Once the repository and its submodules are cloned, the user needs to build the library and make the executables with the following commands in htool_benchmark/ directory:

mkdir build
cd build
cmake ..
# you can also use cmake.. -DCMAKE_BUILD_TYPE=Release_w_vecto
# to enable vectorization
make

Then, to run a benchmark based on a specific test case, the user can run of the following commands:

  • ./bench_hmatrix_build,

  • ./bench_hmatrix_matrix_product,

  • ./bench_hmatrix_factorization.

It should output all the possible options for each specific benchmark.

Customization

The benchmark parameters are accessible in the hpp files bench_hmatrix_build, bench_hmatrix_matrix_product and bench_hmatrix_factorization located in the include/htool_benchmark folder. The user will find the following parameters in the custom parameters section of the code :

  • number_of_repetitions: the number of repetitions of the benchmark (see remark below),

  • List_policy_type : the set of implementations,

  • List_epsilon : the set of epsilon values (the tolerance which controls the relative error on block approximation),

  • eta: constant in the admissibility condition, see Hierarchical clustering,

  • List_pbl_size: the set of problem size values,

  • List_thread: the set of values for the number of threads,

  • number_of_products : the number of timed \(\mathcal{H}\)-matrix products,

  • number_of_solves : the number of timed linear system solves,

  • trans : form of the system of equations \(A * X = B\) or \(A^{**}T * X = B\).