Hierarchical matrices¶
We give here a quick overview of hierarchical matrices, and we refer to Hackbusch[1], Bebendorf[2], and Börm, Grasedyck, and Hackbusch[3] for a detailed presentation. The presentation and figures are taken from Marchand[4].
Low-rank compression¶
Low-rank matrix¶
Let \(\mathbf{B}\in \mathbb{C}^{M\times N}\) be a low-rank matrix, i.e., there exist \(\mathbf{u}_j\in \mathbb{C}^M\), \(\mathbf{v}_j\in \mathbb{C}^N\) with \(1\leq j \leq r \leq \min (M,N)\) such that
\[
\mathbf{B} = \sum_{j=1}^{r} \mathbf{u}_j \mathbf{v}_j^*.
\]
Note
Remark that, using the previous expression of \(\mathbf{B}\), the storage cost and the cost of a matrix-vector product are both \((M+N)r\), which is lower than for the usual dense format as soon as \(r< \frac{MN}{M+N}\).
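As an illustration, here is a minimal NumPy sketch (the sizes, names and random test data are ours, purely for illustration) comparing the dense product with the factored one, which only uses the vectors \(\mathbf{u}_j\) and \(\mathbf{v}_j\):

```python
import numpy as np

M, N, r = 2000, 1500, 10
rng = np.random.default_rng(0)

# Factored form: B = U @ V.conj().T, stored with (M + N) * r coefficients
# instead of M * N for the dense format.
U = rng.standard_normal((M, r)) + 1j * rng.standard_normal((M, r))
V = rng.standard_normal((N, r)) + 1j * rng.standard_normal((N, r))

x = rng.standard_normal(N)

# Dense matrix-vector product: O(M * N) operations.
B = U @ V.conj().T
y_dense = B @ x

# Factored matrix-vector product: O((M + N) * r) operations.
y_lowrank = U @ (V.conj().T @ x)

print(np.allclose(y_dense, y_lowrank))  # True
```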
Low-rank approximation¶
Usually, matrices of interest are not low-rank, but they may be well-approximated by low-rank matrices. To build such an approximation, one can use a truncated Singular Value Decomposition (SVD). Writing the SVD of \(\mathbf{B}\) as
\[
\mathbf{B} = \sum_{j=1}^{\min(M,N)} \sigma_j \mathbf{u}_j \mathbf{v}_j^*,
\]
where \((\sigma_j)_j\) are the singular values of \(\mathbf{B}\) in decreasing order, the rank-\(r\) approximation is obtained by keeping only the first \(r\) terms:
\[
\mathbf{B}_r = \sum_{j=1}^{r} \sigma_j \mathbf{u}_j \mathbf{v}_j^*.
\]
In conclusion, a matrix whose singular values decrease sufficiently fast can be well-approximated by a low-rank matrix.
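A minimal NumPy sketch of this truncation (the kernel and the two 1D clusters below are an arbitrary example chosen for illustration):

```python
import numpy as np

# Sample a smooth kernel on two well-separated 1D clusters; such blocks
# have rapidly decreasing singular values.
x = np.linspace(0.0, 1.0, 200)
y = np.linspace(3.0, 4.0, 150)
B = 1.0 / np.abs(x[:, None] - y[None, :])

U, s, Vh = np.linalg.svd(B, full_matrices=False)

r = 5  # truncation rank
B_r = U[:, :r] @ np.diag(s[:r]) @ Vh[:r, :]

# The spectral-norm error of the truncation is the first discarded singular value.
print(np.linalg.norm(B - B_r, 2), s[r])
```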
Note
A truncated SVD is actually the best possible low-rank approximation (see the Eckart–Young–Mirsky theorem).
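More precisely, for any matrix \(\mathbf{C}\in\mathbb{C}^{M\times N}\) of rank at most \(r\),
\[
\|\mathbf{B}-\mathbf{B}_r\|_2 = \sigma_{r+1} \leq \|\mathbf{B}-\mathbf{C}\|_2,
\]
and the analogous statement holds in the Frobenius norm.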
Warning
Computing an SVD requires the whole matrix, which would mean storing all of its coefficients, and this is exactly what we want to avoid. There exist several techniques to approximate a truncated SVD while computing only some coefficients of the initial matrix, for example Adaptive Cross Approximation (ACA) or randomized SVD.
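To fix ideas, here is a minimal sketch of ACA with partial pivoting (our own illustrative implementation and naming, not the one of any particular library); only the rows and columns selected as pivots are ever evaluated through `get_entry`:

```python
import numpy as np

def aca_partial_pivoting(get_entry, M, N, eps=1e-6, max_rank=50):
    """Approximate an M x N block B, accessed entry-wise through get_entry(i, j),
    by a sum of rank-one terms, evaluating only a few of its rows and columns."""
    us, vs = [], []
    row = 0                      # first row pivot (arbitrary choice)
    used_rows = {row}
    for _ in range(max_rank):
        # Row of the current residual: B[row, :] minus the rank-one terms built so far.
        v = np.array([get_entry(row, j) for j in range(N)], dtype=complex)
        for u_k, v_k in zip(us, vs):
            v -= u_k[row] * v_k
        col = int(np.argmax(np.abs(v)))          # column pivot
        if np.abs(v[col]) < 1e-14:
            break
        v = v / v[col]
        # Corresponding column of the current residual.
        u = np.array([get_entry(i, col) for i in range(M)], dtype=complex)
        for u_k, v_k in zip(us, vs):
            u -= v_k[col] * u_k
        us.append(u)
        vs.append(v)
        # Heuristic stopping criterion on the size of the new rank-one term.
        if np.linalg.norm(u) * np.linalg.norm(v) < eps * np.linalg.norm(us[0]) * np.linalg.norm(vs[0]):
            break
        # Next row pivot: largest entry of u among the rows not used yet.
        candidates = [i for i in range(M) if i not in used_rows]
        if not candidates:
            break
        row = max(candidates, key=lambda i: np.abs(u[i]))
        used_rows.add(row)
    U = np.column_stack(us) if us else np.zeros((M, 0))
    V = np.column_stack(vs) if vs else np.zeros((N, 0))
    return U, V                  # B is approximated by U @ V.T
```

Applied to the block `B` of the previous sketch, `aca_partial_pivoting(lambda i, j: B[i, j], *B.shape)` typically reaches an accuracy comparable to the truncated SVD with a similar rank.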
Hierarchical clustering¶
Generally, matrices of interest are not low-rank matrices, and do not have rapidly decreasing singular values. But they can have sub-blocks with rapidly decreasing singular values. In particular, this is the case for matrices stemming from the discretisation of an asymptotically smooth kernel \(\kappa (x,y)\), i.e., a kernel satisfying, for two clusters of geometric points \(X\) and \(Y\), for all \(x\in X\), \(y\in Y\) with \(x\neq y\) and all multi-indices \(\alpha\) and \(\beta\),
\[
\left|\partial_x^{\alpha}\partial_y^{\beta}\kappa(x,y)\right| \leq C_{\mathrm{as}}\,\frac{(|\alpha|+|\beta|)!}{\left(c_{\mathrm{as}}\,|x-y|\right)^{|\alpha|+|\beta|}}\,\left|\kappa(x,y)\right|,
\]
for some constants \(C_{\mathrm{as}}, c_{\mathrm{as}}>0\).
In this case, sub-blocks corresponding to the interaction between two clusters \(X\) and \(Y\) satisfying an admissibility condition such as
\[
\min\left(\operatorname{diam}(X),\operatorname{diam}(Y)\right) \leq \eta\, \operatorname{dist}(X,Y), \qquad \eta>0,
\]
have exponentially decreasing singular values. Thus, they can be well-approximated by low-rank matrices.
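Assuming the clusters are given as point clouds, a small NumPy sketch of this admissibility check (the helper names and the value of \(\eta\) are ours) could read:

```python
import numpy as np

def diameter(points):
    """Largest pairwise distance within a cluster (one point per row)."""
    diff = points[:, None, :] - points[None, :, :]
    return np.sqrt((diff ** 2).sum(-1)).max()

def distance(cluster_x, cluster_y):
    """Smallest distance between two clusters."""
    diff = cluster_x[:, None, :] - cluster_y[None, :, :]
    return np.sqrt((diff ** 2).sum(-1)).min()

def is_admissible(cluster_x, cluster_y, eta=1.0):
    """Admissibility condition: min(diam(X), diam(Y)) <= eta * dist(X, Y)."""
    return min(diameter(cluster_x), diameter(cluster_y)) <= eta * distance(cluster_x, cluster_y)
```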
Geometric clustering¶
To identify sub-blocks satisfying the admissibility condition, a hierarchical partition of the geometry is introduced via a cluster tree. Several strategies can be adopted to produce such partitions. A simple example is to partition the geometry by recursively dividing the set of points into two, for example,
where \(Cl_{i}^j\) is the \(i\)-th cluster at level \(j\), and each figure shows one level of the cluster tree.
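A minimal sketch of such a recursive bisection (our own illustrative naming; splitting along the coordinate of largest extent is one simple choice among several, e.g. a split along the principal component of the cloud):

```python
import numpy as np

def build_cluster_tree(indices, points, min_size=32):
    """Recursively split a set of point indices into two, returning a nested
    dict {"indices": ..., "children": [...]} representing the cluster tree."""
    node = {"indices": indices, "children": []}
    if len(indices) <= min_size:
        return node
    cloud = points[indices]
    # Split along the coordinate axis with the largest spread.
    axis = int(np.argmax(cloud.max(axis=0) - cloud.min(axis=0)))
    median = np.median(cloud[:, axis])
    left = indices[cloud[:, axis] <= median]
    right = indices[cloud[:, axis] > median]
    if len(left) == 0 or len(right) == 0:
        return node
    node["children"] = [build_cluster_tree(left, points, min_size),
                        build_cluster_tree(right, points, min_size)]
    return node

# Usage: points is an (n, d) array, and the root cluster covers all points.
# root = build_cluster_tree(np.arange(len(points)), points)
```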
Block cluster tree¶
The geometric clustering described previously defines a block cluster tree, as illustrated in the following figure.
Then, hierarchical matrices are built by traversing the block cluster tree starting from its root (a minimal sketch in code is given after this list):

- If the current block satisfies the admissibility condition, we approximate it using a low-rank matrix.
- If the current block does not satisfy the admissibility condition:
  - if it is a leaf, we compute the block as a dense matrix,
  - if it is not a leaf, we check the children of the current block.
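A minimal recursive sketch of this traversal, reusing the illustrative helpers introduced above (`is_admissible`, `aca_partial_pivoting` and the cluster-tree nodes; the names and the returned data structure are ours):

```python
import numpy as np

def build_hmatrix_block(tau, sigma, points, get_entry, eta=1.0, eps=1e-6):
    """Traverse the block cluster tree below the pair (tau, sigma) of cluster-tree
    nodes and return a nested structure of low-rank and dense blocks."""
    rows, cols = tau["indices"], sigma["indices"]
    if is_admissible(points[rows], points[cols], eta):
        # Admissible block: compress it with a low-rank approximation (here ACA).
        local_entry = lambda i, j: get_entry(rows[i], cols[j])
        U, V = aca_partial_pivoting(local_entry, len(rows), len(cols), eps)
        return {"type": "low-rank", "U": U, "V": V}
    if not tau["children"] or not sigma["children"]:
        # Non-admissible leaf: store the block densely.
        dense = np.array([[get_entry(i, j) for j in cols] for i in rows])
        return {"type": "dense", "data": dense}
    # Non-admissible, non-leaf block: recurse on the pairs of children.
    return {"type": "hierarchical",
            "children": [[build_hmatrix_block(t, s, points, get_entry, eta, eps)
                          for s in sigma["children"]]
                         for t in tau["children"]]}

# Usage, with a hypothetical kernel function kappa acting on pairs of points:
# root = build_cluster_tree(np.arange(len(points)), points)
# hmatrix = build_hmatrix_block(root, root, points,
#                               lambda i, j: kappa(points[i], points[j]))
```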