Back to overview Show all

Original article (peer-reviewed)

Journal Journal of Scientific Computing
Volume (Issue) 78(2)
Page(s) 1174 - 1206
Title of proceedings Journal of Scientific Computing
DOI 10.1007/s10915-018-0809-4

Open Access

Type of Open Access Repository (Green Open Access)


In this work, we consider the reformulation of hierarchical (H) matrix algorithms for many- core processors with a model implementation on graphics processing units (GPUs). H matrices approximate specific dense matrices, e.g., from discretized integral equations or kernel ridge regression, leading to log-linear time complexity in dense matrix–vector prod- ucts. The parallelization of H matrix operations on many-core processors is difficult due to the complex nature of the underlying algorithms. While previous algorithmic advances for many-core hardware focused on accelerating existing H matrix CPU implementations by many-core processors, we here aim at totally relying on that processor type. As main contribution, we introduce the necessary parallel algorithmic patterns allowing to map the full H matrix construction and the fast matrix–vector product to many-core hardware. Here, crucial ingredients are space filling curves, parallel tree traversal and batching of linear alge- bra operations. The resulting model GPU implementation hmglib is the, to the best of the authors knowledge, first entirely GPU-based Open Source H matrix library of this kind. We investigate application examples as present in kernel ridge regression, Gaussian Process Regression and kernel-based interpolation. In this context, an in-depth performance analysis and a comparative performance study against a standard multi-core CPU H matrix library highlights profound speedups of our many-core parallel approach.