GPU build

Ruipeng Li 2021-05-25 15:11:29 -07:00
parent 3bc7d267ef
commit 0bb67902bb
2 changed files with 46 additions and 11 deletions


@@ -163,7 +163,7 @@ include:
GPU build
------------------------------------------------------------------------------

Hypre can support NVIDIA GPUs with CUDA and OpenMP (:math:`{\ge}` 4.5). The related ``configure`` options are

.. code-block:: none
@@ -177,15 +177,23 @@ The related environment variables
.. code-block:: none

   HYPRE_CUDA_SM   (default 70)
   CUDA_HOME       the CUDA home directory

need to be set properly; alternatively, they can be set with the following ``configure`` options:

.. code-block:: none

   --with-gpu-arch=ARG    (e.g., --with-gpu-arch='60 70')
   --with-cuda-home=DIR

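
As a sketch, the two ways of pointing the build at the CUDA toolkit could look as follows, run from the directory that contains hypre's ``configure`` script. The compute capability ``70`` and the path ``/usr/local/cuda`` are assumed example values, not requirements; substitute those of your system.

.. code-block:: bash

   # Option A: environment variables read by configure
   # (70 and /usr/local/cuda are assumed example values)
   export HYPRE_CUDA_SM=70
   export CUDA_HOME=/usr/local/cuda
   ./configure --with-cuda

   # Option B: the same settings passed as configure options
   ./configure --with-cuda --with-gpu-arch=70 --with-cuda-home=/usr/local/cuda
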
When hypre is configured with ``--with-cuda`` or ``--with-device-openmp``, memory allocated on the GPUs is, by default, GPU device memory, which is not accessible from the CPUs.
Hypre's structured solvers can work with device memory,
whereas only selected unstructured solvers can run with device memory
(see `Running on GPUs <https://github.com/hypre-space/hypre/wiki>`_ for details).
In general, BoomerAMG and the SStruct solvers
require unified (CUDA managed) memory, for which
the following option should be added:

.. code-block:: none
@@ -207,13 +215,36 @@ The ``configure`` options are
To run on the GPUs with RAJA and Kokkos, either ``--with-cuda`` or ``--with-device-openmp`` is also needed,
and the RAJA and Kokkos libraries should be built with CUDA or OpenMP 4.5 accordingly.

The other NVIDIA GPU-related options include:

* ``--enable-gpu-profiling`` Use NVTX on CUDA, rocTX on HIP (default is NO)
* ``--enable-cusparse`` Use cuSPARSE for GPU sparse kernels (default is YES)
* ``--enable-cublas`` Use cuBLAS for GPU dense kernels (default is NO)
* ``--enable-curand`` Use cuRAND random number generators on GPUs (default is YES)

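
For example, a CUDA build that turns on profiling annotations and cuBLAS, while leaving cuSPARSE and cuRAND at their default (YES), might be configured as sketched below. The target architecture ``70`` is an assumed value for illustration.

.. code-block:: bash

   # CUDA build with NVTX annotations and cuBLAS dense kernels turned on;
   # cuSPARSE and cuRAND are already enabled by default.
   # (--with-gpu-arch=70 is an assumed example target)
   ./configure --with-cuda --with-gpu-arch=70 \
               --enable-gpu-profiling \
               --enable-cublas
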
Allocations and deallocations of GPU memory are expensive. Memory pooling is a common approach to reduce such overhead and improve performance.
hypre provides caching allocators for GPU device memory and unified memory, and also supports Umpire [Umpire]_.
To enable a GPU memory pool, include only one of the following options:

.. code-block:: none

   --enable-device-memory-pool   Enable the caching GPU memory allocator in hypre (default is NO)
   --with-umpire --with-umpire-include=/path-of-umpire-install/include
                 --with-umpire-lib-dirs=/path-of-umpire-install/lib
                 --with-umpire-libs=umpire   Use Umpire Allocator for device and unified memory (default is NO)

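
A sketch of the two alternatives, assuming a CUDA build and an Umpire installation under the hypothetical prefix ``/opt/umpire``; pass one set of options or the other, not both.

.. code-block:: bash

   # Alternative 1: hypre's built-in caching allocator
   ./configure --with-cuda --enable-device-memory-pool

   # Alternative 2: Umpire (/opt/umpire is a hypothetical install prefix)
   ./configure --with-cuda --with-umpire \
               --with-umpire-include=/opt/umpire/include \
               --with-umpire-lib-dirs=/opt/umpire/lib \
               --with-umpire-libs=umpire
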
To run on AMD GPUs, configure with

.. code-block:: none

   --with-hip            Use HIP for AMD GPUs (default is NO)
   --with-gpu-arch=ARG   Use appropriate AMD GPU architecture

Currently, only BoomerAMG is supported with HIP. The other AMD GPU-related options include:

* ``--enable-gpu-profiling`` Use NVTX on CUDA, rocTX on HIP (default is NO)
* ``--enable-rocsparse`` Use rocSPARSE (default is YES)
* ``--enable-rocblas`` Use rocBLAS (default is NO)
* ``--enable-rocrand`` Use rocRAND (default is YES)

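
Putting the AMD options together, a HIP build might be configured as sketched below. The architecture ``gfx908`` (AMD Instinct MI100) is an assumed target chosen for illustration; replace it with the architecture of your GPU. rocSPARSE and rocRAND are enabled by default, so only profiling and rocBLAS are switched on explicitly.

.. code-block:: bash

   # HIP build with rocTX annotations and rocBLAS turned on
   # (gfx908 is an assumed example target)
   ./configure --with-hip --with-gpu-arch=gfx908 \
               --enable-gpu-profiling \
               --enable-rocblas
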
Testing the Library
==============================================================================


@@ -51,6 +51,9 @@
.. [CMakeWeb] CMake, a cross-platform open-source build system.
   `http://www.cmake.org/ <http://www.cmake.org/>`_.

.. [Umpire] Umpire: Managing Heterogeneous Memory Resources.
   `https://github.com/LLNL/Umpire <https://github.com/LLNL/Umpire>`_.

.. [DFNY2008] H. De Sterck, R. Falgout, J. Nolting, and U. M. Yang.
   Distance-two interpolation for parallel algebraic multigrid. *Numer. Linear
   Algebra Appl.*, 15:115--139, 2008. Also available as LLNL technical report
@@ -179,3 +182,4 @@
.. [Yang2010] U. M. Yang. On long range interpolation operators for aggressive
   coarsening. *Numer. Linear Algebra Appl.*, 17:453--472, 2010. Also
   available as LLNL technical report LLNL-JRNL-417371.