GPU build
------------------------------------------------------------------------------

Hypre can support NVIDIA GPUs with CUDA and OpenMP (:math:`{\ge}` 4.5). The related ``configure`` options are

.. code-block:: none

   --with-cuda
   --with-device-openmp

The related environment variables

.. code-block:: none

   HYPRE_CUDA_SM    (default 70)
   CUDA_HOME        the CUDA home directory

need to be set properly. Alternatively, they can be set through the ``configure`` options

.. code-block:: none

   --with-gpu-arch=ARG      (e.g., --with-gpu-arch='60 70')
   --with-cuda-home=DIR

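For illustration only (the CUDA install path and the compute capability below are placeholders, not hypre defaults), the same build can be described either through the environment variables or through the ``configure`` options:

.. code-block:: none

   export HYPRE_CUDA_SM=70
   export CUDA_HOME=/usr/local/cuda
   ./configure --with-cuda

   ./configure --with-cuda --with-gpu-arch='70' --with-cuda-home=/usr/local/cuda
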
When configured with ``--with-cuda`` or ``--with-device-openmp``, the memory allocated on the GPUs is, by default, GPU device memory, which is not accessible from the CPUs.
Hypre's structured solvers can work fine with device memory,
whereas only selected unstructured solvers can run with device memory
(see `Running on GPUs <https://github.com/hypre-space/hypre/wiki>`_ for details).
In general, BoomerAMG and the SStruct solvers
require unified (CUDA managed) memory, for which
the following option should be added:

.. code-block:: none

   --enable-unified-memory

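As an illustrative sketch (the architecture value is only an example), a unified-memory build suitable for BoomerAMG and the SStruct solvers could be configured with:

.. code-block:: none

   ./configure --with-cuda --enable-unified-memory --with-gpu-arch='70'
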
To run on GPUs with RAJA and Kokkos, the options ``--with-cuda`` and ``--with-device-openmp`` are also needed (in addition to the corresponding RAJA or Kokkos options),
and the RAJA and Kokkos libraries should be built with CUDA or OpenMP 4.5, correspondingly.

The other NVIDIA GPU related options include the following; a combined ``configure`` example is shown after the list:

* ``--enable-gpu-profiling`` Use NVTX on CUDA, rocTX on HIP (default is NO)
* ``--enable-cusparse`` Use cuSPARSE for GPU sparse kernels (default is YES)
* ``--enable-cublas`` Use cuBLAS for GPU dense kernels (default is NO)
* ``--enable-curand`` Use random number generators on GPUs (default is YES)

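For instance (the architecture value below is a placeholder), profiling and cuBLAS support could be switched on alongside a CUDA build with:

.. code-block:: none

   ./configure --with-cuda --with-gpu-arch='80' --enable-gpu-profiling --enable-cublas
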
Allocations and deallocations of GPU memory are expensive. Memory pooling is a common approach to reduce this overhead and improve performance.
hypre provides caching allocators for GPU device memory and unified memory, as well as support for Umpire [Umpire]_.
To enable a GPU memory pool, include only one of the following options:

.. code-block:: none

   --enable-device-memory-pool    Enable the caching GPU memory allocator in hypre (default is NO)

   --with-umpire --with-umpire-include=/path-of-umpire-install/include
   --with-umpire-lib-dirs=/path-of-umpire-install/lib
   --with-umpire-libs=umpire      Use Umpire Allocator for device and unified memory (default is NO)

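For instance, the caching allocator requires no further arguments, while an Umpire build points ``configure`` at an existing Umpire installation (the ``/path-of-umpire-install`` prefix is a placeholder):

.. code-block:: none

   ./configure --with-cuda --enable-device-memory-pool

   ./configure --with-cuda --with-umpire \
               --with-umpire-include=/path-of-umpire-install/include \
               --with-umpire-lib-dirs=/path-of-umpire-install/lib \
               --with-umpire-libs=umpire
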
For running on AMD GPUs, configure with

.. code-block:: none

   --with-hip              Use HIP for AMD GPUs (default is NO)
   --with-gpu-arch=ARG     Use the appropriate AMD GPU architecture

Currently, only BoomerAMG is supported with HIP. The other AMD GPU related options are listed below, followed by a sample ``configure`` line:

* ``--enable-gpu-profiling`` Use NVTX on CUDA, rocTX on HIP (default is NO)
* ``--enable-rocsparse`` Use rocSPARSE (default is YES)
* ``--enable-rocblas`` Use rocBLAS (default is NO)
* ``--enable-rocrand`` Use rocRAND (default is YES)

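As a sketch, with the architecture string being a placeholder that should match the target device (e.g., ``gfx908`` for MI100), an AMD GPU build could be configured with:

.. code-block:: none

   ./configure --with-hip --with-gpu-arch='gfx908'
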
Testing the Library
==============================================================================

.. [CMakeWeb] CMake, a cross-platform open-source build system.
   `http://www.cmake.org/ <http://www.cmake.org/>`_.

.. [Umpire] Umpire: Managing Heterogeneous Memory Resources.
   `https://github.com/LLNL/Umpire <https://github.com/LLNL/Umpire>`_.

.. [DFNY2008] H. De Sterck, R. Falgout, J. Nolting, and U. M. Yang.
   Distance-two interpolation for parallel algebraic multigrid. *Numer. Linear
   Algebra Appl.*, 15:115--139, 2008. Also available as LLNL technical report

.. [Yang2010] U. M. Yang. On long range interpolation operators for aggressive
   coarsening. *Numer. Linear Algebra Appl.*, 17:453--472, 2010. Also
   available as LLNL technical report LLNL-JRNL-417371.