GPU build
parent 3bc7d267ef
commit 0bb67902bb
@@ -163,7 +163,7 @@ include:
 GPU build
 ------------------------------------------------------------------------------
 
-Hypre can support GPUs with CUDA and OpenMP (:math:`{\ge}` 4.5). The related ``configure`` options are
+Hypre can support NVIDIA GPUs with CUDA and OpenMP (:math:`{\ge}` 4.5). The related ``configure`` options are
 
 .. code-block:: none
 
@@ -177,15 +177,23 @@ The related environment variables
 
 .. code-block:: none
 
-   HYPRE_CUDA_SM (default 60)
+   HYPRE_CUDA_SM (default 70)
 
    CUDA_HOME the CUDA home directory
 
-need to be set properly.
+need to be set properly; they can also be set with the ``configure`` options
+
+.. code-block:: none
+
+   --with-gpu-arch=ARG (e.g., --with-gpu-arch='60 70')
+
+   --with-cuda-home=DIR
 
 When configured with ``--with-cuda`` or ``--with-device-openmp``, the memory allocated on the GPUs, by default, is the GPU device memory, which is not accessible from the CPUs.
-Hypre's Struct solvers can work fine with only device memory,
-whereas BoomerAMG and the SStruct solvers require unified (CUDA managed) memory, for which
+Hypre's structured solvers can work fine with device memory,
+whereas only selected unstructured solvers can run with device memory (see `Running on GPUs <https://github.com/hypre-space/hypre/wiki>`_ for details).
+In general, BoomerAMG and the SStruct solvers
+require unified (CUDA managed) memory, for which
 the following option should be added
 
 .. code-block:: none
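As a rough illustration of the options touched by the hunk above (this sketch is not part of the commit), a CUDA build might be configured as follows. The install path ``/usr/local/cuda`` and the architecture value ``70`` are assumptions chosen for the example; only the option names come from the documentation being changed.

.. code-block:: none

   # hypothetical CUDA install location and GPU architecture
   ./configure --with-cuda \
               --with-cuda-home=/usr/local/cuda \
               --with-gpu-arch=70

According to the changed text, the same two values could instead be supplied through the ``CUDA_HOME`` and ``HYPRE_CUDA_SM`` environment variables.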
@@ -207,13 +215,36 @@ The ``configure`` options are
 To run on the GPUs with RAJA and Kokkos, the options ``--with-cuda`` and ``--with-device-openmp`` are also needed,
 and the RAJA and Kokkos libraries should be built with CUDA or OpenMP 4.5 correspondingly.
 
-The other GPU related options include:
+The other NVIDIA GPU related options include:
 
-* ``--enable-nvtx``: enable NVTX annotations for CUDA profilers
-* ``--enable-device-memory-pool`` : enable the caching GPU memory allocator in hypre
-* ``--enable-cusparse`` : choose cuSPARSE for GPU sparse kernels
-* ``--enable-cublas`` : choose cuBLAS for GPU dense kernels
-* ``--enable-curand`` : generating random numbers on GPUs
+* ``--enable-gpu-profiling`` Use NVTX on CUDA, rocTX on HIP (default is NO)
+* ``--enable-cusparse`` Use cuSPARSE for GPU sparse kernels (default is YES)
+* ``--enable-cublas`` Use cuBLAS for GPU dense kernels (default is NO)
+* ``--enable-curand`` Use random number generators on GPUs (default is YES)
+
+Allocations and deallocations of GPU memory are expensive. Memory pooling is a common approach to reduce such overhead and improve performance.
+hypre provides caching allocators for GPU device memory and unified memory, and also supports Umpire [Umpire]_.
+To enable a GPU memory pool, include only one of the following options:
+
+.. code-block:: none
+   --enable-device-memory-pool Enable the caching GPU memory allocator in hypre (default is NO)
+
+   --with-umpire --with-umpire-include=/path-of-umpire-install/include
+   --with-umpire-lib-dirs=/path-of-umpire-install/lib
+   --with-umpire-libs=umpire Use Umpire Allocator for device and unified memory (default is NO)
+
+For running on AMD GPUs, configure with
+
+.. code-block:: none
+
+   --with-hip Use HIP for AMD GPUs. (default is NO)
+   --with-gpu-arch=ARG Use appropriate AMD GPU architecture
+
+Currently, only BoomerAMG is supported with HIP. The other AMD GPU related options include:
+
+* ``--enable-gpu-profiling`` Use NVTX on CUDA, rocTX on HIP (default is NO)
+* ``--enable-rocsparse`` Use rocSPARSE (default is YES)
+* ``--enable-rocblas`` Use rocBLAS (default is NO)
+* ``--enable-rocrand`` Use rocRAND (default is YES)
 
 Testing the Library
 ==============================================================================
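A comparable sketch for the AMD path documented in the hunk above (again, not part of the commit). The architecture value ``gfx90a`` is an assumption picked for illustration, and the defaults mentioned in the comments are the ones quoted in the diff.

.. code-block:: none

   # hypothetical AMD GPU architecture; profiling is optional (default is NO)
   ./configure --with-hip \
               --with-gpu-arch=gfx90a \
               --enable-gpu-profiling

Since rocSPARSE and rocRAND are listed as enabled by default, they would not need to be requested explicitly in such a build.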
@@ -51,6 +51,9 @@
 .. [CMakeWeb] CMake, a cross-platform open-source build system.
    `http://www.cmake.org/ <http://www.cmake.org/>`_.
 
+.. [Umpire] Umpire: Managing Heterogeneous Memory Resources.
+   `https://github.com/LLNL/Umpire <https://github.com/LLNL/Umpire>`_.
+
 .. [DFNY2008] H. De Sterck, R. Falgout, J. Nolting, and U. M. Yang.
    Distance-two interpolation for parallel algebraic multigrid. *Numer. Linear
    Algebra Appl.*, 15:115--139, 2008. Also available as LLNL technical report
@@ -179,3 +182,4 @@
 .. [Yang2010] U. M. Yang. On long range interpolation operators for aggressive
    coarsening. *Numer. Linear Algebra Appl.*, 17:453--472, 2010. Also
    available as LLNL technical report LLNL-JRNL-417371.
+