Doc updates (#974)
* Updated documentation for clarity and to clean up a few typos.
* Add warning messages to FEI, ParaSails, PILUT, Euclid.
* Improved and updated GPU information
* Added CMake build information
parent 57862ef6e2
commit 27b8471742

.gitignore (vendored, 2 changes)
							| @ -19,11 +19,13 @@ cmbuild/ | ||||
| ############### | ||||
| # Documentation | ||||
| ############### | ||||
| src/docs/ref-manual/html/ | ||||
| src/docs/ref-manual-html/ | ||||
| src/docs/ref-manual.pdf | ||||
| src/docs/ref-manual/latex/ | ||||
| src/docs/ref-manual/xml/ | ||||
| src/docs/usr-manual-html/ | ||||
| src/docs/usr-manual/html/ | ||||
| src/docs/usr-manual.pdf | ||||
| src/docs/usr-manual/_build/ | ||||
| 
 | ||||
|  | ||||
| @ -10,6 +10,11 @@ | ||||
| Finite Element Interface | ||||
| ****************************************************************************** | ||||
| 
 | ||||
| .. warning:: | ||||
|    FEI is not actively supported by the hypre development team. For similar | ||||
|    functionality, we recommend using :ref:`sec-Block-Structured-Grids-FEM`, which | ||||
|    allows the representation of block-structured grid problems via hypre's | ||||
|    SStruct interface. | ||||
| 
 | ||||
| Introduction | ||||
| ============================================================================== | ||||
| @ -173,4 +178,3 @@ right hand side, a signal is sent to the FEI via | ||||
| .. code-block:: c++ | ||||
| 
 | ||||
|    feiPtr -> loadComplete(); | ||||
| 
 | ||||
|  | ||||
| @ -127,7 +127,7 @@ As previously noted, on most systems hypre can be built by simply typing | ||||
| Alternatively, the CMake system [CMakeWeb]_ can be used, and is the best | ||||
| approach for building hypre on Windows systems in particular.  For more detailed | ||||
| instructions, read the ``INSTALL`` file provided with the hypre distribution or | ||||
| refer to the last chapter in this manual.  Note the following requirements: | ||||
| the :ref:`ch-General` section of this manual.  Note the following requirements: | ||||
| 
 | ||||
| * To run in parallel, hypre requires an installation of MPI. | ||||
| 
 | ||||
|  | ||||
| @ -14,24 +14,21 @@ General Information | ||||
| Getting the Source Code | ||||
| ============================================================================== | ||||
| 
 | ||||
| The hypre distribution tar file is available from the Software link of the hypre | ||||
| web page, http://www.llnl.gov/CASC/hypre/.  The hypre Software distribution page | ||||
| allows access to the tar files of the latest and previous general and beta | ||||
| distributions as well as documentation. | ||||
| 
 | ||||
| The most recent hypre distribution is available at | ||||
| https://github.com/hypre-space/hypre/tags along with previous distribution versions. | ||||
| 
 | ||||
| Building the Library | ||||
| ============================================================================== | ||||
| 
 | ||||
| In this and the following several sections, we discuss the steps to install and | ||||
| use hypre on a Unix-like operating system, such as Linux, AIX, and Mac OS X. | ||||
| Alternatively, the CMake build system [CMakeWeb]_ can be used, and is the best | ||||
| approach for building hypre on Windows systems in particular (see the | ||||
| ``INSTALL`` file for details). | ||||
| use hypre.  First, we focus on the primary method targeting Unix-like operating | ||||
| systems, such as Linux, AIX, and Mac OS X.  Then in `CMake instructions`_, we | ||||
| explain an alternative approach using the CMake build system [CMakeWeb]_, which | ||||
| is the best approach for building hypre on Windows systems in particular. | ||||
| 
 | ||||
| After unpacking the hypre tar file, the source code will be in the ``src`` | ||||
| sub-directory of a directory named hypre-VERSION, where VERSION is the current | ||||
| version number (e.g., hypre-1.8.4, with a "b" appended for a beta release). | ||||
| version number (e.g., hypre-2.29.0). | ||||
| 
 | ||||
| Move to the ``src`` sub-directory to build hypre for the host platform.  The | ||||
| simplest method is to configure, compile and install the libraries in | ||||
| @ -87,7 +84,7 @@ is to display the help package, by executing ``./configure --help``, which also | ||||
| includes the usage information.  The user can mix and match the configure | ||||
| options and variable settings to meet their needs. | ||||
| 
 | ||||
| Some of the commonly used options include: | ||||
| Some commonly used options include: | ||||
| 
 | ||||
| .. code-block:: none | ||||
| 
 | ||||
| @ -102,10 +99,12 @@ Some of the commonly used options include: | ||||
|    --with-openmp                  Use OpenMP. This may affect which compiler is | ||||
|                                   chosen. | ||||
|    --enable-bigint                Use long long int for HYPRE_Int (default is NO). | ||||
|                                   NOTE: This option is not available for Nvidia | ||||
|                                   and AMD GPUs. | ||||
|    --enable-mixedint              Use long long int for HYPRE_BigInt and int for | ||||
|                                   HYPRE_Int. | ||||
|                                   NOTE: This option disables Euclid, ParaSails, | ||||
|                                         pilut and CGC coarsening. | ||||
|                                         PILUT and CGC coarsening. | ||||
| 
 | ||||
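As a sketch of how the options listed above can be combined, the script below assembles a configure line that enables OpenMP and mixed-integer support. Pairing these two flags and the ``RUN`` environment variable are assumptions made for this illustration. By default the script only prints the command; it runs configure only when ``RUN=1`` is set and it is started from hypre's ``src`` directory.

```shell
#!/bin/sh
# Illustrative only: build a configure line from flags in the list above.
set -eu

# Both flags come from the option list; pairing them is this example's choice.
# Per the list, --enable-mixedint disables Euclid, ParaSails, PILUT and
# CGC coarsening.
set -- --with-openmp --enable-mixedint

cmd="./configure $*"
echo "$cmd"

# RUN is a hypothetical switch for this sketch; the default is a dry run.
if [ "${RUN:-0}" = "1" ] && [ -x ./configure ]; then
    ./configure "$@"
fi
```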
| The user can mix and match the configure options and variable settings to meet | ||||
| their needs.  It should be noted that hypre can be configured with external BLAS | ||||
| @ -167,7 +166,7 @@ Hypre can support NVIDIA GPUs with CUDA and OpenMP (:math:`{\ge}` 4.5). The rela | ||||
| 
 | ||||
| .. code-block:: none | ||||
| 
 | ||||
|   --with-cuda             Use CUDA. Require cuda-8.0 or higher (default is | ||||
|   --with-cuda             Use CUDA. Require cuda-9.0 or higher (default is | ||||
|                           NO). | ||||
| 
 | ||||
|   --with-device-openmp    Use OpenMP 4.5 Device Directives. This may affect | ||||
| @ -190,12 +189,11 @@ need to be set properly, which can be also set by | ||||
|    --with-cuda-home=DIR | ||||
| 
 | ||||
| When configured with ``--with-cuda`` or ``--with-device-openmp``, the memory allocated on the GPUs, by default, is the GPU device memory, which is not accessible from the CPUs. | ||||
| Hypre's structured solvers can work fine with device memory, | ||||
| Hypre's structured solvers can run with device memory, | ||||
| whereas only selected unstructured solvers can run with device memory. See | ||||
| Chapter :ref:`ch-boomeramg-gpu` for details. | ||||
| In general, BoomerAMG and the SStruct | ||||
| require  unified (CUDA managed) memory, for which | ||||
| the following option should be added | ||||
| :ref:`ch-boomeramg-gpu` for details. | ||||
| Some solver options for BoomerAMG require unified (CUDA managed) memory. | ||||
| To use these options add the following configure option: | ||||
| 
 | ||||
| .. code-block:: none | ||||
| 
 | ||||
| @ -220,7 +218,7 @@ The other NVIDIA GPU related options include: | ||||
| 
 | ||||
| * ``--enable-gpu-profiling``  Use NVTX on CUDA, rocTX on HIP (default is NO) | ||||
| * ``--enable-cusparse``       Use cuSPARSE for GPU sparse kernels (default is YES) | ||||
| * ``--enable-cublas``         Use cuBLAS for GPU dense kernels (default is NO) | ||||
| * ``--enable-cublas``         Use cuBLAS for GPU dense kernels (default is YES) | ||||
| * ``--enable-curand``         Use random number generators on GPUs (default is YES) | ||||
| 
 | ||||
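A CUDA build with unified memory might be configured along the lines of the sketch below. It uses only flags named in this document (``--with-cuda``, ``--with-cuda-home=DIR`` and ``--enable-unified-memory``). The path ``/usr/local/cuda`` and the ``RUN`` switch are assumptions of the example, not hypre defaults.

```shell
#!/bin/sh
# Illustrative only: configure line for a CUDA build with unified memory.
set -eu

# Placeholder location; /usr/local/cuda is an assumption, adjust as needed.
cuda_home="${CUDA_HOME:-/usr/local/cuda}"

set -- --with-cuda "--with-cuda-home=$cuda_home" --enable-unified-memory
nargs=$#

# Print the command that would be executed.
printf './configure'
printf ' %s' "$@"
printf '\n'

# RUN is a hypothetical switch for this sketch; the default is a dry run.
if [ "${RUN:-0}" = "1" ] && [ -x ./configure ]; then
    ./configure "$@"
fi
```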
| Allocations and deallocations of GPU memory are expensive. Memory pooling is a common approach to reduce such overhead and improve performance. | ||||
| @ -250,15 +248,44 @@ For running on AMD GPUs, configure with | ||||
|   --with-hip              Use HIP for AMD GPUs. (default is NO) | ||||
|   --with-gpu-arch=ARG     Use appropriate AMD GPU architecture | ||||
| 
 | ||||
| Currently, only BoomerAMG is supported with HIP. The other AMD GPU related options include: | ||||
| The other AMD GPU related options include: | ||||
| 
 | ||||
| * ``--enable-gpu-profiling``  Use NVTX on CUDA, rocTX on HIP (default is NO) | ||||
| * ``--enable-rocsparse``      Use rocSPARSE (default is YES) | ||||
| * ``--enable-rocblas``        Use rocBLAS (default is NO) | ||||
| * ``--enable-rocrand``        Use rocRAND (default is YES) | ||||
| 
 | ||||
| All the options supported by CUDA are also supported with HIP. **Note that the** ``--enable-bigint`` **option is not supported with CUDA or HIP.** | ||||
| 
 | ||||
| For running on Intel GPUs, configure with | ||||
| 
 | ||||
| .. code-block:: none | ||||
| 
 | ||||
|   --with-sycl             Use SYCL for Intel GPUs. (default is NO). | ||||
|   --with-sycl-target=ARG  User specifies sycl targets for AOT compilation in | ||||
|                           ARG, where ARG is a comma-separated list (enclosed | ||||
|                           in quotes), e.g. "spir64_gen". | ||||
|   --with-sycl-target-backend=ARG | ||||
|                           User specifies additional options for the sycl | ||||
|                           target backend for AOT compilation in ARG, where ARG | ||||
|                           contains the desired options (enclosed in | ||||
|                           double+single quotes), e.g. | ||||
|                           --with-sycl-target-backend="'-device | ||||
|                           12.1.0,12.4.0'". | ||||
| 
 | ||||
| Intel oneMKL functionality is also used by default (and required for certain hypre solvers): | ||||
| 
 | ||||
| .. code-block:: none | ||||
| 
 | ||||
|   --enable-onemklsparse   Use oneMKL sparse (default is YES). | ||||
|   --enable-onemklblas     Use oneMKL blas (default is YES). | ||||
|   --enable-onemklrand     Use oneMKL rand (default is YES). | ||||
| 
 | ||||
| The SYCL backend now supports all GPU-enabled hypre functionality currently supported by CUDA/HIP except for FSAI (work in progress). | ||||
| The ``--enable-bigint`` option is supported with SYCL (not supported for CUDA/HIP). | ||||
| 
 | ||||
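The quoting of the SYCL backend option is easy to get wrong, so a small sketch may help. It passes the example values from the option help above as three separate arguments, keeping the inner single quotes as part of the backend argument. The ``RUN`` switch is an assumption of this example; by default the script only lists the arguments.

```shell
#!/bin/sh
# Illustrative only: SYCL configure arguments with the quoting kept intact.
set -eu

# Values are the examples quoted in the option help above.
set -- --with-sycl \
       --with-sycl-target=spir64_gen \
       "--with-sycl-target-backend='-device 12.1.0,12.4.0'"
nargs=$#
backend=$3

# Show each argument on its own line so embedded spaces stay visible.
for arg in "$@"; do
    printf '%s\n' "$arg"
done

# RUN is a hypothetical switch for this sketch; the default is a dry run.
if [ "${RUN:-0}" = "1" ] && [ -x ./configure ]; then
    ./configure "$@"
fi
```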
| Testing the Library | ||||
| ============================================================================== | ||||
| ------------------------------------------------------------------------------ | ||||
| 
 | ||||
| The ``examples`` subdirectory contains several codes that can be used to test | ||||
| the newly created hypre library.  To create the executable versions, move into | ||||
| @ -266,6 +293,181 @@ the ``examples`` subdirectory, enter ``make`` then execute the codes as | ||||
| described in the initial comments section of each source code. | ||||
| 
 | ||||
| 
 | ||||
| .. _CMake instructions: | ||||
| 
 | ||||
| CMake-based Build Instructions | ||||
| ============================================================================== | ||||
| 
 | ||||
| This section describes hypre's CMake build system, which is particularly useful for building | ||||
| the code on Windows machines. CMake-based installation provides a platform-independent | ||||
| build system. CMake can generate Unix and Linux Makefiles, as well as Visual Studio and | ||||
| (Apple) XCode project files from the same configuration file.  In addition, | ||||
| CMake also provides a GUI front end, which allows an interactive build and | ||||
| installation process. For more detailed information on using CMake, | ||||
| see `CMake's User Interaction Guide <https://cmake.org/cmake/help/latest/guide/user-interaction/index.html>`_. | ||||
| 
 | ||||
| **Note**: Not all options are currently supported when using CMake. Support for | ||||
| the remaining hypre configure options is an ongoing effort. | ||||
| 
 | ||||
| Here are the basic steps to configure, make, and install hypre using CMake: | ||||
| 
 | ||||
| #. Ensure that CMake version 3.13.0 or later is installed on the system. | ||||
| #. After unpacking the hypre tar file or cloning the repository, move to the ``src`` sub-directory. | ||||
| #. To build the library, run CMake on the top-level hypre source directory to | ||||
|    generate files appropriate for the native build system.  To prevent writing | ||||
|    over the Makefiles in hypre's configure/make system above, only out-of-source | ||||
|    builds are allowed with CMake, that is, it is required to use a separate build | ||||
|    directory. | ||||
| 
 | ||||
|    The directory ``src/cmbuild`` | ||||
|    is provided in the release for convenience, but | ||||
|    alternative build directories may be created by the user. To configure with | ||||
|    the default options: | ||||
| 
 | ||||
|    - Unix: From the ``src/cmbuild`` directory, type ``cmake ..``. | ||||
| 
 | ||||
|    - Windows Visual Studio: Set the source and build directories to ``src`` and ``src/cmbuild``, | ||||
|      then click on `Configure` followed by `Generate`. | ||||
| 
 | ||||
| 
 | ||||
| #. To build the library, compile with the native build system: | ||||
| 
 | ||||
|    - Unix: From the ``src/cmbuild`` directory, type ``make`` or ``make -j 4`` | ||||
|      (for a faster parallel build with 4 threads). | ||||
| 
 | ||||
|    - Windows Visual Studio: Open the `hypre` VS solution file generated by CMake | ||||
|      and build the `ALL_BUILD` target. | ||||
| 
 | ||||
| #. To install hypre to the installation directory specified in the configuration: | ||||
| 
 | ||||
|    - Unix: From the ``src/cmbuild`` directory, type ``make install``. | ||||
| 
 | ||||
|    - Windows Visual Studio: Open the `hypre` VS solution file generated by CMake | ||||
|      and build the `INSTALL` target. | ||||
| 
 | ||||
|    - *Note*: The default installation location is set to ``src/hypre``. | ||||
|      Use the ``HYPRE_INSTALL_PREFIX`` option to change this location if desired. | ||||
| 
 | ||||
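The Unix steps above can be sketched as a single script. It assumes it is started from the top of an unpacked hypre tree, so that ``src/cmbuild`` exists, and that ``cmake`` and ``make`` are on the ``PATH``. The variable names are hypothetical, and the script does nothing beyond printing a message when those assumptions do not hold.

```shell
#!/bin/sh
# Illustrative only: out-of-source CMake build on Unix with default options.
set -eu

build_dir="src/cmbuild"   # build directory shipped with the release
jobs=4                    # parallel build jobs, as in "make -j 4"
status="skipped"

if [ -d "$build_dir" ] && command -v cmake >/dev/null 2>&1; then
    cd "$build_dir"
    cmake ..              # configure with the default options
    make -j "$jobs"       # build the library
    make install          # installs to src/hypre unless HYPRE_INSTALL_PREFIX is set
    status="installed"
else
    echo "skipped: need $build_dir and cmake on PATH"
fi
```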
| Changing Default CMake Configuration Options | ||||
| ------------------------------------------------------------------------------ | ||||
| 
 | ||||
| Various configuration options can be set from within CMake (see `CMake options`_). | ||||
| One option is to specify these options in the command-line CMake invocation, | ||||
| e.g., to enable building of the examples: | ||||
| 
 | ||||
| .. code-block:: none | ||||
| 
 | ||||
|   cmake -DHYPRE_BUILD_EXAMPLES=ON .. | ||||
| 
 | ||||
| Another option is to use the CMake GUI (``ccmake`` or ``cmake-gui``) to change the default options | ||||
| as appropriate, then reconfigure / generate: | ||||
| 
 | ||||
| - Unix: From the ``src/cmbuild`` directory, type ``ccmake ..``. | ||||
| 
 | ||||
|   * Change options to desired settings: | ||||
| 
 | ||||
|     * To set a variable, move the cursor to the variable and press enter. | ||||
|     * If it is a boolean (ON/OFF) it will toggle the value. | ||||
|     * If it is a string or file, it will allow editing of the string. | ||||
| 
 | ||||
|   * Then configure (``c`` key). | ||||
|   * Repeat until all values are set as desired and then generate (``g`` key). | ||||
| 
 | ||||
| - Windows Visual Studio: Change options, then click on `Configure` then `Generate`. | ||||
| 
 | ||||
| Then rebuild and reinstall with the updated configuration options. | ||||
| 
 | ||||
| .. _CMake options: | ||||
| 
 | ||||
| CMake Configure Options | ||||
| ------------------------------------------------------------------------------ | ||||
| 
 | ||||
| There are many options to allow the user to override and refine | ||||
| the defaults for any system.  The best way to find out what options are available | ||||
| is to use ``cmake``, ``cmake-gui``, or inspect using Windows Visual Studio. | ||||
| 
 | ||||
| 
 | ||||
| Some commonly used options (default value) include: | ||||
| 
 | ||||
| .. code-block:: none | ||||
| 
 | ||||
|  HYPRE_INSTALL_PREFIX (src/hypre) Installation location. | ||||
|  HYPRE_BUILD_EXAMPLES (OFF)       Compile test cases for examples of using the library. | ||||
|  HYPRE_BUILD_TYPE (Release)       Sets compiler flags to generate information | ||||
|                                   needed for debugging. | ||||
|  HYPRE_ENABLE_SHARED (OFF)        Build shared libraries. | ||||
|  HYPRE_PRINT_ERRORS (OFF)         Print HYPRE errors. | ||||
|  HYPRE_WITH_OPENMP (OFF)          Use OpenMP. | ||||
| 
 | ||||
|  HYPRE_ENABLE_BIGINT (OFF)        Use long long int for HYPRE_Int. | ||||
|  HYPRE_ENABLE_MIXEDINT (OFF)      Use long long int for HYPRE_BigInt and int for | ||||
|                                   HYPRE_Int. | ||||
| 
 | ||||
| GPU CMake Build Options | ||||
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | ||||
| 
 | ||||
| Some of the commonly used options for GPU CMake builds of hypre are listed below. | ||||
| 
 | ||||
| * Options relevant to CUDA support for NVIDIA GPUs: | ||||
| 
 | ||||
| .. code-block:: none | ||||
| 
 | ||||
|  HYPRE_WITH_CUDA (OFF)            Use CUDA v9.0 or higher. | ||||
|  HYPRE_CUDA_SM (70)               Target CUDA architecture. | ||||
| 
 | ||||
| When configured with CUDA, the memory allocated on the GPUs, by default, is the GPU device memory, which is not accessible from the CPUs. | ||||
| Hypre's structured solvers can run with device memory, | ||||
| whereas only selected unstructured solvers can run with device memory. See | ||||
| :ref:`ch-boomeramg-gpu` for details. | ||||
| Some solver options for BoomerAMG require unified (CUDA managed) memory. | ||||
| To use these options turn the following option on: | ||||
| 
 | ||||
| .. code-block:: none | ||||
| 
 | ||||
|   HYPRE_ENABLE_UNIFIED_MEMORY (OFF)  Use unified memory for allocating the memory. | ||||
| 
 | ||||
| The other NVIDIA GPU related options include: | ||||
| 
 | ||||
| .. code-block:: none | ||||
| 
 | ||||
|  HYPRE_ENABLE_GPU_PROFILING (OFF) Use NVTX. | ||||
|  HYPRE_ENABLE_CUSPARSE (ON)       Use cuSPARSE for GPU sparse kernels. | ||||
|  HYPRE_ENABLE_CUBLAS (OFF)        Use cuBLAS for GPU dense kernels. | ||||
|  HYPRE_ENABLE_CURAND (ON)         Use random number generators on GPUs. | ||||
| 
 | ||||
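For a CMake-based CUDA build, the options above might be passed on the command line as in the following sketch. The option names are the ones listed in this section, and ``70`` is the documented default for ``HYPRE_CUDA_SM``. Turning on unified memory here and the ``RUN`` switch are assumptions of the example. The script is meant to be started from a build directory such as ``src/cmbuild`` and prints the command unless ``RUN=1`` is set.

```shell
#!/bin/sh
# Illustrative only: CMake command line for a CUDA build with unified memory.
set -eu

set -- -DHYPRE_WITH_CUDA=ON \
       -DHYPRE_CUDA_SM=70 \
       -DHYPRE_ENABLE_UNIFIED_MEMORY=ON

cmd="cmake $* .."
echo "$cmd"

# RUN is a hypothetical switch for this sketch; the default is a dry run.
if [ "${RUN:-0}" = "1" ] && [ -f ../CMakeLists.txt ] \
   && command -v cmake >/dev/null 2>&1; then
    cmake "$@" ..
fi
```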
| Allocations and deallocations of GPU memory are expensive. Memory pooling is a common approach to reduce such overhead and improve performance. | ||||
| hypre provides caching allocators for GPU device memory and unified memory, | ||||
| enabled by | ||||
| 
 | ||||
| .. code-block:: none | ||||
| 
 | ||||
|  HYPRE_ENABLE_DEVICE_POOL (OFF)   Enable the caching GPU memory allocator in hypre | ||||
| 
 | ||||
| 
 | ||||
| hypre also supports Umpire [Umpire]_. To enable the Umpire pool, include the following options: | ||||
| 
 | ||||
| .. code-block:: none | ||||
| 
 | ||||
|  HYPRE_WITH_UMPIRE (OFF)          Use Umpire Allocator for device and unified memory. | ||||
|  TPL_UMPIRE_LIBRARIES             List of absolute paths to Umpire link libraries. | ||||
|  TPL_UMPIRE_INCLUDE_DIRS          List of absolute paths to Umpire include directories. | ||||
| 
 | ||||
| Options relevant to SYCL support for Intel GPUs: | ||||
| 
 | ||||
| .. code-block:: none | ||||
| 
 | ||||
|  HYPRE_WITH_SYCL (OFF)            Enable SYCL support. | ||||
|  HYPRE_SYCL_TARGET                Target SYCL architecture, e.g. 'spir64_gen'. | ||||
|  HYPRE_SYCL_TARGET_BACKEND        Additional SYCL backend options, e.g. '-device 12.1.0,12.4.0'. | ||||
| 
 | ||||
| 
 | ||||
| Testing the Library with CMake Build Process | ||||
| ------------------------------------------------------------------------------ | ||||
| 
 | ||||
| The ``examples`` subdirectory contains several codes that can be used to test | ||||
| the newly created hypre library. The CMake option ``HYPRE_BUILD_EXAMPLES`` should | ||||
| be enabled to ensure that the executables in the ``examples`` subdirectory are built. | ||||
| 
 | ||||
| Linking to the Library | ||||
| ============================================================================== | ||||
| 
 | ||||
| @ -365,6 +567,12 @@ system, MPI implementation, compiler, and any error messages produced. | ||||
| Using HYPRE in External FEI Implementations | ||||
| ============================================================================== | ||||
| 
 | ||||
| .. warning:: | ||||
|    FEI is not actively supported by the hypre development team. For similar | ||||
|    functionality, we recommend using :ref:`sec-Block-Structured-Grids-FEM`, which | ||||
|    allows the representation of block-structured grid problems via hypre's | ||||
|    SStruct interface. | ||||
| 
 | ||||
| To set up hypre for use in external, e.g. Sandia's, FEI implementations one | ||||
| needs to follow the following steps: | ||||
| 
 | ||||
|  | ||||
| @ -141,6 +141,10 @@ | ||||
|    Approximate Inverse Preconditionings I. Theory. *SIAM J. Matrix Anal. A.*, 14(1):45--58, 1993. | ||||
|    `https://doi.org/10.1137/0614004 <https://doi.org/10.1137/0614004>`_. | ||||
| 
 | ||||
| .. [LiSY2021] R. Li, B. Sjogreen and U. M. Yang. A new class of AMG interpolation | ||||
|    methods based on matrix-matrix multiplications. *SIAM J. Sci. Comput.*, | ||||
|    43(5):S540--S564, 2021. | ||||
| 
 | ||||
| .. [JaFe2015] C. Janna, M. Ferronato, F. Sartoretto and G. Gambolati. | ||||
|    FSAIPACK: A Software Package for High-Performance Factored Sparse Approximate Inverse | ||||
|    Preconditioning. *ACM T. Math. Software*, 41(2):1--26, 2015. | ||||
|  | ||||
| @ -10,13 +10,11 @@ BoomerAMG | ||||
| BoomerAMG is a parallel implementation of the algebraic multigrid method | ||||
| [RuSt1987]_.  It can be used either as a solver or as a preconditioner.  The user | ||||
| can choose between various parallel coarsening techniques, | ||||
| interpolation and relaxation schemes.  While the default settings work fairly | ||||
| well for two-dimensional diffusion problems, for three-dimensional diffusion | ||||
| problems, it is recommended to choose a lower complexity coarsening like HMIS or | ||||
| PMIS (coarsening 10 or 8) and combine it with a distance-two interpolation | ||||
| (interpolation 6 or 7), that is also truncated to 4 or 5 elements per | ||||
| row. Additional reduction in complexity and increased scalability can often be | ||||
| achieved using one or two levels of aggressive coarsening. | ||||
| interpolation and relaxation schemes. The default settings for CPUs, HMIS  | ||||
| (coarsening 10) combined with a distance-two interpolation (6) truncated to 4 | ||||
| or 5 elements per row, should work fairly well for two- and three-dimensional  | ||||
| diffusion problems. Additional reduction in complexity and increased scalability  | ||||
| can often be achieved using one or two levels of aggressive coarsening. | ||||
| 
 | ||||
| 
 | ||||
| Parameter Options | ||||
| @ -42,6 +40,7 @@ techniques can be found in [HeYa2002]_, [Yang2005]_. | ||||
| Various coarsening techniques are available: | ||||
| 
 | ||||
| * the Cleary-Luby-Jones-Plassman (CLJP) coarsening, | ||||
| * parallel versions of the classical RS coarsening described in [HeYa2002]_, | ||||
| * the Falgout coarsening which is a combination of CLJP and the classical RS | ||||
|   coarsening algorithm, | ||||
| * CGC and CGC-E coarsenings [GrMS2006a]_, [GrMS2006b]_, | ||||
| @ -51,14 +50,15 @@ Various coarsening techniques are available: | ||||
|   techniques mentioned above and thus achieving much lower complexities and | ||||
|   lower memory use [Stue1999]_. | ||||
| 
 | ||||
| To use aggressive coarsening the user has to set the number of levels to which | ||||
| he wants to apply aggressive coarsening (starting with the finest level) via | ||||
| To use aggressive coarsening users have to set the number of levels to which | ||||
| they want to apply aggressive coarsening (starting with the finest level) via | ||||
| ``HYPRE_BoomerAMGSetAggNumLevels``. Since aggressive coarsening requires long | ||||
| range interpolation, multipass interpolation is always used on levels with | ||||
| aggressive coarsening, unless the user specifies another long-range | ||||
| interpolation suitable for aggressive coarsening. | ||||
| interpolation suitable for aggressive coarsening via | ||||
| ``HYPRE_BoomerAMGSetAggInterpType``. | ||||
| 
 | ||||
| Note that the default coarsening is HMIS [DeYH2004]_. | ||||
| Note that the default coarsening is HMIS for CPUs and PMIS for GPUs [DeYH2004]_. | ||||
| 
 | ||||
| 
 | ||||
| Interpolation Options | ||||
| @ -66,18 +66,19 @@ Interpolation Options | ||||
| 
 | ||||
| Various interpolation techniques can be set using ``HYPRE_BoomerAMGSetInterpType``: | ||||
| 
 | ||||
| * the "classical" interpolation as defined in [RuSt1987]_, | ||||
| * direct interpolation [Stue1999]_, | ||||
| * standard interpolation [Stue1999]_, | ||||
| * the "classical" interpolation (0) as defined in [RuSt1987]_, | ||||
| * direct interpolation (3) [Stue1999]_, | ||||
| * standard interpolation (8) [Stue1999]_, | ||||
| * an extended "classical" interpolation, which is a long range interpolation and | ||||
|   is recommended to be used with PMIS and HMIS coarsening for harder problems | ||||
|   [DFNY2008]_, | ||||
| * multipass interpolation [Stue1999]_, | ||||
|   (6) [DFNY2008]_, | ||||
| * distance-two interpolation based on matrix operations (17) [LiSY2021]_, | ||||
| * multipass interpolation (4) [Stue1999]_, | ||||
| * two-stage interpolation [Yang2010]_, | ||||
| * Jacobi interpolation [Stue1999]_, | ||||
| * the "classical" interpolation modified for hyperbolic PDEs. | ||||
| * the "classical" interpolation modified for hyperbolic PDEs (2). | ||||
| 
 | ||||
| Jacobi interpolation is only use to improve certain interpolation operators and | ||||
| Jacobi interpolation is only used to improve certain interpolation operators and | ||||
| can be used with ``HYPRE_BoomerAMGSetPostInterpType``.  Since some of the | ||||
| interpolation operators might generate large stencils, it is often possible and | ||||
| recommended to control complexity and truncate the interpolation operators using | ||||
| @ -85,7 +86,8 @@ recommended to control complexity and truncate the interpolation operators using | ||||
| ``HYPRE_BoomerAMGSetJacobiTruncThreshold`` (for Jacobi interpolation only). | ||||
| 
 | ||||
| Note that the default interpolation is extended+i interpolation [DFNY2008]_ | ||||
| truncated to 4 elements per row. | ||||
| truncated to 4 elements per row, for CPUs, and a version of this interpolation | ||||
| based on matrix operations for GPUs [LiSY2021]_. | ||||
| 
 | ||||
| 
 | ||||
| Non-Galerkin Options | ||||
| @ -112,11 +114,12 @@ Smoother Options | ||||
| A good overview of parallel smoothers and their properties can be found in | ||||
| [BFKY2011]_. Several of the described relaxation techniques are available: | ||||
| 
 | ||||
| * weighted Jacobi relaxation, | ||||
| * a hybrid Gauss-Seidel / Jacobi relaxation scheme, | ||||
| * a symmetric hybrid Gauss-Seidel / Jacobi relaxation scheme, | ||||
| * l1-Gauss-Seidel or Jacobi, | ||||
| * Chebyshev smoothers, | ||||
| * weighted Jacobi relaxation (0), | ||||
| * a hybrid Gauss-Seidel / Jacobi relaxation scheme (3 4), | ||||
| * a symmetric hybrid Gauss-Seidel / Jacobi relaxation scheme (6), | ||||
| * l1-Gauss-Seidel or Jacobi (13 14 18 8), | ||||
| * Chebyshev smoothers (16), | ||||
| * two-stage Gauss-Seidel smoothers (11 12) [BKRHSMTY2021]_, | ||||
| * hybrid block and Schwarz smoothers [Yang2004]_, | ||||
| * Incomplete LU factorization, see :ref:`ilu-amg-smoother`. | ||||
| * Factorized Sparse Approximate Inverse (FSAI), see :ref:`fsai-amg-smoother`. | ||||
| @ -144,6 +147,19 @@ used. Functions that enable the user to access the systems AMG version are | ||||
| ``HYPRE_BoomerAMGSetNumFunctions``, ``HYPRE_BoomerAMGSetDofFunc`` and | ||||
| ``HYPRE_BoomerAMGSetNodal``. | ||||
| 
 | ||||
| There are two basic approaches to dealing with matrices derived from systems | ||||
| of PDEs. The unknown-based approach (which is the default) treats variables | ||||
| corresponding to the same unknown or function separately, i.e., when coarsening | ||||
| or generating interpolation, connections between variables associated with | ||||
| different unknowns are ignored. This can work well for weakly coupled PDEs, | ||||
| but will be problematic for strongly coupled PDEs. For such problems, we recommend | ||||
| using hypre's multigrid reduction (MGR) solver. The second approach, called | ||||
| the nodal approach, considers all unknowns at a physical grid point together | ||||
| such that coarsening, interpolation and relaxation occur in a point-wise fashion. | ||||
| It is possible and sometimes preferred to combine nodal coarsening with unknown-based | ||||
| interpolation. For this case, ``HYPRE_BoomerAMGSetNodal`` should be set to a value > 1. | ||||
| For details see the reference manual. | ||||
| 
 | ||||
| If the user can provide the near null-space vectors, such as the rigid body | ||||
| modes for linear elasticity problems, an interpolation is available that will | ||||
| incorporate these vectors with ``HYPRE_BoomerAMGSetInterpVectors`` and | ||||
| @ -178,8 +194,25 @@ The currently available  GPU-supported BoomerAMG options include: | ||||
| * Interpolation:  direct (3), BAMG-direct (15), extended (14), extended+i (6) and extended+e (18) | ||||
| * Aggressive coarsening | ||||
| * Second-stage interpolation with aggressive coarsening: extended (5) and extended+e (7) | ||||
| * Smoother: Jacobi (7), l1-Jacobi (18), hybrid Gauss Seidel/SRROR (3 4 6), two-stage Gauss-Seidel (11,12) [BKRHSMTY2021]_ | ||||
| * Relaxation order: must be 0, i.e., lexicographic order | ||||
| * Smoother: Jacobi (7), l1-Jacobi (18), hybrid Gauss-Seidel/SSOR (3 4 6), two-stage Gauss-Seidel (11,12) [BKRHSMTY2021]_, and Chebyshev (16) | ||||
| * Relaxation order can be 0, lexicographic order, or C/F for (7) and (18) | ||||
| 
 | ||||
| Memory locations and execution policies | ||||
| ------------------------------------------------------------------------------ | ||||
| Hypre provides two user-level memory locations, ``HYPRE_MEMORY_HOST`` and ``HYPRE_MEMORY_DEVICE``, where | ||||
| ``HYPRE_MEMORY_HOST`` is always the CPU memory while ``HYPRE_MEMORY_DEVICE`` can be mapped to different memory spaces  | ||||
| based on the configure options of hypre. | ||||
| When built with ``--with-cuda``, ``--with-hip``, ``--with-sycl``, or ``--with-device-openmp``, | ||||
| ``HYPRE_MEMORY_DEVICE`` is the GPU device memory, | ||||
| and when built additionally with ``--enable-unified-memory``, it is the GPU unified memory (UM). | ||||
| For a non-GPU build, ``HYPRE_MEMORY_DEVICE`` is also mapped to the CPU memory. | ||||
| The default memory location of hypre's matrix and vector objects is ``HYPRE_MEMORY_DEVICE``, | ||||
| which can be changed at runtime by ``HYPRE_SetMemoryLocation(...)``. | ||||
| 
 | ||||
| The execution policy determines where computations run, based on the memory locations of the participating objects. | ||||
| The default policy is ``HYPRE_EXEC_HOST``, i.e., executing on the host **if the objects are accessible from the host**. | ||||
| It can be adjusted by ``HYPRE_SetExecutionPolicy(...)``. | ||||
| Clearly, this policy only affects objects in UM, since UM is accessible from **both CPUs and GPUs**. | ||||
| 
 | ||||
| Sample code that sets up an IJ matrix :math:`A` and solves :math:`Ax=b` using AMG-preconditioned CG | ||||
| on GPUs is shown below. | ||||
| @ -249,8 +282,6 @@ For best performance, it might be necessary to set certain parameters, which | ||||
| will affect both coarsening and interpolation.  One important parameter is the | ||||
| strong threshold, which can be set using the function | ||||
| ``HYPRE_BoomerAMGSetStrongThreshold``.  The default value is 0.25, which appears | ||||
| to be a good choice for 2-dimensional problems and the low complexity coarsening | ||||
| algorithms.  For 3-dimensional problems a better choice appears to be 0.5, when | ||||
| using the default coarsening algorithm. However, the choice of the strength | ||||
| threshold is problem dependent and therefore there could be better choices than | ||||
| the two suggested ones. | ||||
| to be a good choice for diffusion problems.  The choice of the strength | ||||
| threshold is problem dependent. For example, elasticity problems often require a larger | ||||
| strength threshold. | ||||
|  | ||||
| @ -9,6 +9,12 @@ | ||||
| FEI Solvers | ||||
| ============================================================================== | ||||
| 
 | ||||
| .. warning:: | ||||
|    FEI is not actively supported by the hypre development team. For similar | ||||
|    functionality, we recommend using :ref:`sec-Block-Structured-Grids-FEM`, which | ||||
|    allows the representation of block-structured grid problems via hypre's | ||||
|    SStruct interface. | ||||
| 
 | ||||
| After the FEI has been used to assemble the global linear system (as described | ||||
| in Chapter :ref:`ch-FEI`), a number of hypre solvers can be called to perform | ||||
| the solution.  This is straightforward, if hypre's FEI has been used.  If an | ||||
|  | ||||
| @ -3,6 +3,7 @@ | ||||
| 
 | ||||
|    SPDX-License-Identifier: (Apache-2.0 OR MIT) | ||||
| 
 | ||||
| .. _fsai: | ||||
| 
 | ||||
| FSAI | ||||
| ============================================================================== | ||||
|  | ||||
| @ -7,6 +7,11 @@ | ||||
| ParaSails | ||||
| ============================================================================== | ||||
| 
 | ||||
| .. warning:: | ||||
|    ParaSails is not actively supported by the hypre development team. We recommend using | ||||
|    :ref:`fsai` for parallel sparse approximate inverse algorithms. This new implementation | ||||
|    includes NVIDIA/AMD GPU support through the CUDA/HIP backends. | ||||
| 
 | ||||
| ParaSails is a parallel implementation of a sparse approximate inverse | ||||
| preconditioner, using *a priori* sparsity patterns and least-squares (Frobenius | ||||
| norm) minimization.  Symmetric positive definite (SPD) problems are handled | ||||
| @ -119,4 +124,3 @@ latter may be guaranteed by 1) constructing the sparsity pattern with a | ||||
| symmetric matrix, or 2) if the matrix is structurally symmetric (has symmetric | ||||
| pattern), then thresholding to construct the pattern is not used (i.e., zero | ||||
| value of the ``thresh`` parameter is used). | ||||
| 
 | ||||
|  | ||||
| @ -1280,7 +1280,7 @@ HYPRE_Int HYPRE_BoomerAMGSetPrintLevel(HYPRE_Solver solver, | ||||
| 
 | ||||
| /**
 | ||||
|  * (Optional) Requests additional computations for diagnostic and similar | ||||
|  * data to be logged by the user. Default to 0 for do nothing.  The latest | ||||
|  * data to be logged by the user. Default to 0 to do nothing.  The latest | ||||
|  * residual will be available if logging > 1. | ||||
|  **/ | ||||
| HYPRE_Int HYPRE_BoomerAMGSetLogging(HYPRE_Solver solver, | ||||
| @ -4059,7 +4059,7 @@ HYPRE_MGRSetReservedCpointsLevelToKeep( HYPRE_Solver solver, HYPRE_Int level); | ||||
|  * Currently supports the following flavors of relaxation types | ||||
|  * as described in the \e BoomerAMGSetRelaxType: | ||||
|  * \e relax_type 0, 3 - 8, 13, 14, 18. Also supports AMG (options 1 and 2) | ||||
|  *    and direct solver variants (9, 99, 199). See HYPRE_MGRSetLevelFRelaxType for details. | ||||
|  *    and direct solver variants (9, 99, 199). See \e HYPRE_MGRSetLevelFRelaxType for details. | ||||
|  **/ | ||||
| HYPRE_Int | ||||
| HYPRE_MGRSetRelaxType(HYPRE_Solver solver, | ||||
| @ -4072,7 +4072,7 @@ HYPRE_MGRSetRelaxType(HYPRE_Solver solver, | ||||
|  *    - 0 : Single-level relaxation sweeps for F-relaxation as prescribed by \e MGRSetRelaxType | ||||
|  *    - 1 : Multi-level relaxation strategy for F-relaxation (V(1,0) cycle currently supported). | ||||
|  * | ||||
|  *    NOTE: This function will be removed in favor of /e HYPRE_MGRSetLevelFRelaxType!! | ||||
|  *    NOTE: This function will be removed in favor of \e HYPRE_MGRSetLevelFRelaxType!! | ||||
|  **/ | ||||
| HYPRE_Int | ||||
| HYPRE_MGRSetFRelaxMethod(HYPRE_Solver solver, | ||||
| @ -4148,7 +4148,7 @@ HYPRE_MGRSetRestrictType( HYPRE_Solver solver, | ||||
|                           HYPRE_Int restrict_type); | ||||
| 
 | ||||
| /**
 | ||||
|  * (Optional) This function is an extension of HYPRE_MGRSetRestrictType. It allows setting | ||||
|  * (Optional) This function is an extension of \e HYPRE_MGRSetRestrictType. It allows setting | ||||
|  * the restriction operator strategy for each MGR level. | ||||
|  **/ | ||||
| HYPRE_Int | ||||
| @ -4182,7 +4182,7 @@ HYPRE_MGRSetInterpType( HYPRE_Solver solver, | ||||
|                         HYPRE_Int interp_type ); | ||||
| 
 | ||||
| /**
 | ||||
|  * (Optional) This function is an extension of HYPRE_MGRSetInterpType. It allows setting | ||||
|  * (Optional) This function is an extension of \e HYPRE_MGRSetInterpType. It allows setting | ||||
|  * the prolongation (interpolation) operator strategy for each MGR level. | ||||
|  **/ | ||||
| HYPRE_Int | ||||
| @ -4198,7 +4198,7 @@ HYPRE_MGRSetNumRelaxSweeps( HYPRE_Solver solver, | ||||
|                             HYPRE_Int nsweeps ); | ||||
| 
 | ||||
| /**
 | ||||
|  * (Optional) This function is an extension of HYPRE_MGRSetNumRelaxSweeps. It allows setting | ||||
|  * (Optional) This function is an extension of \e HYPRE_MGRSetNumRelaxSweeps. It allows setting | ||||
|  * the number of single-level relaxation sweeps for each MGR level. | ||||
|  **/ | ||||
| HYPRE_Int | ||||
| @ -4287,10 +4287,8 @@ HYPRE_MGRSetCoarseGridPrintLevel( HYPRE_Solver solver, | ||||
|                                   HYPRE_Int print_level ); | ||||
| 
 | ||||
| /**
 | ||||
|  * (Optional) Set the threshold to compress the coarse grid at each level | ||||
|  * Use threshold = 0.0 if no truncation is applied. Otherwise, set the threshold | ||||
|  * value for dropping entries for the coarse grid. | ||||
|  * The default is 0.0. | ||||
|  * (Optional) Set the threshold for dropping small entries on the coarse grid at each level. | ||||
|  * No dropping is applied if \e threshold = 0.0 (default).  | ||||
|  **/ | ||||
| HYPRE_Int | ||||
| HYPRE_MGRSetTruncateCoarseGridThreshold( HYPRE_Solver solver, | ||||
| @ -4299,7 +4297,7 @@ HYPRE_MGRSetTruncateCoarseGridThreshold( HYPRE_Solver solver, | ||||
| /**
 | ||||
|  * (Optional) Requests logging of solver diagnostics. | ||||
|  * Requests additional computations for diagnostic and similar | ||||
|  * data to be logged by the user. Default to 0 for do nothing.  The latest | ||||
|  * data to be logged by the user. Default is 0, do nothing.  The latest | ||||
|  * residual will be available if logging > 1. | ||||
|  **/ | ||||
| HYPRE_Int | ||||
| @ -4338,8 +4336,8 @@ HYPRE_Int | ||||
| HYPRE_MGRSetLevelSmoothIters( HYPRE_Solver solver, | ||||
|                               HYPRE_Int *smooth_iters ); | ||||
| /**
 | ||||
|  * (Optional) Set the smoothing order for global smoothing at each level. | ||||
|  * Options for \e level_smooth_order are: | ||||
|  * (Optional) Set the cycle for global smoothing. | ||||
|  * Options for \e global_smooth_cycle are: | ||||
|  *    - 1 : Pre-smoothing - Down cycle (default) | ||||
|  *    - 2 : Post-smoothing - Up cycle | ||||
|  **/ | ||||
|  | ||||
| @ -533,9 +533,9 @@ HYPRE_MGRSetLevelSmoothIters( HYPRE_Solver solver, | ||||
|  * HYPRE_MGRSetGlobalsmoothType | ||||
|  *--------------------------------------------------------------------------*/ | ||||
| HYPRE_Int | ||||
| HYPRE_MGRSetGlobalSmoothType( HYPRE_Solver solver, HYPRE_Int iter_type ) | ||||
| HYPRE_MGRSetGlobalSmoothType( HYPRE_Solver solver, HYPRE_Int smooth_type ) | ||||
| { | ||||
|    return hypre_MGRSetGlobalSmoothType(solver, iter_type); | ||||
|    return hypre_MGRSetGlobalSmoothType(solver, smooth_type); | ||||
| } | ||||
| /*--------------------------------------------------------------------------
 | ||||
|  * HYPRE_MGRSetLevelsmoothType | ||||
|  | ||||
	 Wayne Mitchell