Further unification of the GPU implementation across CUDA/HIP/SYCL.
Implements the parallel matrix-matrix product in SYCL.
HYPRE_CUDA_LAUNCH and HYPRE_SYCL_LAUNCH macros have
been unified under HYPRE_GPU_LAUNCH for kernel launches.
Replace HYPRE_SetSpGemmUseCusparse with HYPRE_SetSpGemmUseVendor.
This PR extends the (semi)-struct matrix/vector IO functions added in #583 with GPU support. Additionally:
* Fix regression tests on Lassen.
* Read data values into host memory
* Update Umatrix read algorithm when the ParCSRMatrix is expected to live on the device
* Reset deallocated pointers at hypre_IJMatrixDestroyParCSR to NULL
* Clone rownnz info if present on a CSRMatrix
* Reduce memory transfer and remove unused variables
* Fix bug with -print option
* Build rownnz info also when the ParCSRMatrix is in assembled state
* Remove a few instances of "return ierr"
* Refactor (s)struct IO so that the code works with CUDA and without unified memory (UM)
* Add executables to gitignore
This adds matvec, matrix transpose, and vector operations (axpy, inner product, etc.)
with a SYCL backend (via oneMKL and oneDPL) for running on Intel GPUs. As a result, the AMG
solve phase can now execute entirely on Intel GPUs.
This PR adds automatic indentation using Artistic Style (astyle). The script config/astyle-apply.sh runs the indentation using the configuration file config/astylerc. The script also runs headers in all of the directories that automatically generate internal _hypre_*.h header files. Much of this was borrowed from the MFEM project. A pre-commit git hook was also added.
This PR changes AMG defaults regarding GPUs in various places, adds regression tests on GPUs, and simplifies the CUDA boxloop implementations.
Co-authored-by: Sarah Virginia Osborn <osborn9@llnl.gov>
Co-authored-by: PaulMullowney <pmullown@nrel.gov>
Co-authored-by: Daniel Osei-Kuffuor <oseikuffuor1@llnl.gov>
Co-authored-by: Ruipeng Li <li50@euler.llnl.gov>
Co-authored-by: Ruipeng Li <coe0141@redwood.cm.cluster>
The main objective of this PR is to improve the support of matrices with a large number of zero rows in hypre. More specifically:
* Improve IJMatrixAssemble for matrices with a large number of zero rows.
* Add hypre_AuxParCSRMatrixSetRownnz to build array of nonzero rows in the auxiliary matrix. This saves allocation time for building ParCSRMatrices.
* Improve (Par)CSRMatrix transpose, addition and multiplication operations for matrices with a large number of zero rows.
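The idea behind a nonzero-row array can be sketched as follows. This is a minimal illustration, not the hypre implementation: the function name `build_rownnz` and its signature are hypothetical, and plain `int` is used in place of hypre's integer types. Given a CSR row-pointer array, it collects the indices of rows that store at least one entry, so that later loops can skip the empty rows.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of the hypre API).
 * row_ptr: CSR row pointer with num_rows + 1 entries.
 * rownnz:  output buffer with room for num_rows indices.
 * Returns the number of rows that hold at least one stored entry. */
int build_rownnz(int num_rows, const int *row_ptr, int *rownnz)
{
    assert(num_rows >= 0 && row_ptr != NULL && rownnz != NULL);

    int count = 0;
    for (int i = 0; i < num_rows; i++) {
        /* A row is nonempty when its pointer range is nonempty. */
        if (row_ptr[i + 1] > row_ptr[i]) {
            rownnz[count++] = i;
        }
    }
    return count;
}
```

A caller can then iterate over `rownnz[0 .. count-1]` instead of over all rows, which is where the savings for matrices with many zero rows would come from.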
Secondary changes made in this PR are:
* Update SpMV paths in `csr_matvec` in order to make the calculation of A*x more concise.
* Extend OpenMP support to hypre_CSRMatrixSumElts, hypre_CSRMatrixFnorm and hypre_CSRMatrixReorder.
* Clean gcc-9 warnings
* Update saved files and delete unused variable
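For reference, the A*x calculation mentioned in the secondary changes can be sketched as a plain sequential CSR matvec computing y = alpha*A*x + beta*y. This is an illustrative sketch only: `csr_matvec_sketch` and its argument list are hypothetical and do not mirror the actual code paths in `csr_matvec`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch (not hypre's csr_matvec): y = alpha*A*x + beta*y
 * for a CSR matrix given by row_ptr, col_idx and val. */
void csr_matvec_sketch(int num_rows,
                       const int *row_ptr, const int *col_idx,
                       const double *val,
                       double alpha, const double *x,
                       double beta, double *y)
{
    assert(num_rows >= 0 && row_ptr != NULL && x != NULL && y != NULL);

    for (int i = 0; i < num_rows; i++) {
        /* Accumulate the dot product of row i with x. */
        double sum = 0.0;
        for (int j = row_ptr[i]; j < row_ptr[i + 1]; j++) {
            sum += val[j] * x[col_idx[j]];
        }
        /* Treat beta == 0 as overwrite so stale values in y are ignored. */
        double scaled_y = (beta == 0.0) ? 0.0 : beta * y[i];
        y[i] = alpha * sum + scaled_y;
    }
}
```

Rows without stored entries fall through the inner loop and only receive the `beta` scaling, so they could be skipped entirely when `beta` is 1.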
Co-authored-by: Ruipeng Li <li50@llnl.gov>
This PR (@pbauman #329) addresses #309 by allowing each `hypre_csrmatrix` to have a GPU matrix descriptor.
Co-authored-by: Paul T. Bauman <ptbauman@gmail.com>
* This commit adds CUDA-based smoothers for AMG built on the triangular parts of sparse matrices. This includes Gauss-Seidel (relax_type==3), which uses CUSPARSE triangular solvers to invert L. Symmetric Gauss-Seidel is implemented in relax_type==6, also via CUSPARSE. Finally, two new smoothers are added. The first is a two-stage approximation to Gauss-Seidel using a parallel MatVec and L (relax_type==11). The second (relax_type==12) is a less effective version of 11 that uses A_diag instead of L for the smoothing. CPU implementations of these new smoothers are also provided. For the two-stage algorithms, L and U are NOT explicitly created, which seems faster and saves memory. In the two-stage preconditioner, multiplying by invdiag rather than dividing by the diagonal reduces register pressure and yields full occupancy.
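To make the two-stage idea concrete, here is a small sequential sketch of one common formulation, in which the triangular solve with (D + L) is replaced by a single inner Jacobi step: z = D^{-1}(b - A x), then x += z - D^{-1}(L z). This is an assumption about the general technique, not a transcription of the relax_type==11 code. The function name `two_stage_gs_sweep` is hypothetical, the matrix is a plain CSR array set, and the parallel and GPU aspects are omitted. As in the description above, L is never formed explicitly (its entries are read from A on the fly), and the diagonal is inverted once so that later steps multiply instead of divide.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical sketch of one two-stage Gauss-Seidel sweep on a CSR matrix.
 * Returns 0 on success, -1 on allocation failure or a missing/zero diagonal. */
int two_stage_gs_sweep(int n,
                       const int *row_ptr, const int *col_idx,
                       const double *val, const double *b, double *x)
{
    assert(row_ptr != NULL && b != NULL && x != NULL);
    if (n <= 0) {
        return 0;
    }

    double *inv_diag = malloc((size_t)n * sizeof *inv_diag);
    double *z = malloc((size_t)n * sizeof *z);
    if (inv_diag == NULL || z == NULL) {
        free(inv_diag);
        free(z);
        return -1;
    }

    /* Stage 1: z = D^{-1} (b - A x), a Jacobi-type step (a matvec plus scaling). */
    for (int i = 0; i < n; i++) {
        double r = b[i];
        double d = 0.0;
        for (int j = row_ptr[i]; j < row_ptr[i + 1]; j++) {
            r -= val[j] * x[col_idx[j]];
            if (col_idx[j] == i) {
                d = val[j];
            }
        }
        if (d == 0.0) {
            free(inv_diag);
            free(z);
            return -1;
        }
        inv_diag[i] = 1.0 / d;   /* invert once, multiply afterwards */
        z[i] = r * inv_diag[i];
    }

    /* Stage 2: x += z - D^{-1} (L z), with the strictly lower part L
     * taken directly from the entries of A whose column is below the row. */
    for (int i = 0; i < n; i++) {
        double lz = 0.0;
        for (int j = row_ptr[i]; j < row_ptr[i + 1]; j++) {
            if (col_idx[j] < i) {
                lz += val[j] * z[col_idx[j]];
            }
        }
        x[i] += z[i] - inv_diag[i] * lz;
    }

    free(inv_diag);
    free(z);
    return 0;
}
```

Both stages are row-independent loops, which is what makes this kind of smoother attractive on GPUs compared with a sequential triangular solve. Under the same assumptions, a variant in the spirit of relax_type==12 would use the whole local diagonal block in stage 2 instead of only the entries below the diagonal.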
Co-authored-by: Paul Mullowney <pmullown@nrel.gov>
Co-authored-by: PaulMullowney <60452402+PaulMullowney@users.noreply.github.com>
This includes the implementation of the AMG-DD algorithm, a variant of BoomerAMG designed to limit communication.
AMG-DD may be used as a standalone solver or a preconditioner for Krylov methods (note that AMG-DD is a non-symmetric preconditioner). For an example of how to set up and use AMG-DD, see the IJ driver (src/test/ij.c).
The AMG-DD parameters are listed below:
Padding (recommended default 1): HYPRE_BoomerAMGDDSetPadding(...)
Number of ghost layers (recommended default 1): HYPRE_BoomerAMGDDSetNumGhostLayers(...)
Number of inner FAC cycles per AMG-DD iteration (default 2): HYPRE_BoomerAMGDDSetFACNumCycles(...)
FAC cycle type: HYPRE_BoomerAMGDDSetFACCycleType(...)
    1 = V-cycle (default)
    2 = W-cycle
    3 = F-cycle
Number of relaxations on each level during FAC cycle: HYPRE_BoomerAMGDDSetFACNumRelax(...)
Type of local relaxation during FAC cycle: HYPRE_BoomerAMGDDSetFACRelaxType(...)
    0 = Jacobi
    1 = Gauss-Seidel
    2 = ordered Gauss-Seidel
    3 = C/F L1-scaled Jacobi (default)
For more details of the algorithm, see Mitchell W.B., R. Strzodka, and R.D. Falgout (2020), Parallel Performance of Algebraic Multigrid Domain Decomposition (AMG-DD).