* Add new device functions needed by multivectors (`hypreDevice_IntStridedCopy` and `hypreDevice_IVAMXPMY`)
* Extend `hypre_SeqVectorElmdivpy` to work with multivectors.
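The serial sketch below shows one way the element-wise update might extend to multivectors. It assumes `hypre_SeqVectorElmdivpy` computes `y = y + x ./ b`. It also assumes that `x` and `y` store `num_vectors` components contiguously, with component `k` starting at offset `k*size`, and that every component shares the single divisor vector `b`. The function name `elmdivpy_multivec` is hypothetical and does not reproduce hypre's signature or data structures.

```c
#include <stddef.h>

/* Hypothetical sketch: y[k*size + i] += x[k*size + i] / b[i]
 * for every component k of the multivectors x and y.
 * Assumes component-major storage and a single divisor vector b. */
void elmdivpy_multivec(size_t size, size_t num_vectors,
                       const double *x, const double *b, double *y)
{
   for (size_t k = 0; k < num_vectors; k++)
   {
      const double *xk = x + k * size; /* k-th component of x */
      double       *yk = y + k * size; /* k-th component of y */
      for (size_t i = 0; i < size; i++)
      {
         yk[i] += xk[i] / b[i];
      }
   }
}
```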
This PR includes optimizations for hypre's SpGEMM and ParSpGEMM kernels.
Co-authored-by: Wayne Mitchell <mitchell82@llnl.gov>
Co-authored-by: Paul T. Bauman <ptbauman@gmail.com>
Co-authored-by: Sarah Osborn <30503782+osborn9@users.noreply.github.com>
Add a -auxfromfile option for reading an auxiliary matrix from file, which is then used to build the preconditioner. This is useful, for example, when a filtered version of A is used to build the preconditioner.
Enable GPU setup for MGR solver.
* Added device-specific functionality for interpolation
* Made device and host calls to interpolation consistent
* Edited IJ driver to use GPU capable options for MGR
* Updated saved files for new GPU options
* Updated CMakeLists to support new MGR capabilities
Co-authored-by: Ruipeng Li <li50@llnl.gov>
Co-authored-by: Daniel Osei-Kuffuor <oseikuffuor1@llnl.gov>
This adds matvec, matrix transpose, and vector operations (axpy, inner product, etc.)
with a SYCL backend (via oneMKL and oneDPL) for running on Intel GPUs. Thus, the AMG
solve phase can now execute entirely on Intel GPUs.
This PR adds automatic indentation using Artistic Style (astyle). The script config/astyle-apply.sh applies the indentation using the configuration file config/astylerc. The script also runs the `headers` script in every directory that automatically generates internal _hypre_*.h header files. Much of this was borrowed from the MFEM project. A pre-commit git hook was also added.
This PR improves the performance of hypre's sparse matrix-matrix multiplication on NVIDIA GPUs and fixes it on AMD GPUs with HIP.
Co-authored-by: Ruipeng Li <coe0141@redwood.cm.cluster>
Co-authored-by: Paul T. Bauman <ptbauman@gmail.com>
This PR (by @pbauman #430) adds a hook for calling rocsparse_dcsrmv_analysis when using rocSPARSE on AMD GPUs.
Co-authored-by: Paul T. Bauman <ptbauman@gmail.com>
This PR ports the Neumann version of AIR to the GPU. New features include:
1. Construction of Neumann AIR restriction operator R on the GPU
2. Construction of one-point interpolation on the GPU
3. Construction of an absolute value version of the strength of connection matrix on the GPU
4. CF relaxation for Jacobi (relax7) and L1 Jacobi (relax18) on the GPU; note that this performs redundant computation, since a full matvec is computed even when only C- or F-points are relaxed
5. Regression tests for AIR
6. Filtering of ParCSR matrices on the GPU based on tol*row_norm, using the 1-, 2-, or infinity-norm
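The serial CSR sketch below illustrates the row-norm filtering in item 6. It drops off-diagonal entries with |a_ij| < tol * ||row i||. Several details are assumptions made for this example:

* The diagonal is always kept.
* The integer norm selector is 1, 2, or 0 for the infinity norm.
* The function names are invented here.

hypre's GPU implementation may differ.

```c
#include <math.h>

/* Hypothetical norm selector: 1, 2, or 0 meaning the infinity norm.
 * For norm_type == 2 the squared 2-norm is returned. */
static double row_norm(const int *ia, const double *a, int i, int norm_type)
{
   double norm = 0.0;
   for (int jj = ia[i]; jj < ia[i + 1]; jj++)
   {
      double v = fabs(a[jj]);
      if (norm_type == 1)      { norm += v; }
      else if (norm_type == 2) { norm += v * v; }
      else if (v > norm)       { norm = v; }
   }
   return norm;
}

/* Keep the diagonal and every entry with |a_ij| >= tol * ||row i||.
 * ib needs n+1 slots; jb and b need room for ia[n] entries.
 * Returns the number of kept entries. */
int csr_filter(int n, const int *ia, const int *ja, const double *a,
               double tol, int norm_type,
               int *ib, int *jb, double *b)
{
   int nnz = 0;
   ib[0] = 0;
   for (int i = 0; i < n; i++)
   {
      double norm = row_norm(ia, a, i, norm_type);
      /* For the 2-norm, compare squared magnitudes to avoid sqrt. */
      double threshold = (norm_type == 2) ? tol * tol * norm : tol * norm;
      for (int jj = ia[i]; jj < ia[i + 1]; jj++)
      {
         double v   = fabs(a[jj]);
         double mag = (norm_type == 2) ? v * v : v;
         if (ja[jj] == i || mag >= threshold)
         {
            jb[nnz] = ja[jj];
            b[nnz]  = a[jj];
            nnz++;
         }
      }
      ib[i + 1] = nnz;
   }
   return nnz;
}
```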
This PR reimplements hypre_ParCSRMaxEigEstimate using Gershgorin discs, which ensures that max_eig and min_eig are both allreduced across all ranks so that the return value of the function is the same for all ranks.
Co-authored-by: Ruipeng Li <li50@llnl.gov>
Co-authored-by: li50@llnl.gov <liruipengblue@gmail.com>
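The serial sketch below illustrates Gershgorin-based eigenvalue bounds for a CSR matrix. It shows the general idea and is not hypre's code. Every eigenvalue lies in a disc centered at a_ii with radius r_i = sum_{j != i} |a_ij|. Therefore min_i (a_ii - r_i) and max_i (a_ii + r_i) bound the real parts of all eigenvalues, and bound the eigenvalues themselves for a symmetric matrix. The cross-rank reduction that makes the result identical on every rank appears only as a comment, and `gershgorin_bounds` is a hypothetical name.

```c
#include <math.h>

/* Serial sketch: Gershgorin bounds on the spectrum of a CSR matrix.
 * Each row contributes the disc [a_ii - r_i, a_ii + r_i] on the real axis. */
void gershgorin_bounds(int n, const int *ia, const int *ja, const double *a,
                       double *min_eig, double *max_eig)
{
   double lo = 0.0, hi = 0.0;
   for (int i = 0; i < n; i++)
   {
      double diag = 0.0, radius = 0.0;
      for (int jj = ia[i]; jj < ia[i + 1]; jj++)
      {
         if (ja[jj] == i) { diag = a[jj]; }
         else             { radius += fabs(a[jj]); }
      }
      double row_lo = diag - radius;
      double row_hi = diag + radius;
      if (i == 0 || row_lo < lo) { lo = row_lo; }
      if (i == 0 || row_hi > hi) { hi = row_hi; }
   }
   /* In a distributed setting, lo and hi would next be combined across
    * ranks (e.g. MPI_Allreduce with MPI_MIN / MPI_MAX) so that every rank
    * returns the same values. */
   *min_eig = lo;
   *max_eig = hi;
}
```

For the 1D Laplacian tridiag(-1, 2, -1), the interior rows give the disc [0, 4], which bounds the spectrum.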
This PR improves the IJ interface with new functions, performs code reorganization, and simplifies coding by removing ownership info related to the partitioning data members from ParCSRMatrix and ParVector objects. A more comprehensive list of changes is given below:
* Add HYPRE_IJMatrixAdd, HYPRE_IJMatrixNorm and HYPRE_IJMatrixTranspose functions
* Add ParCSRMatrixInfNorm and ParCSRMatrixReorder functions
* Add transpose, add and norm functions to IJMatrix
* Add more caliper annotation to BoomerAMG and ParCSR functions
* Fix typo in assumed partition function and add caliper annotation
* The output matrix from ParTMatmul owns row/col starts.
* Build communication package for A at ParTMatmul if it does not exist.
* Move hypre_Log2 to utilities
* Add HYPRE_ANNOTATE_REGION_[BEGIN,END] to caliper annotation
* Phase out [row,col]_starts ownership info in ParCSR matrices
* Remove partitioning ownership info from vector
* Move partitioning variables to stack memory
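The sketch below illustrates the quantity that ParCSRMatrixInfNorm computes. It forms the local contribution to ||A||_inf = max_i sum_j |a_ij| for a rank whose rows are split into CSR "diag" and "offd" blocks. The array arguments and the function name are assumptions for this sketch, not hypre's interface. In a distributed run, the local value would still need an MPI_MAX reduction.

```c
#include <math.h>

/* Sketch: local part of the infinity norm for rows stored as a "diag"
 * block plus an "offd" block, both CSR with num_rows rows. Only row
 * pointers and values are needed; column indices do not matter here. */
double local_inf_norm(int num_rows,
                      const int *diag_i, const double *diag_a,
                      const int *offd_i, const double *offd_a)
{
   double norm = 0.0;
   for (int i = 0; i < num_rows; i++)
   {
      double row_sum = 0.0;
      for (int jj = diag_i[i]; jj < diag_i[i + 1]; jj++) { row_sum += fabs(diag_a[jj]); }
      for (int jj = offd_i[i]; jj < offd_i[i + 1]; jj++) { row_sum += fabs(offd_a[jj]); }
      if (row_sum > norm) { norm = row_sum; }
   }
   /* The global norm would be the MPI_MAX reduction of this value. */
   return norm;
}
```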
This PR adds GPU support for AMS, AME, and ADS, as well as for the following ParCSR operations: ParCSRAdd, ParCSRTranspose, and l1 hybrid G-S/SSOR.
Co-authored-by: Rob Falgout <rfalgout@llnl.gov>
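A serial CSR transpose can be written as a counting sort over column indices, as sketched below. This is a generic single-block algorithm with a hypothetical function name. It is not hypre's GPU ParCSRTranspose, which must also handle the off-diagonal block and the communication package.

```c
/* Serial sketch of a CSR transpose (A is nrows x ncols):
 * count entries per column, prefix-sum into row pointers of A^T,
 * then scatter. iat needs ncols+1 slots; jat and at need ia[nrows]. */
void csr_transpose(int nrows, int ncols,
                   const int *ia, const int *ja, const double *a,
                   int *iat, int *jat, double *at)
{
   for (int j = 0; j <= ncols; j++) { iat[j] = 0; }
   /* Count nonzeros in each column of A (= each row of A^T). */
   for (int jj = 0; jj < ia[nrows]; jj++) { iat[ja[jj] + 1]++; }
   /* Prefix sum: iat[j] is now the start of row j of A^T. */
   for (int j = 0; j < ncols; j++) { iat[j + 1] += iat[j]; }
   /* Scatter, advancing iat[j] as the insertion point of row j.
    * Visiting rows of A in order keeps each row of A^T sorted. */
   for (int i = 0; i < nrows; i++)
   {
      for (int jj = ia[i]; jj < ia[i + 1]; jj++)
      {
         int pos = iat[ja[jj]]++;
         jat[pos] = i;
         at[pos]  = a[jj];
      }
   }
   /* After scattering, iat[j] holds the old iat[j+1]; shift back. */
   for (int j = ncols; j > 0; j--) { iat[j] = iat[j - 1]; }
   iat[0] = 0;
}
```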
This removes the masked matvec routine previously used for CF L1 Jacobi relaxation in the AMG-DD solver. There was a bug in the GPU code, and as of CUDA 11 the cuSPARSE bsrxmv routine no longer supports our use case. In addition, the appropriate regression test results were saved for the GPU implementation of AMG-DD.
This PR changes the AMG defaults for GPUs in various places, adds regression tests on GPUs, and simplifies the CUDA boxloop implementations.
Co-authored-by: Sarah Virginia Osborn <osborn9@llnl.gov>
Co-authored-by: PaulMullowney <pmullown@nrel.gov>
Co-authored-by: Daniel Osei-Kuffuor <oseikuffuor1@llnl.gov>
Co-authored-by: Ruipeng Li <li50@euler.llnl.gov>
Co-authored-by: Ruipeng Li <coe0141@redwood.cm.cluster>
Add OpenMP support to CSRMatrixAddHost and ParCSRMatrixAdd functions. Minor changes are:
- Changed name ParcsrAdd to ParCSRMatrixAdd
- Add hypre_CSRMatrixAddFirstPass and hypre_CSRMatrixAddSecondPass to reduce code duplication
- Update rownnz support in CSRMatrixAddHost, ParCSRMatrixAdd and hypre_ILUParCSRInverseNSH.
- Refactor SpMV branches
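The first-pass/second-pass split can be sketched for C = alpha*A + beta*B with two CSR matrices of the same shape. The first pass only counts entries per row of C, so that C can be allocated exactly. The second pass fills in the column indices and values. Several details belong to this hypothetical sketch and do not mirror hypre_CSRMatrixAddFirstPass/SecondPass:

* the function names
* the argument order
* the `marker` scratch array
* the assumption that no input row contains duplicate column indices

```c
/* Pass 1: row pointers ic (length nrows+1) of C = alpha*A + beta*B.
 * marker is caller-provided scratch of length ncols. */
void csr_add_first_pass(int nrows, int ncols,
                        const int *ia, const int *ja,
                        const int *ib, const int *jb,
                        int *ic, int *marker)
{
   for (int j = 0; j < ncols; j++) { marker[j] = -1; }
   ic[0] = 0;
   for (int i = 0; i < nrows; i++)
   {
      int count = 0;
      for (int jj = ia[i]; jj < ia[i + 1]; jj++)
      {
         marker[ja[jj]] = i;              /* column seen in row i */
         count++;
      }
      for (int jj = ib[i]; jj < ib[i + 1]; jj++)
      {
         if (marker[jb[jj]] != i) { marker[jb[jj]] = i; count++; }
      }
      ic[i + 1] = ic[i] + count;
   }
}

/* Pass 2: fill jc and c; marker[col] remembers where col went in row i. */
void csr_add_second_pass(int nrows, int ncols, double alpha, double beta,
                         const int *ia, const int *ja, const double *a,
                         const int *ib, const int *jb, const double *b,
                         const int *ic, int *jc, double *c, int *marker)
{
   for (int j = 0; j < ncols; j++) { marker[j] = -1; }
   for (int i = 0; i < nrows; i++)
   {
      int pos = ic[i];
      for (int jj = ia[i]; jj < ia[i + 1]; jj++)
      {
         marker[ja[jj]] = pos;
         jc[pos] = ja[jj];
         c[pos]  = alpha * a[jj];
         pos++;
      }
      for (int jj = ib[i]; jj < ib[i + 1]; jj++)
      {
         int col = jb[jj];
         if (marker[col] >= ic[i])        /* already present in row i */
         {
            c[marker[col]] += beta * b[jj];
         }
         else
         {
            marker[col] = pos;
            jc[pos] = col;
            c[pos]  = beta * b[jj];
            pos++;
         }
      }
   }
}
```

Once the row pointers are known, the second pass is independent across rows. It can therefore be split among OpenMP threads, provided each thread has its own `marker` array.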
This PR (@pbauman #329) addresses #309 by allowing each `hypre_csrmatrix` to have its own GPU matrix descriptor.
Co-authored-by: Paul T. Bauman <ptbauman@gmail.com>
This PR adds support for Umpire pooling allocators for host and GPU memory. When hypre is configured with --with-umpire, device and UVM allocations and deallocations are handled by Umpire, whereas the host pool is not enabled by default. This PR also includes some other minor changes:
- Add a .gitignore to the repo
- Remove all malloc/calloc/realloc/free calls and add a regression test that detects them
- ij.c is no longer compiled with a C++ compiler; it is plain C code again
- Introduce HYPRE_USING_GPU, which is equivalent to HYPRE_USING_CUDA || HYPRE_USING_DEVICE_OPENMP
- Add a few user-level interfaces: HYPRE_SetMemoryLocation, HYPRE_SetExecutionPolicy, HYPRE_SetGPUMemoryPoolSize and HYPRE_CSRMatrixSetSpGemmUseCusparse
Co-authored-by: li50@llnl.gov <liruipengblue@gmail.com>
Co-authored-by: Rob Falgout <rfalgout@llnl.gov>
Co-authored-by: Ruipeng Li <li50@llnl.gov>
* This commit adds CUDA-based smoothers for AMG that work with the triangular parts of sparse matrices.
  * Gauss-Seidel (relax_type==3) uses cuSPARSE triangular solvers to invert L.
  * Symmetric Gauss-Seidel (relax_type==6) is also implemented via cuSPARSE.
  * Two new smoothers are added:
    * relax_type==11 is a two-stage approximation to Gauss-Seidel that uses a parallel MatVec and L.
    * relax_type==12 is a less effective variant of 11 that uses A_diag instead of L for the smoothing.
  * CPU implementations of these new smoothers are also provided.
  * For the two-stage algorithms, L and U are NOT explicitly created, which appears to be faster and saves memory.
  * In the two-stage preconditioner, multiplying by invdiag rather than dividing by the diagonal reduces register pressure and yields full occupancy.
Co-authored-by: Paul Mullowney <pmullown@nrel.gov>
Co-authored-by: PaulMullowney <60452402+PaulMullowney@users.noreply.github.com>
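The two-stage idea behind relax_type 11 can be sketched generically. Instead of an exact triangular solve with D+L, the correction is approximated by a few Jacobi iterations on D+L, which need only matvec-like operations. The serial sketch below illustrates the technique and is not hypre's CUDA kernel. It assumes CSR storage, precomputed `invdiag` entries 1/a_ii, and caller-provided scratch arrays. The function name `two_stage_gs` is hypothetical.

```c
/* Sketch of one two-stage Gauss-Seidel sweep:
 *   r = b - A x
 *   y = D^{-1} r                                   (first stage)
 *   repeat num_inner times: y = D^{-1} (r - L y)   (Jacobi on D+L)
 *   x = x + y
 * L (strictly lower part of A) is never formed: entries with column < row
 * are selected on the fly. invdiag holds 1/a_ii, so the kernel multiplies
 * instead of dividing. r, y and ynew are scratch arrays of length n. */
void two_stage_gs(int n, const int *ia, const int *ja, const double *a,
                  const double *invdiag, const double *b, double *x,
                  double *r, double *y, double *ynew, int num_inner)
{
   for (int i = 0; i < n; i++)
   {
      double ax = 0.0;
      for (int jj = ia[i]; jj < ia[i + 1]; jj++) { ax += a[jj] * x[ja[jj]]; }
      r[i] = b[i] - ax;
      y[i] = invdiag[i] * r[i];
   }
   for (int k = 0; k < num_inner; k++)
   {
      /* Jacobi update: read only the old y, write ynew. */
      for (int i = 0; i < n; i++)
      {
         double ly = 0.0;
         for (int jj = ia[i]; jj < ia[i + 1]; jj++)
         {
            if (ja[jj] < i) { ly += a[jj] * y[ja[jj]]; }
         }
         ynew[i] = invdiag[i] * (r[i] - ly);
      }
      for (int i = 0; i < n; i++) { y[i] = ynew[i]; }
   }
   for (int i = 0; i < n; i++) { x[i] += y[i]; }
}
```

Because D^{-1}L is nilpotent, n-1 inner iterations reproduce an exact forward Gauss-Seidel sweep. Smaller values of `num_inner` give up some accuracy in exchange for parallelism.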
This merge introduces new features to the parallel ILU solvers in hypre. In particular, it adds GPU support for BJ-ILU(0) and GMRES-ILU(0). In addition, this merge includes a new option for GMRES-ILU(0) that uses MILU(0) to build the restriction/interpolation operators used to construct the Schur complement matrix by a Galerkin product. This option is also available on the GPU. Key commits include:
* ILU updates with bug fixes for compiling the cuda version
* Update local RCM ordering option to support nonsymmetric matrices
* Update regression tests to test new features
* Reference manual updates, code cleanup, and bug fixes
Co-authored-by: Tianshi Xu <xu16@ray59.coralea.llnl.gov>
Co-authored-by: Tianshi Xu <xu16@lassen708.coral.llnl.gov>
Co-authored-by: Xu <xu16@bellsofireland.llnl.gov>
Co-authored-by: Kote Hitenze <hitenze@jotenshis-MacBook-Pro.local>
Co-authored-by: Tianshi Xu <xuxx1180@umn.edu>
Co-authored-by: Ruipeng Li <li50@llnl.gov>
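ILU(0) is the factorization behind BJ-ILU(0) and GMRES-ILU(0), and it keeps only the sparsity pattern of A. Below is a generic serial IKJ-variant sketch on CSR storage. It assumes sorted column indices, with the diagonal present in every row, and uses hypothetical names. It omits the GPU code paths and the MILU(0) modification, which compensates for dropped fill-in, typically by adding it to the diagonal.

```c
/* In-place ILU(0), IKJ variant, for CSR rows with sorted column indices
 * that include the diagonal. On return, strictly lower entries hold L
 * (unit diagonal implied) and the remaining entries hold U; diag[i] is
 * the position of u_ii. iw is scratch of length n.
 * Returns 0, or -1 on a missing or zero pivot. */
int ilu0(int n, const int *ia, const int *ja, double *a, int *diag, int *iw)
{
   for (int j = 0; j < n; j++) { iw[j] = -1; }
   for (int i = 0; i < n; i++)
   {
      /* Map each column of row i to its position in a[]. */
      for (int p = ia[i]; p < ia[i + 1]; p++) { iw[ja[p]] = p; }

      int jj = ia[i];
      for (; jj < ia[i + 1] && ja[jj] < i; jj++)
      {
         int k = ja[jj];
         a[jj] /= a[diag[k]];                 /* l_ik = a_ik / u_kk */
         /* Subtract l_ik * (row k of U) only where row i already has an
          * entry; any other fill-in is dropped (the "0" in ILU(0)). */
         for (int kk = diag[k] + 1; kk < ia[k + 1]; kk++)
         {
            int pos = iw[ja[kk]];
            if (pos != -1) { a[pos] -= a[jj] * a[kk]; }
         }
      }
      if (jj == ia[i + 1] || ja[jj] != i || a[jj] == 0.0)
      {
         return -1;
      }
      diag[i] = jj;

      /* Clear the column map before moving to the next row. */
      for (int p = ia[i]; p < ia[i + 1]; p++) { iw[ja[p]] = -1; }
   }
   return 0;
}
```

For the arrow-shaped matrix [4 1 1; 1 4 0; 1 0 4], the fill-in at positions (1,2) and (2,1) is discarded. Only the existing entries are updated.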
This includes the implementation of the AMG-DD algorithm, a variant of BoomerAMG designed to limit communication.
AMG-DD may be used as a standalone solver or a preconditioner for Krylov methods (note that AMG-DD is a non-symmetric preconditioner). For an example of how to set up and use AMG-DD, see the IJ driver (src/test/ij.c).
The AMG-DD parameters are listed below:
* Padding (recommended default 1): HYPRE_BoomerAMGDDSetPadding(...)
* Number of ghost layers (recommended default 1): HYPRE_BoomerAMGDDSetNumGhostLayers(...)
* Number of inner FAC cycles per AMG-DD iteration (default 2): HYPRE_BoomerAMGDDSetFACNumCycles(...)
* FAC cycle type: HYPRE_BoomerAMGDDSetFACCycleType(...)
  - 1 = V-cycle (default)
  - 2 = W-cycle
  - 3 = F-cycle
* Number of relaxations on each level during the FAC cycle: HYPRE_BoomerAMGDDSetFACNumRelax(...)
* Type of local relaxation during the FAC cycle: HYPRE_BoomerAMGDDSetFACRelaxType(...)
  - 0 = Jacobi
  - 1 = Gauss-Seidel
  - 2 = ordered Gauss-Seidel
  - 3 = C/F L1-scaled Jacobi (default)
For more details on the algorithm, see W.B. Mitchell, R. Strzodka, and R.D. Falgout (2020), Parallel Performance of Algebraic Multigrid Domain Decomposition (AMG-DD).