Commit Graph

32 Commits

Author SHA1 Message Date
Victor A. Paludetto Magri
662e886881
[Multivec 2/5]: Extend multivector support (#693)
* Add new device functions needed by multivectors (`hypreDevice_IntStridedCopy` and `hypreDevice_IVAMXPMY`)
* Extend `hypre_SeqVectorElmdivpy` to work with multivectors.
2022-07-29 15:37:24 -07:00
Ruipeng Li
e270c561b0
Spgemm (#639)
This PR includes optimizations for hypre's SpGEMM and ParSpGEMM kernels.

Co-authored-by: Wayne Mitchell <mitchell82@llnl.gov>
Co-authored-by: Paul T. Bauman <ptbauman@gmail.com>
Co-authored-by: Sarah Osborn <30503782+osborn9@users.noreply.github.com>
2022-06-24 10:42:16 -07:00
Victor A. Paludetto Magri
edb91b4a50
Add -auxfromfile option to IJ driver (#633)
Add -auxfromfile option for reading an auxiliary matrix from a file, which is then used to build the preconditioner. This is useful, for example, when a filtered version of A is used to build the preconditioner.
2022-05-26 21:23:31 -04:00
Ruipeng Li
ef3f890d4b
Nvcollab (#591)
This PR contains various GPU optimizations developed in collaboration with the NVIDIA team.

Co-authored-by: Peng Wang <penwang@nvidia.com>
2022-05-24 13:27:32 -07:00
Victor A. Paludetto Magri
e16167fe46
Fix copyright (#615)
This PR updates copyright headers from "Copyright 1998-2019 ..." to "Copyright (c) 1998 ..."
2022-04-05 16:19:51 -07:00
Quan Bui
734a10fcb7
Mgr setup gpu (#400)
Enable GPU setup for the MGR solver.
* Added device-specific functionality for interpolation
* Made device and host calls to interpolation consistent
* Edited IJ driver to use GPU capable options for MGR
* Updated saved files for new GPU options
* Updated CMakeLists to support new MGR capabilities

Co-authored-by: Ruipeng Li <li50@llnl.gov>
Co-authored-by: Daniel Osei-Kuffuor <oseikuffuor1@llnl.gov>
2022-02-07 15:54:52 -08:00
Wayne Mitchell
a7bb784a45
SYCL support for AMG solve phase (#549)
This adds matvec, matrix transpose, and vector operations (axpy, inner product, etc.)
with a SYCL backend (via oneMKL and oneDPL) for running on Intel GPUs. As a result, the AMG
solve phase can now execute entirely on Intel GPUs.
2022-01-31 16:15:30 -08:00
Wayne Mitchell
4232108a4d
Add SYCL support (#431)
This sets up basic infrastructure (e.g. memory management, device setup, etc.)
and implements the boxloops and structured solvers in SYCL.
2021-11-22 16:54:22 -08:00
Rob Falgout
805ee77be8
Adding source file indentation with astyle (#498)
This PR adds automatic indentation using Artistic Style (astyle). The script config/astyle-apply.sh runs the indentation using the configuration file config/astylerc. The script also runs the "headers" scripts in all of the directories that automatically generate internal _hypre_*.h header files. Much of this was borrowed from the MFEM project. A pre-commit git hook was also added.
2021-11-08 19:26:59 -08:00
Ruipeng Li
7f2762cffb
Cusparse spmv (#512)
This PR removes the frequent GPU malloc/free calls in CSRMatvec when using cuSPARSE 11. See #507.
2021-11-01 10:33:52 -07:00
Ruipeng Li
eaff5505ed
hypre's GPU SpGemm (#433)
This PR improves the performance of hypre's sparse matrix-matrix multiplication on NVIDIA GPUs and fixes it on AMD GPUs with HIP.

Co-authored-by: Ruipeng Li <coe0141@redwood.cm.cluster>
Co-authored-by: Paul T. Bauman <ptbauman@gmail.com>
2021-09-09 08:34:39 -07:00
Ruipeng Li
dd9f1ea31c
Dcsrmv analysis (#458)
This PR (by @pbauman, #430) adds a hook for calling rocsparse_dcsrmv_analysis when using rocSPARSE on AMD GPUs.

Co-authored-by: Paul T. Bauman <ptbauman@gmail.com>
2021-09-08 13:59:17 -07:00
Wayne Mitchell
e53b5a0270
Add rocsparse triangular solve (#462)
Adds a rocSPARSE implementation of the upper/lower triangular solve required
for Gauss-Seidel relaxation when using HIP and rocSPARSE on AMD GPUs.
2021-08-30 13:33:49 -07:00
Wayne Mitchell
59fcd47e6d
Air gpu (#425)
This PR ports the Neumann version of AIR to the GPU. New features include:
   1. Construction of Neumann AIR restriction operator R on the GPU
   2. Construction of one-point interpolation on the GPU
   3. Construction of an absolute value version of the strength of connection matrix on the GPU
   4. CF relaxation for Jacobi (relax7) and L1 Jacobi (relax18) on the GPU (note that this performs redundant computation, since a full matvec is computed even when relaxing only the C-points or only the F-points)
   5. Regression tests for AIR
   6. Filtering for ParCSR matrices based on tol*row_norm for 1-, 2-, and infinity-norm on the GPU
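A minimal serial sketch of item 6 on a single local CSR block, to make the tol*row_norm rule concrete. It assumes that an off-diagonal entry is dropped when |a_ij| < tol * ||row i||, that the diagonal is always kept, and that column indices are local; `filter_csr_by_row_norm` and `enum row_norm` are hypothetical names, not hypre's API, and the GPU kernel from this PR is not reproduced here.

```c
#include <assert.h>

/* Hypothetical norm selector; hypre's own option values may differ. */
enum row_norm { NORM_ONE = 1, NORM_TWO = 2, NORM_INF = 3 };

/* Drops off-diagonal entries with |a_ij| < tol * ||row_i|| and compacts the
   CSR arrays in place. Returns the new number of nonzeros.
   Assumption of this sketch: the diagonal (col == row) is always kept. */
static int filter_csr_by_row_norm(int n, int *row_ptr, int *col_idx,
                                  double *val, double tol, enum row_norm norm)
{
   assert(tol >= 0.0);
   int out = 0;              /* next write position in the compacted arrays */
   int start = row_ptr[0];   /* start of row i in the original arrays */
   for (int i = 0; i < n; i++)
   {
      int end = row_ptr[i + 1];
      /* s = 1-norm, squared 2-norm, or inf-norm of row i */
      double s = 0.0;
      for (int k = start; k < end; k++)
      {
         double a = val[k] < 0 ? -val[k] : val[k];
         if (norm == NORM_ONE) s += a;
         else if (norm == NORM_TWO) s += a * a;
         else if (a > s) s = a;
      }
      /* For the 2-norm, compare squared magnitudes so no sqrt is needed. */
      double thresh = (norm == NORM_TWO) ? tol * tol * s : tol * s;
      row_ptr[i] = out;
      for (int k = start; k < end; k++)
      {
         double a = val[k] < 0 ? -val[k] : val[k];
         double m = (norm == NORM_TWO) ? a * a : a;
         if (col_idx[k] == i || m >= thresh)
         {
            col_idx[out] = col_idx[k];   /* out <= k, so nothing unread is overwritten */
            val[out] = val[k];
            out++;
         }
      }
      start = end;
   }
   row_ptr[n] = out;
   return out;
}
```

With tol = 0.1 and the 1-norm, a row {4, -0.1, 1} has norm 5.1, so the -0.1 entry falls below the 0.51 threshold and is removed while the diagonal and the 1 survive.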
2021-08-06 16:39:42 -07:00
Luke
cb0c70b163
Fix potentially inconsistent eig estimates (#390) (#410)
This PR reimplements hypre_ParCSRMaxEigEstimate using Gershgorin discs and allreduces both max_eig and min_eig across all ranks, so that the function returns the same values on every rank.
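By the Gershgorin circle theorem, every eigenvalue of A lies in the union of discs centered at a_ii with radius r_i = sum over j != i of |a_ij|, so min_i(a_ii - r_i) and max_i(a_ii + r_i) bound the real parts of the eigenvalues (the eigenvalues themselves when A is symmetric). The sketch below computes these bounds for the rows one rank owns. `gershgorin_local_bounds` is a hypothetical helper working on a plain CSR block with global column indices, not hypre's ParCSR diag/offd layout, and whether hypre scales A before estimating is not stated here.

```c
#include <assert.h>
#include <math.h>

/* Hypothetical helper: Gershgorin bounds over the rows owned by one rank.
   row_ptr has n + 1 entries; col_idx holds global column indices and
   first_row is the global index of local row 0, so the diagonal of local
   row i is the entry whose column equals first_row + i.
   With n == 0 the results are +INFINITY / -INFINITY, which are the identity
   values for the min/max reductions. In an MPI code each rank would then call
     MPI_Allreduce(&local_min, &global_min, 1, MPI_DOUBLE, MPI_MIN, comm);
     MPI_Allreduce(&local_max, &global_max, 1, MPI_DOUBLE, MPI_MAX, comm);
   so that every rank ends up with identical bounds. */
static void gershgorin_local_bounds(int n, int first_row,
                                    const int *row_ptr, const int *col_idx,
                                    const double *val,
                                    double *min_eig, double *max_eig)
{
   assert(min_eig && max_eig);
   double lo = INFINITY, hi = -INFINITY;
   for (int i = 0; i < n; i++)
   {
      double center = 0.0, radius = 0.0;
      for (int k = row_ptr[i]; k < row_ptr[i + 1]; k++)
      {
         if (col_idx[k] == first_row + i)
            center += val[k];          /* disc center a_ii */
         else
            radius += fabs(val[k]);    /* disc radius r_i */
      }
      if (center - radius < lo) lo = center - radius;
      if (center + radius > hi) hi = center + radius;
   }
   *min_eig = lo;
   *max_eig = hi;
}
```

For the 1D Laplacian tridiag(-1, 2, -1), the interior rows give the disc [0, 4], which is the bound this sketch returns.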

Co-authored-by: Ruipeng Li <li50@llnl.gov>
Co-authored-by: li50@llnl.gov <liruipengblue@gmail.com>
2021-08-05 14:03:04 -07:00
Victor A. Paludetto Magri
ffe4f7384b
Update IJ interface with changes in recmat-merge (#392)
This PR improves the IJ interface with new functions, reorganizes code, and simplifies the code by removing ownership information for the partitioning data members of ParCSRMatrix and ParVector objects. A more comprehensive list of changes is given below:

* Add HYPRE_IJMatrixAdd, HYPRE_IJMatrixNorm and HYPRE_IJMatrixTranspose functions
* Add ParCSRMatrixInfNorm and ParCSRMatrixReorder functions
* Add transpose, add and norm functions to IJMatrix
* Add more caliper annotation to BoomerAMG and ParCSR functions
* Fix typo in assumed partition function and add caliper annotation
* The output matrix from ParTMatmul owns row/col starts.
* Build communication package for A at ParTMatmul if it does not exist.
* Move hypre_Log2 to utilities
* Add HYPRE_ANNOTATE_REGION_[BEGIN,END] to caliper annotation
* Phase out [row,col]_starts ownership info in ParCSR matrices
* Remove partitioning ownership info from vector
* Move partitioning variables to stack memory
2021-07-28 15:42:23 -07:00
Ruipeng Li
8c9f41a4d0
GPU ams ame ads (#398)
This PR adds GPU support for AMS, AME, and ADS, as well as the following ParCSR operations on GPUs: ParCSRAdd, ParCSRTranspose, and l1 hybrid G-S/SSOR.

Co-authored-by: Rob Falgout <rfalgout@llnl.gov>
2021-06-21 14:36:46 -07:00
Wayne Mitchell
5f8472b05c
Amgdd fixes (#386)
This removes the masked matvec routine previously used for C/F L1 Jacobi relaxation in the AMG-DD solver: the GPU code had a bug, and as of CUDA 11 the cuSPARSE bsrxmv routine no longer supports this use case. In addition, regression test results were saved for the GPU implementation of AMG-DD.
2021-06-15 10:44:46 -07:00
Ruipeng Li
3bc7d267ef
Gpu default (#336)
This PR changes several AMG defaults for GPU runs, adds regression tests on GPUs, and simplifies the CUDA boxloop implementations.

Co-authored-by: Sarah Virginia Osborn <osborn9@llnl.gov>
Co-authored-by: PaulMullowney <pmullown@nrel.gov>
Co-authored-by: Daniel Osei-Kuffuor <oseikuffuor1@llnl.gov>
Co-authored-by: Ruipeng Li <li50@euler.llnl.gov>
Co-authored-by: Ruipeng Li <coe0141@redwood.cm.cluster>
2021-05-24 17:16:35 -07:00
Victor A. Paludetto Magri
c38527c455
Add OMP support to Mat/Mat add functions (#341)
Add OpenMP support to the CSRMatrixAddHost and ParCSRMatrixAdd functions. Minor changes:
- Rename ParcsrAdd to ParCSRMatrixAdd
- Add hypre_CSRMatrixAddFirstPass and hypre_CSRMatrixAddSecondPass to reduce code duplication
- Update rownnz support in CSRMatrixAddHost, ParCSRMatrixAdd and hypre_ILUParCSRInverseNSH.
- Refactor SpMV branches
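The first-pass/second-pass split above follows the usual symbolic/numeric structure for sparse matrix addition. The serial sketch below assumes that structure: the first pass counts the distinct column indices in each row of alpha*A + beta*B to build the row pointers, and the second pass fills in column indices and values. The `csr` struct and every function name here are hypothetical, not hypre's `hypre_CSRMatrix` or the real `hypre_CSRMatrixAddFirstPass`/`hypre_CSRMatrixAddSecondPass` signatures, and the sketch assumes no duplicate column indices within a row.

```c
#include <assert.h>
#include <stdlib.h>

/* Minimal CSR container for this sketch (not hypre's hypre_CSRMatrix). */
typedef struct {
   int nrows, ncols;
   int *i;      /* row pointers, length nrows + 1 */
   int *j;      /* column indices */
   double *a;   /* values */
} csr;

/* First pass (symbolic): count distinct columns per row of A + B into C->i.
   marker[col] records the last row in which col was seen. */
static void csr_add_first_pass(const csr *A, const csr *B, csr *C, int *marker)
{
   for (int c = 0; c < A->ncols; c++) marker[c] = -1;
   C->i[0] = 0;
   for (int r = 0; r < A->nrows; r++)
   {
      int count = 0;
      for (int k = A->i[r]; k < A->i[r + 1]; k++)
      {
         marker[A->j[k]] = r;
         count++;
      }
      for (int k = B->i[r]; k < B->i[r + 1]; k++)
         if (marker[B->j[k]] != r) { marker[B->j[k]] = r; count++; }
      C->i[r + 1] = C->i[r] + count;
   }
}

/* Second pass (numeric): marker[col] now holds the position of col in C;
   positions from earlier rows are < C->i[r], so they count as "not seen". */
static void csr_add_second_pass(double alpha, const csr *A, double beta,
                                const csr *B, csr *C, int *marker)
{
   for (int c = 0; c < A->ncols; c++) marker[c] = -1;
   for (int r = 0; r < A->nrows; r++)
   {
      int pos = C->i[r];
      for (int k = A->i[r]; k < A->i[r + 1]; k++)
      {
         marker[A->j[k]] = pos;
         C->j[pos] = A->j[k];
         C->a[pos] = alpha * A->a[k];
         pos++;
      }
      for (int k = B->i[r]; k < B->i[r + 1]; k++)
      {
         int col = B->j[k];
         if (marker[col] >= C->i[r])
            C->a[marker[col]] += beta * B->a[k];   /* column already in row r */
         else
         {
            marker[col] = pos;
            C->j[pos] = col;
            C->a[pos] = beta * B->a[k];
            pos++;
         }
      }
   }
}

/* C = alpha*A + beta*B; the caller frees C.i, C.j and C.a.
   malloc error checks are omitted for brevity. */
static csr csr_add(double alpha, const csr *A, double beta, const csr *B)
{
   assert(A->nrows == B->nrows && A->ncols == B->ncols);
   csr C = { A->nrows, A->ncols,
             malloc((size_t) (A->nrows + 1) * sizeof(int)), NULL, NULL };
   int *marker = malloc((size_t) A->ncols * sizeof(int));
   csr_add_first_pass(A, B, &C, marker);
   int nnz = C.i[C.nrows];
   C.j = malloc((size_t) nnz * sizeof(int));
   C.a = malloc((size_t) nnz * sizeof(double));
   csr_add_second_pass(alpha, A, beta, B, &C, marker);
   free(marker);
   return C;
}
```

Because both passes work row by row, a threaded version would typically split the rows among threads, give each thread its own marker array, and turn the per-row counts into row pointers with a prefix sum; the sketch stays serial for clarity. Entries that cancel (for example 2 + (-2)) remain as explicit zeros.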
2021-05-13 17:29:42 -07:00
Ruipeng Li
25646da905
Mat descr (#331)
This PR (by @pbauman, #329) addresses #309 by allowing each `hypre_CSRMatrix` to have its own GPU matrix descriptor.

Co-authored-by: Paul T. Bauman <ptbauman@gmail.com>
2021-04-15 19:00:38 -07:00
Ruipeng Li
b3a4a76a5f
Roc sparse (#316)
This PR (by @pbauman, #304) adds initial rocSPARSE support.

Co-authored-by: Paul T. Bauman <ptbauman@gmail.com>
2021-03-25 20:11:53 -07:00
Ramesh Pankajakshan
414fa671be
Umpire (#243)
This PR adds support for Umpire pooling allocators for host and GPU memory. When hypre is configured with --with-umpire, device and UVM allocations and deallocations are done through Umpire, whereas the host pool is not enabled by default. This PR also includes some other minor changes:

* Add .gitignore to the repo
* Remove all direct malloc/calloc/realloc/free calls and add a regression test that checks for them
* Stop compiling ij.c with a C++ compiler; it is C code again
* Introduce HYPRE_USING_GPU, which is equivalent to HYPRE_USING_CUDA || HYPRE_USING_DEVICE_OPENMP
* Add a few user-level interfaces: HYPRE_SetMemoryLocation, HYPRE_SetExecutionPolicy, HYPRE_SetGPUMemoryPoolSize and HYPRE_CSRMatrixSetSpGemmUseCusparse

Co-authored-by: li50@llnl.gov <liruipengblue@gmail.com>
Co-authored-by: Rob Falgout <rfalgout@llnl.gov>
Co-authored-by: Ruipeng Li <li50@llnl.gov>
2021-02-03 12:31:25 -08:00
Ruipeng Li
2186a8fb34
triangular solve on GPUs; runcheck (#256)
This PR fixes the triangular solve on GPUs and runcheck.sh.

Co-authored-by: Daniel Osei-Kuffuor <oseikuffuor1@llnl.gov>
2021-01-15 20:46:59 -08:00
Ruipeng Li
b49727f16b
Cuda triangular smoothers (#240)
* This commit adds CUDA-based AMG smoothers built on the triangular parts of sparse matrices. These include Gauss-Seidel (relax_type==3), which uses cuSPARSE triangular solvers to invert L, and symmetric Gauss-Seidel (relax_type==6), also implemented via cuSPARSE. Finally, two new smoothers are added. The first (relax_type==11) is a two-stage approximation to Gauss-Seidel using a parallel MatVec and L. The second (relax_type==12) is a less effective variant of 11 that uses A_diag instead of L for the smoothing. CPU implementations of these new smoothers are also provided. For the two-stage algorithms, L and U are NOT explicitly created, which seems to be faster and saves memory. In the two-stage preconditioner, multiplying by invdiag rather than dividing by the diagonal reduces register pressure and yields full occupancy.
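A serial sketch of the two-stage idea, assuming the standard formulation in which the sequential triangular solve with (D + L) is replaced by a few Jacobi-style inner sweeps. `two_stage_gs` and its `inner_sweeps` parameter are hypothetical; how many inner sweeps hypre uses for relax_type 11 is not stated here. As in the description above, L is never formed and the inverse diagonal is precomputed.

```c
#include <assert.h>

/* One outer sweep of a two-stage Gauss-Seidel approximation on a square CSR
   matrix A = L + D + U (a sketch, not hypre's relax_type 11 kernel):
     r = b - A x                 full matvec
     g = D^{-1} r                first inner iterate
     repeat inner_sweeps times:
       g = D^{-1} (r - L g)      Jacobi-style sweep on (D + L) g = r
     x = x + g
   L is never formed: the inner loop skips entries with col >= row.
   inv_diag[i] holds 1 / a_ii, so the kernels multiply instead of dividing.
   r, g and tmp are caller-provided work arrays of length n. */
static void two_stage_gs(int n, const int *row_ptr, const int *col_idx,
                         const double *val, const double *inv_diag,
                         const double *b, double *x, int inner_sweeps,
                         double *r, double *g, double *tmp)
{
   assert(inner_sweeps >= 0);

   /* Residual via a full matvec; every row is independent. */
   for (int i = 0; i < n; i++)
   {
      double s = b[i];
      for (int k = row_ptr[i]; k < row_ptr[i + 1]; k++)
         s -= val[k] * x[col_idx[k]];
      r[i] = s;
      g[i] = inv_diag[i] * s;
   }

   /* Inner sweeps read only the previous g, so rows stay independent,
      unlike an exact triangular solve, which is sequential. */
   for (int sweep = 0; sweep < inner_sweeps; sweep++)
   {
      for (int i = 0; i < n; i++)
      {
         double s = r[i];
         for (int k = row_ptr[i]; k < row_ptr[i + 1]; k++)
            if (col_idx[k] < i)
               s -= val[k] * g[col_idx[k]];
         tmp[i] = inv_diag[i] * s;
      }
      for (int i = 0; i < n; i++)
         g[i] = tmp[i];
   }

   for (int i = 0; i < n; i++)
      x[i] += g[i];
}
```

With inner_sweeps = 0 this reduces to a Jacobi step. Because L is strictly lower triangular, n - 1 inner sweeps reproduce one exact forward Gauss-Seidel sweep, so a small sweep count trades some smoothing quality for parallelism.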
Co-authored-by: Paul Mullowney <pmullown@nrel.gov>
Co-authored-by: PaulMullowney <60452402+PaulMullowney@users.noreply.github.com>
2020-12-17 19:37:59 -08:00
Ruipeng Li
804609b6c4
Reorg relax (#237)
This PR refactors the relaxation routines on CPUs and modularizes the various Jacobi and Gauss-Seidel (G-S) methods into two "core" kernels.
2020-12-07 09:05:36 -08:00
Daniel Osei-Kuffuor
56012897e1
Ilu dev 2019 (#160)
This merge introduces new features to the parallel ILU solvers in hypre. In particular, it adds GPU support for BJ-ILU(0) and GMRES-ILU(0). In addition, this merge includes a new option for GMRES-ILU(0) that uses MILU(0) to build the restriction/interpolation operators used to construct the Schur complement matrix by a Galerkin product. This option is also available on the GPU. Key commits include:

* ILU updates with bug fixes for compiling the cuda version

* Update local RCM ordering option to support nonsymmetric matrices

* Update regression tests to test new features

* Reference manual updates, code cleanup, and bug fixes

Co-authored-by: Tianshi Xu <xu16@ray59.coralea.llnl.gov>
Co-authored-by: Tianshi Xu <xu16@lassen708.coral.llnl.gov>
Co-authored-by: Xu <xu16@bellsofireland.llnl.gov>
Co-authored-by: Kote Hitenze <hitenze@jotenshis-MacBook-Pro.local>
Co-authored-by: Tianshi Xu <xuxx1180@umn.edu>
Co-authored-by: Ruipeng Li <li50@llnl.gov>
2020-11-22 22:16:56 -06:00
Luke
22f4d3f8c6
Cuda 11 API (#163)
This PR adds CUDA-11 support.
2020-11-05 20:57:57 -08:00
Ruipeng Li
aaf5aa564a
Aggressive coarsening and 2-stage MM-ext Interpolations on GPUs (#195)
This PR contains the following changes:
* Aggressive coarsening, i.e., a second strength-of-connection (SoC) stage, on GPUs
* 2-stage MM-ext interpolations (MM-ext, MM-ext+e) on GPUs
* Enhanced ability to extract strong FF/FC/CF/CC submatrices given an SoC matrix
* Bug fix in device PMIS
Co-authored-by: Bjorn Sjogreen <sjogreen2@llnl.gov>
Co-authored-by: ulrikeyang <yang11@llnl.gov>
2020-09-23 17:13:23 -07:00
Wayne Mitchell
0b80656ce9
AMG-DD implementation (#145)
This includes the implementation of the AMG-DD algorithm, a variant of BoomerAMG designed to limit communication. 

AMG-DD may be used as a standalone solver or a preconditioner for Krylov methods (note that AMG-DD is a non-symmetric preconditioner). For an example of how to set up and use AMG-DD, see the IJ driver (src/test/ij.c).

A list with the parameters of AMG-DD is given below:

Padding (recommended default 1): HYPRE_BoomerAMGDDSetPadding(...)
Number of ghost layers (recommended default 1): HYPRE_BoomerAMGDDSetNumGhostLayers(...)
Number of inner FAC cycles per AMG-DD iteration (default 2): HYPRE_BoomerAMGDDSetFACNumCycles(...)
FAC cycle type: HYPRE_BoomerAMGDDSetFACCycleType(...)
   1 = V-cycle (default)
   2 = W-cycle
   3 = F-cycle
Number of relaxations on each level during the FAC cycle: HYPRE_BoomerAMGDDSetFACNumRelax(...)
Type of local relaxation during the FAC cycle: HYPRE_BoomerAMGDDSetFACRelaxType(...)
   0 = Jacobi
   1 = Gauss-Seidel
   2 = ordered Gauss-Seidel
   3 = C/F L1-scaled Jacobi (default)

For more details of the algorithm, see Mitchell, W.B., R. Strzodka, and R.D. Falgout (2020), Parallel Performance of Algebraic Multigrid Domain Decomposition (AMG-DD).
2020-09-02 17:52:20 -07:00
Ruipeng Li
ce7ef08496
new sparse mat-mat-dist, triple-mat-dist
2020-06-01 18:10:41 -07:00
Ruipeng Li
6f8f513164
added protos.h; bug fix
2020-05-12 23:24:35 -07:00