* Updated documentation for clarity and to clean up a few typos.
* Added warning messages to FEI, ParaSails, PILUT, and Euclid.
* Improved and updated GPU information
* Added CMake build information
This fixes an out-of-bounds memory error in `hypre_BoomerAMGSetup` that occurs when it is called multiple times without an intervening call to `hypre_BoomerAMGDestroy`. This pull request makes sure that `smooth_num_levels` is reset to `hypre_ParAMGDataSmoothNumLevels(amg_data)` before the smoothers variable is allocated.
This PR adds new Print and Read functions that store and read matrices and vectors in binary format. A detailed list of changes is given below:
* Add IJMatrix/ParCSRMatrix routines for binary I/O
* Add IJVector/ParVector routines for binary I/O
* Add typedefs for unsigned integer types and single-precision floating-point
* Change char sizes to HYPRE_MAX_FILE_NAME_LEN
* Add options to IJ driver for reading binary matrices/vectors
* Add regression tests for IJ input/output
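To illustrate the idea behind the binary I/O routines, here is a minimal sketch of writing and reading a vector with a header of fixed-width unsigned integers. The header layout and every `sketch_*` name are assumptions made for this example; this is not hypre's file format or API.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical header layout; this is NOT hypre's on-disk format. */
typedef struct
{
   uint64_t version;    /* format version chosen for this sketch */
   uint64_t real_size;  /* sizeof the floating-point type stored */
   uint64_t num_values; /* number of entries that follow the header */
} sketch_header;

/* Write n doubles to path in binary. Returns 0 on success, -1 on failure. */
int sketch_vector_write_binary(const char *path, const double *values, uint64_t n)
{
   sketch_header header = {1, sizeof(double), n};
   FILE *fp = fopen(path, "wb");
   int ok;

   if (!fp) { return -1; }
   ok = fwrite(&header, sizeof(header), 1, fp) == 1 &&
        fwrite(values, sizeof(double), (size_t) n, fp) == (size_t) n;
   if (fclose(fp) != 0) { ok = 0; }
   return ok ? 0 : -1;
}

/* Read a vector written by sketch_vector_write_binary.
 * Returns a malloc'ed array (caller frees) or NULL on failure. */
double *sketch_vector_read_binary(const char *path, uint64_t *n_out)
{
   sketch_header header;
   double *values = NULL;
   FILE *fp = fopen(path, "rb");

   if (!fp) { return NULL; }
   /* Validate the header before trusting the payload size */
   if (fread(&header, sizeof(header), 1, fp) == 1 &&
       header.version == 1 && header.real_size == sizeof(double))
   {
      size_t n = (size_t) header.num_values;
      values = malloc((n ? n : 1) * sizeof(double));
      if (values && fread(values, sizeof(double), n, fp) != n)
      {
         free(values);
         values = NULL;
      }
      if (values) { *n_out = header.num_values; }
   }
   fclose(fp);
   return values;
}
```

Storing the size of the floating-point type in the header lets a reader reject files written with a different precision instead of misinterpreting the payload.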
Allow the use of MAGMA as local linear solver for FSAI.
Add `HYPRE_FSAISetLocalSolveType` for choosing the local linear solve type used in FSAI and add `HYPRE_BoomerAMGSetFSAILocalSolveType` for the case when FSAI is used as a smoother to BoomerAMG.
This PR adds CUDA and HIP support to FSAI based on a static pattern generation algorithm. The resulting method can also be used as a preconditioner for BoomerAMG. A detailed list of changes is given below:
* Add par_fsai_device.c
* Add hypre_FSAIApply
* Add function to dump local linear systems in dense format
* Implement static FSAI pattern computation via powers of A
* Improve filtering of candidate pattern
* Improve local linear systems extraction
* Add option for a 125pt matrix (27pt squared)
* Add options to control sizes of the memory pools with umpire
* Add hypre_GpuProfiling calls
* Improve candidate pattern truncation times
* Add max_nnz_row member and its private and public functions to FSAI
* Use max_nnz_row in FSAISetupDevice
* Add num_levels member and its private and public functions to FSAI
* Add threshold member and its public/private functions to FSAI
* Expose FSAI algorithm type to BoomerAMG
* Expose options to control FSAI setup
* Add cuSOLVER variables and calls
* Add batched dense linear solver calls to FSAI
* Improve execution time for generating random numbers
* Show FSAI parameters when amg_print_level >= 1
* Improve output of FSAIPrintStats
* Implement warp calls
* Add hypre_mask type and hypre_ballot_sync wrapper function
* Add hypre_popc and hypre_ffs wrapper functions
* Implement warp_allreduce_max calls
* Change: hypreDevice -> hypre_*Device
* Add rocSOLVER calls
* Apply astyle
* Remove redundant line
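The static pattern computation via powers of A can be sketched on the host as follows. This is a dense, boolean illustration only: it assumes the candidate pattern is the lower-triangular part of the structural pattern of A^k, and it leaves out the filtering and truncation steps listed above. The function name is hypothetical and this is not the code in par_fsai_device.c.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* a and s are n-by-n row-major arrays of 0/1 flags.
 * On return, s[i*n + j] == 1 iff entry (i,j) of A^power is structurally
 * nonzero and j <= i. Returns 0 on success, -1 on allocation failure. */
int sketch_fsai_static_pattern(int n, const unsigned char *a, int power,
                               unsigned char *s)
{
   size_t nn = (size_t) n * (size_t) n;
   unsigned char *cur  = malloc(nn ? nn : 1);
   unsigned char *next = malloc(nn ? nn : 1);

   if (!cur || !next) { free(cur); free(next); return -1; }

   /* Start from the pattern of A itself (power == 1) */
   memcpy(cur, a, nn);

   /* Each pass multiplies the current pattern by the pattern of A */
   for (int p = 1; p < power; p++)
   {
      memset(next, 0, nn);
      for (int i = 0; i < n; i++)
      {
         for (int k = 0; k < n; k++)
         {
            if (!cur[(size_t) i * n + k]) { continue; }
            for (int j = 0; j < n; j++)
            {
               if (a[(size_t) k * n + j]) { next[(size_t) i * n + j] = 1; }
            }
         }
      }
      unsigned char *tmp = cur; cur = next; next = tmp;
   }

   /* Keep only the lower triangle, including the diagonal */
   for (int i = 0; i < n; i++)
   {
      for (int j = 0; j < n; j++)
      {
         s[(size_t) i * n + j] = (j <= i) ? cur[(size_t) i * n + j] : 0;
      }
   }

   free(cur);
   free(next);
   return 0;
}
```

For a tridiagonal A, power 1 gives one subdiagonal in the pattern and power 2 gives two, which is why larger powers produce denser (and more expensive) approximate inverses.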
Add warnings for Euclid and PILUT redirecting users to hypre-ILU.
Rewrite hypre-ILU overview section.
Add new sections to hypre-ILU documentation: "User-level functions", "ILU as smoother for BoomerAMG", and "GPU support".
Include info about new iterative ILU options.
Update BoomerAMG complex smoothers section.
Change the name "hypre-ILU" to "ILU".
Based on #718, this PR squashes out zero columns of the off-diagonal part of a `hypre_ParCSRMatrix`.
The off-diagonal block (offd) could contain empty columns (columns with no nonzeros), which leave "useless" entries in col_map_offd. This caused problems in the communication on coarser grids when running with a large number of ranks. We added a routine that compresses the zero columns out and shortens col_map_offd. This should reduce communication cost even at higher levels.
Two sources of the empty columns have been located and fixed:
- Truncation after building P
- P^T(AP): only the transpose multiplication part.
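A minimal host-side sketch of the compression step, assuming offd is stored in CSR form with local column indices that point into col_map_offd. The function name and signature are hypothetical and simplified; the actual routine operates on hypre's matrix structures.

```c
#include <assert.h>
#include <stdlib.h>

/* Remove columns of offd that hold no nonzeros.
 * offd_i: CSR row pointer (num_rows + 1 entries)
 * offd_j: local column indices, rewritten in place
 * col_map_offd: global column ids, compacted in place (order preserved)
 * Returns the new number of offd columns, or -1 on allocation failure. */
int sketch_compress_offd_columns(int num_rows, const int *offd_i, int *offd_j,
                                 int num_cols_offd, long long *col_map_offd)
{
   int nnz = offd_i[num_rows];
   int count = 0;
   int *new_index = malloc((num_cols_offd ? (size_t) num_cols_offd : 1) * sizeof(int));

   if (!new_index) { return -1; }

   /* Mark every column as unused, then flag the ones that appear in offd_j */
   for (int c = 0; c < num_cols_offd; c++) { new_index[c] = -1; }
   for (int k = 0; k < nnz; k++) { new_index[offd_j[k]] = 0; }

   /* Assign new consecutive indices to used columns and shorten the map */
   for (int c = 0; c < num_cols_offd; c++)
   {
      if (new_index[c] == 0)
      {
         col_map_offd[count] = col_map_offd[c];
         new_index[c] = count++;
      }
   }

   /* Renumber the column indices of the nonzeros */
   for (int k = 0; k < nnz; k++) { offd_j[k] = new_index[offd_j[k]]; }

   free(new_index);
   return count;
}
```

Because the used columns keep their relative order, a col_map_offd that was sorted before the call stays sorted afterwards.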
---------
Co-authored-by: Noel Chalmers <noel.chalmers@gmail.com>
Co-authored-by: Ruipeng Li <li50@llnl.gov>
Co-authored-by: Wayne Mitchell <mitchell82@llnl.gov>
This PR adds HIP support to hypre_ILU (setup and solve phases):
- Algorithm type 0 (BJ-ILU0)
- Algorithm type 10 (GMRES-ILU0)
- Iterative triangular solves for backward and forward substitutions.
---------
Co-authored-by: Paul Mullowney <Paul.Mullowney@nrel.gov>
This allows users to direct hypre's error messages to a memory buffer instead of stderr. With this, there are now three basic ways to use hypre when configured with `--with-print-errors`:
- Default (mode 0): Errors are printed immediately to stderr (there is no processor information available in this print).
- Store errors in memory (mode 1) and call PrintErrorMessages to print them.
- Store errors in memory (mode 1) and call GetErrorMessages to manage the error messages however you like.
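The difference between the modes can be sketched with a small standalone error log. Every name below (`sketch_error_log`, `sketch_error_report`, and so on) is hypothetical and only mirrors the behavior described above; it is not hypre's interface.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct
{
   int    mode;    /* 0: print to stderr immediately, 1: store in memory */
   char  *buffer;  /* stored messages, one per line, NUL-terminated */
   size_t size;    /* bytes used, excluding the terminating NUL */
} sketch_error_log;

/* Record one error message according to the current mode */
int sketch_error_report(sketch_error_log *log, const char *msg)
{
   size_t len = strlen(msg);
   char *grown;

   if (log->mode == 0)
   {
      fprintf(stderr, "%s\n", msg);
      return 0;
   }

   /* Room for the message, a newline, and the terminating NUL */
   grown = realloc(log->buffer, log->size + len + 2);
   if (!grown) { return -1; }
   memcpy(grown + log->size, msg, len);
   grown[log->size + len] = '\n';
   grown[log->size + len + 1] = '\0';
   log->buffer = grown;
   log->size += len + 1;
   return 0;
}

/* Give the caller read access to the stored messages */
const char *sketch_error_get_messages(const sketch_error_log *log)
{
   return log->buffer ? log->buffer : "";
}

/* Discard all stored messages */
void sketch_error_clear(sketch_error_log *log)
{
   free(log->buffer);
   log->buffer = NULL;
   log->size = 0;
}

/* Print the stored messages to the given stream, then discard them */
void sketch_error_print_messages(sketch_error_log *log, FILE *out)
{
   fputs(sketch_error_get_messages(log), out);
   sketch_error_clear(log);
}
```

With stored messages, the application decides when and where output happens, for example by prefixing each line with the process rank before printing.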
* Use unroll_factor=8 for rocm-5.4.3
* Add SortCSRRocsparse back
* Fix Wunused-variable warnings
* Set _hypre_memory_tracker to NULL after destroy
* Update tioga results after changing default rocm version to 5.2.0
The memory leak was happening when:
* A complex smoother for BoomerAMG was selected.
* The AMG hierarchy consisted of one level.
* The BoomerAMG preconditioner was destroyed and recomputed again.
* Add hypre_State type to track initialization state of hypre
* Add HYPRE_Initialized to determine whether hypre has been initialized
* Add HYPRE_Finalized to determine whether hypre has been finalized
* Add private implementations for hypre_initialized/finalized
* Add HYPRE_Initialize
* Update Fortran interface for HYPRE_Initialize
* Use HYPRE_Initialize in test drivers and examples
* Clean-up mentions of HYPRE_Init
* Add HYPRE_DEPRECATED macro to autotools and CMake builds
* Add regression test for library initialization/finalization
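The state tracking can be pictured as a three-value state machine. The sketch below uses hypothetical `sketch_*` names and assumes that a repeated initialize call is a no-op; it illustrates the idea rather than reproducing hypre's implementation.

```c
#include <assert.h>

/* Library lifecycle states */
typedef enum
{
   SKETCH_STATE_NONE,        /* initialize has not been called yet */
   SKETCH_STATE_INITIALIZED, /* library is ready for use */
   SKETCH_STATE_FINALIZED    /* finalize has been called */
} sketch_state;

static sketch_state sketch_library_state = SKETCH_STATE_NONE;

/* Returns 1 if the library is currently initialized, 0 otherwise */
int sketch_initialized(void)
{
   return sketch_library_state == SKETCH_STATE_INITIALIZED;
}

/* Returns 1 if the library has been finalized, 0 otherwise */
int sketch_finalized(void)
{
   return sketch_library_state == SKETCH_STATE_FINALIZED;
}

/* Assumption of this sketch: calling initialize twice is harmless */
int sketch_initialize(void)
{
   if (sketch_initialized()) { return 0; }
   /* ... global setup would go here ... */
   sketch_library_state = SKETCH_STATE_INITIALIZED;
   return 0;
}

/* Finalizing a library that is not initialized does nothing */
int sketch_finalize(void)
{
   if (!sketch_initialized()) { return 0; }
   /* ... global teardown would go here ... */
   sketch_library_state = SKETCH_STATE_FINALIZED;
   return 0;
}
```

Query functions of this kind let an application, or another library that embeds the solver, check the state before deciding whether it is responsible for initialization and finalization.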
* Update HYPRE_WARP_FULL_MASK to 64-bit length for HIP
* Add hypre_mask type depending on the GPU architecture
* Change unsigned -> hypre_uint. Move a few hypre_int to hypre_uint
* Remove SyncCompute call to fix compilation with device omp
* Fix hypre_SeqVectorAxpyzDevice implementation for device omp
* Add warning for function not implemented for device omp
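As an illustration of an architecture-dependent mask type, the following host-only sketch selects a 64-bit mask when a HIP build flag is set and a 32-bit mask otherwise, and emulates population-count and find-first-set helpers in plain C. The `SKETCH_*` macros and `sketch_*` functions are hypothetical stand-ins; the real wrappers call device intrinsics.

```c
#include <assert.h>
#include <stdint.h>

#if defined(SKETCH_USING_HIP)
/* Assumption: 64 lanes per wavefront, so the full mask needs 64 bits */
typedef uint64_t sketch_mask;
#define SKETCH_WARP_SIZE      64
#define SKETCH_WARP_FULL_MASK 0xFFFFFFFFFFFFFFFFULL
#else
/* 32 lanes per warp, so a 32-bit mask is enough */
typedef uint32_t sketch_mask;
#define SKETCH_WARP_SIZE      32
#define SKETCH_WARP_FULL_MASK 0xFFFFFFFFU
#endif

/* Host emulation of a population count: number of set bits in the mask */
int sketch_popc(sketch_mask mask)
{
   int count = 0;
   while (mask)
   {
      mask &= mask - 1; /* clear the lowest set bit */
      count++;
   }
   return count;
}

/* Host emulation of find-first-set: 1-based position of the lowest
 * set bit, or 0 when no bit is set */
int sketch_ffs(sketch_mask mask)
{
   int pos = 1;
   if (!mask) { return 0; }
   while (!(mask & 1))
   {
      mask >>= 1;
      pos++;
   }
   return pos;
}
```

Hiding the width behind one typedef means code that counts or scans active lanes does not need separate branches per backend.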
Do not enable HYPRE_USING_GPU when using the device openmp backend.
This allows cuda/hip/sycl implementations throughout the code to be grouped
under the HYPRE_USING_GPU macro instead of always combining the
cuda/hip/sycl macros. In addition, some other extraneous macro guards are removed.
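The macro grouping can be sketched as follows. The `SKETCH_*` macros are hypothetical stand-ins for the corresponding build flags, and the function only reports which code path would be compiled.

```c
#include <assert.h>
#include <string.h>

/* One umbrella macro for the cuda/hip/sycl backends.
 * The device OpenMP flag deliberately does not set it. */
#if defined(SKETCH_USING_CUDA) || defined(SKETCH_USING_HIP) || defined(SKETCH_USING_SYCL)
#define SKETCH_USING_GPU 1
#endif

/* Report which implementation this translation unit selects */
const char *sketch_backend_kind(void)
{
#if defined(SKETCH_USING_GPU)
   /* Shared cuda/hip/sycl path guarded by a single macro */
   return "gpu";
#elif defined(SKETCH_USING_DEVICE_OPENMP)
   /* Device OpenMP keeps its own, separate guard */
   return "device-openmp";
#else
   return "host";
#endif
}
```

Compiling with none of the flags defined selects the host path; defining any one of the three GPU flags selects the shared GPU path without repeating the three-way condition at each use site.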
This PR adds device support to various MGR options:
* Non-galerkin coarse grid correction options (except for option 4)
* Block diagonal interpolation (interp_type = 12)
* Block Jacobi relaxation (level_smooth_type = 0 for global relaxation and interp_type = 12 for F-relaxation)
The main code changes are listed below:
* Add hypre_ParCSRMatrixExtractBlockDiagDevice
* Add hypre_ParCSRMatrixExtractBlockDiagDevice and respective GPU kernels
* Add hypre_ParCSRMatrixGenerateFFFCHost and respective backend wrapper
* Add device support to hypre_MGRBuildPBlockJacobi
* Add hypre_ParCSRMatrixBlockDiagMatrixDevice
* Add MGRBuildPFromWpDevice
* Add implementation for batched matrix transpose on the device
* hypre_ParCSRMatrixDropSmallEntriesDevice: exit if tolerance is zero
* Add hypre_ParCSRMatrixGenerateCCCFDevice
* Port MGR's Non-Galerkin option to device
* Add L1-Jacobi global smoother to MGR
* Add missing comments about MGR's public APIs
* Add hypre_MGRComputeNonGalerkinCGDevice
* Update style of hypre_MGRCycle
* Add sanity checks to hypre_SeqVectorElmdivpyMarked
* Add hypre_MGRBlockRelaxSolveDevice
* Add GPUProfiling to several places
* MGR setup: simplify computation of l1_norms
* MGR solve: make use of ParVectorSetZeros to make residual computations faster
* Exit hypre_SeqVectorElmdivpyMarked earlier for vectors with zero size
* Update caliper region names for MGR
* Add wrappers to cublas batched getrf and getri functions
* General performance improvements for MGR
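Block diagonal extraction, which the block Jacobi options above build on, can be sketched on the host for a sequential CSR matrix. The function name is hypothetical, and the sketch assumes a uniform block size that divides the number of rows; it does not reflect the device kernels added in this PR.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Extract the dense diagonal blocks of an n-by-n CSR matrix.
 * ia/ja/a: CSR row pointer, column indices, and values
 * b: block size (must divide n)
 * blocks: output with n*b entries; block k is stored row-major
 *         starting at blocks[k*b*b]
 * Returns 0 on success, -1 if the block size is invalid. */
int sketch_extract_block_diag(int n, int b, const int *ia, const int *ja,
                              const double *a, double *blocks)
{
   if (b <= 0 || n % b != 0) { return -1; }

   /* Entries missing from the sparse matrix stay zero in the dense blocks */
   memset(blocks, 0, (size_t) n * (size_t) b * sizeof(double));

   for (int i = 0; i < n; i++)
   {
      int block = i / b;     /* diagonal block that owns row i */
      int first = block * b; /* first row/column of that block */

      for (int k = ia[i]; k < ia[i + 1]; k++)
      {
         int j = ja[k];
         /* Keep only entries whose column falls inside the block */
         if (j >= first && j < first + b)
         {
            blocks[(size_t) block * b * b + (size_t) (i - first) * b +
                   (size_t) (j - first)] = a[k];
         }
      }
   }
   return 0;
}
```

Once the blocks are available in this dense batched layout, they can be factorized or inverted independently, which is the kind of work batched LU routines are designed for.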