* Remove SyncCompute call to fix compilation with device omp
* Fix hypre_SeqVectorAxpyzDevice implementation for device omp
* Add warning for function not implemented for device omp
Do not enable HYPRE_USING_GPU when using the device openmp backend.
This allows cuda/hip/sycl implementations throughout the code to be grouped
under the HYPRE_USING_GPU macro instead of always combining the cuda/hip/sycl
macros. In addition, some other extraneous macro guards are removed.
This PR adds device support to various MGR options:
* Non-Galerkin coarse grid correction options (except for option 4)
* Block diagonal interpolation (interp_type = 12)
* Block Jacobi relaxation (level_smooth_type = 0 for global relaxation and interp_type = 12 for F-relaxation)
The main code changes are listed below:
* Add hypre_ParCSRMatrixExtractBlockDiagDevice and respective GPU kernels
* Add hypre_ParCSRMatrixGenerateFFFCHost and respective backend wrapper
* Add device support to hypre_MGRBuildPBlockJacobi
* Add hypre_ParCSRMatrixBlockDiagMatrixDevice
* Add MGRBuildPFromWpDevice
* Add implementation for batched matrix transpose on the device
* hypre_ParCSRMatrixDropSmallEntriesDevice: exit if tolerance is zero
* Add hypre_ParCSRMatrixGenerateCCCFDevice
* Port MGR's Non-Galerkin option to device
* Add L1-Jacobi global smoother to MGR
* Add missing comments about MGR's public APIs
* Add hypre_MGRComputeNonGalerkinCGDevice
* Update style of hypre_MGRCycle
* Add sanity checks to hypre_SeqVectorElmdivpyMarked
* Add hypre_MGRBlockRelaxSolveDevice
* Add GPUProfiling to several places
* MGR setup: simplify computation of l1_norms
* MGR solve: make use of ParVectorSetZeros to make residual computations faster
* Exit hypre_SeqVectorElmdivpyMarked earlier for vectors with zero size
* Update caliper region names for MGR
* Add wrappers to cublas batched getrf and getri functions
* General performance improvements for MGR
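The batched matrix transpose mentioned in the list above can be pictured with a host-side sketch. This is only an illustration under stated assumptions: the name `demo_batched_transpose` is hypothetical, the blocks are assumed to be square, dense, row-major, and stored contiguously one after another, and the loop runs on the CPU rather than in a GPU kernel.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical host-side sketch: transpose num_blocks dense bs x bs blocks.
 * Assumes row-major blocks stored back to back; `in` and `out` must not alias. */
void demo_batched_transpose(int num_blocks, int bs, const double *in, double *out)
{
    assert(num_blocks >= 0 && bs >= 0);
    assert(num_blocks == 0 || bs == 0 || (in != NULL && out != NULL));

    for (int b = 0; b < num_blocks; b++)
    {
        size_t offset = (size_t) b * (size_t) bs * (size_t) bs;

        for (int i = 0; i < bs; i++)
        {
            for (int j = 0; j < bs; j++)
            {
                /* Entry (i, j) of the input block becomes entry (j, i) */
                out[offset + (size_t) j * bs + i] = in[offset + (size_t) i * bs + j];
            }
        }
    }
}
```

On a device, each block (or each entry) would typically be handled by its own thread group; the indexing stays the same.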
If hypre_handle() is called before the hypre handle has been initialized, report an error and call HYPRE_Init(), rather than silently calling hypre_HandleCreate() with no error reporting.
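The pattern can be sketched in isolation as follows. All names here (`DemoHandle`, `demo_init`, `demo_handle`, `demo_error_count`) are hypothetical stand-ins, not hypre's internals; the sketch only shows an accessor that records an error before falling back to initialization.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the library handle and its error state */
typedef struct { int initialized; } DemoHandle;

static DemoHandle *g_handle = NULL;
static int         g_error_count = 0;

/* Explicit initialization, playing the role of an Init() entry point */
int demo_init(void)
{
    if (!g_handle)
    {
        g_handle = calloc(1, sizeof *g_handle);
        if (!g_handle) { return 1; }
        g_handle->initialized = 1;
    }
    return 0;
}

/* Accessor: if the handle is missing, record an error and then initialize,
 * instead of quietly creating the handle. */
DemoHandle *demo_handle(void)
{
    if (!g_handle)
    {
        g_error_count++;
        fprintf(stderr, "error: handle accessed before initialization\n");
        demo_init();
    }
    /* Allocation failure is treated as fatal in this sketch */
    assert(g_handle != NULL);
    return g_handle;
}

int demo_error_count(void) { return g_error_count; }
```

The error is recorded only on the first uninitialized access; later calls return the same handle without raising it again.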
This PR has three main goals:
1. Enable ILU to work without UVM on device builds.
2. Add the runtime switch option to ILU. Thus, in device builds, one can now choose whether to execute ILU's setup and solve on the device or on the host.
3. Improve error handling, e.g., when trying to execute on the device ILU types that are not yet supported there.
A summary of the code changes is given below:
* Enable ILU reordering algorithms to work without UVM memory.
* Update hypre_ILUSetupILU0Device with new intArray functions
* Fix memory location of S_offd + style updates
* Simplify hypre_ParILUCusparseILUExtractEBFC
* Remove unused function: hypre_ParILUCusparseExtractDiagonalCSR
* Refactor hypre_ILUGetInteriorExteriorPerm
* Refactor hypre_ILULocalRCM
* Refactor hypre_ILUSortOffdColmap
* Refactor hypre_ILUGetLocalPerm
* Refactor hypre_ILUGetPermddPQ
* Fix memory transfer at hypre_ILUGetInteriorExteriorPerm
* Add error messages for ILUK and ILUT for device runs
* Add error messages for GMRES-ILUK and GMRES-ILUT for device runs
* Add execution policy to ILU setup/solve
* Bug fixes for -exec_host/-memory_host on device builds
* Add ILU reordering option to IJ driver
* Refactor hypre_ILUSetupILUKDevice
* Remove duplicated code
* Bug fix on hypre_ILUGetPermddPQPre
* Update lassen results in accordance with CPU runs
---------
Co-authored-by: Paul Mullowney <Paul.Mullowney@nrel.gov>
This PR changes the type of four variables in the auxiliary matrix data structure to avoid multiplying integers by floating-point numbers in `hypre_IJMatrixSetAddValuesParCSRDevice`.
Fixes for oneMKL sparse matmat and port of our custom SpMV and SpGEMM routines to SYCL. Note that this also involves significant updates to the basic handling of kernel launches in SYCL, due to the need to support multi-dimensional kernels and the use of local shared memory.
This PR adds hypre_SeqVectorResize and hypre_ParVectorResize for resizing sequential and parallel vectors, respectively. This is useful for block-Krylov solvers/eigensolvers using BoomerAMG and multi-component vectors.
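As a rough illustration of what resizing a multi-component vector involves, here is a sketch under assumptions: `DemoVector` and `demo_vector_resize` are hypothetical, the resize is assumed to change the number of components while keeping the per-component length, and storage is assumed to be component-major. The actual hypre signatures and layout may differ.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical multi-component vector (not hypre's hypre_Vector) */
typedef struct
{
    int     size;         /* entries per component */
    int     num_vectors;  /* number of components */
    double *data;         /* component-major storage, size * num_vectors entries */
} DemoVector;

/* Change the number of components. Existing components are preserved and
 * newly added ones are zero-filled. Returns 0 on success, 1 on failure. */
int demo_vector_resize(DemoVector *v, int new_num_vectors)
{
    assert(v != NULL && v->size >= 0 && new_num_vectors > 0);

    size_t old_len = (size_t) v->size * (size_t) v->num_vectors;
    size_t new_len = (size_t) v->size * (size_t) new_num_vectors;

    if (new_len != old_len && new_len > 0)
    {
        double *p = realloc(v->data, new_len * sizeof *p);
        if (!p) { return 1; }

        for (size_t i = old_len; i < new_len; i++)
        {
            p[i] = 0.0;
        }
        v->data = p;
    }
    v->num_vectors = new_num_vectors;

    return 0;
}
```

A block-Krylov solver could call such a routine when the number of right-hand sides changes between solves, reusing the same vector object.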
* Updated gcc compiler flags for strict-checking build option to throw floating point conversion warnings
* Several minor edits to clean up floating point conversion warnings and minor bugs.
* Updated saved files to reflect changes.
This PR adds hypre_ParCSRMatrixDiagScale for computing left and right parallel matrix scaling. The scaling factors are stored as vectors, and the function also works when one of them is not present. Regression tests have been added for this new function.
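For a single sequential CSR matrix, left and right diagonal scaling amounts to A(i,j) <- l(i) * A(i,j) * r(j). The sketch below is a hypothetical host-only illustration (`demo_csr_diag_scale` is not a hypre function) in which a missing scaling vector is passed as NULL and treated as all ones.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch: scale a CSR matrix in place as diag(left) * A * diag(right).
 * Either `left` or `right` may be NULL, in which case that side is left unscaled. */
void demo_csr_diag_scale(int           num_rows,
                         const int    *row_ptr,
                         const int    *col_idx,
                         double       *val,
                         const double *left,
                         const double *right)
{
    assert(num_rows == 0 || (row_ptr != NULL && col_idx != NULL && val != NULL));

    for (int i = 0; i < num_rows; i++)
    {
        double li = left ? left[i] : 1.0;

        for (int k = row_ptr[i]; k < row_ptr[i + 1]; k++)
        {
            double rj = right ? right[col_idx[k]] : 1.0;

            val[k] *= li * rj;
        }
    }
}
```

In the parallel setting, the right scaling of the off-diagonal block additionally needs the entries of the right vector owned by other ranks, which this sketch does not cover.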
Introduce new AddFEMBoxValues() routines to improve system setup time when using the SStruct finite element interface. This initial implementation can produce significant speedups, but there is room for future optimizations.
Co-authored-by: Victor A. P. Magri <paludettomag1@llnl.gov>
This PR adds the function hypre_IntArrayCount for counting the number of occurrences of a value in a hypre_IntArray. Also, it moves device methods to a new file int_array_device.c.
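The host-side logic of such a count is a single pass over the array. The sketch below uses a hypothetical `DemoIntArray` type and `demo_int_array_count` function; it does not reproduce the actual hypre_IntArray layout or signature.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical integer array type (not hypre's hypre_IntArray) */
typedef struct
{
    int    *data;
    size_t  size;
} DemoIntArray;

/* Count how many entries of `a` are equal to `value` */
size_t demo_int_array_count(const DemoIntArray *a, int value)
{
    assert(a != NULL && (a->size == 0 || a->data != NULL));

    size_t count = 0;

    for (size_t i = 0; i < a->size; i++)
    {
        if (a->data[i] == value)
        {
            count++;
        }
    }

    return count;
}
```

A device variant would typically express the same operation as a parallel count or reduction.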
Co-authored-by: Wayne Mitchell <mitchell82@llnl.gov>
This PR cleans up the code to eliminate -Wstrict-prototypes warnings. This flag was also added to the debug build of machine-tux.
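The kind of change this warning asks for looks like the following. The function names are hypothetical; the point is that, before C23, empty parentheses in a declaration do not form a prototype, so the parameter list is spelled out as `(void)`.

```c
#include <assert.h>
#include <stddef.h>

/* Flagged by -Wstrict-prototypes (declaration is not a prototype):
 *
 *     int demo_get_answer();
 *
 * Fixed form: state explicitly that the function takes no arguments. */
int demo_get_answer(void);

int demo_get_answer(void)
{
    return 42;
}

/* The same applies to function pointer types */
typedef int (*demo_callback)(void);

int demo_call(demo_callback cb)
{
    assert(cb != NULL);
    return cb();
}
```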
Co-authored-by: Pierre Jolivet <pierre@joliv.et>
This PR adds two new search paths for the NVIDIA math libraries (cuSPARSE, cuBLAS, cuSOLVER). This fixes build issues on Polaris and Perlmutter.
* Add two new search paths for the NVIDIA math libs to configure
* Turn off CUDA math libs when CUDA is disabled
This PR fixes a few variable type inconsistencies that arose in the mixedint build. Additionally, it fixes the CUDA-11.1.1 build.
* Fix cuSPARSE version tag for using generic SpMM and new SpMV algorithms
* Bug fixes on hypre_ILU: S_row_starts computation and m -> big_m
* Bug fix: HYPRE_MPI_REAL -> HYPRE_MPI_COMPLEX
* Bug fix: HYPRE_Int -> HYPRE_BigInt
* Bug fix: HYPRE_MPI_INT -> HYPRE_MPI_BIG_INT
Co-authored-by: TotoGaz <49004943+TotoGaz@users.noreply.github.com>
* Add changes required for the new AMG benchmark, including a new routine that returns wall clock time and new parameters that report cumulative numbers of nonzeros for A, the coarse grid operators, and the prolongation operators in AMG
Several bug fixes and small changes for the sycl build. Addition of full regression testing on florentia with consistent and correct results for struct and ij tests with sycl backend.
This PR modifies hypre_ParCSRMatrixGenerateFFFC to act as a wrapper between the host and device implementations. Consequently, hypre_ParCSRMatrixGenerateFFFCHost has been added.
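The wrapper structure can be sketched generically. Everything below is hypothetical (`DemoExecPolicy`, `demo_scale`, `demo_last_path`), and the "device" branch is a CPU stand-in so that the example stays self-contained; it only illustrates one public entry point forwarding to a host or device implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical execution policy (not hypre's own enum) */
typedef enum { DEMO_EXEC_HOST, DEMO_EXEC_DEVICE } DemoExecPolicy;

/* Records which implementation ran last; present only for illustration */
DemoExecPolicy demo_last_path = DEMO_EXEC_HOST;

static void demo_scale_host(double *x, int n, double alpha)
{
    demo_last_path = DEMO_EXEC_HOST;
    for (int i = 0; i < n; i++) { x[i] *= alpha; }
}

/* Stand-in for a GPU implementation; runs on the CPU in this sketch */
static void demo_scale_device(double *x, int n, double alpha)
{
    demo_last_path = DEMO_EXEC_DEVICE;
    for (int i = 0; i < n; i++) { x[i] *= alpha; }
}

/* Public wrapper: callers use one entry point, the policy picks the backend */
void demo_scale(DemoExecPolicy policy, double *x, int n, double alpha)
{
    assert(n == 0 || x != NULL);

    if (policy == DEMO_EXEC_DEVICE)
    {
        demo_scale_device(x, n, alpha);
    }
    else
    {
        demo_scale_host(x, n, alpha);
    }
}
```

Keeping the host code in its own function means callers never need to know which backend is active.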