(since parallel Boolean matrices don't support it). Switch back to the old version of cycle type 11.
1587 lines
74 KiB
TeX
1587 lines
74 KiB
TeX
%=============================================================================
|
|
%=============================================================================
|
|
|
|
\chapter{Solvers and Preconditioners}
|
|
\label{ch-Solvers}
|
|
|
|
There are several solvers available in \hypre{} via different
|
|
conceptual interfaces (see Table \ref{table-solver-availability}).
|
|
Note that there are a few additional solvers and preconditioners not
|
|
mentioned in the table that can be used only through the FEI interface
|
|
and are described in Paragraph 6.14.
|
|
The procedure for setup and use of solvers and preconditioners is
|
|
largely the same. We will refer to them both as solvers in the sequel
|
|
except when noted. In normal usage, the preconditioner is chosen and
|
|
constructed before the solver, and then handed to the solver as part
|
|
of the solver's setup. In the following, we assume the most common
|
|
usage pattern in which a single linear system is set up and then
|
|
solved with a single righthand side. We comment later on
|
|
considerations for other usage patterns.
|
|
|
|
\begin{table}[h]
|
|
\center
|
|
\begin{tabular}{|l||c|c|c|c|}
|
|
\hline
|
|
& \multicolumn{4}{|c|}{System Interfaces} \\
|
|
\multicolumn{1}{|c||}{Solvers} & Struct & SStruct & FEI & IJ \\
|
|
\hline\hline
|
|
Jacobi & X & X & & \\
|
|
SMG & X & X & & \\
|
|
PFMG & X & X & & \\
|
|
Split & & X & & \\
|
|
SysPFMG & & X & & \\
|
|
FAC & & X & & \\
|
|
Maxwell & & X & & \\
|
|
BoomerAMG & & X & X & X \\
|
|
AMS & & X & X & X \\
|
|
ADS & & X & X & X \\
|
|
MLI & & X & X & X \\
|
|
ParaSails & & X & X & X \\
|
|
Euclid & & X & X & X \\
|
|
PILUT & & X & X & X \\
|
|
PCG & X & X & X & X \\
|
|
GMRES & X & X & X & X \\
|
|
FlexGMRES & X & X & X & X \\
|
|
LGMRES & X & X & & X \\
|
|
BiCGSTAB & X & X & X & X \\
|
|
Hybrid & X & X & X & X \\
|
|
LOBPCG & X & X & & X \\
|
|
\hline
|
|
\end{tabular}
|
|
\caption{%
|
|
Current solver availability via \hypre{} conceptual interfaces.
|
|
}
|
|
\label{table-solver-availability}
|
|
\end{table}
|
|
|
|
%-----------------------------------------------------------------------------
|
|
|
|
\section*{Setup:}
|
|
|
|
\begin{enumerate}
|
|
|
|
\item
|
|
{\bf Pass to the solver the information defining the problem.} In the
|
|
typical user cycle, the user has passed this information into a matrix
|
|
through one of the conceptual interfaces prior to setting up the
|
|
solver. In this situation, the problem definition information is then
|
|
passed to the solver by passing the constructed matrix into the
|
|
solver. As described before, the matrix and solver must be compatible,
|
|
in that the matrix must provide the services needed by the
|
|
solver. Krylov solvers, for example, need only a matrix-vector
|
|
multiplication. Most preconditioners, on the other hand, have
|
|
additional requirements such as access to the matrix coefficients.
|
|
|
|
\item
|
|
{\bf Create the solver/preconditioner} via the \code{Create()} routine.
|
|
|
|
\item
|
|
{\bf Choose parameters for the preconditioner and/or solver.}
|
|
Parameters are chosen through the \code{Set()} calls provided by the
|
|
solver. Throughout \hypre{}, we have made our best effort to
|
|
give all parameters reasonable defaults if not chosen. However,
|
|
for some preconditioners/solvers the best choices for parameters
|
|
depend on the problem to be solved. We give recommendations in the
|
|
individual sections on how to choose these parameters.
|
|
Note that in \hypre{}, convergence
|
|
criteria can be chosen after the preconditioner/solver has been setup.
|
|
For a complete set of all available parameters see the Reference Manual.
|
|
|
|
\item
|
|
{\bf Pass the preconditioner to the solver.} For solvers that are not
|
|
preconditioned, this step is omitted. The preconditioner is passed
|
|
through the \code{SetPrecond()} call.
|
|
|
|
\item
|
|
{\bf Set up the solver.} This is just the \code{Setup()} routine.
|
|
At this point the matrix and right hand side is passed into the solver
|
|
or preconditioner. Note that the actual right hand side is not used
|
|
until the actual solve is performed.
|
|
|
|
\end{enumerate}
|
|
|
|
At this point, the solver/preconditioner is fully constructed and
|
|
ready for use.
|
|
|
|
%-----------------------------------------------------------------------------
|
|
|
|
\section*{Use:}
|
|
|
|
\begin{enumerate}
|
|
|
|
\item
|
|
{\bf Set convergence criteria.} Convergence can be controlled by the
|
|
number of iterations, as well as various tolerances such as relative
|
|
residual, preconditioned residual, etc. Like all parameters,
|
|
reasonable defaults are used. Users are free to change these, though
|
|
care must be taken. For example, if an iterative method is used as a
|
|
preconditioner for a Krylov method, a constant number of iterations is
|
|
usually required.
|
|
|
|
\item
|
|
{\bf Solve the system.} This is just the \code{Solve()} routine.
|
|
|
|
\end{enumerate}
|
|
|
|
%-----------------------------------------------------------------------------
|
|
|
|
\section*{Finalize:}
|
|
|
|
\begin{enumerate}
|
|
|
|
\item
|
|
{\bf Free the solver or preconditioner.} This is done using the
|
|
\code{Destroy()} routine.
|
|
|
|
\end{enumerate}
|
|
|
|
%-----------------------------------------------------------------------------
|
|
|
|
\section* {Synopsis}
|
|
|
|
In general, a solver (let's call it {\tt SOLVER}) is set up and run using the following routines,
|
|
where A is the matrix, b the right hand side and x the solution vector
|
|
of the linear system to be solved:
|
|
|
|
\begin{display}
|
|
\begin{verbatim}
|
|
/* Create Solver */
|
|
int HYPRE_SOLVERCreate(MPI_COMM_WORLD, &solver);
|
|
|
|
/* set certain parameters if desired */
|
|
HYPRE_SOLVERSetTol(solver, 1.e-8);
|
|
.
|
|
.
|
|
/* Set up Solver */
|
|
HYPRE_SOLVERSetup(solver, A, b, x);
|
|
/* Solve the system */
|
|
HYPRE_SOLVERSolve(solver, A, b, x);
|
|
/* Destroy the solver */
|
|
HYPRE_SOLVERDestroy(solver);
|
|
\end{verbatim}
|
|
\end{display}
|
|
|
|
In the following sections, we will give brief descriptions of the available \hypre{} solvers
|
|
with some suggestions on how to choose the parameters as well as references for users
|
|
who are interested in a more detailed description and analysis of the solvers.
|
|
A complete list of all routines that are available can be found in the reference manual.
|
|
|
|
%-----------------------------------------------------------------------------
|
|
|
|
\section{SMG}
|
|
|
|
SMG is a parallel semicoarsening multigrid solver for the linear
|
|
systems arising from finite difference, finite volume, or finite
|
|
element discretizations of the diffusion equation,
|
|
\begin{equation}
|
|
\nabla \cdot ( D \nabla u ) + \sigma u = f
|
|
\end{equation}
|
|
on logically rectangular grids. The code solves both 2D and 3D
|
|
problems with discretization stencils of up to 9-point in 2D and up to
|
|
27-point in 3D. See
|
|
\cite{SSchaffer_1998a,PNBrown_RDFalgout_JEJones_2000,RDFalgout_JEJones_2000}
|
|
for details on the algorithm and its parallel implementation/performance.
|
|
|
|
SMG is a particularly robust method. The algorithm semicoarsens in
|
|
the z-direction and uses plane smoothing. The xy plane-solves are
|
|
effected by one V-cycle of the 2D SMG algorithm, which semicoarsens in
|
|
the y-direction and uses line smoothing.
|
|
|
|
%-----------------------------------------------------------------------------
|
|
|
|
\section{PFMG}
|
|
|
|
PFMG is a parallel semicoarsening multigrid solver similar to SMG.
|
|
See \cite{SFAshby_RDFalgout_1996,RDFalgout_JEJones_2000} for details
|
|
on the algorithm and its parallel implementation/performance.
|
|
|
|
The main difference between the two methods is in the smoother: PFMG
|
|
uses simple pointwise smoothing. As a result, PFMG is not as robust
|
|
as SMG, but is much more efficient per V-cycle.
|
|
|
|
%-----------------------------------------------------------------------------
|
|
|
|
\section{SysPFMG}
|
|
|
|
SysPFMG is a parallel semicoarsening multigrid solver for systems of
|
|
elliptic PDEs. It is a generalization of PFMG, with the interpolation
|
|
defined only within the same variable. The relaxation is of nodal type-
|
|
all variables at a given point location are simultaneously solved for in the
|
|
relaxation.
|
|
|
|
Although SysPFMG is implemented through the SStruct interface, it can
|
|
be used only for problems with one grid part. Ideally, the solver should
|
|
handle any of the seven variable types (cell-, node-, xface-, yface-, zface-,
|
|
xedge-, yedge-, and zedge-based). However, it has been completed only for cell-based
|
|
variables.
|
|
|
|
%-----------------------------------------------------------------------------
|
|
|
|
\section{SplitSolve}
|
|
|
|
SplitSolve is a parallel block Gauss-Seidel solver for semi-structured
|
|
problems with multiple parts. For problems with only one variable, it can be viewed as
|
|
a domain-decomposition solver with no grid overlapping.
|
|
|
|
Consider a multiple part problem given by the linear system $Ax=b$. Matrix $A$ can
|
|
be decomposed into a structured intra-variable block diagonal component $M$ and a
|
|
component $N$ consisting of the inter-variable blocks and any unstructured connections
|
|
between the parts. SplitSolve performs the iteration
|
|
\[ x_{k+1} = \tilde{M}^{-1} (b + N x_k),\]
|
|
where $\tilde{M}^{-1}$ is a decoupled block-diagonal V(1,1) cycle, a separate cycle for each
|
|
part and variable type. There are two V-cycle options, SMG and PFMG.
|
|
|
|
%-----------------------------------------------------------------------------
|
|
|
|
\section{FAC}
|
|
|
|
FAC is a parallel fast adaptive composite grid solver for finite volume,
|
|
cell-centred discretizations of smooth diffusion coefficient problems.
|
|
To be precise, it is a FACx algorithm since the patch solves consist of only
|
|
relaxation sweeps. For details of the basic overall algorithms, see \cite{SFMcCormick_1989a}.
|
|
Algorithmic particularities include formation of non-Galerkin coarse-grid operators
|
|
(i.e., coarse-grid operators underlying refinement patches are automatically generated)
|
|
and non-stored linear/constant interpolation/restriction operators. Implementation particularities
|
|
include a processor redistribution of the generated coarse-grid operators so that intra-level
|
|
communication between adaptive mesh refinement (AMR) levels during the solve phase
|
|
is kept to a minimum. This redistribution is hidden from the user.
|
|
|
|
The user input is essentially a linear system describing the {\it composite}
|
|
operator, and the refinement factors between the AMR levels. To form this composite linear system,
|
|
the AMR grid is described using semi-structured grid parts. Each AMR level grid corresponds to
|
|
a separate part so that this level grid is simply a collection of boxes, all with the same refinement
|
|
factor, i.e., it is a struct grid. However, several restrictions are imposed on the patch (box) refinements.
|
|
First, a refinement box must cover all underlying coarse cells- i.e., refinement of a partial coarse cell
|
|
is not permitted. Also, the refined/coarse indices must follow a mapping: with $[r_1,r_2,r_3]$ denoting the
|
|
refinement factor and $[a_1,a_2,a_3] \times [b_1,b_2,b_3]$ denoting the coarse subbox to be refined, the
|
|
mapping to the refined patch is
|
|
\[ [r_1*a_1,r_2*a_2,r_3*a_3] \times [r_1*b_1+ r_1-1, r_2*b_2+ r_2-1,r_3*b_3+ r_3-1]. \]
|
|
With the AMR grid constructed under these restrictions,
|
|
the composite matrix can be formed. Since the AMR levels correspond to semi-structured grid parts, the
|
|
composite matrix is a semi-structured matrix consisting of structured components within each part, and
|
|
unstructured components describing the coarse-to-fine/fine-to-coarse connections. The structured and
|
|
unstructured components can be set using stencils and the HYPRE$\_$SStructGraphAddEntries routine, respectively.
|
|
The matrix coefficients can be filled after setting these non-zero patterns. Between each pair of
|
|
successive AMR levels, the coarse matrix underlying the refinement patch must be the identity and the corresponding
|
|
rows of the rhs must be zero. These can performed using routines HYPRE$\_$SStructFACZeroCFSten (to zero off
|
|
the stencil values reaching from coarse boxes into refinement boxes), HYPRE$\_$SStructFACZeroFCSten
|
|
(to zero off the stencil values reaching from refinement boxes into coarse boxes),
|
|
HYPRE$\_$SStructFACZeroAMRMatrixData (to set the identity at coarse grid points underlying a refinement patch),
|
|
and HYPRE$\_$SStructFACZeroAMRVectorData (to zero off a vector at coarse grid points underlying a refinement patch).
|
|
These routines can simplify the user's matrix setup. For example, consider two successive AMR levels with the
|
|
coarser level consisting of one box and the finer level consisting of a collection of boxes. Rather than
|
|
distinguishly setting the stencil values and the identity in the appropriate locations, the user can set the
|
|
stencil values on the whole coarse grid using the HYPRE$\_$SStructMatrixSetBoxValues routine and then
|
|
zero off the appropriate values using the above zeroing routines.
|
|
|
|
The coarse matrix underlying these patches are algebraically generated by operator-collapsing the
|
|
refinement patch operator and the fine-to-coarse coefficients (this is why stencil values reaching
|
|
out of a part must be zeroed). This matrix is re-distributed so that each processor has all of its
|
|
coarse-grid operator.
|
|
|
|
To solve the coarsest AMR level, a PFMG V cycle is used. Note that a minimum of two AMR levels
|
|
are needed for this solver.
|
|
|
|
%-----------------------------------------------------------------------------
|
|
|
|
\section{Maxwell}
|
|
|
|
Maxwell is a parallel solver for edge finite element discretization of the
|
|
curl-curl formulation of the Maxwell equation
|
|
\[ \nabla \times \alpha \nabla \times E + \beta E= f, \beta> 0\]
|
|
on semi-structured grids. Details of the algorithm can be found in \cite{JonesLee_2006}.
|
|
The solver can be viewed as an operator-dependent multiple-coarsening algorithm for the Helmholtz
|
|
decomposition of the error correction. Input to this solver consist of only the linear system
|
|
and a gradient operator. In fact, if the orientation of the edge elements conforms to
|
|
a lexicographical ordering of the nodes of the grid, then the gradient operator can be generated
|
|
with the routine HYPRE$\_$MaxwellGrad: at grid points $(i,j,k)$ and $(i-1,j,k),$
|
|
the produced gradient operator takes values $1$ and $-1$ respectively, which
|
|
is the correct gradient operator for the appropriate edge orientation. Since the gradient operator
|
|
is normalized (i.e., $h$ independent) the edge finite element must also be normalized in the
|
|
discretization.
|
|
|
|
This solver is currently developed for perfectly conducting boundary condition (Dirichlet). Hence, the
|
|
rows and columns of the matrix that corresponding to the grid boundary must be set to the identity or
|
|
zeroed off. This can be achieved with the routines HYPRE$\_$SStructMaxwellPhysBdy and
|
|
HYPRE$\_$SStructMaxwellEliminateRowsCols. The former identifies the ranks of the rows that are located on
|
|
the grid boundary, and the latter adjusts the boundary rows and cols. As usual, the rhs of the linear system
|
|
must also be zeroed off at the boundary rows. This can be done using HYPRE$\_$SStructMaxwellZeroVector.
|
|
|
|
With the adjusted linear system and a gradient operator, the user can form the Maxwell multigrid solver using
|
|
several different edge interpolation schemes. For problems with smooth coefficients, the natural Nedelec
|
|
interpolation operator can be used. This is formed by calling HYPRE$\_$SStructMaxwellSetConstantCoef
|
|
with the flag$>0$ before setting up the solver, otherwise the default edge interpolation is an
|
|
operator-collapsing/element-agglomeration scheme. This is suitable for variable coefficients.
|
|
Also, before setting up the solver, the user must pass the gradient operator, whether user or
|
|
HYPRE$\_$MaxwellGrad generated, with HYPRE$\_$SStructMaxwellSetGrad. After these
|
|
preliminary calls, the Maxwell solver can be setup by calling HYPRE$\_$SStructMaxwellSetup.
|
|
|
|
There are two solver cycling schemes that can be used to solve the linear system. To describe these, one
|
|
needs to consider the augmented system operator
|
|
\begin{eqnarray}
|
|
\bf{A}= \left [
|
|
\begin{array}{ll}
|
|
A_{ee} & A_{en} \\ \rule{0cm}{.5cm}
|
|
A_{ne} & A_{nn}
|
|
\end{array}
|
|
\right ],
|
|
\end{eqnarray}
|
|
where $A_{ee}$ is the stiffness matrix corresponding to the above curl-curl formulation, $A_{nn}$
|
|
is the nodal Poisson operator created by taking the Galerkin product of $A_{ee}$ and the gradient operator,
|
|
and $A_{ne}$ and $A_{en}$ are the nodal-edge coupling operators (see \cite{JonesLee_2006}). The algorithm
|
|
for this Maxwell solver is based on forming a multigrid hierarchy to this augmented system using the block-diagonal
|
|
interpolation operator
|
|
\begin{eqnarray*}
|
|
\bf{P}= \left[ \begin{array}{ll}
|
|
P_e & 0 \\ \rule{0cm}{.5cm}
|
|
0 & P_n
|
|
\end{array}
|
|
\right],
|
|
\end{eqnarray*}
|
|
where $P_e$ and $P_n$ are respectively the edge and nodal interpolation operators determined individually
|
|
from $A_{ee}$ and $A_{nn}.$ Taking a Galerkin product between $\bf{A}$ and $\bf{P}$ produces the next coarse
|
|
augmented operator, which also has the nodal-edge coupling operators. Applying this procedure recursively
|
|
produces nodal-edge coupling operators at all levels. Now, the first solver cycling scheme,
|
|
HYPRE$\_$SStructMaxwellSolve, keeps these coupling operators on all levels of the V-cycle. The second,
|
|
cheaper scheme, HYPRE$\_$SStructMaxwellSolve2, keeps the coupling operators only on the finest level, i.e.,
|
|
separate edge and nodal V-cycles that couple only on the finest level.
|
|
|
|
%-----------------------------------------------------------------------------
|
|
|
|
\section{Hybrid}
|
|
|
|
The hybrid solver is designed to detect whether a multigrid preconditioner
|
|
is needed when solving a linear system and possibly avoid the expensive setup
|
|
of a preconditioner if a system can be solved efficiently with a diagonally
|
|
scaled Krylov solver, e.g. a strongly diagonally dominant system.
|
|
It first uses a diagonally scaled Krylov solver, which can be chosen by the user
|
|
(the default is conjugate gradient, but one should use GMRES if the matrix of the
|
|
linear system to be solved is nonsymmetric). It monitors how fast the Krylov solver
|
|
converges.
|
|
If there is not sufficient progress, the algorithm switches to a preconditioned
|
|
Krylov solver.
|
|
|
|
If used through the \code{Struct} interface, the solver is called StructHybrid
|
|
and can be used with the preconditioners SMG and PFMG (default).
|
|
It is called ParCSRHybrid, if used through the \code{IJ} interface and is used here
|
|
with BoomerAMG.
|
|
The user can determine the average convergence speed by setting a convergence tolerance $0 \leq \theta
|
|
< 1$
|
|
via the routine HYPRE$\_$StructHybridSetConvergenceTol or HYPRE$\_$StructParCSRHybridSetConvergenceTol.
|
|
The default setting is 0.9.
|
|
|
|
The average convergence factor $\rho_i = \left({{\| r_i \|} \over {\| r_0 \|}}\right)^{1/i}$ is
|
|
monitored within the chosen Krylov solver, where $r_i = b - Ax_{i}$ is the $i$-th residual.
|
|
Convergence is considered too slow when
|
|
\begin{equation}
|
|
\left( 1 - {{|\rho_i - \rho_{i-1}|} \over { \max(\rho_i, \rho_{i-1})}} \right) \rho_i > \theta .
|
|
\end{equation}
|
|
When this condition is fulfilled the hybrid solver switches from a diagonally scaled
|
|
Krylov solver to a preconditioned solver.
|
|
|
|
%-----------------------------------------------------------------------------
|
|
|
|
\section{BoomerAMG}
|
|
|
|
BoomerAMG is a parallel implementation of the algebraic multigrid
|
|
method \cite{Ruge_Stueben_1987}.
|
|
It can be used
|
|
both as a solver or as a preconditioner. The user can choose between various
|
|
different parallel coarsening techniques, interpolation and relaxation schemes.
|
|
See
|
|
\cite{VEHenson_UMYang_2002,UMYang_2005} for a detailed description of the
|
|
coarsening
|
|
algorithms, interpolation and relaxation schemes as well as numerical results.
|
|
|
|
\subsection{Parameter Options}
|
|
|
|
Various BoomerAMG functions and options are mentioned below. However, for a complete
|
|
listing and description of all available functions, see the reference manual.
|
|
|
|
|
|
BoomerAMG's Create function differs from the synopsis in that it has only one parameter
|
|
\code{HYPRE_BoomerAMGCreate(HYPRE_Solver *solver)}. It uses the communicator
|
|
of the matrix A.
|
|
|
|
Coarsening can be set by the user using the function \code{HYPRE\_BoomerAMGSetCoarsenType}.
|
|
Various coarsening techniques are available:
|
|
\begin{itemize}
|
|
\item the Cleary-Luby-Jones-Plassman (CLJP) coarsening,
|
|
\item the Falgout coarsening which is a combination of CLJP and the
|
|
classical RS coarsening algorithm (default),
|
|
\item CGC and CGC-E coarsenings \cite{Griebela_1, Griebel_2},
|
|
\item PMIS and HMIS coarsening algorithms which lead to coarsenings with lower complexities \cite{DeSterck_Yang_Heys_2004}
|
|
as well as
|
|
\item aggressive coarsening, which can be applied to any of the coarsening techniques mentioned above and thus achieving much lower complexities and lower memory use \cite{Stueben_1999}.
|
|
\end{itemize}
|
|
To use aggressive coarsening the user has to set the number of levels to which he wants to apply
|
|
aggressive coarsening (starting with the finest level)
|
|
via \code{HYPRE_BoomerAMGSetAggNumLevels}. Since aggressive coarsening requires long range
|
|
interpolation, multipass interpolation is always used on levels with aggressive coarsening.
|
|
|
|
Various interpolation techniques can be set using \code{HYPRE_BoomerAMGSetInterpType}:
|
|
\begin{itemize}
|
|
\item the ``classical'' interpolation as defined in \cite{Ruge_Stueben_1987} (default),
|
|
\item direct interpolation \cite{Stueben_1999},
|
|
\item standard interpolation \cite{Stueben_1999},
|
|
\item multipass interpolation \cite{Stueben_1999},
|
|
\item an extended ``classical'' interpolation, which is a long range interpolation and is recommended to be used with PMIS and HMIS coarsening for harder problems,
|
|
\item Jacobi interpolation \cite{Stueben_1999},
|
|
\item the ``classical'' interpolation modified for hyperbolic PDEs.
|
|
\end{itemize}
|
|
Jacobi interpolation is only use to improve certain interpolation operators and can be
|
|
used with \code{HYPRE_BoomerAMGSetPostInterpType}.
|
|
Since some of the interpolation operators might generate large stencils, it is often possible
|
|
and recommended to control complexity and truncate the interpolation operators
|
|
using \code{HYPRE_BoomerAMGSetTruncFactor} and/or \code{HYPRE_BoomerAMGSetPMaxElmts},
|
|
or \code{HYPRE_BoomerAMGSetJacobiTruncTheshold} (for Jacobi interpolation only).
|
|
|
|
Various relaxation techniques are available:
|
|
\begin{itemize}
|
|
\item weighted Jacobi relaxation,
|
|
\item a hybrid Gauss-Seidel / Jacobi relaxation scheme,
|
|
\item a symmetric hybrid Gauss-Seidel / Jacobi relaxation scheme, and
|
|
\item hybrid block and Schwarz smoothers \cite{UMYang_2004},
|
|
\item ILU and approximate inverse smoothers.
|
|
\end{itemize}
|
|
Point relaxation schemes can be set using \code{HYPRE_BoomerAMGSetRelaxType} or, if
|
|
one wants to specifically set the up cycle, down cycle or the coarsest grid, with
|
|
\code{HYPRE_BoomerAMGSetCycleRelaxType}. To use the more complicated smoothers,
|
|
e.g. block, Schwarz, ILU smoothers, it is necessary to use \code{HYPRE_BoomerAMGSetSmoothType}
|
|
and \code{HYPRE_BoomerAMGSetSmoothNumLevels}. There are further parameter choices for the
|
|
individual smoothers, which are described in the reference manual.
|
|
The default relaxation type is hybrid Gauss-Seidel with CF-relaxation (relax first the C-,
|
|
then the F-points) on the down cycle and FC-relaxation on the up-cycle. Note that if
|
|
BoomerAMG is used as a preconditioner for conjugate gradient, it is necessary to use
|
|
a symmetric smoother such as weighted Jacobi or hybrid symmetric Gauss-Seidel.
|
|
|
|
|
|
If the users wants to solve systems of PDEs and can provide information on
|
|
which variables belong to which function, BoomerAMG's systems AMG version
|
|
can also be used. Functions that enable the user to access the systems AMG version
|
|
are \code{HYPRE_BoomerAMGSetNumFunctions}, \code{HYPRE_BoomerAMGSetDofFunc}
|
|
and \code{HYPRE\_BoomerAMGSetNodal}.
|
|
|
|
For best performance, it might be necessary to set certain parameters, which will
|
|
affect both coarsening and interpolation.
|
|
One important parameter is the strong threshold, which can be set
|
|
using the function \code{HYPRE_BoomerAMGSetStrongThreshold}.
|
|
The default value is 0.25, which appears to be a good choice for 2-dimensional
|
|
problems and the low complexity coarsening algorithms.
|
|
A better choice for 3-dimensional problems appears to be 0.5, if one uses the
|
|
default coarsening algorithm or CLJP. However,
|
|
the choice of the strength threshold is problem dependent and therefore
|
|
there could be better choices than the two suggested ones.
|
|
|
|
%-----------------------------------------------------------------------------
|
|
|
|
\section{AMS}
|
|
\label{AMS}
|
|
|
|
AMS (the Auxiliary-space Maxwell Solver) is a parallel unstructured Maxwell
|
|
solver for edge finite element discretizations of the variational problem
|
|
\begin{equation} \label{ams-maxwell}
|
|
\mbox{Find } {\mathbf u} \in {\mathbf V}_h \>:\qquad
|
|
(\alpha\, \nabla \times {\mathbf u}, \nabla \times {\mathbf v}) +
|
|
(\beta\, {\mathbf u}, {\mathbf v}) = ({\mathbf f}, {\mathbf v})\,,
|
|
\qquad \mbox{for all } {\mathbf v} \in {\mathbf V}_h \,.
|
|
\end{equation}
|
|
Here ${\mathbf V}_h$ is the lowest order Nedelec (edge) finite element space,
|
|
and $\alpha>0$ and $\beta \ge 0$ are scalar, or SPD matrix coefficients.
|
|
AMS was designed to be scalable on problems with variable coefficients,
|
|
and allows for $\beta$ to be zero in part or the whole domain.
|
|
In either case the resulting problem is only semidefinite, and for solvability
|
|
the right-hand side should be chosen to satisfy compatibility conditions.
|
|
|
|
AMS is based on the auxiliary space methods for definite Maxwell
|
|
problems proposed in \cite{xu_H_curl}.
|
|
For more details, see \cite{ams_jcm}.
|
|
|
|
\subsection{Overview}
|
|
Let ${\mathbf A}$ and ${\mathbf b}$ be the stiffness matrix and the
|
|
load vector corresponding to (\ref{ams-maxwell}). Then the resulting
|
|
linear system of interest reads,
|
|
\begin{equation} \label{ams-maxwell-ls}
|
|
{\mathbf A}\, {\mathbf x} = {\mathbf b} \,.
|
|
\end{equation}
|
|
The coefficients $\alpha$ and $\beta$ are naturally associated with
|
|
the ``stiffness'' and ``mass'' terms of ${\mathbf A}$.
|
|
Besides ${\mathbf A}$ and ${\mathbf b}$, AMS requires the following
|
|
additional user input:
|
|
\begin{enumerate}
|
|
\item The discrete gradient matrix $G$ representing the edges of
|
|
the mesh in terms of its vertices. $G$ has as many rows as the number
|
|
of edges in the mesh, with each row having two nonzero entries:
|
|
$+1$ and $-1$ in the columns corresponding to the vertices composing
|
|
the edge. The sign is determined based on the orientation of the edge.
|
|
We require that $G$ includes all (interior and boundary) edges
|
|
and vertices.
|
|
|
|
\item The representations of the constant vector fields $(1,0,0)$,$(0,1,0)$ and
|
|
$(0,0,1)$ in the ${\mathbf V}_h$ basis, given as three vectors: $G_x$, $G_y$, and $G_z$.
|
|
Note that since no boundary conditions are imposed on $G$, the above vectors
|
|
can be computed as $G_x = G x$, $G_y = G y$ and $G_z = G z$, where
|
|
$x$, $y$, and $z$ are vectors representing the coordinates of the vertices of the mesh.
|
|
\end{enumerate}
|
|
|
|
In addition to the above quantities, AMS can utilize
|
|
the following (optional) information:
|
|
\begin{enumerate}
|
|
\item[(3.)] The Poisson matrices $A_\alpha$ and $A_\beta$, corresponding to
|
|
assembling of the forms $(\alpha\, \nabla u, \nabla v)$
|
|
and $(\beta\, \nabla u, \nabla v)$ using standard linear finite elements on
|
|
the same mesh.
|
|
\end{enumerate}
|
|
|
|
\noindent
|
|
Internally, AMS proceeds with the construction of the following additional objects:
|
|
\begin{itemize}
|
|
\item $A_G$ -- a matrix associated with the mass term which is either $G^{\,T} {\mathbf A} G$, or the Poisson matrix $A_\beta$ (if given).
|
|
\item ${\mathbf \Pi}$ -- the matrix representation of the interpolation operator
|
|
from vector linear to edge finite elements.
|
|
\item ${\mathbf A}_{{\mathbf \Pi}}$ -- a matrix associated with the stiffness term which is either ${\mathbf \Pi}^{\,T} {\mathbf A} {\mathbf \Pi}$ or a block-diagonal matrix with diagonal blocks $A_\alpha$ (if given).
|
|
\item $B_G$ and ${\mathbf B}_{{\mathbf \Pi}}$ -- efficient (AMG) solvers for $A_G$ and ${\mathbf A}_{{\mathbf \Pi}}$.
|
|
\end{itemize}
|
|
The solution procedure then is a three-level method using smoothing in
|
|
the original edge space and subspace corrections based on $B_G$ and ${\mathbf B}_{{\mathbf \Pi}}$.
|
|
We can employ a number of options here utilizing various combinations of the smoother
|
|
and solvers in additive or multiplicative fashion.
|
|
If $\beta$ is identically zero one can skip the subspace correction associated
|
|
with $G$, in which case the solver is a two-level method.
|
|
|
|
\subsection{Sample Usage}
|
|
AMS can be used either as a solver or as a preconditioner.
|
|
Below we list the sequence of \hypre{} calls
|
|
needed to create and use it as a solver.
|
|
See example code \file{ex15.c} for a complete implementation.
|
|
We start with the allocation of the \code{HYPRE_Solver} object:
|
|
\begin{display}\begin{verbatim}
|
|
HYPRE_Solver solver;
|
|
HYPRE_AMSCreate(&solver);
|
|
\end{verbatim}\end{display}
|
|
|
|
Next, we set a number of solver parameters. Some of them are
|
|
optional, while others are necessary in order to perform the
|
|
solver setup.
|
|
|
|
AMS offers the option to set the space dimension.
|
|
By default we consider the dimension to be $3$. The only
|
|
other option is $2$, and it can be set with the function given below.
|
|
We note that a 3D solver will still work for a 2D problem,
|
|
but it will be slower and will require more memory than necessary.
|
|
\begin{display}\begin{verbatim}
|
|
HYPRE_AMSSetDimension(solver, dim);
|
|
\end{verbatim}\end{display}
|
|
|
|
The user is required to provide the discrete gradient matrix $G$.
|
|
AMS expects a matrix defined on the whole mesh with no
|
|
boundary edges/nodes excluded. It is essential to {\bf not} impose any boundary
|
|
conditions on $G$.
|
|
Regardless of which \hypre{} conceptual interface was used to construct $G$,
|
|
one can obtain a ParCSR version of it. This is the expected
|
|
format in the following function.
|
|
\begin{display}\begin{verbatim}
|
|
HYPRE_AMSSetDiscreteGradient(solver, G);
|
|
\end{verbatim}\end{display}
|
|
|
|
In addition to $G$, we need one additional piece of information in order
|
|
to construct the solver.
|
|
The user has the option to choose either the coordinates of the vertices
|
|
in the mesh or the representations of the constant vector fields
|
|
in the edge element basis.
|
|
In both cases three \hypre{} parallel vectors should be provided.
|
|
For 2D problems, the user can set the third vector to NULL.
|
|
The corresponding function calls read:
|
|
\begin{display}\begin{verbatim}
|
|
HYPRE_AMSSetCoordinateVectors(solver,x,y,z);
|
|
\end{verbatim}\end{display}
|
|
or
|
|
\begin{display}\begin{verbatim}
|
|
HYPRE_AMSSetEdgeConstantVectors(solver,
|
|
one_zero_zero,
|
|
zero_one_zero,
|
|
zero_zero_one);
|
|
\end{verbatim}\end{display}
|
|
The vectors \code{one_zero_zero}, \code{zero_one_zero} and \code{zero_zero_one}
|
|
above correspond to the constant vector fields $(1,0,0)$, $(0,1,0)$ and $(0,0,1)$.
|
|
|
|
The remaining solver parameters are optional.
|
|
For example, the user can choose a different cycle type by calling
|
|
\begin{display}\begin{verbatim}
|
|
HYPRE_AMSSetCycleType(solver, cycle_type); /* default value: 1 */
|
|
\end{verbatim}\end{display}
|
|
|
|
\noindent
|
|
The available cycle types in AMS are:
|
|
\begin{itemize}
|
|
\item \code{cycle_type=1}: multiplicative solver $(01210)$
|
|
\item \code{cycle_type=2}: additive solver $(0+1+2)$
|
|
\item \code{cycle_type=3}: multiplicative solver $(02120)$
|
|
\item \code{cycle_type=4}: additive solver $(010+2)$
|
|
\item \code{cycle_type=5}: multiplicative solver $(0102010)$
|
|
\item \code{cycle_type=6}: additive solver $(1+020)$
|
|
\item \code{cycle_type=7}: multiplicative solver $(0201020)$
|
|
\item \code{cycle_type=8}: additive solver $(0(1+2)0)$
|
|
\item \code{cycle_type=11}: multiplicative solver $(013454310)$
|
|
\item \code{cycle_type=12}: additive solver $(0+1+3+4+5)$
|
|
\item \code{cycle_type=13}: multiplicative solver $(034515430)$
|
|
\item \code{cycle_type=14}: additive solver $(01(3+4+5)10)$
|
|
\end{itemize}
|
|
Here we use the following convention for the
|
|
three subspace correction methods:
|
|
$0$ refers to smoothing, $1$ stands for BoomerAMG based on $B_G$, and
|
|
$2$ refers to a call to BoomerAMG for ${\mathbf B}_{{\mathbf \Pi}}$.
|
|
The values $3$, $4$ and $5$ refer to the scalar subspaces
|
|
corresponding to the $x$, $y$ and $z$ components of $\mathbf \Pi$.
|
|
|
|
The abbreviation $xyyz$ for $x,y,z \in \{0,1,2,3,4,5\}$
|
|
refers to a multiplicative subspace correction based on solvers $x$, $y$, $y$, and $z$ (in that order).
|
|
The abbreviation $x+y+z$ stands for an additive subspace correction method
|
|
based on $x$, $y$ and $z$ solvers.
|
|
The additive cycles are meant to be used only when AMS is called
|
|
as a preconditioner.
|
|
In our experience the choices \code{cycle_type=1,5,8,11,13} often produced
|
|
fastest solution times, while \code{cycle_type=7} resulted in smallest
|
|
number of iterations.
|
|
|
|
Additional solver parameters, such as the maximum number of iterations,
|
|
the convergence tolerance and the output level, can be set with
|
|
\begin{display}\begin{verbatim}
|
|
HYPRE_AMSSetMaxIter(solver, maxit); /* default value: 20 */
|
|
HYPRE_AMSSetTol(solver, tol); /* default value: 1e-6 */
|
|
HYPRE_AMSSetPrintLevel(solver, print); /* default value: 1 */
|
|
\end{verbatim}\end{display}
|
|
|
|
More advanced parameters, affecting the smoothing and the
|
|
internal AMG solvers, can be set with the following three
|
|
functions:
|
|
\begin{display}\begin{verbatim}
|
|
HYPRE_AMSSetSmoothingOptions(solver, 2, 1, 1.0, 1.0);
|
|
HYPRE_AMSSetAlphaAMGOptions(solver, 10, 1, 3, 0.25, 0, 0);
|
|
HYPRE_AMSSetBetaAMGOptions(solver, 10, 1, 3, 0.25, 0, 0);
|
|
\end{verbatim}\end{display}
|
|
|
|
For (singular) problems where $\beta = 0$ in the whole domain,
|
|
different (in fact simpler) version of the AMS solver is offered.
|
|
To allow for this simplification, use the following \hypre{} call
|
|
\begin{display}\begin{verbatim}
|
|
HYPRE_AMSSetBetaPoissonMatrix(solver, NULL);
|
|
\end{verbatim}\end{display}
|
|
|
|
If $\beta$ is zero only in parts of the domain, the problem is still singular,
|
|
but the AMS solver will try to detect this and construct a non-singular
|
|
preconditioner. Though this often works well in practice, AMS also provides a
|
|
more robust version for solving such singular problems to very low convergence
|
|
tolerances. This version takes advantage of additional information: the list of
|
|
nodes which are interior to a zero-conductivity region provided by the function
|
|
\begin{display}\begin{verbatim}
|
|
HYPRE_AMSSetInteriorNodes(solver, HYPRE_ParVector interior_nodes);
|
|
\end{verbatim}\end{display}
|
|
Based on the \code{interior_nodes} variable, a restricted discrete gradient
|
|
operator $G_0$ is constructed, and AMS is then defined based on the matrix
|
|
${\mathbf A}+\delta G_0^TG_0$ which is non-singular, and a small $\delta>0$ perturbation of
|
|
${\mathbf A}$. When iterating with this preconditioner, it is advantageous to project on
|
|
the compatible subspace $Ker(G_0^T)$. This can be done periodically, or manually
|
|
through the functions
|
|
\begin{display}\begin{verbatim}
|
|
HYPRE_AMSSetProjectionFrequency(solver, int projection_frequency);
|
|
HYPRE_AMSProjectOutGradients(solver, HYPRE_ParVector x);
|
|
\end{verbatim}\end{display}
|
|
|
|
Two additional matrices are constructed in the setup of the
|
|
AMS method---one corresponding to the coefficient $\alpha$
|
|
and another corresponding to $\beta$.
|
|
This may lead to prohibitively high memory requirements, and
|
|
the next two function calls may help to save some memory.
|
|
For example, if the Poisson matrix with coefficient $\beta$
|
|
(denoted by \code{Abeta}) is
|
|
available then one can avoid one matrix construction by calling
|
|
\begin{display}\begin{verbatim}
|
|
HYPRE_AMSSetBetaPoissonMatrix(solver, Abeta);
|
|
\end{verbatim}\end{display}
|
|
Similarly, if the Poisson matrix with coefficient $\alpha$ is available
|
|
(denoted by \code{Aalpha})
|
|
the second matrix construction can also be avoided by calling
|
|
\begin{display}\begin{verbatim}
|
|
HYPRE_AMSSetAlphaPoissonMatrix(solver, Aalpha);
|
|
\end{verbatim}\end{display}
|
|
Note the following regarding these functions:
|
|
\begin{itemize}
|
|
\item Both of them change their input. More specifically,
|
|
the diagonal entries of the input matrix corresponding to eliminated
|
|
degrees of freedom (due to essential boundary conditions)
|
|
are penalized.
|
|
\item It is assumed that their essential boundary conditions of
|
|
${\mathbf A}$, \code{Abeta} and \code{Aalpha} are on the same
|
|
part of the boundary.
|
|
\item \code{HYPRE_AMSSetAlphaPoissonMatrix} forces the
|
|
AMS method to use a simpler, but weaker (in terms of convergence) method.
|
|
With this option, the multiplicative AMS cycle
|
|
is not guaranteed to converge with the
|
|
default parameters. The reason for this is the fact the solver is not
|
|
variationally obtained from the original matrix (it utilizes
|
|
the auxiliary Poisson--like matrices \code{Abeta} and \code{Aalpha}).
|
|
Therefore, it is recommended in this case to use AMS as preconditioner
|
|
only.
|
|
\end{itemize}
|
|
|
|
After the above calls, the solver is ready to be constructed.
|
|
The user has to provide the stiffness matrix ${\mathbf A}$ (in ParCSR format) and
|
|
the \hypre{} parallel vectors ${\mathbf b}$ and ${\mathbf x}$. (The vectors
|
|
are actually not used in the current AMS setup.) The setup call reads,
|
|
\begin{display}\begin{verbatim}
|
|
HYPRE_AMSSetup(solver, A, b, x);
|
|
\end{verbatim}\end{display}
|
|
It is important to note the order of the calling sequence. For example, do {\bf not}
|
|
call \code{HYPRE_AMSSetup} before calling
|
|
\code{HYPRE_AMSSetDiscreteGradient}
|
|
and one of the functions
|
|
\code{HYPRE_AMSSetCoordinateVectors} or \code{HYPRE_AMSSetEdgeConstantVectors}.
|
|
|
|
\noindent
|
|
Once the setup has completed, we can solve the linear system by calling
|
|
\begin{display}\begin{verbatim}
|
|
HYPRE_AMSSolve(solver, A, b, x);
|
|
\end{verbatim}\end{display}
|
|
|
|
\noindent
|
|
Finally, the solver can be destroyed with
|
|
\begin{display}\begin{verbatim}
|
|
HYPRE_AMSDestroy(&solver);
|
|
\end{verbatim}\end{display}
|
|
|
|
\noindent
|
|
More details can be found in the files \file{ams.h} and \file{ams.c}
|
|
located in the \file{parcsr_ls} directory.
|
|
|
|
\subsection{High-order Discretizations}
|
|
In addition to the interface for the lowest-order Nedelec elements described in
|
|
the previous subsections, AMS also provides support for (arbitrary) high-order
|
|
Nedelec element discretizations. Since the robustness of AMS depends on the
|
|
performance of BoomerAMG on the associated (high-order) auxiliary subspace
|
|
problems, we note that the convergence may not be optimal for large polynomial
|
|
degrees $k \geq 1$.
|
|
|
|
In the high-order AMS interface, the user does not need to provide the
|
|
coordinates of the vertices (or the representations of the constant vector
|
|
fields in the edge basis), but instead should construct and pass the Nedelec
|
|
interpolation matrix ${\mathbf \Pi}$ which maps (high-order) vector nodal finite
|
|
elements into the (high-order) Nedelec space. In other words, ${\mathbf \Pi}$ is
|
|
the (parallel) matrix representation of the interpolation mapping from
|
|
$\mathrm{P}_k^3$/$\mathrm{Q}_k^3$ into $\mathrm{ND}_k$, see \cite{xu_H_curl,
|
|
ams_jcm}. We require this matrix as an input, since in the high-order case
|
|
its entries very much depend on the particular choice of the basis functions in
|
|
the edge and nodal spaces, as well as on the geometry of the mesh elements. The
|
|
columns of ${\mathbf \Pi}$ should use a node-based numbering, where the
|
|
$x$/$y$/$z$ components of the first node (vertex or high-order degree of
|
|
freedom) should be listed first, followed by the $x$/$y$/$z$ components of the
|
|
second node and so on (see the documentation of \code{HYPRE_BoomerAMGSetDofFunc}).
|
|
|
|
Similarly to the Nedelec interpolation, the discrete gradient matrix $G$ should
|
|
correspond to the mapping $\varphi \in \mathrm{P}_k^3$/$\mathrm{Q}_k^3 \mapsto
|
|
\nabla \varphi \in \mathrm{ND}_k$, so even though its values are still
|
|
independent of the mesh coordinates, they will not be $\pm 1$, but will be
|
|
determined by the particular form of the high-order basis functions and degrees
|
|
of freedom.
|
|
|
|
With these matrices, the high-order setup procedure is simply
|
|
\begin{display}\begin{verbatim}
|
|
HYPRE_AMSSetDimension(solver, dim);
|
|
HYPRE_AMSSetDiscreteGradient(solver, G);
|
|
HYPRE_AMSSetInterpolations(solver, Pi, NULL, NULL, NULL);
|
|
\end{verbatim}\end{display}
|
|
We remark that the above interface calls can also be used in the lowest-order
|
|
case (or even other types of discretizations such as those based on the second
|
|
family of Nedelec elements), but we recommend calling the previously described
|
|
\code{HYPRE_AMSSetCoordinateVectors} instead, since this allows AMS to handle the
|
|
construction and use of ${\mathbf \Pi}$ internally.
|
|
|
|
|
|
Specifying the monolithic ${\mathbf \Pi}$ limits the AMS cycle type options to
|
|
those less than 10. Alternatively one can separately specify the $x$, $y$ and
|
|
$z$ components of $\mathbf \Pi$:
|
|
\begin{display}\begin{verbatim}
|
|
HYPRE_AMSSetInterpolations(solver, NULL, Pix, Piy, Piz);
|
|
\end{verbatim}\end{display}
|
|
which enables the use of AMS cycle types with index greater than 10. By
|
|
definition, ${\mathbf \Pi}^x \varphi = {\mathbf \Pi} (\varphi,0,0)$, and
|
|
similarly for ${\mathbf \Pi}^y$ and ${\mathbf \Pi}^z$. Each of these matrices
|
|
has the same sparsity pattern as $G$, but their entries depend on the
|
|
coordinates of the mesh vertices.
|
|
|
|
Finally, both ${\mathbf \Pi}$ and its components can be passed to the solver:
|
|
\begin{display}\begin{verbatim}
|
|
HYPRE_AMSSetInterpolations(solver, Pi, Pix, Piy, Piz);
|
|
\end{verbatim}\end{display}
|
|
which will duplicate some memory, but allows for experimentation with all
|
|
available AMS cycle types.
|
|
|
|
%-----------------------------------------------------------------------------
|
|
|
|
\section{ADS}
|
|
\label{ADS}
|
|
|
|
The Auxiliary-space Divergence Solver (ADS) is a parallel unstructured solver
|
|
similar to AMS, but targeting $H(div)$ instead of $H(curl)$ problems. Its usage
|
|
and options are very similar to those of AMS, and in general the relationship
|
|
between ADS and AMS is analogous to that between AMS and AMG.
|
|
|
|
Specifically ADS was designed for the scalable solution of linear systems
|
|
arising in the finite element discretization of the variational problem
|
|
\begin{equation} \label{ads-hdiv}
|
|
\mbox{Find } {\mathbf u} \in {\mathbf W}_h \>:\qquad
|
|
(\alpha\, \nabla \cdot {\mathbf u}, \nabla \cdot {\mathbf v}) +
|
|
(\beta\, {\mathbf u}, {\mathbf v}) = ({\mathbf f}, {\mathbf v})\,,
|
|
\qquad \mbox{for all } {\mathbf v} \in {\mathbf W}_h \,,
|
|
\end{equation}
|
|
where ${\mathbf W}_h$ is the lowest order Raviart-Thomas (face) finite element
|
|
space, and $\alpha>0$ and $\beta>0$ are scalar, or SPD matrix variable
|
|
coefficients. It is based on the auxiliary space methods for $H(div)$ problems
|
|
proposed in \cite{xu_H_curl}.
|
|
% For more details, see \cite{h_curl_amg_report,ams_report}.
|
|
|
|
\subsection{Overview}
|
|
Let ${\mathbf A}$ and ${\mathbf b}$ be the stiffness matrix and the
|
|
load vector corresponding to (\ref{ads-hdiv}). Then the resulting
|
|
linear system of interest reads,
|
|
\begin{equation} \label{ads-hdiv-ls}
|
|
{\mathbf A}\, {\mathbf x} = {\mathbf b} \,.
|
|
\end{equation}
|
|
The coefficients $\alpha$ and $\beta$ are naturally associated with
|
|
the ``stiffness'' and ``mass'' terms of ${\mathbf A}$.
|
|
Besides ${\mathbf A}$ and ${\mathbf b}$, ADS requires the following
|
|
additional user input:
|
|
\begin{enumerate}
|
|
\item The discrete curl matrix $C$ representing the faces of the mesh in terms
|
|
of its edges. $C$ has as many rows as the number of faces in the mesh, with
|
|
each row having nonzero entries $+1$ and $-1$ in the columns corresponding to
|
|
the edges composing the face. The sign is determined based on the orientation
|
|
of the edges relative to the face. We require that $C$ includes all (interior
|
|
and boundary) faces and edges.
|
|
|
|
\item The discrete gradient matrix $G$ representing the edges of the mesh in
|
|
terms of its vertices. $G$ has as many rows as the number of edges in the
|
|
mesh, with each row having two nonzero entries: $+1$ and $-1$ in the columns
|
|
corresponding to the vertices composing the edge. The sign is determined based
|
|
on the orientation of the edge. We require that $G$ includes all (interior
|
|
and boundary) edges and vertices.
|
|
|
|
\item Vectors $x$, $y$, and $z$ representing the coordinates of the vertices of
|
|
the mesh.
|
|
\end{enumerate}
|
|
|
|
\noindent
|
|
Internally, ADS proceeds with the construction of the following additional objects:
|
|
\begin{itemize}
|
|
\item $A_C$ -- the curl-curl matrix $C^{\,T} {\mathbf A} C$.
|
|
\item ${\mathbf \Pi}$ -- the matrix representation of the interpolation operator
|
|
from vector linear to face finite elements.
|
|
\item ${\mathbf A}_{{\mathbf \Pi}}$ -- the vector nodal matrix ${\mathbf \Pi}^{\,T} {\mathbf A} {\mathbf \Pi}$.
|
|
\item $B_C$ and ${\mathbf B}_{{\mathbf \Pi}}$ -- efficient (AMS/AMG) solvers for $A_C$ and ${\mathbf A}_{{\mathbf \Pi}}$.
|
|
\end{itemize}
|
|
The solution procedure then is a three-level method using smoothing in the
|
|
original face space and subspace corrections based on $B_C$ and ${\mathbf
|
|
B}_{{\mathbf \Pi}}$. We can employ a number of options here utilizing various
|
|
combinations of the smoother and solvers in additive or multiplicative fashion.
|
|
|
|
\subsection{Sample Usage}
|
|
ADS can be used either as a solver or as a preconditioner. Below we list the
|
|
sequence of \hypre{} calls needed to create and use it as a solver. We start
|
|
with the allocation of the \code{HYPRE_Solver} object:
|
|
\begin{display}\begin{verbatim}
|
|
HYPRE_Solver solver;
|
|
HYPRE_ADSCreate(&solver);
|
|
\end{verbatim}\end{display}
|
|
|
|
Next, we set a number of solver parameters. Some of them are optional, while
|
|
others are necessary in order to perform the solver setup.
|
|
|
|
The user is required to provide the discrete curl and gradient matrices $C$ and
|
|
$G$. ADS expects a matrix defined on the whole mesh with no boundary
|
|
faces, edges or nodes excluded. It is essential to {\bf not} impose any boundary
|
|
conditions on $C$ or $G$. Regardless of which \hypre{} conceptual interface
|
|
was used to construct the matrices, one can always obtain a ParCSR version of
|
|
them. This is the expected format in the following functions.
|
|
\begin{display}\begin{verbatim}
|
|
HYPRE_ADSSetDiscreteCurl(solver, C);
|
|
HYPRE_ADSSetDiscreteGradient(solver, G);
|
|
\end{verbatim}\end{display}
|
|
|
|
Next, ADS requires the coordinates of the vertices in the mesh as three
|
|
\hypre{} parallel vectors. The corresponding function call reads:
|
|
\begin{display}\begin{verbatim}
|
|
HYPRE_ADSSetCoordinateVectors(solver, x, y, z);
|
|
\end{verbatim}\end{display}
|
|
|
|
|
|
The remaining solver parameters are optional. For example, the user can choose
|
|
a different cycle type by calling
|
|
\begin{display}\begin{verbatim}
|
|
HYPRE_ADSSetCycleType(solver, cycle_type); /* default value: 1 */
|
|
\end{verbatim}\end{display}
|
|
|
|
\noindent
|
|
The available cycle types in ADS are:
|
|
\begin{itemize}
|
|
\item \code{cycle_type=1}: multiplicative solver $(01210)$
|
|
\item \code{cycle_type=2}: additive solver $(0+1+2)$
|
|
\item \code{cycle_type=3}: multiplicative solver $(02120)$
|
|
\item \code{cycle_type=4}: additive solver $(010+2)$
|
|
\item \code{cycle_type=5}: multiplicative solver $(0102010)$
|
|
\item \code{cycle_type=6}: additive solver $(1+020)$
|
|
\item \code{cycle_type=7}: multiplicative solver $(0201020)$
|
|
\item \code{cycle_type=8}: additive solver $(0(1+2)0)$
|
|
\item \code{cycle_type=11}: multiplicative solver $(013454310)$
|
|
\item \code{cycle_type=12}: additive solver $(0+1+3+4+5)$
|
|
\item \code{cycle_type=13}: multiplicative solver $(034515430)$
|
|
\item \code{cycle_type=14}: additive solver $(01(3+4+5)10)$
|
|
\end{itemize}
|
|
Here we use the following convention for the three subspace correction methods:
|
|
$0$ refers to smoothing, $1$ stands for AMS based on $B_C$, and $2$ refers to a
|
|
call to BoomerAMG for ${\mathbf B}_{{\mathbf \Pi}}$. The values $3$, $4$ and
|
|
$5$ refer to the scalar subspaces corresponding to the $x$, $y$ and $z$
|
|
components of $\mathbf \Pi$.
|
|
|
|
The abbreviation $xyyz$ for $x,y,z \in \{0,1,2,3,4,5\}$ refers to a
|
|
multiplicative subspace correction based on solvers $x$, $y$, $y$, and $z$ (in
|
|
that order). The abbreviation $x+y+z$ stands for an additive subspace
|
|
correction method based on $x$, $y$ and $z$ solvers. The additive cycles are
|
|
meant to be used only when ADS is called as a preconditioner. In our experience
|
|
the choices \code{cycle_type=1,5,8,11,13} often produced fastest solution times,
|
|
while \code{cycle_type=7} resulted in smallest number of iterations.
|
|
|
|
Additional solver parameters, such as the maximum number of iterations, the
|
|
convergence tolerance and the output level, can be set with
|
|
\begin{display}\begin{verbatim}
|
|
HYPRE_ADSSetMaxIter(solver, maxit); /* default value: 20 */
|
|
HYPRE_ADSSetTol(solver, tol); /* default value: 1e-6 */
|
|
HYPRE_ADSSetPrintLevel(solver, print); /* default value: 1 */
|
|
\end{verbatim}\end{display}
|
|
|
|
More advanced parameters, affecting the smoothing and the internal AMS and AMG
|
|
solvers, can be set with the following three functions:
|
|
\begin{display}\begin{verbatim}
|
|
HYPRE_ADSSetSmoothingOptions(solver, 2, 1, 1.0, 1.0);
|
|
HYPRE_ADSSetAMSOptions(solver, 11, 10, 1, 3, 0.25, 0, 0);
|
|
HYPRE_ADSSetAMGOptions(solver, 10, 1, 3, 0.25, 0, 0);
|
|
\end{verbatim}\end{display}
|
|
We note that the AMS cycle type, which is the second parameter of
|
|
\code{HYPRE_ADSSetAMSOptions} should be greater than 10, unless the high-order
|
|
interface of \code{HYPRE_ADSSetInterpolations} described in the next subsection
|
|
is being used.
|
|
|
|
After the above calls, the solver is ready to be constructed. The user has to
|
|
provide the stiffness matrix ${\mathbf A}$ (in ParCSR format) and the \hypre{}
|
|
parallel vectors ${\mathbf b}$ and ${\mathbf x}$. (The vectors are actually not
|
|
used in the current ADS setup.) The setup call reads,
|
|
\begin{display}\begin{verbatim}
|
|
HYPRE_ADSSetup(solver, A, b, x);
|
|
\end{verbatim}\end{display}
|
|
It is important to note the order of the calling sequence. For example, do {\bf not}
|
|
call \code{HYPRE_ADSSetup} before calling each of the functions
|
|
\code{HYPRE_ADSSetDiscreteCurl}, \code{HYPRE_ADSSetDiscreteGradient}
|
|
and \code{HYPRE_ADSSetCoordinateVectors}.
|
|
|
|
\noindent
|
|
Once the setup has completed, we can solve the linear system by calling
|
|
\begin{display}\begin{verbatim}
|
|
HYPRE_ADSSolve(solver, A, b, x);
|
|
\end{verbatim}\end{display}
|
|
|
|
\noindent
|
|
Finally, the solver can be destroyed with
|
|
\begin{display}\begin{verbatim}
|
|
HYPRE_ADSDestroy(&solver);
|
|
\end{verbatim}\end{display}
|
|
|
|
\noindent
|
|
More details can be found in the files \file{ads.h} and \file{ads.c}
|
|
located in the \file{parcsr_ls} directory.
|
|
|
|
\subsection{High-order Discretizations}
|
|
Similarly to AMS, ADS also provides support for (arbitrary) high-order $H(div)$
|
|
discretizations. Since the robustness of ADS depends on the performance of AMS
|
|
and BoomerAMG on the associated (high-order) auxiliary subspace problems, we
|
|
note that the convergence may not be optimal for large polynomial degrees $k
|
|
\geq 1$.
|
|
|
|
In the high-order ADS interface, the user does not need to provide the
|
|
coordinates of the vertices, but instead should construct and pass the
|
|
Raviart-Thomas and Nedelec interpolation matrices ${\mathbf \Pi}_{RT}$ and
|
|
${\mathbf \Pi}_{ND}$ which map (high-order) vector nodal finite elements into
|
|
the (high-order) Raviart-Thomas and Nedelec space. In other words, these are the
|
|
(parallel) matrix representation of the interpolation mappings from
|
|
$\mathrm{P}_k^3$/$\mathrm{Q}_k^3$ into $\mathrm{RT}_{k-1}$ and $\mathrm{ND}_k$,
|
|
see \cite{xu_H_curl, ams_jcm}. We require these matrices as inputs, since in
|
|
the high-order case their entries very much depend on the particular choice of
|
|
the basis functions in the finite element spaces, as well as on the geometry of
|
|
the mesh elements. The columns of the ${\mathbf \Pi}$ matrices should use a
|
|
node-based numbering, where the $x$/$y$/$z$ components of the first node (vertex
|
|
or high-order degree of freedom) should be listed first, followed by the
|
|
$x$/$y$/$z$ components of the second node and so on (see the documentation of
|
|
\code{HYPRE_BoomerAMGSetDofFunc}). Furthermore, each interpolation matrix can be
|
|
split into $x$, $y$ and $z$ components by defining ${\mathbf \Pi}^x \varphi =
|
|
{\mathbf \Pi} (\varphi,0,0)$, and similarly for ${\mathbf \Pi}^y$ and ${\mathbf
|
|
\Pi}^z$.
|
|
|
|
The discrete gradient and curl matrices $G$ and $C$ should correspond to the
|
|
mappings $\varphi \in \mathrm{P}_k^3$/$\mathrm{Q}_k^3 \mapsto \nabla \varphi \in
|
|
\mathrm{ND}_k$ and ${\mathbf u} \in \mathrm{ND}_k \mapsto \nabla \times {\mathbf
|
|
u} \in \mathrm{RT}_{k-1}$, so even though their values are still independent
|
|
of the mesh coordinates, they will not be $\pm 1$, but will be determined by the
|
|
particular form of the high-order basis functions and degrees of freedom.
|
|
|
|
With these matrices, the high-order setup procedure is simply
|
|
\begin{display}\begin{verbatim}
|
|
HYPRE_ADSSetDiscreteCurl(solver, C);
|
|
HYPRE_ADSSetDiscreteGradient(solver, G);
|
|
HYPRE_ADSSetInterpolations(solver, RT_Pi, NULL, NULL, NULL,
|
|
ND_Pi, NULL, NULL, NULL);
|
|
\end{verbatim}\end{display}
|
|
We remark that the above interface calls can also be used in the lowest-order
|
|
case (or even other types of discretizations), but we recommend calling the
|
|
previously described \code{HYPRE_ADSSetCoordinateVectors} instead, since this
|
|
allows ADS to handle the construction and use of the interpolations internally.
|
|
|
|
|
|
Specifying the monolithic ${\mathbf \Pi}_{RT}$ limits the ADS cycle type options
|
|
to those less than 10. Alternatively one can separately specify the $x$, $y$
|
|
and $z$ components of ${\mathbf \Pi}_{RT}$.
|
|
\begin{display}\begin{verbatim}
|
|
HYPRE_ADSSetInterpolations(solver, NULL, RT_Pix, RT_Piy, RT_Piz,
|
|
ND_Pi, NULL, NULL, NULL);
|
|
\end{verbatim}\end{display}
|
|
which enables the use of ADS cycle types with index greater than 10. The same holds for
|
|
${\mathbf \Pi}_{ND}$ and its components, e.g. to enable the subspace AMS cycle type greater
|
|
then 10 we need to call
|
|
\begin{display}\begin{verbatim}
|
|
HYPRE_ADSSetInterpolations(solver, NULL, RT_Pix, RT_Piy, RT_Piz,
|
|
NULL, ND_Pix, ND_Piy, ND_Piz);
|
|
\end{verbatim}\end{display}
|
|
Finally, both ${\mathbf \Pi}$ and their components can be passed to the solver:
|
|
\begin{display}\begin{verbatim}
|
|
HYPRE_ADSSetInterpolations(solver, RT_Pi, RT_Pix, RT_Piy, RT_Piz
|
|
ND_Pi, ND_Pix, ND_Piy, ND_Piz);
|
|
\end{verbatim}\end{display}
|
|
which will duplicate some memory, but allows for experimentation with all
|
|
available ADS and AMS cycle types.
|
|
|
|
%-----------------------------------------------------------------------------
|
|
|
|
\section{The MLI Package}
|
|
|
|
MLI is an object-oriented module that implements the class of algebraic
|
|
multigrid algorithms based on Vanek and Brezina's smoothed aggregation
|
|
method \cite{VaMB96, VaBM01}. There are two main algorithms in this module -
|
|
the original
|
|
smoothed aggregation algorithm and the modified version that uses
|
|
the finite element substructure matrices to construct the prolongation
|
|
operators. As such, the later algorithm can only be used in the
|
|
finite element context via the finite element interface. In addition,
|
|
the nodal coordinates obtained via the finite element interface can be
|
|
used to construct a better prolongation operator than the pure
|
|
translation modes.
|
|
|
|
Below is an example on how to set up MLI as a preconditioner
|
|
for conjugate gradient.
|
|
|
|
\begin{display}\begin{verbatim}
|
|
HYPRE_LSI_MLICreate(MPI_COMM_WORLD, &pcg_precond);
|
|
|
|
HYPRE_LSI_MLISetParams(pcg_precond, "MLI strengthThreshold 0.08");
|
|
...
|
|
|
|
HYPRE_PCGSetPrecond(pcg_solver,
|
|
(HYPRE_PtrToSolverFcn) HYPRE_LSI_MLISolve,
|
|
(HYPRE_PtrToSolverFcn) HYPRE_LSI_MLISetup,
|
|
pcg_precond);
|
|
\end{verbatim}\end{display}
|
|
|
|
\noindent
|
|
Note that parameters are set via \code{HYPRE_LSI_MLISetParams}. A list of
|
|
valid parameters that can be set using this routine can be found in the
|
|
FEI section of the reference manual.
|
|
|
|
%Details of how to set this preconditioners up can be
|
|
%found in the next section.
|
|
|
|
|
|
%-----------------------------------------------------------------------------
|
|
|
|
\section{ParaSails}
|
|
|
|
ParaSails is a parallel implementation of a sparse approximate inverse
|
|
preconditioner, using {\em a priori} sparsity patterns and least-squares
|
|
(Frobenius norm) minimization. Symmetric positive definite (SPD) problems
|
|
are handled using a factored SPD sparse approximate inverse. General
|
|
(nonsymmetric and/or indefinite) problems are handled with an
|
|
unfactored sparse approximate inverse. It is also possible to
|
|
precondition nonsymmetric but definite matrices with a factored, SPD
|
|
preconditioner.
|
|
|
|
ParaSails uses {\em a priori} sparsity patterns that are patterns of powers
|
|
of sparsified matrices. ParaSails also uses a post-filtering technique
|
|
to reduce the cost of applying the preconditioner.
|
|
In advanced usage not described here, the pattern of the
|
|
preconditioner can also be reused to generate preconditioners for different
|
|
matrices in a sequence of linear solves.
|
|
|
|
For more details about the ParaSails algorithm, see \cite{EChow_2000}.
|
|
|
|
%-----------------------------
|
|
|
|
\subsection{Parameter Settings}
|
|
|
|
The accuracy and cost of ParaSails are parametrized by the real {\em thresh}
|
|
and integer {\em nlevels} parameters,
|
|
$0 \le {\it thresh} \le 1$, $0 \le {\it nlevels}$.
|
|
Lower values of {\em thresh}
|
|
and higher values of {\em nlevels} lead to more accurate, but more expensive
|
|
preconditioners. More accurate preconditioners are also more expensive
|
|
per iteration. The default values are ${\it thresh} = 0.1$
|
|
and ${\it nlevels} = 1$. The parameters are set using
|
|
\code{HYPRE_ParaSailsSetParams}.
|
|
|
|
Mathematically, given a symmetric matrix $A$, the pattern of the
|
|
approximate inverse is the pattern of $\tilde{A}^m$ where $\tilde{A}$
|
|
is a matrix that has been sparsified from $A$. The sparsification
|
|
is performed by dropping all entries in a symmetrically diagonally scaled $A$
|
|
whose values are less than {\em thresh} in magnitude. The parameter
|
|
{\em nlevel} is equivalent to $m+1$.
|
|
Filtering is a post-thresholding procedure.
|
|
For more details about the algorithm, see \cite{EChow_2000}.
|
|
|
|
The storage required for the ParaSails preconditioner depends on
|
|
the parameters {\em thresh} and {\em nlevels}. The default parameters
|
|
often produce a preconditioner that can be stored in less than the
|
|
space required to store the original matrix.
|
|
ParaSails does not need a large amount of intermediate storage in
|
|
order to construct the preconditioner.
|
|
|
|
%-----------------------------
|
|
|
|
ParaSail's Create function differs from the synopsis in the following way:
|
|
|
|
\begin{display}
|
|
\begin{verbatim}
|
|
int HYPRE_ParaSailsCreate(MPI_Comm comm, HYPRE_Solver *solver,
|
|
int symmetry);
|
|
\end{verbatim}
|
|
\end{display}
|
|
where \code{comm} is the MPI communicator.
|
|
|
|
The value of \code{symmetry} has the following meanings, to indicate
|
|
the symmetry and definiteness of the problem, and to specify the
|
|
type of preconditioner to construct:
|
|
\begin{center}
|
|
\begin{tabular}{|c|l|} \hline
|
|
value & meaning \\ \hline
|
|
0 & nonsymmetric and/or indefinite problem, and nonsymmetric preconditioner \\
|
|
1 & SPD problem, and SPD (factored) preconditioner \\
|
|
2 & nonsymmetric, definite problem, and SPD (factored) preconditioner \\
|
|
\hline
|
|
\end{tabular}
|
|
\end{center}
|
|
For more information about the final case, see section \ref{nearly}.
|
|
|
|
Parameters for setting up the preconditioner are specified using
|
|
\begin{display}
|
|
\begin{verbatim}
|
|
int HYPRE_ParaSailsSetParams(HYPRE_Solver solver,
|
|
double thresh, int nlevel, double filter);
|
|
\end{verbatim}
|
|
\end{display}
|
|
|
|
The parameters are used to specify the sparsity pattern and filtering value
|
|
(see above), and are described with suggested values as follows:
|
|
|
|
\begin{center}
|
|
\begin{tabular}{|c|c|c|c|c|l|} \hline
|
|
parameter & type & range & sug. values & default & meaning \\ \hline
|
|
{\tt nlevel} & integer & ${\tt nlevel} \ge 0$ & 0, 1, 2 & 1 & $m={\tt nlevel}+1$\\
|
|
\hline
|
|
{\tt thresh} & real & ${\tt thresh} \ge 0$ & 0, 0.1, 0.01 & 0.1 & {\em thresh} $=$ {\tt thresh}\\
|
|
& & ${\tt thresh} < 0$ & -0.75, -0.90 & & {\em thresh} selected automatically\\
|
|
\hline
|
|
{\tt filter} & real & ${\tt filter} \ge 0$ & 0, 0.05, 0.001 & 0.05 & filter value $=$ {\tt filter}\\
|
|
& & ${\tt filter} < 0$ & -0.90 & & filter value selected automatically\\
|
|
\hline
|
|
\end{tabular}
|
|
\end{center}
|
|
|
|
When ${\tt thresh} < 0$, then a threshold is selected such that
|
|
$-{\tt thresh}$ represents the fraction of the nonzero elements
|
|
that are dropped. For example, if ${\tt thresh} = -0.9$ then
|
|
$\tilde{A}$ will contain approximately ten percent of the nonzeros
|
|
in $A$.
|
|
|
|
When ${\tt filter} < 0$, then a filter value is selected such that
|
|
$-{\tt filter}$ represents the fraction of the nonzero elements
|
|
that are dropped. For example, if ${\tt filter} = -0.9$ then
|
|
approximately 90 percent of the entries in the computed approximate
|
|
inverse are dropped.
|
|
|
|
%-----------------------------
|
|
|
|
\subsection{Preconditioning Nearly Symmetric Matrices}
|
|
\label{nearly}
|
|
|
|
A nonsymmetric, but definite and nearly symmetric matrix $A$
|
|
may be preconditioned
|
|
with a symmetric preconditioner $M$. Using a symmetric preconditioner
|
|
has a few advantages, such as guaranteeing positive
|
|
definiteness of the preconditioner, as well as being less expensive
|
|
to construct.
|
|
|
|
The nonsymmetric matrix $A$ must be definite,
|
|
i.e., $(A+A^T)/2$ is SPD, and the {\em a priori} sparsity pattern to be used
|
|
must be symmetric. The latter may be guaranteed by 1)
|
|
constructing the sparsity pattern with a symmetric matrix, or 2) if the
|
|
matrix is structurally symmetric (has symmetric pattern), then
|
|
thresholding to construct the pattern is not used (i.e.,
|
|
zero value of the {\tt thresh} parameter is used).
|
|
|
|
%-----------------------------------------------------------------------------
|
|
|
|
\section{Euclid}
|
|
|
|
The Euclid library is a scalable implementation of the Parallel ILU algorithm
|
|
that was presented at SC99~\cite{DHysom_APothen_1999}, and published in
|
|
expanded form in the SIAM Journal on Scientific
|
|
Computing~\cite{DHysom_APothen_2001}. By {\em scalable} we mean that the
|
|
factorization (setup) and application (triangular solve) timings remain nearly
|
|
constant when the global problem size is scaled in proportion to the number of
|
|
processors. As with all ILU preconditioning methods, the number of iterations
|
|
is expected to increase with global problem size.
|
|
|
|
Experimental results have shown that PILU preconditioning is in general
|
|
more effective than Block Jacobi preconditioning
|
|
for minimizing total solution time.
|
|
For scaled problems, the relative advantage appears to increase
|
|
as the number of processors is scaled upwards.
|
|
Euclid may also be used to good advantage as a smoother within
|
|
multigrid methods.
|
|
|
|
%-----------------------------
|
|
|
|
\subsection{Overview}
|
|
|
|
Euclid is best thought of as an ``extensible ILU preconditioning
|
|
framework.''
|
|
{\em Extensible} means that Euclid can (and eventually will, time and
|
|
contributing agencies permitting) support many variants of ILU($k$)
|
|
and ILUT preconditioning.
|
|
(The current release includes Block Jacobi ILU($k$) and
|
|
Parallel ILU($k$) methods.)
|
|
Due to this extensibility, and also because Euclid was developed
|
|
independently of the \hypre{} project, the methods by which one
|
|
passes runtime parameters to Euclid preconditioners
|
|
differ in some respects from the \hypre{} norm.
|
|
While users can directly set options within their code,
|
|
options can also be passed to Euclid preconditioners via
|
|
command line switches and/or small text-based configuration files.
|
|
The latter strategies have the advantage that users will not need to
|
|
alter their codes as Euclid's capabilities are extended.
|
|
|
|
%Euclid subscribes to the philosophy that
|
|
%the less coding required of the user (and the maintainer) the better.
|
|
%Hence, rather than coding interface function calls for every settable
|
|
%parameter,
|
|
|
|
The following fragment illustrates the minimum coding %that is
|
|
required to invoke Euclid preconditioning within \hypre{} application contexts.
|
|
The next subsection provides examples of the various ways in which
|
|
Euclid's options can be set.
|
|
The final subsection lists the options,
|
|
and provides guidance as to the settings that (in our experience)
|
|
will likely prove effective for minimizing execution time.
|
|
|
|
\begin{display}
|
|
\begin{verbatim}
|
|
#include "HYPRE_parcsr_ls.h"
|
|
|
|
HYPRE_Solver eu;
|
|
HYPRE_Solver pcg_solver;
|
|
HYPRE_ParVector b, x;
|
|
HYPRE_ParCSRMatrix A;
|
|
|
|
//Instantiate the preconditioner.
|
|
HYPRE_EuclidCreate(comm, &eu);
|
|
|
|
//Optionally use the following three methods to set runtime options.
|
|
// 1. pass options from command line or string array.
|
|
HYPRE_EuclidSetParams(eu, argc, argv);
|
|
|
|
// 2. pass options from a configuration file.
|
|
HYPRE_EuclidSetParamsFromFile(eu, "filename");
|
|
|
|
// 3. pass options using interface functions.
|
|
HYPRE_EuclidSetLevel(eu, 3);
|
|
...
|
|
|
|
//Set Euclid as the preconditioning method for some
|
|
//other solver, using the function calls HYPRE_EuclidSetup
|
|
//and HYPRE_EuclidSolve. We assume that the pcg_solver
|
|
//has been properly initialized.
|
|
HYPRE_PCGSetPrecond(pcg_solver,
|
|
(HYPRE_PtrToSolverFcn) HYPRE_EuclidSolve,
|
|
(HYPRE_PtrToSolverFcn) HYPRE_EuclidSetup,
|
|
eu);
|
|
|
|
//Solve the system by calling the Setup and Solve methods for,
|
|
//in this case, the HYPRE_PCG solver. We assume that A, b, and x
|
|
//have been properly initialized.
|
|
HYPRE_PCGSetup(pcg_solver, (HYPRE_Matrix)A, (HYPRE_Vector)b, (HYPRE_Vector)x);
|
|
HYPRE_PCGSolve(pcg_solver, (HYPRE_Matrix)parcsr_A, (HYPRE_Vector)b, (HYPRE_Vector)x);
|
|
|
|
//Destroy the Euclid preconditioning object.
|
|
HYPRE_EuclidDestroy(eu);
|
|
|
|
\end{verbatim}
|
|
\end{display}
|
|
|
|
%-----------------------------
|
|
|
|
\subsection{Setting Options: Examples}
|
|
|
|
For expositional purposes, assume you wish to set the ILU($k$)
|
|
factorization level to the value $k = 3$.
|
|
There are several methods of accomplishing this.
|
|
Internal to Euclid, options are stored in a simple database that
|
|
contains (name, value) pairs.
|
|
Various of Euclid's internal (private) functions query this
|
|
database to determine, at runtime, what action the user
|
|
has requested.
|
|
If you enter the option ``{\bf -eu\_stats 1''}, a report will
|
|
be printed when Euclid's destructor is called; this
|
|
report lists (among other statistics) the options that
|
|
were in effect during the factorization phase.
|
|
|
|
{\bf Method 1.}
|
|
By default, Euclid always looks for a file titled
|
|
``database'' in the working directory.
|
|
If it finds such a file, it opens it and attempts to parse it as
|
|
a configuration file.
|
|
%In the directory in which you will execute the \hypre {}
|
|
%program, open a text file named ``database'' and insert the
|
|
Configuration files should be formatted as follows.
|
|
|
|
\vspace{0.1in}
|
|
\indent {\tt >cat database} \\
|
|
\indent {\tt \#this is an optional comment} \\
|
|
\indent {\tt -level 3}
|
|
\vspace{0.1in}
|
|
|
|
Any line in a configuration file that contains a ``{\tt \#}''
|
|
character in the first column is ignored.
|
|
All other lines should begin with an option {\em name}, followed by
|
|
one or more blanks, followed by the option {\em value}.
|
|
Note that option names always begin with a ``-'' character.
|
|
If you include an option name that is not recognized by Euclid,
|
|
no harm should ensue.
|
|
|
|
{\bf Method 2.}
|
|
To pass options on the command line, call
|
|
\begin{display}
|
|
\begin{verbatim}
|
|
HYPRE_EuclidSetParams(HYPRE_Solver solver, int argc, char *argv[]);
|
|
\end{verbatim}
|
|
\end{display}
|
|
where {\tt argc} and {\tt argv} carry the usual connotation:
|
|
{\tt main(int argc, char *argv[])}.
|
|
If your \hypre{} application is called {\tt phoo}, you can
|
|
then pass options on the command line per the following example.
|
|
|
|
\begin{display}
|
|
\begin{verbatim}
|
|
mpirun -np 2 phoo -level 3
|
|
\end{verbatim}
|
|
\end{display}
|
|
|
|
Since Euclid looks for the ``database'' file when
|
|
{\tt HYPRE\_EuclidCreate} is called, and parses the command line
|
|
when {\tt HYPRE\_EuclidSetParams} is called,
|
|
option values passed on the command line will override
|
|
any similar settings that may be contained in the ``database'' file.
|
|
Also, if same option name appears more than once on the command
|
|
line, the final appearance determines the setting.
|
|
|
|
Some options, such as ``{\tt -bj}'' (see next subsection) are boolean.
|
|
Euclid always treats these options as the value ``1'' (true)
|
|
or ``0'' (false).
|
|
When passing boolean options from the command line
|
|
the value may be committed, in which case it assumed to be ``1.''
|
|
Note, however, that when boolean options are contained in a
|
|
configuration file, either the ``1'' or ``0'' must
|
|
stated explicitly.
|
|
|
|
{\bf Method 3.}
|
|
There are two ways in which you can read in options from a file
|
|
whose name is other than ``database.''
|
|
First, you can call {\tt HYPRE\_EuclidSetParamsFromFile}
|
|
to specify a configuration filename.
|
|
Second, if you have passed the command line arguments as
|
|
described above in Method 2,
|
|
you can then specify the configuration filename on the command
|
|
line using the {\bf -db\_filename filename} option, e.g.,
|
|
|
|
\begin{display}
|
|
\begin{verbatim}
|
|
mpirun -np 2 phoo -db_filename ../myConfigFile
|
|
\end{verbatim}
|
|
\end{display}
|
|
|
|
{\bf Method 4.}
|
|
One can also set parameters via interface functions, e.g
|
|
\begin{display}
|
|
\begin{verbatim}
|
|
int HYPRE_EuclidSetLevel(HYPRE_Solver solver, int level);
|
|
\end{verbatim}
|
|
\end{display}
|
|
For a full set of functions, see the reference manual.
|
|
|
|
%-----------------------------
|
|
|
|
\subsection{Options Summary}
|
|
|
|
\begin{description}
|
|
\item[-level $\langle int \rangle$] Factorization level for ILU($k$).
|
|
Default: 1.
|
|
Guidance: for 2D convection-diffusion and similar problems,
|
|
fastest solution time is typically obtained with levels 4 through
|
|
8. For 3D problems fastest solution time is typically
|
|
obtained with level 1.
|
|
|
|
\item[-bj] Use Block Jacobi ILU preconditioning instead of PILU.
|
|
Default: 0 (false). Guidance: if subdomains contain
|
|
relatively few nodes (less than 1,000), or the problem is
|
|
not well partitioned, Block Jacobi ILU
|
|
may give faster solution time than PILU.
|
|
\item[-eu\_stats] When Euclid's destructor is called a summary of
|
|
runtime settings and timing information is printed
|
|
to stdout. Default: 0 (false).
|
|
The timing marks in the report are the maximum over
|
|
all processors in the MPI communicator.
|
|
\item[-eu\_mem] When Euclid's destructor is called a summary of
|
|
Euclid's memory usage is printed to stdout.
|
|
Default: 0 (false).
|
|
The statistics are for the processor whose rank
|
|
in MPI\_COMM\_WORLD is 0.
|
|
\item[-printTestData] This option is used in our autotest procedures,
|
|
and should not normally be invoked by users.
|
|
\item[-sparseA $\langle float \rangle$] Drop-tolerance for ILU($k$) factorization.
|
|
Default: 0 (no dropping).
|
|
Entries are treated as zero if their absolute
|
|
value is less than \mbox{(sparseA * max)}, where ``max''
|
|
is the largest absolute value of any entry in the
|
|
row. Guidance: try this in conjunction with
|
|
-rowScale. CAUTION: If the coefficient matrix $A$ is
|
|
symmetric, this
|
|
setting is likely to cause the filled matrix,
|
|
$F = L+U-I$, to be unsymmetric.
|
|
This setting has no effect when ILUT factorization
|
|
is selected.
|
|
\item[-rowScale] Scale values prior to factorization such that the
|
|
largest value in any row is +1 or -1.
|
|
Default: 0 (false).
|
|
CAUTION: If the coefficient matrix $A$ is symmetric, this
|
|
setting is likely to cause the filled matrix,
|
|
$F = L+U-I$, to be unsymmetric.
|
|
Guidance: if the matrix is poorly scaled, turning on
|
|
row scaling may help convergence.
|
|
\item[-ilut $\langle float \rangle$] Use ILUT factorization instead
|
|
of the default, ILU($k$). Here, $\langle float \rangle$
|
|
is the drop tolerance, which is relative to the largest
|
|
absolute value of any entry in the row being factored.
|
|
CAUTION: If the coefficient matrix $A$ is symmetric, this
|
|
setting is likely to cause the filled matrix,
|
|
$F = L+U-I$, to be unsymmetric.
|
|
NOTE: this option can only be used sequentially!
|
|
\end{description}
|
|
|
|
%-----------------------------------------------------------------------------
|
|
|
|
\section{\pilut: Parallel Incomplete Factorization}
|
|
\label{PILUT}
|
|
|
|
{\bf Note:} this code is no longer supported by the \hypre{} team. We recommend to
|
|
use Euclid instead, which is more versatile and in general more efficient, especially
|
|
when used with many processors.
|
|
|
|
\vspace{.3in}
|
|
\pilut{} is a parallel preconditioner based on Saad's dual-threshold incomplete
|
|
factorization algorithm. The original version of \pilut{} was done by Karypis
|
|
and Kumar \cite{GKarypis_VKumar_1998} in terms of the Cray SHMEM library. The
|
|
code was subsequently modified by the \hypre{} team: SHMEM was replaced by MPI;
|
|
some algorithmic changes were made; and it was software engineered to be
|
|
interoperable with several matrix implementations, including \hypre{}'s ParCSR
|
|
format, PETSc's matrices, and ISIS++ RowMatrix. The algorithm produces an
|
|
approximate factorization $ L U$, with the preconditioner $M$ defined by $ M =
|
|
L U $.
|
|
|
|
{\bf Note:} \pilut{} produces a nonsymmetric preconditioner even when the
|
|
original matrix is symmetric. Thus, it is generally inappropriate for
|
|
preconditioning symmetric methods such as Conjugate Gradient.
|
|
|
|
%-----------------------------
|
|
|
|
\subsection*{Parameters:}
|
|
|
|
\begin{itemize}
|
|
|
|
\item
|
|
\code{SetMaxNonzerosPerRow( int LFIL ); (Default: 20)}
|
|
Set the maximum number of nonzeros to be retained in each row of $L$ and $U$.
|
|
This parameter can be used to control the amount of memory that $L$ and $U$
|
|
occupy. Generally, the larger the value of \code{LFIL}, the longer it takes to
|
|
calculate the preconditioner and to apply the preconditioner and the larger
|
|
the storage requirements, but this trades
|
|
off versus a higher quality preconditioner that reduces the number of
|
|
iterations.
|
|
|
|
\item
|
|
\code{SetDropTolerance( double tol ); (Default: 0.0001)}
|
|
Set the tolerance (relative to the 2-norm of the row) below which entries in L
|
|
and U are automatically dropped. \pilut{} first drops entries based on the drop
|
|
tolerance, and then retains the largest LFIL elements in each row that remain.
|
|
Smaller values of \code{tol} lead to more accurate preconditioners, but can
|
|
also lead to increases in the time to calculate the preconditioner.
|
|
|
|
\end{itemize}
|
|
|
|
%-----------------------------------------------------------------------------
|
|
|
|
\section{LOBPCG Eigensolver}
|
|
|
|
LOBPCG (Locally Optimal Block Preconditioned Conjugate Gradient) is a simple,
|
|
yet very efficient, algorithm suggested in \cite{LOBPCG_2001, BLOPEX_2007,
|
|
BLOPEXWebPage} for computing several smallest eigenpairs of the symmetric
|
|
generalized eigenvalue problem $Ax=\lambda Bx$ with large, possibly sparse,
|
|
symmetric matrix $A$ and symmetric positive definite matrix $B$. The matrix $A$
|
|
is not assumed to be positive, which also allows one to use LOBPCG to compute
|
|
the largest eigenpairs of $Ax=\lambda Bx$ simply by solving $-Ax=\mu Bx$ for the
|
|
smallest eigenvalues $\mu=-\lambda$.
|
|
|
|
LOBPCG simultaneously computes several eigenpairs together, which is controlled
|
|
by the \code{blockSize} parameter, see example \file{ex11.c}. The LOBCPG also
|
|
allows one to impose constraints on the eigenvectors of the form $x^T B y_i=0$
|
|
for a set of vectors $y_i$ given to LOBPCG as input parameters. This makes it
|
|
possible to compute, e.g., 50 eigenpairs by 5 subsequent calls to LOBPCG with
|
|
the \code{blockSize}=10, using deflation. LOBPCG can use preconditioning in two
|
|
different ways: by running an inner preconditioned PCG linear solver, or by
|
|
applying the preconditioner directly to the eigenvector residual (option
|
|
\code{-pcgitr 0}). In all other respects, LOBPCG is similar to the PCG linear
|
|
solver.
|
|
|
|
The LOBPCG code is available for system interfaces: Struct, SStruct, and IJ. It
|
|
is also used in the Auxiliary-space Maxwell Eigensolver (AME). The LOBPCG setup
|
|
is similar to the setup for PCG.
|
|
|
|
% The LOBPCG code and related functions and utilities in \emph{hypre} are pulled
|
|
% from the package Block Locally Optimal Preconditioned Eigenvalue Xolvers
|
|
% (BLOPEX) revision 1.1, see \cite{2,3}. No special configuration for the LOBPCG
|
|
% during the hypre setup is necessary, since all LOBPCG-relevant files are already
|
|
% built-in into the \emph{hypre} source.
|
|
|
|
%-----------------------------------------------------------------------------
|