hypre/docs/usr_solvers.tex

%=============================================================================
%=============================================================================

\chapter{Solvers and Preconditioners}
\label{ch-Solvers}

There are several solvers available in \hypre{} via different
conceptual interfaces (see Table \ref{table-solver-availability}).
Note that there are a few additional solvers and preconditioners not
mentioned in the table that can be used only through the FEI interface
and are described in Paragraph 6.14.
The procedure for setup and use of solvers and preconditioners is
largely the same. We will refer to them both as solvers in the sequel
except when noted.  In normal usage, the preconditioner is chosen and
constructed before the solver, and then handed to the solver as part
of the solver's setup.  In the following, we assume the most common
usage pattern in which a single linear system is set up and then
solved with a single righthand side. We comment later on
considerations for other usage patterns.

\begin{table}[h]
\center
\begin{tabular}{|l||c|c|c|c|}
\hline
                               & \multicolumn{4}{|c|}{System Interfaces} \\
\multicolumn{1}{|c||}{Solvers} & Struct & SStruct & FEI & IJ \\
\hline\hline
Jacobi     & X & X &   &   \\
SMG        & X & X &   &   \\
PFMG       & X & X &   &   \\
Split      &   & X &   &   \\
SysPFMG    &   & X &   &   \\
FAC        &   & X &   &   \\
Maxwell    &   & X &   &   \\
BoomerAMG  &   & X & X & X \\
AMS        &   & X & X & X \\
ADS        &   & X & X & X \\
MLI        &   & X & X & X \\
ParaSails  &   & X & X & X \\
Euclid     &   & X & X & X \\
PILUT      &   & X & X & X \\
PCG        & X & X & X & X \\
GMRES      & X & X & X & X \\
FlexGMRES  & X & X & X & X \\
LGMRES     & X & X &   & X \\
BiCGSTAB   & X & X & X & X \\
Hybrid     & X & X & X & X \\
LOBPCG     & X & X &   & X \\
\hline
\end{tabular}
\caption{%
Current solver availability via \hypre{} conceptual interfaces.
}
\label{table-solver-availability}
\end{table}

%-----------------------------------------------------------------------------

\section*{Setup:}

\begin{enumerate}

\item
{\bf Pass to the solver the information defining the problem.} In the
typical user cycle, the user has passed this information into a matrix
through one of the conceptual interfaces prior to setting up the
solver. In this situation, the problem definition information is then
passed to the solver by passing the constructed matrix into the
solver. As described before, the matrix and solver must be compatible,
in that the matrix must provide the services needed by the
solver. Krylov solvers, for example, need only a matrix-vector
multiplication.  Most preconditioners, on the other hand, have
additional requirements such as access to the matrix coefficients.

\item
{\bf Create the solver/preconditioner} via the \code{Create()} routine.

\item
{\bf Choose parameters for the preconditioner and/or solver.}
Parameters are chosen through the \code{Set()} calls provided by the
solver.  Throughout \hypre{}, we have made our best effort to
give all parameters reasonable defaults if not chosen.  However,
for some preconditioners/solvers the best choices for parameters
depend on the problem to be solved. We give recommendations in the
individual sections on how to choose these parameters.
Note that in \hypre{}, convergence
criteria can be chosen after the preconditioner/solver has been setup.
For a complete set of all available parameters see the Reference Manual.

\item
{\bf Pass the preconditioner to the solver.} For solvers that are not
preconditioned, this step is omitted.  The preconditioner is passed
through the \code{SetPrecond()} call.

\item
{\bf Set up the solver.} This is just the \code{Setup()} routine.
At this point the matrix and right hand side is passed into the solver
or preconditioner. Note that the actual right hand side is not used
until the actual solve is performed.

\end{enumerate}

At this point, the solver/preconditioner is fully constructed and
ready for use.

%-----------------------------------------------------------------------------

\section*{Use:}

\begin{enumerate}

\item
{\bf Set convergence criteria.}  Convergence can be controlled by the
number of iterations, as well as various tolerances such as relative
residual, preconditioned residual, etc.  Like all parameters,
reasonable defaults are used.  Users are free to change these, though
care must be taken.  For example, if an iterative method is used as a
preconditioner for a Krylov method, a constant number of iterations is
usually required.

\item
{\bf Solve the system.}  This is just the \code{Solve()} routine.

\end{enumerate}

%-----------------------------------------------------------------------------

\section*{Finalize:}

\begin{enumerate}

\item
{\bf Free the solver or preconditioner.} This is done using the
\code{Destroy()} routine.

\end{enumerate}

%-----------------------------------------------------------------------------

\section* {Synopsis}

In general, a solver (let's call it {\tt SOLVER}) is set up and run using the following routines,
where A is the matrix, b the right hand side and x the solution vector
of the linear system to be solved:

\begin{display}
\begin{verbatim}
/* Create Solver */
int HYPRE_SOLVERCreate(MPI_COMM_WORLD, &solver);

/* set certain parameters if desired */
HYPRE_SOLVERSetTol(solver, 1.e-8);
.
.
/* Set up Solver */
HYPRE_SOLVERSetup(solver, A, b, x);
/* Solve the system */
HYPRE_SOLVERSolve(solver, A, b, x);
/* Destroy the solver */
HYPRE_SOLVERDestroy(solver);
\end{verbatim}
\end{display}

In the following sections, we will give brief descriptions of the available \hypre{} solvers
with some suggestions on how to choose the parameters as well as references for users
who are interested in a more detailed description and analysis of the solvers.
A complete list of all routines that are available can be found in the reference manual.

%-----------------------------------------------------------------------------

\section{SMG}

SMG is a parallel semicoarsening multigrid solver for the linear
systems arising from finite difference, finite volume, or finite
element discretizations of the diffusion equation,
\begin{equation}
\nabla \cdot ( D \nabla u ) + \sigma u = f
\end{equation}
on logically rectangular grids.  The code solves both 2D and 3D
problems with discretization stencils of up to 9-point in 2D and up to
27-point in 3D.  See
\cite{SSchaffer_1998a,PNBrown_RDFalgout_JEJones_2000,RDFalgout_JEJones_2000}
for details on the algorithm and its parallel implementation/performance.

SMG is a particularly robust method.  The algorithm semicoarsens in
the z-direction and uses plane smoothing.  The xy plane-solves are
effected by one V-cycle of the 2D SMG algorithm, which semicoarsens in
the y-direction and uses line smoothing.

%-----------------------------------------------------------------------------

\section{PFMG}

PFMG is a parallel semicoarsening multigrid solver similar to SMG.
See \cite{SFAshby_RDFalgout_1996,RDFalgout_JEJones_2000} for details
on the algorithm and its parallel implementation/performance.

The main difference between the two methods is in the smoother: PFMG
uses simple pointwise smoothing.  As a result, PFMG is not as robust
as SMG, but is much more efficient per V-cycle.

%-----------------------------------------------------------------------------

\section{SysPFMG}

SysPFMG is a parallel semicoarsening multigrid solver for systems of
elliptic PDEs. It is a generalization of PFMG, with the interpolation
defined only within the same variable. The relaxation is of nodal type-
all variables at a given point location are simultaneously solved for in the
relaxation.

Although SysPFMG is implemented through the SStruct interface, it can
be used only for problems with one grid part. Ideally, the solver should
handle any of the seven variable types (cell-, node-, xface-, yface-, zface-,
xedge-, yedge-, and zedge-based). However, it has been completed only for cell-based
variables.

%-----------------------------------------------------------------------------

\section{SplitSolve}

SplitSolve is a parallel block Gauss-Seidel solver for semi-structured
problems with multiple parts. For problems with only one variable, it can be viewed as
a domain-decomposition solver with no grid overlapping.

Consider a multiple part problem given by the linear system $Ax=b$. Matrix $A$ can
be decomposed into a structured intra-variable block diagonal component $M$ and a
component $N$ consisting of the inter-variable blocks and any unstructured connections
between the parts. SplitSolve performs the iteration
\[ x_{k+1} = \tilde{M}^{-1} (b + N x_k),\]
where $\tilde{M}^{-1}$ is a decoupled block-diagonal V(1,1) cycle, a separate cycle for each
part and variable type. There are two V-cycle options, SMG and PFMG.

%-----------------------------------------------------------------------------

\section{FAC}

FAC is a parallel fast adaptive composite grid solver for finite volume,
cell-centred discretizations of smooth diffusion coefficient problems.
To be precise, it is a FACx algorithm since the patch solves consist of only
relaxation sweeps. For details of the basic overall algorithms, see \cite{SFMcCormick_1989a}.
Algorithmic particularities include formation of non-Galerkin coarse-grid operators
(i.e., coarse-grid operators underlying refinement patches are automatically generated)
and non-stored linear/constant interpolation/restriction operators. Implementation particularities
include a processor redistribution of the generated coarse-grid operators so that intra-level
communication between adaptive mesh refinement (AMR) levels during the solve phase
is kept to a minimum. This redistribution is hidden from the user.

The user input is essentially a linear system describing the {\it composite}
operator, and the refinement factors between the AMR levels. To form this composite linear system,
the AMR grid is described using semi-structured grid parts. Each AMR level grid corresponds to
a separate part so that this level grid is simply a collection of boxes, all with the same refinement
factor, i.e., it is a struct grid. However, several restrictions are imposed on the patch (box) refinements.
First, a refinement box must cover all underlying coarse cells- i.e., refinement of a partial coarse cell
is not permitted. Also, the refined/coarse indices must follow a mapping: with $[r_1,r_2,r_3]$ denoting the
refinement factor and $[a_1,a_2,a_3] \times [b_1,b_2,b_3]$ denoting the coarse subbox to be refined, the
mapping to the refined patch is
\[ [r_1*a_1,r_2*a_2,r_3*a_3] \times [r_1*b_1+ r_1-1, r_2*b_2+ r_2-1,r_3*b_3+ r_3-1]. \]
With the AMR grid constructed under these restrictions,
the composite matrix can be formed. Since the AMR levels correspond to semi-structured grid parts, the
composite matrix is a semi-structured matrix consisting of structured components within each part, and
unstructured components describing the coarse-to-fine/fine-to-coarse connections. The structured and
unstructured components can be set using stencils and the HYPRE$\_$SStructGraphAddEntries routine, respectively.
The matrix coefficients can be filled after setting these non-zero patterns. Between each pair of
successive AMR levels, the coarse matrix underlying the refinement patch must be the identity and the corresponding
rows of the rhs must be zero. These can performed using routines HYPRE$\_$SStructFACZeroCFSten (to zero off
the stencil values reaching from coarse boxes into refinement boxes), HYPRE$\_$SStructFACZeroFCSten
(to zero off the stencil values reaching from refinement boxes into coarse boxes),
HYPRE$\_$SStructFACZeroAMRMatrixData (to set the identity at coarse grid points underlying a refinement patch),
and HYPRE$\_$SStructFACZeroAMRVectorData (to zero off a vector at coarse grid points underlying a refinement patch).
These routines can simplify the user's matrix setup. For example, consider two successive AMR levels with the
coarser level consisting of one box and the finer level consisting of a collection of boxes. Rather than
distinguishly setting the stencil values and the identity in the appropriate locations, the user can set the
stencil values on the whole coarse grid using the HYPRE$\_$SStructMatrixSetBoxValues routine and then
zero off the appropriate values using the above zeroing routines.

The coarse matrix underlying these patches are algebraically generated by operator-collapsing the
refinement patch operator and the fine-to-coarse coefficients (this is why stencil values reaching
out of a part must be zeroed). This matrix is re-distributed so that each processor has all of its
coarse-grid operator.

To solve the coarsest AMR level, a PFMG V cycle is used. Note that a minimum of two AMR levels
are needed for this solver.

%-----------------------------------------------------------------------------

\section{Maxwell}

Maxwell is a parallel solver for edge finite element discretization of the
curl-curl formulation of the Maxwell equation
\[ \nabla \times \alpha \nabla \times E + \beta E= f, \beta> 0\]
on semi-structured grids. Details of the algorithm can be found in \cite{JonesLee_2006}.
The solver can be viewed as an operator-dependent multiple-coarsening algorithm for the Helmholtz
decomposition of the error correction. Input to this solver consist of only the linear system
and a gradient operator. In fact, if the orientation of the edge elements conforms to
a lexicographical ordering of the nodes of the grid, then the gradient operator can be generated
with the routine HYPRE$\_$MaxwellGrad: at grid points $(i,j,k)$ and $(i-1,j,k),$
the produced gradient operator takes values $1$ and $-1$ respectively, which
is the correct gradient operator for the appropriate edge orientation. Since the gradient operator
is normalized (i.e., $h$ independent) the edge finite element must also be normalized in the
discretization.

This solver is currently developed for perfectly conducting boundary condition (Dirichlet). Hence, the
rows and columns of the matrix that corresponding to the grid boundary must be set to the identity or
zeroed off. This can be achieved with the routines HYPRE$\_$SStructMaxwellPhysBdy and
HYPRE$\_$SStructMaxwellEliminateRowsCols. The former identifies the ranks of the rows that are located on
the grid boundary, and the latter adjusts the boundary rows and cols. As usual, the rhs of the linear system
must also be zeroed off at the boundary rows. This can be done using HYPRE$\_$SStructMaxwellZeroVector.

With the adjusted linear system and a gradient operator, the user can form the Maxwell multigrid solver using
several different edge interpolation schemes. For problems with smooth coefficients, the natural Nedelec
interpolation operator can be used. This is formed by calling HYPRE$\_$SStructMaxwellSetConstantCoef
with the flag$>0$ before setting up the solver, otherwise the default edge interpolation is an
operator-collapsing/element-agglomeration scheme. This is suitable for variable coefficients.
Also, before setting up the solver, the user must pass the gradient operator, whether user or
HYPRE$\_$MaxwellGrad generated, with HYPRE$\_$SStructMaxwellSetGrad. After these
preliminary calls, the Maxwell solver can be setup by calling HYPRE$\_$SStructMaxwellSetup.

There are two solver cycling schemes that can be used to solve the linear system. To describe these, one
needs to consider the augmented system operator
\begin{eqnarray}
\bf{A}= \left [
  \begin{array}{ll}
     A_{ee} & A_{en}  \\ \rule{0cm}{.5cm}
     A_{ne} & A_{nn}
  \end{array}
\right ],
\end{eqnarray}
where $A_{ee}$ is the stiffness matrix corresponding to the above curl-curl formulation, $A_{nn}$
is the nodal Poisson operator created by taking the Galerkin product of $A_{ee}$ and the gradient operator,
and $A_{ne}$ and $A_{en}$ are the nodal-edge coupling operators (see \cite{JonesLee_2006}). The algorithm
for this Maxwell solver is based on forming a multigrid hierarchy to this augmented system using the block-diagonal
interpolation operator
\begin{eqnarray*}
\bf{P}= \left[  \begin{array}{ll}
            P_e & 0  \\ \rule{0cm}{.5cm}
            0   & P_n
         \end{array}
\right],
\end{eqnarray*}
where $P_e$ and $P_n$ are respectively the edge and nodal interpolation operators determined individually
from $A_{ee}$ and $A_{nn}.$ Taking a Galerkin product between $\bf{A}$ and $\bf{P}$ produces the next coarse
augmented operator, which also has the nodal-edge coupling operators. Applying this procedure recursively
produces nodal-edge coupling operators at all levels. Now, the first solver cycling scheme,
HYPRE$\_$SStructMaxwellSolve, keeps these coupling operators on all levels of the V-cycle. The second,
cheaper scheme, HYPRE$\_$SStructMaxwellSolve2, keeps the coupling operators only on the finest level, i.e.,
separate edge and nodal V-cycles that couple only on the finest level.

%-----------------------------------------------------------------------------

\section{Hybrid}

The hybrid solver is designed to detect whether a multigrid preconditioner
is needed when solving a linear system and possibly avoid the expensive setup
of a preconditioner if a system can be solved efficiently with a diagonally
scaled Krylov solver, e.g. a strongly diagonally dominant system.
It first uses a diagonally scaled Krylov solver, which can be chosen by the user
(the default is conjugate gradient, but one should use GMRES if the matrix of the
linear system to be solved is nonsymmetric). It monitors how fast the Krylov solver
converges.
If there is not sufficient progress, the algorithm switches to a preconditioned
Krylov solver.

If used through the \code{Struct} interface, the solver is called StructHybrid
and can be used with the preconditioners SMG and PFMG (default).
It is called ParCSRHybrid, if used through the \code{IJ} interface and is used here
with BoomerAMG.
The user can determine the average convergence speed by setting a convergence tolerance $0 \leq \theta
< 1$
via the routine HYPRE$\_$StructHybridSetConvergenceTol or HYPRE$\_$StructParCSRHybridSetConvergenceTol.
The default setting is 0.9.

The average convergence factor $\rho_i = \left({{\| r_i \|} \over {\| r_0 \|}}\right)^{1/i}$ is
monitored within the chosen Krylov solver, where $r_i = b - Ax_{i}$ is the $i$-th residual.
Convergence is considered too slow when
\begin{equation}
\left( 1 - {{|\rho_i - \rho_{i-1}|} \over { \max(\rho_i, \rho_{i-1})}} \right) \rho_i > \theta .
\end{equation}
When this condition is fulfilled the hybrid solver switches from a diagonally scaled
Krylov solver to a preconditioned solver.

%-----------------------------------------------------------------------------

\section{BoomerAMG}

BoomerAMG is a parallel implementation of the algebraic multigrid
method \cite{Ruge_Stueben_1987}.
It can be used
both as a solver or as a preconditioner.  The user can choose between various
different parallel coarsening techniques, interpolation and relaxation schemes.
See
\cite{VEHenson_UMYang_2002,UMYang_2005} for a detailed description of the
coarsening
algorithms, interpolation and relaxation schemes as well as numerical results.

\subsection{Parameter Options}

Various BoomerAMG functions and options are mentioned below. However, for a complete
listing and description of all available functions, see the reference manual.


BoomerAMG's Create function differs from the synopsis in that it has only one parameter
\code{HYPRE_BoomerAMGCreate(HYPRE_Solver *solver)}. It uses the communicator
of the matrix A.

Coarsening can be set by the user using the function \code{HYPRE\_BoomerAMGSetCoarsenType}.
Various coarsening techniques are available:
\begin{itemize}
\item the Cleary-Luby-Jones-Plassman (CLJP) coarsening,
\item the Falgout coarsening which is a combination of CLJP and the
classical RS coarsening algorithm (default),
\item CGC and CGC-E coarsenings \cite{Griebela_1, Griebel_2},
\item PMIS and HMIS coarsening algorithms which lead to coarsenings with lower complexities \cite{DeSterck_Yang_Heys_2004}
as well as
\item aggressive coarsening, which can be applied to any of the coarsening techniques mentioned above and thus achieving much lower complexities and lower memory use \cite{Stueben_1999}.
\end{itemize}
To use aggressive coarsening the user has to set the number of levels to which he wants to apply
aggressive coarsening (starting with the finest level)
via \code{HYPRE_BoomerAMGSetAggNumLevels}. Since aggressive coarsening requires long range
interpolation, multipass interpolation is always used on levels with aggressive coarsening.

Various interpolation techniques can be set using \code{HYPRE_BoomerAMGSetInterpType}:
\begin{itemize}
\item the ``classical'' interpolation as defined in \cite{Ruge_Stueben_1987} (default),
\item direct interpolation \cite{Stueben_1999},
\item standard interpolation \cite{Stueben_1999},
\item multipass interpolation \cite{Stueben_1999},
\item an extended ``classical'' interpolation, which is a long range interpolation and is recommended to be used with PMIS and HMIS coarsening for harder problems,
\item Jacobi interpolation \cite{Stueben_1999},
\item the ``classical'' interpolation modified for hyperbolic PDEs.
\end{itemize}
Jacobi interpolation is only use to improve certain interpolation operators and can be
used with \code{HYPRE_BoomerAMGSetPostInterpType}.
Since some of the interpolation operators might generate large stencils, it is often possible
and recommended to control complexity and truncate the interpolation operators
using \code{HYPRE_BoomerAMGSetTruncFactor} and/or \code{HYPRE_BoomerAMGSetPMaxElmts},
or \code{HYPRE_BoomerAMGSetJacobiTruncTheshold} (for Jacobi interpolation only).

Various relaxation techniques are available:
\begin{itemize}
\item weighted Jacobi relaxation,
\item a hybrid Gauss-Seidel / Jacobi relaxation scheme,
\item a symmetric hybrid Gauss-Seidel / Jacobi relaxation scheme, and
\item hybrid block and Schwarz smoothers \cite{UMYang_2004},
\item ILU and approximate inverse smoothers.
\end{itemize}
Point relaxation schemes can be set using \code{HYPRE_BoomerAMGSetRelaxType} or, if
one wants to specifically set the up cycle, down cycle or the coarsest grid, with
\code{HYPRE_BoomerAMGSetCycleRelaxType}. To use the more complicated smoothers,
e.g. block, Schwarz, ILU smoothers, it is necessary to use \code{HYPRE_BoomerAMGSetSmoothType}
and \code{HYPRE_BoomerAMGSetSmoothNumLevels}. There are further parameter choices for the
individual smoothers, which are described in the reference manual.
The default relaxation type is hybrid Gauss-Seidel with CF-relaxation (relax first the C-,
then the F-points) on the down cycle and FC-relaxation on the up-cycle. Note that if
BoomerAMG is used as a preconditioner for conjugate gradient, it is necessary to use
a symmetric smoother such as weighted Jacobi or hybrid symmetric Gauss-Seidel.


If the users wants to solve systems of PDEs and can provide information on
which variables belong to which function, BoomerAMG's systems AMG version
can also be used. Functions that enable the user to access the systems AMG version
are \code{HYPRE_BoomerAMGSetNumFunctions}, \code{HYPRE_BoomerAMGSetDofFunc}
and \code{HYPRE\_BoomerAMGSetNodal}.

For best performance, it might be necessary to set certain parameters, which will
affect both coarsening and interpolation.
One important parameter is the strong threshold, which can be set
using the function \code{HYPRE_BoomerAMGSetStrongThreshold}.
The default value is 0.25, which appears to be a good choice for 2-dimensional
problems and the low complexity coarsening algorithms.
A better choice for 3-dimensional problems appears to be 0.5, if one uses the
default coarsening algorithm or CLJP. However,
the choice of the strength threshold is problem dependent and therefore
there could be better choices than the two suggested ones.

%-----------------------------------------------------------------------------

\section{AMS}
\label{AMS}

AMS (the Auxiliary-space Maxwell Solver) is a parallel unstructured Maxwell
solver for edge finite element discretizations of the variational problem
\begin{equation} \label{ams-maxwell}
\mbox{Find } {\mathbf u} \in {\mathbf V}_h \>:\qquad
(\alpha\, \nabla \times {\mathbf u},  \nabla \times {\mathbf v}) +
(\beta\, {\mathbf u},  {\mathbf v}) = ({\mathbf f},  {\mathbf v})\,,
\qquad \mbox{for all } {\mathbf v} \in {\mathbf V}_h \,.
\end{equation}
Here ${\mathbf V}_h$ is the lowest order Nedelec (edge) finite element space,
and $\alpha>0$ and $\beta \ge 0$ are scalar, or SPD matrix coefficients.
AMS was designed to be scalable on problems with variable coefficients,
and allows for $\beta$ to be zero in part or the whole domain.
In either  case the resulting problem is only semidefinite, and for solvability
the right-hand side should be chosen to satisfy compatibility conditions.

AMS is based on the auxiliary space methods for definite Maxwell
problems proposed in \cite{xu_H_curl}.
For more details, see \cite{ams_jcm}.

\subsection{Overview}
Let ${\mathbf A}$ and ${\mathbf b}$ be the stiffness matrix and  the
load vector corresponding to (\ref{ams-maxwell}). Then the resulting
linear system of interest reads,
\begin{equation} \label{ams-maxwell-ls}
{\mathbf A}\, {\mathbf x} = {\mathbf b} \,.
\end{equation}
The coefficients $\alpha$ and $\beta$ are naturally associated with
the ``stiffness'' and ``mass'' terms of ${\mathbf A}$.
Besides ${\mathbf A}$ and ${\mathbf b}$, AMS requires the following
additional user input:
\begin{enumerate}
\item The discrete gradient matrix $G$ representing the edges of
the mesh in terms of its vertices. $G$ has as many rows as the number
of edges in the mesh, with each row having two nonzero entries:
$+1$ and $-1$ in the columns corresponding to the vertices composing
the edge. The sign is determined based on the orientation of the edge.
We require that $G$ includes all (interior and boundary) edges
and vertices.

\item The representations of the constant vector fields $(1,0,0)$,$(0,1,0)$ and
$(0,0,1)$ in the ${\mathbf V}_h$ basis, given as three vectors: $G_x$, $G_y$, and $G_z$.
Note that since no boundary conditions are imposed on $G$, the above vectors
can be computed as $G_x = G x$, $G_y = G y$ and $G_z = G z$, where
$x$, $y$, and $z$ are vectors representing the coordinates of the vertices of the mesh.
\end{enumerate}

In addition to the above quantities, AMS can utilize
the following (optional) information:
\begin{enumerate}
\item[(3.)] The Poisson matrices $A_\alpha$ and $A_\beta$, corresponding to
assembling of the forms $(\alpha\, \nabla u, \nabla v)$
and $(\beta\, \nabla u, \nabla v)$ using standard linear finite elements on
the same mesh.
\end{enumerate}

\noindent
Internally, AMS proceeds with the construction of the following additional objects:
\begin{itemize}
\item $A_G$ -- a matrix associated with the mass term which is either $G^{\,T} {\mathbf A} G$, or the Poisson matrix $A_\beta$ (if given).
\item ${\mathbf \Pi}$ -- the matrix representation of the interpolation operator
from vector linear to edge finite elements.
\item ${\mathbf A}_{{\mathbf \Pi}}$ -- a matrix associated with the stiffness term which is either  ${\mathbf \Pi}^{\,T} {\mathbf A} {\mathbf \Pi}$ or a block-diagonal matrix with diagonal blocks $A_\alpha$ (if given).
\item $B_G$ and ${\mathbf B}_{{\mathbf \Pi}}$ -- efficient (AMG) solvers for $A_G$ and ${\mathbf A}_{{\mathbf \Pi}}$.
\end{itemize}
The solution procedure then is a three-level method using smoothing in
the original edge space and subspace corrections based on $B_G$ and ${\mathbf B}_{{\mathbf \Pi}}$.
We can employ a number of options here utilizing various combinations of the smoother
and solvers in additive or multiplicative fashion.
If $\beta$ is identically zero one can skip the subspace correction associated
with $G$, in which case the solver is a two-level method.

\subsection{Sample Usage}
AMS can be used either as a solver or as a preconditioner.
Below we list the sequence of \hypre{} calls
needed to create and use it as a solver.
See example code \file{ex15.c} for a complete implementation.
We start with the allocation of the \code{HYPRE_Solver} object:
\begin{display}\begin{verbatim}
HYPRE_Solver solver;
HYPRE_AMSCreate(&solver);
\end{verbatim}\end{display}

Next, we set a number of solver parameters. Some of them are
optional, while others are necessary in order to perform the
solver setup.

AMS offers the option to set the space dimension.
By default we consider the dimension to be $3$. The only
other option is $2$, and it can be set with the function given below.
We note that a 3D solver will still work for a 2D problem,
but it will be slower and will require more memory than necessary.
\begin{display}\begin{verbatim}
HYPRE_AMSSetDimension(solver, dim);
\end{verbatim}\end{display}

The user is required to provide the discrete gradient matrix $G$.
AMS expects a matrix defined on the whole mesh with no
boundary edges/nodes excluded. It is essential to {\bf not} impose any boundary
conditions on $G$.
Regardless of which \hypre{} conceptual interface was used to construct $G$,
one can obtain a ParCSR version of it. This is the expected
format in the following function.
\begin{display}\begin{verbatim}
HYPRE_AMSSetDiscreteGradient(solver, G);
\end{verbatim}\end{display}

In addition to $G$, we need one additional piece of information in order
to construct the solver.
The user has the option to choose either the coordinates of the vertices
in the mesh or the representations of the constant vector fields
in the edge element basis.
In both cases three \hypre{} parallel vectors should be provided.
For 2D problems, the user can set the third vector to NULL.
The corresponding function calls read:
\begin{display}\begin{verbatim}
HYPRE_AMSSetCoordinateVectors(solver,x,y,z);
\end{verbatim}\end{display}
or
\begin{display}\begin{verbatim}
HYPRE_AMSSetEdgeConstantVectors(solver,
                                one_zero_zero,
                                zero_one_zero,
                                zero_zero_one);
\end{verbatim}\end{display}
The vectors \code{one_zero_zero}, \code{zero_one_zero} and \code{zero_zero_one}
above correspond to the constant vector fields $(1,0,0)$, $(0,1,0)$ and $(0,0,1)$.

The remaining solver parameters are optional.
For example, the user can choose a different cycle type by calling
\begin{display}\begin{verbatim}
HYPRE_AMSSetCycleType(solver, cycle_type); /* default value: 1 */
\end{verbatim}\end{display}

\noindent
The available cycle types in AMS are:
\begin{itemize}
\item \code{cycle_type=1}: multiplicative solver $(01210)$
\item \code{cycle_type=2}: additive solver $(0+1+2)$
\item \code{cycle_type=3}: multiplicative solver $(02120)$
\item \code{cycle_type=4}: additive solver $(010+2)$
\item \code{cycle_type=5}: multiplicative solver $(0102010)$
\item \code{cycle_type=6}: additive solver $(1+020)$
\item \code{cycle_type=7}: multiplicative solver $(0201020)$
\item \code{cycle_type=8}: additive solver $(0(1+2)0)$
\item \code{cycle_type=11}: multiplicative solver $(013454310)$
\item \code{cycle_type=12}: additive solver $(0+1+3+4+5)$
\item \code{cycle_type=13}: multiplicative solver $(034515430)$
\item \code{cycle_type=14}: additive solver $(01(3+4+5)10)$
\end{itemize}
Here we use the following convention for the
three subspace correction methods:
$0$ refers to smoothing, $1$ stands for BoomerAMG based on $B_G$, and
$2$ refers to a call to BoomerAMG for ${\mathbf B}_{{\mathbf \Pi}}$.
The values $3$, $4$ and $5$ refer to the scalar subspaces
corresponding to the $x$, $y$ and $z$ components of $\mathbf \Pi$.

The abbreviation $xyyz$  for $x,y,z \in \{0,1,2,3,4,5\}$
refers to a multiplicative subspace correction based on solvers $x$, $y$, $y$, and $z$ (in that order).
The abbreviation $x+y+z$ stands for an additive subspace correction method
based on $x$, $y$ and $z$ solvers.
The additive cycles are meant to be used only when AMS is called
as a preconditioner.
In our experience the choices \code{cycle_type=1,5,8,11,13} often produced
fastest solution times, while \code{cycle_type=7} resulted in smallest
number of iterations.

Additional solver parameters, such as the maximum number of iterations,
the convergence tolerance and the output level, can be set with
\begin{display}\begin{verbatim}
HYPRE_AMSSetMaxIter(solver, maxit);     /* default value: 20 */
HYPRE_AMSSetTol(solver, tol);           /* default value: 1e-6 */
HYPRE_AMSSetPrintLevel(solver, print);  /* default value: 1 */
\end{verbatim}\end{display}

More advanced parameters, affecting the smoothing and the
internal AMG solvers, can be set with the following three
functions:
\begin{display}\begin{verbatim}
HYPRE_AMSSetSmoothingOptions(solver, 2, 1, 1.0, 1.0);
HYPRE_AMSSetAlphaAMGOptions(solver, 10, 1, 3, 0.25, 0, 0);
HYPRE_AMSSetBetaAMGOptions(solver, 10, 1, 3, 0.25, 0, 0);
\end{verbatim}\end{display}

For (singular) problems where $\beta = 0$ in the whole domain,
different (in fact simpler) version of the AMS solver is offered.
To allow for this simplification, use the following \hypre{} call
\begin{display}\begin{verbatim}
HYPRE_AMSSetBetaPoissonMatrix(solver, NULL);
\end{verbatim}\end{display}

If $\beta$ is zero only in parts of the domain, the problem is still singular,
but the AMS solver will try to detect this and construct a non-singular
preconditioner. Though this often works well in practice, AMS also provides a
more robust version for solving such singular problems to very low convergence
tolerances. This version takes advantage of additional information: the list of
nodes which are interior to a zero-conductivity region provided by the function
\begin{display}\begin{verbatim}
HYPRE_AMSSetInteriorNodes(solver, HYPRE_ParVector interior_nodes);
\end{verbatim}\end{display}
Based on the \code{interior_nodes} variable, a restricted discrete gradient
operator $G_0$ is constructed, and AMS is then defined based on the matrix
${\mathbf A}+\delta G_0^TG_0$ which is non-singular, and a small $\delta>0$ perturbation of
${\mathbf A}$. When iterating with this preconditioner, it is advantageous to project on
the compatible subspace $Ker(G_0^T)$. This can be done periodically, or manually
through the functions
\begin{display}\begin{verbatim}
HYPRE_AMSSetProjectionFrequency(solver, int projection_frequency);
HYPRE_AMSProjectOutGradients(solver, HYPRE_ParVector x);
\end{verbatim}\end{display}

Two additional matrices are constructed in the setup of the
AMS method---one corresponding to the coefficient $\alpha$
and another corresponding to $\beta$.
This may  lead to prohibitively high memory requirements, and
the next two function calls may help to save some memory.
For example, if the Poisson matrix with coefficient $\beta$
(denoted by \code{Abeta}) is
available then one can avoid one matrix construction by calling
\begin{display}\begin{verbatim}
HYPRE_AMSSetBetaPoissonMatrix(solver, Abeta);
\end{verbatim}\end{display}
Similarly, if the Poisson matrix with coefficient $\alpha$ is available
(denoted by \code{Aalpha})
the second matrix construction can also be avoided by calling
\begin{display}\begin{verbatim}
HYPRE_AMSSetAlphaPoissonMatrix(solver, Aalpha);
\end{verbatim}\end{display}
Note the following regarding these functions:
\begin{itemize}
\item Both of them change their input. More specifically,
the diagonal entries of the input matrix corresponding to eliminated
degrees of freedom (due to essential boundary conditions)
are penalized.
\item It is assumed that their essential boundary conditions of
${\mathbf A}$, \code{Abeta} and \code{Aalpha} are on the same
part of the boundary.
\item \code{HYPRE_AMSSetAlphaPoissonMatrix} forces the
AMS method to use a simpler, but weaker (in terms of convergence) method.
With this option, the multiplicative AMS cycle
is not guaranteed to converge with the
default parameters. The reason for this is the fact the solver is not
variationally obtained from the original matrix (it utilizes
the auxiliary Poisson--like matrices \code{Abeta} and \code{Aalpha}).
Therefore, it is recommended in this case to use AMS as preconditioner
only.
\end{itemize}

After the above calls, the solver is ready to be constructed.
The user has to provide the stiffness matrix ${\mathbf A}$ (in ParCSR format) and
the \hypre{} parallel vectors ${\mathbf b}$ and ${\mathbf x}$. (The vectors
are actually not used in the current AMS setup.) The setup call reads,
\begin{display}\begin{verbatim}
HYPRE_AMSSetup(solver, A, b, x);
\end{verbatim}\end{display}
It is important to note the order of the calling sequence. For example, do {\bf not}
call \code{HYPRE_AMSSetup} before calling
\code{HYPRE_AMSSetDiscreteGradient}
and one of the functions
\code{HYPRE_AMSSetCoordinateVectors} or \code{HYPRE_AMSSetEdgeConstantVectors}.

\noindent
Once the setup has completed, we can solve the linear system by calling
\begin{display}\begin{verbatim}
HYPRE_AMSSolve(solver, A, b, x);
\end{verbatim}\end{display}

\noindent
Finally, the solver can be destroyed with
\begin{display}\begin{verbatim}
HYPRE_AMSDestroy(&solver);
\end{verbatim}\end{display}

\noindent
More details can be found in the files \file{ams.h} and \file{ams.c}
located in the \file{parcsr_ls} directory.

\subsection{High-order Discretizations}
In addition to the interface for the lowest-order Nedelec elements described in
the previous subsections, AMS also provides support for (arbitrary) high-order
Nedelec element discretizations. Since the robustness of AMS depends on the
performance of BoomerAMG on the associated (high-order) auxiliary subspace
problems, we note that the convergence may not be optimal for large polynomial
degrees $k \geq 1$.

In the high-order AMS interface, the user does not need to provide the
coordinates of the vertices (or the representations of the constant vector
fields in the edge basis), but instead should construct and pass the Nedelec
interpolation matrix ${\mathbf \Pi}$ which maps (high-order) vector nodal finite
elements into the (high-order) Nedelec space. In other words, ${\mathbf \Pi}$ is
the (parallel) matrix representation of the interpolation mapping from
$\mathrm{P}_k^3$/$\mathrm{Q}_k^3$ into $\mathrm{ND}_k$, see \cite{xu_H_curl,
  ams_jcm}.  We require this matrix as an input, since in the high-order case
its entries very much depend on the particular choice of the basis functions in
the edge and nodal spaces, as well as on the geometry of the mesh elements. The
columns of ${\mathbf \Pi}$ should use a node-based numbering, where the
$x$/$y$/$z$ components of the first node (vertex or high-order degree of
freedom) should be listed first, followed by the $x$/$y$/$z$ components of the
second node and so on (see the documentation of \code{HYPRE_BoomerAMGSetDofFunc}).

Similarly to the Nedelec interpolation, the discrete gradient matrix $G$ should
correspond to the mapping $\varphi \in \mathrm{P}_k^3$/$\mathrm{Q}_k^3 \mapsto
\nabla \varphi \in \mathrm{ND}_k$, so even though its values are still
independent of the mesh coordinates, they will not be $\pm 1$, but will be
determined by the particular form of the high-order basis functions and degrees
of freedom.

With these matrices, the high-order setup procedure is simply
\begin{display}\begin{verbatim}
HYPRE_AMSSetDimension(solver, dim);
HYPRE_AMSSetDiscreteGradient(solver, G);
HYPRE_AMSSetInterpolations(solver, Pi, NULL, NULL, NULL);
\end{verbatim}\end{display}
We remark that the above interface calls can also be used in the lowest-order
case (or even other types of discretizations such as those based on the second
family of Nedelec elements), but we recommend calling the previously described
\code{HYPRE_AMSSetCoordinateVectors} instead, since this allows AMS to handle the
construction and use of ${\mathbf \Pi}$ internally.


Specifying the monolithic ${\mathbf \Pi}$ limits the AMS cycle type options to
those less than 10. Alternatively one can separately specify the $x$, $y$ and
$z$ components of $\mathbf \Pi$:
\begin{display}\begin{verbatim}
HYPRE_AMSSetInterpolations(solver, NULL, Pix, Piy, Piz);
\end{verbatim}\end{display}
which enables the use of AMS cycle types with index greater than 10. By
definition, ${\mathbf \Pi}^x \varphi = {\mathbf \Pi} (\varphi,0,0)$, and
similarly for ${\mathbf \Pi}^y$ and ${\mathbf \Pi}^z$. Each of these matrices
has the same sparsity pattern as $G$, but their entries depend on the
coordinates of the mesh vertices.

Finally, both ${\mathbf \Pi}$ and its components can be passed to the solver:
\begin{display}\begin{verbatim}
HYPRE_AMSSetInterpolations(solver, Pi, Pix, Piy, Piz);
\end{verbatim}\end{display}
which will duplicate some memory, but allows for experimentation with all
available AMS cycle types.

%-----------------------------------------------------------------------------

\section{ADS}
\label{ADS}

The Auxiliary-space Divergence Solver (ADS) is a parallel unstructured solver
similar to AMS, but targeting $H(div)$ instead of $H(curl)$ problems. Its usage
and options are very similar to those of AMS, and in general the relationship
between ADS and AMS is analogous to that between AMS and AMG.

Specifically ADS was designed for the scalable solution of linear systems
arising in the finite element discretization of the variational problem
\begin{equation} \label{ads-hdiv}
\mbox{Find } {\mathbf u} \in {\mathbf W}_h \>:\qquad
(\alpha\, \nabla \cdot {\mathbf u},  \nabla \cdot {\mathbf v}) +
(\beta\, {\mathbf u},  {\mathbf v}) = ({\mathbf f},  {\mathbf v})\,,
\qquad \mbox{for all } {\mathbf v} \in {\mathbf W}_h \,,
\end{equation}
where ${\mathbf W}_h$ is the lowest order Raviart-Thomas (face) finite element
space, and $\alpha>0$ and $\beta>0$ are scalar, or SPD matrix variable
coefficients.  It is based on the auxiliary space methods for $H(div)$ problems
proposed in \cite{xu_H_curl}.
% For more details, see \cite{h_curl_amg_report,ams_report}.

\subsection{Overview}
Let ${\mathbf A}$ and ${\mathbf b}$ be the stiffness matrix and  the
load vector corresponding to (\ref{ads-hdiv}). Then the resulting
linear system of interest reads,
\begin{equation} \label{ads-hdiv-ls}
{\mathbf A}\, {\mathbf x} = {\mathbf b} \,.
\end{equation}
The coefficients $\alpha$ and $\beta$ are naturally associated with
the ``stiffness'' and ``mass'' terms of ${\mathbf A}$.
Besides ${\mathbf A}$ and ${\mathbf b}$, ADS requires the following
additional user input:
\begin{enumerate}
\item The discrete curl matrix $C$ representing the faces of the mesh in terms
  of its edges. $C$ has as many rows as the number of faces in the mesh, with
  each row having nonzero entries $+1$ and $-1$ in the columns corresponding to
  the edges composing the face. The sign is determined based on the orientation
  of the edges relative to the face.  We require that $C$ includes all (interior
  and boundary) faces and edges.

\item The discrete gradient matrix $G$ representing the edges of the mesh in
  terms of its vertices. $G$ has as many rows as the number of edges in the
  mesh, with each row having two nonzero entries: $+1$ and $-1$ in the columns
  corresponding to the vertices composing the edge. The sign is determined based
  on the orientation of the edge.  We require that $G$ includes all (interior
  and boundary) edges and vertices.

\item Vectors $x$, $y$, and $z$ representing the coordinates of the vertices of
  the mesh.
\end{enumerate}

\noindent
Internally, ADS proceeds with the construction of the following additional objects:
\begin{itemize}
\item $A_C$ -- the curl-curl matrix $C^{\,T} {\mathbf A} C$.
\item ${\mathbf \Pi}$ -- the matrix representation of the interpolation operator
from vector linear to face finite elements.
\item ${\mathbf A}_{{\mathbf \Pi}}$ -- the vector nodal matrix ${\mathbf \Pi}^{\,T} {\mathbf A} {\mathbf \Pi}$.
\item $B_C$ and ${\mathbf B}_{{\mathbf \Pi}}$ -- efficient (AMS/AMG) solvers for $A_C$ and ${\mathbf A}_{{\mathbf \Pi}}$.
\end{itemize}
The solution procedure then is a three-level method using smoothing in the
original face space and subspace corrections based on $B_C$ and ${\mathbf
  B}_{{\mathbf \Pi}}$.  We can employ a number of options here utilizing various
combinations of the smoother and solvers in additive or multiplicative fashion.

\subsection{Sample Usage}
ADS can be used either as a solver or as a preconditioner.  Below we list the
sequence of \hypre{} calls needed to create and use it as a solver. We start
with the allocation of the \code{HYPRE_Solver} object:
\begin{display}\begin{verbatim}
HYPRE_Solver solver;
HYPRE_ADSCreate(&solver);
\end{verbatim}\end{display}

Next, we set a number of solver parameters. Some of them are optional, while
 others are necessary in order to perform the solver setup.

The user is required to provide the discrete curl and gradient matrices $C$ and
$G$.  ADS expects a matrix defined on the whole mesh with no boundary
faces, edges or nodes excluded. It is essential to {\bf not} impose any boundary
conditions on $C$ or $G$.  Regardless of which \hypre{} conceptual interface
was used to construct the matrices, one can always obtain a ParCSR version of
them. This is the expected format in the following functions.
\begin{display}\begin{verbatim}
HYPRE_ADSSetDiscreteCurl(solver, C);
HYPRE_ADSSetDiscreteGradient(solver, G);
\end{verbatim}\end{display}

Next, ADS requires the coordinates of the vertices in the mesh as three
\hypre{} parallel vectors.  The corresponding function call reads:
\begin{display}\begin{verbatim}
HYPRE_ADSSetCoordinateVectors(solver, x, y, z);
\end{verbatim}\end{display}


The remaining solver parameters are optional.  For example, the user can choose
a different cycle type by calling
\begin{display}\begin{verbatim}
HYPRE_ADSSetCycleType(solver, cycle_type); /* default value: 1 */
\end{verbatim}\end{display}

\noindent
The available cycle types in ADS are:
\begin{itemize}
\item \code{cycle_type=1}: multiplicative solver $(01210)$
\item \code{cycle_type=2}: additive solver $(0+1+2)$
\item \code{cycle_type=3}: multiplicative solver $(02120)$
\item \code{cycle_type=4}: additive solver $(010+2)$
\item \code{cycle_type=5}: multiplicative solver $(0102010)$
\item \code{cycle_type=6}: additive solver $(1+020)$
\item \code{cycle_type=7}: multiplicative solver $(0201020)$
\item \code{cycle_type=8}: additive solver $(0(1+2)0)$
\item \code{cycle_type=11}: multiplicative solver $(013454310)$
\item \code{cycle_type=12}: additive solver $(0+1+3+4+5)$
\item \code{cycle_type=13}: multiplicative solver $(034515430)$
\item \code{cycle_type=14}: additive solver $(01(3+4+5)10)$
\end{itemize}
Here we use the following convention for the three subspace correction methods:
$0$ refers to smoothing, $1$ stands for AMS based on $B_C$, and $2$ refers to a
call to BoomerAMG for ${\mathbf B}_{{\mathbf \Pi}}$.  The values $3$, $4$ and
$5$ refer to the scalar subspaces corresponding to the $x$, $y$ and $z$
components of $\mathbf \Pi$.

The abbreviation $xyyz$ for $x,y,z \in \{0,1,2,3,4,5\}$ refers to a
multiplicative subspace correction based on solvers $x$, $y$, $y$, and $z$ (in
that order).  The abbreviation $x+y+z$ stands for an additive subspace
correction method based on $x$, $y$ and $z$ solvers.  The additive cycles are
meant to be used only when ADS is called as a preconditioner.  In our experience
the choices \code{cycle_type=1,5,8,11,13} often produced fastest solution times,
while \code{cycle_type=7} resulted in smallest number of iterations.

Additional solver parameters, such as the maximum number of iterations, the
convergence tolerance and the output level, can be set with
\begin{display}\begin{verbatim}
HYPRE_ADSSetMaxIter(solver, maxit);     /* default value: 20 */
HYPRE_ADSSetTol(solver, tol);           /* default value: 1e-6 */
HYPRE_ADSSetPrintLevel(solver, print);  /* default value: 1 */
\end{verbatim}\end{display}

More advanced parameters, affecting the smoothing and the internal AMS and AMG
 solvers, can be set with the following three functions:
\begin{display}\begin{verbatim}
HYPRE_ADSSetSmoothingOptions(solver, 2, 1, 1.0, 1.0);
HYPRE_ADSSetAMSOptions(solver, 11, 10, 1, 3, 0.25, 0, 0);
HYPRE_ADSSetAMGOptions(solver, 10, 1, 3, 0.25, 0, 0);
\end{verbatim}\end{display}
We note that the AMS cycle type, which is the second parameter of
\code{HYPRE_ADSSetAMSOptions} should be greater than 10, unless the high-order
interface of \code{HYPRE_ADSSetInterpolations} described in the next subsection
is being used.

After the above calls, the solver is ready to be constructed.  The user has to
provide the stiffness matrix ${\mathbf A}$ (in ParCSR format) and the \hypre{}
parallel vectors ${\mathbf b}$ and ${\mathbf x}$. (The vectors are actually not
used in the current ADS setup.) The setup call reads,
\begin{display}\begin{verbatim}
HYPRE_ADSSetup(solver, A, b, x);
\end{verbatim}\end{display}
It is important to note the order of the calling sequence. For example, do {\bf not}
call \code{HYPRE_ADSSetup} before calling each of the functions
\code{HYPRE_ADSSetDiscreteCurl}, \code{HYPRE_ADSSetDiscreteGradient}
and \code{HYPRE_ADSSetCoordinateVectors}.

\noindent
Once the setup has completed, we can solve the linear system by calling
\begin{display}\begin{verbatim}
HYPRE_ADSSolve(solver, A, b, x);
\end{verbatim}\end{display}

\noindent
Finally, the solver can be destroyed with
\begin{display}\begin{verbatim}
HYPRE_ADSDestroy(&solver);
\end{verbatim}\end{display}

\noindent
More details can be found in the files \file{ads.h} and \file{ads.c}
located in the \file{parcsr_ls} directory.

\subsection{High-order Discretizations}
Similarly to AMS, ADS also provides support for (arbitrary) high-order $H(div)$
discretizations. Since the robustness of ADS depends on the performance of AMS
and BoomerAMG on the associated (high-order) auxiliary subspace problems, we
note that the convergence may not be optimal for large polynomial degrees $k
\geq 1$.

In the high-order ADS interface, the user does not need to provide the
coordinates of the vertices, but instead should construct and pass the
Raviart-Thomas and Nedelec interpolation matrices ${\mathbf \Pi}_{RT}$ and
${\mathbf \Pi}_{ND}$ which map (high-order) vector nodal finite elements into
the (high-order) Raviart-Thomas and Nedelec space. In other words, these are the
(parallel) matrix representation of the interpolation mappings from
$\mathrm{P}_k^3$/$\mathrm{Q}_k^3$ into $\mathrm{RT}_{k-1}$ and $\mathrm{ND}_k$,
see \cite{xu_H_curl, ams_jcm}.  We require these matrices as inputs, since in
the high-order case their entries very much depend on the particular choice of
the basis functions in the finite element spaces, as well as on the geometry of
the mesh elements. The columns of the ${\mathbf \Pi}$ matrices should use a
node-based numbering, where the $x$/$y$/$z$ components of the first node (vertex
or high-order degree of freedom) should be listed first, followed by the
$x$/$y$/$z$ components of the second node and so on (see the documentation of
\code{HYPRE_BoomerAMGSetDofFunc}). Furthermore, each interpolation matrix can be
split into $x$, $y$ and $z$ components by defining ${\mathbf \Pi}^x \varphi =
{\mathbf \Pi} (\varphi,0,0)$, and similarly for ${\mathbf \Pi}^y$ and ${\mathbf
  \Pi}^z$.

The discrete gradient and curl matrices $G$ and $C$ should correspond to the
mappings $\varphi \in \mathrm{P}_k^3$/$\mathrm{Q}_k^3 \mapsto \nabla \varphi \in
\mathrm{ND}_k$ and ${\mathbf u} \in \mathrm{ND}_k \mapsto \nabla \times {\mathbf
  u} \in \mathrm{RT}_{k-1}$, so even though their values are still independent
of the mesh coordinates, they will not be $\pm 1$, but will be determined by the
particular form of the high-order basis functions and degrees of freedom.

With these matrices, the high-order setup procedure is simply
\begin{display}\begin{verbatim}
HYPRE_ADSSetDiscreteCurl(solver, C);
HYPRE_ADSSetDiscreteGradient(solver, G);
HYPRE_ADSSetInterpolations(solver, RT_Pi, NULL, NULL, NULL,
                                   ND_Pi, NULL, NULL, NULL);
\end{verbatim}\end{display}
We remark that the above interface calls can also be used in the lowest-order
case (or even other types of discretizations), but we recommend calling the
previously described \code{HYPRE_ADSSetCoordinateVectors} instead, since this
allows ADS to handle the construction and use of the interpolations internally.


Specifying the monolithic ${\mathbf \Pi}_{RT}$ limits the ADS cycle type options
 to those less than 10. Alternatively one can separately specify the $x$, $y$
 and $z$ components of ${\mathbf \Pi}_{RT}$.
\begin{display}\begin{verbatim}
HYPRE_ADSSetInterpolations(solver, NULL, RT_Pix, RT_Piy, RT_Piz,
                                   ND_Pi, NULL, NULL, NULL);
\end{verbatim}\end{display}
which enables the use of ADS cycle types with index greater than 10. The same holds for
${\mathbf \Pi}_{ND}$ and its components, e.g. to enable the subspace AMS cycle type greater
then 10 we need to call
\begin{display}\begin{verbatim}
HYPRE_ADSSetInterpolations(solver, NULL, RT_Pix, RT_Piy, RT_Piz,
                                   NULL, ND_Pix, ND_Piy, ND_Piz);
\end{verbatim}\end{display}
Finally, both ${\mathbf \Pi}$ and their components can be passed to the solver:
\begin{display}\begin{verbatim}
HYPRE_ADSSetInterpolations(solver, RT_Pi, RT_Pix, RT_Piy, RT_Piz
                                   ND_Pi, ND_Pix, ND_Piy, ND_Piz);
\end{verbatim}\end{display}
which will duplicate some memory, but allows for experimentation with all
available ADS and AMS cycle types.

%-----------------------------------------------------------------------------

\section{The MLI Package}

MLI is an object-oriented module that implements the class of algebraic
multigrid algorithms based on Vanek and Brezina's smoothed aggregation
method \cite{VaMB96, VaBM01}.  There are two main algorithms in this module -
the original
smoothed aggregation algorithm and the modified version that uses
the finite element substructure matrices to construct the prolongation
operators.  As such, the later algorithm can only be used in the
finite element context via the finite element interface.  In addition,
the nodal coordinates obtained via the finite element interface can be
used to construct a better prolongation operator than the pure
translation modes.

Below is an example on how to set up MLI as a preconditioner
for conjugate gradient.

\begin{display}\begin{verbatim}
HYPRE_LSI_MLICreate(MPI_COMM_WORLD, &pcg_precond);

HYPRE_LSI_MLISetParams(pcg_precond, "MLI strengthThreshold 0.08");
...

HYPRE_PCGSetPrecond(pcg_solver,
		    (HYPRE_PtrToSolverFcn) HYPRE_LSI_MLISolve,
		    (HYPRE_PtrToSolverFcn) HYPRE_LSI_MLISetup,
		    pcg_precond);
\end{verbatim}\end{display}

\noindent
Note that parameters are set via \code{HYPRE_LSI_MLISetParams}. A list of
valid parameters that can be set using this routine can be found in the
FEI section of the reference manual.

%Details of how to set this preconditioners up can be
%found in the next section.


%-----------------------------------------------------------------------------

\section{ParaSails}

ParaSails is a parallel implementation of a sparse approximate inverse
preconditioner, using {\em a priori} sparsity patterns and least-squares
(Frobenius norm) minimization.  Symmetric positive definite (SPD) problems
are handled using a factored SPD sparse approximate inverse.  General
(nonsymmetric and/or indefinite) problems are handled with an
unfactored sparse approximate inverse.  It is also possible to
precondition nonsymmetric but definite matrices with a factored, SPD
preconditioner.

ParaSails uses {\em a priori} sparsity patterns that are patterns of powers
of sparsified matrices.  ParaSails also uses a post-filtering technique
to reduce the cost of applying the preconditioner.
In advanced usage not described here, the pattern of the
preconditioner can also be reused to generate preconditioners for different
matrices in a sequence of linear solves.

For more details about the ParaSails algorithm, see \cite{EChow_2000}.

%-----------------------------

\subsection{Parameter Settings}

The accuracy and cost of ParaSails are parametrized by the real {\em thresh}
and integer {\em nlevels} parameters,
$0 \le {\it thresh} \le 1$, $0 \le {\it nlevels}$.
Lower values of {\em thresh}
and higher values of {\em nlevels} lead to more accurate, but more expensive
preconditioners.  More accurate preconditioners are also more expensive
per iteration.  The default values are ${\it thresh} = 0.1$
and ${\it nlevels} = 1$.  The parameters are set using
\code{HYPRE_ParaSailsSetParams}.

Mathematically, given a symmetric matrix $A$, the pattern of the
approximate inverse is the pattern of $\tilde{A}^m$ where $\tilde{A}$
is a matrix that has been sparsified from $A$.  The sparsification
is performed by dropping all entries in a symmetrically diagonally scaled $A$
whose values are less than {\em thresh} in magnitude.  The parameter
{\em nlevel} is equivalent to $m+1$.
Filtering is a post-thresholding procedure.
For more details about the algorithm, see \cite{EChow_2000}.

The storage required for the ParaSails preconditioner depends on
the parameters {\em thresh} and {\em nlevels}.  The default parameters
often produce a preconditioner that can be stored in less than the
space required to store the original matrix.
ParaSails does not need a large amount of intermediate storage in
order to construct the preconditioner.

%-----------------------------

ParaSail's Create function differs from the synopsis in the following way:

\begin{display}
\begin{verbatim}
int HYPRE_ParaSailsCreate(MPI_Comm comm, HYPRE_Solver *solver,
  int symmetry);
\end{verbatim}
\end{display}
where \code{comm} is the MPI communicator.

The value of \code{symmetry} has the following meanings, to indicate
the symmetry and definiteness of the problem, and to specify the
type of preconditioner to construct:
\begin{center}
\begin{tabular}{|c|l|} \hline
value & meaning \\ \hline
0 & nonsymmetric and/or indefinite problem, and nonsymmetric preconditioner \\
1 & SPD problem, and SPD (factored) preconditioner \\
2 & nonsymmetric, definite problem, and SPD (factored) preconditioner \\
\hline
\end{tabular}
\end{center}
For more information about the final case, see section \ref{nearly}.

Parameters for setting up the preconditioner are specified using
\begin{display}
\begin{verbatim}
int HYPRE_ParaSailsSetParams(HYPRE_Solver solver,
  double thresh, int nlevel, double filter);
\end{verbatim}
\end{display}

The parameters are used to specify the sparsity pattern and filtering value
(see above), and are described with suggested values as follows:

\begin{center}
\begin{tabular}{|c|c|c|c|c|l|} \hline
parameter    & type    & range                & sug. values  & default & meaning \\ \hline
{\tt nlevel} & integer & ${\tt nlevel} \ge 0$ & 0, 1, 2      & 1   & $m={\tt nlevel}+1$\\
\hline
{\tt thresh} & real    & ${\tt thresh} \ge 0$ & 0, 0.1, 0.01 & 0.1 & {\em thresh} $=$ {\tt thresh}\\
             &         & ${\tt thresh}  <  0$ & -0.75, -0.90 &     & {\em thresh} selected automatically\\
\hline
{\tt filter} & real    & ${\tt filter} \ge 0$ & 0, 0.05, 0.001 & 0.05 & filter value $=$ {\tt filter}\\
             &         & ${\tt filter}  <  0$ & -0.90        &     & filter value selected automatically\\
\hline
\end{tabular}
\end{center}

When ${\tt thresh} < 0$, then a threshold is selected such that
$-{\tt thresh}$ represents the fraction of the nonzero elements
that are dropped.  For example, if ${\tt thresh} = -0.9$ then
$\tilde{A}$ will contain approximately ten percent of the nonzeros
in $A$.

When ${\tt filter} < 0$, then a filter value is selected such that
$-{\tt filter}$ represents the fraction of the nonzero elements
that are dropped.  For example, if ${\tt filter} = -0.9$ then
approximately 90 percent of the entries in the computed approximate
inverse are dropped.

%-----------------------------

\subsection{Preconditioning Nearly Symmetric Matrices}
\label{nearly}

A nonsymmetric, but definite and nearly symmetric matrix $A$
may be preconditioned
with a symmetric preconditioner $M$.  Using a symmetric preconditioner
has a few advantages, such as guaranteeing positive
definiteness of the preconditioner, as well as being less expensive
to construct.

The nonsymmetric matrix $A$ must be definite,
i.e., $(A+A^T)/2$ is SPD, and the {\em a priori} sparsity pattern to be used
must be symmetric.  The latter may be guaranteed by 1)
constructing the sparsity pattern with a symmetric matrix, or 2) if the
matrix is structurally symmetric (has symmetric pattern), then
thresholding to construct the pattern is not used (i.e.,
zero value of the {\tt thresh} parameter is used).

%-----------------------------------------------------------------------------

\section{Euclid}

The Euclid library is a scalable implementation of the Parallel ILU algorithm
that was presented at SC99~\cite{DHysom_APothen_1999}, and published in
expanded form in the SIAM Journal on Scientific
Computing~\cite{DHysom_APothen_2001}.  By {\em scalable} we mean that the
factorization (setup) and application (triangular solve) timings remain nearly
constant when the global problem size is scaled in proportion to the number of
processors.  As with all ILU preconditioning methods, the number of iterations
is expected to increase with global problem size.

Experimental results have shown that PILU preconditioning is in general
more effective than Block Jacobi preconditioning
for minimizing total solution time.
For scaled problems, the relative advantage appears to increase
as the number of processors is scaled upwards.
Euclid may also be used to good advantage as a smoother within
multigrid methods.

%-----------------------------

\subsection{Overview}

Euclid is best thought of as an ``extensible ILU preconditioning
framework.''
{\em Extensible} means that Euclid can (and eventually will, time and
contributing agencies permitting) support many variants of ILU($k$)
and ILUT preconditioning.
(The current release includes Block Jacobi ILU($k$) and
Parallel ILU($k$) methods.)
Due to this extensibility, and also because Euclid was developed
independently of the \hypre{} project, the methods by which one
passes runtime parameters to Euclid preconditioners
differ in some respects from the \hypre{} norm.
While users can directly set options within their code,
options can also be passed to Euclid preconditioners via
command line switches and/or small text-based configuration files.
The latter strategies have the advantage that users will not need to
alter their codes as Euclid's capabilities are extended.

%Euclid subscribes to the philosophy that
%the less coding required of the user (and the maintainer) the better.
%Hence, rather than coding interface function calls for every settable
%parameter,

The following fragment illustrates the minimum coding %that is
required to invoke Euclid preconditioning within \hypre{} application contexts.
The next subsection provides examples of the various ways in which
Euclid's options can be set.
The final subsection lists the options,
and provides guidance as to the settings that (in our experience)
will likely prove effective for minimizing execution time.

\begin{display}
\begin{verbatim}
#include "HYPRE_parcsr_ls.h"

HYPRE_Solver eu;
HYPRE_Solver pcg_solver;
HYPRE_ParVector b, x;
HYPRE_ParCSRMatrix A;

//Instantiate the preconditioner.
HYPRE_EuclidCreate(comm, &eu);

//Optionally use the following three methods to set runtime options.
// 1. pass options from command line or string array.
HYPRE_EuclidSetParams(eu, argc, argv);

// 2. pass options from a configuration file.
HYPRE_EuclidSetParamsFromFile(eu, "filename");

// 3. pass options using interface functions.
HYPRE_EuclidSetLevel(eu, 3);
...

//Set Euclid as the preconditioning method for some
//other solver, using the function calls HYPRE_EuclidSetup
//and HYPRE_EuclidSolve.  We assume that the pcg_solver
//has been properly initialized.
HYPRE_PCGSetPrecond(pcg_solver,
                    (HYPRE_PtrToSolverFcn) HYPRE_EuclidSolve,
                    (HYPRE_PtrToSolverFcn) HYPRE_EuclidSetup,
                    eu);

//Solve the system by calling the Setup and Solve methods for,
//in this case, the HYPRE_PCG solver.  We assume that A, b, and x
//have been properly initialized.
HYPRE_PCGSetup(pcg_solver, (HYPRE_Matrix)A, (HYPRE_Vector)b, (HYPRE_Vector)x);
HYPRE_PCGSolve(pcg_solver, (HYPRE_Matrix)parcsr_A, (HYPRE_Vector)b, (HYPRE_Vector)x);

//Destroy the Euclid preconditioning object.
HYPRE_EuclidDestroy(eu);

\end{verbatim}
\end{display}

%-----------------------------

\subsection{Setting Options: Examples}

For expositional purposes, assume you wish to set the ILU($k$)
factorization level to the value $k = 3$.
There are several methods of accomplishing this.
Internal to Euclid, options are stored in a simple database that
contains (name, value) pairs.
Various of Euclid's internal (private) functions query this
database to determine, at runtime, what action the user
has requested.
If you enter the option ``{\bf -eu\_stats 1''}, a report will
be printed when Euclid's destructor is called; this
report lists (among other statistics) the options that
were in effect during the factorization phase.

{\bf Method 1.}
By default, Euclid always looks for a file titled
``database'' in the working directory.
If it finds such a file, it opens it and attempts to parse it as
a configuration file.
%In the directory in which you will execute the \hypre {}
%program, open a text file named ``database'' and insert the
Configuration files should be formatted as follows.

\vspace{0.1in}
\indent {\tt >cat database} \\
\indent {\tt \#this is an optional comment} \\
\indent {\tt -level 3}
\vspace{0.1in}

Any line in a configuration file that contains a ``{\tt \#}''
character in the first column is ignored.
All other lines should begin with an option {\em name}, followed by
one or more blanks, followed by the option {\em value}.
Note that option names always begin with a ``-'' character.
If you include an option name that is not recognized by Euclid,
no harm should ensue.

{\bf Method 2.}
To pass options on the command line, call
\begin{display}
\begin{verbatim}
HYPRE_EuclidSetParams(HYPRE_Solver solver, int argc, char *argv[]);
\end{verbatim}
\end{display}
where {\tt argc} and {\tt argv} carry the usual connotation:
{\tt main(int argc, char *argv[])}.
If your \hypre{} application is called {\tt phoo}, you can
then pass options on the command line per the following example.

\begin{display}
\begin{verbatim}
mpirun -np 2 phoo -level 3
\end{verbatim}
\end{display}

Since Euclid looks for the ``database'' file when
{\tt HYPRE\_EuclidCreate} is called, and parses the command line
when {\tt HYPRE\_EuclidSetParams} is called,
option values passed on the command line will override
any similar settings that may be contained in the ``database'' file.
Also, if same option name appears more than once on the command
line, the final appearance determines the setting.

Some options, such as ``{\tt -bj}'' (see next subsection) are boolean.
Euclid always treats these options as the value ``1'' (true)
or ``0'' (false).
When passing boolean options from the command line
the value may be committed, in which case it assumed to be ``1.''
Note, however, that when boolean options are contained in a
configuration file, either the ``1'' or ``0'' must
stated explicitly.

{\bf Method 3.}
There are two ways in which you can read in options from a file
whose name is other than ``database.''
First, you can call {\tt HYPRE\_EuclidSetParamsFromFile}
to specify a configuration filename.
Second, if you have passed the command line arguments as
described above in Method 2,
you can then specify the configuration filename on the command
line using the {\bf -db\_filename filename} option, e.g.,

\begin{display}
\begin{verbatim}
mpirun -np 2 phoo -db_filename ../myConfigFile
\end{verbatim}
\end{display}

{\bf Method 4.}
One can also set parameters via interface functions, e.g
\begin{display}
\begin{verbatim}
int HYPRE_EuclidSetLevel(HYPRE_Solver solver, int level);
\end{verbatim}
\end{display}
For a full set of functions, see the reference manual.

%-----------------------------

\subsection{Options Summary}

\begin{description}
\item[-level $\langle int \rangle$] Factorization level for ILU($k$).
           Default: 1.
           Guidance: for 2D convection-diffusion and similar problems,
           fastest solution time is typically obtained with levels 4 through
           8.  For 3D problems fastest solution time is typically
           obtained with level 1.

\item[-bj] Use Block Jacobi ILU preconditioning instead of PILU.
           Default: 0 (false). Guidance: if subdomains contain
           relatively few nodes (less than 1,000), or the problem is
           not well partitioned, Block Jacobi ILU
           may give faster solution time than PILU.
\item[-eu\_stats] When Euclid's destructor is called a summary of
                 runtime settings and timing information is printed
                 to stdout.  Default: 0 (false).
                 The timing marks in the report are the maximum over
                 all processors in the MPI communicator.
\item[-eu\_mem] When Euclid's destructor is called a summary of
               Euclid's memory usage is printed to stdout.
               Default: 0 (false).
               The statistics are for the processor whose rank
               in MPI\_COMM\_WORLD is 0.
\item[-printTestData] This option is used in our autotest procedures,
                 and should not normally be invoked by users.
\item[-sparseA $\langle float \rangle$] Drop-tolerance for ILU($k$) factorization.
                        Default: 0 (no dropping).
                        Entries are treated as zero if their absolute
                        value is less than \mbox{(sparseA * max)}, where ``max''
                        is the largest absolute value of any entry in the
                        row. Guidance: try this in conjunction with
                        -rowScale.  CAUTION: If the coefficient matrix $A$ is
                        symmetric, this
                        setting is likely to cause the filled matrix,
                        $F = L+U-I$, to be unsymmetric.
                        This setting has no effect when ILUT factorization
                        is selected.
\item[-rowScale] Scale values prior to factorization such that the
                 largest value in any row is +1 or -1.
                 Default: 0 (false).
                 CAUTION: If the coefficient matrix $A$ is symmetric, this
                 setting is likely to cause the filled matrix,
                 $F = L+U-I$, to be unsymmetric.
                 Guidance: if the matrix is poorly scaled, turning on
                 row scaling may help convergence.
\item[-ilut $\langle float \rangle$] Use ILUT factorization instead
                 of the default, ILU($k$).  Here, $\langle float \rangle$
                 is the drop tolerance, which is relative to the largest
                 absolute value of any entry in the row being factored.
                 CAUTION: If the coefficient matrix $A$ is symmetric, this
                 setting is likely to cause the filled matrix,
                 $F = L+U-I$, to be unsymmetric.
                 NOTE: this option can only be used sequentially!
\end{description}

%-----------------------------------------------------------------------------

\section{\pilut: Parallel Incomplete Factorization}
\label{PILUT}

{\bf Note:} this code is no longer supported by the \hypre{} team. We recommend to
use Euclid instead, which is more versatile and in general more efficient, especially
when used with many processors.

\vspace{.3in}
\pilut{} is a parallel preconditioner based on Saad's dual-threshold incomplete
factorization algorithm. The original version of \pilut{} was done by Karypis
and Kumar \cite{GKarypis_VKumar_1998} in terms of the Cray SHMEM library. The
code was subsequently modified by the \hypre{} team: SHMEM was replaced by MPI;
some algorithmic changes were made; and it was software engineered to be
interoperable with several matrix implementations, including \hypre{}'s ParCSR
format, PETSc's matrices, and ISIS++ RowMatrix. The algorithm produces an
approximate factorization $ L U$, with the preconditioner $M$ defined by $ M =
L U $.

{\bf Note:} \pilut{} produces a nonsymmetric preconditioner even when the
original matrix is symmetric. Thus, it is generally inappropriate for
preconditioning symmetric methods such as Conjugate Gradient.

%-----------------------------

\subsection*{Parameters:}

\begin{itemize}

\item
\code{SetMaxNonzerosPerRow( int LFIL ); (Default: 20)}
Set the maximum number of nonzeros to be retained in each row of $L$ and $U$.
This parameter can be used to control the amount of memory that $L$ and $U$
occupy. Generally, the larger the value of \code{LFIL}, the longer it takes to
calculate the preconditioner and to apply the preconditioner and the larger
the storage requirements, but this trades
off versus a higher quality preconditioner that reduces the number of
iterations.

\item
\code{SetDropTolerance( double tol ); (Default: 0.0001)}
Set the tolerance (relative to the 2-norm of the row) below which entries in L
and U are automatically dropped. \pilut{} first drops entries based on the drop
tolerance, and then retains the largest LFIL elements in each row that remain.
Smaller values of \code{tol} lead to more accurate preconditioners, but can
also lead to increases in the time to calculate the preconditioner.

\end{itemize}

%-----------------------------------------------------------------------------

\section{LOBPCG Eigensolver}

LOBPCG (Locally Optimal Block Preconditioned Conjugate Gradient) is a simple,
yet very efficient, algorithm suggested in \cite{LOBPCG_2001, BLOPEX_2007,
BLOPEXWebPage} for computing several smallest eigenpairs of the symmetric
generalized eigenvalue problem $Ax=\lambda Bx$ with large, possibly sparse,
symmetric matrix $A$ and symmetric positive definite matrix $B$. The matrix $A$
is not assumed to be positive, which also allows one to use LOBPCG to compute
the largest eigenpairs of $Ax=\lambda Bx$ simply by solving $-Ax=\mu Bx$ for the
smallest eigenvalues $\mu=-\lambda$.

LOBPCG simultaneously computes several eigenpairs together, which is controlled
by the \code{blockSize} parameter, see example \file{ex11.c}. The LOBCPG also
allows one to impose constraints on the eigenvectors of the form $x^T B y_i=0$
for a set of vectors $y_i$ given to LOBPCG as input parameters. This makes it
possible to compute, e.g., 50 eigenpairs by 5 subsequent calls to LOBPCG with
the \code{blockSize}=10, using deflation.  LOBPCG can use preconditioning in two
different ways: by running an inner preconditioned PCG linear solver, or by
applying the preconditioner directly to the eigenvector residual (option
\code{-pcgitr 0}).  In all other respects, LOBPCG is similar to the PCG linear
solver.

The LOBPCG code is available for system interfaces: Struct, SStruct, and IJ.  It
is also used in the Auxiliary-space Maxwell Eigensolver (AME).  The LOBPCG setup
is similar to the setup for PCG.

% The LOBPCG code and related functions and utilities in \emph{hypre} are pulled
% from the package Block Locally Optimal Preconditioned Eigenvalue Xolvers
% (BLOPEX) revision 1.1, see \cite{2,3}.  No special configuration for the LOBPCG
% during the hypre setup is necessary, since all LOBPCG-relevant files are already
% built-in into the \emph{hypre} source.

%-----------------------------------------------------------------------------