\chapter{Finite Element Interface}
\label{chapter-FEI}

\section{Introduction}
User applications access the \hypre{} linear solvers through a pipeline of
two interfaces: the user to finite element interface (called {\sf FEI}),
and the finite element to linear solver interface (called
{\sf LinearSystemCore}). The purpose of the {\sf FEI} is to allow users
to submit the global matrices in the
form of element connectivities, element stiffness matrices, element
loads, and boundary conditions. This element information is processed
by an implementation of the {\sf FEI}, which loads the global matrix and right
hand side vectors into the linear solver libraries via the
{\sf LinearSystemCore} interface.
The {\sf LinearSystemCore} interface also facilitates using
multiple linear system solver packages (such as PETSc or Aztec)
with little change in the user code.

The specification of the {\sf FEI} and its implementation were first
developed at Sandia. A simplified implementation has been developed
at LLNL in \hypre{}'s finite element module. In Section 2, we describe the
basic {\sf FEI} functions and a sample program to demonstrate how to
use them. A brief description of \hypre{}'s internal data structures and solver
capabilities is presented in Section 3. Associated with \hypre{}'s finite
element interface is an FE-based gray-box multilevel preconditioning
module called {\sf MLI}, which provides fast multilevel preconditioners.
A description of the {\sf MLI} is given in Section 4.
In Section 5, we describe the available options for using \hypre{}'s
rich solver capabilities. Users who prefer to create their own finite
element packages but would like to use the \hypre{} solvers can link their
packages to \hypre{} via the {\sf LinearSystemCore}. A description
of this interface is given in Section 6. Finally, some installation
and usage issues are discussed in Section 7.

\section{A Brief Description of the Finite Element Interface}

Embedded in application finite element codes are data structures
storing element connectivities, element stiffness matrices, element
loads, boundary conditions, nodal coordinates, etc. An implicit finite
element problem can be solved by assembling the global stiffness matrix
and the corresponding right hand side vector, and then calling a linear
solver to compute the solution. The first step in this process is
thus to instantiate an {\sf FEI} object by
\begin{tabbing}
\hspace{0.5in} \= {\sf feiPtr = new FEI\_Implementation(mpiComm);}
\end{tabbing}
where {\sf mpiComm} is an MPI communicator (e.g. {\sf MPI\_COMM\_WORLD}).
Next, various pieces of finite element information need to be passed into
the {\sf FEI} object.

The first entity to be submitted to the {\sf FEI} is the {\it field} information.
A {\it field} has an identifier called {\it fieldID} and a rank or
{\it fieldSize} (number of degrees of freedom). For example, for a simple
3D incompressible Navier-Stokes equation, the nodal variable is the velocity
vector, which has $3$ degrees of freedom, and the element variable (constant
over the element) is the pressure (a scalar). If these are the only variables,
and if we assign {\it fieldID} $7$ and $8$ to them, respectively, then the
finite element field information can be set up by
\begin{tabbing}
\hspace{0.5in} \= {\sf nFields = 2;} \\
\> {\sf fieldID = new int[nFields];} \\
\> {\sf fieldID[0] = $7$; /* velocity vector */} \\
\> {\sf fieldID[1] = $8$; /* pressure */} \\
\> {\sf fieldSize = new int[nFields];} \\
\> {\sf fieldSize[0] = $3$; /* velocity vector */} \\
\> {\sf fieldSize[1] = $1$; /* pressure */ } \\
\> {\sf feiPtr$->$initFields(nFields, fieldSize, fieldID);}
\end{tabbing}

Once the field information has been established, we are ready to initialize
an element block. An element block is characterized by the block identifier,
the number of elements, the number of nodes per element, the nodal fields,
and the element fields (fields that have been defined previously). Suppose
we use $1000$ hexahedral elements in element block $0$; the setup
consists of
\begin{tabbing}
\hspace{0.5in} \= {\sf elemBlkID = 0;} \\
\> {\sf nElems = 1000;} \\
\> {\sf elemNNodes = 8; /* number of nodes per element */} \\
\> {\sf nodeNFields = 1; /* nodal field - velocity */} \\
\> {\sf nodeFieldIDs = new int[nodeNFields];} \\
\> {\sf nodeFieldIDs[0] = fieldID[0]; /* velocity */ } \\
\> {\sf elemNFields = 1; /* element field - pressure */} \\
\> {\sf elemFieldIDs = new int[elemNFields];} \\
\> {\sf elemFieldIDs[0] = fieldID[1]; /* pressure */ } \\
\> {\sf feiPtr$->$initElemBlock(elemBlkID, nElems, elemNNodes, nodeNFields, nodeFieldIDs,}\\
\> \hspace{1.0in} {\sf elemNFields, elemFieldIDs, 0);}
\end{tabbing}
The last argument specifies how the dependent variables are arranged in
the element matrices. A value of $0$ indicates that each variable is to be
arranged in a separate block (as opposed to interleaving).

In a parallel environment, each processor has one or more element blocks.
Unless the element blocks are all disjoint, some of the element blocks
share a common set of nodes on the subdomain boundaries. To facilitate
setting up interprocessor communications, shared nodes between subdomains
on different processors are to be identified and sent to the {\sf FEI}.
Hence, each node in the whole domain is assigned a unique global
identifier. The shared node list on each processor contains a subset
of the global node list corresponding to the local nodes that are shared
with other processors. The syntax for setting up the shared nodes is
\begin{tabbing}
\hspace{0.5in} \= {\sf feiPtr$->$initSharedNodes(nShared, sharedIDs, sharedLengs, sharedProcs);}
\end{tabbing}
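For illustration, suppose processors $0$ and $1$ share the two nodes with
global IDs $100$ and $101$ (the node IDs and processor ranks here are
hypothetical). A minimal sketch of the shared node arrays, in which each
node lists all processors sharing it, is
\begin{tabbing}
\hspace{0.5in} \= {\sf nShared = 2;} \\
\> {\sf sharedIDs = new int[nShared];} \\
\> {\sf sharedIDs[0] = 100; sharedIDs[1] = 101;} \\
\> {\sf sharedLengs = new int[nShared];} \\
\> {\sf sharedProcs = new int*[nShared];} \\
\> {\sf for (i = 0; i < nShared; i++) \{} \\
\> \hspace{0.3in} {\sf sharedLengs[i] = 2; /* each node is shared by 2 processors */} \\
\> \hspace{0.3in} {\sf sharedProcs[i] = new int[2];} \\
\> \hspace{0.3in} {\sf sharedProcs[i][0] = 0; sharedProcs[i][1] = 1;} \\
\> {\sf \}}
\end{tabbing}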
This completes the initialization phase, and a completion signal is sent to
the {\sf FEI} via
\begin{tabbing}
\hspace{0.5in} \= {\sf feiPtr$->$initComplete();}
\end{tabbing}

Next we begin the {\it load} phase. The first entity to be loaded is the
nodal boundary conditions. Here we need to specify the number of boundary
equations and the boundary values given by {\sf alpha}, {\sf beta}, and
{\sf gamma}. Depending on whether the boundary conditions are Dirichlet,
Neumann, or mixed, the three values should be passed into the {\sf FEI}
accordingly.

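The {\sf FEI} expresses a nodal boundary condition in the general mixed form
$\alpha u + \beta \frac{\partial u}{\partial n} = \gamma$, so a Dirichlet
condition $u = g$ corresponds to {\sf alpha} $= 1$, {\sf beta} $= 0$, and
{\sf gamma} $= g$. Below is a minimal sketch of loading homogeneous Dirichlet
conditions on the velocity field; it assumes the {\sf loadNodeBCs} function
of the {\sf FEI} specification, a hypothetical list of {\sf nBCNodes}
boundary nodes in {\sf BCNodeIDs}, and {\sf alpha}, {\sf beta}, {\sf gamma}
allocated with one row per boundary node and one column per field component.
\begin{tabbing}
\hspace{0.5in} \= {\sf for (i = 0; i < nBCNodes; i++) \{} \\
\> \hspace{0.3in} {\sf for (j = 0; j < 3; j++) \{ /* 3 velocity components */} \\
\> \hspace{0.6in} {\sf alpha[i][j] = 1.0; beta[i][j] = 0.0; gamma[i][j] = 0.0;} \\
\> \hspace{0.3in} {\sf \}} \\
\> {\sf \}} \\
\> {\sf feiPtr$->$loadNodeBCs(nBCNodes, BCNodeIDs, fieldID[0], alpha, beta, gamma);}
\end{tabbing}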
The element stiffness matrices are loaded in the next step. We need
to specify the element number $i$, the element block to which element $i$
belongs, the element connectivity information, the element load, and the
element matrix format. The element connectivity specifies a set of $8$ node
global IDs (for hexahedral elements), and the element load is the load or
force for each degree of freedom. The element format specifies how the
equations are arranged (similar to the interleaving scheme mentioned above).
The calling sequence for loading element stiffness matrices is
\begin{tabbing}
\hspace{0.5in} \= {\sf for (iE = 0; iE < nElems; iE++)} \\
\> \hspace{0.5in} {\sf feiPtr$->$sumInElem(elemBlkID, elemID, elemConn[iE], elemStiff[iE],} \\
\> \hspace{1.5in} {\sf elemLoads[iE], elemFormat);}
\end{tabbing}
Again, to complete the loading phase, a completion signal is sent to
the {\sf FEI} via
\begin{tabbing}
\hspace{0.5in} \= {\sf feiPtr$->$loadComplete();}
\end{tabbing}

Now the global stiffness matrix and the corresponding right hand side
have been assembled. Before the linear system is solved, a number of
solver parameters have to be passed into the {\sf FEI}. A detailed description
of the solver parameters is given in Section 5. An example is given below.
\begin{tabbing}
\hspace{0.5in} \= {\sf nParams = 5;} \\
\> {\sf paramStrings = new char*[nParams];} \\
\> {\sf for (i = 0; i < nParams; i++) }\\
\> \hspace{0.5in} {\sf paramStrings[i] = new char[100];} \\
\> {\sf strcpy(paramStrings[0], "solver cg");} \\
\> {\sf strcpy(paramStrings[1], "preconditioner diag");} \\
\> {\sf strcpy(paramStrings[2], "maxIterations 100");} \\
\> {\sf strcpy(paramStrings[3], "tolerance 1.0e-6");} \\
\> {\sf strcpy(paramStrings[4], "outputLevel 1");} \\
\> {\sf feiPtr$->$parameters(nParams, paramStrings);}
\end{tabbing}

To solve the linear system of equations, we call
\begin{tabbing}
\hspace{0.5in} \= {\sf feiPtr$->$solve(\&status);}
\end{tabbing}
where {\sf status} is returned from the {\sf FEI} indicating whether
the solve was successful. Finally, the solution can be retrieved by
\begin{tabbing}
\hspace{0.5in} \= {\sf feiPtr$->$getBlockNodeSolution(elemBlkID, nNodes, nodeIDList, solnOffsets, solnValues);}
\end{tabbing}
where {\sf solnOffsets[i]} is the index pointing to the first location
where the variables at node $i$ are returned in {\sf solnValues}.

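For the velocity field of the earlier example ({\sf fieldSize} $= 3$), the
nodal values can then be read off as sketched below, where {\sf velocity}
is a hypothetical user array holding the components at one node.
\begin{tabbing}
\hspace{0.5in} \= {\sf for (i = 0; i < nNodes; i++)} \\
\> \hspace{0.5in} {\sf for (j = 0; j < 3; j++) /* 3 velocity components at node i */} \\
\> \hspace{1.0in} {\sf velocity[j] = solnValues[solnOffsets[i]+j];}
\end{tabbing}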
\section{\hypre{} Solvers}

On the other side of the {\sf LinearSystemCore} interface is the linear
solver package. Any linear solver package can be used as long as there
is an implementation of the interface in the solver package itself. The
{\sf LinearSystemCore} implementation in \hypre{} is encapsulated in the
\hypre{} finite element conceptual interface called {\sf HYPRE\_LinSysCore}.
The incoming element stiffness matrices and other information are
converted into \hypre{}'s internal data structures. In this section we
briefly describe some of these internal data structures and capabilities.

\subsection{Parallel Matrix and Vector Construction}

The matrix class in \hypre{} accessible via the {\sf LinearSystemCore} interface
is the parallel compressed sparse row ({\sf ParCSR}) matrix. The global
matrix is partitioned among the processors under two requirements: each
processor holds a contiguous block of rows and columns, and the equation
numbers in processors of lower rank are lower than those in processors of
higher rank. (For example, with two processors and ten equations, processor
$0$ may hold rows $0$ through $4$ and processor $1$ rows $5$ through $9$.)
The {\sf FEI} is responsible for ensuring that these two requirements are
followed. The matrix can be loaded in parallel, a row or a block of rows at
a time. The solution and right hand side vectors are constructed accordingly.
The matrix rows corresponding to shared nodes can be assigned to either
processor; the assignment is determined by the {\sf FEI} itself. Once the
incoming matrix and vector data have been captured in the \hypre{}
{\sf ParCSR} format, a whole suite of matrix and vector operators is
available for use in the \hypre{} solvers.

\subsection{Sequential and Parallel Solvers}

\hypre{} currently has many iterative solvers. There is also internally a
version of the sequential {\sf SuperLU} direct solver (developed at U.C.
Berkeley) suitable for small problems (up to roughly $10000$ unknowns).
In the following we list some of these internal solvers.

\begin{enumerate}
\item Krylov solvers (CG, GMRES, FGMRES, TFQMR, BiCGSTAB, symmetric QMR),
\item BoomerAMG (a parallel algebraic multigrid solver),
\item SuperLU direct solver (sequential), and
\item SuperLU direct solver with iterative refinement (sequential).
\end{enumerate}

\subsection{Parallel Preconditioners}

The performance of the Krylov solvers can be improved by a clever selection
of preconditioners. \hypre{} currently has the following preconditioners
available via the {\sf LinearSystemCore} interface:

\begin{enumerate}
\item no preconditioning,
\item diagonal,
\item parallel incomplete LU ({\sf PILUT}),
\item another parallel incomplete LU ({\sf Euclid}),
\item parallel sparse approximate inverse ({\sf ParaSails}),
\item parallel algebraic multigrid ({\sf BoomerAMG}),
\item another parallel algebraic multigrid ({\sf MLI}, based on
      the smoothed aggregation multigrid methods),
\item parallel domain decomposition with inexact local solves ({\sf DDIlut}),
\item least-squares polynomial preconditioner,
\item $2 \times 2$ block preconditioner, and
\item $2 \times 2$ Uzawa preconditioner.
\end{enumerate}

Some of these preconditioners can be tuned via a number of internal parameters
modifiable by users. A description of these parameters is given in Section 5.

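For instance, to select the ParaSails preconditioner and tighten its pruning
threshold, one might pass (a sketch using the {\sf parameters} mechanism of
Section 2 and keywords defined in Section 5):
\begin{tabbing}
\hspace{0.5in} \= {\sf strcpy(paramStrings[0], "preconditioner parasails");} \\
\> {\sf strcpy(paramStrings[1], "parasailsThreshold 0.1");} \\
\> {\sf feiPtr$->$parameters(2, paramStrings);}
\end{tabbing}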
\subsection{Matrix Reduction}

For some structural mechanics problems with multi-point constraints, the
discretization matrix is indefinite (eigenvalues lie on both sides of
the imaginary axis). Indefinite matrices are much more difficult to solve
than definite matrices. Methods have been developed to reduce these
indefinite matrices to definite matrices. Two matrix reduction algorithms
have been implemented in \hypre{}, as presented in the following subsections.

\subsubsection{Schur Complement Reduction}

The incoming linear system of equations is assumed to be in the form:
\[
\left[
\begin{array}{cc}
D & B \\
B^T & 0
\end{array}
\right]
\left[
\begin{array}{c}
x_1 \\
x_2
\end{array}
\right]
=
\left[
\begin{array}{c}
b_1 \\
b_2
\end{array}
\right]
\]
where $D$ is a diagonal matrix. The first block row gives
$x_1 = D^{-1}(b_1 - B x_2)$, which can be substituted into the second block
row to eliminate $x_1$. After this Schur complement reduction is applied,
the resulting linear system becomes
$$
- B^T D^{-1} B x_2 = b_2 - B^T D^{-1} b_1.
$$

\subsubsection{Slide Surface Reduction}

With the presence of slide surfaces, the matrix has the same block form as in
the case of Schur complement reduction, except that the leading block is now
a general matrix $A$ representing the relationship between the master, slave,
and other degrees of freedom. The matrix block $[B^T \; 0]$ corresponds to
the constraint equations. The goal of reduction is to eliminate the
constraints. As proposed by Manteuffel, the trick is to re-order the system
into a $3 \times 3$ block matrix and then group the last two block rows and
columns:
\[
\left[
\begin{array}{ccc}
A_{11} & A_{12} & N \\
A_{21} & A_{22} & D \\
N^T    & D      & 0
\end{array}
\right]
=
\left[
\begin{array}{cc}
A_{11}       & \hat{A}_{12} \\
\hat{A}_{21} & \hat{A}_{22}
\end{array}
\right].
\]
The reduced system has the form
$$
(A_{11} - \hat{A}_{12} \hat{A}_{22}^{-1} \hat{A}_{21}) x_1 =
b_1 - \hat{A}_{12} \hat{A}_{22}^{-1} b_2,
$$
which is symmetric positive definite (SPD) if the original matrix is PD.
In addition, $\hat{A}_{22}^{-1}$ is easy to compute.

There are three slide surface reduction algorithms in \hypre{}.
The first follows the matrix formulation in this section. The second is
similar, except that it replaces the eliminated slave equations with
identity rows so that the degrees of freedom at each node are preserved.
This is essential for certain block algorithms such as the smoothed
aggregation multilevel preconditioners.
The third is similar to the second, except that it is more general and
can be applied to problems with intersecting slide surfaces (sequential
only for intersecting slide surfaces).

\subsection{Other Features}

To improve the efficiency of the \hypre{} solvers, a few other features have
been incorporated. We list a few of these features below:

\begin{enumerate}
\item Preconditioner reuse - for multiple linear solves with matrices that
      are slightly perturbed from each other, reusing the same
      preconditioner can save preconditioner setup time while suffering
      little degradation in convergence rate.
\item Projection methods - for multiple solves that use the same matrix,
      previous solution vectors can sometimes be used to give a better
      initial guess for subsequent solves. Two projection schemes have been
      implemented in \hypre{} - A-conjugate projection (for SPD matrices)
      and minimal residual projection (for both SPD and non-SPD matrices).
\item Memory optimization - the sparsity pattern of the matrix is in general
      not destroyed after the matrix has been loaded into a \hypre{} matrix,
      so that the matrix can be reused. But if the matrix is not to be
      reused, an option is provided to clean up this pattern matrix to
      conserve memory usage.
\end{enumerate}

\section{The MLI Package}

MLI is an object-oriented module that implements the class of algebraic
multigrid algorithms based on Vanek and Brezina's smoothed aggregation
method. There are two main algorithms in this module - the original
smoothed aggregation algorithm and a modified version that uses
the finite element substructure matrices to construct the prolongation
operators. As such, the latter algorithm can only be used in the
finite element context via the finite element interface. In addition,
the nodal coordinates obtained via the finite element interface can be
used to construct a better prolongation operator than the pure
translation modes.

\section{\hypre{} LinearSystemCore Parameters}

User applications interact directly with \hypre{} via the {\sf parameters}
function. In this section we list the parameters recognized by
{\sf HYPRE\_LinSysCore}.

\subsection{Parameters for Solvers and Preconditioners}

\begin{description}
\item[solver xxx] where xxx specifies one of {\sf cg}, {\sf gmres},
      {\sf fgmres}, {\sf bicgs}, {\sf bicgstab}, {\sf tfqmr},
      {\sf symqmr}, {\sf superlu}, or {\sf superlux}. The
      default is {\sf gmres}. The solver type can be followed by
      {\sf override} to specify its priority when multiple solvers are
      declared in arbitrary order.
\item[preconditioner xxx] where xxx is one of {\sf diagonal}, {\sf pilut},
      {\sf euclid}, {\sf parasails}, {\sf boomeramg}, {\sf ddilut},
      {\sf poly}, {\sf blockP}, {\sf Uzawa}, or {\sf mli}. The
      default is {\sf diagonal}. Another option for
      xxx is {\sf reuse}, which allows the preconditioner to be reused
      (this is to be set only after a preconditioner has been set up
      already). The preconditioner type can be followed by {\sf override}
      to specify its priority when multiple preconditioners are declared
      in arbitrary order.
\item[maxIterations xxx] where xxx is an integer specifying the maximum
      number of iterations permitted for the iterative solvers.
      The default value is 1000.
\item[tolerance xxx] where xxx is a floating point number specifying the
      termination criterion for the iterative solvers. The default
      value is 1.0E-6.
\item[gmresDim xxx] where xxx is an integer specifying the value of m in
      restarted GMRES(m). The default value is 100.
\item[stopCrit xxx] where xxx is either {\sf absolute} or {\sf relative},
      specifying the stopping criterion.
\item[superluOrdering xxx] where xxx specifies one of {\sf natural} or
      {\sf mmd} (minimum degree ordering). This ordering
      is used to minimize the number of nonzeros generated
      in the LU decomposition. The default is natural ordering.
\item[superluScale xxx] where xxx specifies one of {\sf y} (perform row
      and column scalings before decomposition) or {\sf n}.
      The default is no scaling.
\end{description}

\subsection{Parameters for ILUT, SPAI, and Polynomial Preconditioners}

\begin{description}
\item[ddilutFillin xxx] where xxx is a floating point number specifying
      the maximum number of nonzeros kept in the formation of the local
      incomplete L and U factors (a value of $0.0$ means the same sparsity
      as $A$, and a value of $1.0$ means twice the number of nonzeros of
      $A$). If this parameter is not set, a value will be selected
      based on the sparsity of the matrix.
\item[ddilutDropTol xxx] where xxx is a floating point number specifying the
      threshold for dropping small entries in L and U. The default
      value is 0.0.
\item[euclidNlevels xxx] where xxx is a non-negative integer specifying
      the desired sparsity of the incomplete factors. The
      default value is 0.
\item[euclidThreshold xxx] where xxx is a floating point number specifying
      the threshold used to sparsify the incomplete factors. The default
      value is 0.0.
\item[parasailsThreshold xxx] where xxx is a floating point number between 0
      and 1 specifying the threshold used to prune small entries
      in setting up the sparse approximate inverse. The default
      value is 0.0.
\item[parasailsNlevels xxx] where xxx is an integer larger than 0 specifying
      the desired sparsity of the approximate inverse. The
      default value is 1.
\item[parasailsFilter xxx] where xxx is a floating point number between 0
      and 1 specifying the threshold used to prune small entries
      in $A$. The default value is 0.0.
\item[parasailsLoadbal xxx] where xxx is a floating point number between 0
      and 1 specifying how much load balancing is to be performed
      during the setup. The default value is 0.0.
\item[parasailsSymmetric] sets ParaSails to treat $A$ as symmetric.
\item[parasailsUnSymmetric] sets ParaSails to treat $A$ as nonsymmetric
      (default).
\item[parasailsReuse] sets ParaSails to reuse the sparsity pattern of $A$.
\item[polyorder xxx] where xxx is the order of the least-squares polynomial
      preconditioner.
\end{description}

\subsection{Parameters for Multilevel Preconditioners}

\subsubsection{Parameters for BoomerAMG Preconditioner}

\begin{description}
\item[amgCoarsenType xxx] where xxx specifies one of {\sf falgout},
      {\sf ruge}, or the default (CLJP) coarsening for BoomerAMG.
\item[amgMeasureType xxx] where xxx specifies one of {\sf local} or
      {\sf global}. This parameter affects how coarsening is performed
      in parallel.
\item[amgNumSweeps xxx] where xxx is an integer specifying the number of
      pre- and post-smoothing sweeps at each level of BoomerAMG.
      The default is two pre- and two post-smoothing sweeps.
\item[amgRelaxType xxx] where xxx is one of {\sf jacobi} (damped Jacobi),
      {\sf gs-slow} (sequential Gauss-Seidel), {\sf gs-fast}
      (Gauss-Seidel on interior nodes), or {\sf hybrid}.
      The default is {\sf hybrid}.
\item[amgRelaxWeight xxx] where xxx is a floating point number between
      0 and 1 specifying the damping factor for BoomerAMG's damped
      Jacobi and GS smoothers. The default value is 1.0.
\item[amgRelaxOmega xxx] where xxx is a floating point number between
      0 and 1 specifying the damping factor for BoomerAMG's hybrid
      smoother for multiple processors. The default value is 1.0.
\item[amgStrongThreshold xxx] where xxx is a floating point number between 0
      and 1 specifying the threshold used to determine
      strong coupling in BoomerAMG's coarsening. The default
      value is 0.25.
\item[amgSystemSize xxx] where xxx is the number of degrees of freedom
      per node.
\item[amgUseGSMG] tells BoomerAMG to use a different coarsening called GSMG.
\item[amgGSMGNumSamples xxx] where xxx is the number of samples to generate
      to determine how to coarsen for GSMG.
\end{description}

\subsubsection{Parameters for MLI Preconditioner}

\begin{description}
\item[outputLevel xxx] where xxx is the output level for diagnostics.
\item[method xxx] where xxx is either {\sf AMGSA} (default) or {\sf AMGSAe},
      indicating which MLI algorithm is to be used.
\item[numLevels xxx] where xxx is the maximum number of levels used
      (default = 30).
\item[maxIterations xxx] where xxx is the maximum number of iterations
      (default = 1 as preconditioner).
\item[cycleType xxx] where xxx is either 'V' or 'W' cycle (default = 'V').
\item[strengthThreshold xxx] where xxx is the strength threshold for
      coarsening (default = 0).
\item[smoother xxx] where xxx is either {\sf Jacobi}, {\sf BJacobi}, {\sf GS},
      {\sf SGS}, {\sf HSGS} (SSOR, default), {\sf BSGS}, {\sf ParaSails},
      {\sf MLS}, {\sf CGJacobi}, {\sf CGBJacobi}, or {\sf Chebyshev}.
\item[numSweeps xxx] where xxx is the number of smoother sweeps (default = 2).
\item[coarseSolver xxx] where xxx is one of the choices in 'smoother' or
      {\sf SuperLU} (default).
\item[minCoarseSize xxx] where xxx is the minimum coarse grid size to
      control the number of levels used (default = 3000).
\item[Pweight xxx] where xxx is the relaxation parameter for the prolongation
      smoother (default = 0.0).
\item[nodeDOF xxx] where xxx is the number of degrees of freedom per node
      (default = 1).
\item[nullSpaceDim xxx] where xxx is the dimension of the null space for
      the coarse grid (default = 1).
\item[useNodalCoord xxx] where xxx is either 'on' or 'off' (default),
      indicating whether the nodal coordinates are used to generate the
      initial null space.
\item[saAMGCalibrationSize xxx] where xxx is the number of additional null
      space vectors to be generated via calibration (default = 0).
\item[numSmoothVecs xxx] where xxx is the number of near null space
      vectors used to create the prolongation operator (default = 0).
\item[smoothVecSteps xxx] where xxx is the number of smoothing steps
      used to generate the smooth vectors (default = 0).
\end{description}

In addition, to use 'AMGSAe', the parameter 'haveSFEI' has to be sent into
the FEI using the parameters function (this option is valid only for the
Sandia FEI implementation).

\subsection{Parameters for Block and Uzawa Preconditioners}

\subsubsection{Parameters for Block Preconditioners}

The parameters for this preconditioner are preceded by the keyword
{\sf blockP}. The available parameters after this keyword are listed
below, followed by a sketch of an illustrative call.
\begin{description}
\item[blockD] turns on block diagonal preconditioning.
\item[blockT] turns on block tridiagonal preconditioning.
\item[blockLU] turns on block LU preconditioning.
\item[outputLevel xxx] where xxx is the output level for diagnostics.
\item[block1FieldID xxx] where xxx is the field ID for the (1,1) block.
\item[block2FieldID xxx] where xxx is the field ID for the (2,2) block (for
      ALE3D's implicit hydrodynamics with slide surfaces, the field ID for
      both blocks is -3).
\item[printInfo] prints information about internal parameter settings.
\item[lumpedMassScheme xxx] where xxx is either {\sf diag} (take the diagonal
      of the (1,1) block) or {\sf ainv} (take the approximate inverse of
      the (1,1) block).
\item[invA11PSNlevels xxx] where xxx is 0 or 1, indicating the ParaSails
      nlevels to use to generate the lumped mass matrix (if {\sf ainv} is
      selected).
\item[invA11PSThresh xxx] where xxx is a floating point number between 0
      and 1, indicating the ParaSails threshold to use to generate the
      lumped mass matrix (if {\sf ainv} is selected).
\item[A11Solver xxx] where xxx is either {\sf cg} or {\sf gmres} as the
      solver for the (1,1) block.
\item[A11Tolerance xxx] where xxx is the convergence tolerance for the
      (1,1) block.
\item[A11MaxIterations xxx] where xxx is the maximum number of iterations
      for the (1,1) block.
\item[A11Precon xxx] where xxx is either {\sf pilut}, {\sf boomeramg},
      {\sf euclid}, {\sf parasails}, {\sf ddilut}, or {\sf mli}.
\item[A11PreconPSNlevels xxx] - ParaSails' nlevels.
\item[A11PreconPSThresh xxx] - ParaSails' threshold.
\item[A11PreconPSFilter xxx] - ParaSails' filter.
\item[A11PreconAMGThresh xxx] - BoomerAMG's threshold.
\item[A11PreconAMGRelaxType xxx] - BoomerAMG's smoother.
\item[A11PreconAMGNumSweeps xxx] - BoomerAMG's numSweeps.
\item[A11PreconAMGSystemSize xxx] - BoomerAMG's systemSize.
\item[A11PreconEuclidNLevels xxx] - Euclid's nlevels.
\item[A11PreconEuclidThresh xxx] - Euclid's threshold.
\item[A11PreconPilutFillin xxx] - Pilut's fillin.
\item[A11PreconPilutDropTol xxx] - Pilut's drop tolerance.
\item[A11PreconDDIlutFillin xxx] - DDIlut's fillin.
\item[A11PreconDDIlutDropTol xxx] - DDIlut's drop tolerance.
\item[A11PreconMLIRelaxType xxx] - MLI's smoother.
\item[A11PreconMLIThresh xxx] - MLI's threshold.
\item[A11PreconMLIPweight xxx] - MLI's Pweight.
\item[A11PreconMLINumSweeps xxx] - MLI's numSweeps.
\item[A11PreconMLINodeDOF xxx] - MLI's nodeDOF.
\item[A11PreconMLINullDim xxx] - MLI's null space dimension.
\item[A22Solver xxx] where xxx is either {\sf cg} or {\sf gmres} as the
      solver for the (2,2) block.
\item[A22Tolerance xxx] where xxx is the convergence tolerance for the
      (2,2) block.
\item[A22MaxIterations xxx] where xxx is the maximum number of iterations
      for the (2,2) block.
\item[A22Precon xxx] where xxx is either {\sf pilut}, {\sf boomeramg},
      {\sf euclid}, {\sf parasails}, {\sf ddilut}, or {\sf mli}.
\item[A22PreconPSNlevels xxx] - ParaSails' nlevels.
\item[A22PreconPSThresh xxx] - ParaSails' threshold.
\item[A22PreconPSFilter xxx] - ParaSails' filter.
\item[A22PreconAMGThresh xxx] - BoomerAMG's threshold.
\item[A22PreconAMGRelaxType xxx] - BoomerAMG's smoother.
\item[A22PreconAMGNumSweeps xxx] - BoomerAMG's numSweeps.
\item[A22PreconAMGSystemSize xxx] - BoomerAMG's systemSize.
\item[A22PreconEuclidNLevels xxx] - Euclid's nlevels.
\item[A22PreconEuclidThresh xxx] - Euclid's threshold.
\item[A22PreconPilutFillin xxx] - Pilut's fillin.
\item[A22PreconPilutDropTol xxx] - Pilut's drop tolerance.
\item[A22PreconDDIlutFillin xxx] - DDIlut's fillin.
\item[A22PreconDDIlutDropTol xxx] - DDIlut's drop tolerance.
\item[A22PreconMLIRelaxType xxx] - MLI's smoother.
\item[A22PreconMLIThresh xxx] - MLI's threshold.
\item[A22PreconMLIPweight xxx] - MLI's Pweight.
\item[A22PreconMLINumSweeps xxx] - MLI's numSweeps.
\item[A22PreconMLINodeDOF xxx] - MLI's nodeDOF.
\item[A22PreconMLINullDim xxx] - MLI's null space dimension.
\end{description}

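As an illustration, the following sketch (with hypothetical parameter
choices drawn from the list above) turns on block diagonal preconditioning
and configures the (1,1) block solver; following the prefixing convention
above, each parameter string begins with {\sf blockP}:
\begin{tabbing}
\hspace{0.5in} \= {\sf strcpy(paramStrings[0], "blockP blockD");} \\
\> {\sf strcpy(paramStrings[1], "blockP A11Solver cg");} \\
\> {\sf strcpy(paramStrings[2], "blockP A11Precon boomeramg");} \\
\> {\sf feiPtr$->$parameters(3, paramStrings);}
\end{tabbing}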
\subsubsection{Parameters for Uzawa Preconditioner}

The Uzawa preconditioner has a parameter set similar to that of the block
preconditioner, as described in the following (except that DDIlut is not
available here).
\begin{description}
\item[outputLevel xxx] where xxx is the output level for diagnostics.
\item[A11Solver xxx] where xxx is either {\sf cg} or {\sf gmres} as the
      solver for the (1,1) block.
\item[A11Tolerance xxx] where xxx is the convergence tolerance for the
      (1,1) block.
\item[A11MaxIterations xxx] where xxx is the maximum number of iterations
      for the (1,1) block.
\item[A11Precon xxx] where xxx is either {\sf pilut}, {\sf boomeramg},
      {\sf euclid}, {\sf parasails}, or {\sf mli}.
\item[A11PreconPSNlevels xxx] - ParaSails' nlevels.
\item[A11PreconPSThresh xxx] - ParaSails' threshold.
\item[A11PreconPSFilter xxx] - ParaSails' filter.
\item[A11PreconAMGThresh xxx] - BoomerAMG's threshold.
\item[A11PreconAMGRelaxType xxx] - BoomerAMG's smoother.
\item[A11PreconAMGNumSweeps xxx] - BoomerAMG's numSweeps.
\item[A11PreconAMGSystemSize xxx] - BoomerAMG's systemSize.
\item[A11PreconEuclidNLevels xxx] - Euclid's nlevels.
\item[A11PreconEuclidThresh xxx] - Euclid's threshold.
\item[A11PreconPilutFillin xxx] - Pilut's fillin.
\item[A11PreconPilutDropTol xxx] - Pilut's drop tolerance.
\item[A11PreconMLIRelaxType xxx] - MLI's smoother.
\item[A11PreconMLIThresh xxx] - MLI's threshold.
\item[A11PreconMLIPweight xxx] - MLI's Pweight.
\item[A11PreconMLINumSweeps xxx] - MLI's numSweeps.
\item[A11PreconMLINodeDOF xxx] - MLI's nodeDOF.
\item[A11PreconMLINullDim xxx] - MLI's null space dimension.
\item[S22SolverDampingFactor xxx] where xxx is the damping (scaling) factor
      for the Schur complement approximation of the (2,2) block.
\item[S22Solver xxx] where xxx is either {\sf cg} or {\sf gmres} as the
      solver for the (2,2) block.
\item[S22Tolerance xxx] where xxx is the convergence tolerance for the
      (2,2) block.
\item[S22MaxIterations xxx] where xxx is the maximum number of iterations
      for the (2,2) block.
\item[S22Precon xxx] where xxx is either {\sf pilut}, {\sf boomeramg},
      {\sf euclid}, {\sf parasails}, or {\sf mli}.
\item[S22PreconPSNlevels xxx] - ParaSails' nlevels.
\item[S22PreconPSThresh xxx] - ParaSails' threshold.
\item[S22PreconPSFilter xxx] - ParaSails' filter.
\item[S22PreconAMGThresh xxx] - BoomerAMG's threshold.
\item[S22PreconAMGRelaxType xxx] - BoomerAMG's smoother.
\item[S22PreconAMGNumSweeps xxx] - BoomerAMG's numSweeps.
\item[S22PreconAMGSystemSize xxx] - BoomerAMG's systemSize.
\item[S22PreconEuclidNLevels xxx] - Euclid's nlevels.
\item[S22PreconEuclidThresh xxx] - Euclid's threshold.
\item[S22PreconPilutFillin xxx] - Pilut's fillin.
\item[S22PreconPilutDropTol xxx] - Pilut's drop tolerance.
\item[S22PreconMLIRelaxType xxx] - MLI's smoother.
\item[S22PreconMLIThresh xxx] - MLI's threshold.
\item[S22PreconMLIPweight xxx] - MLI's Pweight.
\item[S22PreconMLINumSweeps xxx] - MLI's numSweeps.
\item[S22PreconMLINodeDOF xxx] - MLI's nodeDOF.
\item[S22PreconMLINullDim xxx] - MLI's null space dimension.
\end{description}

\subsection{Parameters for Matrix Reduction}

\begin{description}
\item[schurReduction] turns on the Schur reduction mode.
\item[slideReduction] turns on the slide reduction mode.
\item[slideReduction2] turns on version 2 of the slide reduction mode
      (see Section 3).
\item[slideReduction3] turns on version 3 of the slide reduction mode
      (see Section 3).
\end{description}

\subsection{Parameters for Diagnostics and Performance Tuning}

\begin{description}
\item[outputLevel xxx] where xxx is an integer specifying the output
      level. An output level of $1$ prints only the solver
      information such as the number of iterations and timings.
      An output level of $2$ prints debug information such as
      the functions visited and preconditioner information.
      An output level of $3$ or higher prints more debug information,
      such as the matrix and right hand side loaded via the
      LinearSystemCore functions, to the standard output.
\item[setDebug xxx] where xxx is one of {\sf slideReduction1},
      {\sf slideReduction2}, {\sf slideReduction3} (level 1, 2, 3
      diagnostics in the slide surface reduction code), {\sf printMat}
      (print the original matrix into a file), {\sf printReducedMat}
      (print the reduced matrix into a file), {\sf printSol} (print the
      solution into a file), {\sf ddilut} (output diagnostic information
      for the DDIlut preconditioner setup), and {\sf amgDebug} (output
      diagnostic information for AMG).
\item[optimizeMemory] cleans up the matrix sparsity pattern after the matrix
      has been loaded. (By default the pattern is kept to allow matrix
      reuse.)
\item[imposeNoBC] turns off the boundary conditions to allow diagnosing the
      matrix (for example, checking the null space).
\end{description}

\subsection{Miscellaneous Parameters}

\begin{description}
\item[AConjugateProjection xxx] where xxx specifies the number of previous
      solution vectors to keep for the A-conjugate projection.
      The default is 0 (the projection is off).
\item[minResProjection xxx] where xxx specifies the number of previous
      solution vectors to keep for the minimal residual projection.
      The default is 0 (the projection is off).
\item[haveFEData] indicates that additional finite element information is
      available to assist in building more efficient solvers.
\item[haveSFEI] indicates that the simplified finite element information is
      available to assist in building more efficient solvers.
\end{description}

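For example, to keep the ten most recent solution vectors for the
A-conjugate projection, one might pass (a sketch reusing the
{\sf parameters} mechanism of Section 2):
\begin{tabbing}
\hspace{0.5in} \= {\sf strcpy(paramStrings[0], "AConjugateProjection 10");} \\
\> {\sf feiPtr$->$parameters(1, paramStrings);}
\end{tabbing}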
\section{The LinearSystemCore Interface}

As described before, users who prefer to create their own finite element
interface package can also take advantage of the rich solver capabilities
in \hypre{}. In this section we show how to access {\sf HYPRE\_LinSysCore}'s
internal solvers directly. Users who choose this path need first to
construct an array (say, {\sf eqnOffsets}) describing the matrix row
partitioning across all processors (so processor p owns rows
{\sf eqnOffsets[p]} through {\sf eqnOffsets[p+1]}$-$1). Furthermore, suppose
the local submatrix has been constructed as a
compressed sparse row (CSR) matrix in the {\sf ia, ja, val} arrays.
The following program segment describes the function calls to set up
the internal matrix and solve the linear system.

\begin{tabbing}
\hspace{0.5in} \= {\sf Program Segment} \\[1mm]
\> {\sf startRow = eqnOffsets[mypid];} \\
\> {\sf endRow = eqnOffsets[mypid+1] - 1;} \\
\> {\sf nrows = endRow - startRow + 1;} \\
\> {\sf for ( i = startRow; i <= endRow; i++ ) \{ } \\
\> \hspace{0.3in} \= {\sf ncnt = ia[i+1] - ia[i];} \\
\> \> {\sf rowLengths[i-startRow] = ncnt;} \\
\> \> {\sf colIndices[i-startRow] = new int[ncnt];} \\
\> \> {\sf k = 0;} \\
\> \> {\sf for (j = ia[i]; j < ia[i+1]; j++) colIndices[i-startRow][k++] = ja[j];}\\
\> \} \\
\> {\sf HYPRE\_LinSysCore\_create(\&lsc, MPI\_COMM\_WORLD);} \\
\> {\sf HYPRE\_setGlobalOffsets(lsc, nrows, NULL, eqnOffsets, NULL);} \\
\> {\sf HYPRE\_setMatrixStructure(lsc, colIndices, rowLengths, NULL, NULL, NULL);} \\
\> {\sf for ( i = startRow; i <= endRow; i++ ) \{ } \\
\> \> {\sf ncnt = ia[i+1] - ia[i];} \\
\> \> {\sf HYPRE\_sumIntoSystemMatrix(lsc, i, ncnt, \&val[ia[i]], \&ja[ia[i]]);}\\
\> \> {\sf HYPRE\_sumIntoRHSVector(lsc, 1, \&rhs[i], \&i);} \\
\> \} \\
\> {\sf HYPRE\_matrixLoadComplete(lsc);}\\
\> {\sf strcpy(paramString, "solver gmres");} \\
\> {\sf HYPRE\_parameters(lsc, 1, \&paramString);} \\
\> {\sf strcpy(paramString, "preconditioner boomeramg");} \\
\> {\sf HYPRE\_parameters(lsc, 1, \&paramString);} \\
\> {\sf HYPRE\_launchSolver(lsc, \&status, \&iterations);}
\end{tabbing}

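After the solve, the solution entries owned by the calling processor can be
retrieved and the object destroyed; a minimal sketch, where {\sf soln} is
a user-allocated array of length {\sf nrows}, is
\begin{tabbing}
\hspace{0.5in} \= {\sf HYPRE\_getSolution(lsc, soln, nrows);} \\
\> {\sf HYPRE\_LinSysCore\_destroy(\&lsc);}
\end{tabbing}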
A list of available functions is given in the following.

\begin{tabbing}
{\sf HYPRE\_LinSysCore\_create(LinSysCore **lsc, MPI\_Comm comm)} \\[1mm]
{\sf HYPRE\_LinSysCore\_destroy(LinSysCore **lsc)} \\[1mm]
{\sf HYPRE\_parameters(LinSysCore *lsc, int nParams, char **params)} \\[1mm]
{\sf HYPRE\_setGlobalOffsets(LinSysCore *lsc, int leng, int* nodeOffsets,} \\
\hspace{1.0in} {\sf int* eqnOffsets, int* blkEqnOffsets)} \\[1mm]
{\sf HYPRE\_setMatrixStructure(LinSysCore *lsc, int** ptColIndices,} \\
\hspace{1.0in} {\sf int* ptRowLengths, int** blkColIndices, int* blkRowLengths, int* ptRowsPerBlkRow)} \\[1mm]
{\sf HYPRE\_resetMatrixAndVector(LinSysCore *lsc, double val)} \\[1mm]
{\sf HYPRE\_resetMatrix(LinSysCore *lsc, double val)} \\[1mm]
{\sf HYPRE\_resetRHSVector(LinSysCore *lsc, double val)} \\[1mm]
{\sf HYPRE\_sumIntoSystemMatrix(LinSysCore *lsc, int numPtRows, const int* ptRows,}\\
\hspace{1.0in} {\sf int numPtCols, const int* ptCols, int numBlkRows, const int* blkRows,} \\
\hspace{1.0in} {\sf int numBlkCols, const int* blkCols, const double* const* values)} \\[1mm]
{\sf HYPRE\_sumIntoRHSVector(LinSysCore *lsc, int num, const double* values, const int* indices)} \\[1mm]
{\sf HYPRE\_matrixLoadComplete(LinSysCore *lsc)} \\[1mm]
{\sf HYPRE\_enforceEssentialBC(LinSysCore *lsc, int* globalEqn, double* alpha,} \\
\hspace{1.0in} {\sf double* gamma, int leng)} \\[1mm]
{\sf HYPRE\_enforceRemoteEssBCs(LinSysCore *lsc, int numEqns, int* globalEqns, int** colIndices,} \\
\hspace{1.0in} {\sf int* colIndLen, double** coefs)} \\[1mm]
{\sf HYPRE\_enforceOtherBC(LinSysCore *lsc, int* globalEqn, double* alpha, double* beta,} \\
\hspace{1.0in} {\sf double* gamma, int leng)} \\[1mm]
{\sf HYPRE\_putInitialGuess(LinSysCore *lsc, const int* eqnNumbers,} \\
\hspace{1.0in} {\sf const double* values, int leng)} \\[1mm]
{\sf HYPRE\_getSolution(LinSysCore *lsc, double *answers, int leng)} \\[1mm]
{\sf HYPRE\_getSolnEntry(LinSysCore *lsc, int eqnNumber, double *answer)} \\[1mm]
{\sf HYPRE\_formResidual(LinSysCore *lsc, double *values, int leng)} \\[1mm]
{\sf HYPRE\_launchSolver(LinSysCore *lsc, int *solveStatus, int *iter)}
\end{tabbing}

\section{HYPRE LinearSystemCore Installation}

The ultimate objective is for application users to have immediate access
to the latest FEI/\hypre{} library files on different computing platforms
via public {\sf lib} directories. While this feature is forthcoming, careful
version control is needed for users to keep track of capabilities and bug
fixes across different installations. Users who would like to set up the
FEI/\hypre{} interface on their own should do the following:

\begin{enumerate}
\item obtain the \hypre{} and the Sandia {\sf FEI} source codes
      (alternatively, use the {\sf FEI} implementation in \hypre{}),
\item compile Sandia's {\sf FEI} (fei-2.5.0) to create the
      {\sf libfei\_base.a} file,
\item compile \hypre{}
\begin{enumerate}
\item download \hypre{} from the web, ungzip and untar it,
\item go into the {\sf linear\_solvers} directory,
\item do a 'configure' with the {\sf --with-FEI\_BASE\_DIR} option set to
      the {\sf FEI} include directory, plus other compile options
      (a sketch is given after this list),
\item compile with {\sf make install}, which will create the
      {\sf libHYPRE\_LSI*} file in the {\sf linear\_solvers/hypre/lib}
      directory,
\end{enumerate}
\item call the {\sf FEI} functions in your application code (an example was
      given previously)
\begin{enumerate}
\item include {\sf cfei-hypre.h} in your file,
\item include {\sf FEI\_Implementation.h} in your file,
\item make sure your application has an {\sf include} and a {\sf lib} path
      to the {\sf include} and {\sf lib} directories created above.
\end{enumerate}
\end{enumerate}

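For step 3(c), a minimal sketch of the configure and build commands is
given below; the {\sf FEI} include path is hypothetical and should be
replaced by the actual location of the {\sf FEI} header files:
\begin{tabbing}
\hspace{0.5in} \= {\sf ./configure --with-FEI\_BASE\_DIR=/home/user/fei-2.5.0/include} \\
\> {\sf make install}
\end{tabbing}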
\subsection{Linking with the library files}

To link the {\sf FEI} and \hypre{} libraries into the executable, the
following has to be appended to the linking command:
\begin{tabbing}
\hspace{0.5in} \= {\sf -L\$\{LIBPATHS\} -lfei\_base -lHYPRE\_LSI}
\end{tabbing}
along with all the other libraries (note: the order in which the libraries
are listed may be important), where {\sf LIBPATHS} indicates where the
\hypre{} and {\sf FEI} library files can be found.

Since some of these library files make calls to LAPACK and BLAS functions,
the corresponding libraries need to be linked along with (placed after)
these library files. For example, on the DEC cluster it suffices to link
with the {\sf dxml} library (so {\sf -ldxml} is placed after the above link
sequence, with {\sf -lm} placed after {\sf -ldxml}), while the {\sf essl}
library can be used on the blue machine. If {\sf SuperLU} is also needed,
{\sf -lHYPRE\_superlu} should be placed immediately after {\sf -lHYPRE\_LSI}.

\subsection{Some more caveats for application developers}

Building an application executable often requires linking with many
different software packages, and many software packages use some LAPACK
and/or BLAS functions. In order to alleviate the problem of multiply
defined functions at link time, it is recommended that all software
libraries be stripped of all LAPACK and BLAS function definitions. These
LAPACK and BLAS functions should then be resolved at link time by linking
with the system LAPACK and BLAS libraries (e.g. {\sf dxml} on the DEC
cluster). Both \hypre{} and SuperLU were built with this in mind. However,
some other software library files needed may have the BLAS functions
defined in them. To avoid the problem of multiply defined functions, it is
recommended that the offending library files be stripped of the BLAS
functions.

\subsection{Comments about the FEI/\hypre{} Interface and Contacts}

Comments about \hypre{}'s finite element interface can be directed
to Charles Tong (925-422-3411, chtong@llnl.gov).

\end{document}