%=============================================================================
%=============================================================================
\chapter{Linear-Algebraic System Interface (IJ)}
\label{ch-IJ}
The \code{IJ} interface described in this chapter is the lowest common
denominator for specifying linear systems in \hypre{}. This interface
provides access to general sparse-matrix solvers in \hypre{}, not
to the specialized solvers that require more problem information.
%-----------------------------------------------------------------------------
\section{IJ Matrix Interface}
As with the other interfaces in \hypre{}, the \code{IJ} interface
expects to get data in distributed form because this is the only
scalable approach for assembling matrices on thousands of processes.
Matrices are assumed to be distributed by blocks of rows as follows:
\begin{equation}
\left[
\begin{array}{c}
~~~~~~~~~~ A_0 ~~~~~~~~~~ \\
A_1 \\
\vdots \\
A_{P-1}
\end{array}
\right]
\end{equation}
In the above example, the matrix is distributed across the $P$
processes, $0, 1, \ldots, P-1$, by blocks of rows. Each submatrix $A_p$
is ``owned'' by a single process and its first and last row numbers
are given by the global indices \code{ilower} and \code{iupper} in the
\code{Create()} call below.

The following example code illustrates the basic usage of the
\code{IJ} interface for building matrices:
\begin{display}
\begin{verbatim}
MPI_Comm comm;
HYPRE_IJMatrix ij_matrix;
HYPRE_ParCSRMatrix parcsr_matrix;
int ilower, iupper;
int jlower, jupper;
int nrows;
int *ncols;
int *rows;
int *cols;
double *values;
HYPRE_IJMatrixCreate(comm, ilower, iupper, jlower, jupper, &ij_matrix);
HYPRE_IJMatrixSetObjectType(ij_matrix, HYPRE_PARCSR);
HYPRE_IJMatrixInitialize(ij_matrix);
/* set matrix coefficients */
HYPRE_IJMatrixSetValues(ij_matrix, nrows, ncols, rows, cols, values);
...
/* add-to matrix coefficients, if desired */
HYPRE_IJMatrixAddToValues(ij_matrix, nrows, ncols, rows, cols, values);
...
HYPRE_IJMatrixAssemble(ij_matrix);
HYPRE_IJMatrixGetObject(ij_matrix, (void **) &parcsr_matrix);
\end{verbatim}
\end{display}

The \code{Create()} routine creates an empty matrix object that lives
on the \code{comm} communicator. This is a collective call (i.e.,
must be called on all processes from a common synchronization point),
with each process passing its own row extents, \code{ilower} and
\code{iupper}. The row partitioning must be contiguous, i.e.,
\code{iupper} for process \code{i} must equal \code{ilower}$-1$ for
process \code{i}$+1$. Note that this allows matrices to have 0- or
1-based indexing. The parameters \code{jlower} and \code{jupper}
define a column partitioning, and should match \code{ilower} and
\code{iupper} when solving square linear systems. See the Reference
Manual for more information.
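
For illustration, the following sketch computes a contiguous, 0-based
row partitioning of a hypothetical square matrix with \code{N} global
rows and creates the matrix on \code{MPI_COMM_WORLD}; the value of
\code{N} and the choice of communicator are assumptions made only for
this example:
\begin{display}
\begin{verbatim}
int num_procs, myid;
int N = 1000;   /* global number of rows (illustrative value) */
int rows_per_proc, extra;
int ilower, iupper;

MPI_Comm_size(MPI_COMM_WORLD, &num_procs);
MPI_Comm_rank(MPI_COMM_WORLD, &myid);

/* split N rows into contiguous blocks; the first 'extra'
   processes each get one additional row */
rows_per_proc = N / num_procs;
extra         = N % num_procs;
ilower = myid * rows_per_proc + (myid < extra ? myid : extra);
iupper = ilower + rows_per_proc + (myid < extra ? 1 : 0) - 1;

/* square system: column partitioning matches the row partitioning */
HYPRE_IJMatrixCreate(MPI_COMM_WORLD, ilower, iupper, ilower, iupper,
                     &ij_matrix);
\end{verbatim}
\end{display}
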
The \code{SetObjectType()} routine sets the underlying matrix object
type to \code{HYPRE_PARCSR} (this is the only object type currently
supported). The \code{Initialize()} routine indicates that the matrix
coefficients (or values) are ready to be set. This routine may or may
not involve the allocation of memory for the coefficient data,
depending on the implementation. The optional \code{SetRowSizes()}
and \code{SetDiagOffdSizes()} routines,
mentioned later in this chapter and in the Reference Manual, should be
called before this step.

The \code{SetValues()} routine sets matrix values for some number of
rows (\code{nrows}) and some number of columns in each row
(\code{ncols}). The actual row and column numbers of the matrix
\code{values} to be set are given by \code{rows} and \code{cols}.
The coefficients can be modified with the
\code{AddToValues()} routine. If \code{AddToValues()} is used to add
to a value that previously didn't exist, it will set this value.
Note that while \code{AddToValues()}
will add to values on other processors, \code{SetValues()} does not set
values on other processors. Instead, if a user calls \code{SetValues()}
on processor $i$ to set a matrix coefficient belonging to processor $j$,
processor $i$ will
erase all previous occurrences of this matrix coefficient,
so they will not contribute to this coefficient on processor $j$.
The actual coefficient has to be set on processor $j$.
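
As a concrete sketch, the following sets one locally owned row
\code{i} of a tridiagonal (1D Laplacian) matrix; the stencil and the
assumption that \code{i} is an interior row in
$[\code{ilower}, \code{iupper}]$ are purely illustrative:
\begin{display}
\begin{verbatim}
/* set one owned interior row i of a tridiagonal matrix */
int    row       = i;
int    row_ncols = 3;
int    col_inds[3];
double row_vals[3];

col_inds[0] = i-1;   row_vals[0] = -1.0;
col_inds[1] = i;     row_vals[1] =  2.0;
col_inds[2] = i+1;   row_vals[2] = -1.0;

HYPRE_IJMatrixSetValues(ij_matrix, 1, &row_ncols, &row, col_inds, row_vals);

/* AddToValues() accumulates instead of overwriting, e.g. when several
   element contributions touch the same coefficient */
HYPRE_IJMatrixAddToValues(ij_matrix, 1, &row_ncols, &row, col_inds, row_vals);
\end{verbatim}
\end{display}
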
The \code{Assemble()} routine is a collective call, and finalizes the
matrix assembly, making the matrix ``ready to use''. The
\code{GetObject()} routine retrieves the built matrix object so that
it can be passed on to \hypre{} solvers that use the \code{ParCSR}
internal storage format. Note that this is not an expensive routine;
the matrix already exists in \code{ParCSR} storage format, and the
routine simply returns a ``handle'' or pointer to it. Although we
currently only support one underlying data storage format, in the
future several different formats may be supported.

One can preset the row sizes of the matrix in order to reduce the
execution time for the matrix specification. One can specify the
total number of coefficients for each row, the number of coefficients
in the row that couple the diagonal unknown to unknowns in the same
processor domain (\code{Diag}), and the number of coefficients in the
row that couple the diagonal unknown to unknowns in other processor
domains (\code{Offd}):
\begin{display}
\begin{verbatim}
HYPRE_IJMatrixSetRowSizes(ij_matrix, sizes);
HYPRE_IJMatrixSetDiagOffdSizes(ij_matrix, diag_sizes, offdiag_sizes);
\end{verbatim}
\end{display}
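
For example, exact \code{Diag} and \code{Offd} sizes for the
tridiagonal matrix sketched earlier could be computed locally as in
the following sketch, which assumes 0-based global row indices, the
global number of rows \code{N} from the partitioning sketch above, and
that these optional calls are made after \code{Create()} but before
\code{Initialize()}:
\begin{display}
\begin{verbatim}
/* requires <stdlib.h> for calloc() */
int  local_size    = iupper - ilower + 1;
int *diag_sizes    = (int *) calloc(local_size, sizeof(int));
int *offdiag_sizes = (int *) calloc(local_size, sizeof(int));
int  i, k;

for (i = ilower; i <= iupper; i++)
{
   k = i - ilower;
   diag_sizes[k] = 1;                     /* the diagonal entry */
   if (i > 0)                             /* left neighbor, if any */
   {
      if (i-1 >= ilower) diag_sizes[k]++; else offdiag_sizes[k]++;
   }
   if (i < N-1)                           /* right neighbor, if any */
   {
      if (i+1 <= iupper) diag_sizes[k]++; else offdiag_sizes[k]++;
   }
}

HYPRE_IJMatrixSetDiagOffdSizes(ij_matrix, diag_sizes, offdiag_sizes);
HYPRE_IJMatrixInitialize(ij_matrix);
\end{verbatim}
\end{display}
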
Once the matrix has been assembled, the sparsity pattern cannot be
altered without completely destroying the matrix object and starting
from scratch. However, one can modify the matrix values of an already
assembled matrix. To do this, first call the \code{Initialize()}
routine to re-initialize the matrix, then set or add-to values as
before, and call the \code{Assemble()} routine to re-assemble before
using the matrix. Re-initialization and re-assembly are very cheap,
essentially no-ops in the current implementation of the code.
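
Schematically, reusing the arrays from the earlier example (with
\code{new_values} a hypothetical array of replacement coefficients for
the same sparsity pattern):
\begin{display}
\begin{verbatim}
/* modify coefficients of an already assembled matrix */
HYPRE_IJMatrixInitialize(ij_matrix);   /* re-initialize */
HYPRE_IJMatrixSetValues(ij_matrix, nrows, ncols, rows, cols, new_values);
HYPRE_IJMatrixAssemble(ij_matrix);     /* re-assemble before use */
\end{verbatim}
\end{display}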
%-----------------------------------------------------------------------------
\section{IJ Vector Interface}
The following example code illustrates the basic usage of the
\code{IJ} interface for building vectors:
\begin{display}
\begin{verbatim}
MPI_Comm comm;
HYPRE_IJVector ij_vector;
HYPRE_ParVector par_vector;
int jlower, jupper;
int nvalues;
int *indices;
double *values;
HYPRE_IJVectorCreate(comm, jlower, jupper, &ij_vector);
HYPRE_IJVectorSetObjectType(ij_vector, HYPRE_PARCSR);
HYPRE_IJVectorInitialize(ij_vector);
/* set vector values */
HYPRE_IJVectorSetValues(ij_vector, nvalues, indices, values);
...
HYPRE_IJVectorAssemble(ij_vector);
HYPRE_IJVectorGetObject(ij_vector, (void **) &par_vector);
\end{verbatim}
\end{display}

The \code{Create()} routine creates an empty vector object that lives
on the \code{comm} communicator. This is a collective call, with each
process passing its own index extents, \code{jlower} and
\code{jupper}. The names of these extent parameters begin with a
\code{j} because we typically think of matrix-vector multiplies as
the fundamental operation involving both matrices and vectors. For
matrix-vector multiplies, the vector partitioning should match the
column partitioning of the matrix (which also uses the \code{j}
notation). For linear system solves, these extents will typically
match the row partitioning of the matrix as well.
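
For a square linear system, for example, the right-hand side and
solution vectors would typically be created with the same extents as
the matrix rows (the names \code{b} and \code{x} below are used only
for illustration):
\begin{display}
\begin{verbatim}
HYPRE_IJVector b, x;

/* use the matrix row extents for both vectors of the linear system */
HYPRE_IJVectorCreate(comm, ilower, iupper, &b);
HYPRE_IJVectorCreate(comm, ilower, iupper, &x);
\end{verbatim}
\end{display}
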
The \code{SetObjectType()} routine sets the underlying vector storage
type to \code{HYPRE_PARCSR} (this is the only storage type currently
supported). The \code{Initialize()} routine indicates that the vector
coefficients (or values) are ready to be set. This routine may or may
not involve the allocation of memory for the coefficient data,
depending on the implementation.

The \code{SetValues()} routine sets the vector \code{values} for some
number (\code{nvalues}) of \code{indices}.
The values can be modified with the
\code{AddToValues()} routine.
Note that while \code{AddToValues()}
will add to values on other processors, \code{SetValues()} does not set
values on other processors. Instead, if a user calls \code{SetValues()}
on processor $i$ to set a value belonging to processor $j$,
processor $i$ will
erase all previous occurrences of this value,
so they will not contribute to this value on processor $j$.
The actual value has to be set on processor $j$.

The \code{Assemble()} routine is a trivial collective call, and
finalizes the vector assembly, making the vector ``ready to use''.
The \code{GetObject()} routine retrieves the built vector object so
that it can be passed on to \hypre{} solvers that use the
\code{ParVector} internal storage format.

Vector values can be modified in much the same way as with matrices by
first re-initializing the vector with the \code{Initialize()} routine.
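
For instance, a new right-hand side for a subsequent solve could be
installed as follows (\code{new_values} is a hypothetical array of
replacement entries):
\begin{display}
\begin{verbatim}
/* update the entries of an already assembled vector */
HYPRE_IJVectorInitialize(ij_vector);   /* re-initialize */
HYPRE_IJVectorSetValues(ij_vector, nvalues, indices, new_values);
HYPRE_IJVectorAssemble(ij_vector);     /* re-assemble before use */
\end{verbatim}
\end{display}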
%-----------------------------------------------------------------------------
\section{A Scalable Interface}
As explained in the previous sections, problem data is passed to the
\hypre{} library in its distributed form. However, as is typically
the case for a parallel software library, some information
regarding the global distribution of the data will be needed for
\hypre{} to perform its function.

In particular, a solver algorithm requires that a processor obtain
``nearby'' data from other processors in order to complete the solve.
While a processor may easily determine what data it needs from other
processors, it may not know which processor owns the data it needs.
Therefore, processors must determine their communication partners, or
neighbors.

The straightforward approach to determining neighbors involves
constructing a global partition of the data, which requires $O(P)$ data
storage. This storage requirement (as well as the costs of many of the
associated algorithms that access the storage) is not scalable for
machines such as BlueGene/L with tens of thousands of processors. The
problem of determining inter-processor communication (in the absence
of a global description of the data) in a scalable manner is addressed
in \cite{assumedpartition06}. When using \hypre{} on many thousands of
processors, compiling the library with the ``no global partition''
option as detailed in Section \ref{config_options} improves
scalability as shown in
\cite{assumedpartition06}. Note that this optimization is only
recommended when using at least several thousand processors and is
most beneficial when using tens of thousands of processors.