 f67b19a884
			
		
	
	
		f67b19a884
		
	
	
	
	
		
			
			From 68d431b4c14ad60a778ee93c1f59ecc4b931950e Mon Sep 17 00:00:00 2001 Found via `codespell -q 3 -I ../eigen-word-whitelist.txt` where the whitelists consists of: ``` als ans cas dum lastr lowd nd overfl pres preverse substraction te uint whch ``` --- CMakeLists.txt | 26 +++++++++---------- Eigen/src/Core/GenericPacketMath.h | 2 +- Eigen/src/SparseLU/SparseLU.h | 2 +- bench/bench_norm.cpp | 2 +- doc/HiPerformance.dox | 2 +- doc/QuickStartGuide.dox | 2 +- .../Eigen/CXX11/src/Tensor/TensorChipping.h | 6 ++--- .../Eigen/CXX11/src/Tensor/TensorDeviceGpu.h | 2 +- .../src/Tensor/TensorForwardDeclarations.h | 4 +-- .../src/Tensor/TensorGpuHipCudaDefines.h | 2 +- .../Eigen/CXX11/src/Tensor/TensorReduction.h | 2 +- .../CXX11/src/Tensor/TensorReductionGpu.h | 2 +- .../test/cxx11_tensor_concatenation.cpp | 2 +- unsupported/test/cxx11_tensor_executor.cpp | 2 +- 14 files changed, 29 insertions(+), 29 deletions(-)
		
			
				
	
	
		
			129 lines
		
	
	
		
			5.3 KiB
		
	
	
	
		
			Plaintext
		
	
	
	
	
	
			
		
		
	
	
			129 lines
		
	
	
		
			5.3 KiB
		
	
	
	
		
			Plaintext
		
	
	
	
	
	
| 
 | |
| namespace Eigen {
 | |
| 
 | |
| /** \page TopicWritingEfficientProductExpression Writing efficient matrix product expressions
 | |
| 
 | |
| In general achieving good performance with Eigen does no require any special effort:
 | |
| simply write your expressions in the most high level way. This is especially true
 | |
| for small fixed size matrices. For large matrices, however, it might be useful to
 | |
| take some care when writing your expressions in order to minimize useless evaluations
 | |
| and optimize the performance.
 | |
| In this page we will give a brief overview of the Eigen's internal mechanism to simplify
 | |
| and evaluate complex product expressions, and discuss the current limitations.
 | |
| In particular we will focus on expressions matching level 2 and 3 BLAS routines, i.e,
 | |
| all kind of matrix products and triangular solvers.
 | |
| 
 | |
| Indeed, in Eigen we have implemented a set of highly optimized routines which are very similar
 | |
| to BLAS's ones. Unlike BLAS, those routines are made available to user via a high level and
 | |
| natural API. Each of these routines can compute in a single evaluation a wide variety of expressions.
 | |
| Given an expression, the challenge is then to map it to a minimal set of routines.
 | |
| As explained latter, this mechanism has some limitations, and knowing them will allow
 | |
| you to write faster code by making your expressions more Eigen friendly.
 | |
| 
 | |
| \section GEMM General Matrix-Matrix product (GEMM)
 | |
| 
 | |
| Let's start with the most common primitive: the matrix product of general dense matrices.
 | |
| In the BLAS world this corresponds to the GEMM routine. Our equivalent primitive can
 | |
| perform the following operation:
 | |
| \f$ C.noalias() += \alpha op1(A) op2(B) \f$
 | |
| where A, B, and C are column and/or row major matrices (or sub-matrices),
 | |
| alpha is a scalar value, and op1, op2 can be transpose, adjoint, conjugate, or the identity.
 | |
| When Eigen detects a matrix product, it analyzes both sides of the product to extract a
 | |
| unique scalar factor alpha, and for each side, its effective storage order, shape, and conjugation states.
 | |
| More precisely each side is simplified by iteratively removing trivial expressions such as scalar multiple,
 | |
| negation and conjugation. Transpose and Block expressions are not evaluated and they only modify the storage order
 | |
| and shape. All other expressions are immediately evaluated.
 | |
| For instance, the following expression:
 | |
| \code m1.noalias() -= s4 * (s1 * m2.adjoint() * (-(s3*m3).conjugate()*s2))  \endcode
 | |
| is automatically simplified to:
 | |
| \code m1.noalias() += (s1*s2*conj(s3)*s4) * m2.adjoint() * m3.conjugate() \endcode
 | |
| which exactly matches our GEMM routine.
 | |
| 
 | |
| \subsection GEMM_Limitations Limitations
 | |
| Unfortunately, this simplification mechanism is not perfect yet and not all expressions which could be
 | |
| handled by a single GEMM-like call are correctly detected.
 | |
| <table class="manual" style="width:100%">
 | |
| <tr>
 | |
| <th>Not optimal expression</th>
 | |
| <th>Evaluated as</th>
 | |
| <th>Optimal version (single evaluation)</th>
 | |
| <th>Comments</th>
 | |
| </tr>
 | |
| <tr>
 | |
| <td>\code
 | |
| m1 += m2 * m3; \endcode</td>
 | |
| <td>\code
 | |
| temp = m2 * m3;
 | |
| m1 += temp; \endcode</td>
 | |
| <td>\code
 | |
| m1.noalias() += m2 * m3; \endcode</td>
 | |
| <td>Use .noalias() to tell Eigen the result and right-hand-sides do not alias. 
 | |
|     Otherwise the product m2 * m3 is evaluated into a temporary.</td>
 | |
| </tr>
 | |
| <tr class="alt">
 | |
| <td></td>
 | |
| <td></td>
 | |
| <td>\code
 | |
| m1.noalias() += s1 * (m2 * m3); \endcode</td>
 | |
| <td>This is a special feature of Eigen. Here the product between a scalar
 | |
|     and a matrix product does not evaluate the matrix product but instead it
 | |
|     returns a matrix product expression tracking the scalar scaling factor. <br>
 | |
|     Without this optimization, the matrix product would be evaluated into a
 | |
|     temporary as in the next example.</td>
 | |
| </tr>
 | |
| <tr>
 | |
| <td>\code
 | |
| m1.noalias() += (m2 * m3).adjoint(); \endcode</td>
 | |
| <td>\code
 | |
| temp = m2 * m3;
 | |
| m1 += temp.adjoint(); \endcode</td>
 | |
| <td>\code
 | |
| m1.noalias() += m3.adjoint()
 | |
| *              * m2.adjoint(); \endcode</td>
 | |
| <td>This is because the product expression has the EvalBeforeNesting bit which
 | |
|     enforces the evaluation of the product by the Tranpose expression.</td>
 | |
| </tr>
 | |
| <tr class="alt">
 | |
| <td>\code
 | |
| m1 = m1 + m2 * m3; \endcode</td>
 | |
| <td>\code
 | |
| temp = m2 * m3;
 | |
| m1 = m1 + temp; \endcode</td>
 | |
| <td>\code m1.noalias() += m2 * m3; \endcode</td>
 | |
| <td>Here there is no way to detect at compile time that the two m1 are the same,
 | |
|     and so the matrix product will be immediately evaluated.</td>
 | |
| </tr>
 | |
| <tr>
 | |
| <td>\code
 | |
| m1.noalias() = m4 + m2 * m3; \endcode</td>
 | |
| <td>\code
 | |
| temp = m2 * m3;
 | |
| m1 = m4 + temp; \endcode</td>
 | |
| <td>\code
 | |
| m1 = m4;
 | |
| m1.noalias() += m2 * m3; \endcode</td>
 | |
| <td>First of all, here the .noalias() in the first expression is useless because
 | |
|     m2*m3 will be evaluated anyway. However, note how this expression can be rewritten
 | |
|     so that no temporary is required. (tip: for very small fixed size matrix
 | |
|     it is slightly better to rewrite it like this: m1.noalias() = m2 * m3; m1 += m4;</td>
 | |
| </tr>
 | |
| <tr class="alt">
 | |
| <td>\code
 | |
| m1.noalias() += (s1*m2).block(..) * m3; \endcode</td>
 | |
| <td>\code
 | |
| temp = (s1*m2).block(..);
 | |
| m1 += temp * m3; \endcode</td>
 | |
| <td>\code
 | |
| m1.noalias() += s1 * m2.block(..) * m3; \endcode</td>
 | |
| <td>This is because our expression analyzer is currently not able to extract trivial
 | |
|     expressions nested in a Block expression. Therefore the nested scalar
 | |
|     multiple cannot be properly extracted.</td>
 | |
| </tr>
 | |
| </table>
 | |
| 
 | |
| Of course all these remarks hold for all other kind of products involving triangular or selfadjoint matrices.
 | |
| 
 | |
| */
 | |
| 
 | |
| }
 |