You can not select more than 25 topics
			Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.
		
		
		
		
		
			
		
			
				
					
					
						
							128 lines
						
					
					
						
							5.3 KiB
						
					
					
				
			
		
		
		
			
			
			
				
					
				
				
					
				
			
		
		
	
	
							128 lines
						
					
					
						
							5.3 KiB
						
					
					
				
								
							 | 
						|
								namespace Eigen {
							 | 
						|
								
							 | 
						|
								/** \page TopicWritingEfficientProductExpression Writing efficient matrix product expressions
							 | 
						|
								
							 | 
						|
								In general achieving good performance with Eigen does no require any special effort:
							 | 
						|
								simply write your expressions in the most high level way. This is especially true
							 | 
						|
								for small fixed size matrices. For large matrices, however, it might be useful to
							 | 
						|
								take some care when writing your expressions in order to minimize useless evaluations
							 | 
						|
								and optimize the performance.
							 | 
						|
								In this page we will give a brief overview of the Eigen's internal mechanism to simplify
							 | 
						|
								and evaluate complex product expressions, and discuss the current limitations.
							 | 
						|
								In particular we will focus on expressions matching level 2 and 3 BLAS routines, i.e,
							 | 
						|
								all kind of matrix products and triangular solvers.
							 | 
						|
								
							 | 
						|
								Indeed, in Eigen we have implemented a set of highly optimized routines which are very similar
							 | 
						|
								to BLAS's ones. Unlike BLAS, those routines are made available to user via a high level and
							 | 
						|
								natural API. Each of these routines can compute in a single evaluation a wide variety of expressions.
							 | 
						|
								Given an expression, the challenge is then to map it to a minimal set of routines.
							 | 
						|
								As explained latter, this mechanism has some limitations, and knowing them will allow
							 | 
						|
								you to write faster code by making your expressions more Eigen friendly.
							 | 
						|
								
							 | 
						|
								\section GEMM General Matrix-Matrix product (GEMM)
							 | 
						|
								
							 | 
						|
								Let's start with the most common primitive: the matrix product of general dense matrices.
							 | 
						|
								In the BLAS world this corresponds to the GEMM routine. Our equivalent primitive can
							 | 
						|
								perform the following operation:
							 | 
						|
								\f$ C.noalias() += \alpha op1(A) op2(B) \f$
							 | 
						|
								where A, B, and C are column and/or row major matrices (or sub-matrices),
							 | 
						|
								alpha is a scalar value, and op1, op2 can be transpose, adjoint, conjugate, or the identity.
							 | 
						|
								When Eigen detects a matrix product, it analyzes both sides of the product to extract a
							 | 
						|
								unique scalar factor alpha, and for each side, its effective storage order, shape, and conjugation states.
							 | 
						|
								More precisely each side is simplified by iteratively removing trivial expressions such as scalar multiple,
							 | 
						|
								negation and conjugation. Transpose and Block expressions are not evaluated and they only modify the storage order
							 | 
						|
								and shape. All other expressions are immediately evaluated.
							 | 
						|
								For instance, the following expression:
							 | 
						|
								\code m1.noalias() -= s4 * (s1 * m2.adjoint() * (-(s3*m3).conjugate()*s2))  \endcode
							 | 
						|
								is automatically simplified to:
							 | 
						|
								\code m1.noalias() += (s1*s2*conj(s3)*s4) * m2.adjoint() * m3.conjugate() \endcode
							 | 
						|
								which exactly matches our GEMM routine.
							 | 
						|
								
							 | 
						|
								\subsection GEMM_Limitations Limitations
							 | 
						|
								Unfortunately, this simplification mechanism is not perfect yet and not all expressions which could be
							 | 
						|
								handled by a single GEMM-like call are correctly detected.
							 | 
						|
								<table class="manual" style="width:100%">
							 | 
						|
								<tr>
							 | 
						|
								<th>Not optimal expression</th>
							 | 
						|
								<th>Evaluated as</th>
							 | 
						|
								<th>Optimal version (single evaluation)</th>
							 | 
						|
								<th>Comments</th>
							 | 
						|
								</tr>
							 | 
						|
								<tr>
							 | 
						|
								<td>\code
							 | 
						|
								m1 += m2 * m3; \endcode</td>
							 | 
						|
								<td>\code
							 | 
						|
								temp = m2 * m3;
							 | 
						|
								m1 += temp; \endcode</td>
							 | 
						|
								<td>\code
							 | 
						|
								m1.noalias() += m2 * m3; \endcode</td>
							 | 
						|
								<td>Use .noalias() to tell Eigen the result and right-hand-sides do not alias. 
							 | 
						|
								    Otherwise the product m2 * m3 is evaluated into a temporary.</td>
							 | 
						|
								</tr>
							 | 
						|
								<tr class="alt">
							 | 
						|
								<td></td>
							 | 
						|
								<td></td>
							 | 
						|
								<td>\code
							 | 
						|
								m1.noalias() += s1 * (m2 * m3); \endcode</td>
							 | 
						|
								<td>This is a special feature of Eigen. Here the product between a scalar
							 | 
						|
								    and a matrix product does not evaluate the matrix product but instead it
							 | 
						|
								    returns a matrix product expression tracking the scalar scaling factor. <br>
							 | 
						|
								    Without this optimization, the matrix product would be evaluated into a
							 | 
						|
								    temporary as in the next example.</td>
							 | 
						|
								</tr>
							 | 
						|
								<tr>
							 | 
						|
								<td>\code
							 | 
						|
								m1.noalias() += (m2 * m3).adjoint(); \endcode</td>
							 | 
						|
								<td>\code
							 | 
						|
								temp = m2 * m3;
							 | 
						|
								m1 += temp.adjoint(); \endcode</td>
							 | 
						|
								<td>\code
							 | 
						|
								m1.noalias() += m3.adjoint()
							 | 
						|
								              * m2.adjoint(); \endcode</td>
							 | 
						|
								<td>This is because the product expression has the EvalBeforeNesting bit which
							 | 
						|
								    enforces the evaluation of the product by the Tranpose expression.</td>
							 | 
						|
								</tr>
							 | 
						|
								<tr class="alt">
							 | 
						|
								<td>\code
							 | 
						|
								m1 = m1 + m2 * m3; \endcode</td>
							 | 
						|
								<td>\code
							 | 
						|
								temp = m2 * m3;
							 | 
						|
								m1 = m1 + temp; \endcode</td>
							 | 
						|
								<td>\code m1.noalias() += m2 * m3; \endcode</td>
							 | 
						|
								<td>Here there is no way to detect at compile time that the two m1 are the same,
							 | 
						|
								    and so the matrix product will be immediately evaluated.</td>
							 | 
						|
								</tr>
							 | 
						|
								<tr>
							 | 
						|
								<td>\code
							 | 
						|
								m1.noalias() = m4 + m2 * m3; \endcode</td>
							 | 
						|
								<td>\code
							 | 
						|
								temp = m2 * m3;
							 | 
						|
								m1 = m4 + temp; \endcode</td>
							 | 
						|
								<td>\code
							 | 
						|
								m1 = m4;
							 | 
						|
								m1.noalias() += m2 * m3; \endcode</td>
							 | 
						|
								<td>First of all, here the .noalias() in the first expression is useless because
							 | 
						|
								    m2*m3 will be evaluated anyway. However, note how this expression can be rewritten
							 | 
						|
								    so that no temporary is required. (tip: for very small fixed size matrix
							 | 
						|
								    it is slighlty better to rewrite it like this: m1.noalias() = m2 * m3; m1 += m4;</td>
							 | 
						|
								</tr>
							 | 
						|
								<tr class="alt">
							 | 
						|
								<td>\code
							 | 
						|
								m1.noalias() += (s1*m2).block(..) * m3; \endcode</td>
							 | 
						|
								<td>\code
							 | 
						|
								temp = (s1*m2).block(..);
							 | 
						|
								m1 += temp * m3; \endcode</td>
							 | 
						|
								<td>\code
							 | 
						|
								m1.noalias() += s1 * m2.block(..) * m3; \endcode</td>
							 | 
						|
								<td>This is because our expression analyzer is currently not able to extract trivial
							 | 
						|
								    expressions nested in a Block expression. Therefore the nested scalar
							 | 
						|
								    multiple cannot be properly extracted.</td>
							 | 
						|
								</tr>
							 | 
						|
								</table>
							 | 
						|
								
							 | 
						|
								Of course all these remarks hold for all other kind of products involving triangular or selfadjoint matrices.
							 | 
						|
								
							 | 
						|
								*/
							 | 
						|
								
							 | 
						|
								}
							 |