6765:71584fd55762 # Made the blocking computation aware of the L3 cache; also optimized the blocking parameters to take into account the number of threads used for a computation
8789:efcb912e4356 # Made the index type a template parameter to evaluateProductBlockingSizes. Use numext::mini and numext::maxi instead of std::min/std::max to compute blocking sizes
8972:81d53c711775 # Don't optimize the processing of the last rows of a matrix matrix product in cases that violate the assumptions made by the optimized code path
10442:e3f17da72a40 # Bug 1435: fix aliasing issue in expressions like: A = C - B*A;
10735:6913f0cf7d06 # Adds missing EIGEN_STRONG_INLINE to support MSVC properly inlining small vector calculations
10943:4db388d946bd # Bug 1562: optimize evaluation of small products of the form s*A*B by rewriting them as: s*(A.lazyProduct(B)) to save a costly temporary. Measured speedup from 2x to 5x.
10961:5007ff66c9f6 # Introduce the macro ei_declare_local_nested_eval to help allocate local temporaries on the stack via alloca, and let outer products make good use of it.
11083:30a528a984bb # Bug 1578: Improve prefetching in matrix multiplication on MIPS.
11533:71609c41e9f8 # PR 526: Speed up multiplication of small, dynamically sized matrices
11535:6d348dc9b092 # Vectorize row-by-row gebp loop iterations on 16 packets as well
11568:efda481cbd7a # Bug 1624: improve matrix-matrix product on ARM 64, 20% speedup
11596:b8d3f548a9d9 # do not read buffers out of bounds
11628:22f9cc0079bd # Implement AVX512 vectorization of std::complex<float/double>
11638:81172653b67b # Bug 1515: disable gebp's 3pX4 micro kernel for MSVC<=19.14 because of register spilling.
11659:b500fef42ced # Artificially increase l1-blocking size for AVX512. +10% speedup with current kernels.