Rasmus Munk Larsen 2fd8a5a08f Add parallelization of TensorScanOp for types without packet ops.
Clean up the code a bit and do a few micro-optimizations to improve performance for small tensors.

Benchmark numbers for Tensor<uint32_t>:

name                                                       old time/op             new time/op             delta
BM_cumSumRowReduction_1T/8   [using 1 threads]             76.5ns ± 0%             61.3ns ± 4%    -19.80%          (p=0.008 n=5+5)
BM_cumSumRowReduction_1T/64  [using 1 threads]             2.47µs ± 1%             2.40µs ± 1%     -2.77%          (p=0.008 n=5+5)
BM_cumSumRowReduction_1T/256 [using 1 threads]             39.8µs ± 0%             39.6µs ± 0%     -0.60%          (p=0.008 n=5+5)
BM_cumSumRowReduction_1T/4k  [using 1 threads]             13.9ms ± 0%             13.4ms ± 1%     -4.19%          (p=0.008 n=5+5)
BM_cumSumRowReduction_2T/8   [using 2 threads]             76.8ns ± 0%             59.1ns ± 0%    -23.09%          (p=0.016 n=5+4)
BM_cumSumRowReduction_2T/64  [using 2 threads]             2.47µs ± 1%             2.41µs ± 1%     -2.53%          (p=0.008 n=5+5)
BM_cumSumRowReduction_2T/256 [using 2 threads]             39.8µs ± 0%             34.7µs ± 6%    -12.74%          (p=0.008 n=5+5)
BM_cumSumRowReduction_2T/4k  [using 2 threads]             13.8ms ± 1%              7.2ms ± 6%    -47.74%          (p=0.008 n=5+5)
BM_cumSumRowReduction_8T/8   [using 8 threads]             76.4ns ± 0%             61.8ns ± 3%    -19.02%          (p=0.008 n=5+5)
BM_cumSumRowReduction_8T/64  [using 8 threads]             2.47µs ± 1%             2.40µs ± 1%     -2.84%          (p=0.008 n=5+5)
BM_cumSumRowReduction_8T/256 [using 8 threads]             39.8µs ± 0%             28.3µs ±11%    -28.75%          (p=0.008 n=5+5)
BM_cumSumRowReduction_8T/4k  [using 8 threads]             13.8ms ± 0%              2.7ms ± 5%    -80.39%          (p=0.008 n=5+5)
BM_cumSumColReduction_1T/8   [using 1 threads]             59.1ns ± 0%             80.3ns ± 0%    +35.94%          (p=0.029 n=4+4)
BM_cumSumColReduction_1T/64  [using 1 threads]             3.06µs ± 0%             3.08µs ± 1%       ~             (p=0.114 n=4+4)
BM_cumSumColReduction_1T/256 [using 1 threads]              175µs ± 0%              176µs ± 0%       ~             (p=0.190 n=4+5)
BM_cumSumColReduction_1T/4k  [using 1 threads]              824ms ± 1%              844ms ± 1%     +2.37%          (p=0.008 n=5+5)
BM_cumSumColReduction_2T/8   [using 2 threads]             59.0ns ± 0%             90.7ns ± 0%    +53.74%          (p=0.029 n=4+4)
BM_cumSumColReduction_2T/64  [using 2 threads]             3.06µs ± 0%             3.10µs ± 0%     +1.08%          (p=0.016 n=4+5)
BM_cumSumColReduction_2T/256 [using 2 threads]              176µs ± 0%              189µs ±18%       ~             (p=0.151 n=5+5)
BM_cumSumColReduction_2T/4k  [using 2 threads]              836ms ± 2%              611ms ±14%    -26.92%          (p=0.008 n=5+5)
BM_cumSumColReduction_8T/8   [using 8 threads]             59.3ns ± 2%             90.6ns ± 0%    +52.79%          (p=0.008 n=5+5)
BM_cumSumColReduction_8T/64  [using 8 threads]             3.07µs ± 0%             3.10µs ± 0%     +0.99%          (p=0.016 n=5+4)
BM_cumSumColReduction_8T/256 [using 8 threads]              176µs ± 0%               80µs ±19%    -54.51%          (p=0.008 n=5+5)
BM_cumSumColReduction_8T/4k  [using 8 threads]              827ms ± 2%              180ms ±14%    -78.24%          (p=0.008 n=5+5)
2020-05-06 14:48:37 -07:00
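
The row-wise cumulative-sum benchmarks above scan many independent rows, which is the kind of workload that can be split across threads. As a rough illustration only, here is a minimal sketch in plain standard C++ that computes an inclusive prefix sum along each row of a row-major matrix and hands contiguous blocks of rows to separate threads. It is not the TensorScanOp implementation from this commit: the function name `cumsum_rows`, its parameters, and the block-partitioning scheme are all assumptions made for this example, and it assumes `data.size() == rows * cols`.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical helper (not part of Eigen): in-place inclusive prefix sum along
// each row of a row-major rows x cols matrix stored in `data`.
// Rows are independent, so contiguous blocks of rows run on separate threads.
void cumsum_rows(std::vector<std::uint32_t>& data, std::size_t rows,
                 std::size_t cols, unsigned num_threads) {
  if (rows == 0 || cols == 0) return;
  num_threads = std::max(1u, num_threads);
  // Number of rows per thread, rounded up so every row is covered.
  const std::size_t block = (rows + num_threads - 1) / num_threads;

  // Sequential scan of rows [first, last).
  auto scan_block = [&data, cols](std::size_t first, std::size_t last) {
    for (std::size_t r = first; r < last; ++r) {
      std::uint32_t* row = data.data() + r * cols;
      for (std::size_t c = 1; c < cols; ++c) row[c] += row[c - 1];
    }
  };

  // Spawn workers for every block except the first.
  std::vector<std::thread> workers;
  for (std::size_t first = block; first < rows; first += block) {
    workers.emplace_back(scan_block, first, std::min(rows, first + block));
  }
  // The calling thread handles the first block itself.
  scan_block(0, std::min(rows, block));
  for (std::thread& t : workers) t.join();
}
```

For a 4 x 3 matrix holding `{1,2,3, 4,5,6, 7,8,9, 1,1,1}`, calling `cumsum_rows(m, 4, 3, 2)` leaves `{1,3,6, 4,9,15, 7,15,24, 1,2,3}` in `m`. Spawning threads has a fixed cost, so a scheme like this would only be expected to pay off for larger inputs, which is consistent with the small-size and large-size rows of the table above.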
bench Fix perf monitoring merge function 2020-04-28 17:02:59 +00:00
blas STYLE: Remove CMake-language block-end command arguments 2019-10-31 11:36:27 -05:00
cmake [SYCL] Rebasing the SYCL support branch on top of the Eigen upstream master branch. 2019-11-28 10:08:54 +00:00
debug Misc. source and comment typos 2018-03-11 10:01:44 -04:00
demos Make file formatting comply with POSIX and Unix standards 2020-03-23 18:09:02 +00:00
doc Update PreprocessorDirectives.dox - Added line for the new VectorwiseOp plugin directive (and re-alphabetized the plugin section) 2020-04-17 21:43:37 +00:00
Eigen Fix confusing template param name for Stride fwd decl. 2020-04-30 01:43:05 +00:00
failtest Make file formatting comply with POSIX and Unix standards 2020-03-23 18:09:02 +00:00
lapack STYLE: Convert CMake-language commands to lower case 2019-10-31 11:36:37 -05:00
scripts Replace calls to "hg" by calls to "git" 2019-12-04 11:24:06 +01:00
test Extend support for Packet16b: 2020-04-28 16:12:47 +00:00
unsupported Add parallelization of TensorScanOp for types without packet ops. 2020-05-06 14:48:37 -07:00
.gitignore Renamed .hgignore to .gitignore (removing hg-specific "syntax" line) 2019-12-13 19:40:57 +01:00
.hgeol Added a pattern which forces LF line endings for *.sh files. 2013-07-31 18:20:58 +02:00
CMakeLists.txt Don't restrict CMAKE_BUILD_TYPE 2020-02-28 20:46:53 +00:00
COPYING.BSD Make file formatting comply with POSIX and Unix standards 2020-03-23 18:09:02 +00:00
COPYING.GPL there's no reason why we should follow the FSF's stupid recommendation for the naming of these files, right? This could give the wrong impression that Eigen is only GPL-licensed. 2009-11-14 23:26:07 -05:00
COPYING.LGPL Replace COPYING.LGPL by a copy of the LGPL 2.1 (instead of LGPL 3). 2012-09-10 13:27:44 -04:00
COPYING.MINPACK Make file formatting comply with POSIX and Unix standards 2020-03-23 18:09:02 +00:00
COPYING.MPL2 add COPYING.MPL2 2012-07-15 10:20:59 -04:00
COPYING.README Replace COPYING.LGPL by a copy of the LGPL 2.1 (instead of LGPL 3). 2012-09-10 13:27:44 -04:00
CTestConfig.cmake STYLE: Convert CMake-language commands to lower case 2019-10-31 11:36:37 -05:00
CTestCustom.cmake.in Allow to filter out build-error messages 2018-07-24 20:12:49 +02:00
eigen3.pc.in Further fixes for CMAKE_INSTALL_PREFIX correctness 2015-11-07 21:29:24 -05:00
INSTALL finally, the right fix: set CTEST_BUILD_TARGET. 2009-10-04 20:27:44 -04:00
README.md Update old links to bitbucket to point to gitlab.com 2019-12-04 10:57:07 +01:00
signature_of_eigen3_matrix_library improve the scripts for building unit tests: 2009-11-25 21:26:37 -05:00

Eigen is a C++ template library for linear algebra: matrices, vectors, numerical solvers, and related algorithms.

For more information, go to http://eigen.tuxfamily.org/.

For pull requests, bug reports, and feature requests, go to https://gitlab.com/libeigen/eigen.