Ilya Tokar 231ce21535 Run two independent chains, when reducing tensors.
Running two chains exposes more instruction-level parallelism
by allowing both chains to execute at the same time.
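
As a rough illustration of the idea, the sketch below sums an array with one accumulator and then with two. It is not the code from this commit (the real change lives under unsupported/, per the file listing below); the function names `sum_one_chain` and `sum_two_chains` are hypothetical and the sketch uses plain scalars.

```cpp
#include <cstddef>

// Hypothetical baseline: one accumulator, so every addition has to wait
// for the result of the previous one (a single dependency chain).
float sum_one_chain(const float* data, std::size_t n) {
  float acc = 0.0f;
  for (std::size_t i = 0; i < n; ++i) {
    acc += data[i];
  }
  return acc;
}

// Hypothetical two-chain variant: elements at even and odd offsets are
// accumulated separately, so the two additions in each iteration do not
// depend on each other and can be in flight at the same time.
float sum_two_chains(const float* data, std::size_t n) {
  float acc0 = 0.0f;
  float acc1 = 0.0f;
  std::size_t i = 0;
  for (; i + 1 < n; i += 2) {
    acc0 += data[i];
    acc1 += data[i + 1];
  }
  if (i < n) {
    acc0 += data[i];  // leftover element when n is odd
  }
  return acc0 + acc1;  // combine the two partial sums once at the end
}
```

With two chains, a loop limited by addition latency can at best run twice as fast, which is the 2x bound mentioned below. Because floating-point addition is not associative, the two variants may differ in the last bits for general inputs.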

Results are a bit noisy, but for medium lengths we almost hit
the theoretical upper bound of 2x.

BM_fullReduction_16T/3        [using 16 threads]       17.3ns ±11%        17.4ns ± 9%        ~           (p=0.178 n=18+19)
BM_fullReduction_16T/4        [using 16 threads]       17.6ns ±17%        17.0ns ±18%        ~           (p=0.835 n=20+19)
BM_fullReduction_16T/7        [using 16 threads]       18.9ns ±12%        18.2ns ±10%        ~           (p=0.756 n=20+18)
BM_fullReduction_16T/8        [using 16 threads]       19.8ns ±13%        19.4ns ±21%        ~           (p=0.512 n=20+20)
BM_fullReduction_16T/10       [using 16 threads]       23.5ns ±15%        20.8ns ±24%     -11.37%        (p=0.000 n=20+19)
BM_fullReduction_16T/15       [using 16 threads]       35.8ns ±21%        26.9ns ±17%     -24.76%        (p=0.000 n=20+19)
BM_fullReduction_16T/16       [using 16 threads]       38.7ns ±22%        27.7ns ±18%     -28.40%        (p=0.000 n=20+19)
BM_fullReduction_16T/31       [using 16 threads]        146ns ±17%          74ns ±11%     -49.05%        (p=0.000 n=20+18)
BM_fullReduction_16T/32       [using 16 threads]        154ns ±19%          84ns ±30%     -45.79%        (p=0.000 n=20+19)
BM_fullReduction_16T/64       [using 16 threads]        603ns ± 8%         308ns ±12%     -48.94%        (p=0.000 n=17+17)
BM_fullReduction_16T/128      [using 16 threads]       2.44µs ±13%        1.22µs ± 1%     -50.29%        (p=0.000 n=17+17)
BM_fullReduction_16T/256      [using 16 threads]       9.84µs ±14%        5.13µs ±30%     -47.82%        (p=0.000 n=19+19)
BM_fullReduction_16T/512      [using 16 threads]       78.0µs ± 9%        56.1µs ±17%     -28.02%        (p=0.000 n=18+20)
BM_fullReduction_16T/1k       [using 16 threads]        325µs ± 5%         263µs ± 4%     -19.00%        (p=0.000 n=20+16)
BM_fullReduction_16T/2k       [using 16 threads]       1.09ms ± 3%        0.99ms ± 1%      -9.04%        (p=0.000 n=20+20)
BM_fullReduction_16T/4k       [using 16 threads]       7.66ms ± 3%        7.57ms ± 3%      -1.24%        (p=0.017 n=20+20)
BM_fullReduction_16T/10k      [using 16 threads]       65.3ms ± 4%        65.0ms ± 3%        ~           (p=0.718 n=20+20)
2020-06-16 15:55:11 -04:00
bench Fix #1911: add benchmark for move semantics with fixed-size matrix 2020-06-11 23:43:25 +00:00
blas STYLE: Remove CMake-language block-end command arguments 2019-10-31 11:36:27 -05:00
cmake Update FindComputeCpp.cmake to fix build problems on Windows 2020-06-05 20:51:20 +00:00
debug MIsc. source and comment typos 2018-03-11 10:01:44 -04:00
demos Make file formatting comply with POSIX and Unix standards 2020-03-23 18:09:02 +00:00
doc Possibility to specify user-defined default cache sizes for GEBP kernel 2020-05-08 12:54:36 +02:00
Eigen Fix pscatter and pgather for Altivec Complex double 2020-06-16 16:41:02 -03:00
failtest Make file formatting comply with POSIX and Unix standards 2020-03-23 18:09:02 +00:00
lapack STYLE: Convert CMake-language commands to lower case 2019-10-31 11:36:37 -05:00
scripts Replace calls to "hg" by calls to "git" 2019-12-04 11:24:06 +01:00
test Fix #1911: add benchmark for move semantics with fixed-size matrix 2020-06-11 23:43:25 +00:00
unsupported Run two independent chains, when reducing tensors. 2020-06-16 15:55:11 -04:00
.gitignore Renamed .hgignore to .gitignore (removing hg-specific "syntax" line) 2019-12-13 19:40:57 +01:00
.hgeol Added a pattern which forces LF line endings for *.sh files. 2013-07-31 18:20:58 +02:00
CMakeLists.txt Bug #1767: increase required cmake version to 3.5.0 2020-05-31 00:31:09 +02:00
COPYING.BSD Make file formatting comply with POSIX and Unix standards 2020-03-23 18:09:02 +00:00
COPYING.GPL there's no reason why we should follow the FSF's stupid recommendation for the naming of these files, right? This could give the wrong impression that Eigen is only GPL-licensed. 2009-11-14 23:26:07 -05:00
COPYING.LGPL Replace COPYING.LGPL by a copy of the LGPL 2.1 (instead of LGPL 3). 2012-09-10 13:27:44 -04:00
COPYING.MINPACK Make file formatting comply with POSIX and Unix standards 2020-03-23 18:09:02 +00:00
COPYING.MPL2 add COPYING.MPL2 2012-07-15 10:20:59 -04:00
COPYING.README Replace COPYING.LGPL by a copy of the LGPL 2.1 (instead of LGPL 3). 2012-09-10 13:27:44 -04:00
CTestConfig.cmake STYLE: Convert CMake-language commands to lower case 2019-10-31 11:36:37 -05:00
CTestCustom.cmake.in Allow to filter out build-error messages 2018-07-24 20:12:49 +02:00
eigen3.pc.in Further fixes for CMAKE_INSTALL_PREFIX correctness 2015-11-07 21:29:24 -05:00
INSTALL finally, the right fix: set CTEST_BUILD_TARGET. 2009-10-04 20:27:44 -04:00
README.md Update old links to bitbucket to point to gitlab.com 2019-12-04 10:57:07 +01:00
signature_of_eigen3_matrix_library improve the scripts for building unit tests: 2009-11-25 21:26:37 -05:00

Eigen is a C++ template library for linear algebra: matrices, vectors, numerical solvers, and related algorithms.

For more information go to http://eigen.tuxfamily.org/.

For pull requests, bug reports, and feature requests, go to https://gitlab.com/libeigen/eigen.