CFD/eigen

Author	SHA1	Message	Date
Srinivas Vasudevan	facdec5aa7	Add packetized versions of i0e and i1e special functions.
- In particular refactor the i0e and i1e code so scalar and vectorized path share code.
- Move chebevl to GenericPacketMathFunctions.

A brief benchmark with building Eigen with FMA, AVX and AVX2 flags

Before:

CPU: Intel Haswell with HyperThreading (6 cores)
Benchmark                   Time(ns)    CPU(ns)     Iterations
-----------------------------------------------------------------
BM_eigen_i0e_double/1       57.3        57.3        10000000
BM_eigen_i0e_double/8       398         398         1748554
BM_eigen_i0e_double/64      3184        3184        218961
BM_eigen_i0e_double/512     25579       25579       27330
BM_eigen_i0e_double/4k      205043      205042      3418
BM_eigen_i0e_double/32k     1646038     1646176     422
BM_eigen_i0e_double/256k    13180959    13182613    53
BM_eigen_i0e_double/1M      52684617    52706132    10
BM_eigen_i0e_float/1        28.4        28.4        24636711
BM_eigen_i0e_float/8        75.7        75.7        9207634
BM_eigen_i0e_float/64       512         512         1000000
BM_eigen_i0e_float/512      4194        4194        166359
BM_eigen_i0e_float/4k       32756       32761       21373
BM_eigen_i0e_float/32k      261133      261153      2678
BM_eigen_i0e_float/256k     2087938     2088231     333
BM_eigen_i0e_float/1M       8380409     8381234     84
BM_eigen_i1e_double/1       56.3        56.3        10000000
BM_eigen_i1e_double/8       397         397         1772376
BM_eigen_i1e_double/64      3114        3115        223881
BM_eigen_i1e_double/512     25358       25361       27761
BM_eigen_i1e_double/4k      203543      203593      3462
BM_eigen_i1e_double/32k     1613649     1613803     428
BM_eigen_i1e_double/256k    12910625    12910374    54
BM_eigen_i1e_double/1M      51723824    51723991    10
BM_eigen_i1e_float/1        28.3        28.3        24683049
BM_eigen_i1e_float/8        74.8        74.9        9366216
BM_eigen_i1e_float/64       505         505         1000000
BM_eigen_i1e_float/512      4068        4068        171690
BM_eigen_i1e_float/4k       31803       31806       21948
BM_eigen_i1e_float/32k      253637      253692      2763
BM_eigen_i1e_float/256k     2019711     2019918     346
BM_eigen_i1e_float/1M       8238681     8238713     86

After:

CPU: Intel Haswell with HyperThreading (6 cores)
Benchmark                   Time(ns)    CPU(ns)     Iterations
-----------------------------------------------------------------
BM_eigen_i0e_double/1       15.8        15.8        44097476
BM_eigen_i0e_double/8       99.3        99.3        7014884
BM_eigen_i0e_double/64      777         777         886612
BM_eigen_i0e_double/512     6180        6181        100000
BM_eigen_i0e_double/4k      48136       48140       14678
BM_eigen_i0e_double/32k     385936      385943      1801
BM_eigen_i0e_double/256k    3293324     3293551     228
BM_eigen_i0e_double/1M      12423600    12424458    57
BM_eigen_i0e_float/1        16.3        16.3        43038042
BM_eigen_i0e_float/8        30.1        30.1        23456931
BM_eigen_i0e_float/64       169         169         4132875
BM_eigen_i0e_float/512      1338        1339        516860
BM_eigen_i0e_float/4k       10191       10191       68513
BM_eigen_i0e_float/32k      81338       81337       8531
BM_eigen_i0e_float/256k     651807      651984      1000
BM_eigen_i0e_float/1M       2633821     2634187     268
BM_eigen_i1e_double/1       16.2        16.2        42352499
BM_eigen_i1e_double/8       110         110         6316524
BM_eigen_i1e_double/64      822         822         851065
BM_eigen_i1e_double/512     6480        6481        100000
BM_eigen_i1e_double/4k      51843       51843       10000
BM_eigen_i1e_double/32k     414854      414852      1680
BM_eigen_i1e_double/256k    3320001     3320568     212
BM_eigen_i1e_double/1M      13442795    13442391    53
BM_eigen_i1e_float/1        17.6        17.6        41025735
BM_eigen_i1e_float/8        35.5        35.5        19597891
BM_eigen_i1e_float/64       240         240         2924237
BM_eigen_i1e_float/512      1424        1424        485953
BM_eigen_i1e_float/4k       10722       10723       65162
BM_eigen_i1e_float/32k      86286       86297       8048
BM_eigen_i1e_float/256k     691821      691868      1000
BM_eigen_i1e_float/1M       2777336     2777747     256

This shows anywhere from a 50% to 75% improvement on these operations. I've also benchmarked without any of these flags turned on, and got similar performance to before (if not better). Also tested packetmath.cpp + special_functions to ensure no regressions.	2019-09-11 18:34:02 -07:00
Srinivas Vasudevan	99036a3615	Merging from eigen/eigen.	2019-09-03 15:34:47 -04:00
Srinivas Vasudevan	18ceb3413d	Add ndtri function, the inverse of the normal distribution function.	2019-08-12 19:26:29 -04:00
Rasmus Munk Larsen	1187bb65ad	Add more tests for corner cases of log1p and expm1. Add handling of infinite arguments to log1p such that log1p(inf) = inf.	2019-08-28 12:20:21 -07:00
Rasmus Munk Larsen	9aba527405	Revert changes to std_falback::log1p that broke handling of arguments less than -1. Fix packet op accordingly.	2019-08-27 15:35:29 -07:00
Rasmus Munk Larsen	a3298b22ec	Implement vectorized versions of log1p and expm1 in Eigen using Kahan's formulas, and change the scalar implementations to properly handle infinite arguments. Depending on instruction set, significant speedups are observed for the vectorized path: log1p wall time is reduced 60-93% (2.5x - 15x speedup) expm1 wall time is reduced 0-85% (1x - 7x speedup) The scalar path is slower by 20-30% due to the extra branch needed to handle +infinity correctly. Full benchmarks measured on Intel(R) Xeon(R) Gold 6154 here: https://bitbucket.org/snippets/rmlarsen/MXBkpM	2019-08-12 13:53:28 -07:00
Rasmus Munk Larsen	988f24b730	Various fixes for packet ops. 1. Fix buggy pcmp_eq and unit test for half types. 2. Add unit test for pselect and add specializations for SSE 4.1, AVX512, and half types. 3. Get rid of FIXME: Implement faster pnegate for half by XOR'ing with a sign bit mask.	2019-06-20 11:47:49 -07:00
Eugene Zhulenev	e9f0eb8a5e	Add masked_store_available to unpacket_traits	2019-05-02 14:52:58 -07:00
Eugene Zhulenev	b4010f02f9	Add masked pstoreu to AVX and AVX512 PacketMath	2019-05-02 13:14:18 -07:00
Anuj Rawat	8c7a6feb8e	Adding low-level APIs for optimized RHS packet load in TensorFlow SpatialConvolution

Low-level APIs are added in order to optimize packet load in gemm_pack_rhs in TensorFlow SpatialConvolution. The optimization is for the scenario when a packet is split across 2 adjacent columns. In this case we read it as two 'partial' packets and then merge these into 1. Currently this only works for Packet16f (AVX512) and Packet8f (AVX2). We plan to add this for other packet types (such as Packet8d) also.

This optimization shows significant speedup in SpatialConvolution with certain parameters. Some examples are below.

Benchmark parameters are specified as: Batch size, Input dim, Depth, Num of filters, Filter dim
Speedup numbers are specified for number of threads 1, 2, 4, 8, 16.

AVX512:

Parameters                  | Speedup (Num of threads: 1, 2, 4, 8, 16)
----------------------------|------------------------------------------
128, 24x24, 3, 64, 5x5      | 2.18X, 2.13X, 1.73X, 1.64X, 1.66X
128, 24x24, 1, 64, 8x8      | 2.00X, 1.98X, 1.93X, 1.91X, 1.91X
32, 24x24, 3, 64, 5x5       | 2.26X, 2.14X, 2.17X, 2.22X, 2.33X
128, 24x24, 3, 64, 3x3      | 1.51X, 1.45X, 1.45X, 1.67X, 1.57X
32, 14x14, 24, 64, 5x5      | 1.21X, 1.19X, 1.16X, 1.70X, 1.17X
128, 128x128, 3, 96, 11x11  | 2.17X, 2.18X, 2.19X, 2.20X, 2.18X

AVX2:

Parameters                  | Speedup (Num of threads: 1, 2, 4, 8, 16)
----------------------------|------------------------------------------
128, 24x24, 3, 64, 5x5      | 1.66X, 1.65X, 1.61X, 1.56X, 1.49X
32, 24x24, 3, 64, 5x5       | 1.71X, 1.63X, 1.77X, 1.58X, 1.68X
128, 24x24, 1, 64, 5x5      | 1.44X, 1.40X, 1.38X, 1.37X, 1.33X
128, 24x24, 3, 64, 3x3      | 1.68X, 1.63X, 1.58X, 1.56X, 1.62X
128, 128x128, 3, 96, 11x11  | 1.36X, 1.36X, 1.37X, 1.37X, 1.37X

In the higher level benchmark cifar10, we observe a runtime improvement of around 6% for AVX512 on Intel Skylake server (8 cores).

On lower level PackRhs micro-benchmarks specified in TensorFlow tensorflow/core/kernels/eigen_spatial_convolutions_test.cc, we observe the following runtime numbers:

AVX512:

Parameters                                                      | Runtime without patch (ns) | Runtime with patch (ns) | Speedup
----------------------------------------------------------------|----------------------------|-------------------------|---------
BM_RHS_NAME(PackRhs, 128, 24, 24, 3, 64, 5, 5, 1, 1, 256, 56)   | 41350                      | 15073                   | 2.74X
BM_RHS_NAME(PackRhs, 32, 64, 64, 32, 64, 5, 5, 1, 1, 256, 56)   | 7277                       | 7341                    | 0.99X
BM_RHS_NAME(PackRhs, 32, 64, 64, 32, 64, 5, 5, 2, 2, 256, 56)   | 8675                       | 8681                    | 1.00X
BM_RHS_NAME(PackRhs, 32, 64, 64, 30, 64, 5, 5, 1, 1, 256, 56)   | 24155                      | 16079                   | 1.50X
BM_RHS_NAME(PackRhs, 32, 64, 64, 30, 64, 5, 5, 2, 2, 256, 56)   | 25052                      | 17152                   | 1.46X
BM_RHS_NAME(PackRhs, 32, 256, 256, 4, 16, 8, 8, 1, 1, 256, 56)  | 18269                      | 18345                   | 1.00X
BM_RHS_NAME(PackRhs, 32, 256, 256, 4, 16, 8, 8, 2, 4, 256, 56)  | 19468                      | 19872                   | 0.98X
BM_RHS_NAME(PackRhs, 32, 64, 64, 4, 16, 3, 3, 1, 1, 36, 432)    | 156060                     | 42432                   | 3.68X
BM_RHS_NAME(PackRhs, 32, 64, 64, 4, 16, 3, 3, 2, 2, 36, 432)    | 132701                     | 36944                   | 3.59X

AVX2:

Parameters                                                      | Runtime without patch (ns) | Runtime with patch (ns) | Speedup
----------------------------------------------------------------|----------------------------|-------------------------|---------
BM_RHS_NAME(PackRhs, 128, 24, 24, 3, 64, 5, 5, 1, 1, 256, 56)   | 26233                      | 12393                   | 2.12X
BM_RHS_NAME(PackRhs, 32, 64, 64, 32, 64, 5, 5, 1, 1, 256, 56)   | 6091                       | 6062                    | 1.00X
BM_RHS_NAME(PackRhs, 32, 64, 64, 32, 64, 5, 5, 2, 2, 256, 56)   | 7427                       | 7408                    | 1.00X
BM_RHS_NAME(PackRhs, 32, 64, 64, 30, 64, 5, 5, 1, 1, 256, 56)   | 23453                      | 20826                   | 1.13X
BM_RHS_NAME(PackRhs, 32, 64, 64, 30, 64, 5, 5, 2, 2, 256, 56)   | 23167                      | 22091                   | 1.09X
BM_RHS_NAME(PackRhs, 32, 256, 256, 4, 16, 8, 8, 1, 1, 256, 56)  | 23422                      | 23682                   | 0.99X
BM_RHS_NAME(PackRhs, 32, 256, 256, 4, 16, 8, 8, 2, 4, 256, 56)  | 23165                      | 23663                   | 0.98X
BM_RHS_NAME(PackRhs, 32, 64, 64, 4, 16, 3, 3, 1, 1, 36, 432)    | 72689                      | 44969                   | 1.62X
BM_RHS_NAME(PackRhs, 32, 64, 64, 4, 16, 3, 3, 2, 2, 36, 432)    | 61732                      | 39779                   | 1.55X

All benchmarks on Intel Skylake server with 8 cores.	2019-04-20 06:46:43 +00:00
Gael Guennebaud	61b6eb05fe	AVX512 (r)sqrt(double) was mistakenly disabled with clang and others	2019-01-14 17:28:47 +01:00
Rasmus Munk Larsen	fcfced13ed	Rename pones -> ptrue. Use _CMP_TRUE_UQ where appropriate.	2019-01-09 17:20:33 -08:00
Rasmus Munk Larsen	8f04442526	Collapsed revision * Collapsed revision * Add packet up "pones". Write pnot(a) as pxor(pones(a), a). * Collapsed revision * Simplify a bit. * Undo useless diffs. * Fix typo.	2019-01-09 16:34:23 -08:00
Rasmus Munk Larsen	cb955df9a6	Add packet up "pones". Write pnot(a) as pxor(pones(a), a).	2019-01-09 16:17:08 -08:00
Rasmus Larsen	cb3c059fa4	Merged eigen/eigen into default	2019-01-09 15:04:17 -08:00
Gael Guennebaud	e6b217b8dd	bug #1652 : implements a much more accurate version of vectorized sin/cos. This new version achieve same speed for SSE/AVX, and is slightly faster with FMA. Guarantees are as follows: - no FMA: 1ULP up to 3pi, 2ULP up to sin(25966) and cos(18838), fallback to std::sin/cos for larger inputs - FMA: 1ULP up to sin(117435.992) and cos(71476.0625), fallback to std::sin/cos for larger inputs	2019-01-09 15:25:17 +01:00
Rasmus Munk Larsen	055f0b73db	Add support for pcmp_eq and pnot, including for complex types.	2019-01-07 16:53:36 -08:00
Gael Guennebaud	697fba3bb0	Fix unit test	2018-12-27 11:20:47 +01:00
Gael Guennebaud	0f6f75bd8a	Implement a faster fix for sin/cos of large entries that also correctly handle INF input.	2018-12-23 17:26:21 +01:00
Gael Guennebaud	38d704def8	Make sure that psin/pcos return number in [-1,1] for large inputs (though sin/cos on large entries is quite useless because it's inaccurate)	2018-12-23 16:13:24 +01:00
Gael Guennebaud	5713fb7feb	Fix plog(+INF): it returned ~87 instead of +INF	2018-12-23 15:40:52 +01:00
Gael Guennebaud	81c27325ae	bug #1641 : fix testing of pandnot and fix pandnot for complex on SSE/AVX/AVX512	2018-12-08 14:27:48 +01:00
Gael Guennebaud	c53eececb0	Implement AVX512 vectorization of std::complex<float/double>	2018-12-06 15:58:06 +01:00
Gael Guennebaud	69ace742be	Several improvements regarding packet-bitwise operations: - add unit tests - optimize their AVX512f implementation - add missing implementations (half, Packet4f, ...)	2018-11-30 15:56:08 +01:00
Gael Guennebaud	382279eb7f	Extend unit test to recursively check half-packet types and non packet types	2018-11-26 14:10:07 +01:00
Gael Guennebaud	626942d9dd	fix alignment issue in ploaddup for AVX512	2018-09-28 16:57:32 +02:00
Gael Guennebaud	eeeb18814f	Fix warning	2018-09-20 17:48:56 +02:00
Gael Guennebaud	82f0ce2726	Get rid of EIGEN_TEST_FUNC, unit tests must now be declared with EIGEN_DECLARE_TEST(mytest) { /* code */ }. This provide several advantages: - more flexibility in designing unit tests - unit tests can be glued to speed up compilation - unit tests are compiled with same predefined macros, which is a requirement for zapcc	2018-07-17 14:46:15 +02:00
Gael Guennebaud	a937c50208	palign is not used anymore, so let's relax the unit test	2018-07-06 17:41:52 +02:00
Gael Guennebaud	f4d623ffa7	Complete Packet8h implementation and test it in packetmath unit test	2018-07-06 17:13:36 +02:00
Gael Guennebaud	097dd4616d	Fix unit test for SIMD engine not supporting sqrt	2018-04-26 10:47:39 +02:00
Gael Guennebaud	584951ca4d	Rename predux_downto4 to be more accurate on its semantic.	2018-04-03 14:28:38 +02:00
Gael Guennebaud	d43b2f01f4	Fix unit testing of predux_downto4 (bad name), and add unit testing of prsqrt	2018-04-03 14:14:00 +02:00
luz.paz	e3912f5e63	Misc. source and comment typos Found using `codespell` and `grep` from downstream FreeCAD	2018-03-11 10:01:44 -04:00
Srinivas Vasudevan	218764ee1f	Added support for expm1 in Eigen.	2016-12-02 14:13:01 -08:00
Konstantinos Margaritis	a1d5c503fa	replace sizeof(Packet) with PacketSize else it breaks for ZVector.Packet4f	2016-11-17 13:27:45 -05:00
Benoit Steiner	c80587c92b	Merged eigen/eigen into default	2016-11-03 03:55:11 -07:00
Gael Guennebaud	598de8b193	Add pinsertfirst function and implement pinsertlast for complex on SSE/AVX.	2016-11-02 10:38:13 +01:00
Gael Guennebaud	13fc18d3a2	Add a pinsertlast function replacing the last entry of a packet by a scalar. (useful to vectorize LinSpaced)	2016-10-25 16:48:49 +02:00
Benoit Steiner	78d2926508	Merged eigen/eigen into default	2016-10-12 13:46:29 -07:00
Benoit Steiner	507b661106	Renamed predux_half into predux_downto4	2016-10-06 17:57:04 -07:00
Benoit Steiner	78b569f685	Merged latest updates from trunk	2016-10-05 18:48:55 -07:00
Rasmus Munk Larsen	3ed67cb0bb	Fix a bug in the implementation of Carmack's fast sqrt algorithm in Eigen (enabled by EIGEN_FAST_MATH), which causes the vectorized parts of the computation to return -0.0 instead of NaN for negative arguments.

Benchmark speed in Giga-sqrts/s
Intel(R) Xeon(R) CPU E5-1650 v3 @ 3.50GHz
-----------------------------------------
               SSE      AVX
Fast=1         2.529G   4.380G
Fast=0         1.944G   1.898G
Fast=1 fixed   2.214G   3.739G

This table illustrates the worst case in terms of speed impact: It was measured by repeatedly computing the sqrt of an n=4096 float vector that fits in L1 cache. For large vectors the operation becomes memory bound and the differences between the different versions are almost negligible.	2016-10-04 14:22:56 -07:00
Gael Guennebaud	66cbabafed	Add a note regarding gcc bug #72867	2016-09-22 11:18:52 +02:00
Gael Guennebaud	326320ec7b	Fix compilation in non C++11 mode.	2016-08-23 19:28:57 +02:00
Igor Babuschkin	aee693ac52	Add log1p support for CUDA and half floats	2016-08-08 20:24:59 +01:00
Benoit Steiner	03b71c273e	Made the packetmath test compile again. A better fix would be to move the special function tests to the unsupported directory where the code now resides.	2016-07-11 13:50:24 -07:00
Gael Guennebaud	35df3a32eb	Disabled GCC6's ignored-attributes warning in packetmath unit test.	2016-05-26 17:42:58 +02:00
Christoph Hertzberg	718521d5cf	Silenced several double-promotion warnings	2016-05-22 18:17:04 +02:00
Gael Guennebaud	1395056fc0	Make EIGEN_HAS_C99_MATH user configurable	2016-05-20 14:58:19 +02:00
Benoit Steiner	bf185c3c28	Extended the tests for ptanh	2016-05-10 16:21:43 -07:00
Christoph Hertzberg	dacb469bc9	Enable and fix -Wdouble-conversion warnings	2016-05-05 13:35:45 +02:00
Benoit Steiner	3b8da4be5a	Extended the packetmath test to cover all the alignments made possible by avx512 instructions.	2016-04-29 14:13:43 -07:00
Benoit Steiner	d6e596174d	Pull latest updates from upstream	2016-04-11 17:20:17 -07:00
Konstantinos Margaritis	644d0f91d2	enable all tests again	2016-04-05 05:59:54 -04:00
Konstantinos Margaritis	01e7298fe6	actually include ZVector files, passes most basic tests (float still fails)	2016-03-28 10:58:02 -04:00
Konstantinos Margaritis	ed6b9d08f1	some primitives ported, but missing intrinsics and crash with asm() are a problem	2016-03-27 18:47:49 -04:00
Benoit Steiner	1dfaafe28a	Added a regression test for tanh	2016-02-10 17:41:47 -08:00
Benoit Steiner	d93b71a301	Updated the packetmath test to call predux_half instead of predux4	2016-02-01 15:18:33 -08:00
Gael Guennebaud	ca39b1546e	Merged in ebrevdo/eigen (pull request PR-148) Add special functions to eigen: lgamma, erf, erfc.	2015-12-11 11:52:09 +01:00
Benoit Steiner	6acf2bd472	Fixed compilation error triggered by MSVC 2008	2015-12-10 17:17:42 -08:00
Benoit Steiner	48877a6933	Only implement the lgamma, erf, and erfc functions when using a compiler compliant with the C99 specification.	2015-12-10 13:09:49 -08:00
Gael Guennebaud	46d2f6cd78	Workaround gcc issue with -O3 and the i387 FPU.	2015-12-10 21:33:43 +01:00
Benoit Steiner	b630d10b62	Only disable the erf, erfc, and lgamma tests for older versions of c++.	2015-12-07 17:08:08 -08:00
Benoit Steiner	73b68d4370	Fixed a couple of typos Cleaned up the code a bit.	2015-12-07 16:38:48 -08:00
Eugene Brevdo	fa4f933c0f	Add special functions to Eigen: lgamma, erf, erfc. Includes CUDA support and unit tests.	2015-12-07 15:24:49 -08:00
Gael Guennebaud	90323f1751	Fix AVX round/ceil/floor, and fix respective unit test	2015-11-04 22:15:57 +01:00
Alexandre Avenel	d46e2c10a6	Add round, ceil and floor for SSE4.1/AVX (Bug #70 )	2015-11-01 10:49:27 +01:00
Gael Guennebaud	ea9749fd6c	Fix packetmath unit test for pdiv not being always defined	2015-10-13 09:53:46 +02:00
Gael Guennebaud	14458ec0a0	Fix packetmath unit test for exp and log	2015-09-02 15:47:58 +02:00
Gael Guennebaud	dc2c103b3b	merge	2015-08-16 14:22:02 +02:00
Christoph Hertzberg	d6a4805fdf	Protect further isnan/isfinite/isinf calls	2015-08-16 14:00:02 +02:00
Gael Guennebaud	6245591349	Fix prototype of plset and generalize linspace functor.	2015-08-07 19:27:59 +02:00
Gael Guennebaud	aec4814370	Many files were missing in previous changeset.	2015-07-29 11:11:23 +02:00
Benoit Steiner	3625734bc8	Moved some utilities to TensorMeta.h to make it easier to reuse them across several tensor operations. Created the TensorDimensionList class to encode the list of all the dimensions of a tensor of rank n. This could be done using TensorIndexList, however TensorIndexList requires cxx11 which isn't yet supported as widely as we'd like.	2015-06-29 10:49:55 -07:00
Gael Guennebaud	2a33075aeb	std::isnan is c++11 only	2015-06-24 10:29:17 +02:00
Benoit Steiner	6441befbb3	Added more checks to test the correctness of the pexp implementation	2015-06-23 19:12:46 -07:00
Gael Guennebaud	b0d5aaafcc	Rename free functions isFinite, isInf, isNaN to be compatible with c++11	2015-06-10 16:17:09 +02:00
Deanna Hood	8878e1c1de	Remove ambiguity with recent numext methods isNaN and isInf	2015-03-17 22:39:51 +10:00
Benoit Steiner	c739102ef9	Pulled the latest changes from the trunk	2015-02-06 05:25:03 -08:00
Christoph Hertzberg	84aaa03182	Addendum to bug #859 : pexp(NaN) for double did not return NaN, also, plog(NaN) did not return NaN. psqrt(NaN) and psqrt(-1) shall return NaN if EIGEN_FAST_MATH==0	2014-10-20 13:13:43 +02:00
Gael Guennebaud	aa5f79206f	Fix bug #859 : pexp(NaN) returned Inf instead of NaN	2014-10-20 11:38:51 +02:00
Konstantinos Margaritis	7ff266e3ce	Initial VSX commit	2014-08-29 20:03:49 +00:00
Benoit Steiner	16047c8d4a	Pulled in the latest changes from the Eigen trunk	2014-08-13 22:25:29 -07:00
Gael Guennebaud	62f948c56a	Generalize unit testing of pscatter	2014-07-09 16:01:24 +02:00
Benoit Steiner	4304c73542	Pulled latest updates from the Eigen main trunk.	2014-06-10 10:23:32 -07:00
Benoit Steiner	29aebf96e6	Created the pblend packet primitive and implemented it using SSE and AVX instructions.	2014-06-06 20:18:44 -07:00
Christoph Hertzberg	56de8d3816	Fixed unused variable warnings	2014-05-05 15:03:29 +02:00
Gael Guennebaud	7388fdf560	pbroadcast4/2 assume aligned memory	2014-04-25 02:46:22 -07:00
Gael Guennebaud	ae4d9434e2	Add unit test for pbroadcast4/2	2014-04-25 11:21:18 +02:00
Gael Guennebaud	3d8d0f6269	Enable vectorization of pack_rhs with a column-major RHS. Rename and generalize Kernel<> to PacketBlock<,N>.	2014-04-25 10:56:18 +02:00
Gael Guennebaud	45a4aad572	add unit tests for ploadquad and predux4, and split packetmath unit test wrt real/complex	2014-04-17 16:27:22 +02:00
Benoit Steiner	39bfbd43f0	Properly align the input data to prevent false failures of the packetmath.cpp test.	2014-03-28 12:00:08 -07:00
Benoit Steiner	8a94cb3edd	Implemented the SSE version of the gather and scatter packet primitives.	2014-03-27 18:29:01 -07:00
Benoit Steiner	ee86679096	Introduced pscatter/pgather packet primitives. They will be used to optimize the loop peeling code of the block-panel matrix multiplication kernel.	2014-03-27 16:03:03 -07:00
Benoit Steiner	a419cea4a0	Created the ptranspose packet primitive that can transpose an array of N packets, where N is the number of words in each packet. This primitive will be used to complete the vectorization of the gemm_pack_lhs and gemm_pack_rhs functions. Implemented the primitive using SSE instructions.	2014-03-26 19:03:07 -07:00
Benoit Steiner	7ed9441ea4	Reverted the definition of the EIGEN_ALIGN to its former meaning (i.e. a boolean) Created a new EIGEN_ALIGN_BYTES define to encode how the data should be aligned Fixed a few remaining alignment issues exposed when the Eigen code is compiled with avx enabled. Created a new EIGEN_ALIGN_DEFAULT define, which is set to the minimum alignment value required for the chosen instruction set. Use this value instead of EIGEN_ALIGN32 to preserve the existing alignment on SSE/Altivec/Neon.	2014-02-18 18:06:44 -08:00
Benoit Steiner	64a85800bd	Added support for AVX to Eigen.	2014-01-29 11:43:05 -08:00
Gael Guennebaud	3352b8d873	Extend the magnitude range of tested numbers in packet math unit tests	2013-06-13 18:12:58 +02:00
Gael Guennebaud	62670c83a0	Fix bug #314 : move remaining math functions from internal to numext namespace	2013-06-10 23:40:56 +02:00
Gael Guennebaud	f7e52d22d4	Fix misuse of uninitialized values in unit tests	2013-04-10 09:46:16 +02:00
Gael Guennebaud	d63712163c	Add SSE4 min/max for integers	2013-03-20 18:28:40 +01:00
Gael Guennebaud	8745da14d8	Fix SSE plog<float> to return -INF on 0	2013-02-14 23:34:05 +01:00
Gael Guennebaud	a76fbbf397	Fix bug #314 : - remove most of the metaprogramming kung fu in MathFunctions.h (only keep functions that differs from the std) - remove the overloads for array expression that were in the std namespace	2012-11-06 15:25:50 +01:00
Benoit Jacob	69124cfca2	Automatic relicensing to MPL2 using Keirs script. Manual fixup follows.	2012-07-13 14:42:47 -04:00
Gael Guennebaud	42e2578ef9	the min/max macros to detect unprotected min/max were undefined by some std header, so let's declare them after and do the respective fixes ;)	2011-08-19 14:18:05 +02:00
Gael Guennebaud	8170ef0b2d	add unit test for plset	2011-05-18 21:11:03 +02:00
Gael Guennebaud	4bfe38eda2	extend testing of ploaddup	2011-02-24 00:22:10 +03:00
Gael Guennebaud	0dfea7fce4	improve packetmath unit test	2011-02-23 21:24:26 +03:00
Gael Guennebaud	955c099eb5	implement ploaddup for altivec and add respective unit test	2011-02-23 18:20:55 +03:00
Gael Guennebaud	a00aaf7f7e	fix overflow in packetmath unit test	2011-02-23 17:57:18 +03:00
Gael Guennebaud	59eeb67187	add unit test for pcplxflip	2011-02-23 14:20:33 +01:00
Gael Guennebaud	aea630a98a	factorize implementation of standard real unary math functions, and add acos, asin	2011-02-17 17:37:11 +01:00
Hauke Heibel	7bc8e3ac09	Initial fixes for bug #85 . Renamed meta_{true|false} to {true|false}_type, meta_if to conditional, is_same_type to is_same, un{ref|pointer|const} to remove_{reference|pointer|const} and makeconst to add_const. Changed boolean type 'ret' member to 'value'. Changed 'ret' members referring to types to 'type'. Adapted all code occurrences.	2010-10-25 22:13:49 +02:00
Benoit Jacob	4716040703	bug #86 : use internal:: namespace instead of ei_ prefix	2010-10-25 10:15:22 -04:00
Gael Guennebaud	3f532edc6d	update unit test for new API	2010-07-15 08:38:31 +02:00
Gael Guennebaud	2dba4b7ce7	add a unit test for conj_helper and ei_pconj	2010-07-06 20:54:14 +02:00
Gael Guennebaud	e1eccfad3f	add initial support for the vectorization of complex<float>	2010-07-05 16:18:09 +02:00
Gael Guennebaud	6249d60715	improve packetmath unit test for sum reductions	2010-07-05 10:54:24 +02:00
Gael Guennebaud	28e64b0da3	email change	2010-06-24 23:21:58 +02:00
Konstantinos Margaritis	112c550b4a	Added initial NEON support, most tests pass however we had to use some hackish workarounds as gcc on ARM (both CodeSourcery 4.4.1 used and experimental 4.5) fail to ensure proper alignment with __attribute__((aligned(16))). This has to be fixed upstream to remove the workarounds.	2010-03-03 11:25:41 -06:00
Benoit Jacob	2840ac7e94	big huge changes, so i dont remember everything. * renaming, e.g. LU ---> FullPivLU * split tests framework: more robust, e.g. dont generate empty tests if a number is skipped * make all remaining tests use that splitting, as needed. * Fix 4x4 inversion (see stable branch) * Transform::inverse() and geo_transform test : adapt to new inverse() API, it was also trying to instantiate inverse() for 3x4 matrices. * CMakeLists: more robust regexp to parse the version number * misc fixes in unit tests	2009-10-28 18:19:29 -04:00
Benoit Jacob	d41577819b	we were already aligning to 16 byte boundary fixed-size objects that are multiple of 16 bytes; now we also align to 8byte boundary fixed-size objects that are multiple of 8 bytes. That's only useful for now for double, not e.g. for Vector2f, but that didn't seem to hurt. Am I missing something? Do you prefer that we don't align Vector2f at all? Also, improvements in test_unalignedassert.	2009-10-05 10:11:11 -04:00
Benoit Jacob	6347b1db5b	remove sentence "Eigen itself is part of the KDE project." it never made very precise sense. but now does it still make any?	2009-05-22 20:25:33 +02:00
Gael Guennebaud	49fc1e3e84	add vectorization of sqrt for float	2009-03-27 14:41:46 +00:00
Gael Guennebaud	17860e578c	add SSE2 versions of sin, cos, log, exp using code from Julien Pommier. They are for float only, and they return exactly the same result as the standard versions in about 90% of the cases. Otherwise the max error is below 1e-7. However, for very large values (>1e3) the accuracy of sin and cos slightly decreases. They are about 3 or 4 times faster than 4 calls to their respective standard versions. So, is it ok to enable them by default in their respective functors ?	2009-03-25 12:26:13 +00:00
Gael Guennebaud	fbf415c547	add vectorization of unary operator-() (the AltiVec version is probably broken)	2009-03-20 10:03:24 +00:00
Gael Guennebaud	3f80c68be5	add the vectorization of abs	2009-03-09 18:40:09 +00:00
Gael Guennebaud	0be89a4796	big addons: * add Homogeneous expression for vector and set of vectors (aka matrix) => the next step will be to overload operator* * add homogeneous normalization (again for vector and set of vectors) * add a Replicate expression (with uni-directional replication facilities) => for all of them I'll add examples once we agree on the API * fix gcc-4.4 warnings * rename reverse.cpp array_reverse.cpp	2009-03-05 10:25:22 +00:00
Gael Guennebaud	51c991af45	* exit Sum.h, exit Prod.h, welcome vectorization of redux() ! * add vectorization for minCoeff and maxCoeff	2009-02-12 15:18:59 +00:00
Gael Guennebaud	cbbc6d940b	* add ei_predux_mul internal function * apply Ricard Marxer's prod() patch with fixes for the vectorized path	2009-02-10 18:06:05 +00:00
Gael Guennebaud	f5d96df800	Add vectorization of Reverse (was more tricky than I thought) and simplify the index based functions	2009-02-06 12:40:38 +00:00
Benoit Jacob	e1ee876daa	fix segfault due to non-aligned packets	2009-01-04 23:23:32 +00:00
Benoit Jacob	00f89a8f37	Update e-mail address	2008-11-24 13:40:43 +00:00
Gael Guennebaud	7e8aa63bb7	* Add Hyperplane::transform(Matrix/Transform) * Fix compilations with gcc 3.4, ICC and doxygen * Fix krazy directives (hopefully)	2008-08-31 13:32:29 +00:00
Gael Guennebaud	d2b345e6a9	bugfix in test/packetmath.h	2008-08-25 14:19:57 +00:00
Gael Guennebaud	440664cd5d	temporary fix of the previous commit	2008-08-24 15:27:05 +00:00
Gael Guennebaud	f0394edfa7	* bugfix in SolveTriangular found by Timothy Hunter (did not compile for very small fixed size matrices) * bugfix in Dot unroller * added special random generator for the unit tests and reduced the tolerance threshold by an order of magnitude this fixes issues with sum.cpp but other tests still failed sometimes, this has to be carefully checked...	2008-08-22 17:48:36 +00:00
Gael Guennebaud	67813e01bf	disable the vectorization of div for AltiVec	2008-08-21 14:03:17 +00:00
Gael Guennebaud	fd681507dc	Add a packetmath unit test, re-enable the comma-initializer unit test, and bug fix in PacketMath/SSE	2008-08-20 20:08:38 +00:00

240 Commits