Commit Graph

7147 Commits

Author SHA1 Message Date
Chip Kerchner
211c5dfc67 Add optional offset parameter to ploadu_partial and pstoreu_partial 2023-06-23 19:53:05 +00:00
Charles Schlosser
44c20bbbe3 rint round floor ceil 2023-06-23 16:29:16 +00:00
Charles Schlosser
387175c258 Fix safe_abs in int_pow 2023-06-23 04:12:41 +00:00
Charles Schlosser
969c31eefc Fix AVX pstore 2023-06-15 01:47:38 +00:00
wilfried.karel
6c1411e521 define a move constructor for Ref<const...> 2023-06-14 20:10:51 +00:00
wilfried.karel
d8f3eb87bf Compile- and run-time assertions for the construction of Ref<const>. 2023-06-14 15:49:58 +00:00
Charles Schlosser
59b3ef5409 Partially Vectorize Cast 2023-06-09 16:54:31 +00:00
Rasmus Munk Larsen
7d7576f326 Avoid underflow in prsqrt. 2023-06-06 14:06:19 -07:00
Charles Schlosser
b7151ffaab Fix unary pow error handling and test 2023-06-06 18:46:55 +00:00
Rasmus Munk Larsen
7ac8897431 Reduce max relative error of prsqrt from 3 to 2 ulps. 2023-06-04 22:25:33 +00:00
Charles Schlosser
1d80e23186 Optimize scalar_unary_pow_op error handling 2023-06-02 18:53:06 +00:00
Alexander Shaposhnikov
316eab8deb Do not set EIGEN_HAS_ARM64_FP16_SCALAR_ARITHMETIC for cuda compilation 2023-05-31 15:15:06 +00:00
Rasmus Munk Larsen
8c43bf2b5b Clean up Redux.h and fix vectorization_logic test after changes to traversal order in Redux. 2023-05-24 20:26:52 +00:00
Charles Schlosser
da6a71faf0 Add linear redux evaluators 2023-05-24 17:07:25 +00:00
Charles Schlosser
67a1e881d9 Sparse matrix column/row removal 2023-05-24 17:04:45 +00:00
Rasmus Munk Larsen
de1c884687 Add reference to writeup of approach used in canonicalEulerAngles. 2023-05-24 15:52:26 +00:00
Charles Schlosser
307a417e1c Fix unrolled assignment evaluator 2023-05-22 16:39:24 +00:00
Juraj Oršulić
c18f94e3b0 Geometry/EulerAngles: introduce canonicalEulerAngles 2023-05-19 15:42:22 +00:00
Charles Schlosser
7d9bb90f15 SVD: fix numerous compiler warnings / failures 2023-05-15 16:56:47 +00:00
Rasmus Munk Larsen
96c42771d6 Make it possible to override the synchonization primitives used by the threadpool using macros. 2023-05-09 19:36:17 +00:00
Rasmus Munk Larsen
1321821e86 Add missing braces in Umeyama.h 2023-05-09 19:10:50 +00:00
Rasmus Munk Larsen
524c329ab2 Work around compiler bug in Umeyama.h. 2023-05-09 18:53:56 +00:00
Charles Schlosser
fbf7189bd5 Fix cuda compilation 2023-05-08 16:15:47 +00:00
Mehdi Goli
0623791930 [SYCL-2020] Enabling USM support for SYCL. SYCL-1.2.1 did not have support for USM. 2023-05-05 17:30:36 +00:00
Tobias Wood
94f57867fe Thread pool 2023-05-05 16:23:34 +00:00
Charles Schlosser
725c11719b Visitor: fix modulo by zero compiler warning 2023-05-04 18:21:09 +00:00
Chip Kerchner
b8208b363c Specialized loadColData correctly - fix previous BF16 GEMV MR 2023-05-04 16:38:17 +00:00
Chip Kerchner
fda1373a15 Fix ColMajor BF16 GEMV for when vector is RowMajor 2023-05-03 20:12:50 +00:00
Charles Schlosser
fdc749de2a JacobiSVD: set m_nonzeroSingularValues to zero if not finite 2023-05-02 17:48:21 +00:00
Chip Kerchner
6418ac0285 Unroll F32 to BF16 loop - 1.8X faster conversions for LLVM. Use vector pairs for GCC. 2023-05-01 16:54:16 +00:00
Charles Schlosser
c9a14f48d9 SSE Packet4ui has pcmp, pmin, pmax 2023-04-28 20:36:08 +00:00
Rasmus Munk Larsen
0b51f763cb Revert "Geometry/EulerAngles: make sure that returned solution has canonical ranges" (This reverts commit 7f06bcae2c) 2023-04-27 00:06:23 +00:00
Antonio Sánchez
2d0c6ad873 Revert "Vectorize cast" (This reverts commit eb5ff1861a) 2023-04-26 18:03:36 +00:00
Charles Schlosser
8999525c29 AVX2: Packet4ul has pmul, abs2 2023-04-26 16:22:16 +00:00
Charles Schlosser
eb5ff1861a Vectorize cast 2023-04-26 02:50:13 +00:00
Antonio Sánchez
3918768be1 Fix sparse iterator and tests. 2023-04-25 19:05:49 +00:00
Charles Schlosser
f6cf5dca80 Packet4ul does not have Abs2 2023-04-21 19:48:01 +00:00
Chip Kerchner
03f646b7e3 New VSX version of BF16 GEMV (Power) - up to 6.7X faster 2023-04-21 17:06:59 +00:00
Charles Schlosser
29c8e3c754 fix pow for uint32_t, disable pmul<Packet4ul> 2023-04-21 05:47:56 +00:00
Juraj Oršulić
7f06bcae2c Geometry/EulerAngles: make sure that returned solution has canonical ranges 2023-04-19 19:12:24 +00:00
Rasmus Munk Larsen
a347dbbab2 Delete last few occurences of HasHalfPacket. 2023-04-19 10:36:59 -07:00
Charles Schlosser
2b954be663 fix typo in sse packetmath 2023-04-18 18:17:41 +00:00
Rasmus Munk Larsen
25685c90ad Fix incorrect packet type for unsigned int version of pfirst() in MSVC workaround in PacketMath.h. 2023-04-18 17:46:23 +00:00
Chip Kerchner
3f3ce214e6 New BF16 pcast functions and move type casting to TypeCasting.h 2023-04-18 02:38:38 +00:00
Pedro Gonnet
17b5b4de58 Add Packet4ui, Packet8ui, and Packet4ul to the SSE/AVX PacketMath.h headers 2023-04-17 23:33:59 +00:00
Charles Schlosser
87300c93ca Refactor IndexedView 2023-04-17 12:32:50 +00:00
Chip Kerchner
1148f0a9ec Add dynamic dispatch to BF16 GEMM (Power) and new VSX version 2023-04-14 22:20:42 +00:00
Rasmus Munk Larsen
554fe02ae3 Enable new AVX512 GEMM kernel by default. 2023-04-12 13:39:06 -07:00
Charles Schlosser
0d12fcc34e Insert from triplets 2023-04-12 20:01:48 +00:00
b-shi
15fbddaf9b ASAN fixes for AVX512 GEMM/TRSM 2023-04-04 15:54:24 -07:00
Charles Schlosser
178ef8c97f qualify non-const symbolic indexed view with is_lvalue 2023-04-04 19:06:32 +00:00
Rasmus Munk Larsen
df1049ddf4 Small packet math cleanup. 2023-04-04 16:14:32 +00:00
Antoine Hoarau
9b48d10215 Guard all malloc, realloc and free() fonctions with check_that_malloc_is_allowed() 2023-04-04 04:24:22 +00:00
Rasmus Munk Larsen
c730290fa0 Use the correct truncating intrinsic for double->int casting. 2023-04-03 13:56:41 -07:00
Charles Schlosser
766db02020 disable raw array indexed view access for 1d arrays 2023-03-29 02:39:45 +00:00
Charles Schlosser
bfbc66e078 refactor indexedviewmethods, enable non-const ref access with symbolic indices 2023-03-29 01:35:26 +00:00
Rasmus Munk Larsen
1a5dfd7c0f Fix incorrect casting in AVX512DQ path. 2023-03-27 09:28:06 -07:00
Charles Schlosser
a08649994f Optimize generic_rsqrt_newton_step 2023-03-24 22:42:57 +00:00
Rasmus Munk Larsen
b8b8a26145 Add more missing vectorized casts for int on x86, and remove redundant unit tests 2023-03-24 16:02:00 +00:00
unageek
33e206f714 Remove unused declarations of BLAS/LAPACK routines 2023-03-23 21:54:05 +00:00
Rasmus Munk Larsen
d57a79e512 Optimize float->bool cast for AVX2, based on Charles Schlosser's comments. 2023-03-21 20:59:25 -07:00
Rasmus Munk Larsen
a5ae832773 Fix reversal of arguments to _mm256_set_m128() in pcast<Packet4d, Packet8f>. 2023-03-22 03:21:44 +00:00
Rasmus Munk Larsen
09945f2cc1 Optimize casting for x86_64. 2023-03-21 18:24:16 +00:00
Colin Broderick
8f9b8e3630 Replaced all instances of internal::(U)IntPtr with std::(u)intptr_t. Remove ICC workaround. 2023-03-21 16:50:23 +00:00
Antonio Sánchez
2c8011c2dd Fix arm builds. 2023-03-20 16:59:38 +00:00
Charles Schlosser
fd8f410bbe Fix 2624 2625 2023-03-20 16:30:04 +00:00
Jonas Schulze
81cb6a51d0 Fix some typos 2023-03-16 23:11:43 +00:00
Rasmus Munk Larsen
0488b708b4 Vectorize tensor.isnan() by using typed predicates. 2023-03-16 04:04:22 +00:00
Rasmus Munk Larsen
f02856c640 Use EIGEN_NOT_A_MACRO macro (oh the irony!) to avoid build issue in TensorFlow. 2023-03-15 11:42:57 -07:00
Rasmus Munk Larsen
690ae9502f Use C++11 standard features for detecting presence of Inf and NaN 2023-03-15 16:52:44 +00:00
Chip Kerchner
d71ac6a755 Fix recent PowerPC warnings and clang warning 2023-03-15 16:50:46 +00:00
Chip Kerchner
23e1541863 Put deadcode checks back in from previous change. 2023-03-14 00:57:16 +00:00
Chip Kerchner
6c58f0fe1f Revert changes that made BF16 GEMM to cause bad register spillage for LLVM (Power) 2023-03-13 23:36:06 +00:00
Rasmus Munk Larsen
79de101d23 Handle PropagateFast the same way as PropagateNaN in minmax visitor to 2023-03-13 20:47:11 +00:00
Chip Kerchner
9d72412385 Add MMA to BF16 GEMV - 5.0-6.3X faster (for Power) 2023-03-13 19:37:13 +00:00
Rasmus Munk Larsen
2067b54b13 Fix bug in minmax_coeff_visitor for matrix of all NaNs. 2023-03-13 18:25:22 +00:00
Rasmus Munk Larsen
ee0ff0ab3a Fix typo in MathFunctions.h 2023-03-13 15:50:40 +00:00
Rasmus Munk Larsen
21c49e8f8e Delete mystery character from Eigen/src/Core/arch/NEON/MathFunctions.h 2023-03-10 23:27:24 +00:00
Rasmus Munk Larsen
6bb9609bcb Make new Select implementation backwards compatible. 2023-03-10 23:07:47 +00:00
Antonio Sánchez
394aabb0a3 Fix failing MSVC tests due to compiler bugs. 2023-03-10 22:36:57 +00:00
Rasmus Munk Larsen
d6235d76db Clean up generic packetmath specializations for various backends with the help of a macro. 2023-03-10 22:02:23 +00:00
Rasmus Munk Larsen
e8fdf127c6 Work around compiler bug in Tridiagonalization.h 2023-03-10 21:21:07 +00:00
Rasmus Munk Larsen
adf26b6840 Add newline to end of file. 2023-03-10 16:53:22 +00:00
Rasmus Munk Larsen
3492d9e2e5 s/Lesser/Less/ 2023-03-10 00:28:31 +00:00
Rasmus Munk Larsen
2419632cf5 Revert change to allFinite(), since the new version does not work for complex numbers. 2023-03-09 21:50:43 +00:00
Charles Schlosser
7bf2968fed Specify Permutation Index for PartialPivLU and FullPivLU 2023-03-07 20:28:05 +00:00
Charles Schlosser
1ce8b25825 Vectorize any() / all() 2023-03-06 23:54:02 +00:00
Charles Schlosser
cb8e6d4975 Fix 2240, 2620 2023-03-06 23:11:06 +00:00
Chip Kerchner
2b513ca2a0 Added partial linear access for LHS & Output - 30% faster for bfloat16 GEMM MMA (Power) 2023-03-02 19:22:43 +00:00
Charles Schlosser
0b396c3167 Scalarize comps 2023-03-02 17:06:23 +00:00
Antonio Sánchez
62d5cfe835 Fix ODR issues with Intel's AVX512 TRSM kernels. 2023-02-27 07:54:52 +00:00
Charles Schlosser
826627f653 vectorize comparisons and select by enabling typed comparisons 2023-02-25 20:52:11 +00:00
Rasmus Munk Larsen
2e9b945baf Fix bug that disabled vectorization for coeffMin/coeffMax. 2023-02-25 20:03:54 +00:00
Antonio Sánchez
bc5cdc7a67 Guard use of long double on GPU device. 2023-02-24 21:49:59 +00:00
Chip Kerchner
e4598fedbe Fix compiler versions for certain instructions on Power. 2023-02-23 23:24:41 +00:00
Rasmus Munk Larsen
1c0a6cf228 Get rid of EIGEN_HAS_AVX512_MATH workaround. 2023-02-23 23:16:41 +00:00
Rasmus Munk Larsen
6bcd941ee3 Use pmsub in twoprod. This speeds up pow() on Skylake by ~1%. 2023-02-21 20:09:29 +00:00
Rasmus Munk Larsen
ce62177b5b Vectorize atanh & add a missing definition and unit test for atan. 2023-02-21 03:14:05 +00:00
Charles Schlosser
049a144798 Add typed logicals 2023-02-18 01:23:47 +00:00
Chip Kerchner
e797974689 Add and enable Packet int divide for Power10. 2023-02-17 19:04:18 +00:00