Commit Graph

7114 Commits

Author SHA1 Message Date
Rasmus Munk Larsen
3492d9e2e5 s/Lesser/Less/ 2023-03-10 00:28:31 +00:00
Rasmus Munk Larsen
2419632cf5 Revert change to allFinite(), since the new version does not work for complex numbers. 2023-03-09 21:50:43 +00:00
Charles Schlosser
7bf2968fed Specify Permutation Index for PartialPivLU and FullPivLU 2023-03-07 20:28:05 +00:00
Charles Schlosser
1ce8b25825 Vectorize any() / all() 2023-03-06 23:54:02 +00:00
Charles Schlosser
cb8e6d4975 Fix 2240, 2620 2023-03-06 23:11:06 +00:00
Chip Kerchner
2b513ca2a0 Added partial linear access for LHS & Output - 30% faster for bfloat16 GEMM MMA (Power) 2023-03-02 19:22:43 +00:00
Charles Schlosser
0b396c3167 Scalarize comps 2023-03-02 17:06:23 +00:00
Antonio Sánchez
62d5cfe835 Fix ODR issues with Intel's AVX512 TRSM kernels. 2023-02-27 07:54:52 +00:00
Charles Schlosser
826627f653 vectorize comparisons and select by enabling typed comparisons 2023-02-25 20:52:11 +00:00
Rasmus Munk Larsen
2e9b945baf Fix bug that disabled vectorization for coeffMin/coeffMax. 2023-02-25 20:03:54 +00:00
Antonio Sánchez
bc5cdc7a67 Guard use of long double on GPU device. 2023-02-24 21:49:59 +00:00
Chip Kerchner
e4598fedbe Fix compiler versions for certain instructions on Power. 2023-02-23 23:24:41 +00:00
Rasmus Munk Larsen
1c0a6cf228 Get rid of EIGEN_HAS_AVX512_MATH workaround. 2023-02-23 23:16:41 +00:00
Rasmus Munk Larsen
6bcd941ee3 Use pmsub in twoprod. This speeds up pow() on Skylake by ~1%. 2023-02-21 20:09:29 +00:00
Rasmus Munk Larsen
ce62177b5b Vectorize atanh & add a missing definition and unit test for atan. 2023-02-21 03:14:05 +00:00
Charles Schlosser
049a144798 Add typed logicals 2023-02-18 01:23:47 +00:00
Chip Kerchner
e797974689 Add and enable Packet int divide for Power10. 2023-02-17 19:04:18 +00:00
Chip Kerchner
54459214a1 Fix epsilon and dummy_precision values in long double for double doubles. Prevented some algorithms from converging on PPC. 2023-02-16 23:35:42 +00:00
Antonio Sánchez
a16fb889dd Guard complex sqrt on old MSVC compilers. 2023-02-16 19:47:00 +00:00
Charles Schlosser
94b19dc5f2 Add CArg 2023-02-15 21:33:06 +00:00
Charles Schlosser
71a8e60a7a Tweak pasin_float, fix psqrt_complex 2023-02-15 01:01:14 +00:00
Antonio Sánchez
384269937f More NEON packetmath fixes. 2023-02-14 21:45:25 +00:00
Antonio Sánchez
2dfbf1b251 Fix NEON make_packet2f. 2023-02-14 16:52:07 +00:00
Chip Kerchner
4a03409569 Fix problem with array conversions BF16->F32 in Power. 2023-02-13 21:30:45 +00:00
Rasmus Munk Larsen
77b48c440e Fix compiler warnings. 2023-02-10 20:46:23 +00:00
Chip Kerchner
0ecae61568 Disable array BF16 to F32 conversions in Power 2023-02-10 20:06:58 +00:00
Charles Schlosser
c999284bad Print diagonal matrix 2023-02-10 18:07:29 +00:00
Chip Kerchner
fba12e02b3 Fold extra column calculations into an extra MMA accumulator and other bfloat16 MMA GEMM improvements 2023-02-10 17:32:06 +00:00
Chip Kerchner
79cfc74f4d Revert ODR changes and make gemm_extra_cols and gemm_complex_extra_cols EIGEN_ALWAYS_INLINE to avoid external functions. 2023-02-10 17:05:07 +00:00
Alexander Grund
f9659d91f1 Fix ODR violation with gemm_extra_cols on PPC 2023-02-09 22:16:06 +00:00
Charles Schlosser
325e3063d9 Optimize psign 2023-02-09 22:15:26 +00:00
Charles Schlosser
0e490d452d Update file ColPivHouseholderQR_LAPACKE.h 2023-02-09 13:45:56 +00:00
Antonio Sánchez
0a5392d606 Fix MSVC arm build. 2023-02-08 21:46:37 +00:00
Antonio Sánchez
3f7e775715 Add IWYU export pragmas to top-level headers. 2023-02-08 17:40:31 +00:00
Rasmus Munk Larsen
e4f58816d9 Get rid of custom implementation of equal_to and not_equal_no. No longer needed with c+14. 2023-02-07 21:36:44 -08:00
Antonio Sánchez
e256ad1823 Remove LGPL Code and references. 2023-02-08 01:25:06 +00:00
Chip Kerchner
e71f88abce Change in Power eigen_asserts to eigen_internal_asserts since it is putting unnecessary error checking and assertions without NDEBUG. 2023-02-08 00:57:30 +00:00
Gregory Kramida
232b18fa8a Fixes #2602 2023-02-06 22:52:39 +00:00
Antonio Sánchez
f6cc359e10 More EIGEN_DEVICE_FUNC fixes for CUDA 10/11/12. 2023-02-03 19:18:45 +00:00
Charles Schlosser
2a90653395 fix lapacke config 2023-02-03 16:40:08 +00:00
Jeremy Nimmer
13a1f25da9 Revert StlIterators edit from "Fix undefined behavior..." 2023-02-01 20:01:36 +00:00
Charles Schlosser
fd2fd48703 Update file ForwardDeclarations.h 2023-02-01 16:52:20 +00:00
Rasmus Munk Larsen
37b2e97175 Tweak special case handling in atan2. 2023-01-31 17:48:00 -08:00
Jeremy Nimmer
a1cdcdb038 Fix undefined behavior in Block access 2023-02-01 00:40:45 +00:00
Chip Kerchner
4a58f30aa0 Fix pre-POWER8_VECTOR bugs in pcmp_lt and pnegate and reactivate psqrt. 2023-01-31 19:40:24 +00:00
Rasmus Munk Larsen
12ad99ce60 Remove unused variables from GenericPacketMathFunctions.h 2023-01-29 18:10:28 +00:00
Charles Schlosser
6987a200bb Fix stupid sparse bugs with outerSize == 0 2023-01-28 02:03:09 +00:00
Charles Schlosser
0471e61b4c Optimize various mathematical packet ops 2023-01-28 01:34:26 +00:00
Charles Schlosser
1aa6dc2007 Fix sparse warnings 2023-01-27 22:47:42 +00:00
Antonio Sánchez
17ae83a966 Fix bugs exposed by enabling GPU asserts. 2023-01-27 21:43:00 +00:00
Chip Kerchner
ab8725d947 Turn off vectorize version of rsqrt - doesn't match generic version 2023-01-27 18:28:54 +00:00
Charles Schlosser
6d9f662a70 Tweak atan2 2023-01-26 17:38:21 +00:00
Chip Kerchner
6fc9de7d93 Fix slowdown in bfloat16 MMA when rows is not a multiple of 8 or columns is not a multiple of 4. 2023-01-25 18:22:20 +00:00
Charles Schlosser
7f58bc98b1 Refactor sparse 2023-01-23 17:55:50 +00:00
Rasmus Munk Larsen
576448572f More fixes for __GNUC_PATCHLEVEL__. 2023-01-23 17:04:24 +00:00
Rasmus Munk Larsen
164ddf75ab Use __GNUC_PATCHLEVEL__ rather than __GNUC_PATCH__, according to the documentation https://gcc.gnu.org/onlinedocs/cpp/Common-Predefined-Macros.html 2023-01-23 16:56:14 +00:00
Charles Schlosser
5a7ca681d5 Fix sparse insert 2023-01-20 21:32:32 +00:00
Antonio Sánchez
08c961e837 Add custom ODR-safe assert. 2023-01-20 17:38:13 +00:00
Sean McBride
d70b4864d9 issue #2581: review and cleanup of compiler version checks 2023-01-17 18:58:34 +00:00
Mehdi Goli
b523120687 [SYCL-2020 Support] Enabling Intel DPCPP Compiler support to Eigen 2023-01-16 07:04:08 +00:00
tttapa
bae119bb7e Support per-thread is_malloc_allowed() state 2023-01-16 01:34:56 +00:00
Charles Schlosser
fa0bd2c34e improve sparse permutations 2023-01-15 03:21:25 +00:00
Antonio Sánchez
2e61c0c6b4 Add missing EIGEN_DEVICE_FUNC in a few places when called by asserts. 2023-01-15 02:06:17 +00:00
Charles Schlosser
4aca06f63a avoid move assignment in ColPivHouseholderQR 2023-01-15 01:34:10 +00:00
Charles Schlosser
68082b8226 Fix QR, again 2023-01-13 03:23:17 +00:00
Sergey Fedorov
4d05765345 Altivec fixes for Darwin: do not use unsupported VSX insns 2023-01-12 16:33:33 +00:00
Rasmus Munk Larsen
6156797016 Revert "Add template to specify QR permutation index type, Fix ColPivHouseholderQR Lapacke bindings" (This reverts commit be7791e097) 2023-01-11 18:50:52 +00:00
Charles Schlosser
be7791e097 Add template to specify QR permutation index type, Fix ColPivHouseholderQR Lapacke bindings 2023-01-11 15:57:28 +00:00
Charles Schlosser
9463fc95f4 change insert strategy 2023-01-11 06:24:49 +00:00
Martin Burchell
c54785b071 Fix error: unused parameter 'tmp' [-Werror,-Wunused-parameter] on clang/32-bit arm 2023-01-10 21:15:28 +00:00
Charles Schlosser
81172cbdcb Overhaul Sparse Core 2023-01-07 22:09:42 +00:00
Chip Kerchner
d20fe21ae4 Improve performance for Power10 MMA bfloat16 GEMM 2023-01-06 23:08:37 +00:00
Ryan Senanayake
fe7f527787 Fix guard macros for emulated FP16 operators on GPU 2023-01-06 22:02:51 +00:00
Antonio Sánchez
262194f12c Fix a bunch of minor build and test issues. 2023-01-06 16:37:26 +00:00
Antonio Sánchez
3564668908 Fix overalign check. 2023-01-05 17:10:48 +00:00
Charles Schlosser
f3929ac7ed Fix EIGEN_HAS_CXX17_OVERALIGN for icc 2023-01-03 17:30:10 +00:00
Charles Schlosser
a8bab0d8ae Patch SparseLU 2022-12-31 04:52:36 +00:00
Arthur
311cc0f9cc Enable NEON pcmp, plset, and complex psqrt 2022-12-22 05:38:34 +00:00
Antonio Sánchez
dbf7ae6f9b Fix up C++ version detection macros and cmake tests. 2022-12-20 18:06:03 +00:00
Antonio Sánchez
bb6675caf7 Fix incorrect NEON native fp16 multiplication. 2022-12-19 20:46:44 +00:00
Rasmus Munk Larsen
dd85d26946 Revert "Avoid mixing types in CompressedStorage.h" 2022-12-19 20:09:37 +00:00
Arthur Feeney
c4fb6af24b Enable NEON pabs for unsigned int types 2022-12-19 17:07:36 +00:00
Rasmus Munk Larsen
04e4f0bb24 Add missing colon in SparseMatrix.h. 2022-12-16 21:50:00 +00:00
Rasmus Munk Larsen
3d8a8def8a Avoid mixing types in CompressedStorage.h 2022-12-16 20:11:02 +00:00
Charles Schlosser
4bb2446796 Add operators to CompressedStorageIterator 2022-12-16 16:48:50 +00:00
Alexander Richardson
37de432907 Avoid using std::raise() for divide by zero 2022-12-14 20:06:16 +00:00
Alexander Richardson
62de593c40 Allow std::initializer_list constructors in constexpr expressions 2022-12-14 17:05:37 +00:00
Charles Schlosser
6d3e3678b4 optimize equalspace packetop 2022-12-13 01:22:25 +00:00
Charles Schlosser
2004831941 add EqualSpaced / setEqualSpaced 2022-12-13 00:54:57 +00:00
Melven Roehrig-Zoellner
273f803846 Add BDCSVD_LAPACKE binding 2022-12-09 18:50:12 +00:00
Antonio Sánchez
03c9b4738c Enable direct access for NestByValue. 2022-12-07 18:21:45 +00:00
Chip Kerchner
b59f18b4f7 Increase L2 and L3 cache size for Power10. 2022-12-07 18:20:33 +00:00
Charles Schlosser
44fe539150 add sparse sort inner vectors function 2022-12-01 19:28:56 +00:00
Lianhuang Li
d194167149 Fix the bug using neon instruction fmla for data type half 2022-12-01 17:28:57 +00:00
Pedro Caldeira
31ab62d347 Add support for Power10 (AltiVec) MMA instructions for bfloat16. 2022-11-30 23:33:37 +00:00
Antonio Sánchez
dcb042a87d Fix serialization for non-compressed matrices. 2022-11-30 18:16:47 +00:00
Antonio Sánchez
2260e11eb0 Fix reshape strides when input has non-zero inner stride. 2022-11-29 19:39:29 +00:00
Alexandre Hoffmann
23524ab6fc Changing BiCGSTAB parameters initialization so that it works with custom types 2022-11-29 19:37:46 +00:00
Antonio Sánchez
ab2b26fbc2 Fix sparseLU solver when destination has a non-unit stride. 2022-11-29 19:37:03 +00:00
Antonio Sánchez
e7b1ad0315 Add serialization for sparse matrix and sparse vector. 2022-11-21 19:43:07 +00:00
Charles Schlosser
044f3f6234 Fix bug in handmade_aligned_realloc 2022-11-18 22:35:31 +00:00
Charles Schlosser
02805bd56c Fix AVX2 psignbit 2022-11-16 13:43:11 +00:00
Chip Kerchner
399ce1ed63 Fix duplicate execution code for Power 8 Altivec in pstore_partial. 2022-11-16 13:41:42 +00:00
Gabriele Buondonno
6431dfdb50 Cross product for vectors of size 2. Fixes #1037 2022-11-15 22:39:42 +00:00
Antonio Sánchez
8588d8c74b Correct pnegate for floating-point zero. 2022-11-15 18:07:23 +00:00
Antonio Sanchez
5eacb9e117 Put brackets around unsigned type names. 2022-11-15 09:09:45 -08:00
Antonio Sánchez
37e40dca85 Fix ambiguity in PPC for vec_splats call. 2022-11-14 18:58:16 +00:00
Antonio Sánchez
7dc6db75d4 Fix typo in CholmodSupport 2022-11-08 23:49:56 +00:00
Charles Schlosser
9b6d624eab fix neon 2022-11-08 20:03:01 +00:00
Rasmus Munk Larsen
7e398e9436 Add missing return keyword in psignbit for NEON. 2022-11-04 16:13:09 +00:00
Charles Schlosser
82b152dbe7 Add signbit function 2022-11-04 00:31:20 +00:00
Antonio Sánchez
8f8e36458f Remove recently added sparse assert in SparseMapBase. 2022-11-03 17:29:05 +00:00
Antonio Sanchez
01a31b81b2 Remove unused parameter name. 2022-11-01 15:51:25 -07:00
Antonio Sánchez
c5b896c5a3 Allow empty matrices to be resized. 2022-10-27 20:33:35 +00:00
Antonio Sánchez
886aad1361 Disable patan for double on PPC. 2022-10-27 17:56:08 +00:00
Antonio Sánchez
ab407b2b6e Fix handmade_aligned_malloc offset computation. 2022-10-27 17:33:47 +00:00
Antonio Sánchez
adb30efb25 Add assert for invalid outerIndexPtr array in SparseMapBase. 2022-10-26 22:51:33 +00:00
Antonio Sánchez
c27d1abe46 Fix pragma check for disabling fastmath. 2022-10-26 22:50:57 +00:00
Charles Schlosser
a226371371 Change handmade_aligned_malloc/realloc/free to store a 1 byte offset instead of absolute address 2022-10-22 22:51:31 +00:00
Antonio Sánchez
bf48d46338 Explicitly state that indices must be sorted. 2022-10-19 18:15:29 +00:00
Rasmus Munk Larsen
3bb6a48d8c Fix bug atan2 2022-10-12 23:49:32 +00:00
Rasmus Munk Larsen
14c847dc0e Refactor special values test for pow, and add a similar test for atan2 2022-10-12 20:12:08 +00:00
Rasmus Munk Larsen
462758e8a3 Don't use generic sign function for sign(complex) unless it is vectorizable 2022-10-12 16:03:29 +00:00
Rasmus Munk Larsen
c0d6a72611 Use pnegate(pzero(x)) as a generic way to generate -0.0. Some compiler do not handle the literal -0.0 properly in fastmath mode. 2022-10-12 01:57:05 +00:00
Laurent Rineau
7846c7387c Eigen/Sparse: fix warnings -Wunused-but-set-variable 2022-10-11 17:37:04 +00:00
Rasmus Munk Larsen
3167544873 Handle NaN inputs to atan2. 2022-10-10 19:36:36 -07:00
Rasmus Munk Larsen
72db3f0fa5 Remove references to M_PI_2 and M_PI_4. 2022-10-11 00:27:16 +00:00
Rasmus Munk Larsen
5ceed0d57f Guard GCC-specific pragmas with "#ifdef EIGEN_COMP_GNUC" 2022-10-10 20:38:53 +00:00
Rasmus Munk Larsen
e95c4a837f Simpler range reduction strategy for atan<float>(). 2022-10-04 18:11:00 +00:00
Antonio Sánchez
80efbfdeda Unconditionally enable CXX11 math. 2022-10-04 17:37:47 +00:00
Antonio Sánchez
e5794873cb Replace assert with eigen_assert. 2022-10-04 17:11:23 +00:00
Antonio Sánchez
7d6a9925cc Fix 4x4 inverse when compiling with -Ofast. 2022-10-04 16:05:49 +00:00
Rasmus Munk Larsen
1414a76fa9 Only vectorize atan<double> for Altivec if VSX is available. 2022-10-03 22:06:58 +00:00
Rasmus Munk Larsen
c475228b28 Vectorize atan() for double. 2022-10-01 01:49:30 +00:00
Rasmus Munk Larsen
1e1848fdb1 Add a vectorized implementation of atan2 to Eigen. 2022-09-28 20:46:49 +00:00
Rasmus Munk Larsen
b3bf8d6a13 Try to reduce size of GEBP kernel for non-ARM targets. 2022-09-28 02:37:18 +00:00
Rasmus Munk Larsen
13b69fc1b0 Try to reduce compilation time/memory for GEBP kernel using EIGEN_IF_CONSTEXPR 2022-09-23 20:09:42 +00:00
Rasmus Munk Larsen
ed8cda3ce4 Move EIGEN_NEON_GEBP_NR macro to the right place in GeneralBlockPanelKernel.h 2022-09-23 02:24:27 +00:00
Rasmus Munk Larsen
e2ea866515 Add a macro to set the nr trait in the BEBP kernel for NEON. 2022-09-22 23:56:34 +00:00
Lianhuang Li
23299632c2 Use 3px8/2px8/1px8/1x8 gebp_kernel on arm64-neon 2022-09-21 16:36:40 +00:00
Rasmus Munk Larsen
7b2901e2aa Add vectorized integer division for int32 with AVX512, AVX or SSE. 2022-09-21 00:27:23 +00:00
Rasmus Munk Larsen
f913a40678 Revert "Add AVX int32_t pdiv" (This reverts commit ea84e7ad63) 2022-09-16 22:48:08 +00:00
Rasmus Munk Larsen
273e0c884e Revert "Add constexpr, test for C++14 constexpr." 2022-09-16 21:14:29 +00:00
Charles Schlosser
ea84e7ad63 Add AVX int32_t pdiv 2022-09-16 17:06:29 +00:00
Rasmus Munk Larsen
afc014f1b5 Allow mixed types for pow(), as long as the exponent is exactly representable in the base type. 2022-09-12 21:55:30 +00:00
Rasmus Munk Larsen
e8a2aa24a2 Fix a couple of issues with unary pow(): 2022-09-09 17:21:11 +00:00
Rohit Santhanam
07d0759951 [ROCm] Fix for sparse matrix related breakage on ROCm. 2022-09-09 14:41:00 +00:00
Antonio Sánchez
fb212c745d Fix g++-6 constexpr and c++20 constexpr build errors. 2022-09-09 03:41:45 +00:00
Thomas Gloor
ec9c7163a3 Feature/skew symmetric matrix3 2022-09-08 20:44:40 +00:00
Antonio Sánchez
311ba66f7c Fix realloc for non-trivial types. 2022-09-08 19:39:36 +00:00