Pedro Gonnet
|
17b5b4de58
|
Add Packet4ui, Packet8ui, and Packet4ul to the SSE/AVX PacketMath.h headers
|
2023-04-17 23:33:59 +00:00 |
|
Charles Schlosser
|
87300c93ca
|
Refactor IndexedView
|
2023-04-17 12:32:50 +00:00 |
|
Chip Kerchner
|
1148f0a9ec
|
Add dynamic dispatch to BF16 GEMM (Power) and new VSX version
|
2023-04-14 22:20:42 +00:00 |
|
Rasmus Munk Larsen
|
554fe02ae3
|
Enable new AVX512 GEMM kernel by default.
|
2023-04-12 13:39:06 -07:00 |
|
Charles Schlosser
|
0d12fcc34e
|
Insert from triplets
|
2023-04-12 20:01:48 +00:00 |
|
b-shi
|
15fbddaf9b
|
ASAN fixes for AVX512 GEMM/TRSM
|
2023-04-04 15:54:24 -07:00 |
|
Charles Schlosser
|
178ef8c97f
|
qualify non-const symbolic indexed view with is_lvalue
|
2023-04-04 19:06:32 +00:00 |
|
Rasmus Munk Larsen
|
df1049ddf4
|
Small packet math cleanup.
|
2023-04-04 16:14:32 +00:00 |
|
Antoine Hoarau
|
9b48d10215
|
Guard all malloc, realloc and free() functions with check_that_malloc_is_allowed()
|
2023-04-04 04:24:22 +00:00 |
|
Rasmus Munk Larsen
|
c730290fa0
|
Use the correct truncating intrinsic for double->int casting.
|
2023-04-03 13:56:41 -07:00 |
|
Charles Schlosser
|
766db02020
|
disable raw array indexed view access for 1d arrays
|
2023-03-29 02:39:45 +00:00 |
|
Charles Schlosser
|
bfbc66e078
|
refactor indexedviewmethods, enable non-const ref access with symbolic indices
|
2023-03-29 01:35:26 +00:00 |
|
Rasmus Munk Larsen
|
1a5dfd7c0f
|
Fix incorrect casting in AVX512DQ path.
|
2023-03-27 09:28:06 -07:00 |
|
Charles Schlosser
|
a08649994f
|
Optimize generic_rsqrt_newton_step
|
2023-03-24 22:42:57 +00:00 |
|
Rasmus Munk Larsen
|
b8b8a26145
|
Add more missing vectorized casts for int on x86, and remove redundant unit tests
|
2023-03-24 16:02:00 +00:00 |
|
unageek
|
33e206f714
|
Remove unused declarations of BLAS/LAPACK routines
|
2023-03-23 21:54:05 +00:00 |
|
Rasmus Munk Larsen
|
d57a79e512
|
Optimize float->bool cast for AVX2, based on Charles Schlosser's comments.
|
2023-03-21 20:59:25 -07:00 |
|
Rasmus Munk Larsen
|
a5ae832773
|
Fix reversal of arguments to _mm256_set_m128() in pcast<Packet4d, Packet8f>.
|
2023-03-22 03:21:44 +00:00 |
|
Rasmus Munk Larsen
|
09945f2cc1
|
Optimize casting for x86_64.
|
2023-03-21 18:24:16 +00:00 |
|
Colin Broderick
|
8f9b8e3630
|
Replaced all instances of internal::(U)IntPtr with std::(u)intptr_t. Remove ICC workaround.
|
2023-03-21 16:50:23 +00:00 |
|
Antonio Sánchez
|
2c8011c2dd
|
Fix arm builds.
|
2023-03-20 16:59:38 +00:00 |
|
Charles Schlosser
|
fd8f410bbe
|
Fix 2624 2625
|
2023-03-20 16:30:04 +00:00 |
|
Jonas Schulze
|
81cb6a51d0
|
Fix some typos
|
2023-03-16 23:11:43 +00:00 |
|
Rasmus Munk Larsen
|
0488b708b4
|
Vectorize tensor.isnan() by using typed predicates.
|
2023-03-16 04:04:22 +00:00 |
|
Rasmus Munk Larsen
|
f02856c640
|
Use EIGEN_NOT_A_MACRO macro (oh the irony!) to avoid build issue in TensorFlow.
|
2023-03-15 11:42:57 -07:00 |
|
Rasmus Munk Larsen
|
690ae9502f
|
Use C++11 standard features for detecting presence of Inf and NaN
|
2023-03-15 16:52:44 +00:00 |
|
Chip Kerchner
|
d71ac6a755
|
Fix recent PowerPC warnings and clang warning
|
2023-03-15 16:50:46 +00:00 |
|
Chip Kerchner
|
23e1541863
|
Put deadcode checks back in from previous change.
|
2023-03-14 00:57:16 +00:00 |
|
Chip Kerchner
|
6c58f0fe1f
|
Revert changes that made BF16 GEMM cause bad register spillage for LLVM (Power)
|
2023-03-13 23:36:06 +00:00 |
|
Rasmus Munk Larsen
|
79de101d23
|
Handle PropagateFast the same way as PropagateNaN in minmax visitor to
|
2023-03-13 20:47:11 +00:00 |
|
Chip Kerchner
|
9d72412385
|
Add MMA to BF16 GEMV - 5.0-6.3X faster (for Power)
|
2023-03-13 19:37:13 +00:00 |
|
Rasmus Munk Larsen
|
2067b54b13
|
Fix bug in minmax_coeff_visitor for matrix of all NaNs.
|
2023-03-13 18:25:22 +00:00 |
|
Rasmus Munk Larsen
|
ee0ff0ab3a
|
Fix typo in MathFunctions.h
|
2023-03-13 15:50:40 +00:00 |
|
Rasmus Munk Larsen
|
21c49e8f8e
|
Delete mystery character from Eigen/src/Core/arch/NEON/MathFunctions.h
|
2023-03-10 23:27:24 +00:00 |
|
Rasmus Munk Larsen
|
6bb9609bcb
|
Make new Select implementation backwards compatible.
|
2023-03-10 23:07:47 +00:00 |
|
Antonio Sánchez
|
394aabb0a3
|
Fix failing MSVC tests due to compiler bugs.
|
2023-03-10 22:36:57 +00:00 |
|
Rasmus Munk Larsen
|
d6235d76db
|
Clean up generic packetmath specializations for various backends with the help of a macro.
|
2023-03-10 22:02:23 +00:00 |
|
Rasmus Munk Larsen
|
e8fdf127c6
|
Work around compiler bug in Tridiagonalization.h
|
2023-03-10 21:21:07 +00:00 |
|
Rasmus Munk Larsen
|
adf26b6840
|
Add newline to end of file.
|
2023-03-10 16:53:22 +00:00 |
|
Rasmus Munk Larsen
|
3492d9e2e5
|
s/Lesser/Less/
|
2023-03-10 00:28:31 +00:00 |
|
Rasmus Munk Larsen
|
2419632cf5
|
Revert change to allFinite(), since the new version does not work for complex numbers.
|
2023-03-09 21:50:43 +00:00 |
|
Charles Schlosser
|
7bf2968fed
|
Specify Permutation Index for PartialPivLU and FullPivLU
|
2023-03-07 20:28:05 +00:00 |
|
Charles Schlosser
|
1ce8b25825
|
Vectorize any() / all()
|
2023-03-06 23:54:02 +00:00 |
|
Charles Schlosser
|
cb8e6d4975
|
Fix 2240, 2620
|
2023-03-06 23:11:06 +00:00 |
|
Chip Kerchner
|
2b513ca2a0
|
Added partial linear access for LHS & Output - 30% faster for bfloat16 GEMM MMA (Power)
|
2023-03-02 19:22:43 +00:00 |
|
Charles Schlosser
|
0b396c3167
|
Scalarize comps
|
2023-03-02 17:06:23 +00:00 |
|
Antonio Sánchez
|
62d5cfe835
|
Fix ODR issues with Intel's AVX512 TRSM kernels.
|
2023-02-27 07:54:52 +00:00 |
|
Charles Schlosser
|
826627f653
|
vectorize comparisons and select by enabling typed comparisons
|
2023-02-25 20:52:11 +00:00 |
|
Rasmus Munk Larsen
|
2e9b945baf
|
Fix bug that disabled vectorization for coeffMin/coeffMax.
|
2023-02-25 20:03:54 +00:00 |
|
Antonio Sánchez
|
bc5cdc7a67
|
Guard use of long double on GPU device.
|
2023-02-24 21:49:59 +00:00 |
|
Chip Kerchner
|
e4598fedbe
|
Fix compiler versions for certain instructions on Power.
|
2023-02-23 23:24:41 +00:00 |
|
Rasmus Munk Larsen
|
1c0a6cf228
|
Get rid of EIGEN_HAS_AVX512_MATH workaround.
|
2023-02-23 23:16:41 +00:00 |
|
Rasmus Munk Larsen
|
6bcd941ee3
|
Use pmsub in twoprod. This speeds up pow() on Skylake by ~1%.
|
2023-02-21 20:09:29 +00:00 |
|
Rasmus Munk Larsen
|
ce62177b5b
|
Vectorize atanh & add a missing definition and unit test for atan.
|
2023-02-21 03:14:05 +00:00 |
|
Charles Schlosser
|
049a144798
|
Add typed logicals
|
2023-02-18 01:23:47 +00:00 |
|
Chip Kerchner
|
e797974689
|
Add and enable Packet int divide for Power10.
|
2023-02-17 19:04:18 +00:00 |
|
Chip Kerchner
|
54459214a1
|
Fix epsilon and dummy_precision values in long double for double doubles. Prevented some algorithms from converging on PPC.
|
2023-02-16 23:35:42 +00:00 |
|
Antonio Sánchez
|
a16fb889dd
|
Guard complex sqrt on old MSVC compilers.
|
2023-02-16 19:47:00 +00:00 |
|
Charles Schlosser
|
94b19dc5f2
|
Add CArg
|
2023-02-15 21:33:06 +00:00 |
|
Charles Schlosser
|
71a8e60a7a
|
Tweak pasin_float, fix psqrt_complex
|
2023-02-15 01:01:14 +00:00 |
|
Antonio Sánchez
|
384269937f
|
More NEON packetmath fixes.
|
2023-02-14 21:45:25 +00:00 |
|
Antonio Sánchez
|
2dfbf1b251
|
Fix NEON make_packet2f.
|
2023-02-14 16:52:07 +00:00 |
|
Chip Kerchner
|
4a03409569
|
Fix problem with array conversions BF16->F32 in Power.
|
2023-02-13 21:30:45 +00:00 |
|
Rasmus Munk Larsen
|
77b48c440e
|
Fix compiler warnings.
|
2023-02-10 20:46:23 +00:00 |
|
Chip Kerchner
|
0ecae61568
|
Disable array BF16 to F32 conversions in Power
|
2023-02-10 20:06:58 +00:00 |
|
Charles Schlosser
|
c999284bad
|
Print diagonal matrix
|
2023-02-10 18:07:29 +00:00 |
|
Chip Kerchner
|
fba12e02b3
|
Fold extra column calculations into an extra MMA accumulator and other bfloat16 MMA GEMM improvements
|
2023-02-10 17:32:06 +00:00 |
|
Chip Kerchner
|
79cfc74f4d
|
Revert ODR changes and make gemm_extra_cols and gemm_complex_extra_cols EIGEN_ALWAYS_INLINE to avoid external functions.
|
2023-02-10 17:05:07 +00:00 |
|
Alexander Grund
|
f9659d91f1
|
Fix ODR violation with gemm_extra_cols on PPC
|
2023-02-09 22:16:06 +00:00 |
|
Charles Schlosser
|
325e3063d9
|
Optimize psign
|
2023-02-09 22:15:26 +00:00 |
|
Charles Schlosser
|
0e490d452d
|
Update file ColPivHouseholderQR_LAPACKE.h
|
2023-02-09 13:45:56 +00:00 |
|
Antonio Sánchez
|
0a5392d606
|
Fix MSVC arm build.
|
2023-02-08 21:46:37 +00:00 |
|
Antonio Sánchez
|
3f7e775715
|
Add IWYU export pragmas to top-level headers.
|
2023-02-08 17:40:31 +00:00 |
|
Rasmus Munk Larsen
|
e4f58816d9
|
Get rid of custom implementation of equal_to and not_equal_to. No longer needed with C++14.
|
2023-02-07 21:36:44 -08:00 |
|
Antonio Sánchez
|
e256ad1823
|
Remove LGPL Code and references.
|
2023-02-08 01:25:06 +00:00 |
|
Chip Kerchner
|
e71f88abce
|
Change in Power eigen_asserts to eigen_internal_asserts since it is putting unnecessary error checking and assertions without NDEBUG.
|
2023-02-08 00:57:30 +00:00 |
|
Gregory Kramida
|
232b18fa8a
|
Fixes #2602
|
2023-02-06 22:52:39 +00:00 |
|
Antonio Sánchez
|
f6cc359e10
|
More EIGEN_DEVICE_FUNC fixes for CUDA 10/11/12.
|
2023-02-03 19:18:45 +00:00 |
|
Charles Schlosser
|
2a90653395
|
fix lapacke config
|
2023-02-03 16:40:08 +00:00 |
|
Jeremy Nimmer
|
13a1f25da9
|
Revert StlIterators edit from "Fix undefined behavior..."
|
2023-02-01 20:01:36 +00:00 |
|
Charles Schlosser
|
fd2fd48703
|
Update file ForwardDeclarations.h
|
2023-02-01 16:52:20 +00:00 |
|
Rasmus Munk Larsen
|
37b2e97175
|
Tweak special case handling in atan2.
|
2023-01-31 17:48:00 -08:00 |
|
Jeremy Nimmer
|
a1cdcdb038
|
Fix undefined behavior in Block access
|
2023-02-01 00:40:45 +00:00 |
|
Chip Kerchner
|
4a58f30aa0
|
Fix pre-POWER8_VECTOR bugs in pcmp_lt and pnegate and reactivate psqrt.
|
2023-01-31 19:40:24 +00:00 |
|
Rasmus Munk Larsen
|
12ad99ce60
|
Remove unused variables from GenericPacketMathFunctions.h
|
2023-01-29 18:10:28 +00:00 |
|
Charles Schlosser
|
6987a200bb
|
Fix stupid sparse bugs with outerSize == 0
|
2023-01-28 02:03:09 +00:00 |
|
Charles Schlosser
|
0471e61b4c
|
Optimize various mathematical packet ops
|
2023-01-28 01:34:26 +00:00 |
|
Charles Schlosser
|
1aa6dc2007
|
Fix sparse warnings
|
2023-01-27 22:47:42 +00:00 |
|
Antonio Sánchez
|
17ae83a966
|
Fix bugs exposed by enabling GPU asserts.
|
2023-01-27 21:43:00 +00:00 |
|
Chip Kerchner
|
ab8725d947
|
Turn off vectorize version of rsqrt - doesn't match generic version
|
2023-01-27 18:28:54 +00:00 |
|
Charles Schlosser
|
6d9f662a70
|
Tweak atan2
|
2023-01-26 17:38:21 +00:00 |
|
Chip Kerchner
|
6fc9de7d93
|
Fix slowdown in bfloat16 MMA when rows is not a multiple of 8 or columns is not a multiple of 4.
|
2023-01-25 18:22:20 +00:00 |
|
Charles Schlosser
|
7f58bc98b1
|
Refactor sparse
|
2023-01-23 17:55:50 +00:00 |
|
Rasmus Munk Larsen
|
576448572f
|
More fixes for __GNUC_PATCHLEVEL__.
|
2023-01-23 17:04:24 +00:00 |
|
Rasmus Munk Larsen
|
164ddf75ab
|
Use __GNUC_PATCHLEVEL__ rather than __GNUC_PATCH__, according to the documentation https://gcc.gnu.org/onlinedocs/cpp/Common-Predefined-Macros.html
|
2023-01-23 16:56:14 +00:00 |
|
Charles Schlosser
|
5a7ca681d5
|
Fix sparse insert
|
2023-01-20 21:32:32 +00:00 |
|
Antonio Sánchez
|
08c961e837
|
Add custom ODR-safe assert.
|
2023-01-20 17:38:13 +00:00 |
|
Sean McBride
|
d70b4864d9
|
issue #2581: review and cleanup of compiler version checks
|
2023-01-17 18:58:34 +00:00 |
|
Mehdi Goli
|
b523120687
|
[SYCL-2020 Support] Enabling Intel DPCPP Compiler support to Eigen
|
2023-01-16 07:04:08 +00:00 |
|
tttapa
|
bae119bb7e
|
Support per-thread is_malloc_allowed() state
|
2023-01-16 01:34:56 +00:00 |
|
Charles Schlosser
|
fa0bd2c34e
|
improve sparse permutations
|
2023-01-15 03:21:25 +00:00 |
|
Antonio Sánchez
|
2e61c0c6b4
|
Add missing EIGEN_DEVICE_FUNC in a few places when called by asserts.
|
2023-01-15 02:06:17 +00:00 |
|
Charles Schlosser
|
4aca06f63a
|
avoid move assignment in ColPivHouseholderQR
|
2023-01-15 01:34:10 +00:00 |
|
Charles Schlosser
|
68082b8226
|
Fix QR, again
|
2023-01-13 03:23:17 +00:00 |
|
Sergey Fedorov
|
4d05765345
|
Altivec fixes for Darwin: do not use unsupported VSX insns
|
2023-01-12 16:33:33 +00:00 |
|
Rasmus Munk Larsen
|
6156797016
|
Revert "Add template to specify QR permutation index type, Fix ColPivHouseholderQR Lapacke bindings"
This reverts commit be7791e097
|
2023-01-11 18:50:52 +00:00 |
|
Charles Schlosser
|
be7791e097
|
Add template to specify QR permutation index type, Fix ColPivHouseholderQR Lapacke bindings
|
2023-01-11 15:57:28 +00:00 |
|
Charles Schlosser
|
9463fc95f4
|
change insert strategy
|
2023-01-11 06:24:49 +00:00 |
|
Martin Burchell
|
c54785b071
|
Fix error: unused parameter 'tmp' [-Werror,-Wunused-parameter] on clang/32-bit arm
|
2023-01-10 21:15:28 +00:00 |
|
Charles Schlosser
|
81172cbdcb
|
Overhaul Sparse Core
|
2023-01-07 22:09:42 +00:00 |
|
Chip Kerchner
|
d20fe21ae4
|
Improve performance for Power10 MMA bfloat16 GEMM
|
2023-01-06 23:08:37 +00:00 |
|
Ryan Senanayake
|
fe7f527787
|
Fix guard macros for emulated FP16 operators on GPU
|
2023-01-06 22:02:51 +00:00 |
|
Antonio Sánchez
|
262194f12c
|
Fix a bunch of minor build and test issues.
|
2023-01-06 16:37:26 +00:00 |
|
Antonio Sánchez
|
3564668908
|
Fix overalign check.
|
2023-01-05 17:10:48 +00:00 |
|
Charles Schlosser
|
f3929ac7ed
|
Fix EIGEN_HAS_CXX17_OVERALIGN for icc
|
2023-01-03 17:30:10 +00:00 |
|
Charles Schlosser
|
a8bab0d8ae
|
Patch SparseLU
|
2022-12-31 04:52:36 +00:00 |
|
Arthur
|
311cc0f9cc
|
Enable NEON pcmp, plset, and complex psqrt
|
2022-12-22 05:38:34 +00:00 |
|
Antonio Sánchez
|
dbf7ae6f9b
|
Fix up C++ version detection macros and cmake tests.
|
2022-12-20 18:06:03 +00:00 |
|
Antonio Sánchez
|
bb6675caf7
|
Fix incorrect NEON native fp16 multiplication.
|
2022-12-19 20:46:44 +00:00 |
|
Rasmus Munk Larsen
|
dd85d26946
|
Revert "Avoid mixing types in CompressedStorage.h"
|
2022-12-19 20:09:37 +00:00 |
|
Arthur Feeney
|
c4fb6af24b
|
Enable NEON pabs for unsigned int types
|
2022-12-19 17:07:36 +00:00 |
|
Rasmus Munk Larsen
|
04e4f0bb24
|
Add missing colon in SparseMatrix.h.
|
2022-12-16 21:50:00 +00:00 |
|
Rasmus Munk Larsen
|
3d8a8def8a
|
Avoid mixing types in CompressedStorage.h
|
2022-12-16 20:11:02 +00:00 |
|
Charles Schlosser
|
4bb2446796
|
Add operators to CompressedStorageIterator
|
2022-12-16 16:48:50 +00:00 |
|
Alexander Richardson
|
37de432907
|
Avoid using std::raise() for divide by zero
|
2022-12-14 20:06:16 +00:00 |
|
Alexander Richardson
|
62de593c40
|
Allow std::initializer_list constructors in constexpr expressions
|
2022-12-14 17:05:37 +00:00 |
|
Charles Schlosser
|
6d3e3678b4
|
optimize equalspace packetop
|
2022-12-13 01:22:25 +00:00 |
|
Charles Schlosser
|
2004831941
|
add EqualSpaced / setEqualSpaced
|
2022-12-13 00:54:57 +00:00 |
|
Melven Roehrig-Zoellner
|
273f803846
|
Add BDCSVD_LAPACKE binding
|
2022-12-09 18:50:12 +00:00 |
|
Antonio Sánchez
|
03c9b4738c
|
Enable direct access for NestByValue.
|
2022-12-07 18:21:45 +00:00 |
|
Chip Kerchner
|
b59f18b4f7
|
Increase L2 and L3 cache size for Power10.
|
2022-12-07 18:20:33 +00:00 |
|
Charles Schlosser
|
44fe539150
|
add sparse sort inner vectors function
|
2022-12-01 19:28:56 +00:00 |
|
Lianhuang Li
|
d194167149
|
Fix the bug using neon instruction fmla for data type half
|
2022-12-01 17:28:57 +00:00 |
|
Pedro Caldeira
|
31ab62d347
|
Add support for Power10 (AltiVec) MMA instructions for bfloat16.
|
2022-11-30 23:33:37 +00:00 |
|
Antonio Sánchez
|
dcb042a87d
|
Fix serialization for non-compressed matrices.
|
2022-11-30 18:16:47 +00:00 |
|
Antonio Sánchez
|
2260e11eb0
|
Fix reshape strides when input has non-zero inner stride.
|
2022-11-29 19:39:29 +00:00 |
|
Alexandre Hoffmann
|
23524ab6fc
|
Changing BiCGSTAB parameters initialization so that it works with custom types
|
2022-11-29 19:37:46 +00:00 |
|
Antonio Sánchez
|
ab2b26fbc2
|
Fix sparseLU solver when destination has a non-unit stride.
|
2022-11-29 19:37:03 +00:00 |
|
Antonio Sánchez
|
e7b1ad0315
|
Add serialization for sparse matrix and sparse vector.
|
2022-11-21 19:43:07 +00:00 |
|
Charles Schlosser
|
044f3f6234
|
Fix bug in handmade_aligned_realloc
|
2022-11-18 22:35:31 +00:00 |
|
Charles Schlosser
|
02805bd56c
|
Fix AVX2 psignbit
|
2022-11-16 13:43:11 +00:00 |
|
Chip Kerchner
|
399ce1ed63
|
Fix duplicate execution code for Power 8 Altivec in pstore_partial.
|
2022-11-16 13:41:42 +00:00 |
|
Gabriele Buondonno
|
6431dfdb50
|
Cross product for vectors of size 2. Fixes #1037
|
2022-11-15 22:39:42 +00:00 |
|
Antonio Sánchez
|
8588d8c74b
|
Correct pnegate for floating-point zero.
|
2022-11-15 18:07:23 +00:00 |
|
Antonio Sanchez
|
5eacb9e117
|
Put brackets around unsigned type names.
|
2022-11-15 09:09:45 -08:00 |
|
Antonio Sánchez
|
37e40dca85
|
Fix ambiguity in PPC for vec_splats call.
|
2022-11-14 18:58:16 +00:00 |
|
Antonio Sánchez
|
7dc6db75d4
|
Fix typo in CholmodSupport
|
2022-11-08 23:49:56 +00:00 |
|
Charles Schlosser
|
9b6d624eab
|
fix neon
|
2022-11-08 20:03:01 +00:00 |
|
Rasmus Munk Larsen
|
7e398e9436
|
Add missing return keyword in psignbit for NEON.
|
2022-11-04 16:13:09 +00:00 |
|
Charles Schlosser
|
82b152dbe7
|
Add signbit function
|
2022-11-04 00:31:20 +00:00 |
|