eigen

CFD/eigen

Author	SHA1	Message	Date
hyunggi-sv	02a0e79c70	fix:typo in dox (has->have)	2021-08-03 00:45:00 +00:00
Antonio Sanchez	9816fe59b4	Fix assignment operator issue for latest MSVC+NVCC. Details are scattered across #920, #1000, #1324, #2291. Summary: some MSVC versions have a bug that requires omitting explicit `operator=` definitions (leads to duplicate definition errors), and some MSVC versions require adding explicit `operator=` definitions (otherwise implicitly deleted errors). This mess tries to cover all the cases encountered. Fixes #2291.	2021-08-03 00:26:10 +00:00
Antonio Sanchez	de2e62c62d	Disable vectorization of comparisons except for bool. Packet input/output types must currently be the same, and since these have a return type of `bool`, vectorization will only work if input is bool.	2021-07-25 13:39:50 -07:00
derekjchow	66ca41bd47	Add support for vectorizing logical comparisons.	2021-07-23 20:07:48 +00:00
arthurfeeney	a77638387d	Fixes #1387 for compilation error in JacobiSVD with HouseholderQRPreconditioner that occurs when input is a compile-time row vector.	2021-07-20 20:11:22 +00:00
Antonio Sanchez	297f0f563d	Fix explicit default cache size typo.	2021-07-20 11:40:17 -07:00
Rohit Santhanam	beea14a18f	Enable extract et. al. for HIP GPU.	2021-07-09 14:58:07 +00:00
Rasmus Munk Larsen	0c361c4899	Defer to std::fill_n when filling a dense object with a constant value.	2021-07-09 03:59:35 +00:00
Antonio Sanchez	1e6c6c1576	Replace memset with fill to work for non-trivial scalars. For custom scalars, zero is not necessarily represented by a zeroed-out memory block (e.g. gnu MPFR). We therefore cannot rely on `memset` if we want to fill a matrix or tensor with zeroes. Instead, we should rely on `fill`, which for trivial types does end up getting converted to a `memset` under-the-hood (at least with gcc/clang). Requires adding a `fill(begin, end, v)` to `TensorDevice`. Replaced all potentially bad instances of memset with fill. Fixes #2245.	2021-07-08 18:34:41 +00:00
Guoqiang QI	4bcd42c271	Make a copy of input matrix when try to do the inverse in place, this fixes #2285 .	2021-07-08 17:05:26 +00:00
Rasmus Munk Larsen	7b35638ddb	Fix breakage of conj_helper in conjunction with custom types introduced in !537 .	2021-07-02 20:42:15 +00:00
Rasmus Munk Larsen	bbfc4d54cd	Use `padd` instead of `+`.	2021-07-02 02:51:48 +00:00
Rasmus Munk Larsen	9312a5bf5c	Implement a generic vectorized version of Smith's algorithms for complex division.	2021-07-01 23:31:12 +00:00
Antonio Sanchez	6035da5283	Fix compile issues for gcc 4.8. - Move constructors can only be defaulted as NOEXCEPT if all members have NOEXCEPT move constructors. - gcc 4.8 has some funny parsing bug in `a < b->c`, thinking `b-` is a template parameter.	2021-07-01 22:58:14 +00:00
Antonio Sanchez	154f00e9ea	Fix inverse nullptr/asan errors for LU. For empty or single-column matrices, the current `PartialPivLU` currently dereferences a `nullptr` or accesses memory out-of-bounds. Here we adjust the checks to avoid this.	2021-07-01 13:41:04 -07:00
Dan Miller	eb04775903	Fix duplicate definitions on Mac	2021-07-01 14:54:12 +00:00
Chip Kerchner	91e99ec1e0	Create the ability to disable the specialized gemm_pack_rhs in Eigen (only PPC) for TensorFlow	2021-06-30 23:05:04 +00:00
Alexander Karatarakis	60400334a9	Make DenseStorage<> trivially_copyable	2021-06-30 04:27:51 +00:00
大河メタル	c81da59a25	Correct declarations for aarch64-pc-windows-msvc	2021-06-30 04:09:46 +00:00
Rasmus Munk Larsen	5aebbe9098	Get rid of redundant `pabs` instruction in complex square root.	2021-06-29 23:26:15 +00:00
Rohit Santhanam	2d132d1736	Commit `52a5f982` broke conjhelper functionality for HIP GPUs. This commit addresses this.	2021-06-25 19:28:00 +00:00
Rasmus Munk Larsen	bffd267d17	Small cleanup: Get rid of the macros EIGEN_HAS_SINGLE_INSTRUCTION_CJMADD and CJMADD, which were effectively unused, apart from on x86, where the change results in identically performing code.	2021-06-24 18:52:17 -07:00
Rasmus Munk Larsen	52a5f98212	Get rid of code duplication for conj_helper. For packets where LhsType=RhsType a single generic implementation suffices. For scalars, the generic implementation of pconj automatically forwards to numext::conj, so much of the existing specialization can be avoided. For mixed types we still need specializations.	2021-06-24 15:47:48 -07:00
Rasmus Munk Larsen	4ad30a73fc	Use internal::ref_selector to avoid holding a reference to a RHS expression.	2021-06-22 14:31:32 +00:00
Antonio Sanchez	35a367d557	Fix fix<> for gcc-4.9.3. There's a missing `EIGEN_HAS_CXX14` -> `EIGEN_HAS_CXX14_VARIABLE_TEMPLATES` replacement. Fixes ##2267	2021-06-18 13:22:54 -07:00
Antonio Sanchez	12e8d57108	Remove pset, replace with ploadu. We can't make guarantees on alignment for existing calls to `pset`, so we should default to loading unaligned. But in that case, we should just use `ploadu` directly. For loading constants, this load should hopefully get optimized away. This is causing segfaults in Google Maps.	2021-06-16 18:41:17 -07:00
Chip-Kerchner	ef1fd341a8	EIGEN_STRONG_INLINE was NOT inlining in some critical needed areas (6.6X slowdown) when used with Tensorflow. Changing to EIGEN_ALWAYS_INLINE where appropiate.	2021-06-16 16:30:31 +00:00
Antonio Sanchez	9e94c59570	Add missing ppc pcmp_lt_or_nan<Packet8bf>	2021-06-15 13:42:17 -07:00
Rasmus Munk Larsen	13fb5ab92c	Fix more enum arithmetic.	2021-06-15 09:09:31 -07:00
Antonio Sanchez	ad82d20cf6	Fix checking of version number for mingw. MinGW spits out version strings like: `x86_64-w64-mingw32-g++ (GCC) 10-win32 20210110`, which causes the version extraction to fail. Added support for this with tests. Also added `make_unsigned` for `long long`, since mingw seems to use that for `uint64_t`. Related to #2268. CMake and build passes for me after this.	2021-06-11 23:19:10 +00:00
Rasmus Munk Larsen	fc87e2cbaa	Use bit_cast to create -0.0 for floating point types to avoid compiler optimization changing sign with --ffast-math enabled.	2021-06-11 02:35:53 +00:00
Rasmus Munk Larsen	f64b2954c7	Fix c++20 warnings about using enums in arithmetic expressions.	2021-06-10 17:17:39 -07:00
Cyril Kaiser	91cd67f057	Remove EIGEN_DEVICE_FUNC from CwiseBinaryOp's default copy constructor.	2021-05-26 19:28:13 +00:00
Antonio Sanchez	dba753a986	Add missing NEON ptranspose implementations. Unified implementation using only `vzip`.	2021-05-25 18:25:35 +00:00
Antonio Sanchez	ebb300d0b4	Modify Unary/Binary/TernaryOp evaluators to work for non-class types. This used to work for non-class types (e.g. raw function pointers) in Eigen 3.3. This was changed in commit `11f55b29` to optimize the evaluator: > `sizeof((A-B).cwiseAbs2())` with A,B Vector4f is now 16 bytes, instead of 48 before this optimization. though I cannot reproduce the 16 byte result. Both before the change and after, with multiple compilers/versions, I always get a result of 40 bytes. https://godbolt.org/z/MsjTc1PGe This change modifies the code slightly to allow non-class types. The final generated code is identical, and the expression remains 40 bytes for the `abs2` sample case. Fixes #2251	2021-05-23 12:44:37 -07:00
Steve Bronder	1720057023	Adds macro for checking if C++14 variable templates are supported	2021-05-21 16:25:32 +00:00
Niall Murphy	391094c507	Use derived object type in conservative_resize_like_impl When calling conservativeResize() on a matrix with DontAlign flag, the temporary variable used to perform the resize should have the same Options as the original matrix to ensure that the correct override of swap is called (i.e. PlainObjectBase::swap(DenseBase<OtherDerived> & other). Calling the base class swap (i.e in DenseBase) results in assertions errors or memory corruption.	2021-05-20 23:17:02 +00:00
Nathan Luehr	7e6a1c129c	Device implementation of log for std::complex types.	2021-05-11 22:02:21 +00:00
Nathan Luehr	6753f0f197	Fix ambiguity due to argument dependent lookup.	2021-05-11 15:41:11 -05:00
guoqiangqi	3d9051ea84	Changing the storage of the SSE complex packets to that of the wrapper. This should fix #2242 .	2021-05-10 23:53:16 +00:00
Rohit Santhanam	39ec31c0ad	Fix for issue where numext::imag and numext::real are used before they are defined.	2021-05-10 19:48:32 +00:00
Antonio Sanchez	c0eb5f89a4	Restore ABI compatibility for conj with 3.3, fix conflict with boost. The boost library unfortunately specializes `conj` for various types and assumes the original two-template-parameter version. This changes restores the second parameter. This also restores ABI compatibility. The specialization for `std::complex` is because `std::conj` is not a device function. For custom complex scalar types, users should provide their own `conj` implementation. We may consider removing the unnecessary second parameter in the future - but this will require modifying boost as well. Fixes #2112.	2021-05-07 18:14:00 +00:00
Antonio Sanchez	90e9a33e1c	Fix numext::arg return type. The cxx11 path for `numext::arg` incorrectly returned the complex type instead of the real type, leading to compile errors. Fixed this and added tests. Related to !477, which uncovered the issue.	2021-05-07 16:26:57 +00:00
Christoph Hertzberg	722ca0b665	Revert addition of unused `paddsub<Packet2cf>`. This fixes #2242	2021-05-06 18:36:47 +02:00
Antonio Sanchez	1c013be2cc	Better CUDA complex division. The original produced NaNs when dividing 0/b for subnormal b. The `complex_divide_stable` was changed to use the more common Smith's algorithm.	2021-04-29 17:39:58 +00:00
Antonio Sanchez	172db7bfc3	Add missing pcmp_lt_or_nan for NEON Packet4bf.	2021-04-27 14:12:11 -07:00
Jakub Lichman	d87648a6be	Tests added and AVX512 bug fixed for pcmp_lt_or_nan	2021-04-25 20:58:56 +00:00
Antonio Sanchez	d213a0bcea	DenseStorage safely copy/swap. Fixes #2229. For dynamic matrices with fixed-sized storage, only copy/swap elements that have been set. Otherwise, this leads to inefficient copying, and potential UB for non-initialized elements.	2021-04-22 18:45:19 +00:00
Rasmus Munk Larsen	85a76a16ea	Make vectorized compute_inverse_size4 compile with AVX.	2021-04-22 15:21:01 +00:00
Chip-Kerchner	06c2760bd1	Fix taking address of rvalue compiler issue with TensorFlow (plus other warnings).	2021-04-21 00:47:13 +00:00
Jakub Lichman	2b1dfd1ba0	HasExp added for AVX512 Packet8d	2021-04-20 19:07:58 +00:00
Antonio Sanchez	1d79c68ba0	Fix ldexp for AVX512 (#2215 ) Wrong shuffle was used. Need to interleave low/high halves with a `permute` instruction. Fixes #2215.	2021-04-20 16:25:22 +00:00
David Tellenbach	3e819d83bf	Before 3.4 branch	2021-04-18 23:36:14 +02:00
Christoph Hertzberg	9357feedc7	Avoid using uninitialized inputs and if available, use slightly more efficient `movsd` instruction for `pset1<Packet2cf>`.	2021-04-13 01:36:59 +02:00
Christoph Hertzberg	1e1c8a735c	Use EIGEN_HAS_CXX11 and EIGEN_COMP_CXXVER macros to detect C++ version for `std::result_of` and `std::invoke_result`. Fixes #2209	2021-04-12 01:26:15 +00:00
Christoph Hertzberg	d58678069c	Make iterators default constructible and assignable, by making...	2021-04-09 17:03:28 +00:00
Antonio Sanchez	fcb5106c6e	Scaled epsilon the wrong way. Should have been 0.5 to widen the bounds, since this is inverse precision. Setting to 0.5, however, leads to many more failing tests at Google, so reverting to 1 for now.	2021-04-07 15:08:39 -07:00
Christoph Hertzberg	6197ce1a35	Replace `-2147483648` by `-0.0f` or `-0.0` constants (this should fix #2189 ). Also, remove unnecessary `pgather` operations.	2021-04-07 11:25:27 +00:00
Rasmus Munk Larsen	22edb46823	Align local arrays to Packet boundary.	2021-04-06 16:22:36 +00:00
Antonio Sanchez	90187a33e1	Fix SelfAdjoingEigenSolver (#2191 ) Adjust the relaxation step to use the condition ``` abs(subdiag[i]) <= epsilon * sqrt(abs(diag[i]) + abs(diag[i+1])) ``` for setting the subdiagonal entry to zero. Also adjust Wilkinson shift for small `e = subdiag[end-1]` - I couldn't find a reference for the original, and it was not consistent with the Wilkinson definition. Fixes #2191.	2021-04-05 11:19:09 -07:00
Rasmus Munk Larsen	3ddc0974ce	Fix two bugs in commit	2021-04-02 22:06:27 +00:00
Chip Kerchner	c24bee6120	Fix address of temporary object errors in clang11. This fixes the problem with taking the address of temporary objects which clang11 treats as errors.	2021-04-02 16:27:08 +00:00
Rasmus Munk Larsen	5bbc9cea93	Add an info() method to the SVDBase class to make it possible to tell the user that the computation failed, possibly due to invalid input. Make Jacobi and divide-and-conquer fail fast and return info() == InvalidInput if the matrix contains NaN or +/-Inf.	2021-03-31 21:09:19 +00:00
Antonio Sanchez	78ee3d6261	Fix CUDA constexpr issues for numeric_limits. Some CUDA/HIP constants fail on device with `constexpr` since they internally rely on non-constexpr functions, e.g. ``` \#define CUDART_INF_F __int_as_float(0x7f800000) ``` This fails for cuda-clang (though passes with nvcc). These constants are currently used by `device::numeric_limits`. For portability, we need to remove `constexpr` from the affected functions. For C++11 or higher, we should be able to rely on the `std::numeric_limits` versions anyways, since the methods themselves are now `constexpr`, so should be supported on device (clang/hipcc natively, nvcc with `--expr-relaxed-constexpr`).	2021-03-30 18:01:27 +00:00
Antonio Sanchez	af1247fbc1	Use Index type in loop over coefficients. Previously was `int`. Brought up by Kyle Snow (Polaris Geospatial Services) on the mailing list.	2021-03-29 17:40:55 +00:00
Antonio Sanchez	87729ea39f	Eliminate `round_impl` double-promotion warnings for c++03.	2021-03-25 16:52:19 +00:00
Deven Desai	748489ef9c	Un-defining EIGEN_HAS_CONSTEXPR on the HIP platform The Eigen unit-tests started failing on the HIP/ROCm platform, after the following commit `e7b8643d70` ``` In file included from /home/rocm-user/eigen/test/main.h:360: In file included from /home/rocm-user/eigen/Eigen/QR:11: In file included from /home/rocm-user/eigen/Eigen/Core:162: /home/rocm-user/eigen/Eigen/src/Core/util/Meta.h:300:17: error: constexpr function never produces a constant expression [-Winvalid-constexpr] static float (max)() { ^ /home/rocm-user/eigen/Eigen/src/Core/util/Meta.h:304:12: note: non-constexpr function '__int_as_float' cannot be used in a constant expression return HIPRT_MAX_NORMAL_F; ^ /home/rocm-user/eigen/Eigen/src/Core/arch/HIP/hcc/math_constants.h:14:28: note: expanded from macro 'HIPRT_MAX_NORMAL_F' #define HIPRT_MAX_NORMAL_F __int_as_float(0x7f7fffff) ^ /opt/rocm/hip/include/hip/hcc_detail/device_functions.h:913:32: note: declared here __device__ static inline float __int_as_float(int x) { ^ ``` The problem seems to that some of the constants defined in the HIP `math_constants.h` have a call to `__int_as_float` routine which is not declared `constexpr` in the HIP runtime header file. Working around this issue for now, be skipping the const_expr support (enabled via the above commit) on HIP	2021-03-25 13:45:52 +00:00
Chip Kerchner	d59ef212e1	Fixed performance issues for complex VSX and P10 MMA in gebp_kernel (level 3).	2021-03-25 11:08:19 +00:00
Steve Bronder	e7b8643d70	Revert "Revert "Adds EIGEN_CONSTEXPR and EIGEN_NOEXCEPT to rows(), cols(), innerStride(), outerStride(), and size()"" This reverts commit `5f0b4a4010`.	2021-03-24 18:14:56 +00:00
Christoph Hertzberg	69a4f70956	Revert "Uses _mm512_abs_pd for Packet8d pabs" This reverts commit `f019b97aca`	2021-03-23 18:52:19 +00:00
David Tellenbach	4811e81966	Remove yet another comma at end of enum	2021-03-18 23:30:00 +01:00
Steve Bronder	f019b97aca	Uses _mm512_abs_pd for Packet8d pabs	2021-03-18 15:47:52 +00:00
Niek Bouman	ed964ba3f1	Proposed fix for issue #2187	2021-03-18 00:55:36 +00:00
Antonio Sanchez	8dfe1029a5	Augment NumTraits with min/max_exponent() again. Replace usage of `std::numeric_limits<...>::min/max_exponent` in codebase where possible. Also replaced some other `numeric_limits` usages in affected tests with the `NumTraits` equivalent. The previous MR !443 failed for c++03 due to lack of `constexpr`. Because of this, we need to keep around the `std::numeric_limits` version in enum expressions until the switch to c++11. Fixes #2148	2021-03-16 20:12:46 -07:00
David Tellenbach	eb71e5db98	Fix another warning on missing commas	2021-03-17 03:07:04 +01:00
David Tellenbach	df4bc2731c	Revert "Augment NumTraits with min/max_exponent()." This reverts commit `75ce9cd2a7`.	2021-03-17 03:06:08 +01:00
Antonio Sanchez	75ce9cd2a7	Augment NumTraits with min/max_exponent(). Replace usage of `std::numeric_limits<...>::min/max_exponent` in codebase. Also replaced some other `numeric_limits` usages in affected tests with the `NumTraits` equivalent. Fixes #2148	2021-03-17 01:00:41 +00:00
David Tellenbach	9fb7062440	Silence warning on comma at end of enumerator list	2021-03-17 01:46:52 +01:00
Theo Fletcher	b8502a9dd6	Updated SelfAdjointEigenSolver documentation to include that the eigenvectors matrix is unitary.	2021-03-16 18:48:02 +00:00
Rasmus Munk Larsen	2e83cbbba9	Add NaN propagation options to minCoeff/maxCoeff visitors.	2021-03-16 17:02:50 +00:00
Antonio Sanchez	f612df2736	Add fmod(half, half). This is to support TensorFlow's `tf.math.floormod` for half.	2021-03-15 13:32:24 -07:00
Antonio Sanchez	14b7ebea11	Fix numext::round pre c++11 for large inputs. This is to resolve an issue for large inputs when +0.5 can actually lead to +1 if the input doesn't have enough precision to resolve the addition - leading to an off-by-one error. See discussion on `9a663973`.	2021-03-15 19:08:04 +00:00
Chip Kerchner	c9d4367fa4	Fix pround and add print	2021-03-15 19:07:43 +00:00
Antonio Sanchez	d24f9f9b55	Fix NVCC+ICC issues. NVCC does not understand `__forceinline`, so we need to use `inline` when compiling for GPU. ICC specializes `std::complex` operators for `float` and `double` by default, which cannot be used on device and conflict with Eigen's workaround in CUDA/Complex.h. This can be prevented by defining `_OVERRIDE_COMPLEX_SPECIALIZATION_` before including `<complex>`. Added this define to the tests and to `Eigen/Core`, but this will not work if the user includes `<complex>` before `<Eigen/Core>`. ICC also seems to generate a duplicate `Map` symbol in `PlainObjectBase`: ``` error: "Map" has already been declared in the current scope static ConstMapType Map(const Scalar *data) ``` I tracked this down to `friend class Eigen::Map`. Putting the `friend` statements at the bottom of the class seems to resolve this issue. Fixes #2180	2021-03-15 18:42:04 +00:00
Antonio Sanchez	14487ed14e	Add increment/decrement operators to Eigen::half. This is for consistency with bfloat16, and to support initialization with `std::iota`.	2021-03-15 10:52:23 -07:00
Antonio Sanchez	d098c4d64c	Disable EIGEN_OPTIMIZATION_BARRIER for PPC clang. Doesn't seem to correctly select the register type, and most types lead to compiler crashes.	2021-03-10 16:05:01 -08:00
Antonio Sanchez	543e34ab9d	Re-implement move assignments. The original swap approach leads to potential undefined behavior (reading uninitialized memory) and results in unnecessary copying of data for static storage. Here we pass down the move assignment to the underlying storage. Static storage does a one-way copy, dynamic storage does a swap. Modified the tests to no longer read from the moved-from matrix/tensor, since that can lead to UB. Added a test to ensure we do not access uninitialized memory in a move. Fixes: #2119	2021-03-10 16:55:20 +00:00
Ben Niu	b8d1857f0d	[MSVC-specific] Define EIGEN_ARCH_x86_64 for native x64 (_M_X64 is defined and _M_ARM64EC is not), and define EIGEN_ARCH_ARM64 for both the native ARM64 (_M_ARM64 is defined) or ARM64EC (_M_ARM64EC is defined). _M_ARM64EC is defined when the code is compiled by MSVC for ARM64EC, a new ARM64 ABI designed to be compatible with x64 application emulation on ARM64. If _M_ARM64EC is defined, _M_X64 and _M_AMD64 are also defined, so x64-specific code (especially intrinsics) is also compiled to ARM64 instructions (compliant with the ARM64EC ABI) for maximum x64 compatibility. Although a majority of x64-specific intrinsics can emulated by ARM64 instructions, it is still a good to simply recompile the native ARM64 code paths to ARM64EC for pure computation tasks, for performance reasons.	2021-03-10 10:21:31 +00:00
Antonio Sanchez	853a5c4b84	Fix ambiguous call to CUDA __half constructor.	2021-03-08 21:06:28 -08:00
Antonio Sanchez	94327dbfba	Fix typo: DEVICE -> GPU	2021-03-08 11:21:00 -08:00
Antonio Sanchez	1296abdf82	Fix non-trivial Half constructor for CUDA. Both CUDA and HIP require trivial default constructors for types used in shared memory. Otherwise failing with ``` error: initialization is not supported for __shared__ variables. ```	2021-03-08 07:32:54 -08:00
Antonio Sanchez	6045243141	Revert stack allocation limit change that crept in. This was accidentally introduced when copying changes between repos.	2021-03-05 14:29:37 -08:00
Deven Desai	1a96d49afe	Changing the Eigen::half implementation for HIP Currently, when compiling with HIP, Eigen::half is derived from the `__half_raw` struct that is defined within the hip_fp16.h header file. This is true for both the "host" compile phase and the "device" compile phase. This was causing a very hard to detect bug in the ROCm TensorFlow build. In the ROCm Tensorflow build, * files that do not contain ant GPU code get compiled via gcc, and * files that contnain GPU code get compiled via hipcc. In certain case, we have a function that is defined in a file that is compiled by hipcc, and is called in a file that is compiled by gcc. If such a function had Eigen::half has a "pass-by-value" argument, its value was getting corrupted, when received by the function. The reason for this seems to be that for the gcc compile, Eigen::half is derived from a `__half_raw` struct that has `uint16_t` as the data-store, and for hipcc the `__half_raw` implementation uses `_Float16` as the data store. There is some ABI incompatibility between gcc / hipcc (which is essentially latest clang), which results in the Eigen::half value (which is correct at the call-site) getting randomly corrupted when passed to the function. Changing the Eigen::half argument to be "pass by reference" seems to workaround the error. In order to fix it such that we do not run into it again in TF, this commit changes the Eigne::half implementation to use the same `__half_raw` implementation as the non-GPU compile, during host compile phase of the hipcc compile.	2021-03-05 19:27:13 +00:00
Antonio Sanchez	2468253c9a	Define EIGEN_CPLUSPLUS and replace most __cplusplus checks. The macro `__cplusplus` is not defined correctly in MSVC unless building with the the `/Zc:__cplusplus` flag. Instead, it defines `_MSVC_LANG` to the specified c++ standard version number. Here we introduce `EIGEN_CPLUSPLUS` which will contain the c++ version number both for MSVC and otherwise. This simplifies checks for supported features. Also replaced most instances of standard version checking via `__cplusplus` with the existing `EIGEN_COMP_CXXVER` macro for better clarity. Fixes: #2170	2021-03-05 18:33:18 +00:00
Antonio Sanchez	82d61af3a4	Fix rint SSE/NEON again, using optimization barrier. This is a new version of !423, which failed for MSVC. Defined `EIGEN_OPTIMIZATION_BARRIER(X)` that uses inline assembly to prevent operations involving `X` from crossing that barrier. Should work on most `GNUC` compatible compilers (MSVC doesn't seem to need this). This is a modified version adapted from what was used in `psincos_float` and tested on more platforms (see #1674, https://godbolt.org/z/73ezTG). Modified `rint` to use the barrier to prevent the add/subtract rounding trick from being optimized away. Also fixed an edge case for large inputs that get bumped up a power of two and ends up rounding away more than just the fractional part. If we are over `2^digits` then just return the input. This edge case was missed in the test since the test was comparing approximate equality, which was still satisfied. Adding a strict equality option catches it.	2021-03-05 08:54:12 -08:00
David Tellenbach	5f0b4a4010	Revert "Adds EIGEN_CONSTEXPR and EIGEN_NOEXCEPT to rows(), cols(), innerStride(), outerStride(), and size()" This reverts commit `6cbb3038ac` because it breaks clang-10 builds on x86 and aarch64 when C++11 is enabled.	2021-03-05 13:16:43 +01:00
Steve Bronder	6cbb3038ac	Adds EIGEN_CONSTEXPR and EIGEN_NOEXCEPT to rows(), cols(), innerStride(), outerStride(), and size()	2021-03-04 18:58:08 +00:00
Antonio Sánchez	9a663973b4	Revert "Fix rint for SSE/NEON." This reverts commit `e72dfeb8b9`	2021-03-03 18:51:51 +00:00
Antonio Sanchez	e72dfeb8b9	Fix rint for SSE/NEON. It seems sometimes with aggressive optimizations the combination `psub(padd(a, b), b)` trick to force rounding is compiled away. Here we replace with inline assembly to prevent this (I tried `volatile`, but that leads to additional loads from memory). Also fixed an edge case for large inputs `a` where adding `b` bumps the value up a power of two and ends up rounding away more than just the fractional part. If we are over `2^digits` then just return the input. This edge case was missed in the test since the test was comparing approximate equality, which was still satisfied. Adding a strict equality option catches it.	2021-03-03 09:41:46 -08:00
Antonio Sanchez	1e0c7d4f49	Add print for SSE/NEON, use NEON rounding intrinsics if available. In SSE, by adding/subtracting 2^MantissaBits, we force rounding according to the current rounding mode. For NEON, we use the provided intrinsics for rint/floor/ceil if available (armv8). Related to #1969.	2021-02-27 22:42:07 +00:00
Antonio Sanchez	c65c2b31d4	Make half/bfloat16 constructor take inputs by value, fix powerpc test. Since `numeric_limits<half>::max_exponent` is a static inline constant, it cannot be directly passed by reference. This triggers a linker error in recent versions of `g++-powerpc64le`. Changing `half` to take inputs by value fixes this. Wrapping `max_exponent` with `int(...)` to make an addressable integer also fixes this and may help with other custom `Scalar` types down-the-road. Also eliminated some compile warnings for powerpc.	2021-02-27 21:32:06 +00:00
Christoph Hertzberg	39a590dfb6	Remove unused include	2021-02-27 19:02:33 +01:00
Christoph Hertzberg	8f686ac4ec	clang 10 aggressively warns about precision loss when converting int to float (or long to double) (cherry picked from commit cd541ad52c8152340469cae210312c0e27829c8d)	2021-02-27 18:44:26 +01:00
Christoph Hertzberg	a3521d743c	Fix some enum-enum conversion warnings (cherry picked from commit 838f3d8ce22a5549ef10c7386fb03040721749a0)	2021-02-27 18:44:26 +01:00
Christoph Hertzberg	ca528593f4	Fixed/masked more implicit copy constructor warnings (cherry picked from commit 2883e91ce5a99c391fbf28e20160176b70854992)	2021-02-27 18:44:26 +01:00
Christoph Hertzberg	4fb3459a23	Fix double-promotion warnings (cherry picked from commit c22c103e932e511e96645186831363585a44b7a3)	2021-02-27 18:44:26 +01:00
Antonio Sanchez	29ebd84cb7	Fix NEON sqrt for 32-bit, add prsqrt. With !406, we accidentally broke arm 32-bit NEON builds, since `vsqrt_f32` is only available for 64-bit. Here we add back the `rsqrt` implementation for 32-bit, relying on a `prsqrt` implementation with better handling of edge cases. Note that several of the 32-bit NEON packet tests are currently failing - either due to denormal handling (NEON versions flush to zero, but scalar paths don't) or due to accuracy (e.g. sin/cos).	2021-02-26 14:08:40 -08:00
Rasmus Munk Larsen	fe19714f80	Merge branch 'rmlarsen1/eigen-nan_prop'	2021-02-26 09:21:24 -08:00
Rasmus Munk Larsen	e67672024d	Merge branch 'nan_prop' of https://gitlab.com/rmlarsen1/eigen into nan_prop	2021-02-26 09:12:44 -08:00
Rasmus Munk Larsen	5e7d4c33d6	Add TODO.	2021-02-26 09:08:45 -08:00
Rasmus Munk Larsen	fb5b59641a	Defer default for minCoeff/maxCoeff to templated variant.	2021-02-26 09:07:00 -08:00
Antonio Sanchez	e19829c3b0	Fix floor/ceil for NEON fp16. Forgot to test this. Fixes bug introduced in !416.	2021-02-25 20:39:56 -08:00
Antonio Sanchez	5529db7524	Fix SSE/NEON pfloor/pceil for saturated values. The original will saturate if the input does not fit into an integer type. Here we fix this, returning the input if it doesn't have enough precision to have a fractional part. Also added `pceil` for NEON. Fixes #1969.	2021-02-25 14:39:26 -08:00
Rasmus Munk Larsen	51eba8c3e2	Fix indentation.	2021-02-25 18:21:21 +00:00
Rasmus Munk Larsen	5297b7162a	Make it possible to specify NaN propagation strategy for maxCoeff/minCoeff reductions.	2021-02-25 18:21:21 +00:00
Chip-Kerchner	6eebe97bab	Fix clang compile when no MMA flags are set. Simplify MMA compiler detection.	2021-02-24 20:43:23 -06:00
Rasmus Munk Larsen	4cb0592af7	Fix indentation.	2021-02-24 17:59:36 -08:00
Rasmus Munk Larsen	0065f9d322	Make it possible to specify NaN propagation strategy for maxCoeff/minCoeff reductions.	2021-02-25 01:54:36 +00:00
Rasmus Munk Larsen	113e61f364	Remove unused function scalar_cmp_with_cast.	2021-02-24 23:59:35 +00:00
Rasmus Munk Larsen	98ca58b02c	Cast anonymous enums to int when used in expressions.	2021-02-24 23:50:06 +00:00
Chip-Kerchner	c31ead8a15	Having forward template function declarations in a P10 file causes bad code in certain situations.	2021-02-24 23:43:30 +00:00
Antonio Sanchez	a31effc3bc	Add `invoke_result` and eliminate `result_of` warnings for C++17+. The `std::result_of` meta struct is deprecated in C++17 and removed in C++20. It was still slipping through due to a faulty definition of `EIGEN_HAS_STD_RESULT_OF`. Added a new macro `EIGEN_HAS_STD_INVOKE_RESULT` and `Eigen::internal::invoke_result` implementation with fallback for pre C++17. Replaces the `result_of` definition with one based on `std::invoke_result` for C++17 and higher. For completeness, added nullary op support for c++03. Fixes #1850.	2021-02-24 21:36:14 +00:00
Chip-Kerchner	8523d447a1	Fixes to support old and new versions of the compilers for built-ins. Cast to non-const when using vector_pair with certain built-ins.	2021-02-24 20:49:15 +00:00
Antonio Sanchez	5908aeeaba	Fix CUDA device new and delete, and add test. HIP does not support new/delete on device, so test is skipped.	2021-02-24 11:31:41 -08:00
Antonio Sanchez	6cf0ab5e99	Disable fast psqrt for NEON. Accuracy is too poor - requires at least two Newton iterations, but then it is no longer significantly faster than `vsqrt`. Fixes #2094.	2021-02-23 19:52:55 -08:00
Antonio Sanchez	aba3998278	Fix check if GPU compile phase for std::hash	2021-02-23 19:52:08 -08:00
Antonio Sanchez	db5691ff2b	Fix some CUDA warnings. Added `EIGEN_HAS_STD_HASH` macro, checking for C++11 support and not running on GPU. `std::hash<float>` is not a device function, so cannot be used by `std::hash<bfloat16>`. Removed `EIGEN_DEVICE_FUNC` and only define if `EIGEN_HAS_STD_HASH`. Same for `half`. Added `EIGEN_CUDA_HAS_FP16_ARITHMETIC` to improve readability, eliminate warnings about `EIGEN_CUDA_ARCH` not being defined. Replaced a couple C-style casts with `reinterpret_cast` for aligned loading of `half` to `half2`. This eliminates `-Wcast-align` warnings in clang. Although not ideal due to potential type aliasing, this is how CUDA handles these conversions internally.	2021-02-24 00:16:31 +00:00
Rasmus Munk Larsen	88d4c6d4c8	Accurate pow, part 2. This change adds specializations of log2 and exp2 for double that make pow<double> accurate the 1 ULP. Speed for AVX-512 is within 0.5% of the currect implementation.	2021-02-23 23:11:03 +00:00
Adam Shapiro	2ac0b78739	Fixed sparse conservativeResize() when both num cols and rows decreased. The previous implementation caused a buffer overflow trying to calculate non- zero counts for columns that no longer exist.	2021-02-23 21:32:39 +00:00
Chip-Kerchner	10c77b0ff4	Fix compilation errors with later versions of GCC and use of MMA.	2021-02-22 15:01:47 -06:00
Christoph Hertzberg	73922b0174	Fixes Bug #1925 . Packets should be passed by const reference, even to inline functions.	2021-02-20 18:56:42 +01:00
Christoph Hertzberg	a7749c09bc	Bug #1910 : Make SparseCholesky work for RowMajor matrices	2021-02-19 19:36:18 +01:00
Antonio Sánchez	128eebf05e	Revert "add EIGEN_DEVICE_FUNC to EIGEN_MAKE_ALIGNED_OPERATOR_NEW_IF macros (only if not HIPCC)." This reverts commit `12fd3dd655`	2021-02-19 17:09:16 +00:00
Rasmus Munk Larsen	7f09d3487d	Use the Cephes double subtraction trick in pexp<float> even when FMA is available. Otherwise the accuracy drops from 1 ulp to 3 ulp.	2021-02-18 20:49:18 +00:00
Masaki Murooka	12fd3dd655	add EIGEN_DEVICE_FUNC to EIGEN_MAKE_ALIGNED_OPERATOR_NEW_IF macros (only if not HIPCC).	2021-02-17 22:55:47 +00:00
David Tellenbach	aa8b22e776	Bump to 3.4.99	2021-02-17 23:23:17 +01:00
David Tellenbach	5336ad8591	Define internal::make_unsigned for [unsigned]long long on macOS. macOS defines int64_t as long long even for C++03 and therefore expects a template specialization internal::make_unsigned<long long>, for C++03. Since other platforms define int64_t as long for C++03 we cannot add the specialization for all cases.	2021-02-17 23:03:10 +01:00
Antonio Sanchez	0845df7f77	Fix uninitialized warning on AVX.	2021-02-17 13:13:39 -08:00
Chip Kerchner	9b51dc7972	Fixed performance issues for VSX and P10 MMA in general_matrix_matrix_product	2021-02-17 17:49:23 +00:00
Rasmus Munk Larsen	be0574e215	New accurate algorithm for pow(x,y). This version is accurate to 1.4 ulps for float, while still being 10x faster than std::pow for AVX512. A future change will introduce a specialization for double.	2021-02-17 02:50:32 +00:00
Antonio Sanchez	7ff0b7a980	Updated pfrexp implementation. The original implementation fails for 0, denormals, inf, and NaN. See #2150	2021-02-17 02:23:24 +00:00
Ashutosh Sharma	f702792a7c	missing method in packetmath.h void ptranspose(PacketBlock<Packet16uc, 4>& kernel)	2021-02-16 16:33:59 +00:00
Jan van Dijk	db61b8d478	Avoid -Wunused warnings in NDEBUG builds. In two places in SuperLUSupport.h, a local variable 'size' is created that is used only inside an eigen_assert. Remove these, just fetch the required values inside the assert statements. This avoids annoying -Wunused warnings (and -Werror=unused errors) in NDEBUG builds.	2021-02-12 18:35:35 +00:00
Antonio Sanchez	90ee821c56	Use vrsqrts for rsqrt Newton iterations. It's slightly faster and slightly more accurate, allowing our current packetmath tests to pass for sqrt with a single iteration.	2021-02-11 11:33:51 -08:00
Antonio Sanchez	9fde9cce5d	Adjust bounds for pexp_float/double The original clamping bounds on `_x` actually produce finite values: ``` exp(88.3762626647950) = 2.40614e+38 < 3.40282e+38 exp(709.437) = 1.27226e+308 < 1.79769e+308 ``` so with an accurate `ldexp` implementation, `pexp` fails for large inputs, producing finite values instead of `inf`. This adjusts the bounds slightly outside the finite range so that the output will overflow to +/- `inf` as expected.	2021-02-10 22:48:05 +00:00
Antonio Sanchez	4cb563a01e	Fix ldexp implementations. The previous implementations produced garbage values if the exponent did not fit within the exponent bits. See #2131 for a complete discussion, and !375 for other possible implementations. Here we implement the 4-factor version. See `pldexp_impl` in `GenericPacketMathFunctions.h` for a full description. The SSE `pcmp*` methods were moved down since `pcmp_le<Packet4i>` requires `por`. Left as a "TODO" is to delegate to a faster version if we know the exponent does fit within the exponent bits. Fixes #2131.	2021-02-10 22:45:41 +00:00
Ashutosh Sharma	7eb07da538	loop less ptranspose	2021-02-10 10:21:37 -08:00
David Tellenbach	36200b7855	Remove vim specific comments to recognoize correct file-type. As discussed in #2143 we remove editor specific comments.	2021-02-09 09:13:09 +01:00
David Tellenbach	54589635ad	Replace nullptr by NULL in SparseLU.h to be C++03 compliant.	2021-02-09 09:08:06 +01:00
Ralf Hannemann-Tamas	984d010b7b	add specialization of check_sparse_solving() for SuperLU solver, in order to test adjoint and transpose solves	2021-02-08 22:00:31 +00:00
Nikolaus Demmel	b578930657	Fix documentation typos in LDLT.h	2021-02-08 21:07:29 +00:00
Antonio Sanchez	66841ea070	Enable bdcsvd on host. Currently if compiled by NVCC, the `MatrixBase::bdcSvd()` implementation is skipped, leading to a linker error. This prevents it from running on the host as well. Seems it was disabled 6 years ago (`5384e891`) to match `jacobiSvd`, but `jacobiSvd` is now enabled on host. Tested and runs fine on host, but will not compile/run for device (though it's not labelled as a device function, so this should be fine). Fixes #2139	2021-02-08 12:56:23 -08:00
Rasmus Munk Larsen	6e3b795f81	Add more tests for pow and fix a corner case for huge exponent where the result is always zero or infinite unless x is one.	2021-02-05 16:58:49 -08:00
Antonio Sanchez	abcde69a79	Disable vectorized pow for half/bfloat16. We are potentially seeing some accuracy issues with these. Ideally we would hand off to `float`, but that's not trivial with the current setup. We may want to consider adding `ppow<Packet>` and `HasPow`, so implementations can more easily specialize this.	2021-02-05 12:17:34 -08:00
Antonio Sanchez	f85038b7f3	Fix excessive GEBP register spilling for 32-bit NEON. Clang does a poor job of optimizing the GEBP microkernel on 32-bit ARM, leading to excessive 16-byte register spills, slowing down basic f32 matrix multiplication by approx 50%. By specializing `gebp_traits`, we can eliminate the register spills. Volatile inline ASM both acts as a barrier to prevent reordering and enforces strict register use. In a simple f32 matrix multiply example, this modification reduces 16-byte spills from 109 instances to zero, leading to a 1.5x speed increase (search for `16-byte Spill` in the assembly in https://godbolt.org/z/chsPbE). This is a replacement of !379. See there for further discussion. Also moved `gebp_traits` specializations for NEON to `Eigen/src/Core/arch/NEON/GeneralBlockPanelKernel.h` to be alongside other NEON-specific code. Fixes #2138.	2021-02-03 09:01:48 -08:00
Antonio Sanchez	56c8b14d87	Eliminate implicit conversions from float to double.	2021-02-01 15:31:01 -08:00
Antonio Sanchez	fb4548e27b	Implement bit_* for device. Unfortunately `std::bit_and` and the like are host-only functions prior to c++14 (since they are not `constexpr`). They also never exist in the global namespace, so the current implementation always fails to compile via NVCC - since `EIGEN_USING_STD` tries to import the symbol from the global namespace on device. To overcome these limitations, we implement these functionals here.	2021-02-01 13:27:45 -08:00
Antonio Sanchez	1615a27993	Fix altivec packetmath. Allows the altivec packetmath tests to pass. There were a few issues: - `pstoreu` was missing MSQ on `_BIG_ENDIAN` systems - `cmp_*` didn't properly handle conversion of bool flags (0x7FC instead of 0xFFFF) - `pfrexp` needed to set the `exponent` argument. Related to !370, #2128 cc: @ChipKerchner @pdrocaldeira Tested on `_BIG_ENDIAN` running on QEMU with VSX. Couldn't figure out build flags to get it to work for little endian.	2021-01-28 18:37:09 +00:00
Chip Kerchner	1414e2212c	Fix clang compilation for AltiVec from previous check-in	2021-01-28 18:36:40 +00:00
David Tellenbach	170a504c2f	Add the following functions DenseBase::setConstant(NoChange_t, Index, const Scalar&) DenseBase::setConstant(Index, NoChange_t, const Scalar&) to close #663.	2021-01-28 15:13:07 +01:00
David Tellenbach	598e1b6e54	Add the following functions: DenseBase::setZero(NoChange_t, Index) DenseBase::setZero(Index, NoChange_t) DenseBase::setOnes(NoChange_t, Index) DenseBase::setOnes(Index, NoChange_t) DenseBase::setRandom(NoChange_t, Index) DenseBase::setRandom(Index, NoChange_t) This closes #663.	2021-01-28 01:10:36 +01:00
Gael Guennebaud	0668c68b03	Allow for negative strides. Note that using a stride of -1 is still not possible because it would clash with the definition of Eigen::Dynamic. This fixes #747.	2021-01-27 23:32:12 +01:00
Antonio Sanchez	3f4684f87d	Include `<cstdint>` in one place, remove custom typedefs Originating from [this SO issue](https://stackoverflow.com/questions/65901014/how-to-solve-this-all-error-2-in-this-case), some win32 compilers define `__int32` as a `long`, but MinGW defines `std::int32_t` as an `int`, leading to a type conflict. To avoid this, we remove the custom `typedef` definitions for win32. The Tensor module requires C++11 anyways, so we are guaranteed to have included `<cstdint>` already in `Eigen/Core`. Also re-arranged the headers to only include `<cstdint>` in one place to avoid this type of error again.	2021-01-26 14:23:05 -08:00
Chip Kerchner	0784d9f87b	Fix sqrt, ldexp and frexp compilation errors.	2021-01-25 15:22:19 -06:00
Florian Maurin	c35965b381	Remove unused variable in SparseLU.h	2021-01-22 22:24:11 +00:00
Antonio Sanchez	f0e46ed5d4	Fix pow and other cwise ops for half/bfloat16. The new `generic_pow` implementation was failing for half/bfloat16 since their construction from int/float is not `constexpr`. Modified in `GenericPacketMathFunctions` to remove `constexpr`. While adding tests for half/bfloat16, found other issues related to implicit conversions. Also needed to implement `numext::arg` for non-integer, non-complex, non-float/double/long double types. These seem to be implicitly converted to `std::complex<T>`, which then fails for half/bfloat16.	2021-01-22 11:10:54 -08:00
Antonio Sanchez	f19bcffee6	Specialize std::complex operators for use on GPU device. NVCC and older versions of clang do not fully support `std::complex` on device, leading to either compile errors (Cannot call `__host__` function) or worse, runtime errors (Illegal instruction). For most functions, we can implement specialized `numext` versions. Here we specialize the standard operators (with the exception of stream operators and member function operators with a scalar that are already specialized in `<complex>`) so they can be used in device code as well. To import these operators into the current scope, use `EIGEN_USING_STD_COMPLEX_OPERATORS`. By default, these are imported into the `Eigen`, `Eigen:internal`, and `Eigen::numext` namespaces. This allow us to remove specializations of the sum/difference/product/quotient ops, and allow us to treat complex numbers like most other scalars (e.g. in tests).	2021-01-22 18:19:19 +00:00
David Tellenbach	65e2169c45	Add support for Arm SVE This patch adds support for Arm's new vector extension SVE (Scalable Vector Extension). In contrast to other vector extensions that are supported by Eigen, SVE types are inherently sizeless. For the use in Eigen we fix their size at compile-time (note that this is not necessary in general, SVE is length agnostic). During compilation the flag `-msve-vector-bits=N` has to be set where `N` is a power of two in the range of `128`to `2048`, indicating the length of an SVE vector. Since SVE is rather young, we decided to disable it by default even if it would be available. A user has to enable it explicitly by defining `EIGEN_ARM64_USE_SVE`. This patch introduces the packet types `PacketXf` and `PacketXi` for packets of `float` and `int32_t` respectively. The size of these packets depends on the SVE vector length. E.g. if `-msve-vector-bits=512` is set, `PacketXf` will contain `512/32 = 16` elements. This MR is joint work with Miguel Tairum <miguel.tairum@arm.com>.	2021-01-21 21:11:57 +00:00
Antonio Sanchez	b2126fd6b5	Fix pfrexp/pldexp for half. The recent addition of vectorized pow (!330) relies on `pfrexp` and `pldexp`. This was missing for `Eigen::half` and `Eigen::bfloat16`. Adding tests for these packet ops also exposed an issue with handling negative values in `pfrexp`, returning an incorrect exponent. Added the missing implementations, corrected the exponent in `pfrexp1`, and added `packetmath` tests.	2021-01-21 19:32:28 +00:00
Antonio Sanchez	d5b7981119	Fix signed-unsigned comparison. Hex literals are interpreted as unsigned, leading to a comparison between signed max supported function `abcd[0]` (which was negative) to the unsigned literal `0x80000006`. Should not change result since signed is implicitly converted to unsigned for the comparison, but eliminates the warning.	2021-01-20 08:34:00 -08:00
Ivan Popivanov	e409795d6b	Proper CPUID	2021-01-18 17:10:11 +00:00
Rasmus Munk Larsen	cdd8fdc32e	Vectorize `pow(x, y)`. This closes https://gitlab.com/libeigen/eigen/-/issues/2085 , which also contains a description of the algorithm. I ran some testing (comparing to `std::pow(double(x), double(y)))` for `x` in the set of all (positive) floats in the interval `[std::sqrt(std::numeric_limits<float>::min()), std::sqrt(std::numeric_limits<float>::max())]`, and `y` in `{2, sqrt(2), -sqrt(2)}` I get the following error statistics: ``` max_rel_error = 8.34405e-07 rms_rel_error = 2.76654e-07 ``` If I widen the range to all normal float I see lower accuracy for arguments where the result is subnormal, e.g. for `y = sqrt(2)`: ``` max_rel_error = 0.666667 rms = 6.8727e-05 count = 1335165689 argmax = 2.56049e-32, 2.10195e-45 != 1.4013e-45 ``` which seems reasonable, since these results are subnormals with only couple of significant bits left.	2021-01-18 13:25:16 +00:00
Antonio Sanchez	bde6741641	Improved std::complex sqrt and rsqrt. Replaces `std::sqrt` with `complex_sqrt` for all platforms (previously `complex_sqrt` was only used for CUDA and MSVC), and implements custom `complex_rsqrt`. Also introduces `numext::rsqrt` to simplify implementation, and modified `numext::hypot` to adhere to IEEE IEC 6059 for special cases. The `complex_sqrt` and `complex_rsqrt` implementations were found to be significantly faster than `std::sqrt<std::complex<T>>` and `1/numext::sqrt<std::complex<T>>`. Benchmark file attached. ``` GCC 10, Intel Xeon, x86_64: --------------------------------------------------------------------------- Benchmark Time CPU Iterations --------------------------------------------------------------------------- BM_Sqrt<std::complex<float>> 9.21 ns 9.21 ns 73225448 BM_StdSqrt<std::complex<float>> 17.1 ns 17.1 ns 40966545 BM_Sqrt<std::complex<double>> 8.53 ns 8.53 ns 81111062 BM_StdSqrt<std::complex<double>> 21.5 ns 21.5 ns 32757248 BM_Rsqrt<std::complex<float>> 10.3 ns 10.3 ns 68047474 BM_DivSqrt<std::complex<float>> 16.3 ns 16.3 ns 42770127 BM_Rsqrt<std::complex<double>> 11.3 ns 11.3 ns 61322028 BM_DivSqrt<std::complex<double>> 16.5 ns 16.5 ns 42200711 Clang 11, Intel Xeon, x86_64: --------------------------------------------------------------------------- Benchmark Time CPU Iterations --------------------------------------------------------------------------- BM_Sqrt<std::complex<float>> 7.46 ns 7.45 ns 90742042 BM_StdSqrt<std::complex<float>> 16.6 ns 16.6 ns 42369878 BM_Sqrt<std::complex<double>> 8.49 ns 8.49 ns 81629030 BM_StdSqrt<std::complex<double>> 21.8 ns 21.7 ns 31809588 BM_Rsqrt<std::complex<float>> 8.39 ns 8.39 ns 82933666 BM_DivSqrt<std::complex<float>> 14.4 ns 14.4 ns 48638676 BM_Rsqrt<std::complex<double>> 9.83 ns 9.82 ns 70068956 BM_DivSqrt<std::complex<double>> 15.7 ns 15.7 ns 44487798 Clang 9, Pixel 2, aarch64: --------------------------------------------------------------------------- Benchmark Time CPU Iterations --------------------------------------------------------------------------- BM_Sqrt<std::complex<float>> 24.2 ns 24.1 ns 28616031 BM_StdSqrt<std::complex<float>> 104 ns 103 ns 6826926 BM_Sqrt<std::complex<double>> 31.8 ns 31.8 ns 22157591 BM_StdSqrt<std::complex<double>> 128 ns 128 ns 5437375 BM_Rsqrt<std::complex<float>> 31.9 ns 31.8 ns 22384383 BM_DivSqrt<std::complex<float>> 99.2 ns 98.9 ns 7250438 BM_Rsqrt<std::complex<double>> 46.0 ns 45.8 ns 15338689 BM_DivSqrt<std::complex<double>> 119 ns 119 ns 5898944 ```	2021-01-17 08:50:57 -08:00
Guoqiang QI	38ae5353ab	1)provide a better generic paddsub op implementation 2)make paddsub op support the Packet2cf/Packet4f/Packet2f in NEON 3)make paddsub op support the Packet2cf/Packet4f in SSE	2021-01-13 22:54:03 +00:00
Antonio Sanchez	352f1422d3	Remove `inf` local variable. Apparently `inf` is a macro on iOS for `std::numeric_limits<T>::infinity()`, causing a compile error here. We don't need the local anyways since it's only used in one spot.	2021-01-12 10:33:15 -08:00
Antonio Sanchez	2044084979	Remove TODO from Transform::computeScaleRotation() Upon investigation, `JacobiSVD` is significantly faster than `BDCSVD` for small matrices (twice as fast for 2x2, 20% faster for 3x3, 1% faster for 10x10). Since the majority of cases will be small, let's stick with `JacobiSVD`. See !361.	2021-01-11 11:30:01 -08:00
Antonio Sanchez	3daf92c7a5	Transform::computeScalingRotation flush determinant to +/- 1. In the previous code, in attempting to correct for a negative determinant, we end up multiplying and dividing by a number that is often very near, but not exactly +/-1. By flushing to +/-1, we can replace a division with a multiplication, and results are more numerically consistent.	2021-01-11 10:13:38 -08:00
Antonio Sanchez	587fd6ab70	Only specialize complex `sqrt_impl` for CUDA if not MSVC. We already specialize `sqrt_impl` on windows due to MSVC's mishandling of `inf` (!355).	2021-01-11 09:15:45 -08:00
Deven Desai	2a6addb4f9	Fix for breakage in ROCm support - 210108 The following commit breaks ROCm support for Eigen `f149e0ebc3` All unit tests fail with the following error ``` Building HIPCC object test/CMakeFiles/gpu_basic.dir/gpu_basic_generated_gpu_basic.cu.o In file included from /home/rocm-user/eigen/test/gpu_basic.cu:19: In file included from /home/rocm-user/eigen/test/main.h:356: In file included from /home/rocm-user/eigen/Eigen/QR:11: In file included from /home/rocm-user/eigen/Eigen/Core:166: /home/rocm-user/eigen/Eigen/src/Core/MathFunctionsImpl.h:105:35: error: __host__ __device__ function 'complex_sqrt' cannot overload __host__ function 'complex_sqrt' EIGEN_DEVICE_FUNC std::complex<T> complex_sqrt(const std::complex<T>& z) { ^ /home/rocm-user/eigen/Eigen/src/Core/MathFunctions.h:342:38: note: previous declaration is here template<typename T> std::complex<T> complex_sqrt(const std::complex<T>& a_x); ^ 1 error generated when compiling for gfx900. CMake Error at gpu_basic_generated_gpu_basic.cu.o.cmake:192 (message): Error generating file /home/rocm-user/eigen/build/test/CMakeFiles/gpu_basic.dir//./gpu_basic_generated_gpu_basic.cu.o test/CMakeFiles/gpu_basic.dir/build.make:63: recipe for target 'test/CMakeFiles/gpu_basic.dir/gpu_basic_generated_gpu_basic.cu.o' failed make[3]: * [test/CMakeFiles/gpu_basic.dir/gpu_basic_generated_gpu_basic.cu.o] Error 1 CMakeFiles/Makefile2:16618: recipe for target 'test/CMakeFiles/gpu_basic.dir/all' failed make[2]: * [test/CMakeFiles/gpu_basic.dir/all] Error 2 CMakeFiles/Makefile2:16625: recipe for target 'test/CMakeFiles/gpu_basic.dir/rule' failed make[1]: * [test/CMakeFiles/gpu_basic.dir/rule] Error 2 Makefile:5401: recipe for target 'gpu_basic' failed make: * [gpu_basic] Error 2 ``` The error message is accurate, and the fix (provided in thsi commit) is trivial.	2021-01-08 18:04:40 +00:00
Antonio Sanchez	f149e0ebc3	Fix MSVC complex sqrt and packetmath test. MSVC incorrectly handles `inf` cases for `std::sqrt<std::complex<T>>`. Here we replace it with a custom version (currently used on GPU). Also fixed the `packetmath` test, which previously skipped several corner cases since `CHECK_CWISE1` only tests the first `PacketSize` elements.	2021-01-08 01:17:19 +00:00
Essex Edwards	e741b43668	Make Transform::computeRotationScaling(0,&S) continuous	2021-01-07 17:45:14 +00:00
David Tellenbach	0bdc0dba20	Add missing #endif directive in Macros.h	2021-01-07 12:32:41 +01:00
shrek1402	cb654b1c45	#define was defined incorrectly because the result_of function was deprecated in c++17 and removed in c++20. Also, EIGEN_COMP_MSVC (which is _MSC_VER) only affects result_of indirectly, which can cause errors.	2021-01-07 10:12:25 +00:00
Antonio Sanchez	52d1dd979a	Fix Ref initialization. Since `eigen_assert` is a macro, the statements can become noops (e.g. when compiling for GPU), so they may not execute the contained logic -- which in this case is the entire `Ref` construction. We need to separate the assert from statements which have consequences. Fixes #2113	2021-01-06 13:14:20 -08:00
Antonio Sanchez	166fcdecdb	Allow CwiseUnaryView to be used on device. Added `EIGEN_DEVICE_FUNC` to methods.	2021-01-06 09:16:52 -08:00
Antonio Sanchez	bb1de9dbde	Fix Ref Stride checks. The existing `Ref` class failed to consider cases where the Ref's `Stride` setting could match the underlying referred object's stride, but didn't at runtime. This led to trying to set invalid stride values, causing runtime failures in some cases, and garbage due to mismatched strides in others. Here we add the missing runtime checks. This involves computing the strides necessary to align with the referred object's storage, and verifying we can actually set those strides at runtime. In the `const` case, if it may be possible to refer to the original storage at compile-time but fails at runtime, then we defer to the `construct(...)` method that makes a copy. Added more tests to check these cases. Fixes #2093.	2021-01-05 10:41:25 -08:00
Christoph Hertzberg	12dda34b15	Eliminate boolean product warnings by factoring out a `combine_scalar_factors` helper function.	2021-01-05 18:15:30 +00:00
Antonio Sanchez	070d303d56	Add CUDA complex sqrt. This is to support scalar `sqrt` of complex numbers `std::complex<T>` on device, requested by Tensorflow folks. Technically `std::complex` is not supported by NVCC on device (though it is by clang), so the default `sqrt(std::complex<T>)` function only works on the host. Here we create an overload to add back the functionality. Also modified the CMake file to add `--relaxed-constexpr` (or equivalent) flag for NVCC to allow calling constexpr functions from device functions, and added support for specifying compute architecture for NVCC (was already available for clang).	2020-12-22 23:25:23 -08:00
rgreenblatt	fdf2ee62c5	Fix missing EIGEN_DEVICE_FUNC	2020-12-20 23:22:53 -05:00
Rasmus Munk Larsen	05754100fe	* Add iterative psqrt<double> for AVX and SSE when FMA is available. This provides a ~10% speedup. * Write iterative sqrt explicitly in terms of pmadd. This gives up to 7% speedup for psqrt<float> with AVX & SSE with FMA. * Remove iterative psqrt<double> for NEON, because the initial rsqrt apprimation is not accurate enough for convergence in 2 Newton-Raphson steps and with 3 steps, just calling the builtin sqrt insn is faster. The following benchmarks were compiled with clang "-O2 -fast-math -mfma" and with and without -mavx. AVX+FMA (float) name old cpu/op new cpu/op delta BM_eigen_sqrt_float/1 1.08ns ± 0% 1.09ns ± 1% ~ BM_eigen_sqrt_float/8 2.07ns ± 0% 2.08ns ± 1% ~ BM_eigen_sqrt_float/64 12.4ns ± 0% 12.4ns ± 1% ~ BM_eigen_sqrt_float/512 95.7ns ± 0% 95.5ns ± 0% ~ BM_eigen_sqrt_float/4k 776ns ± 0% 763ns ± 0% -1.67% BM_eigen_sqrt_float/32k 6.57µs ± 1% 6.13µs ± 0% -6.69% BM_eigen_sqrt_float/256k 83.7µs ± 3% 83.3µs ± 2% ~ BM_eigen_sqrt_float/1M 335µs ± 2% 332µs ± 2% ~ SSE+FMA (float) name old cpu/op new cpu/op delta BM_eigen_sqrt_float/1 1.08ns ± 0% 1.09ns ± 0% ~ BM_eigen_sqrt_float/8 2.07ns ± 0% 2.06ns ± 0% ~ BM_eigen_sqrt_float/64 12.4ns ± 0% 12.4ns ± 1% ~ BM_eigen_sqrt_float/512 95.7ns ± 0% 96.3ns ± 4% ~ BM_eigen_sqrt_float/4k 774ns ± 0% 763ns ± 0% -1.50% BM_eigen_sqrt_float/32k 6.58µs ± 2% 6.11µs ± 0% -7.06% BM_eigen_sqrt_float/256k 82.7µs ± 1% 82.6µs ± 1% ~ BM_eigen_sqrt_float/1M 330µs ± 1% 329µs ± 2% ~ SSE+FMA (double) BM_eigen_sqrt_double/1 1.63ns ± 0% 1.63ns ± 0% ~ BM_eigen_sqrt_double/8 6.51ns ± 0% 6.08ns ± 0% -6.68% BM_eigen_sqrt_double/64 52.1ns ± 0% 46.5ns ± 1% -10.65% BM_eigen_sqrt_double/512 417ns ± 0% 374ns ± 1% -10.29% BM_eigen_sqrt_double/4k 3.33µs ± 0% 2.97µs ± 1% -11.00% BM_eigen_sqrt_double/32k 26.7µs ± 0% 23.7µs ± 0% -11.07% BM_eigen_sqrt_double/256k 213µs ± 0% 206µs ± 1% -3.31% BM_eigen_sqrt_double/1M 862µs ± 0% 870µs ± 2% +0.96% AVX+FMA (double) name old cpu/op new cpu/op delta BM_eigen_sqrt_double/1 1.63ns ± 0% 1.63ns ± 0% ~ BM_eigen_sqrt_double/8 6.51ns ± 0% 6.06ns ± 0% -6.95% BM_eigen_sqrt_double/64 52.1ns ± 0% 46.5ns ± 1% -10.80% BM_eigen_sqrt_double/512 417ns ± 0% 373ns ± 1% -10.59% BM_eigen_sqrt_double/4k 3.33µs ± 0% 2.97µs ± 1% -10.79% BM_eigen_sqrt_double/32k 26.7µs ± 0% 23.8µs ± 0% -10.94% BM_eigen_sqrt_double/256k 214µs ± 0% 208µs ± 2% -2.76% BM_eigen_sqrt_double/1M 866µs ± 3% 923µs ± 7% ~	2020-12-16 18:16:11 +00:00
Rasmus Munk Larsen	6cee8d347e	Add an additional step of Newton-Raphson for `psqrt<double>` on Arm, which otherwise has an error of ~1000 ulps.	2020-12-15 04:06:41 +00:00
David Tellenbach	751f18f2c0	Remove comma at the end of enumeration list to silence C++03 warnings	2020-12-13 18:11:02 +01:00
Antonio Sanchez	5dc2fbabee	Fix implicit cast to double. Triggers `-Wimplicit-float-conversion`, causing a bunch of build errors in Google due to `-Wall`.	2020-12-12 09:26:20 -08:00
Antonio Sanchez	55967f87d1	Fix NEON pmax<PropagateNumbers,Packet4bf>. Simple typo, the max impl called pmin instead of pmax for floats.	2020-12-11 21:50:52 -08:00
Antonio Sanchez	839aa505c3	Fix typo in AVX512 packet math.	2020-12-11 21:35:44 -08:00
David Tellenbach	536c8a79f2	Remove unused macro in Half.h	2020-12-12 00:53:26 +01:00
Antonio Sanchez	8c9976d7f0	Fix more SSE/AVX packet conversions for peven. MSVC doesn't like function-style casts and forces us to use intrinsics.	2020-12-11 15:46:42 -08:00
Antonio Sanchez	c6efc4e0ba	Replace M_LOG2E and M_LN2 with custom macros. For these to exist we would need to define `_USE_MATH_DEFINES` before `cmath` or `math.h` is first included. However, we don't control the include order for projects outside Eigen, so even defining the macro in `Eigen/Core` does not fix the issue for projects that end up including `<cmath>` before Eigen does (explicitly or transitively). To fix this, we define `EIGEN_LOG2E` and `EIGEN_LN2` ourselves.	2020-12-11 14:34:31 -08:00
Antonio Sanchez	e82722a4a7	Fix MSVC SSE casts. MSVC doesn't like __m128(__m128i) c-style casts, so packets need to be converted using intrinsic methods.	2020-12-11 08:52:59 -08:00
Deven Desai	f3d2ea48f5	Fix for broken ROCm/HIP Support The following commit introduced a breakage in ROCm/HIP support for Eigen. `5ec4907434 (1958e65719641efe5483abc4ce0b61806270f6f3_525_517)` ``` Building HIPCC object test/CMakeFiles/gpu_basic.dir/gpu_basic_generated_gpu_basic.cu.o In file included from /home/rocm-user/eigen/test/gpu_basic.cu:20: In file included from /home/rocm-user/eigen/test/main.h:356: In file included from /home/rocm-user/eigen/Eigen/QR:11: In file included from /home/rocm-user/eigen/Eigen/Core:222: /home/rocm-user/eigen/Eigen/src/Core/arch/GPU/PacketMath.h:556:10: error: use of undeclared identifier 'half2half2'; did you mean '__half2half2'? return half2half2(from); ^~~~~~~~~~ __half2half2 /opt/rocm/hip/include/hip/hcc_detail/hip_fp16.h:547:21: note: '__half2half2' declared here __half2 __half2half2(__half x) ^ 1 error generated when compiling for gfx900. ``` The cause seems to be a copy-paster error, and the fix is trivial	2020-12-11 16:14:57 +00:00

... 2 3 4 5 6 ...

6721 Commits