eigen

CFD/eigen

Author	SHA1	Message	Date
Antonio Sanchez	45e67a6fda	Use reinterpret_cast on GPU for bit_cast. This seems to be the recommended approach for doing type punning in CUDA. See for example - https://stackoverflow.com/questions/47037104/cuda-type-punning-memcpy-vs-ub-union - https://developer.nvidia.com/blog/faster-parallel-reductions-kepler/ (the latter puns a double to an `int2`). The issue is that for CUDA, the `memcpy` is not elided, and ends up being an expensive operation. We already have similar `reintepret_cast`s across the Eigen codebase for GPU (as does TensorFlow).	2021-10-20 21:34:40 +00:00
Antonio Sanchez	95bb645e92	Fix MSVC+NVCC EIGEN_INHERIT_ASSIGNMENT_EQUAL_OPERATOR compilation. Looks like we need to update the `EIGEN_INHERIT_ASSIGNMENT_EQUAL_OPERATOR` for newer versions of MSVC as well when compiling with NVCC. Fixes build issues for VS 2017.	2021-10-20 19:38:14 +00:00
Antonio Sanchez	fd5f48e465	Fix tuple compilation for VS2017. VS2017 doesn't like deducing alias types, leading to a bunch of compile errors for functions involving the `tuple` alias. Replacing with `TupleImpl` seems to solve this, allowing the test to compile/pass.	2021-10-20 19:18:34 +00:00
Antonio Sanchez	d0d34524a1	Move CUDA/Complex.h to GPU/Complex.h, remove TensorReductionCuda.h The `Complex.h` file applies equally to HIP/CUDA, so placing under the generic `GPU` folder. The `TensorReductionCuda.h` has already been deprecated, now removing for the next Eigen version.	2021-10-20 12:00:19 -07:00
Rasmus Munk Larsen	f2c9c2d2f7	Vectorize Visitor.h.	2021-10-20 16:58:01 +00:00
Antonio Sanchez	f0f1d7938b	Disable testing of complex compound assignment operators for MSVC. MSVC does not support specializing compound assignments for `std::complex`, since it already specializes them (contrary to the standard). Trying to use one of these on device will currently lead to a duplicate definition error. This is still probably preferable to no error though. If we remove the definitions for MSVC, then it will compile, but the kernel will fail silently. The only proper solution would be to define our own custom `Complex` type.	2021-09-27 15:15:11 -07:00
Antonio Sanchez	21640612be	Disable more CUDA warnings. For cuda 9.2 and 11.4, they changed the numbers again. Fixes #2331.	2021-09-24 21:31:14 -07:00
Antonio Sanchez	e9e90892fe	Disable another device warning	2021-09-23 13:43:18 -07:00
Antonio Sanchez	86c0decc48	Disable more NVCC warnings. The 2979 warning is yet another "calling a __host__ function from a __host__ device__ function. Although we probably should eventually address these, they are flooding the logs. Most of these are harmless since we only call the original from the host. In cases where these are actually called from device, an error is generated instead anyways. The 2977 warning is a bit strange - although the warning suggests the `__device__` annotation is ignored, this doesn't actually seem to be the case. Without the `__device__` declarations, the kernel actually fails to run when attempting to construct such objects. Again, these warnings are flooding the logs, so disabling for now.	2021-09-23 10:52:39 -07:00
Kolja Brix	afa616bc9e	Fix some typos found	2021-09-23 15:22:00 +00:00
sciencewhiz	4b6036e276	fix various typos	2021-09-22 16:15:06 +00:00
Alexander Grund	b5eaa42695	Fix alias violation in BFloat16 reinterpret_cast between unrelated types is undefined behavior and leads to misoptimizations on some platforms. Use the safer (and faster) version via bit_cast	2021-09-20 10:37:50 +02:00
Antonio Sanchez	3c724c44cf	Fix strict aliasing bug causing product_small failure. Packet loading is skipped due to aliasing violation, leading to nullopt matrix multiplication. Fixes #2327.	2021-09-17 21:09:34 +00:00
Antonio Sanchez	5dac0b53c9	Move Eigen::all,last,lastp1,lastN to Eigen::placeholders::. These names are so common, IMO they should not exist directly in the `Eigen::` namespace. This prevents us from using the `last` or `all` names for any parameters or local variables, otherwise spitting out warnings about shadowing or hiding the global values. Many external projects (and our own examples) also heavily use ``` using namespace Eigen; ``` which means these conflict with external libraries as well, e.g. `std::fill(first,last,value)`. It seems originally these were placed in a separate namespace `Eigen::placeholders`, which has since been deprecated. I propose to un-deprecate this, and restore the original locations. These symbols are also imported into `Eigen::indexing`, which additionally imports `fix` and `seq`. An alternative is to remove the `placeholders` namespace and stick with `indexing`. NOTE: this is an API-breaking change. Fixes #2321.	2021-09-17 10:21:42 -07:00
Rasmus Munk Larsen	6cadab6896	Clean up EIGEN_STATIC_ASSERT to only use standard c++11 static_assert.	2021-09-16 20:43:54 +00:00
Rasmus Munk Larsen	7b975acb1f	Remove unused variable.	2021-09-16 20:27:13 +00:00
Rasmus Munk Larsen	92849d814b	Remove unused variable.	2021-09-16 20:21:31 +00:00
Rasmus Munk Larsen	da027fa20a	Remove unused variable.	2021-09-16 20:02:42 +00:00
Antonio Sanchez	cb50730993	Default eigen_packet_wrapper constructor. This makes it trivial, allowing use of `memcpy`. Fixes #2326	2021-09-14 10:57:22 -07:00
Rasmus Munk Larsen	d7d0bf832d	Issue an error in case of direct inclusion of internal headers.	2021-09-10 19:12:26 +00:00
Antonio Sanchez	26e5beb8cb	Device-compatible Tuple implementation. An analogue of `std::tuple` that works on device. Context: I've tried `std::tuple` in various versions of NVCC and clang, and although code seems to compile, it often fails to run - generating "illegal memory access" errors, or "illegal instruction" errors. This replacement does work on device.	2021-09-08 13:34:19 -07:00
Antonio Sanchez	fcd73b4884	Add a simple serialization mechanism. The `Serializer<T>` class implements a binary serialization that can write to (`serialize`) and read from (`deserialize`) a byte buffer. Also added convenience routines for serializing a list of arguments. This will mainly be for testing, specifically to transfer data to and from the GPU.	2021-09-08 09:38:59 -07:00
Antonio Sanchez	7792b1e909	Fix AVX2 PacketMath.h. There were a couple typos ps -> epi32, and an unaligned load issue.	2021-09-03 19:47:57 +00:00
Antonio Sanchez	5bf35383e0	Disable MSVC constant condition warning. We use extensive use of `if (CONSTANT)`, and cannot use c++17's `if constexpr`.	2021-09-03 11:07:18 -07:00
Antonio Sanchez	def145547f	Add missing packet types in pset1 call. Oops, introduced this when "fixing" integer packets.	2021-09-02 16:21:07 -07:00
Antonio Sanchez	3b48a3b964	Remove stray DynamicSparseMatrix references. DynamicSparseMatrix has been removed. These shouldn't be here anymore.	2021-09-02 19:47:26 +00:00
Antonio Sanchez	ebd4b17d2f	Fix tridiagonalization_inplace_selector. The `Options` of the new `hCoeffs` vector do not necessarily match those of the `MatrixType`, leading to build errors. Having the `CoeffVectorType` be a template parameter relieves this restriction.	2021-09-02 12:23:27 -07:00
Antonio Sanchez	998bab4b04	Missing EIGEN_DEVICE_FUNCs to get `gpu_basic` passing with CUDA 9. CUDA 9 seems to require labelling defaulted constructors as `EIGEN_DEVICE_FUNC`, despite giving warnings that such labels are ignored. Without these labels, the `gpu_basic` test fails to compile, with errors about calling `__host__` functions from `__host__ __device__` functions.	2021-09-01 19:49:53 -07:00
Antonio Sanchez	3d4ba855e0	Fix AVX integer packet issues. Most are instances of AVX2 functions not protected by `EIGEN_VECTORIZE_AVX2`. There was also a missing semi-colon for AVX512.	2021-09-01 14:14:43 -07:00
Antonio Sanchez	3a6296d4f1	Fix EIGEN_OPTIMIZATION_BARRIER for arm-clang. Clang doesn't like !621, needs the "g" constraint back. The "g" constraint also works for GCC >= 5. This fixes our gitlab CI.	2021-09-01 09:19:55 -07:00
Antonio Sanchez	ff07a8a639	GCC 4.8 arm EIGEN_OPTIMIZATION_BARRIER fix (#2315 ). GCC 4.8 doesn't seem to like the `g` register constraint, failing to compile with "error: 'asm' operand requires impossible reload". Tested `r` instead, and that seems to work, even with latest compilers. Also fixed some minor macro issues to eliminate warnings on armv7. Fixes #2315.	2021-08-31 20:20:47 +00:00
Antonio Sanchez	cc3573ab44	Disable cuda Eigen::half vectorization on host. All cuda `__half` functions are device-only in CUDA 9, including conversions. Host-side conversions were added in CUDA 10. The existing code doesn't build prior to 10.0. All arithmetic functions are always device-only, so there's therefore no reason to use vectorization on the host at all. Modified the code to disable vectorization for `__half` on host, which required also updating the `TensorReductionGpu` implementation which previously made assumptions about available packets.	2021-08-31 19:13:12 +00:00
Adam Kallai	1415817d8d	win: include intrin header in Windows on ARM intrin header is needed for _BitScanReverse and _BitScanReverse64	2021-08-31 10:57:34 +02:00
Antonio Sanchez	5db9e5c779	Fix fix<N> when variable templates are not supported. There were some typos that checked `EIGEN_HAS_CXX14` that should have checked `EIGEN_HAS_CXX14_VARIABLE_TEMPLATES`, causing a mismatch in some of the `Eigen::fix<N>` assumptions. Also fixed the `symbolic_index` test when `EIGEN_HAS_CXX14_VARIABLE_TEMPLATES` is 0. Fixes #2308	2021-08-30 08:06:55 -07:00
Jakub Lichman	dc5b1f7d75	AVX512 and AVX2 support for Packet16i and Packet8i added	2021-08-25 19:38:23 +00:00
Han-Kuan Chen	ab28419298	optimize predux if architecture is aarch64	2021-08-25 19:18:54 +00:00
Rasmus Munk Larsen	82dd3710da	Update version of master branch to 3.4.90.	2021-08-18 13:46:05 -07:00
Antonio Sanchez	2b410ecbef	Workaround VS 2017 arg bug. In VS 2017, `std::arg` for real inputs always returns 0, even for negative inputs. It should return `PI` for negative real values. This seems to be fixed in VS 2019 (MSVC 1920).	2021-08-18 18:39:18 +00:00
Jakob Struye	53a29c7e35	Clearer doc for squaredNorm	2021-08-18 15:11:15 +00:00
Antonio Sanchez	fc9d352432	Renamed shift_left/shift_right to shiftLeft/shiftRight. For naming consistency. Also moved to ArrayCwiseUnaryOps, and added test.	2021-08-17 20:04:48 -07:00
Antonio Sanchez	2cc6ee0d2e	Add missing PPC packet comparisons. This is to fix the packetmath tests on the ppc pipeline.	2021-08-17 07:42:04 -07:00
Chip-Kerchner	8dcf3e38ba	Fix unaligned loads in ploadLhs & ploadRhs for P8.	2021-08-16 20:28:22 -05:00
andiwand	5c6b3efead	minor doc fix in Map.h	2021-08-16 12:02:33 +00:00
Chip-Kerchner	e07227c411	Reverse compare logic in F32ToBf16 since vec_cmpne is not available in Power8 - now compiles for clang10 default (P8).	2021-08-13 11:21:28 -05:00
Chip Kerchner	66499f0f17	Get rid of used uninitialized warnings for EIGEN_UNUSED_VARIABLE in gcc11+	2021-08-12 21:38:54 +00:00
Rasmus Munk Larsen	8ce341caf2	* revise the meta_least_common_multiple function template, add a bool variable to check whether the A is larger than B. * This can make less compile_time if A is smaller than B. and avoid failure in compile if we get a little A and a great B. Authored by @awoniu.	2021-08-11 18:10:01 +00:00
ChipKerchner	413bc491f1	Fix errors on older compilers (gcc 7.5 - lack of vec_neg, clang10 - can not use const pointers with vec_xl).	2021-08-10 15:03:18 -05:00
Gauri Deshpande	e6a5a594a7	remove denormal flushing in fp32tobf16 for avx & avx512	2021-08-09 22:15:21 +00:00
Rasmus Munk Larsen	a5a7faeb45	Avoid memory allocation in tridiagonalization_inplace_selector::run.	2021-08-06 20:48:10 +00:00
Alexander Karatarakis	4ba872bd75	Avoid leading underscore followed by cap in template identifiers	2021-08-04 22:41:52 +00:00
Antonio Sanchez	5ad8b9bfe2	Make inverse 3x3 faster and avoid gcc bug. There seems to be a gcc 4.7 bug that incorrectly flags the current 3x3 inverse as using uninitialized memory. I'm pretty sure it's a false positive, but it's hard to trigger. The same warning does not trigger with clang or later compiler versions. In trying to find a work-around, this implementation turns out to be faster anyways for static-sized matrices. ``` name old cpu/op new cpu/op delta BM_Inverse3x3<DynamicMatrix3T<float>> 423ns ± 2% 433ns ± 3% +2.32% (p=0.000 n=98+96) BM_Inverse3x3<DynamicMatrix3T<double>> 425ns ± 2% 427ns ± 3% +0.48% (p=0.003 n=99+96) BM_Inverse3x3<StaticMatrix3T<float>> 7.10ns ± 2% 0.80ns ± 1% -88.67% (p=0.000 n=114+112) BM_Inverse3x3<StaticMatrix3T<double>> 7.45ns ± 2% 1.34ns ± 1% -82.01% (p=0.000 n=105+111) BM_AliasedInverse3x3<DynamicMatrix3T<float>> 409ns ± 3% 419ns ± 3% +2.40% (p=0.000 n=100+98) BM_AliasedInverse3x3<DynamicMatrix3T<double>> 414ns ± 3% 413ns ± 2% ~ (p=0.322 n=98+98) BM_AliasedInverse3x3<StaticMatrix3T<float>> 7.57ns ± 1% 0.80ns ± 1% -89.37% (p=0.000 n=111+114) BM_AliasedInverse3x3<StaticMatrix3T<double>> 9.09ns ± 1% 2.58ns ±41% -71.60% (p=0.000 n=113+116) ```	2021-08-04 21:18:44 +00:00
Antonio Sanchez	3d98a6ef5c	Modify scalar pzero, ptrue, pselect, and p<binary> operations to avoid memset. The `memset` function and bitwise manipulation only apply to POD types that do not require initialization, otherwise resulting in UB. We currently violate this in `ptrue` and `pzero`, we assume bitmasks for `pselect`, and bitwise operations are applied byte-by-byte in the generic implementations. This is causing issues for scalar types that do require initialization or that contain non-POD info such as pointers (#2201). We either break them, or force specializations of these functions for custom scalars, even if they are not vectorized. Here we modify these functions for scalars only - instead using only scalar operations: - `pzero`: `Scalar(0)` for all scalars. - `ptrue`: `Scalar(1)` for non-trivial scalars, bitset to one bits for trivial scalars. - `pselect`: ternary select comparing mask to `Scalar(0)` for all scalars - `pand`, `por`, `pxor`, `pnot`: use operators `&`, `\|`, `^`, `~` for all integer or non-trivial scalars, otherwise apply bytewise. For non-scalar types, the original implementations are used to maintain compatibility and minimize the number of changes. Fixes #2201.	2021-08-03 08:44:28 -07:00
Antonio Sanchez	7880f10526	Enable equality comparisons on GPU. Since `std::equal_to::operator()` is not a device function, it fails on GPU. On my device, I seem to get a silent crash in the kernel (no reported error, but the kernel does not complete). Replacing this with a portable version enables comparisons on device. Addresses #2292 - would need to be cherry-picked. The 3.3 branch also requires adding `EIGEN_DEVICE_FUNC` in `BooleanRedux.h` to get fully working.	2021-08-03 01:53:31 +00:00
hyunggi-sv	02a0e79c70	fix:typo in dox (has->have)	2021-08-03 00:45:00 +00:00
Antonio Sanchez	9816fe59b4	Fix assignment operator issue for latest MSVC+NVCC. Details are scattered across #920, #1000, #1324, #2291. Summary: some MSVC versions have a bug that requires omitting explicit `operator=` definitions (leads to duplicate definition errors), and some MSVC versions require adding explicit `operator=` definitions (otherwise implicitly deleted errors). This mess tries to cover all the cases encountered. Fixes #2291.	2021-08-03 00:26:10 +00:00
Antonio Sanchez	de2e62c62d	Disable vectorization of comparisons except for bool. Packet input/output types must currently be the same, and since these have a return type of `bool`, vectorization will only work if input is bool.	2021-07-25 13:39:50 -07:00
derekjchow	66ca41bd47	Add support for vectorizing logical comparisons.	2021-07-23 20:07:48 +00:00
arthurfeeney	a77638387d	Fixes #1387 for compilation error in JacobiSVD with HouseholderQRPreconditioner that occurs when input is a compile-time row vector.	2021-07-20 20:11:22 +00:00
Antonio Sanchez	297f0f563d	Fix explicit default cache size typo.	2021-07-20 11:40:17 -07:00
Rohit Santhanam	beea14a18f	Enable extract et. al. for HIP GPU.	2021-07-09 14:58:07 +00:00
Rasmus Munk Larsen	0c361c4899	Defer to std::fill_n when filling a dense object with a constant value.	2021-07-09 03:59:35 +00:00
Antonio Sanchez	1e6c6c1576	Replace memset with fill to work for non-trivial scalars. For custom scalars, zero is not necessarily represented by a zeroed-out memory block (e.g. gnu MPFR). We therefore cannot rely on `memset` if we want to fill a matrix or tensor with zeroes. Instead, we should rely on `fill`, which for trivial types does end up getting converted to a `memset` under-the-hood (at least with gcc/clang). Requires adding a `fill(begin, end, v)` to `TensorDevice`. Replaced all potentially bad instances of memset with fill. Fixes #2245.	2021-07-08 18:34:41 +00:00
Guoqiang QI	4bcd42c271	Make a copy of input matrix when try to do the inverse in place, this fixes #2285 .	2021-07-08 17:05:26 +00:00
Rasmus Munk Larsen	7b35638ddb	Fix breakage of conj_helper in conjunction with custom types introduced in !537 .	2021-07-02 20:42:15 +00:00
Rasmus Munk Larsen	bbfc4d54cd	Use `padd` instead of `+`.	2021-07-02 02:51:48 +00:00
Rasmus Munk Larsen	9312a5bf5c	Implement a generic vectorized version of Smith's algorithms for complex division.	2021-07-01 23:31:12 +00:00
Antonio Sanchez	6035da5283	Fix compile issues for gcc 4.8. - Move constructors can only be defaulted as NOEXCEPT if all members have NOEXCEPT move constructors. - gcc 4.8 has some funny parsing bug in `a < b->c`, thinking `b-` is a template parameter.	2021-07-01 22:58:14 +00:00
Antonio Sanchez	154f00e9ea	Fix inverse nullptr/asan errors for LU. For empty or single-column matrices, the current `PartialPivLU` currently dereferences a `nullptr` or accesses memory out-of-bounds. Here we adjust the checks to avoid this.	2021-07-01 13:41:04 -07:00
Dan Miller	eb04775903	Fix duplicate definitions on Mac	2021-07-01 14:54:12 +00:00
Chip Kerchner	91e99ec1e0	Create the ability to disable the specialized gemm_pack_rhs in Eigen (only PPC) for TensorFlow	2021-06-30 23:05:04 +00:00
Alexander Karatarakis	60400334a9	Make DenseStorage<> trivially_copyable	2021-06-30 04:27:51 +00:00
大河メタル	c81da59a25	Correct declarations for aarch64-pc-windows-msvc	2021-06-30 04:09:46 +00:00
Rasmus Munk Larsen	5aebbe9098	Get rid of redundant `pabs` instruction in complex square root.	2021-06-29 23:26:15 +00:00
Rohit Santhanam	2d132d1736	Commit `52a5f982` broke conjhelper functionality for HIP GPUs. This commit addresses this.	2021-06-25 19:28:00 +00:00
Rasmus Munk Larsen	bffd267d17	Small cleanup: Get rid of the macros EIGEN_HAS_SINGLE_INSTRUCTION_CJMADD and CJMADD, which were effectively unused, apart from on x86, where the change results in identically performing code.	2021-06-24 18:52:17 -07:00
Rasmus Munk Larsen	52a5f98212	Get rid of code duplication for conj_helper. For packets where LhsType=RhsType a single generic implementation suffices. For scalars, the generic implementation of pconj automatically forwards to numext::conj, so much of the existing specialization can be avoided. For mixed types we still need specializations.	2021-06-24 15:47:48 -07:00
Rasmus Munk Larsen	4ad30a73fc	Use internal::ref_selector to avoid holding a reference to a RHS expression.	2021-06-22 14:31:32 +00:00
Antonio Sanchez	35a367d557	Fix fix<> for gcc-4.9.3. There's a missing `EIGEN_HAS_CXX14` -> `EIGEN_HAS_CXX14_VARIABLE_TEMPLATES` replacement. Fixes ##2267	2021-06-18 13:22:54 -07:00
Antonio Sanchez	12e8d57108	Remove pset, replace with ploadu. We can't make guarantees on alignment for existing calls to `pset`, so we should default to loading unaligned. But in that case, we should just use `ploadu` directly. For loading constants, this load should hopefully get optimized away. This is causing segfaults in Google Maps.	2021-06-16 18:41:17 -07:00
Chip-Kerchner	ef1fd341a8	EIGEN_STRONG_INLINE was NOT inlining in some critical needed areas (6.6X slowdown) when used with Tensorflow. Changing to EIGEN_ALWAYS_INLINE where appropiate.	2021-06-16 16:30:31 +00:00
Antonio Sanchez	9e94c59570	Add missing ppc pcmp_lt_or_nan<Packet8bf>	2021-06-15 13:42:17 -07:00
Rasmus Munk Larsen	13fb5ab92c	Fix more enum arithmetic.	2021-06-15 09:09:31 -07:00
Antonio Sanchez	ad82d20cf6	Fix checking of version number for mingw. MinGW spits out version strings like: `x86_64-w64-mingw32-g++ (GCC) 10-win32 20210110`, which causes the version extraction to fail. Added support for this with tests. Also added `make_unsigned` for `long long`, since mingw seems to use that for `uint64_t`. Related to #2268. CMake and build passes for me after this.	2021-06-11 23:19:10 +00:00
Rasmus Munk Larsen	fc87e2cbaa	Use bit_cast to create -0.0 for floating point types to avoid compiler optimization changing sign with --ffast-math enabled.	2021-06-11 02:35:53 +00:00
Rasmus Munk Larsen	f64b2954c7	Fix c++20 warnings about using enums in arithmetic expressions.	2021-06-10 17:17:39 -07:00
Cyril Kaiser	91cd67f057	Remove EIGEN_DEVICE_FUNC from CwiseBinaryOp's default copy constructor.	2021-05-26 19:28:13 +00:00
Antonio Sanchez	dba753a986	Add missing NEON ptranspose implementations. Unified implementation using only `vzip`.	2021-05-25 18:25:35 +00:00
Antonio Sanchez	ebb300d0b4	Modify Unary/Binary/TernaryOp evaluators to work for non-class types. This used to work for non-class types (e.g. raw function pointers) in Eigen 3.3. This was changed in commit `11f55b29` to optimize the evaluator: > `sizeof((A-B).cwiseAbs2())` with A,B Vector4f is now 16 bytes, instead of 48 before this optimization. though I cannot reproduce the 16 byte result. Both before the change and after, with multiple compilers/versions, I always get a result of 40 bytes. https://godbolt.org/z/MsjTc1PGe This change modifies the code slightly to allow non-class types. The final generated code is identical, and the expression remains 40 bytes for the `abs2` sample case. Fixes #2251	2021-05-23 12:44:37 -07:00
Steve Bronder	1720057023	Adds macro for checking if C++14 variable templates are supported	2021-05-21 16:25:32 +00:00
Niall Murphy	391094c507	Use derived object type in conservative_resize_like_impl When calling conservativeResize() on a matrix with DontAlign flag, the temporary variable used to perform the resize should have the same Options as the original matrix to ensure that the correct override of swap is called (i.e. PlainObjectBase::swap(DenseBase<OtherDerived> & other). Calling the base class swap (i.e in DenseBase) results in assertions errors or memory corruption.	2021-05-20 23:17:02 +00:00
Nathan Luehr	7e6a1c129c	Device implementation of log for std::complex types.	2021-05-11 22:02:21 +00:00
Nathan Luehr	6753f0f197	Fix ambiguity due to argument dependent lookup.	2021-05-11 15:41:11 -05:00
guoqiangqi	3d9051ea84	Changing the storage of the SSE complex packets to that of the wrapper. This should fix #2242 .	2021-05-10 23:53:16 +00:00
Rohit Santhanam	39ec31c0ad	Fix for issue where numext::imag and numext::real are used before they are defined.	2021-05-10 19:48:32 +00:00
Antonio Sanchez	c0eb5f89a4	Restore ABI compatibility for conj with 3.3, fix conflict with boost. The boost library unfortunately specializes `conj` for various types and assumes the original two-template-parameter version. This changes restores the second parameter. This also restores ABI compatibility. The specialization for `std::complex` is because `std::conj` is not a device function. For custom complex scalar types, users should provide their own `conj` implementation. We may consider removing the unnecessary second parameter in the future - but this will require modifying boost as well. Fixes #2112.	2021-05-07 18:14:00 +00:00
Antonio Sanchez	90e9a33e1c	Fix numext::arg return type. The cxx11 path for `numext::arg` incorrectly returned the complex type instead of the real type, leading to compile errors. Fixed this and added tests. Related to !477, which uncovered the issue.	2021-05-07 16:26:57 +00:00
Christoph Hertzberg	722ca0b665	Revert addition of unused `paddsub<Packet2cf>`. This fixes #2242	2021-05-06 18:36:47 +02:00
Antonio Sanchez	1c013be2cc	Better CUDA complex division. The original produced NaNs when dividing 0/b for subnormal b. The `complex_divide_stable` was changed to use the more common Smith's algorithm.	2021-04-29 17:39:58 +00:00
Antonio Sanchez	172db7bfc3	Add missing pcmp_lt_or_nan for NEON Packet4bf.	2021-04-27 14:12:11 -07:00
Jakub Lichman	d87648a6be	Tests added and AVX512 bug fixed for pcmp_lt_or_nan	2021-04-25 20:58:56 +00:00

1 2 3 4 5 ...

6674 Commits