Commit Graph

6582 Commits

Author SHA1 Message Date
Rasmus Munk Larsen
3147391d94 Change version to 3.4.0. 2021-08-18 13:41:58 -07:00
Antonio Sanchez
115591b9e3 Workaround VS 2017 arg bug.
In VS 2017, `std::arg` for real inputs always returns 0, even for
negative inputs.  It should return `PI` for negative real values.
This seems to be fixed in VS 2019 (MSVC 1920).


(cherry picked from commit 2b410ecbef)
2021-08-18 19:04:50 +00:00
Jakob Struye
1ec173b54e Clearer doc for squaredNorm
(cherry picked from commit 53a29c7e35)
2021-08-18 15:12:36 +00:00
Antonio Sanchez
aef926abf6 Renamed shift_left/shift_right to shiftLeft/shiftRight.
For naming consistency.  Also moved to ArrayCwiseUnaryOps, and added
test.


(cherry picked from commit fc9d352432)
2021-08-18 14:44:31 +00:00
Antonio Sanchez
f1032255d3 Add missing PPC packet comparisons.
This is to fix the packetmath tests on the ppc pipeline.


(cherry picked from commit 2cc6ee0d2e)
2021-08-17 15:33:55 +00:00
Chip-Kerchner
f57dec64ef Fix unaligned loads in ploadLhs & ploadRhs for P8.
(cherry picked from commit 8dcf3e38ba)
2021-08-17 12:48:36 +00:00
andiwand
cd474d4cd0 minor doc fix in Map.h
(cherry picked from commit 5c6b3efead)
2021-08-16 14:26:39 +00:00
Chip-Kerchner
0b56b62f30 Reverse compare logic ƒin F32ToBf16 since vec_cmpne is not available in Power8 - now compiles for clang10 default (P8).
(cherry picked from commit e07227c411)
2021-08-13 18:01:15 +00:00
Chip Kerchner
44cc96e1a1 Get rid of used uninitialized warnings for EIGEN_UNUSED_VARIABLE in gcc11+
(cherry picked from commit 66499f0f17)
2021-08-12 21:39:17 +00:00
Rasmus Munk Larsen
6d2506040c * revise the meta_least_common_multiple function template, add a bool variable to check whether the A is larger than B.
* This can make less compile_time if A is smaller than B. and avoid failure in compile if we get a little A and a great B.

Authored by @awoniu.

(cherry picked from commit 8ce341caf2)
2021-08-11 18:11:26 +00:00
ChipKerchner
13d7658c5d Fix errors on older compilers (gcc 7.5 - lack of vec_neg, clang10 - can not use const pointers with vec_xl).
(cherry picked from commit 413bc491f1)
2021-08-10 20:40:54 +00:00
Gauri Deshpande
93bff85a42 remove denormal flushing in fp32tobf16 for avx & avx512
(cherry picked from commit e6a5a594a7)
2021-08-09 22:15:42 +00:00
Rasmus Munk Larsen
4e0357c6dd Avoid memory allocation in tridiagonalization_inplace_selector::run.
(cherry picked from commit a5a7faeb45)
2021-08-06 21:48:00 +00:00
Antonio Sanchez
5b83d3c4bc Make inverse 3x3 faster and avoid gcc bug.
There seems to be a gcc 4.7 bug that incorrectly flags the current
3x3 inverse as using uninitialized memory.  I'm *pretty* sure it's
a false positive, but it's hard to trigger.  The same warning
does not trigger with clang or later compiler versions.

In trying to find a work-around, this implementation turns out to be
faster anyways for static-sized matrices.

```
name                                            old cpu/op  new cpu/op  delta
BM_Inverse3x3<DynamicMatrix3T<float>>            423ns ± 2%   433ns ± 3%   +2.32%    (p=0.000 n=98+96)
BM_Inverse3x3<DynamicMatrix3T<double>>           425ns ± 2%   427ns ± 3%   +0.48%    (p=0.003 n=99+96)
BM_Inverse3x3<StaticMatrix3T<float>>            7.10ns ± 2%  0.80ns ± 1%  -88.67%  (p=0.000 n=114+112)
BM_Inverse3x3<StaticMatrix3T<double>>           7.45ns ± 2%  1.34ns ± 1%  -82.01%  (p=0.000 n=105+111)
BM_AliasedInverse3x3<DynamicMatrix3T<float>>     409ns ± 3%   419ns ± 3%   +2.40%   (p=0.000 n=100+98)
BM_AliasedInverse3x3<DynamicMatrix3T<double>>    414ns ± 3%   413ns ± 2%     ~       (p=0.322 n=98+98)
BM_AliasedInverse3x3<StaticMatrix3T<float>>     7.57ns ± 1%  0.80ns ± 1%  -89.37%  (p=0.000 n=111+114)
BM_AliasedInverse3x3<StaticMatrix3T<double>>    9.09ns ± 1%  2.58ns ±41%  -71.60%  (p=0.000 n=113+116)
```


(cherry picked from commit 5ad8b9bfe2)
2021-08-04 22:06:52 +00:00
Antonio Sanchez
237c59a2aa Modify scalar pzero, ptrue, pselect, and p<binary> operations to avoid memset.
The `memset` function and bitwise manipulation only apply to POD types
that do not require initialization, otherwise resulting in UB. We currently
violate this in `ptrue` and `pzero`, we assume bitmasks for `pselect`, and
bitwise operations are applied byte-by-byte in the generic implementations.

This is causing issues for scalar types that do require initialization
or that contain non-POD info such as pointers (#2201). We either break
them, or force specializations of these functions for custom scalars,
even if they are not vectorized.

Here we modify these functions for scalars only - instead using only
scalar operations:
- `pzero`: `Scalar(0)` for all scalars.
- `ptrue`: `Scalar(1)` for non-trivial scalars, bitset to one bits for trivial scalars.
- `pselect`: ternary select comparing mask to `Scalar(0)` for all scalars
- `pand`, `por`, `pxor`, `pnot`: use operators `&`, `|`, `^`, `~` for all integer or non-trivial scalars, otherwise apply bytewise.

For non-scalar types, the original implementations are used to maintain
compatibility and minimize the number of changes.

Fixes #2201.


(cherry picked from commit 3d98a6ef5c)
2021-08-03 16:32:59 +00:00
Antonio Sanchez
3dc42eeaec Enable equality comparisons on GPU.
Since `std::equal_to::operator()` is not a device function, it
fails on GPU.  On my device, I seem to get a silent crash in the
kernel (no reported error, but the kernel does not complete).

Replacing this with a portable version enables comparisons on device.

Addresses #2292 - would need to be cherry-picked.  The 3.3 branch
also requires adding `EIGEN_DEVICE_FUNC` in `BooleanRedux.h` to get
fully working.


(cherry picked from commit 7880f10526)
2021-08-03 16:15:44 +00:00
hyunggi-sv
7adc1545b4 fix:typo in dox (has->have)
(cherry picked from commit 02a0e79c70)
2021-08-03 00:54:41 +00:00
Antonio Sanchez
c0c7b695cd Fix assignment operator issue for latest MSVC+NVCC.
Details are scattered across #920, #1000, #1324, #2291.

Summary: some MSVC versions have a bug that requires omitting explicit
`operator=` definitions (leads to duplicate definition errors), and
some MSVC versions require adding explicit `operator=` definitions
(otherwise implicitly deleted errors).  This mess tries to cover
all the cases encountered.

Fixes #2291.


(cherry picked from commit 9816fe59b4)
2021-08-03 00:52:21 +00:00
arthurfeeney
9c90d5d832 Fixes #1387 for compilation error in JacobiSVD with HouseholderQRPreconditioner that occurs when input is a compile-time row vector.
(cherry picked from commit a77638387d)
2021-07-22 18:01:55 +00:00
Antonio Sanchez
5d37114fc0 Fix explicit default cache size typo.
(cherry picked from commit 297f0f563d)
2021-07-20 18:42:25 +00:00
Rohit Santhanam
930696fc53 Enable extract et. al. for HIP GPU.
(cherry picked from commit beea14a18f)
2021-07-09 16:14:19 +00:00
Rasmus Munk Larsen
56966fd2e6 Defer to std::fill_n when filling a dense object with a constant value.
(cherry picked from commit 0c361c4899)
2021-07-09 03:59:56 +00:00
Guoqiang QI
69ec4907da Make a copy of input matrix when try to do the inverse in place, this fixes #2285.
(cherry picked from commit 4bcd42c271)
2021-07-08 17:07:54 +00:00
Rasmus Munk Larsen
05bab8139a Fix breakage of conj_helper in conjunction with custom types introduced in !537.
(cherry picked from commit 7b35638ddb)
2021-07-02 20:59:50 +00:00
Chip Kerchner
eebde572d9 Create the ability to disable the specialized gemm_pack_rhs in Eigen (only PPC) for TensorFlow
(cherry picked from commit 91e99ec1e0)
2021-07-01 23:32:38 +00:00
Antonio Sanchez
8190739f12 Fix compile issues for gcc 4.8.
- Move constructors can only be defaulted as NOEXCEPT if all members
have NOEXCEPT move constructors.
- gcc 4.8 has some funny parsing bug in `a < b->c`, thinking `b-` is a template parameter.


(cherry picked from commit 6035da5283)
2021-07-01 23:18:10 +00:00
Antonio Sanchez
b6db013435 Fix inverse nullptr/asan errors for LU.
For empty or single-column matrices, the current `PartialPivLU`
currently dereferences a `nullptr` or accesses memory out-of-bounds.
Here we adjust the checks to avoid this.


(cherry picked from commit 154f00e9ea)
2021-07-01 22:57:25 +00:00
Dan Miller
1f6b1c1a1f Fix duplicate definitions on Mac
(cherry picked from commit eb04775903)
2021-07-01 20:49:05 +00:00
Alexander Karatarakis
517294d6e1 Make DenseStorage<> trivially_copyable
(cherry picked from commit 60400334a9)
2021-07-01 20:48:47 +00:00
大河メタル
94e2250b36 Correct declarations for aarch64-pc-windows-msvc
(cherry picked from commit c81da59a25)
2021-06-30 04:10:04 +00:00
Rasmus Munk Larsen
380d0e4916 Get rid of redundant pabs instruction in complex square root.
(cherry picked from commit 5aebbe9098)
2021-06-29 23:27:09 +00:00
Rohit Santhanam
e83af2cc24 Commit 52a5f982 broke conjhelper functionality for HIP GPUs.
This commit addresses this.


(cherry picked from commit 2d132d1736)
2021-06-25 19:56:18 +00:00
Rasmus Munk Larsen
413ff2b531 Small cleanup: Get rid of the macros EIGEN_HAS_SINGLE_INSTRUCTION_CJMADD and CJMADD, which were effectively unused, apart from on x86, where the change results in identically performing code.
(cherry picked from commit bffd267d17)
2021-06-25 17:13:12 +00:00
Rasmus Munk Larsen
a235ddef39 Get rid of code duplication for conj_helper. For packets where LhsType=RhsType a single generic implementation suffices. For scalars, the generic implementation of pconj automatically forwards to numext::conj, so much of the existing specialization can be avoided. For mixed types we still need specializations.
(cherry picked from commit 52a5f98212)
2021-06-24 23:30:42 +00:00
Antonio Sanchez
c2c0f6f64b Fix fix<> for gcc-4.9.3.
There's a missing `EIGEN_HAS_CXX14` -> `EIGEN_HAS_CXX14_VARIABLE_TEMPLATES`
replacement.

Fixes ##2267


(cherry picked from commit 35a367d557)
2021-06-21 17:26:07 +00:00
Antonio Sanchez
ee4e099aa2 Remove pset, replace with ploadu.
We can't make guarantees on alignment for existing calls to `pset`,
so we should default to loading unaligned.  But in that case, we should
just use `ploadu` directly. For loading constants, this load should hopefully
get optimized away.

This is causing segfaults in Google Maps.


(cherry picked from commit 12e8d57108)
2021-06-17 17:11:08 +00:00
Chip-Kerchner
9fc93ce31a EIGEN_STRONG_INLINE was NOT inlining in some critical needed areas (6.6X slowdown) when used with Tensorflow. Changing to EIGEN_ALWAYS_INLINE where appropiate.
(cherry picked from commit ef1fd341a8)
2021-06-16 22:14:17 +00:00
Antonio Sanchez
1374f49f28 Add missing ppc pcmp_lt_or_nan<Packet8bf>
(cherry picked from commit 9e94c59570)
2021-06-15 22:12:22 +00:00
Rasmus Munk Larsen
47722a66f2 Fix more enum arithmetic.
(cherry picked from commit 13fb5ab92c)
2021-06-15 16:40:35 +00:00
Antonio Sanchez
5e75331b9f Fix checking of version number for mingw.
MinGW spits out version strings like: `x86_64-w64-mingw32-g++ (GCC)
10-win32 20210110`, which causes the version extraction to fail.
Added support for this with tests.

Also added `make_unsigned` for `long long`, since mingw seems to
use that for `uint64_t`.

Related to #2268.  CMake and build passes for me after this.


(cherry picked from commit ad82d20cf6)
2021-06-12 00:02:26 +00:00
Rasmus Munk Larsen
1cb1ffd5b2 Use bit_cast to create -0.0 for floating point types to avoid compiler optimization changing sign with --ffast-math enabled.
(cherry picked from commit fc87e2cbaa)
2021-06-11 02:57:02 +00:00
Rasmus Munk Larsen
4b502a7215 Fix c++20 warnings about using enums in arithmetic expressions.
(cherry picked from commit f64b2954c7)
2021-06-11 02:35:19 +00:00
Cyril Kaiser
573570b6c9 Remove EIGEN_DEVICE_FUNC from CwiseBinaryOp's default copy constructor.
(cherry picked from commit 91cd67f057)
2021-05-26 19:45:25 +00:00
Antonio Sanchez
98cf1e076f Add missing NEON ptranspose implementations.
Unified implementation using only `vzip`.


(cherry picked from commit dba753a986)
2021-05-25 19:09:50 +00:00
Antonio Sanchez
ee2a8f7139 Modify Unary/Binary/TernaryOp evaluators to work for non-class types.
This used to work for non-class types (e.g. raw function pointers) in
Eigen 3.3.  This was changed in commit 11f55b29 to optimize the
evaluator:

> `sizeof((A-B).cwiseAbs2())` with A,B Vector4f is now 16 bytes, instead of 48 before this optimization.

though I cannot reproduce the 16 byte result.  Both before the change
and after, with multiple compilers/versions, I always get a result of 40 bytes.

https://godbolt.org/z/MsjTc1PGe

This change modifies the code slightly to allow non-class types.  The
final generated code is identical, and the expression remains 40 bytes
for the `abs2` sample case.

Fixes #2251


(cherry picked from commit ebb300d0b4)
2021-05-25 18:19:53 +00:00
Steve Bronder
4fbd01cd4b Adds macro for checking if C++14 variable templates are supported
(cherry picked from commit 1720057023)
2021-05-21 16:43:30 +00:00
Niall Murphy
a883a8797c Use derived object type in conservative_resize_like_impl
When calling conservativeResize() on a matrix with DontAlign flag, the
temporary variable used to perform the resize should have the same
Options as the original matrix to ensure that the correct override of
swap is called (i.e. PlainObjectBase::swap(DenseBase<OtherDerived> &
other). Calling the base class swap (i.e in DenseBase) results in
assertions errors or memory corruption.


(cherry picked from commit 391094c507)
2021-05-20 23:43:57 +00:00
guoqiangqi
2f908f8255 Changing the storage of the SSE complex packets to that of the wrapper. This should fix #2242 .
(cherry picked from commit 3d9051ea84)
2021-05-12 17:02:19 +00:00
Nathan Luehr
d1825cbb68 Device implementation of log for std::complex types.
(cherry picked from commit 7e6a1c129c)
2021-05-11 22:31:53 +00:00
Nathan Luehr
d9288f078d Fix ambiguity due to argument dependent lookup.
(cherry picked from commit 6753f0f197)
2021-05-11 22:00:36 +00:00