2465 Commits

Author SHA1 Message Date
Essex Edwards
e741b43668 Make Transform::computeRotationScaling(0,&S) continuous 2021-01-07 17:45:14 +00:00
Antonio Sanchez
bb1de9dbde Fix Ref Stride checks.
The existing `Ref` class failed to consider cases where the Ref's
`Stride` setting *could* match the underlying referred object's stride,
but **didn't** at runtime.  This led to trying to set invalid stride values,
causing runtime failures in some cases, and garbage due to mismatched
strides in others.

Here we add the missing runtime checks.  This involves computing the
strides necessary to align with the referred object's storage, and
verifying we can actually set those strides at runtime.

In the `const` case, if it *may* be possible to refer to the original
storage at compile-time but fails at runtime, then we defer to the
`construct(...)` method that makes a copy.

Added more tests to check these cases.

Fixes #2093.
2021-01-05 10:41:25 -08:00
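The runtime check described in bb1de9dbde can be illustrated with a small standalone sketch. It does not use Eigen: `ContiguousConstView` is a hypothetical class, and the real `Ref` logic also deals with outer strides and storage order. The sketch only shows the idea that a read-only view requiring unit stride aliases the source when the runtime stride matches, and falls back to a copy when it does not.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical read-only view that requires unit stride (not Eigen's Ref).
// If the source's runtime stride is not 1, the elements are copied into
// contiguous storage instead of being aliased.
class ContiguousConstView {
 public:
  ContiguousConstView(const double* data, std::size_t size, std::ptrdiff_t stride)
      : data_(data), size_(size), copied_(false) {
    assert(stride >= 1);
    if (stride != 1) {  // runtime check failed: cannot alias, so copy
      copy_.resize(size);
      for (std::size_t i = 0; i < size; ++i) {
        copy_[i] = data[static_cast<std::ptrdiff_t>(i) * stride];
      }
      data_ = copy_.data();
      copied_ = true;
    }
  }
  // Copying would leave data_ pointing into another object's buffer.
  ContiguousConstView(const ContiguousConstView&) = delete;
  ContiguousConstView& operator=(const ContiguousConstView&) = delete;

  double operator[](std::size_t i) const { return data_[i]; }
  std::size_t size() const { return size_; }
  bool owns_copy() const { return copied_; }

 private:
  const double* data_;
  std::size_t size_;
  bool copied_;
  std::vector<double> copy_;
};
```

A mutable view cannot use the copy fallback, since writes would not reach the original storage, which is why the commit limits it to the `const` case.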
Christoph Hertzberg
12dda34b15 Eliminate boolean product warnings by factoring out a
`combine_scalar_factors` helper function.
2021-01-05 18:15:30 +00:00
Antonio Sanchez
070d303d56 Add CUDA complex sqrt.
This is to support scalar `sqrt` of complex numbers `std::complex<T>` on
device, requested by Tensorflow folks.

Technically `std::complex` is not supported by NVCC on device
(though it is by clang), so the default `sqrt(std::complex<T>)` function only
works on the host. Here we create an overload to add back the
functionality.

Also modified the CMake file to add `--relaxed-constexpr` (or
equivalent) flag for NVCC to allow calling constexpr functions from
device functions, and added support for specifying compute architecture for
NVCC (was already available for clang).
2020-12-22 23:25:23 -08:00
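As a rough illustration of what a hand-written complex square root for 070d303d56 can look like, here is a host-side sketch that uses only real-valued operations. The function name `complex_sqrt_sketch` is hypothetical, this is the textbook formula rather than the code in the commit, and it makes no attempt to handle infinities or NaNs.

```cpp
#include <cassert>
#include <cmath>
#include <complex>

// Principal square root of z = x + iy using real arithmetic only, so it does
// not depend on std::sqrt(std::complex<T>) being available.
template <typename T>
std::complex<T> complex_sqrt_sketch(const std::complex<T>& z) {
  const T x = z.real();
  const T y = z.imag();
  if (x == T(0) && y == T(0)) return std::complex<T>(T(0), y);
  // t = sqrt((|x| + |z|) / 2) is the larger of the two result components.
  const T t = std::sqrt((std::abs(x) + std::hypot(x, y)) / T(2));
  if (x >= T(0)) return std::complex<T>(t, y / (T(2) * t));
  // For x < 0 the large component goes to the imaginary part, with y's sign.
  return std::complex<T>(std::abs(y) / (T(2) * t), std::copysign(t, y));
}

inline void complex_sqrt_usage() {
  const std::complex<double> r = complex_sqrt_sketch(std::complex<double>(3.0, 4.0));
  assert(std::abs(r - std::complex<double>(2.0, 1.0)) < 1e-12);  // sqrt(3+4i) = 2+i
}
```

Dividing by the larger component `t` avoids the cancellation that the naive `sqrt((|z| - |x|) / 2)` form suffers when `|y|` is small.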
Antonio Sanchez
c6efc4e0ba Replace M_LOG2E and M_LN2 with custom macros.
For these to exist we would need to define `_USE_MATH_DEFINES` before
`cmath` or `math.h` is first included.  However, we don't
control the include order for projects outside Eigen, so even defining
the macro in `Eigen/Core` does not fix the issue for projects that
end up including `<cmath>` before Eigen does (explicitly or transitively).

To fix this, we define `EIGEN_LOG2E` and `EIGEN_LN2` ourselves.
2020-12-11 14:34:31 -08:00
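The approach in c6efc4e0ba amounts to carrying private copies of the constants so that include order no longer matters. A minimal sketch follows; `SKETCH_LOG2E` and `SKETCH_LN2` are placeholder names chosen here, not the `EIGEN_LOG2E` and `EIGEN_LN2` definitions themselves, and the helper functions are only for illustration.

```cpp
#include <cassert>
#include <cmath>

// Placeholder constants (hypothetical names). Defining them locally avoids
// any dependence on _USE_MATH_DEFINES or on who included <cmath> first.
#define SKETCH_LOG2E 1.442695040888963407359924681001892137  // log2(e)
#define SKETCH_LN2 0.693147180559945309417232121458176568    // ln(2)

// log2(x) computed from the natural logarithm.
inline double log2_from_ln(double x) { return std::log(x) * SKETCH_LOG2E; }

// 2^x computed from the natural exponential.
inline double exp2_from_exp(double x) { return std::exp(x * SKETCH_LN2); }

inline void math_constants_usage() {
  assert(std::fabs(log2_from_ln(8.0) - 3.0) < 1e-12);
  assert(std::fabs(exp2_from_exp(10.0) - 1024.0) < 1e-9);
}
```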
Rasmus Munk Larsen
125cc9a5df Implement vectorized complex square root.
Closes #1905

Measured speedup for sqrt of `complex<float>` on Skylake:

SSE:
```
name                      old time/op             new time/op  delta
BM_eigen_sqrt_ctype/1     49.4ns ± 0%             54.3ns ± 0%  +10.01%
BM_eigen_sqrt_ctype/8      332ns ± 0%               50ns ± 1%  -84.97%
BM_eigen_sqrt_ctype/64    2.81µs ± 1%             0.38µs ± 0%  -86.49%
BM_eigen_sqrt_ctype/512   23.8µs ± 0%              3.0µs ± 0%  -87.32%
BM_eigen_sqrt_ctype/4k     202µs ± 0%               24µs ± 2%  -88.03%
BM_eigen_sqrt_ctype/32k   1.63ms ± 0%             0.19ms ± 0%  -88.18%
BM_eigen_sqrt_ctype/256k  13.0ms ± 0%              1.5ms ± 1%  -88.20%
BM_eigen_sqrt_ctype/1M    52.1ms ± 0%              6.2ms ± 0%  -88.18%
```

AVX2:
```
name                      old cpu/op  new cpu/op  delta
BM_eigen_sqrt_ctype/1     53.6ns ± 0%  55.6ns ± 0%   +3.71%
BM_eigen_sqrt_ctype/8      334ns ± 0%    27ns ± 0%  -91.86%
BM_eigen_sqrt_ctype/64    2.79µs ± 0%  0.22µs ± 2%  -92.28%
BM_eigen_sqrt_ctype/512   23.8µs ± 1%   1.7µs ± 1%  -92.81%
BM_eigen_sqrt_ctype/4k     201µs ± 0%    14µs ± 1%  -93.24%
BM_eigen_sqrt_ctype/32k   1.62ms ± 0%  0.11ms ± 1%  -93.29%
BM_eigen_sqrt_ctype/256k  13.0ms ± 0%   0.9ms ± 1%  -93.31%
BM_eigen_sqrt_ctype/1M    52.0ms ± 0%   3.5ms ± 1%  -93.31%
```

AVX512:
```
name                      old cpu/op  new cpu/op  delta
BM_eigen_sqrt_ctype/1     53.7ns ± 0%  56.2ns ± 1%   +4.75%
BM_eigen_sqrt_ctype/8      334ns ± 0%    18ns ± 2%  -94.63%
BM_eigen_sqrt_ctype/64    2.79µs ± 0%  0.12µs ± 1%  -95.54%
BM_eigen_sqrt_ctype/512   23.9µs ± 1%   1.0µs ± 1%  -95.89%
BM_eigen_sqrt_ctype/4k     202µs ± 0%     8µs ± 1%  -96.13%
BM_eigen_sqrt_ctype/32k   1.63ms ± 0%  0.06ms ± 1%  -96.15%
BM_eigen_sqrt_ctype/256k  13.0ms ± 0%   0.5ms ± 4%  -96.11%
BM_eigen_sqrt_ctype/1M    52.1ms ± 0%   2.0ms ± 1%  -96.13%
```
2020-12-08 18:13:35 -08:00
Rasmus Munk Larsen
f9fac1d5b0 Add log2() to Eigen. 2020-12-04 21:45:09 +00:00
Rasmus Munk Larsen
f23dc5b971 Revert "Add log2() operator to Eigen"
This reverts commit 4d91519a9b.
2020-12-03 14:32:45 -08:00
Rasmus Munk Larsen
4d91519a9b Add log2() operator to Eigen 2020-12-03 22:31:44 +00:00
Antonio Sanchez
eb4d4ae070 Include chrono in main for c++11.
Hack to fix tensor tests, since min/max are overridden by `main.h`.
2020-12-03 11:27:32 -08:00
Antonio Sanchez
89f90b585d AVX512 missing ops.
This allows the `packetmath` tests to pass for AVX512 on skylake.
Made `half` and `bfloat16` consistent in terms of ops they support.

Note the `log` tests are currently disabled for `bfloat16` since
they fail due to poor precision (they were previously disabled for
`Packet8bf` via test function specialization -- I just removed that
specialization and disabled it in the generic test).
2020-11-30 16:28:57 +00:00
Bowie Owens
9842366bba Make inclusion of doc sub-directory optional by adjusting options.
Allows exclusion of doc and related targets to help when using eigen via add_subdirectory().

Requested by:

https://gitlab.com/libeigen/eigen/-/issues/1842

Also required making EIGEN_TEST_BUILD_DOCUMENTATION a dependent option on EIGEN_BUILD_DOC. This ensures documentation targets are properly defined when EIGEN_TEST_BUILD_DOCUMENTATION is ON.
2020-11-27 08:11:49 +11:00
Rasmus Munk Larsen
79818216ed Revert "Fix Half NaN definition and test."
This reverts commit c770746d70.
2020-11-24 12:57:28 -08:00
Rasmus Munk Larsen
c770746d70 Fix Half NaN definition and test.
The `half_float` test was failing with `-mcpu=cortex-a55` (native `__fp16`) due
to a bad NaN bit-pattern comparison (in the case of casting a float to `__fp16`,
the signaling `NaN` is quieted). There was also an inconsistency between
`numeric_limits<half>::quiet_NaN()` and `NumTraits::quiet_NaN()`.  Here we
correct the inconsistency and compare NaNs according to the IEEE 754
definition.

Also modified the `bfloat16_float` test to match.

Tested with `cortex-a53` and `cortex-a55`.
2020-11-24 20:53:07 +00:00
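To make the "compare NaNs according to the IEEE 754 definition" point concrete, here is a sketch that classifies binary16 bit patterns instead of comparing them for equality. The helper names are hypothetical, and the quiet-bit convention shown is the common one (top mantissa bit set means quiet), which is an assumption about the target platform.

```cpp
#include <cassert>
#include <cstdint>

// IEEE 754 binary16: 1 sign bit, 5 exponent bits, 10 mantissa bits.
// NaN means the exponent is all ones and the mantissa is nonzero.
inline bool half_bits_is_nan(std::uint16_t bits) {
  return (bits & 0x7c00u) == 0x7c00u && (bits & 0x03ffu) != 0u;
}

// Assumed convention: the top mantissa bit is 1 for quiet NaNs.
inline bool half_bits_is_quiet_nan(std::uint16_t bits) {
  return half_bits_is_nan(bits) && (bits & 0x0200u) != 0u;
}

// Test-style comparison: any two NaNs are treated as equivalent regardless of
// sign, payload, or quiet bit. Everything else is compared bit for bit.
inline bool half_bits_equivalent(std::uint16_t a, std::uint16_t b) {
  if (half_bits_is_nan(a) || half_bits_is_nan(b)) {
    return half_bits_is_nan(a) && half_bits_is_nan(b);
  }
  return a == b;
}

inline void half_nan_usage() {
  assert(half_bits_is_quiet_nan(0x7e00u));        // quiet NaN
  assert(!half_bits_is_quiet_nan(0x7d00u));       // signaling NaN
  assert(half_bits_equivalent(0x7e00u, 0x7d00u)); // still "the same" for a test
}
```

A comparison of this kind keeps passing when a cast quiets a signaling NaN, which is the situation the commit message describes for `__fp16`.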
Antonio Sanchez
a3b300f1af Implement missing AVX half ops.
Minimal implementation of AVX `Eigen::half` ops to bring in line
with `bfloat16`.  Allows `packetmath_13` to pass.

Also adjusted `bfloat16` packet traits to match the supported set
of ops (e.g. Bessel is not actually implemented).
2020-11-24 16:46:41 +00:00
Antonio Sanchez
38abf2be42 Fix Half NaN definition and test.
The `half_float` test was failing with `-mcpu=cortex-a55` (native `__fp16`) due
to a bad NaN bit-pattern comparison (in the case of casting a float to `__fp16`,
the signaling `NaN` is quieted). There was also an inconsistency between
`numeric_limits<half>::quiet_NaN()` and `NumTraits::quiet_NaN()`.  Here we
correct the inconsistency and compare NaNs according to the IEEE 754
definition.

Also modified the `bfloat16_float` test to match.

Tested with `cortex-a53` and `cortex-a55`.
2020-11-23 14:13:59 -08:00
Antonio Sanchez
4cf01d2cf5 Update AVX half packets, disable test.
The AVX half implementation is incomplete, causing the `packetmath_13` test
to fail.  This disables the test.

Also refactored the existing AVX implementation to use `bit_cast`
instead of direct access to `.x`.
2020-11-21 09:05:10 -08:00
Antonio Sanchez
a8fdcae55d Fix sparse_extra_3, disable counting temporaries for testing DynamicSparseMatrix.
Multiplication of column-major `DynamicSparseMatrix`es involves three
temporaries:
- two for transposing twice to sort the coefficients
(`ConservativeSparseSparseProduct.h`, L160-161)
- one for a final copy assignment (`SparseAssign.h`, L108)
The latter is avoided in an optimization for `SparseMatrix`.

Since `DynamicSparseMatrix` is deprecated in favor of `SparseMatrix`, it's not
worth the effort to optimize further, so I simply disabled counting
temporaries via a macro.

Note that due to the inclusion of `sparse_product.cpp`, the `sparse_extra`
tests actually re-run all the original `sparse_product` tests as well.

We may want to simply drop the `DynamicSparseMatrix` tests altogether, which
would eliminate the test duplication.

Related to #2048
2020-11-18 23:15:33 +00:00
David Tellenbach
11e4056f6b Re-enable Arm Neon Eigen::half packets of size 8
- Add predux_half_dowto4
- Remove explicit casts in Half.h to match the behaviour of BFloat16.h
- Enable more packetmath tests for Eigen::half
2020-11-18 23:02:21 +00:00
Antonio Sanchez
17268b155d Add bit_cast for half/bfloat to/from uint16_t, fix TensorRandom
The existing `TensorRandom.h` implementation makes the assumption that
`half` (`bfloat16`) has a `uint16_t` member `x` (`value`), which is not
always true. This currently fails on arm64, where `x` has type `__fp16`.
Added `bit_cast` specializations to allow casting to/from `uint16_t`
for both `half` and `bfloat16`.  Also added tests in
`half_float`, `bfloat16_float`, and `cxx11_tensor_random` to catch
these errors in the future.
2020-11-18 20:32:35 +00:00
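The reason a `bit_cast` helper is more robust than reaching into a member such as `x` can be shown with a generic `memcpy`-based sketch. `bit_cast_sketch` and `half_sketch` are hypothetical names, not Eigen's implementation, and the expected float bit pattern assumes IEEE 754 binary32.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Reinterprets the object representation of `src` as type To without relying
// on the name or type of any member of From.
template <typename To, typename From>
To bit_cast_sketch(const From& src) {
  static_assert(sizeof(To) == sizeof(From), "bit_cast requires equal sizes");
  static_assert(std::is_trivially_copyable<To>::value &&
                    std::is_trivially_copyable<From>::value,
                "bit_cast requires trivially copyable types");
  To dst;
  std::memcpy(&dst, &src, sizeof(To));
  return dst;
}

// Hypothetical 16-bit wrapper; its storage member could be any 16-bit type.
struct half_sketch {
  std::uint16_t storage;
};

inline void bit_cast_usage() {
  assert(bit_cast_sketch<std::uint32_t>(1.0f) == 0x3f800000u);
  const half_sketch h = bit_cast_sketch<half_sketch>(std::uint16_t(0x3c00));
  assert(bit_cast_sketch<std::uint16_t>(h) == 0x3c00u);
}
```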
Antonio Sanchez
41d5d5334b Initialize primitives to fix -Wuninitialized-const-reference.
The `meta` test generates warnings with the latest version of clang due
to passing uninitialized variables as const reference arguments.
```
test/meta.cpp:102:45: error: variable 'f' is uninitialized when passed as a const reference argument here [-Werror,-Wuninitialized-const-reference]
    VERIFY(( check_is_convertible(a.dot(b), f) ));
```
We don't actually use the variables, but initializing them eliminates the
new warning.

Fixes #2067.
2020-11-18 20:23:20 +00:00
Antonio Sanchez
8e9cc5b10a Eliminate double-promotion warnings.
Clang currently complains about implicit conversions, e.g.
```
test/packetmath.cpp:680:59: warning: implicit conversion increases floating-point precision: 'typename Eigen::internal::random_retval<typename Eigen::internal::global_math_functions_filtering_base<double>::type>::type' (aka 'double') to 'long double' [-Wdouble-promotion]
          data1[0] = Scalar((2 * k + k1) * EIGEN_PI / 2 * internal::random<double>(0.8, 1.2));
                                                        ~ ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
test/packetmath.cpp:681:40: warning: implicit conversion increases floating-point precision: 'float' to 'long double' [-Wdouble-promotion]
          data1[1] = Scalar((2 * k + 2 + k1) * EIGEN_PI / 2 * internal::random<double>(0.8, 1.2));
```

Modified to explicitly cast to double.
2020-11-16 10:39:09 -08:00
Antonio Sanchez
bb69a8db5d Explicit casts of S -> std::complex<T>
When calling `internal::cast<S, std::complex<T>>(x)`, clang often
generates an implicit conversion warning due to an implicit cast
from type `S` to `T`.  This currently affects the following tests:
- `basicstuff`
- `bfloat16_float`
- `cxx11_tensor_casts`

The implicit cast leads to widening/narrowing float conversions.
Widening warnings only seem to be generated by clang (`-Wdouble-promotion`).

To eliminate the warning, we explicitly cast the real-component first
from `S` to `T`.  We also adjust tests to use `internal::cast` instead
of `static_cast` when a complex type may be involved.
2020-11-14 05:50:42 +00:00
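The fix in bb69a8db5d boils down to converting the real component explicitly before building the complex value. A standalone sketch of that pattern, using a hypothetical helper `real_to_complex` rather than `internal::cast`:

```cpp
#include <cassert>
#include <complex>

// Converts a real scalar of type S to std::complex<T>. The real part is cast
// to T explicitly, so no implicit widening or narrowing conversion is left
// for the compiler to warn about.
template <typename T, typename S>
std::complex<T> real_to_complex(const S& x) {
  return std::complex<T>(static_cast<T>(x), T(0));
}

inline void real_to_complex_usage() {
  const std::complex<float> c = real_to_complex<float>(2.5);  // double -> float
  assert(c == std::complex<float>(2.5f, 0.0f));
}
```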
Christoph Hertzberg
90f6d9d23e Suppress ignored-attributes warning (same as in vectorization_logic). Remove redundant include and using namespace. 2020-11-13 16:21:53 +01:00
Everton Constantino
348a48682e Fix erroneous forward declaration of boost nvp. 2020-11-10 13:07:34 -03:00
Deven Desai
9d11e2c03e CMakefile update for ROCm 4.0
Starting with ROCm 4.0, the `hipconfig --platform` command will return `amd` (prior return value was `hcc`). Updating the CMakeLists.txt files in the test dirs to account for this change.
2020-10-29 18:06:31 +00:00
David Tellenbach
e265f7ed8e Add support for Armv8.2-a __fp16
Armv8.2-a provides a native half-precision floating point (__fp16 aka.
float16_t). This patch introduces

* __fp16 as underlying type of Eigen::half if this type is available
* the packet types Packet4hf and Packet8hf representing float16x4_t and
  float16x8_t respectively
* packet-math for the above packets with corresponding scalar type Eigen::half

The packet-math functionality has been implemented by Ashutosh Sharma
<ashutosh.sharma@amperecomputing.com>.

This closes #1940.
2020-10-28 20:15:09 +00:00
Rasmus Munk Larsen
c6953f799b Add packet generic ops predux_fmin, predux_fmin_nan, predux_fmax, and predux_fmax_nan that implement reductions with PropagateNaN, and PropagateNumbers semantics. Add (slow) generic implementations for most reductions. 2020-10-13 21:48:31 +00:00
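The two reduction semantics named in c6953f799b can be sketched with scalar loops. The function names are hypothetical and the handling of empty input is a choice made for this sketch only; Eigen's packet reductions are not implemented this way.

```cpp
#include <cassert>
#include <cmath>
#include <limits>
#include <vector>

// PropagateNumbers: NaNs are ignored unless every element is NaN.
// (Sketch convention: an empty input yields NaN.)
inline double reduce_min_propagate_numbers(const std::vector<double>& v) {
  double r = std::numeric_limits<double>::quiet_NaN();
  for (double x : v) r = std::fmin(r, x);  // fmin returns the non-NaN operand
  return r;
}

// PropagateNaN: a single NaN anywhere in the input makes the result NaN.
// (Sketch convention: an empty input yields +infinity.)
inline double reduce_min_propagate_nan(const std::vector<double>& v) {
  double r = std::numeric_limits<double>::infinity();
  for (double x : v) {
    if (std::isnan(x)) return x;
    if (x < r) r = x;
  }
  return r;
}

inline void reduce_min_usage() {
  const double nan = std::numeric_limits<double>::quiet_NaN();
  const std::vector<double> v = {3.0, nan, 1.0};
  assert(reduce_min_propagate_numbers(v) == 1.0);
  assert(std::isnan(reduce_min_propagate_nan(v)));
}
```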
Rasmus Munk Larsen
4e4d3f32d1 Clean up packetmath tests and fix various bugs to make bfloat16 pass (almost) all packetmath tests with SSE, AVX, and AVX512. 2020-10-09 20:05:49 +00:00
David Tellenbach
7a8d3d5b81 Disable test exceptions when using OpenMP. 2020-10-09 17:49:07 +02:00
Rasmus Munk Larsen
b431024404 Don't make assumptions about NaN-propagation for pmin/pmax - it varies across platforms.
Change test to only test for NaN-propagation for pfmin/pfmax.
2020-10-07 19:05:18 +00:00
Rasmus Munk Larsen
3b445d9bf2 Add generic packet ops corresponding to std::fmin and std::fmax. The nonsensical NaN-propagation rules for std::min and std::max implemented by pmin and pmax in Eigen are a longstanding source of confusion and bug reports. This change is a first step towards addressing it, as discussed in issue #564. 2020-10-01 16:54:31 +00:00
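The confusion referred to in 3b445d9bf2 comes from `std::min` being order-dependent in the presence of NaN, while `std::fmin` is not. The scalar sketch below spells out the three behaviours; the function names are hypothetical and stand in for the packet ops only by analogy.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Same rule as std::min(a, b): (b < a) ? b : a. Every comparison with NaN is
// false, so the first argument is returned whenever either operand is NaN.
inline double min_like_std(double a, double b) { return (b < a) ? b : a; }

// std::fmin semantics: if exactly one operand is NaN, return the other one.
inline double min_propagate_numbers(double a, double b) { return std::fmin(a, b); }

// NaN-propagating variant: a NaN in either operand yields NaN.
inline double min_propagate_nan(double a, double b) {
  if (std::isnan(a)) return a;
  if (std::isnan(b)) return b;
  return (b < a) ? b : a;
}

inline void min_semantics_usage() {
  const double nan = std::numeric_limits<double>::quiet_NaN();
  assert(std::isnan(min_like_std(nan, 1.0)));  // NaN first: NaN comes back
  assert(min_like_std(1.0, nan) == 1.0);       // NaN second: it is dropped
  assert(min_propagate_numbers(nan, 1.0) == 1.0);
  assert(std::isnan(min_propagate_nan(1.0, nan)));
}
```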
Antonio Sanchez
d5a0d89491 Fix alignedbox 32-bit precision test failure.
The current `test/geo_alignedbox` tests fail on 32-bit arm due to small floating-point errors.

In particular, the following is not guaranteed to hold:
```
IsometryTransform identity = IsometryTransform::Identity();
BoxType transformedC;
transformedC.extend(c.transformed(identity));
VERIFY(transformedC.contains(c));
```
since `c.transformed(identity)` is ever-so-slightly different from `c`. Instead, we replace this test with one that checks an identity transform is within floating-point precision of `c`.

Also updated the condition on `AlignedBox::transform(...)` to only accept `Affine`, `AffineCompact`, and `Isometry` modes explicitly.  Otherwise, invalid combinations of modes would also incorrectly pass the assertion.
2020-09-30 08:42:03 -07:00
Martin Pecka
6425e875a1 Added AlignedBox::transform(AffineTransform). 2020-09-28 18:06:23 +00:00
David Tellenbach
493a7c773c Remove EIGEN_CONSTEXPR from NumTraits<boost::multiprecision::number<...>> 2020-09-21 12:43:41 +02:00
Rasmus Munk Larsen
e55182ac09 Get rid of initialization logic for blueNorm by making the computed constants static const or constexpr.
Move macro definition EIGEN_CONSTEXPR to Core and make all methods in NumTraits constexpr when EIGEN_HAS_CONSTEXPR is 1.
2020-09-18 17:38:58 +00:00
Tim Shen
bb56a62582 Make bfloat16(float(-nan)) produce -nan, not nan. 2020-09-15 13:24:23 -07:00
Pedro Caldeira
35d149e34c Add missing functions for Packet8bf in Altivec architecture.
Including new tests for bfloat16 Packets.
Fix prsqrt on GenericPacketMath.
2020-09-08 09:22:11 -05:00
Everton Constantino
6fe88a3c9d MatrixProduct enhancements:
- Changes to Altivec/MatrixProduct
  Adapting code to gcc 10.
  Generic code style and performance enhancements.
  Adding PanelMode support.
  Adding stride/offset support.
  Enabling float64, std::complex and std::complex.
  Fixing lack of symm_pack.
  Enabling mixedtypes.
- Adding std::complex tests to blasutil.
- Adding an implementation of storePacketBlock when Incr!= 1.
2020-09-02 18:21:36 -03:00
Gael Guennebaud
25424d91f6 Fix #1974: assertion when reserving an empty sparse matrix 2020-08-26 12:32:20 +02:00
Deven Desai
603e213d13 Fixing a CUDA / P100 regression introduced by PR 181
PR 181 ( https://gitlab.com/libeigen/eigen/-/merge_requests/181 ) adds a `__launch_bounds__(1024)` attribute to GPU kernels that did not have that attribute explicitly specified.

That PR seems to cause regressions on the CUDA platform. This PR/commit restricts the changes in PR 181 to HIP only.
2020-08-20 00:29:57 +00:00
David Tellenbach
fe8c3ef3cb Add possibility to split test suite build targets and improved CI configuration
- Introduce CMake option `EIGEN_SPLIT_TESTSUITE` that allows dividing the single test build target into several subtargets
- Add CI pipeline for merge request that can be run by GitLab's shared runners
- Add nightly CI pipeline
2020-08-19 18:27:45 +00:00
David Tellenbach
d2bb6cf396 Fix compilation error in blasutil test 2020-08-14 18:15:18 +02:00
David Tellenbach
c6820a6316 Replace the call to int64_t in the blasutil test by explicit types
Some platforms define int64_t to be long long even for C++03. If this is
the case we miss the definition of internal::make_unsigned for this
type. If we just define the template we get duplicate-definition
errors on platforms defining int64_t as signed long for C++03.

We need to find a way to distinguish both cases at compile-time.
2020-08-14 17:24:37 +02:00
Pedro Caldeira
704798d1df Add support for Bfloat16 to use vector instructions on Altivec
architecture
2020-08-10 13:22:01 -05:00
Deven Desai
46f8a18567 Adding an explicit launch_bounds(1024) attribute for GPU kernels.
Starting with ROCm 3.5, the HIP compiler will change from HCC to hip-clang.

This compiler change introduces a change in the default value of the `__launch_bounds__` attribute associated with a GPU kernel. (default value means the value assumed by the compiler as the `__launch_bounds__` attribute value, when it is not explicitly specified by the user)

Currently (i.e. for HIP with ROCm 3.3 and older), the default value is 1024. That changes to 256 with ROCm 3.5 (i.e. hip-clang compiler). As a consequence of this change, if a GPU kernel with a `__launch_bounds__` attribute of 256 is launched at runtime with a threads_per_block value > 256, it leads to a runtime error. This is leading to a couple of Eigen unit test failures with ROCm 3.5.

This commit adds an explicit `__launch_bounds__(1024)` attribute to every GPU kernel that currently does not have it explicitly specified (and hence will end up getting the default value of 256 with the change to hip-clang)
2020-08-05 01:46:34 +00:00
David Tellenbach
c1ffe452fc Fix bfloat16 casts
If we have explicit conversion operators available (C++11) we define
explicit casts from bfloat16 to other types. If not (C++03), we don't
define conversion operators but rely on implicit conversion chains from
bfloat16 over float to other types.
2020-07-23 20:55:06 +00:00
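For the C++11 branch described in c1ffe452fc, the pattern is an explicit conversion operator on the 16-bit type. The sketch below uses a hypothetical `bf16_sketch` struct, not `Eigen::bfloat16`, and assumes `float` is IEEE 754 binary32 so that a bfloat16 is exactly its upper 16 bits.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

struct bf16_sketch {
  std::uint16_t value;  // raw bfloat16 bits

  // Widening is exact: shift the 16 bits into the top half of a binary32.
  float to_float() const {
    const std::uint32_t bits = static_cast<std::uint32_t>(value) << 16;
    float f;
    std::memcpy(&f, &bits, sizeof(f));
    return f;
  }

  // Explicit conversion operators (C++11): callers must write a cast, so the
  // type never silently converts in arithmetic or overload resolution.
  explicit operator float() const { return to_float(); }
  explicit operator double() const { return static_cast<double>(to_float()); }
};

inline void bf16_cast_usage() {
  const bf16_sketch one = {0x3f80};
  assert(static_cast<float>(one) == 1.0f);
  const bf16_sketch minus_two = {0xc000};
  assert(static_cast<double>(minus_two) == -2.0);
}
```

Without `explicit`, which C++03 lacks for conversion operators, every such operator would be usable implicitly; the commit message describes relying on a conversion chain through `float` in that case.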
Rasmus Munk Larsen
1b84f21e32 Revert change that made conversion from bfloat16 to {float, double} implicit.
Add roundtrip tests for casting between bfloat16 and complex types.
2020-07-22 18:09:00 -07:00
Niels Dekker
0e1a33a461 Faster conversion from integer types to bfloat16
Specialized `bfloat16_impl::float_to_bfloat16_rtne(float)` for normal floating point numbers, infinity and zero, in order to improve the performance of `bfloat16::bfloat16(const T&)` for integer argument types.

A reduction of more than 20% of the runtime duration of conversion from int to bfloat16 was observed, using Visual C++ 2019 on Windows 10.
2020-07-22 19:25:49 +02:00
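For readers unfamiliar with what `float_to_bfloat16_rtne` has to do, here is a generic round-to-nearest-even conversion on raw bits. It is a sketch under the assumption that `float` is IEEE 754 binary32; `float_to_bf16_rtne_sketch` is a hypothetical name, and this is the general algorithm rather than the specialized fast path that 0e1a33a461 adds. The NaN branch keeps the sign bit, in the spirit of bb56a62582.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Converts a float to bfloat16 bits, rounding to nearest with ties to even.
inline std::uint16_t float_to_bf16_rtne_sketch(float f) {
  std::uint32_t bits;
  std::memcpy(&bits, &f, sizeof(bits));
  if ((bits & 0x7fffffffu) > 0x7f800000u) {
    // NaN: keep sign and upper payload, force the quiet bit so the
    // truncated mantissa cannot become zero (which would read as infinity).
    return static_cast<std::uint16_t>((bits >> 16) | 0x0040u);
  }
  // Add 0x7fff plus the lowest kept bit: values above the halfway point round
  // up, exact ties round up only when the kept part is odd.
  const std::uint32_t lsb = (bits >> 16) & 1u;
  bits += 0x7fffu + lsb;
  return static_cast<std::uint16_t>(bits >> 16);
}

inline void bf16_rtne_usage() {
  assert(float_to_bf16_rtne_sketch(1.0f) == 0x3f80u);
  assert(float_to_bf16_rtne_sketch(1.00390625f) == 0x3f80u);  // tie, even stays
  assert(float_to_bf16_rtne_sketch(1.01171875f) == 0x3f82u);  // tie, odd rounds up
}
```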
Niels Dekker
4ab32e2de2 Allow implicit conversion from bfloat16 to float and double
Conversion from `bfloat16` to `float` and `double` is lossless. It seems natural to allow the conversion to be implicit, as the C++ language also supports implicit conversion from a smaller to a larger floating point type.

Intel's oneDNN bfloat16 implementation also has an implicit `operator float()`: https://github.com/oneapi-src/oneDNN/blob/v1.5/src/common/bfloat16.hpp
2020-07-11 13:32:28 +02:00
Rasmus Munk Larsen
dcf7655b3d Guard operator<< test by EIGEN_NO_IO. 2020-07-09 19:54:48 +00:00
Rasmus Munk Larsen
fb77b7288c Add operator<< to print a quaternion. 2020-07-09 12:49:58 -07:00
David Tellenbach
ee4715ff48 Fix test basic stuff
- Guard fundamental types that are not available pre C++11
- Separate subsequent angle brackets >> by spaces
- Allow casting of Eigen::half and Eigen::bfloat16 to complex types
2020-07-09 17:24:00 +00:00
Rasmus Munk Larsen
6964ae8d52 Change the sign operator in Eigen to return NaN for NaN arguments, not zero. 2020-07-07 01:54:04 +00:00
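The behaviour change in 6964ae8d52 can be stated as a scalar function. This is a sketch for floating-point types with a hypothetical name, `sign_sketch`, not Eigen's implementation.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Returns -1, 0, or +1 for ordinary inputs and passes NaN through unchanged.
// Without the NaN check, both comparisons below are false for NaN and the
// result would be 0.
template <typename T>
T sign_sketch(T x) {
  if (std::isnan(x)) return x;
  return static_cast<T>((T(0) < x) - (x < T(0)));
}

inline void sign_usage() {
  assert(sign_sketch(-3.5) == -1.0);
  assert(sign_sketch(0.0) == 0.0);
  assert(sign_sketch(2.0f) == 1.0f);
  assert(std::isnan(sign_sketch(std::numeric_limits<double>::quiet_NaN())));
}
```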
David Tellenbach
cb63153183 Make test packetmath C++98 compliant 2020-07-01 20:41:59 +02:00
Kan Chen
8731452b97 Delete duplicate test cases in vectorization_logic.cpp 2020-07-01 00:51:15 +00:00
Antonio Sanchez
9cb8771e9c Fix tensor casts for large packets and casts to/from std::complex
The original tensor casts were only defined for
`SrcCoeffRatio`:`TgtCoeffRatio` 1:1, 1:2, 2:1, 4:1. Here we add the
missing 1:N and 8:1.

We also add the previously missing casts of `Eigen::half` to/from
`std::complex<T>`, for consistency with `Eigen::bfloat16`, and
generalize the overload to work for any complex type.

Tests were added to `basicstuff`, `packetmath`, and
`cxx11_tensor_casts` to test all cast configurations.
2020-06-30 18:53:55 +00:00
Antonio Sanchez
145e51516f Fix denormal check pre c++11.
`float_denorm_style` is an old-style `enum`, so the `denorm_present`
symbol only exists in the `std` namespace prior to c++11.
2020-06-30 17:28:30 +00:00
David Tellenbach
689b57070d Report custom C++ flags in CMake testing summary 2020-06-30 17:18:54 +00:00
Antonio Sanchez
7222f0b6b5 Fix packetmath_1 float tests for arm/aarch64.
Added missing `pmadd<Packet2f>` for NEON. This significantly improves
precision over the previous `pmul+padd`, which was causing
the `pcos` tests to fail. Also added an approx test with
`std::sin`/`std::cos`, since otherwise returning any values with
`a^2+b^2=1` would pass.

Modified `log(denorm)` tests.  Denorms are not always supported by all
systems (returns `::min`), are always flushed to zero on 32-bit arm,
and configurably flush to zero on sse/avx/aarch64. This leads to
inconsistent results across different systems (i.e. `-inf` vs `nan`).
Added a check for existence and exclude ARM.

Removed logistic exactness test, since scalar and vectorized versions
follow different code-paths due to differences in `pexp` and `pmadd`,
which result in slightly different values. For example, exactness always
fails on arm, aarch64, and altivec.
2020-06-24 14:03:35 -07:00
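Why a fused `pmadd` is more precise than `pmul` followed by `padd` can be seen in scalar code with `std::fma`, which rounds once instead of twice. The helper below is hypothetical and only demonstrates the effect; it says nothing about how the NEON op is written.

```cpp
#include <cassert>
#include <cmath>

// Rounding error of the product a * b, recovered with a fused multiply-add.
// p is the rounded product; fma(a, b, -p) evaluates a * b - p with a single
// rounding, so the part that the plain multiplication discarded survives.
inline double product_rounding_error(double a, double b) {
  const double p = a * b;
  return std::fma(a, b, -p);
}

inline void fma_usage() {
  const double eps = std::ldexp(1.0, -30);  // 2^-30
  // (1 + eps) * (1 - eps) = 1 - 2^-60 exactly, which rounds to 1.0 in double.
  const double err = product_rounding_error(1.0 + eps, 1.0 - eps);
  assert(err == -std::ldexp(1.0, -60));
}
```

An unfused multiply and add would first round the product to 1.0 and then return 0 here, which is the kind of lost precision that made the `pcos` tests fail.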
Antonio Sanchez
03ebdf6acb Added missing NEON pcasts, update packetmath tests.
The NEON `pcast` operators are all implemented and tested for existing
packets. This requires adding a `pcast(a,b,c,d,e,f,g,h)` for casting
between `int64_t` and `int8_t` in `GenericPacketMath.h`.

Removed incorrect `HasHalfPacket`  definition for NEON's
`Packet2l`/`Packet2ul`.

Adjustments were also made to the `packetmath` tests. These include
- minor bug fixes for cast tests (i.e. 4:1 casts, only casting for
  packets that are vectorizable)
- added 8:1 cast tests
- random number generation
  - original had uninteresting 0 to 0 casts for many casts between
    floating-point and integers, and exhibited signed overflow
    undefined behavior

Tested:
```
$ aarch64-linux-gnu-g++ -static -I./ '-DEIGEN_TEST_PART_ALL=1' test/packetmath.cpp -o packetmath
$ adb push packetmath /data/local/tmp/
$ adb shell "/data/local/tmp/packetmath"
```
2020-06-21 09:32:31 -07:00
Teng Lu
386d809bde Support BFloat16 in Eigen 2020-06-20 19:16:24 +00:00
Sebastien Boisvert
39cbd6578f Fix #1911: add benchmark for move semantics with fixed-size matrix
$ clang++ -O3 bench/bench_move_semantics.cpp -I. -std=c++11 \
        -o bench_move_semantics

$ ./bench_move_semantics
float copy semantics: 1755.97 ms
float move semantics: 55.063 ms
double copy semantics: 2457.65 ms
double move semantics: 55.034 ms
2020-06-11 23:43:25 +00:00
Antonio Sanchez
a7d2552af8 Remove HasCast and fix packetmath cast tests.
The use of the `packet_traits<>::HasCast` field is currently inconsistent with
`type_casting_traits<>`, and is unused apart from within
`test/packetmath.cpp`. In addition, those packetmath cast tests do not
currently reflect how casts are performed in practice: they ignore the
`SrcCoeffRatio` and `TgtCoeffRatio` fields, assuming a 1:1 ratio.

Here we remove the unused `HasCast`, and modify the packet cast tests to
better reflect their usage.
2020-06-11 17:26:56 +00:00
Sebastien Boisvert
463ec86648 Fix #1757: remove the word 'suicide' 2020-06-11 00:56:54 +00:00
Rasmus Munk Larsen
c2ab36f47a Fix broken packetmath test for logistic on Arm. 2020-06-04 16:24:47 -07:00
Gael Guennebaud
029a76e115 Bug #1777: make the scalar and packet path consistent for the logistic function + respective unit test 2020-05-31 00:53:37 +02:00
Gael Guennebaud
ab615e4114 Save one extra temporary when assigning a sparse product to a row-major sparse matrix 2020-05-30 23:15:12 +02:00
David Tellenbach
5328cd62b3 Guard usage of decltype since it's a C++11 feature
This fixes https://gitlab.com/libeigen/eigen/-/issues/1897
2020-05-20 16:04:16 +02:00
Rasmus Munk Larsen
cc86a31e20 Add guard around specialization for bool, which is only currently implemented for SSE. 2020-05-19 16:21:56 -07:00
Everton Constantino
8a7f360ec3 - Vectorizing MMA packing.
- Optimizing MMA kernel.
- Adding PacketBlock store to blas_data_mapper.
2020-05-19 19:24:11 +00:00
Rasmus Munk Larsen
9b411757ab Add missing packet ops for bool, and make it pass the same packet op unit tests as other arithmetic types.
This change also contains a few minor cleanups:
  1. Remove packet op pnot, which is not needed for anything other than pcmp_le_or_nan,
     which can be done in other ways.
  2. Remove the "HasInsert" enum, which is no longer needed since we removed the
     corresponding packet ops.
  3. Add faster pselect op for Packet4i when SSE4.1 is supported.

Among other things, this makes the fast transposeInPlace() method available for Matrix<bool>.

Run on ************** (72 X 2994 MHz CPUs); 2020-05-09T10:51:02.372347913-07:00
CPU: Intel Skylake Xeon with HyperThreading (36 cores) dL1:32KB dL2:1024KB dL3:24MB
Benchmark                        Time(ns)        CPU(ns)     Iterations
-----------------------------------------------------------------------
BM_TransposeInPlace<float>/4            9.77           9.77    71670320
BM_TransposeInPlace<float>/8           21.9           21.9     31929525
BM_TransposeInPlace<float>/16          66.6           66.6     10000000
BM_TransposeInPlace<float>/32         243            243        2879561
BM_TransposeInPlace<float>/59         844            844         829767
BM_TransposeInPlace<float>/64         933            933         750567
BM_TransposeInPlace<float>/128       3944           3945         177405
BM_TransposeInPlace<float>/256      16853          16853          41457
BM_TransposeInPlace<float>/512     204952         204968           3448
BM_TransposeInPlace<float>/1k     1053889        1053861            664
BM_TransposeInPlace<bool>/4            14.4           14.4     48637301
BM_TransposeInPlace<bool>/8            36.0           36.0     19370222
BM_TransposeInPlace<bool>/16           31.5           31.5     22178902
BM_TransposeInPlace<bool>/32          111            111        6272048
BM_TransposeInPlace<bool>/59          626            626        1000000
BM_TransposeInPlace<bool>/64          428            428        1632689
BM_TransposeInPlace<bool>/128        1677           1677         417377
BM_TransposeInPlace<bool>/256        7126           7126          96264
BM_TransposeInPlace<bool>/512       29021          29024          24165
BM_TransposeInPlace<bool>/1k       116321         116330           6068
2020-05-14 22:39:13 +00:00
Felipe Attanasio
d640276d31 Added support for reverse iterators for Vectorwise operations. 2020-05-14 22:38:20 +00:00
Christopher Moore
fa8fd4b4d5 Indexed view should have RowMajorBit when there is statically a single row 2020-05-14 22:11:19 +00:00
Christopher Moore
a187ffea28 Resolve "IndexedView of a vector should allow linear access" 2020-05-13 19:24:42 +00:00
Rasmus Munk Larsen
c1d944dd91 Remove packet ops pinsertfirst and pinsertlast that are only used in a single place, and can be replaced by other ops when constructing the first/final packet in linspaced_op_impl::packetOp.
I cannot measure any performance changes for SSE, AVX, or AVX512.

name                                 old time/op             new time/op             delta
BM_LinSpace<float>/1                 1.63ns ± 0%             1.63ns ± 0%   ~             (p=0.762 n=5+5)
BM_LinSpace<float>/8                 4.92ns ± 3%             4.89ns ± 3%   ~             (p=0.421 n=5+5)
BM_LinSpace<float>/64                34.6ns ± 0%             34.6ns ± 0%   ~             (p=0.841 n=5+5)
BM_LinSpace<float>/512                217ns ± 0%              217ns ± 0%   ~             (p=0.421 n=5+5)
BM_LinSpace<float>/4k                1.68µs ± 0%             1.68µs ± 0%   ~             (p=1.000 n=5+5)
BM_LinSpace<float>/32k               13.3µs ± 0%             13.3µs ± 0%   ~             (p=0.905 n=5+4)
BM_LinSpace<float>/256k               107µs ± 0%              107µs ± 0%   ~             (p=0.841 n=5+5)
BM_LinSpace<float>/1M                 427µs ± 0%              427µs ± 0%   ~             (p=0.690 n=5+5)
2020-05-08 15:41:50 -07:00
Rasmus Munk Larsen
225ab040e0 Remove unused packet op "palign".
Clean up a compiler warning in c++03 mode in AVX512/Complex.h.
2020-05-07 17:14:26 -07:00
Rasmus Munk Larsen
74ec8e6618 Make size odd for transposeInPlace test to make sure we hit the scalar path. 2020-05-07 17:29:56 +00:00
Rasmus Munk Larsen
ab773c7e91 Extend support for Packet16b:
* Add ptranspose<*,4> to support matmul and add unit test for Matrix<bool> * Matrix<bool>
* Work around a bug in slicing of Tensor<bool>.
* Add tensor tests

This speeds up matmul for boolean matrices by about 10x

name                            old time/op             new time/op             delta
BM_MatMul<bool>/8                267ns ± 0%              479ns ± 0%  +79.25%          (p=0.008 n=5+5)
BM_MatMul<bool>/32              6.42µs ± 0%             0.87µs ± 0%  -86.50%          (p=0.008 n=5+5)
BM_MatMul<bool>/64              43.3µs ± 0%              5.9µs ± 0%  -86.42%          (p=0.008 n=5+5)
BM_MatMul<bool>/128              315µs ± 0%               44µs ± 0%  -85.98%          (p=0.008 n=5+5)
BM_MatMul<bool>/256             2.41ms ± 0%             0.34ms ± 0%  -85.68%          (p=0.008 n=5+5)
BM_MatMul<bool>/512             18.8ms ± 0%              2.7ms ± 0%  -85.53%          (p=0.008 n=5+5)
BM_MatMul<bool>/1k               149ms ± 0%               22ms ± 0%  -85.40%          (p=0.008 n=5+5)
2020-04-28 16:12:47 +00:00
Rasmus Munk Larsen
b47c777993 Block transposeInPlace() when the matrix is real and square. This yields a large speedup because we transpose in registers (or L1 if we spill), instead of one packet at a time, which in the worst case makes the code write to the same cache line PacketSize times instead of once.
rmlarsen@rmlarsen4:.../eigen_bench/google3$ benchy --benchmarks=.*TransposeInPlace.*float.* --reference=srcfs experimental/users/rmlarsen/bench:matmul_bench
(Generated by http://go/benchy. Settings: --runs 5 --benchtime 1s --reference "srcfs" --benchmarks ".*TransposeInPlace.*float.*" experimental/users/rmlarsen/bench:matmul_bench)

name                                       old time/op             new time/op             delta
BM_TransposeInPlace<float>/4               9.84ns ± 0%             6.51ns ± 0%  -33.80%          (p=0.008 n=5+5)
BM_TransposeInPlace<float>/8               23.6ns ± 1%             17.6ns ± 0%  -25.26%          (p=0.016 n=5+4)
BM_TransposeInPlace<float>/16              78.8ns ± 0%             60.3ns ± 0%  -23.50%          (p=0.029 n=4+4)
BM_TransposeInPlace<float>/32               302ns ± 0%              229ns ± 0%  -24.40%          (p=0.008 n=5+5)
BM_TransposeInPlace<float>/59              1.03µs ± 0%             0.84µs ± 1%  -17.87%          (p=0.016 n=5+4)
BM_TransposeInPlace<float>/64              1.20µs ± 0%             0.89µs ± 1%  -25.81%          (p=0.008 n=5+5)
BM_TransposeInPlace<float>/128             8.96µs ± 0%             3.82µs ± 2%  -57.33%          (p=0.008 n=5+5)
BM_TransposeInPlace<float>/256              152µs ± 3%               17µs ± 2%  -89.06%          (p=0.008 n=5+5)
BM_TransposeInPlace<float>/512              837µs ± 1%              208µs ± 0%  -75.15%          (p=0.008 n=5+5)
BM_TransposeInPlace<float>/1k              4.28ms ± 2%             1.08ms ± 2%  -74.72%          (p=0.008 n=5+5)
2020-04-28 16:08:16 +00:00
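The blocking idea described in the commit above can be sketched in plain C++. This is only an illustration under stated assumptions, not Eigen's implementation: the helper name `blocked_transpose_in_place`, the row-major `std::vector<float>` storage, and the tile size `B` are all hypothetical, and the real code works on SIMD packets held in registers rather than scalar swaps.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper (not Eigen's code): transposes a square n x n
// row-major matrix in place, one B x B tile pair at a time, so that each
// tile's cache lines are touched together instead of once per element.
// Assumes a.size() == n * n and B > 0.
void blocked_transpose_in_place(std::vector<float>& a, std::size_t n,
                                std::size_t B = 8) {
  for (std::size_t bi = 0; bi < n; bi += B) {
    const std::size_t i_end = std::min(bi + B, n);
    // Diagonal tile: swap each element above the diagonal with its mirror.
    for (std::size_t i = bi; i < i_end; ++i)
      for (std::size_t j = i + 1; j < i_end; ++j)
        std::swap(a[i * n + j], a[j * n + i]);
    // Off-diagonal tiles: swap tile (bi, bj) with its mirror tile (bj, bi).
    for (std::size_t bj = bi + B; bj < n; bj += B) {
      const std::size_t j_end = std::min(bj + B, n);
      for (std::size_t i = bi; i < i_end; ++i)
        for (std::size_t j = bj; j < j_end; ++j)
          std::swap(a[i * n + j], a[j * n + i]);
    }
  }
}
```

The `std::min` bounds handle sizes that are not a multiple of the tile size, such as the 59 case in the benchmark table.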
Rasmus Munk Larsen
e80ec24357 Remove unused packet op "preduxp". 2020-04-23 18:17:14 +00:00
Rasmus Munk Larsen
2f6ddaa25c Add partial vectorization for matrices and tensors of bool. This speeds up boolean operations on Tensors by up to 25x.
Benchmark numbers for the logical AND of two NxN tensors:

name                                               old time/op             new time/op             delta
BM_booleanAnd_1T/3   [using 1 threads]             14.6ns ± 0%             14.4ns ± 0%   -0.96%
BM_booleanAnd_1T/4   [using 1 threads]             20.5ns ±12%              9.0ns ± 0%  -56.07%
BM_booleanAnd_1T/7   [using 1 threads]             41.7ns ± 0%             10.5ns ± 0%  -74.87%
BM_booleanAnd_1T/8   [using 1 threads]             52.1ns ± 0%             10.1ns ± 0%  -80.59%
BM_booleanAnd_1T/10  [using 1 threads]             76.3ns ± 0%             13.8ns ± 0%  -81.87%
BM_booleanAnd_1T/15  [using 1 threads]              167ns ± 0%               16ns ± 0%  -90.45%
BM_booleanAnd_1T/16  [using 1 threads]              188ns ± 0%               16ns ± 0%  -91.57%
BM_booleanAnd_1T/31  [using 1 threads]              667ns ± 0%               34ns ± 0%  -94.83%
BM_booleanAnd_1T/32  [using 1 threads]              710ns ± 0%               35ns ± 0%  -95.01%
BM_booleanAnd_1T/64  [using 1 threads]             2.80µs ± 0%             0.11µs ± 0%  -95.93%
BM_booleanAnd_1T/128 [using 1 threads]             11.2µs ± 0%              0.4µs ± 0%  -96.11%
BM_booleanAnd_1T/256 [using 1 threads]             44.6µs ± 0%              2.5µs ± 0%  -94.31%
BM_booleanAnd_1T/512 [using 1 threads]              178µs ± 0%               10µs ± 0%  -94.35%
BM_booleanAnd_1T/1k  [using 1 threads]              717µs ± 0%               78µs ± 1%  -89.07%
BM_booleanAnd_1T/2k  [using 1 threads]             2.87ms ± 0%             0.31ms ± 1%  -89.08%
BM_booleanAnd_1T/4k  [using 1 threads]             11.7ms ± 0%              1.9ms ± 4%  -83.55%
BM_booleanAnd_1T/10k [using 1 threads]             70.3ms ± 0%             17.2ms ± 4%  -75.48%
2020-04-20 20:16:28 +00:00
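To illustrate why processing several bools per operation helps, here is a minimal sketch that ANDs eight bools at a time through a 64-bit word. It is a stand-in for the idea only: Eigen uses SIMD packets, whereas this hypothetical `bool_and` helper uses ordinary integer registers, and it assumes `sizeof(bool) == 1` and that every stored bool has the byte value 0 or 1.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper (not Eigen's packet code): out[i] = a[i] && b[i].
// Assumes each bool occupies one byte holding exactly 0 or 1, so a bitwise
// AND of the raw bytes equals the logical AND of the values.
void bool_and(const bool* a, const bool* b, bool* out, std::size_t n) {
  static_assert(sizeof(bool) == 1, "sketch assumes one-byte bool");
  std::size_t i = 0;
  // Main loop: load 8 bools from each input, AND them in one operation.
  for (; i + 8 <= n; i += 8) {
    std::uint64_t x, y;
    std::memcpy(&x, a + i, 8);
    std::memcpy(&y, b + i, 8);
    x &= y;
    std::memcpy(out + i, &x, 8);
  }
  // Scalar tail for the remaining n % 8 elements.
  for (; i < n; ++i) out[i] = a[i] && b[i];
}
```

The scalar tail plays the same role as the non-vectorized remainder in a partially vectorized kernel.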
Christoph Hertzberg
d46d726e9d CommaInitializer wrongfully asserted for 0-sized blocks
The commainitializer unit test never actually called `test_block_recursion`, which was also incorrectly implemented and would have caused excessively deep template recursion.
2020-04-13 16:41:20 +02:00
Antonio Sanchez
8e875719b3 Replace norm() with squaredNorm() to address integer overflows
For random matrices with integer coefficients, many of the tests here lead to
integer overflows. When taking the norm() of a row/column, the intermediate
squaredNorm() often overflows to a negative value, leading to a domain error
when taking the sqrt(). This causes a crash on some systems. Replacing the
norm() call with squaredNorm() does not stop the values from overflowing, but
it avoids the domain error.

Addresses https://gitlab.com/libeigen/eigen/-/issues/1856
2020-04-07 19:48:28 +00:00
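The failure mode described in the commit above can be reproduced without Eigen. The sketch below is an assumption-laden illustration: `wrapped_squared_norm` and `norm_from` are hypothetical helpers, and the wraparound is emulated in unsigned arithmetic because signed overflow is undefined behavior in C++. It also assumes a 32-bit `int` and two's-complement conversion back to `std::int32_t`.

```cpp
#include <cmath>
#include <cstdint>
#include <vector>

// Hypothetical helper: sum of squares with 32-bit wraparound. The sum is
// accumulated as unsigned (well-defined modular arithmetic) and converted
// back to a signed value at the end, which can yield a negative result.
std::int32_t wrapped_squared_norm(const std::vector<std::int32_t>& v) {
  std::uint32_t acc = 0;
  for (std::int32_t x : v) {
    const std::uint32_t ux = static_cast<std::uint32_t>(x);
    acc += ux * ux;
  }
  return static_cast<std::int32_t>(acc);
}

// Hypothetical helper: the square-root step that a norm() would add on top.
// A negative argument is a domain error for sqrt and produces NaN.
double norm_from(std::int32_t squared_norm) {
  return std::sqrt(static_cast<double>(squared_norm));
}
```

For the vector {40000, 40000} the true sum of squares is 3,200,000,000, which exceeds the largest 32-bit signed value, so the wrapped result is negative and `norm_from` returns NaN. Comparing the wrapped squared norms directly skips the square root and therefore the domain error.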
Rasmus Munk Larsen
4fd5d1477b Fix packetmath test build for AVX. 2020-03-27 17:05:39 +00:00
Rasmus Munk Larsen
55c8fe8d0f Fix bug in 52d54278be 2020-03-27 16:41:15 +00:00
Joel Holdsworth
52d54278be Additional NEON packet-math operations 2020-03-26 20:18:19 +00:00
Aaron Franke
5c22c7a7de Make file formatting comply with POSIX and Unix standards
UTF-8, LF, no BOM, and newlines at the end of files
2020-03-23 18:09:02 +00:00
Joel Holdsworth
d5c665742b Add absolute_difference coefficient-wise binary Array function 2020-03-19 17:45:20 +00:00
Joel Holdsworth
54aa8fa186 Implement integer square-root for NEON 2020-03-19 17:05:13 +00:00
Joel Holdsworth
88337acae2 test/packetmath: Add tests for all integer types 2020-03-10 22:46:19 +00:00
Joel Holdsworth
9e68977578 test/packetmath: Made negate non-mandatory 2020-03-10 22:46:19 +00:00
Rasmus Munk Larsen
6ac37768a9 Revert "add some static checks for packet-picking logic"
This reverts commit 7769600245
2020-02-25 01:07:04 +00:00
Rasmus Munk Larsen
87cfa4862f Revert "Disable test in test/vectorization_logic.cpp, which is currently failing with AVX."
This reverts commit b625adffd8
2020-02-25 01:04:56 +00:00
Rasmus Munk Larsen
b625adffd8 Disable test in test/vectorization_logic.cpp, which is currently failing with AVX. 2020-02-24 23:28:25 +00:00
Francesco Mazzoli
7769600245 add some static checks for packet-picking logic 2020-02-07 18:16:16 +01:00
Christoph Hertzberg
1d0c45122a Removing executable bit from file mode 2020-01-11 15:02:29 +01:00
Christoph Hertzberg
35219cea68 Bug #1790: Make areApprox check numext::isnan instead of bitwise equality (NaNs don't have to be bitwise equal). 2020-01-11 14:57:22 +01:00
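A small sketch of why a bitwise comparison is the wrong test for NaNs. The helper names are hypothetical and this is not the test suite's actual `areApprox`; it uses `std::isnan` in place of `numext::isnan` and assumes `float` is an IEEE 754 binary32 value.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

// Hypothetical helpers for moving between a float and its bit pattern.
std::uint32_t bits_of(float x) {
  std::uint32_t u;
  std::memcpy(&u, &x, sizeof u);
  return u;
}

float float_from_bits(std::uint32_t u) {
  float x;
  std::memcpy(&x, &u, sizeof x);
  return x;
}

// Bitwise comparison: two NaNs with different payloads compare unequal.
bool bitwise_equal(float a, float b) { return bits_of(a) == bits_of(b); }

// NaN-aware comparison: any two NaNs match; otherwise compare by value.
bool nan_aware_equal(float a, float b) {
  return (std::isnan(a) && std::isnan(b)) || a == b;
}
```

The patterns 0x7fc00000 and 0x7fc00001 are both quiet NaNs that differ only in their payload bits, so `bitwise_equal` rejects the pair while `nan_aware_equal` accepts it.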
Srinivas Vasudevan
2e099e8d8f Added special_packetmath test and tweaked bounds on tests.
Refactored shared packetmath code into a header file.
(Squashed from PR !38)
2020-01-11 10:31:21 +00:00
Christoph Hertzberg
8333e03590 Use data.data() instead of &data (since it is not obvious that Array is trivially copyable) 2020-01-09 11:38:19 +01:00