3098 Commits

Author SHA1 Message Date
Mehdi Goli
b523120687 [SYCL-2020 Support] Enabling Intel DPCPP Compiler support to Eigen 2023-01-16 07:04:08 +00:00
Charles Schlosser
fa0bd2c34e improve sparse permutations 2023-01-15 03:21:25 +00:00
Antonio Sánchez
262194f12c Fix a bunch of minor build and test issues. 2023-01-06 16:37:26 +00:00
Antonio Sánchez
551eebc8ca Add synchronize method to all devices. 2022-11-29 19:35:02 +00:00
Chris
6728683938 Small cleanup of IDRS.h 2022-11-16 13:51:23 +00:00
Antonio Sánchez
e5794873cb Replace assert with eigen_assert. 2022-10-04 17:11:23 +00:00
Rasmus Munk Larsen
3c4637640b Remove unused typedef. 2022-09-23 19:11:31 +00:00
Chao Chen
5ffe7b92e0 [ROCm] fixed gpuGetDevice unused message 2022-09-20 21:38:20 +00:00
chuckyschluz
8acbf5c11c re-enable pow for complex types 2022-08-26 17:29:02 -04:00
Charles Schlosser
76a669fb45 add fixed power unary operation 2022-08-16 21:32:36 +00:00
Romain Biessy
2f7cce2dd5 [SYCL] Fix some SYCL tests 2022-08-16 17:37:54 +00:00
Antonio Sánchez
b8e93bf589 Eliminate bool bitwise warnings. 2022-08-09 22:42:30 +00:00
Julian Kent
69714ff613 Add Sparse Subset of Matrix Inverse 2022-07-28 18:04:35 +00:00
Antonio Sánchez
e1165dbf9a AutoDiff depends on Core, so include appropriate header. 2022-07-09 23:57:09 +00:00
Antonio Sánchez
bb51d9f4fa Fix ODR violations. 2022-07-09 04:56:36 +00:00
Antonio Sanchez
0e18714167 Fix clang-tidy warnings about function definitions in headers. 2022-06-24 15:10:58 +00:00
Antonio Sánchez
8ed3b9dcd6 Skip f16/bf16 bessel specializations on AVX512 if unavailable. 2022-06-24 15:10:36 +00:00
Antonio Sánchez
8c2e0e3cb8 Fix ambiguous comparisons for c++20 (again again) 2022-06-07 17:06:17 +00:00
Antonio Sánchez
76cf6204f3 Revert "Fix c++20 ambiguity of comparisons."
This reverts commit 4f6354128f
2022-06-04 02:32:10 +00:00
Antonio Sánchez
4f6354128f Fix c++20 ambiguity of comparisons. 2022-06-03 05:11:07 +00:00
Oleg Shirokobrod
f542b0a71f Adding an MKL adapter in FFT module. 2022-06-02 18:10:43 +00:00
Mario Rincon-Nigro
e99163e732 fix: issue 2481: LDLT produce wrong results with AutoDiffScalar 2022-05-25 15:26:10 +00:00
Antonio Sánchez
477eb7f630 Revert "Avoid ambiguous Tensor comparison operators for C++20 compatibility"
This reverts commit 5c2179b6c3
2022-05-24 16:09:59 +00:00
Mehdi Goli
c5a5ac680c [SYCL] SYCL-2020 range does not have default constructor. 2022-05-24 03:11:46 +00:00
Benjamin Kramer
5c2179b6c3 Avoid ambiguous Tensor comparison operators for C++20 compatibility 2022-05-23 17:36:03 +00:00
Chip Kerchner
aa8b7e2c37 Add subMappers to Power GEMM packing - simplifies the address calculations (10% faster) 2022-05-23 15:18:29 +00:00
Mehdi Goli
cbe03f3531 [SYCL] Extending SYCL queue interface extension. 2022-05-23 14:45:27 +00:00
Eisuke Kawashima
ac5c83a3f5 unset executable flag 2022-05-22 22:47:43 +09:00
Tobias Wood
a9868bd5be Add arg() to tensor 2022-05-20 03:33:01 +00:00
Antonio Sánchez
9b9496ad98 Revert "Add AVX512 optimizations for matrix multiply"
This reverts commit 25db0b4a82
2022-05-13 18:50:33 +00:00
aaraujom
25db0b4a82 Add AVX512 optimizations for matrix multiply 2022-05-12 23:41:19 +00:00
Guoqiang QI
00b75375e7 Adding PocketFFT support in FFT module since kissfft has some flaws in accuracy and performance 2022-05-11 17:44:22 +00:00
Rasmus Munk Larsen
73d65dbc43 Update README.md. Remove obsolete comment about RowMajor not being fully supported. 2022-05-06 18:19:35 +00:00
Antonio Sánchez
f7b31f864c Revert "Replace call to FixedDimensions() with a singleton instance of"
This reverts commit 19e6496ce0
2022-04-10 15:30:33 +00:00
Tobias Schlüter
f3ba220c5d Remove EIGEN_EMPTY_STRUCT_CTOR 2022-04-08 18:27:26 +00:00
Antonio Sánchez
5ed7a86ae9 Fix MSVC+CUDA issues. 2022-04-08 18:05:32 +00:00
Erik Schultheis
e1df3636b2 More constexpr helpers 2022-04-04 18:38:34 +00:00
Erik Schultheis
64909b82bd static const class members turned into constexpr 2022-04-04 17:33:33 +00:00
Antonio Sanchez
9bc9992dd3 Eliminate trace unused warning. 2022-03-29 22:04:50 +00:00
Erik Schultheis
b9d2900e8f added a missing typename and fixed an unused typedef warning 2022-03-24 12:07:18 +02:00
Essex Edwards
cd3c81c3bc Add a NNLS solver to unsupported - issue #655 2022-03-23 20:20:44 +00:00
Romain Biessy
f2a3e03e9b Fix usages of wrong namespace 2022-03-21 15:07:53 +00:00
Erik Schultheis
421cbf0866 Replace Eigen type metaprogramming with corresponding std types and make use of alias templates 2022-03-16 16:43:40 +00:00
Antonio Sánchez
9296bb4b93 Fix edge-case in zeta for large inputs. 2022-03-08 21:21:20 +00:00
Antonio Sánchez
008ff3483a Fix broken tensor executor test, allow tensor packets of size 1. 2022-03-07 20:30:37 +00:00
Antonio Sánchez
d819a33bf6 Remove poor non-convergence checks in NonLinearOptimization. 2022-03-02 19:31:20 +00:00
Antonio Sanchez
1c2690ed24 Adjust tolerance of matrix_power test for MSVC. 2022-03-01 23:33:05 +00:00
Antonio Sánchez
ae86a146b1 Modify test expression to avoid numerical differences (#2402). 2022-02-23 16:37:03 +00:00
Romain Biessy
2dd879d4b0 [SYCL] Fix CMake for SYCL support 2022-02-22 16:53:27 +00:00
Antonio Sanchez
bded5028a5 Fix ODR failures in TensorRandom. 2022-02-11 23:28:33 -08:00
Rasmus Munk Larsen
18eab8f997 Add convenience method constexpr std::size_t size() const to Eigen::IndexList 2022-02-12 04:23:03 +00:00
Antonio Sánchez
9441d94dcc Revert "Make fixed-size Matrix and Array trivially copyable after C++20"
This reverts commit 47eac21072
2022-02-05 04:40:29 +00:00
Antonio Sánchez
cafeadffef Fix ODR violations. 2022-02-04 19:01:07 +00:00
Rasmus Munk Larsen
ea2c02060c Add reciprocal packet op and fast specializations for float with SSE, AVX, and AVX512. 2022-01-21 23:49:18 +00:00
Erik Schultheis
970640519b Cleanup 2022-01-21 01:48:59 +00:00
Kolja Brix
8d81a2339c Reduce usage of reserved names 2022-01-10 20:53:29 +00:00
Matthias Möller
c9df98b071 Fix Gcc8.5 warning about missing base class initialisation (#2404) 2022-01-07 19:16:53 +00:00
Lingzhu Xiang
47eac21072 Make fixed-size Matrix and Array trivially copyable after C++20
Making them trivially copyable allows using std::memcpy() without undefined
behaviors.

Only Matrix and Array with trivially copyable DenseStorage are marked as
trivially copyable with an additional type trait.

As described in http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p0848r3.html
it requires extremely verbose SFINAE to make the special member functions of
fixed-size Matrix and Array trivial, unless C++20 concepts are available to
simplify the selection of trivial special member functions given template
parameters. Therefore only make this feature available to compilers that support
C++20 P0848R3.

Fix #1855.
2022-01-07 19:04:35 +00:00
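The trivially-copyable property that the commit above relies on can be illustrated with a minimal sketch. It does not reproduce Eigen's implementation or the C++20 P0848R3 mechanism; `TrivialVec3`, `NonTrivialVec3`, and `copy_via_memcpy` are hypothetical names used only for illustration, and the sketch only assumes the standard `std::is_trivially_copyable` trait.

```cpp
#include <cassert>
#include <cstring>
#include <type_traits>

// Hypothetical fixed-size storage whose copy operations are implicitly
// defined, so the type is trivially copyable.
struct TrivialVec3 {
  float data[3];
};

// Hypothetical counterpart with a user-provided copy constructor.
// The copy does the same work, but the type is no longer trivially copyable.
struct NonTrivialVec3 {
  float data[3];
  NonTrivialVec3() = default;
  NonTrivialVec3(const NonTrivialVec3& other) {
    for (int i = 0; i < 3; ++i) data[i] = other.data[i];
  }
};

// Copies an object byte by byte. The static_assert rejects types for which
// std::memcpy would be undefined behavior.
template <typename T>
T copy_via_memcpy(const T& src) {
  static_assert(std::is_trivially_copyable<T>::value,
                "memcpy requires a trivially copyable type");
  T dst;
  std::memcpy(&dst, &src, sizeof(T));
  // Sanity check: the byte images match after the copy.
  assert(std::memcmp(&dst, &src, sizeof(T)) == 0);
  return dst;
}
```

Calling `copy_via_memcpy` on a `NonTrivialVec3` fails to compile, which is the kind of guard a type trait makes possible.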
Erik Schultheis
c20e908ebc turn some macros into constexpr functions 2021-12-10 19:27:01 +00:00
Erik Schultheis
c35679af27 fixed customIndices2Array forgetting first index 2021-12-10 16:41:59 +00:00
Erik Schultheis
e4c40b092a disambiguate overloads for empty index list 2021-12-07 19:40:09 +00:00
Jens Wehner
c6fa0ca162 Idrsstabl 2021-12-06 20:00:00 +00:00
Erik Schultheis
cc11e240ac Some further cleanup 2021-12-06 18:01:15 +00:00
Erik Schultheis
cd83f34d3a fix typo StableNorm -> stableNorm 2021-12-04 14:52:09 +00:00
Jens Wehner
4ee2e9b340 Idrs refactoring 2021-12-02 23:32:07 +00:00
Jens Wehner
f63c6dd1f9 Bicgstabl 2021-12-02 22:48:22 +00:00
Erik Schultheis
2f65ec5302 fixed leftover else branch 2021-12-02 18:13:19 +00:00
Xinle Liu
7ef5f0641f Remove macro EIGEN_GPU_TEST_C99_MATH
Remove macro EIGEN_GPU_TEST_C99_MATH which is used in a single test file only and always defaults to true.
2021-12-01 14:48:56 +00:00
Erik Schultheis
ec2fd0f7ed Require recent GCC and MSVC and removed EIGEN_HAS_CXX14 and some other feature test macros 2021-12-01 00:48:34 +00:00
Erik Schultheis
4a76880351 Updated CMake
This patch updates the minimum required CMake version to 3.10 and removes the EIGEN_TEST_CXX11 CMake option, including corresponding logic.
2021-11-29 20:24:20 +00:00
Erik Schultheis
f33a31b823 removed EIGEN_HAS_CXX11_* and redundant EIGEN_COMP_CXXVER checks 2021-11-29 19:18:57 +00:00
David Tellenbach
08da52eb85 Remove DenseBase::nonZeros() which just calls DenseBase::size()
Fixes #2382.
2021-11-27 14:31:00 +00:00
Erik Schultheis
ec4efbd696 remove EIGEN_HAS_CXX11 2021-11-24 20:08:49 +00:00
Rasmus Munk Larsen
cfdb3ce3f0 Fix warnings about shadowing definitions. 2021-11-23 14:34:47 -08:00
Rasmus Munk Larsen
5e89573e2a Implement Eigen::array<...>::reverse_iterator if std::reverse_iterator exists. 2021-11-20 00:22:46 +00:00
Rasmus Munk Larsen
11cb7b8372 Add basic iterator support for Eigen::array to ease transition to std::array in third-party libraries. 2021-11-19 05:14:30 +00:00
Antonio Sanchez
c107bd6102 Fix errors for windows build. 2021-11-19 04:23:25 +00:00
Rasmus Munk Larsen
96aeffb013 Make the new TensorIO implementation work with TensorMap with const elements. 2021-11-17 18:16:04 -08:00
Rasmus Munk Larsen
824d06eb36 Include <numeric> to get std::iota. 2021-11-18 00:47:18 +00:00
Antonio Sanchez
ffb78e23a1 Fix tensor broadcast off-by-one error.
Caught by JAX unit tests.  Triggered if broadcast is smaller than packet
size.
2021-11-16 17:37:38 +00:00
cpp977
f73c95c032 Reimplemented the Tensor stream output. 2021-11-16 17:36:58 +00:00
Ben Barsdell
50df8d3d6d Avoid integer overflow in EigenMetaKernel indexing
- The current implementation computes `size + total_threads`, which can
  overflow and cause CUDA_ERROR_ILLEGAL_ADDRESS when size is close to
  the maximum representable value.
- The num_blocks calculation can also overflow due to the implementation
  of divup().
- This patch prevents these overflows and allows the kernel to work
  correctly for the full representable range of tensor sizes.
- Also adds relevant tests.
2021-11-05 16:39:37 +11:00
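The kind of overflow described in the commit above can be sketched with a ceiling division. This is not the patch itself: `divup_naive` and `divup_safe` are hypothetical names, and the naive form is only assumed to resemble the usual `(x + y - 1) / y` idiom. Unsigned types are used for the naive version because their wrap-around is well defined.

```cpp
#include <cassert>
#include <cstdint>

// Naive ceiling division. The intermediate sum x + y - 1 wraps around when
// x is close to the maximum representable value of T.
template <typename T>
T divup_naive(T x, T y) {
  return static_cast<T>(x + y - T(1)) / y;
}

// Overflow-safe ceiling division for x >= 0 and y > 0. It never forms a
// value larger than x, so it works up to the maximum representable value.
template <typename T>
T divup_safe(T x, T y) {
  assert(y > 0);
  return x == 0 ? T(0) : static_cast<T>((x - 1) / y + 1);
}
```

For example, with 32-bit unsigned values `divup_naive(4294967295, 1024)` wraps to 0 blocks, while `divup_safe` yields 4194304.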
Rasmus Munk Larsen
55e3ae02ac Compare summation results against forward error bound. 2021-11-04 18:04:04 -07:00
Antonio Sanchez
8f8c2ba2fe Remove bad "take" impl that causes g++-11 crash.
For some reason, having `take<n, numeric_list<T>>` for `n > 0` causes
g++-11 to ICE with
```
sorry, unimplemented: unexpected AST of kind nontype_argument_pack
```
It does work with other versions of gcc, and with clang.
I filed a GCC bug
[here](https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102999).

Technically we should never actually run into this case, since you
can't take n > 0 elements from an empty list.  Commenting it out
allows our Eigen tests to pass.
2021-11-01 17:04:41 +00:00
Antonio Sanchez
f6c8cc0e99 Fix TensorReduction warnings and error bound for sum accuracy test.
The sum accuracy test currently uses the default test precision for
the given scalar type.  However, scalars are generated via a normal
distribution, and given a large enough count and strong enough random
generator, the expected sum is zero.  This causes the test to
periodically fail.

Here we estimate an upper-bound for the error as `sqrt(N) * prec` for
summing N values, with each having an approximate epsilon of `prec`.

Also fixed a few warnings generated by MSVC when compiling the
reduction test.
2021-10-30 14:59:00 -07:00
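The `sqrt(N) * prec` estimate from the commit above can be written as a small check. This is a sketch under the stated estimate only; `sum_error_bound` and `sum_within_bound` are hypothetical helpers, not functions from the Eigen test suite.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Estimated upper bound on the error of summing n values, each with an
// approximate epsilon of prec: sqrt(n) * prec.
inline double sum_error_bound(std::size_t n, double prec) {
  assert(prec >= 0.0);
  return std::sqrt(static_cast<double>(n)) * prec;
}

// Returns true if the computed sum lies within the estimated bound of the
// reference sum. An absolute bound still works when the expected sum is
// zero, where a relative tolerance would not.
inline bool sum_within_bound(double computed, double reference,
                             std::size_t n, double prec) {
  return std::abs(computed - reference) <= sum_error_bound(n, prec);
}
```

With N = 10000 and `prec = 1e-3`, the bound is about 0.1, so a computed sum of 0.05 against an expected sum of zero passes while 0.2 does not.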
Rasmus Munk Larsen
b3bea43a2d Don't use unrolled loops for stateful reducers. The problem is the combination step, e.g.
reducer0.reducePacket(accum1, accum0);
reducer0.reducePacket(accum2, accum0);
reducer0.reducePacket(accum3, accum0);

For the mean reducer this will increment the count as well as adding together the accumulators and result in the wrong count being divided into the sum at the end.
2021-10-28 23:52:54 +00:00
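The miscount described in the commit above can be reproduced with a scalar sketch. It is a simplified stand-in, not Eigen's reducer interface: `MeanReducerSketch`, `mean_single`, and `mean_unrolled_buggy` are hypothetical, and the unrolling factor of 2 is chosen only to keep the example short.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Simplified stateful mean reducer: every call to reduce() adds a value to
// the accumulator and also increments the element count.
struct MeanReducerSketch {
  std::size_t count = 0;
  void reduce(double value, double* accum) {
    *accum += value;
    ++count;
  }
  double finalize(double accum) const {
    return accum / static_cast<double>(count);
  }
};

// One accumulator: count equals the number of elements.
inline double mean_single(const std::vector<double>& v) {
  assert(!v.empty());
  MeanReducerSketch reducer;
  double accum = 0.0;
  for (double x : v) reducer.reduce(x, &accum);
  return reducer.finalize(accum);
}

// Two accumulators (loop unrolled by 2), combined through reduce().
// The combination step increments count once more, so the sum is divided
// by size + 1 instead of size.
inline double mean_unrolled_buggy(const std::vector<double>& v) {
  assert(!v.empty());
  MeanReducerSketch reducer;
  double accum0 = 0.0;
  double accum1 = 0.0;
  std::size_t i = 0;
  for (; i + 1 < v.size(); i += 2) {
    reducer.reduce(v[i], &accum0);
    reducer.reduce(v[i + 1], &accum1);
  }
  for (; i < v.size(); ++i) reducer.reduce(v[i], &accum0);
  reducer.reduce(accum1, &accum0);  // combination step: extra count
  return reducer.finalize(accum0);
}
```

For the input {1, 2, 3, 4}, the single-accumulator version returns 2.5, while the unrolled version divides the same sum of 10 by 5 and returns 2.0.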
Fabian Keßler
19cacd3ecb optimize cmake scripts for subproject use 2021-10-28 16:08:02 +02:00
Rohit Santhanam
48e40b22bf Preliminary HIP bfloat16 GPU support. 2021-10-27 18:36:45 +00:00
Antonio Sánchez
185ad0e610 Revert "Avoid integer overflow in EigenMetaKernel indexing"
This reverts commit 100d7caf92
2021-10-27 14:55:25 +00:00
Ben Barsdell
100d7caf92 Avoid integer overflow in EigenMetaKernel indexing
- The current implementation computes `size + total_threads`, which can
  overflow and cause CUDA_ERROR_ILLEGAL_ADDRESS when size is close to
  the maximum representable value.
- The num_blocks calculation can also overflow due to the implementation
  of divup().
- This patch prevents these overflows and allows the kernel to work
  correctly for the full representable range of tensor sizes.
- Also adds relevant tests.
2021-10-26 00:04:28 +00:00
Antonio Sanchez
a500da1dc0 Fix broadcasting oob error.
For vectorized 1-dimensional inputs that do not take the special
blocking path (e.g. `std::complex<...>`), there was an
index-out-of-bounds error causing the broadcast size to be
computed incorrectly.  Here we fix this, and make other minor
cleanup changes.

Fixes #2351.
2021-10-25 19:31:12 +00:00
Nico
b17bcddbca Fix -Wbitwise-instead-of-logical clang warning
&& and || short-circuit, & and | don't. When both arguments to & or |
are boolean, the short-circuiting version is usually the desired one, so
clang warns on this.

Here, it is inconsequential, so switch to && and || to suppress the warning.
2021-10-21 23:32:45 -04:00
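The difference between the two operator families can be observed by counting evaluations. In C++, `&&` and `||` skip the right operand when the left one already decides the result, while `&` and `|` always evaluate both. The sketch below is illustrative only; `CallCounter` and the two functions are hypothetical names, and recent clang versions may emit the warning named in the commit for the bitwise variant.

```cpp
#include <cassert>

// Hypothetical helper that records how often check() is evaluated.
struct CallCounter {
  int calls = 0;
  bool check(bool result) {
    ++calls;
    return result;
  }
};

// Logical AND: the left operand is false, so the right one is skipped.
inline int calls_with_logical_and() {
  CallCounter counter;
  bool result = counter.check(false) && counter.check(true);
  assert(!result);
  return counter.calls;
}

// Bitwise AND on bool operands: both sides are always evaluated.
// This is the pattern that -Wbitwise-instead-of-logical flags.
inline int calls_with_bitwise_and() {
  CallCounter counter;
  bool result = counter.check(false) & counter.check(true);
  assert(!result);
  return counter.calls;
}
```

Both functions compute the same boolean value; only the number of evaluated operands differs (one versus two), which is why the switch is harmless when the operands have no side effects.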
Antonio Sanchez
24ebb37f38 Disable Tree reduction for GPU.
For moderately sized inputs, running the Tree reduction quickly
fills/overflows the GPU thread stack space, leading to memory errors.
This was happening in the `cxx11_tensor_complex_gpu` test, for example.
Disabling tree reduction on GPU fixes this.
2021-10-20 20:42:37 +00:00
Rasmus Munk Larsen
360290fc42 Improve accuracy of full tensor reduction for half and bfloat16 by reducing leaf size in tree reduction.
Add more unit tests for summation accuracy.
2021-10-20 19:54:06 +00:00
Antonio Sanchez
d0d34524a1 Move CUDA/Complex.h to GPU/Complex.h, remove TensorReductionCuda.h
The `Complex.h` file applies equally to HIP/CUDA, so placing under the
generic `GPU` folder.

The `TensorReductionCuda.h` has already been deprecated, now removing
for the next Eigen version.
2021-10-20 12:00:19 -07:00
Rasmus Munk Larsen
1d75fab368 Speed up tensor reduction 2021-10-02 14:58:23 +00:00
Antonio Sanchez
be9e7d205f Reduce tensor_contract_gpu test.
The original test times out after 60 minutes on Windows, even when
setting flags to optimize for speed.  Reducing the number of
contractions performed from 3600->27 for subtests 8,9 allows the
two to run in just over a minute each.
2021-10-02 04:36:15 +00:00
Antonio Sanchez
701f5d1c91 Fix gpu special function tests.
Some checks used incorrect values, partly from copy-paste errors,
partly from the change in behaviour introduced in !398.

Modified results to match scipy, simplified tests by updating
`VERIFY_IS_CWISE_APPROX` to work for scalars.
2021-10-01 10:20:50 -07:00
Antonio Sanchez
de218b471d Add -arch=<arch> argument for nvcc.
Without this flag, when compiling with nvcc, if the compute architecture of a card does
not exactly match any of those listed for `-gencode arch=compute_<arch>,code=sm_<arch>`,
then the kernel will fail to run with:
```
cudaErrorNoKernelImageForDevice: no kernel image is available for execution on the device.
```
This can happen, for example, when compiling with an older cuda version
that does not support a newer architecture (e.g. T4 is `sm_75`, but cuda
9.2 only supports up to `sm_70`).

With the `-arch=<arch>` flag, the code will compile and run at the
supplied architecture.
2021-09-24 20:48:01 -07:00
Antonio Sanchez
846d34384a Rename EIGEN_CUDA_FLAGS to EIGEN_CUDA_CXX_FLAGS
Also add a missing space for clang.
2021-09-24 20:15:55 -07:00