Commit Graph

3097 Commits

Author SHA1 Message Date
Antonio Sánchez
9441d94dcc Revert "Make fixed-size Matrix and Array trivially copyable after C++20"
This reverts commit 47eac21072
2022-02-05 04:40:29 +00:00
Antonio Sánchez
cafeadffef Fix ODR violations. 2022-02-04 19:01:07 +00:00
Rasmus Munk Larsen
ea2c02060c Add reciprocal packet op and fast specializations for float with SSE, AVX, and AVX512. 2022-01-21 23:49:18 +00:00
Erik Schultheis
970640519b Cleanup 2022-01-21 01:48:59 +00:00
Kolja Brix
8d81a2339c Reduce usage of reserved names 2022-01-10 20:53:29 +00:00
Matthias Möller
c9df98b071 Fix Gcc8.5 warning about missing base class initialisation (#2404) 2022-01-07 19:16:53 +00:00
Lingzhu Xiang
47eac21072 Make fixed-size Matrix and Array trivially copyable after C++20
Making them trivially copyable allows using std::memcpy() without undefined
behaviors.

Only Matrix and Array with trivially copyable DenseStorage are marked as
trivially copyable with an additional type trait.

As described in http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p0848r3.html
it requires extremely verbose SFINAE to make the special member functions of
fixed-size Matrix and Array trivial, unless C++20 concepts are available to
simplify the selection of trivial special member functions given template
parameters. Therefore only make this feature available to compilers that support
C++20 P0848R3.

Fix #1855.
2022-01-07 19:04:35 +00:00
Erik Schultheis
c20e908ebc turn some macros intro constexpr functions 2021-12-10 19:27:01 +00:00
Erik Schultheis
c35679af27 fixed customIndices2Array forgetting first index 2021-12-10 16:41:59 +00:00
Erik Schultheis
e4c40b092a disambiguate overloads for empty index list 2021-12-07 19:40:09 +00:00
Jens Wehner
c6fa0ca162 Idrsstabl 2021-12-06 20:00:00 +00:00
Erik Schultheis
cc11e240ac Some further cleanup 2021-12-06 18:01:15 +00:00
Erik Schultheis
cd83f34d3a fix typo StableNorm -> stableNorm 2021-12-04 14:52:09 +00:00
Jens Wehner
4ee2e9b340 Idrs refactoring 2021-12-02 23:32:07 +00:00
Jens Wehner
f63c6dd1f9 Bicgstabl 2021-12-02 22:48:22 +00:00
Erik Schultheis
2f65ec5302 fixed leftover else branch 2021-12-02 18:13:19 +00:00
Xinle Liu
7ef5f0641f Remove macro EIGEN_GPU_TEST_C99_MATH
Remove macro EIGEN_GPU_TEST_C99_MATH which is used in a single test file only and always defaults to true.
2021-12-01 14:48:56 +00:00
Erik Schultheis
ec2fd0f7ed Require recent GCC and MSCV and removed EIGEN_HAS_CXX14 and some other feature test macros 2021-12-01 00:48:34 +00:00
Erik Schultheis
4a76880351 Updated CMake
This patch updates the minimum required CMake version to 3.10 and removes the EIGEN_TEST_CXX11 CMake option, including corresponding logic.
2021-11-29 20:24:20 +00:00
Erik Schultheis
f33a31b823 removed EIGEN_HAS_CXX11_* and redundant EIGEN_COMP_CXXVER checks 2021-11-29 19:18:57 +00:00
David Tellenbach
08da52eb85 Remove DenseBase::nonZeros() which just calls DenseBase::size()
Fixes #2382.
2021-11-27 14:31:00 +00:00
Erik Schultheis
ec4efbd696 remove EIGEN_HAS_CXX11 2021-11-24 20:08:49 +00:00
Rasmus Munk Larsen
cfdb3ce3f0 Fix warnings about shadowing definitions. 2021-11-23 14:34:47 -08:00
Rasmus Munk Larsen
5e89573e2a Implement Eigen::array<...>::reverse_iterator if std::reverse_iterator exists. 2021-11-20 00:22:46 +00:00
Rasmus Munk Larsen
11cb7b8372 Add basic iterator support for Eigen::array to ease transition to std::array in third-party libraries. 2021-11-19 05:14:30 +00:00
Antonio Sanchez
c107bd6102 Fix errors for windows build. 2021-11-19 04:23:25 +00:00
Rasmus Munk Larsen
96aeffb013 Make the new TensorIO implementation work with TensorMap with const elements. 2021-11-17 18:16:04 -08:00
Rasmus Munk Larsen
824d06eb36 Include <numeric> to get std::iota. 2021-11-18 00:47:18 +00:00
Antonio Sanchez
ffb78e23a1 Fix tensor broadcast off-by-one error.
Caught by JAX unit tests.  Triggered if broadcast is smaller than packet
size.
2021-11-16 17:37:38 +00:00
cpp977
f73c95c032 Reimplemented the Tensor stream output. 2021-11-16 17:36:58 +00:00
Ben Barsdell
50df8d3d6d Avoid integer overflow in EigenMetaKernel indexing
- The current implementation computes `size + total_threads`, which can
  overflow and cause CUDA_ERROR_ILLEGAL_ADDRESS when size is close to
  the maximum representable value.
- The num_blocks calculation can also overflow due to the implementation
  of divup().
- This patch prevents these overflows and allows the kernel to work
  correctly for the full representable range of tensor sizes.
- Also adds relevant tests.
2021-11-05 16:39:37 +11:00
Rasmus Munk Larsen
55e3ae02ac Compare summation results against forward error bound. 2021-11-04 18:04:04 -07:00
Antonio Sanchez
8f8c2ba2fe Remove bad "take" impl that causes g++-11 crash.
For some reason, having `take<n, numeric_list<T>>` for `n > 0` causes
g++-11 to ICE with
```
sorry, unimplemented: unexpected AST of kind nontype_argument_pack
```
It does work with other versions of gcc, and with clang.
I filed a GCC bug
[here](https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102999).

Technically we should never actually run into this case, since you
can't take n > 0 elements from an empty list.  Commenting it out
allows our Eigen tests to pass.
2021-11-01 17:04:41 +00:00
Antonio Sanchez
f6c8cc0e99 Fix TensorReduction warnings and error bound for sum accuracy test.
The sum accuracy test currently uses the default test precision for
the given scalar type.  However, scalars are generated via a normal
distribution, and given a large enough count and strong enough random
generator, the expected sum is zero.  This causes the test to
periodically fail.

Here we estimate an upper-bound for the error as `sqrt(N) * prec` for
summing N values, with each having an approximate epsilon of `prec`.

Also fixed a few warnings generated by MSVC when compiling the
reduction test.
2021-10-30 14:59:00 -07:00
Rasmus Munk Larsen
b3bea43a2d Don't use unrolled loops for stateful reducers. The problem is the combination step, e.g.
reducer0.reducePacket(accum1, accum0);
reducer0.reducePacket(accum2, accum0);
reducer0.reducePacket(accum3, accum0);

For the mean reducer this will increment the count as well as adding together the accumulators and result in the wrong count being divided into the sum at the end.
2021-10-28 23:52:54 +00:00
Fabian Keßler
19cacd3ecb optimize cmake scripts for subproject use 2021-10-28 16:08:02 +02:00
Rohit Santhanam
48e40b22bf Preliminary HIP bfloat16 GPU support. 2021-10-27 18:36:45 +00:00
Antonio Sánchez
185ad0e610 Revert "Avoid integer overflow in EigenMetaKernel indexing"
This reverts commit 100d7caf92
2021-10-27 14:55:25 +00:00
Ben Barsdell
100d7caf92 Avoid integer overflow in EigenMetaKernel indexing
- The current implementation computes `size + total_threads`, which can
  overflow and cause CUDA_ERROR_ILLEGAL_ADDRESS when size is close to
  the maximum representable value.
- The num_blocks calculation can also overflow due to the implementation
  of divup().
- This patch prevents these overflows and allows the kernel to work
  correctly for the full representable range of tensor sizes.
- Also adds relevant tests.
2021-10-26 00:04:28 +00:00
Antonio Sanchez
a500da1dc0 Fix broadcasting oob error.
For vectorized 1-dimensional inputs that do not take the special
blocking path (e.g. `std::complex<...>`), there was an
index-out-of-bounds error causing the broadcast size to be
computed incorrectly.  Here we fix this, and make other minor
cleanup changes.

Fixes #2351.
2021-10-25 19:31:12 +00:00
Nico
b17bcddbca Fix -Wbitwise-instead-of-logical clang warning
& and | short-circuit, && and || don't. When both arguments to those
are boolean, the short-circuiting version is usually the desired one, so
clang warns on this.

Here, it is inconsequential, so switch to && and || to suppress the warning.
2021-10-21 23:32:45 -04:00
Antonio Sanchez
24ebb37f38 Disable Tree reduction for GPU.
For moderately sized inputs, running the Tree reduction quickly
fills/overflows the GPU thread stack space, leading to memory errors.
This was happening in the `cxx11_tensor_complex_gpu` test, for example.
Disabling tree reduction on GPU fixes this.
2021-10-20 20:42:37 +00:00
Rasmus Munk Larsen
360290fc42 Improve accuracy of full tensor reduction for half and bfloat16 by reducing leaf size in tree reduction.
Add more unit tests for summation accuracy.
2021-10-20 19:54:06 +00:00
Antonio Sanchez
d0d34524a1 Move CUDA/Complex.h to GPU/Complex.h, remove TensorReductionCuda.h
The `Complex.h` file applies equally to HIP/CUDA, so placing under the
generic `GPU` folder.

The `TensorReductionCuda.h` has already been deprecated, now removing
for the next Eigen version.
2021-10-20 12:00:19 -07:00
Rasmus Munk Larsen
1d75fab368 Speed up tensor reduction 2021-10-02 14:58:23 +00:00
Antonio Sanchez
be9e7d205f Reduce tensor_contract_gpu test.
The original test times out after 60 minutes on Windows, even when
setting flags to optimize for speed.  Reducing the number of
contractions performed from 3600->27 for subtests 8,9 allow the
two to run in just over a minute each.
2021-10-02 04:36:15 +00:00
Antonio Sanchez
701f5d1c91 Fix gpu special function tests.
Some checks used incorrect values, partly from copy-paste errors,
partly from the change in behaviour introduced in !398.

Modified results to match scipy, simplified tests by updating
`VERIFY_IS_CWISE_APPROX` to work for scalars.
2021-10-01 10:20:50 -07:00
Antonio Sanchez
de218b471d Add -arch=<arch> argument for nvcc.
Without this flag, when compiling with nvcc, if the compute architecture of a card does
not exactly match any of those listed for `-gencode arch=compute_<arch>,code=sm_<arch>`,
then the kernel will fail to run with:
```
cudaErrorNoKernelImageForDevice: no kernel image is available for execution on the device.
```
This can happen, for example, when compiling with an older cuda version
that does not support a newer architecture (e.g. T4 is `sm_75`, but cuda
9.2 only supports up to `sm_70`).

With the `-arch=<arch>` flag, the code will compile and run at the
supplied architecture.
2021-09-24 20:48:01 -07:00
Antonio Sanchez
846d34384a Rename EIGEN_CUDA_FLAGS to EIGEN_CUDA_CXX_FLAGS
Also add a missing space for clang.
2021-09-24 20:15:55 -07:00
Antonio Sanchez
7b00e8b186 Clean up CUDA CMake files.
- Unify test/CMakeLists.txt and unsupported/test/CMakeLists.txt
- Added `EIGEN_CUDA_FLAGS` that are appended to the set of flags passed
to the cuda compiler (nvcc or clang).

The latter is to support passing custom flags (e.g. `-arch=` to nvcc,
or to disable cuda-specific warnings).
2021-09-24 14:43:59 -07:00
Kolja Brix
afa616bc9e Fix some typos found 2021-09-23 15:22:00 +00:00
sciencewhiz
4b6036e276 fix various typos 2021-09-22 16:15:06 +00:00
Alexander Karatarakis
4d622be118 [AutodiffScalar] Remove const when returning by value
clang-tidy: Return type 'const T' is 'const'-qualified at the top level,
which may reduce code readability without improving const correctness

The types are somewhat long, but the affected return types are of the form:
```
const T my_func() { /**/ }
```

Change to:
```
T my_func() { /**/ }
```
2021-09-18 21:23:32 +00:00
Rasmus Munk Larsen
6cadab6896 Clean up EIGEN_STATIC_ASSERT to only use standard c++11 static_assert. 2021-09-16 20:43:54 +00:00
Rasmus Munk Larsen
d7d0bf832d Issue an error in case of direct inclusion of internal headers. 2021-09-10 19:12:26 +00:00
Antonio Sanchez
6c10495a78 Remove unnecessary std::tuple reference. 2021-09-09 15:49:44 +00:00
Antonio Sanchez
eea2a3385c Remove more DynamicSparseMatrix references.
Also fixed some typos in SparseExtra/MarketIO.h.
2021-09-02 15:36:47 -07:00
Jens Wehner
8286073c73 Matrixmarket extension 2021-09-02 17:23:33 +00:00
Antonio Sanchez
74da2e6821 Rename Tuple -> Pair.
This is to make way for a new `Tuple` class that mimics `std::tuple`,
but can be reliably used on device and with aligned Eigen types.

The existing Tuple has very few references, and is actually an
analogue of `std::pair`.
2021-09-02 02:20:54 +00:00
Maxiwell S. Garcia
09fc0f97b5 Rename 'vec_all_nan' of cxx11_tensor_expr test because this symbol is used by altivec.h 2021-09-01 16:42:51 +00:00
jenswehner
a443a2373f updated documentation 2021-08-31 22:58:28 +00:00
Antonio Sanchez
cc3573ab44 Disable cuda Eigen::half vectorization on host.
All cuda `__half` functions are device-only in CUDA 9, including
conversions. Host-side conversions were added in CUDA 10.
The existing code doesn't build prior to 10.0.

All arithmetic functions are always device-only, so there's
therefore no reason to use vectorization on the host at all.

Modified the code to disable vectorization for `__half` on host,
which required also updating the `TensorReductionGpu` implementation
which previously made assumptions about available packets.
2021-08-31 19:13:12 +00:00
Turing Eret
3324389f6d Add EIGEN_TENSOR_PLUGIN support per issue #2052. 2021-08-30 19:36:55 +00:00
Jens Wehner
53ad9c75b4 included unordered_map header 2021-08-27 16:53:28 +00:00
jenswehner
9abf4d0bec made RandomSetter C++11 compatible 2021-08-25 20:24:55 +00:00
Antonio Sanchez
eeacbd26c8 Bump CMake files to at least c++11.
Removed all configurations that explicitly test or set the c++ standard
flags. The only place the standard is now configured is at the top of
the main `CMakeLists.txt` file, which can easily be updated (e.g. if
we decide to move to c++14+). This can also be set via command-line using
```
> cmake -DCMAKE_CXX_STANDARD 14
```

Kept the `EIGEN_TEST_CXX11` flag for now - that still controls whether to
build/run the `cxx11_*` tests.  We will likely end up renaming these
tests and removing the `CXX11` subfolder.
2021-08-25 20:07:48 +00:00
jenswehner
90b3b6b572 added doxygen flowchart 2021-08-24 17:11:51 +00:00
jenswehner
d85de1ef56 removed sparse dynamic matrix 2021-08-24 10:33:00 +02:00
jenswehner
e3e74001f7 added includes for unordered_map 2021-08-10 13:34:57 +02:00
Alexander Karatarakis
4ba872bd75 Avoid leading underscore followed by cap in template identifiers 2021-08-04 22:41:52 +00:00
Antonio Sanchez
31f796ebef Fix MPReal detection and support.
The latest version of `mpreal` has a bug that breaks `min`/`max`.
It also breaks with the latest dev version of `mpfr`. Here we
add `FindMPREAL.cmake` which searches for the library and tests if
compilation works.

Removed our internal copy of `mpreal.h` under `unsupported/test`, as
it is out-of-sync with the latest, and similarly breaks with
the latest `mpfr`.  It would be best to use the installed version
of `mpreal` anyways, since that's what we actually want to test.

Fixes #2282.
2021-08-03 17:55:03 +00:00
Antonio Sanchez
8cf6cb27ba Fix TriSycl CMake files.
This is to enable compiling with the latest trisycl. `FindTriSYCL.cmake` was
broken by commit 00f32752, which modified `add_sycl_to_target` for ComputeCPP.
This makes the corresponding modifications for trisycl to make them consistent.

Also, trisycl now requires c++17.
2021-08-03 16:44:29 +00:00
Alexander Karatarakis
f357283d31 _DerType -> DerivativeType as underscore-followed-by-caps is a reserved identifier 2021-07-29 18:02:04 +00:00
Antonio Sanchez
1fd5ce1002 For GpuDevice::fill, use a single memset if all bytes are equal.
The original `fill` implementation introduced a 5x regression on my
nvidia Quadro K1200.  @rohitsan reported up to 100x regression for
HIP.  This restores performance.
2021-07-10 13:37:16 +00:00
Antonio Sanchez
9c22795d65 Put attach/detach buffer back in for TensorDeviceSycl.
Also added a test to verify the original buffer is updated correctly.
2021-07-09 10:00:05 -07:00
Antonio Sanchez
1e6c6c1576 Replace memset with fill to work for non-trivial scalars.
For custom scalars, zero is not necessarily represented by
a zeroed-out memory block (e.g. gnu MPFR). We therefore
cannot rely on `memset` if we want to fill a matrix or tensor
with zeroes. Instead, we should rely on `fill`, which for trivial
types does end up getting converted to a `memset` under-the-hood
(at least with gcc/clang).

Requires adding a `fill(begin, end, v)` to `TensorDevice`.

Replaced all potentially bad instances of memset with fill.

Fixes #2245.
2021-07-08 18:34:41 +00:00
Jonas Harsch
e9c9a3130b Removed superfluous boolean degenerate in TensorMorphing.h. 2021-07-08 18:02:58 +00:00
Antonio Sanchez
f5a9873bbb Fix Tensor documentation page.
The extra [TOC] tag is generating a huge floating duplicated
table-of-contents, which obscures the majority of the page
(see bottom of https://eigen.tuxfamily.org/dox/unsupported/eigen_tensors.html).
Remove it.

Also, headers do not support markup (see
[doxygen bug](https://github.com/doxygen/doxygen/issues/7467)), so
backticks like
```
```
end up generating titles that looks like
```
Constructor <tt>Tensor<double,2></tt>
```
Removing backticks for now.  To generate proper formatted headers, we
must directly use html instead of markdown, i.e.
```
<h2>Constructor <code>Tensor&lt;double,2&gt;</code></h2>
```
which is ugly.

Fixes #2254.
2021-07-03 04:39:22 +00:00
Jonas Harsch
aab747021b Don't crash when attempting to shuffle an empty tensor. 2021-07-02 20:33:52 +00:00
Antonio Sanchez
6035da5283 Fix compile issues for gcc 4.8.
- Move constructors can only be defaulted as NOEXCEPT if all members
have NOEXCEPT move constructors.
- gcc 4.8 has some funny parsing bug in `a < b->c`, thinking `b-` is a template parameter.
2021-07-01 22:58:14 +00:00
Antonio Sanchez
3a087ccb99 Modify tensor argmin/argmax to always return first occurence.
As written, depending on multithreading/gpu, the returned index from
`argmin`/`argmax` is not currently stable.  Here we modify the functors
to always keep the first occurence (i.e. if the value is equal to the
current min/max, then keep the one with the smallest index).

This is otherwise causing unpredictable results in some TF tests.
2021-06-29 10:36:20 -07:00
Antonio Sanchez
e9ab4278b7 Rewrite balancer to avoid overflows.
The previous balancer overflowed for large row/column norms.
Modified to prevent that.

Fixes #2273.
2021-06-21 17:29:55 +00:00
jenswehner
175f0cc1e9 changed documentation to make example compile 2021-06-16 11:45:06 +02:00
Antonio Sanchez
954879183b Fix placement of permanent GPU defines. 2021-06-15 12:17:09 -07:00
Rasmus Munk Larsen
13fb5ab92c Fix more enum arithmetic. 2021-06-15 09:09:31 -07:00
Antonio Sanchez
514977f31b Add ability to permanently enable HIP/CUDA gpu* defines.
When using Eigen for gpu, these simplify portability.  If
`EIGEN_PERMANENTLY_ENABLE_GPU_HIP_CUDA_DEFINES` is set, then
we do not undefine them.
2021-06-11 17:19:54 +00:00
Antonio Sanchez
6aec83263d Allow custom TENSOR_CONTRACTION_DISPATCH macro.
Currently TF lite needs to hack around with the Tensor headers in order
to customize the contraction dispatch method. Here we add simple `#ifndef`
guards to allow them to provide their own dispatch prior to inclusion.
2021-06-11 17:02:19 +00:00
Rohit Santhanam
c8d40a7bf1 Removed dead code from GPU float16 unit test. 2021-05-28 20:06:48 +00:00
Nathan Luehr
972cf0c28a Fix calls to device functions from host code 2021-05-11 22:47:49 +00:00
Antonio Sanchez
0eba8a1fe3 Clean up gpu device properties.
Made a class and singleton to encapsulate initialization and retrieval of
device properties.

Related to !481, which already changed the API to address a static
linkage issue.
2021-05-07 17:51:29 +00:00
Antonio Sanchez
e3b7f59659 Simplify TensorRandom and remove time-dependence.
Time-dependence prevents tests from being repeatable. This has long
been an issue with debugging the tensor tests. Removing this will allow
future tests to be repeatable in the usual way.

Also, the recently added macros in !476 are causing headaches across different
platforms. For example, checking `_XOPEN_SOURCE` is leading to multiple
ambiguous macro errors across Google, and `_DEFAULT_SOURCE`/`_SVID_SOURCE`/`_BSD_SOURCE`
are sometimes defined with values, sometimes defined as empty, and sometimes
not defined at all when they probably should be.  This is leading to
multiple build breakages.

The simplest approach is to generate a seed via
`Eigen::internal::random<uint64_t>()` if on CPU. For GPU, we use a
hash based on the current thread ID (since `rand()` isn't supported
on GPU).

Fixes #1602.
2021-05-04 13:34:49 -07:00
Turing Eret
3804ca0d90 Fix for issue with static global variables in TensorDeviceGpu.h
m_deviceProperties and m_devicePropInitialized are defined as global
statics which will define multiple copies which can cause issues if
initializeDeviceProp() is called in one translation unit and then
m_deviceProperties is used in a different translation unit. Added
inline functions getDeviceProperties() and getDevicePropInitialized()
which defines those variables as static locals. As per the C++ standard
7.1.2/4, a static local declared in an inline function always refers
to the same object, so this should be safer. Credit to Sun Chenggen
for this fix.

This fixes issue #1475.
2021-04-23 07:43:35 -06:00
Antonio Sanchez
045c0609b5 Check existence of BSD random before use.
`TensorRandom` currently relies on BSD `random()`, which is not always
available.  The [linux manpage](https://man7.org/linux/man-pages/man3/srandom.3.html)
gives the glibc condition:
```
_XOPEN_SOURCE >= 500
               || /* Glibc since 2.19: */ _DEFAULT_SOURCE
	       || /* Glibc <= 2.19: */ _SVID_SOURCE ||  _BSD_SOURCE
```
In particular, this was failing to compile for MinGW via msys2. If not
available, we fall back to using `rand()`.
2021-04-22 20:42:12 +00:00
Antonio Sanchez
69adf26aa3 Modify googlehash use to account for namespace issues.
The namespace declaration for googlehash is a configurable macro that
can be disabled.  In particular, it is disabled within google, causing
compile errors since `dense_hash_map`/`sparse_hash_map` are then in
the global namespace instead of in `::google`.

Here we play a bit of gynastics to allow for both `google::*_hash_map`
and `*_hash_map`, while limiting namespace polution.  Symbols within
the `::google` namespace are imported into `Eigen::google`.

We also remove checks based on `_SPARSE_HASH_MAP_H_`, as this is
fragile, and instead require `EIGEN_GOOGLEHASH_SUPPORT` to be
defined.
2021-04-12 19:00:39 -07:00
Rasmus Munk Larsen
a2c0542010 Fix typo in TensorDimensions.h 2021-04-12 18:59:56 +00:00
Rohit Santhanam
dfd6720d82 Fix for float16 GPU unit test. 2021-04-12 10:19:06 +00:00
Jens Wehner
f6fc66aa75 fixed doxygen for unsupported iterative solver module 2021-04-11 16:26:14 +00:00
Rohit Santhanam
2859db0220 This fixes an issue where the compiler was not choosing the GPU specific specialization of ScanLauncher.
The issue was discovered when the GPU scan unit test was run and resulted in a segmentation fault.

The segmantation fault occurred because the unit test allocated GPU memory and passed a pointer to that memory to the computation that it presumed would execute on the GPU.

But because of the issue, the computation was scheduled to execute on the CPU so a situation was constructed where the CPU attempted to access a GPU memory location.

The fix expands the GPU specific ScanLauncher specialization to handle cases where vectorization is enabled.

Previously, the GPU specialization is chosen only if Vectorization is not used.
2021-04-08 15:14:48 +00:00
Steve Bronder
e7b8643d70 Revert "Revert "Adds EIGEN_CONSTEXPR and EIGEN_NOEXCEPT to rows(), cols(), innerStride(), outerStride(), and size()""
This reverts commit 5f0b4a4010.
2021-03-24 18:14:56 +00:00
Jens Wehner
c0a889890f Fixed output of complex matrices 2021-03-15 21:51:55 +00:00
Antonio Sanchez
543e34ab9d Re-implement move assignments.
The original swap approach leads to potential undefined behavior (reading
uninitialized memory) and results in unnecessary copying of data for static
storage.

Here we pass down the move assignment to the underlying storage.  Static
storage does a one-way copy, dynamic storage does a swap.

Modified the tests to no longer read from the moved-from matrix/tensor,
since that can lead to UB. Added a test to ensure we do not access
uninitialized memory in a move.

Fixes: #2119
2021-03-10 16:55:20 +00:00
Antonio Sanchez
2468253c9a Define EIGEN_CPLUSPLUS and replace most __cplusplus checks.
The macro `__cplusplus` is not defined correctly in MSVC unless building
with the the `/Zc:__cplusplus` flag. Instead, it defines `_MSVC_LANG` to the
specified c++ standard version number.

Here we introduce `EIGEN_CPLUSPLUS` which will contain the c++ version
number both for MSVC and otherwise.  This simplifies checks for supported
features.

Also replaced most instances of standard version checking via `__cplusplus`
with the existing `EIGEN_COMP_CXXVER` macro for better clarity.

Fixes: #2170
2021-03-05 18:33:18 +00:00
David Tellenbach
5f0b4a4010 Revert "Adds EIGEN_CONSTEXPR and EIGEN_NOEXCEPT to rows(), cols(), innerStride(), outerStride(), and size()"
This reverts commit 6cbb3038ac because it
breaks clang-10 builds on x86 and aarch64 when C++11 is enabled.
2021-03-05 13:16:43 +01:00
Steve Bronder
6cbb3038ac Adds EIGEN_CONSTEXPR and EIGEN_NOEXCEPT to rows(), cols(), innerStride(), outerStride(), and size() 2021-03-04 18:58:08 +00:00
Eugene Zhulenev
a6601070f2 Add log2 operation to TensorBase 2021-03-04 00:13:36 +00:00
Christoph Hertzberg
2660d01fa7 Inherit from no_assignment_operator to avoid implicit copy constructor warnings
(cherry picked from commit 9bbb7ea4b54b1f307863be4ed8d105c38cdefe50)
2021-02-27 18:44:26 +01:00
Christoph Hertzberg
a3521d743c Fix some enum-enum conversion warnings
(cherry picked from commit 838f3d8ce22a5549ef10c7386fb03040721749a0)
2021-02-27 18:44:26 +01:00
Christoph Hertzberg
81b5fe2f0a ReturnByValue is already non-copyable
(cherry picked from commit abbf95045009619f37bd92b45433eedbfcbe41cf)
2021-02-27 18:44:26 +01:00
Christoph Hertzberg
4fb3459a23 Fix double-promotion warnings
(cherry picked from commit c22c103e932e511e96645186831363585a44b7a3)
2021-02-27 18:44:26 +01:00
Jens Wehner
4bfcee47b9 Idrs iterative linear solver 2021-02-27 12:09:33 +00:00
Rasmus Munk Larsen
f284c8592b Don't crash when attempting to slice an empty tensor. 2021-02-24 18:12:51 -08:00
Guoqiang QI
f44197fabd Some improvements for kissfft from Martin Reinecke(pocketfft author):
1.Only computing about half of the factors and use complex conjugate symmetry for the rest instead of all to save time.
2.All twiddles are calculated in double because that gives the maximum achievable precision when doing float transforms.
3.Reducing all angles to the range 0<angle<pi/4 which gives even more precision.
2021-02-24 21:36:47 +00:00
Antonio Sanchez
119763cf38 Eliminate CMake FindPackageHandleStandardArgs warnings.
CMake complains that the package name does not match when the case
differs, e.g.:
```
CMake Warning (dev) at /usr/share/cmake-3.18/Modules/FindPackageHandleStandardArgs.cmake:273 (message):
  The package name passed to `find_package_handle_standard_args` (UMFPACK)
  does not match the name of the calling package (Umfpack).  This can lead to
  problems in calling code that expects `find_package` result variables
  (e.g., `_FOUND`) to follow a certain pattern.
Call Stack (most recent call first):
  cmake/FindUmfpack.cmake:50 (find_package_handle_standard_args)
  bench/spbench/CMakeLists.txt:24 (find_package)
This warning is for project developers.  Use -Wno-dev to suppress it.
```
Here we rename the libraries to match their true cases.
2021-02-24 09:52:05 +00:00
Antonio Sanchez
5f9cfb2529 Add missing adolc isinf/isnan.
Also modified cmake/FindAdolc.cmake to eliminate warnings, and added
search paths to match install layout.

Fixed: #2157
2021-02-19 22:26:56 +00:00
frgossen
33e0af0130 Return nan at poles of polygamma, digamma, and zeta if limit is not defined 2021-02-19 16:35:11 +00:00
David Tellenbach
36200b7855 Remove vim specific comments to recognoize correct file-type.
As discussed in #2143 we remove editor specific comments.
2021-02-09 09:13:09 +01:00
Ralf Hannemann-Tamas
984d010b7b add specialization of check_sparse_solving() for SuperLU solver, in order to test adjoint and transpose solves 2021-02-08 22:00:31 +00:00
Antonio Sanchez
3f4684f87d Include <cstdint> in one place, remove custom typedefs
Originating from
[this SO issue](https://stackoverflow.com/questions/65901014/how-to-solve-this-all-error-2-in-this-case),
some win32 compilers define `__int32` as a `long`, but MinGW defines
`std::int32_t` as an `int`, leading to a type conflict.

To avoid this, we remove the custom `typedef` definitions for win32.  The
Tensor module requires C++11 anyways, so we are guaranteed to have
included `<cstdint>` already in `Eigen/Core`.

Also re-arranged the headers to only include `<cstdint>` in one place to
avoid this type of error again.
2021-01-26 14:23:05 -08:00
Gmc2
a4edb1079c fix test of ExtractVolumePatchesOp 2021-01-25 03:23:46 +00:00
David Tellenbach
660c6b857c Remove std::cerr in iterative solver since we don't have iostream.
This fixes #2123
2021-01-21 11:40:05 +01:00
Maozhou, Ge
21a8a2487c fix paddings of TensorVolumePatchOp 2021-01-15 11:51:49 +08:00
Antonio Sanchez
070d303d56 Add CUDA complex sqrt.
This is to support scalar `sqrt` of complex numbers `std::complex<T>` on
device, requested by Tensorflow folks.

Technically `std::complex` is not supported by NVCC on device
(though it is by clang), so the default `sqrt(std::complex<T>)` function only
works on the host. Here we create an overload to add back the
functionality.

Also modified the CMake file to add `--relaxed-constexpr` (or
equivalent) flag for NVCC to allow calling constexpr functions from
device functions, and added support for specifying compute architecture for
NVCC (was already available for clang).
2020-12-22 23:25:23 -08:00
Turing Eret
19e6496ce0 Replace call to FixedDimensions() with a singleton instance of
FixedDimensions.
2020-12-16 07:34:44 -07:00
Turing Eret
bc7d1599fb TensorStorage with FixedDimensions now has zero instance memory overhead.
Removed m_dimension as instance member of TensorStorage with
FixedDimensions and instead use the template parameter. This
means that the sizeof a pure fixed-size storage is exactly
equal to the data it is storing.
2020-12-14 07:19:34 -07:00
Alexander Grund
cf0b5b0344 Remove code checking for CMake < 3.5
As the CMake version is at least 3.5 the code checking for earlier versions can be removed.
2020-12-14 09:57:44 +00:00
Antonio Sanchez
2dbac2f99f Fix bad NEON fp16 check 2020-12-04 13:42:18 -08:00
Antonio Sanchez
e2f21465fe Special function implementations for half/bfloat16 packets.
Current implementations fail to consider half-float packets, only
half-float scalars.  Added specializations for packets on AVX, AVX512 and
NEON.  Added tests to `special_packetmath`.

The current `special_functions` tests would fail for half and bfloat16 due to
lack of precision. The NEON tests also fail with precision issues and
due to different handling of `sqrt(inf)`, so special functions bessel, ndtri
have been disabled.

Tested with AVX, AVX512.
2020-12-04 10:16:29 -08:00
Rasmus Munk Larsen
71c85df4c1 Clean up the Tensor header and get rid of the EIGEN_SLEEP macro. 2020-12-02 11:04:04 -08:00
Bowie Owens
9842366bba Make inclusion of doc sub-directory optional by adjusting options.
Allows exclusion of doc and related targets to help when using eigen via add_subdirectory().

Requested by:

https://gitlab.com/libeigen/eigen/-/issues/1842

Also required making EIGEN_TEST_BUILD_DOCUMENTATION a dependent option on EIGEN_BUILD_DOC. This ensures documentation targets are properly defined when EIGEN_TEST_BUILD_DOCUMENTATION is ON.
2020-11-27 08:11:49 +11:00
Antonio Sanchez
22f67b5958 Fix boolean float conversion and product warnings.
This fixes some gcc warnings such as:
```
Eigen/src/Core/GenericPacketMath.h:655:63: warning: implicit conversion turns floating-point number into bool: 'typename __gnu_cxx::__enable_if<__is_integer<bool>::__value, double>::__type' (aka 'double') to 'bool' [-Wimplicit-conversion-floating-point-to-bool]
    Packet psqrt(const Packet& a) { EIGEN_USING_STD(sqrt); return sqrt(a); }
```

Details:

- Added `scalar_sqrt_op<bool>` (`-Wimplicit-conversion-floating-point-to-bool`).

- Added `scalar_square_op<bool>` and `scalar_cube_op<bool>`
specializations (`-Wint-in-bool-context`)

- Deprecated above specialized ops for bool.

- Modified `cxx11_tensor_block_eval` to specialize generator for
booleans (`-Wint-in-bool-context`) and to use `abs` instead of `square` to
avoid deprecated bool ops.
2020-11-24 20:20:36 +00:00
Antonio Sanchez
a8fdcae55d Fix sparse_extra_3, disable counting temporaries for testing DynamicSparseMatrix.
Multiplication of column-major `DynamicSparseMatrix`es involves three
temporaries:
- two for transposing twice to sort the coefficients
(`ConservativeSparseSparseProduct.h`, L160-161)
- one for a final copy assignment (`SparseAssign.h`, L108)
The latter is avoided in an optimization for `SparseMatrix`.

Since `DynamicSparseMatrix` is deprecated in favor of `SparseMatrix`, it's not
worth the effort to optimize further, so I simply disabled counting
temporaries via a macro.

Note that due to the inclusion of `sparse_product.cpp`, the `sparse_extra`
tests actually re-run all the original `sparse_product` tests as well.

We may want to simply drop the `DynamicSparseMatrix` tests altogether, which
would eliminate the test duplication.

Related to #2048
2020-11-18 23:15:33 +00:00
Antonio Sanchez
17268b155d Add bit_cast for half/bfloat to/from uint16_t, fix TensorRandom
The existing `TensorRandom.h` implementation makes the assumption that
`half` (`bfloat16`) has a `uint16_t` member `x` (`value`), which is not
always true. This currently fails on arm64, where `x` has type `__fp16`.
Added `bit_cast` specializations to allow casting to/from `uint16_t`
for both `half` and `bfloat16`.  Also added tests in
`half_float`, `bfloat16_float`, and `cxx11_tensor_random` to catch
these errors in the future.
2020-11-18 20:32:35 +00:00
Antonio Sanchez
3669498f5a Fix rule-of-3 for the Tensor module.
Adds copy constructors to Tensor ops, inherits assignment operators from
`TensorBase`.

Addresses #1863
2020-11-18 18:14:53 +00:00
Antonio Sanchez
852513e7a6 Disable testing of OpenGL by default.
The `OpenGLSupport` module contains mostly deprecated features, and the
test is highly GL context-dependent, relies on deprecated GLUT, and
requires a display.  Until the module is updated to support modern
OpenGL and the test to use newer windowing frameworks (e.g. GLFW)
it's probably best to disable the test by default.

The test can be enabled with `cmake -DEIGEN_TEST_OPENGL=ON`.

See #2053 for more details.
2020-11-12 16:15:40 -08:00
Antonio Sanchez
6961468915 Address issues with openglsupport test.
The existing test fails on several systems due to GL runtime version mismatches,
the use of deprecated features, and memory errors due to improper use of GLUT.
The test was modified to:

- Run within a display function, allowing proper GLUT cleanup.
- Generate dynamic shaders with a supported GLSL version string and output variables.
- Report shader compilation errors.
- Check GL context version before launching version-specific tests.

Note that most of the existing `OpenGLSupport` module and tests rely on deprecated
features (e.g. fixed-function pipeline). The test was modified to allow it to
pass on various systems. We might want to consider removing the module or re-writing
it entirely to support modern OpenGL.  This is beyond the scope of this patch.

Testing of legacy GL (for platforms that support it) can be enabled by defining
`EIGEN_LEGACY_OPENGL`.  Otherwise, the test will try to create a modern context.

Tested on
- MacBook Air (2019), macOS Catalina 10.15.7 (OpenGL 2.1, 4.1)
- Debian 10.6, NVidia Quadro K1200 (OpenGL 3.1, 3.3)
2020-11-11 15:54:43 -08:00
Deven Desai
9d11e2c03e CMakefile update for ROCm 4.0
Starting with ROCm 4.0, the `hipconfig --platform` command will return `amd` (prior return value was `hcc`). Updating the CMakeLists.txt files in the test dirs to account for this change.
2020-10-29 18:06:31 +00:00
mehdi-goli
a725a3233c [SYCL clean up the code] : removing exrta #pragma unroll in SYCL which was causing issues in embeded systems 2020-10-28 08:34:49 +00:00
Rasmus Munk Larsen
274ef12b61 Remove leftover debug print statement in cxx11_tensor_expr.cpp 2020-10-14 22:59:51 +00:00
Rasmus Munk Larsen
61fc78bbda Get rid of nested template specialization in TensorReductionGpu.h, which was broken by c6953f799b. 2020-10-13 23:53:11 +00:00
Rasmus Munk Larsen
c6953f799b Add packet generic ops predux_fmin, predux_fmin_nan, predux_fmax, and predux_fmax_nan that implement reductions with PropagateNaN, and PropagateNumbers semantics. Add (slow) generic implementations for most reductions. 2020-10-13 21:48:31 +00:00
David Tellenbach
8f8d77b516 Add EIGEN prefix for HAS_LGAMMA_R 2020-10-08 18:32:19 +02:00
Eugene Zhulenev
2279f2c62f Use lgamma_r if it is available (update check for glibc 2.19+) 2020-10-08 00:26:45 +00:00
Rasmus Munk Larsen
b431024404 Don't make assumptions about NaN-propagation for pmin/pmax - it various across platforms.
Change test to only test for NaN-propagation for pfmin/pfmax.
2020-10-07 19:05:18 +00:00
Zhuyie
e4b24e7fb2 Fix Eigen::ThreadPool::CurrentThreadId returning wrong thread id when EIGEN_AVOID_THREAD_LOCAL and NDEBUG are defined 2020-09-25 09:36:43 +00:00
Rasmus Munk Larsen
e55182ac09 Get rid of initialization logic for blueNorm by making the computed constants static const or constexpr.
Move macro definition EIGEN_CONSTEXPR to Core and make all methods in NumTraits constexpr when EIGEN_HASH_CONSTEXPR is 1.
2020-09-18 17:38:58 +00:00
Deven Desai
603e213d13 Fixing a CUDA / P100 regression introduced by PR 181
PR 181 ( https://gitlab.com/libeigen/eigen/-/merge_requests/181 ) adds `__launch_bounds__(1024)` attribute to GPU kernels, that did not have that attribute explicitly specified.

That PR seems to cause regressions on the CUDA platform. This PR/commit makes the changes in PR 181, to be applicable for HIP only
2020-08-20 00:29:57 +00:00
David Tellenbach
d4a727d092 Disable min/max NaN propagation in test cxx11_tensor_expr
The current pmin/pmax implementation for Arm Neon propagate NaNs
differently than std::min/std::max.

See issue https://gitlab.com/libeigen/eigen/-/issues/1937
2020-08-14 16:16:27 +00:00
Deven Desai
46f8a18567 Adding an explicit launch_bounds(1024) attribute for GPU kernels.
Starting with ROCm 3.5, the HIP compiler will change from HCC to hip-clang.

This compiler change introduce a change in the default value of the `__launch_bounds__` attribute associated with a GPU kernel. (default value means the value assumed by the compiler as the `__launch_bounds attribute__` value, when it is not explicitly specified by the user)

Currently (i.e. for HIP with ROCm 3.3 and older), the default value is 1024. That changes to 256 with ROCm 3.5 (i.e. hip-clang compiler). As a consequence of this change, if a GPU kernel with a `__luanch_bounds__` attribute of 256 is launched at runtime with a threads_per_block value > 256, it leads to a runtime error. This is leading to a couple of Eigen unit test failures with ROCm 3.5.

This commit adds an explicit `__launch_bounds(1024)__` attribute to every GPU kernel that currently does not have it explicitly specified (and hence will end up getting the default value of 256 with the change to hip-clang)
2020-08-05 01:46:34 +00:00
Rasmus Munk Larsen
b92206676c Inherit alignment trait from argument in TensorBroadcasting to avoid segfault when the argument is unaligned. 2020-07-28 19:19:37 +00:00
Rasmus Munk Larsen
ac2eca6b11 Update tensor reduction test to avoid undefined division of bfloat16 by int. 2020-07-22 00:35:51 +00:00