CFD/eigen

Author	SHA1	Message	Date
Chip Kerchner	aa8b7e2c37	Add subMappers to Power GEMM packing - simplifies the address calculations (10% faster)	2022-05-23 15:18:29 +00:00
Mehdi Goli	cbe03f3531	[SYCL] Extending SYCL queue interface extension.	2022-05-23 14:45:27 +00:00
Eisuke Kawashima	ac5c83a3f5	unset executable flag	2022-05-22 22:47:43 +09:00
Tobias Wood	a9868bd5be	Add arg() to tensor	2022-05-20 03:33:01 +00:00
Antonio Sánchez	9b9496ad98	Revert "Add AVX512 optimizations for matrix multiply" This reverts commit `25db0b4a82`	2022-05-13 18:50:33 +00:00
aaraujom	25db0b4a82	Add AVX512 optimizations for matrix multiply	2022-05-12 23:41:19 +00:00
Guoqiang QI	00b75375e7	Adding PocketFFT support in FFT module since kissfft has some flaw in accuracy and performance	2022-05-11 17:44:22 +00:00
Rasmus Munk Larsen	73d65dbc43	Update README.md. Remove obsolete comment about RowMajor not being fully supported.	2022-05-06 18:19:35 +00:00
Antonio Sánchez	f7b31f864c	Revert "Replace call to FixedDimensions() with a singleton instance of" This reverts commit `19e6496ce0`	2022-04-10 15:30:33 +00:00
Tobias Schlüter	f3ba220c5d	Remove EIGEN_EMPTY_STRUCT_CTOR	2022-04-08 18:27:26 +00:00
Antonio Sánchez	5ed7a86ae9	Fix MSVC+CUDA issues.	2022-04-08 18:05:32 +00:00
Erik Schultheis	e1df3636b2	More constexpr helpers	2022-04-04 18:38:34 +00:00
Erik Schultheis	64909b82bd	static const class members turned into constexpr	2022-04-04 17:33:33 +00:00
Antonio Sanchez	9bc9992dd3	Eliminate trace unused warning.	2022-03-29 22:04:50 +00:00
Essex Edwards	cd3c81c3bc	Add a NNLS solver to unsupported - issue #655	2022-03-23 20:20:44 +00:00
Romain Biessy	f2a3e03e9b	Fix usages of wrong namespace	2022-03-21 15:07:53 +00:00
Erik Schultheis	421cbf0866	Replace Eigen type metaprogramming with corresponding std types and make use of alias templates	2022-03-16 16:43:40 +00:00
Antonio Sánchez	9296bb4b93	Fix edge-case in zeta for large inputs.	2022-03-08 21:21:20 +00:00
Antonio Sánchez	008ff3483a	Fix broken tensor executor test, allow tensor packets of size 1.	2022-03-07 20:30:37 +00:00
Antonio Sanchez	bded5028a5	Fix ODR failures in TensorRandom.	2022-02-11 23:28:33 -08:00
Rasmus Munk Larsen	18eab8f997	Add convenience method `constexpr std::size_t size() const` to `Eigen::IndexList`	2022-02-12 04:23:03 +00:00
Antonio Sánchez	9441d94dcc	Revert "Make fixed-size Matrix and Array trivially copyable after C++20" This reverts commit `47eac21072`	2022-02-05 04:40:29 +00:00
Antonio Sánchez	cafeadffef	Fix ODR violations.	2022-02-04 19:01:07 +00:00
Rasmus Munk Larsen	ea2c02060c	Add reciprocal packet op and fast specializations for float with SSE, AVX, and AVX512.	2022-01-21 23:49:18 +00:00
Erik Schultheis	970640519b	Cleanup	2022-01-21 01:48:59 +00:00
Kolja Brix	8d81a2339c	Reduce usage of reserved names	2022-01-10 20:53:29 +00:00
Matthias Möller	c9df98b071	Fix Gcc8.5 warning about missing base class initialisation (#2404)	2022-01-07 19:16:53 +00:00
Lingzhu Xiang	47eac21072	Make fixed-size Matrix and Array trivially copyable after C++20 Making them trivially copyable allows using std::memcpy() without undefined behaviors. Only Matrix and Array with trivially copyable DenseStorage are marked as trivially copyable with an additional type trait. As described in http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p0848r3.html it requires extremely verbose SFINAE to make the special member functions of fixed-size Matrix and Array trivial, unless C++20 concepts are available to simplify the selection of trivial special member functions given template parameters. Therefore only make this feature available to compilers that support C++20 P0848R3. Fix #1855.	2022-01-07 19:04:35 +00:00
Erik Schultheis	c20e908ebc	turn some macros into constexpr functions	2021-12-10 19:27:01 +00:00
Erik Schultheis	c35679af27	fixed customIndices2Array forgetting first index	2021-12-10 16:41:59 +00:00
Erik Schultheis	e4c40b092a	disambiguate overloads for empty index list	2021-12-07 19:40:09 +00:00
Jens Wehner	c6fa0ca162	Idrsstabl	2021-12-06 20:00:00 +00:00
Erik Schultheis	cc11e240ac	Some further cleanup	2021-12-06 18:01:15 +00:00
Erik Schultheis	cd83f34d3a	fix typo `StableNorm` -> `stableNorm`	2021-12-04 14:52:09 +00:00
Jens Wehner	4ee2e9b340	Idrs refactoring	2021-12-02 23:32:07 +00:00
Jens Wehner	f63c6dd1f9	Bicgstabl	2021-12-02 22:48:22 +00:00
Erik Schultheis	2f65ec5302	fixed leftover else branch	2021-12-02 18:13:19 +00:00
Erik Schultheis	ec2fd0f7ed	Require recent GCC and MSVC and removed `EIGEN_HAS_CXX14` and some other feature test macros	2021-12-01 00:48:34 +00:00
Erik Schultheis	f33a31b823	removed EIGEN_HAS_CXX11_* and redundant EIGEN_COMP_CXXVER checks	2021-11-29 19:18:57 +00:00
Erik Schultheis	ec4efbd696	remove EIGEN_HAS_CXX11	2021-11-24 20:08:49 +00:00
Rasmus Munk Larsen	cfdb3ce3f0	Fix warnings about shadowing definitions.	2021-11-23 14:34:47 -08:00
Rasmus Munk Larsen	5e89573e2a	Implement Eigen::array<...>::reverse_iterator if std::reverse_iterator exists.	2021-11-20 00:22:46 +00:00
Rasmus Munk Larsen	11cb7b8372	Add basic iterator support for Eigen::array to ease transition to std::array in third-party libraries.	2021-11-19 05:14:30 +00:00
Antonio Sanchez	c107bd6102	Fix errors for windows build.	2021-11-19 04:23:25 +00:00
Rasmus Munk Larsen	96aeffb013	Make the new TensorIO implementation work with TensorMap with const elements.	2021-11-17 18:16:04 -08:00
Rasmus Munk Larsen	824d06eb36	Include <numeric> to get std::iota.	2021-11-18 00:47:18 +00:00
Antonio Sanchez	ffb78e23a1	Fix tensor broadcast off-by-one error. Caught by JAX unit tests. Triggered if broadcast is smaller than packet size.	2021-11-16 17:37:38 +00:00
cpp977	f73c95c032	Reimplemented the Tensor stream output.	2021-11-16 17:36:58 +00:00
Ben Barsdell	50df8d3d6d	Avoid integer overflow in EigenMetaKernel indexing - The current implementation computes `size + total_threads`, which can overflow and cause CUDA_ERROR_ILLEGAL_ADDRESS when size is close to the maximum representable value. - The num_blocks calculation can also overflow due to the implementation of divup(). - This patch prevents these overflows and allows the kernel to work correctly for the full representable range of tensor sizes. - Also adds relevant tests.	2021-11-05 16:39:37 +11:00
Antonio Sanchez	8f8c2ba2fe	Remove bad "take" impl that causes g++-11 crash. For some reason, having `take<n, numeric_list<T>>` for `n > 0` causes g++-11 to ICE with ``` sorry, unimplemented: unexpected AST of kind nontype_argument_pack ``` It does work with other versions of gcc, and with clang. I filed a GCC bug [here](https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102999). Technically we should never actually run into this case, since you can't take n > 0 elements from an empty list. Commenting it out allows our Eigen tests to pass.	2021-11-01 17:04:41 +00:00
Antonio Sanchez	f6c8cc0e99	Fix TensorReduction warnings and error bound for sum accuracy test. The sum accuracy test currently uses the default test precision for the given scalar type. However, scalars are generated via a normal distribution, and given a large enough count and strong enough random generator, the expected sum is zero. This causes the test to periodically fail. Here we estimate an upper-bound for the error as `sqrt(N) * prec` for summing N values, with each having an approximate epsilon of `prec`. Also fixed a few warnings generated by MSVC when compiling the reduction test.	2021-10-30 14:59:00 -07:00
Rasmus Munk Larsen	b3bea43a2d	Don't use unrolled loops for stateful reducers. The problem is the combination step, e.g. reducer0.reducePacket(accum1, accum0); reducer0.reducePacket(accum2, accum0); reducer0.reducePacket(accum3, accum0); For the mean reducer this will increment the count as well as adding together the accumulators and result in the wrong count being divided into the sum at the end.	2021-10-28 23:52:54 +00:00
Antonio Sánchez	185ad0e610	Revert "Avoid integer overflow in EigenMetaKernel indexing" This reverts commit `100d7caf92`	2021-10-27 14:55:25 +00:00
Ben Barsdell	100d7caf92	Avoid integer overflow in EigenMetaKernel indexing - The current implementation computes `size + total_threads`, which can overflow and cause CUDA_ERROR_ILLEGAL_ADDRESS when size is close to the maximum representable value. - The num_blocks calculation can also overflow due to the implementation of divup(). - This patch prevents these overflows and allows the kernel to work correctly for the full representable range of tensor sizes. - Also adds relevant tests.	2021-10-26 00:04:28 +00:00
Antonio Sanchez	a500da1dc0	Fix broadcasting oob error. For vectorized 1-dimensional inputs that do not take the special blocking path (e.g. `std::complex<...>`), there was an index-out-of-bounds error causing the broadcast size to be computed incorrectly. Here we fix this, and make other minor cleanup changes. Fixes #2351.	2021-10-25 19:31:12 +00:00
Nico	b17bcddbca	Fix -Wbitwise-instead-of-logical clang warning. & and | don't short-circuit, && and || do. When both arguments to those are boolean, the short-circuiting version is usually the desired one, so clang warns on this. Here, it is inconsequential, so switch to && and || to suppress the warning.	2021-10-21 23:32:45 -04:00
Antonio Sanchez	24ebb37f38	Disable Tree reduction for GPU. For moderately sized inputs, running the Tree reduction quickly fills/overflows the GPU thread stack space, leading to memory errors. This was happening in the `cxx11_tensor_complex_gpu` test, for example. Disabling tree reduction on GPU fixes this.	2021-10-20 20:42:37 +00:00
Rasmus Munk Larsen	360290fc42	Improve accuracy of full tensor reduction for half and bfloat16 by reducing leaf size in tree reduction. Add more unit tests for summation accuracy.	2021-10-20 19:54:06 +00:00
Antonio Sanchez	d0d34524a1	Move CUDA/Complex.h to GPU/Complex.h, remove TensorReductionCuda.h The `Complex.h` file applies equally to HIP/CUDA, so placing under the generic `GPU` folder. The `TensorReductionCuda.h` has already been deprecated, now removing for the next Eigen version.	2021-10-20 12:00:19 -07:00
Rasmus Munk Larsen	1d75fab368	Speed up tensor reduction	2021-10-02 14:58:23 +00:00
Kolja Brix	afa616bc9e	Fix some typos found	2021-09-23 15:22:00 +00:00
sciencewhiz	4b6036e276	fix various typos	2021-09-22 16:15:06 +00:00
Alexander Karatarakis	4d622be118	[AutodiffScalar] Remove const when returning by value clang-tidy: Return type 'const T' is 'const'-qualified at the top level, which may reduce code readability without improving const correctness The types are somewhat long, but the affected return types are of the form: ``` const T my_func() { // } ``` Change to: ``` T my_func() { // } ```	2021-09-18 21:23:32 +00:00
Rasmus Munk Larsen	6cadab6896	Clean up EIGEN_STATIC_ASSERT to only use standard c++11 static_assert.	2021-09-16 20:43:54 +00:00
Rasmus Munk Larsen	d7d0bf832d	Issue an error in case of direct inclusion of internal headers.	2021-09-10 19:12:26 +00:00
Antonio Sanchez	6c10495a78	Remove unnecessary std::tuple reference.	2021-09-09 15:49:44 +00:00
Antonio Sanchez	eea2a3385c	Remove more DynamicSparseMatrix references. Also fixed some typos in SparseExtra/MarketIO.h.	2021-09-02 15:36:47 -07:00
Jens Wehner	8286073c73	Matrixmarket extension	2021-09-02 17:23:33 +00:00
Antonio Sanchez	74da2e6821	Rename Tuple -> Pair. This is to make way for a new `Tuple` class that mimics `std::tuple`, but can be reliably used on device and with aligned Eigen types. The existing Tuple has very few references, and is actually an analogue of `std::pair`.	2021-09-02 02:20:54 +00:00
jenswehner	a443a2373f	updated documentation	2021-08-31 22:58:28 +00:00
Antonio Sanchez	cc3573ab44	Disable cuda Eigen::half vectorization on host. All cuda `__half` functions are device-only in CUDA 9, including conversions. Host-side conversions were added in CUDA 10. The existing code doesn't build prior to 10.0. All arithmetic functions are always device-only, so there is no reason to use vectorization on the host at all. Modified the code to disable vectorization for `__half` on host, which required also updating the `TensorReductionGpu` implementation which previously made assumptions about available packets.	2021-08-31 19:13:12 +00:00
Turing Eret	3324389f6d	Add EIGEN_TENSOR_PLUGIN support per issue #2052.	2021-08-30 19:36:55 +00:00
Jens Wehner	53ad9c75b4	included unordered_map header	2021-08-27 16:53:28 +00:00
jenswehner	9abf4d0bec	made RandomSetter C++11 compatible	2021-08-25 20:24:55 +00:00
jenswehner	90b3b6b572	added doxygen flowchart	2021-08-24 17:11:51 +00:00
jenswehner	d85de1ef56	removed sparse dynamic matrix	2021-08-24 10:33:00 +02:00
Alexander Karatarakis	4ba872bd75	Avoid leading underscore followed by cap in template identifiers	2021-08-04 22:41:52 +00:00
Alexander Karatarakis	f357283d31	_DerType -> DerivativeType as underscore-followed-by-caps is a reserved identifier	2021-07-29 18:02:04 +00:00
Antonio Sanchez	1fd5ce1002	For GpuDevice::fill, use a single memset if all bytes are equal. The original `fill` implementation introduced a 5x regression on my nvidia Quadro K1200. @rohitsan reported up to 100x regression for HIP. This restores performance.	2021-07-10 13:37:16 +00:00
Antonio Sanchez	9c22795d65	Put attach/detach buffer back in for TensorDeviceSycl. Also added a test to verify the original buffer is updated correctly.	2021-07-09 10:00:05 -07:00
Antonio Sanchez	1e6c6c1576	Replace memset with fill to work for non-trivial scalars. For custom scalars, zero is not necessarily represented by a zeroed-out memory block (e.g. gnu MPFR). We therefore cannot rely on `memset` if we want to fill a matrix or tensor with zeroes. Instead, we should rely on `fill`, which for trivial types does end up getting converted to a `memset` under-the-hood (at least with gcc/clang). Requires adding a `fill(begin, end, v)` to `TensorDevice`. Replaced all potentially bad instances of memset with fill. Fixes #2245.	2021-07-08 18:34:41 +00:00
Jonas Harsch	e9c9a3130b	Removed superfluous boolean `degenerate` in TensorMorphing.h.	2021-07-08 18:02:58 +00:00
Antonio Sanchez	f5a9873bbb	Fix Tensor documentation page. The extra [TOC] tag is generating a huge floating duplicated table-of-contents, which obscures the majority of the page (see bottom of https://eigen.tuxfamily.org/dox/unsupported/eigen_tensors.html). Remove it. Also, headers do not support markup (see [doxygen bug](https://github.com/doxygen/doxygen/issues/7467)), so backticks like ``` ``` end up generating titles that look like ``` Constructor <tt>Tensor<double,2></tt> ``` Removing backticks for now. To generate properly formatted headers, we must directly use html instead of markdown, i.e. ``` <h2>Constructor <code>Tensor<double,2></code></h2> ``` which is ugly. Fixes #2254.	2021-07-03 04:39:22 +00:00
Jonas Harsch	aab747021b	Don't crash when attempting to shuffle an empty tensor.	2021-07-02 20:33:52 +00:00
Antonio Sanchez	6035da5283	Fix compile issues for gcc 4.8. - Move constructors can only be defaulted as NOEXCEPT if all members have NOEXCEPT move constructors. - gcc 4.8 has some funny parsing bug in `a < b->c`, thinking `b-` is a template parameter.	2021-07-01 22:58:14 +00:00
Antonio Sanchez	3a087ccb99	Modify tensor argmin/argmax to always return first occurrence. As written, depending on multithreading/gpu, the returned index from `argmin`/`argmax` is not currently stable. Here we modify the functors to always keep the first occurrence (i.e. if the value is equal to the current min/max, then keep the one with the smallest index). This is otherwise causing unpredictable results in some TF tests.	2021-06-29 10:36:20 -07:00
Antonio Sanchez	e9ab4278b7	Rewrite balancer to avoid overflows. The previous balancer overflowed for large row/column norms. Modified to prevent that. Fixes #2273.	2021-06-21 17:29:55 +00:00
jenswehner	175f0cc1e9	changed documentation to make example compile	2021-06-16 11:45:06 +02:00
Antonio Sanchez	954879183b	Fix placement of permanent GPU defines.	2021-06-15 12:17:09 -07:00
Rasmus Munk Larsen	13fb5ab92c	Fix more enum arithmetic.	2021-06-15 09:09:31 -07:00
Antonio Sanchez	514977f31b	Add ability to permanently enable HIP/CUDA gpu* defines. When using Eigen for gpu, these simplify portability. If `EIGEN_PERMANENTLY_ENABLE_GPU_HIP_CUDA_DEFINES` is set, then we do not undefine them.	2021-06-11 17:19:54 +00:00
Antonio Sanchez	6aec83263d	Allow custom TENSOR_CONTRACTION_DISPATCH macro. Currently TF lite needs to hack around with the Tensor headers in order to customize the contraction dispatch method. Here we add simple `#ifndef` guards to allow them to provide their own dispatch prior to inclusion.	2021-06-11 17:02:19 +00:00
Nathan Luehr	972cf0c28a	Fix calls to device functions from host code	2021-05-11 22:47:49 +00:00
Antonio Sanchez	0eba8a1fe3	Clean up gpu device properties. Made a class and singleton to encapsulate initialization and retrieval of device properties. Related to !481, which already changed the API to address a static linkage issue.	2021-05-07 17:51:29 +00:00
Antonio Sanchez	e3b7f59659	Simplify TensorRandom and remove time-dependence. Time-dependence prevents tests from being repeatable. This has long been an issue with debugging the tensor tests. Removing this will allow future tests to be repeatable in the usual way. Also, the recently added macros in !476 are causing headaches across different platforms. For example, checking `_XOPEN_SOURCE` is leading to multiple ambiguous macro errors across Google, and `_DEFAULT_SOURCE`/`_SVID_SOURCE`/`_BSD_SOURCE` are sometimes defined with values, sometimes defined as empty, and sometimes not defined at all when they probably should be. This is leading to multiple build breakages. The simplest approach is to generate a seed via `Eigen::internal::random<uint64_t>()` if on CPU. For GPU, we use a hash based on the current thread ID (since `rand()` isn't supported on GPU). Fixes #1602.	2021-05-04 13:34:49 -07:00
Turing Eret	3804ca0d90	Fix for issue with static global variables in TensorDeviceGpu.h m_deviceProperties and m_devicePropInitialized are defined as global statics which will define multiple copies which can cause issues if initializeDeviceProp() is called in one translation unit and then m_deviceProperties is used in a different translation unit. Added inline functions getDeviceProperties() and getDevicePropInitialized() which defines those variables as static locals. As per the C++ standard 7.1.2/4, a static local declared in an inline function always refers to the same object, so this should be safer. Credit to Sun Chenggen for this fix. This fixes issue #1475.	2021-04-23 07:43:35 -06:00
Antonio Sanchez	045c0609b5	Check existence of BSD random before use. `TensorRandom` currently relies on BSD `random()`, which is not always available. The [linux manpage](https://man7.org/linux/man-pages/man3/srandom.3.html) gives the glibc condition: ``` _XOPEN_SOURCE >= 500 || /* Glibc since 2.19: */ _DEFAULT_SOURCE || /* Glibc <= 2.19: */ _SVID_SOURCE || _BSD_SOURCE ``` In particular, this was failing to compile for MinGW via msys2. If not available, we fall back to using `rand()`.	2021-04-22 20:42:12 +00:00
Antonio Sanchez	69adf26aa3	Modify googlehash use to account for namespace issues. The namespace declaration for googlehash is a configurable macro that can be disabled. In particular, it is disabled within google, causing compile errors since `dense_hash_map`/`sparse_hash_map` are then in the global namespace instead of in `::google`. Here we play a bit of gymnastics to allow for both `google::*_hash_map` and `*_hash_map`, while limiting namespace pollution. Symbols within the `::google` namespace are imported into `Eigen::google`. We also remove checks based on `_SPARSE_HASH_MAP_H_`, as this is fragile, and instead require `EIGEN_GOOGLEHASH_SUPPORT` to be defined.	2021-04-12 19:00:39 -07:00
Rasmus Munk Larsen	a2c0542010	Fix typo in TensorDimensions.h	2021-04-12 18:59:56 +00:00
Jens Wehner	f6fc66aa75	fixed doxygen for unsupported iterative solver module	2021-04-11 16:26:14 +00:00
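
The `50df8d3d6d` entry above notes that `size + total_threads` and the `divup()` implementation can overflow for sizes near the maximum representable value. The following is a minimal sketch of that kind of overflow and one way to avoid it. It is not Eigen's actual code: `divup_naive`, `divup_safe`, and `kMax` are hypothetical names, and unsigned integers are used so that the wrap-around is well defined.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper: the textbook round-up division.
// The intermediate x + y - 1 wraps around when x is near the maximum of T.
template <typename T>
constexpr T divup_naive(T x, T y) {
  return static_cast<T>((x + y - 1) / y);
}

// Hypothetical helper: rounds up without ever forming x + y - 1.
template <typename T>
constexpr T divup_safe(T x, T y) {
  return static_cast<T>(x / y + (x % y != 0 ? 1 : 0));
}

constexpr std::uint32_t kMax = std::numeric_limits<std::uint32_t>::max();

// For small inputs both variants agree.
static_assert(divup_naive<std::uint32_t>(10, 3) == divup_safe<std::uint32_t>(10, 3), "");
// Near the maximum the safe variant still yields the expected block count.
static_assert(divup_safe<std::uint32_t>(kMax, 1024) == 4194304u, "");
```

The safe variant costs one extra modulo, which is usually negligible next to a kernel launch.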
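
The `3a087ccb99` entry above describes making `argmin`/`argmax` keep the first occurrence so the result does not depend on how partial results are merged. A rough sketch of such a tie-breaking combine step follows. The type `IndexedValue` and the functions `combine_argmin` and `argmin_first` are hypothetical and only illustrate the idea; they are not the functors used in Eigen, and NaN handling is ignored.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical (index, value) pair produced by a partial reduction.
struct IndexedValue {
  std::ptrdiff_t index;
  float value;
};

// Keep the smaller value; on a tie keep the smaller index, so the result
// does not depend on the order in which partial results are merged.
inline IndexedValue combine_argmin(IndexedValue a, IndexedValue b) {
  if (b.value < a.value) return b;
  if (a.value < b.value) return a;
  return (b.index < a.index) ? b : a;
}

// Sequential argmin over a vector, built on the same combine step.
inline std::ptrdiff_t argmin_first(const std::vector<float>& data) {
  IndexedValue best{-1, 0.0f};
  for (std::size_t i = 0; i < data.size(); ++i) {
    IndexedValue cur{static_cast<std::ptrdiff_t>(i), data[i]};
    best = (best.index < 0) ? cur : combine_argmin(best, cur);
  }
  return best.index;  // -1 for an empty input
}
```

Because the tie-break is symmetric in its arguments, merging per-thread results in any order gives the same index.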


2485 Commits