CFD/eigen

Author	SHA1	Message	Date
Mehdi Goli	b523120687	[SYCL-2020 Support] Enabling Intel DPCPP Compiler support to Eigen	2023-01-16 07:04:08 +00:00
Charles Schlosser	fa0bd2c34e	improve sparse permutations	2023-01-15 03:21:25 +00:00
Antonio Sánchez	262194f12c	Fix a bunch of minor build and test issues.	2023-01-06 16:37:26 +00:00
Antonio Sánchez	551eebc8ca	Add synchronize method to all devices.	2022-11-29 19:35:02 +00:00
Chris	6728683938	Small cleanup of IDRS.h	2022-11-16 13:51:23 +00:00
Antonio Sánchez	e5794873cb	Replace assert with eigen_assert.	2022-10-04 17:11:23 +00:00
Rasmus Munk Larsen	3c4637640b	Remove unused typedef.	2022-09-23 19:11:31 +00:00
Chao Chen	5ffe7b92e0	[ROCm] fixed gpuGetDevice unused message	2022-09-20 21:38:20 +00:00
chuckyschluz	8acbf5c11c	re-enable pow for complex types	2022-08-26 17:29:02 -04:00
Charles Schlosser	76a669fb45	add fixed power unary operation	2022-08-16 21:32:36 +00:00
Romain Biessy	2f7cce2dd5	[SYCL] Fix some SYCL tests	2022-08-16 17:37:54 +00:00
Antonio Sánchez	b8e93bf589	Eliminate bool bitwise warnings.	2022-08-09 22:42:30 +00:00
Julian Kent	69714ff613	Add Sparse Subset of Matrix Inverse	2022-07-28 18:04:35 +00:00
Antonio Sánchez	e1165dbf9a	AutoDiff depends on Core, so include appropriate header.	2022-07-09 23:57:09 +00:00
Antonio Sánchez	bb51d9f4fa	Fix ODR violations.	2022-07-09 04:56:36 +00:00
Antonio Sanchez	0e18714167	Fix clang-tidy warnings about function definitions in headers.	2022-06-24 15:10:58 +00:00
Antonio Sánchez	8ed3b9dcd6	Skip f16/bf16 bessel specializations on AVX512 if unavailable.	2022-06-24 15:10:36 +00:00
Antonio Sánchez	8c2e0e3cb8	Fix ambiguous comparisons for c++20 (again again)	2022-06-07 17:06:17 +00:00
Antonio Sánchez	76cf6204f3	Revert "Fix c++20 ambiguity of comparisons." This reverts commit `4f6354128f`	2022-06-04 02:32:10 +00:00
Antonio Sánchez	4f6354128f	Fix c++20 ambiguity of comparisons.	2022-06-03 05:11:07 +00:00
Oleg Shirokobrod	f542b0a71f	Adding an MKL adapter in FFT module.	2022-06-02 18:10:43 +00:00
Mario Rincon-Nigro	e99163e732	fix: issue 2481: LDLT produce wrong results with AutoDiffScalar	2022-05-25 15:26:10 +00:00
Antonio Sánchez	477eb7f630	Revert "Avoid ambiguous Tensor comparison operators for C++20 compatibility" This reverts commit `5c2179b6c3`	2022-05-24 16:09:59 +00:00
Mehdi Goli	c5a5ac680c	[SYCL] SYCL-2020 range does not have default constructor.	2022-05-24 03:11:46 +00:00
Benjamin Kramer	5c2179b6c3	Avoid ambiguous Tensor comparison operators for C++20 compatibility	2022-05-23 17:36:03 +00:00
Chip Kerchner	aa8b7e2c37	Add subMappers to Power GEMM packing - simplifies the address calculations (10% faster)	2022-05-23 15:18:29 +00:00
Mehdi Goli	cbe03f3531	[SYCL] Extending SYCL queue interface extension.	2022-05-23 14:45:27 +00:00
Eisuke Kawashima	ac5c83a3f5	unset executable flag	2022-05-22 22:47:43 +09:00
Tobias Wood	a9868bd5be	Add arg() to tensor	2022-05-20 03:33:01 +00:00
Antonio Sánchez	9b9496ad98	Revert "Add AVX512 optimizations for matrix multiply" This reverts commit `25db0b4a82`	2022-05-13 18:50:33 +00:00
aaraujom	25db0b4a82	Add AVX512 optimizations for matrix multiply	2022-05-12 23:41:19 +00:00
Guoqiang QI	00b75375e7	Adding PocketFFT support in FFT module since kissfft has some flaw in accuracy and performance	2022-05-11 17:44:22 +00:00
Rasmus Munk Larsen	73d65dbc43	Update README.md. Remove obsolete comment about RowMajor not being fully supported.	2022-05-06 18:19:35 +00:00
Antonio Sánchez	f7b31f864c	Revert "Replace call to FixedDimensions() with a singleton instance of" This reverts commit `19e6496ce0`	2022-04-10 15:30:33 +00:00
Tobias Schlüter	f3ba220c5d	Remove EIGEN_EMPTY_STRUCT_CTOR	2022-04-08 18:27:26 +00:00
Antonio Sánchez	5ed7a86ae9	Fix MSVC+CUDA issues.	2022-04-08 18:05:32 +00:00
Erik Schultheis	e1df3636b2	More constexpr helpers	2022-04-04 18:38:34 +00:00
Erik Schultheis	64909b82bd	static const class members turned into constexpr	2022-04-04 17:33:33 +00:00
Antonio Sanchez	9bc9992dd3	Eliminate trace unused warning.	2022-03-29 22:04:50 +00:00
Erik Schultheis	b9d2900e8f	added a missing typename and fixed an unused typedef warning	2022-03-24 12:07:18 +02:00
Essex Edwards	cd3c81c3bc	Add a NNLS solver to unsupported - issue #655	2022-03-23 20:20:44 +00:00
Romain Biessy	f2a3e03e9b	Fix usages of wrong namespace	2022-03-21 15:07:53 +00:00
Erik Schultheis	421cbf0866	Replace Eigen type metaprogramming with corresponding std types and make use of alias templates	2022-03-16 16:43:40 +00:00
Antonio Sánchez	9296bb4b93	Fix edge-case in zeta for large inputs.	2022-03-08 21:21:20 +00:00
Antonio Sánchez	008ff3483a	Fix broken tensor executor test, allow tensor packets of size 1.	2022-03-07 20:30:37 +00:00
Antonio Sánchez	d819a33bf6	Remove poor non-convergence checks in NonLinearOptimization.	2022-03-02 19:31:20 +00:00
Antonio Sanchez	1c2690ed24	Adjust tolerance of matrix_power test for MSVC.	2022-03-01 23:33:05 +00:00
Antonio Sánchez	ae86a146b1	Modify test expression to avoid numerical differences (#2402).	2022-02-23 16:37:03 +00:00
Romain Biessy	2dd879d4b0	[SYCL] Fix CMake for SYCL support	2022-02-22 16:53:27 +00:00
Antonio Sanchez	bded5028a5	Fix ODR failures in TensorRandom.	2022-02-11 23:28:33 -08:00
Rasmus Munk Larsen	18eab8f997	Add convenience method `constexpr std::size_t size() const` to `Eigen::IndexList`	2022-02-12 04:23:03 +00:00
Antonio Sánchez	9441d94dcc	Revert "Make fixed-size Matrix and Array trivially copyable after C++20" This reverts commit `47eac21072`	2022-02-05 04:40:29 +00:00
Antonio Sánchez	cafeadffef	Fix ODR violations.	2022-02-04 19:01:07 +00:00
Rasmus Munk Larsen	ea2c02060c	Add reciprocal packet op and fast specializations for float with SSE, AVX, and AVX512.	2022-01-21 23:49:18 +00:00
Erik Schultheis	970640519b	Cleanup	2022-01-21 01:48:59 +00:00
Kolja Brix	8d81a2339c	Reduce usage of reserved names	2022-01-10 20:53:29 +00:00
Matthias Möller	c9df98b071	Fix Gcc8.5 warning about missing base class initialisation (#2404)	2022-01-07 19:16:53 +00:00
Lingzhu Xiang	47eac21072	Make fixed-size Matrix and Array trivially copyable after C++20. Making them trivially copyable allows using std::memcpy() without undefined behavior. Only Matrix and Array with trivially copyable DenseStorage are marked as trivially copyable with an additional type trait. As described in http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p0848r3.html it requires extremely verbose SFINAE to make the special member functions of fixed-size Matrix and Array trivial, unless C++20 concepts are available to simplify the selection of trivial special member functions given template parameters. Therefore only make this feature available to compilers that support C++20 P0848R3. Fix #1855.	2022-01-07 19:04:35 +00:00
Erik Schultheis	c20e908ebc	turn some macros into constexpr functions	2021-12-10 19:27:01 +00:00
Erik Schultheis	c35679af27	fixed customIndices2Array forgetting first index	2021-12-10 16:41:59 +00:00
Erik Schultheis	e4c40b092a	disambiguate overloads for empty index list	2021-12-07 19:40:09 +00:00
Jens Wehner	c6fa0ca162	Idrsstabl	2021-12-06 20:00:00 +00:00
Erik Schultheis	cc11e240ac	Some further cleanup	2021-12-06 18:01:15 +00:00
Erik Schultheis	cd83f34d3a	fix typo `StableNorm` -> `stableNorm`	2021-12-04 14:52:09 +00:00
Jens Wehner	4ee2e9b340	Idrs refactoring	2021-12-02 23:32:07 +00:00
Jens Wehner	f63c6dd1f9	Bicgstabl	2021-12-02 22:48:22 +00:00
Erik Schultheis	2f65ec5302	fixed leftover else branch	2021-12-02 18:13:19 +00:00
Xinle Liu	7ef5f0641f	Remove macro EIGEN_GPU_TEST_C99_MATH, which is used in a single test file only and always defaults to true.	2021-12-01 14:48:56 +00:00
Erik Schultheis	ec2fd0f7ed	Require recent GCC and MSVC and removed `EIGEN_HAS_CXX14` and some other feature test macros	2021-12-01 00:48:34 +00:00
Erik Schultheis	4a76880351	Updated CMake. This patch updates the minimum required CMake version to 3.10 and removes the EIGEN_TEST_CXX11 CMake option, including corresponding logic.	2021-11-29 20:24:20 +00:00
Erik Schultheis	f33a31b823	removed EIGEN_HAS_CXX11_* and redundant EIGEN_COMP_CXXVER checks	2021-11-29 19:18:57 +00:00
David Tellenbach	08da52eb85	Remove DenseBase::nonZeros() which just calls DenseBase::size(). Fixes #2382.	2021-11-27 14:31:00 +00:00
Erik Schultheis	ec4efbd696	remove EIGEN_HAS_CXX11	2021-11-24 20:08:49 +00:00
Rasmus Munk Larsen	cfdb3ce3f0	Fix warnings about shadowing definitions.	2021-11-23 14:34:47 -08:00
Rasmus Munk Larsen	5e89573e2a	Implement Eigen::array<...>::reverse_iterator if std::reverse_iterator exists.	2021-11-20 00:22:46 +00:00
Rasmus Munk Larsen	11cb7b8372	Add basic iterator support for Eigen::array to ease transition to std::array in third-party libraries.	2021-11-19 05:14:30 +00:00
Antonio Sanchez	c107bd6102	Fix errors for windows build.	2021-11-19 04:23:25 +00:00
Rasmus Munk Larsen	96aeffb013	Make the new TensorIO implementation work with TensorMap with const elements.	2021-11-17 18:16:04 -08:00
Rasmus Munk Larsen	824d06eb36	Include <numeric> to get std::iota.	2021-11-18 00:47:18 +00:00
Antonio Sanchez	ffb78e23a1	Fix tensor broadcast off-by-one error. Caught by JAX unit tests. Triggered if broadcast is smaller than packet size.	2021-11-16 17:37:38 +00:00
cpp977	f73c95c032	Reimplemented the Tensor stream output.	2021-11-16 17:36:58 +00:00
Ben Barsdell	50df8d3d6d	Avoid integer overflow in EigenMetaKernel indexing. The current implementation computes `size + total_threads`, which can overflow and cause CUDA_ERROR_ILLEGAL_ADDRESS when size is close to the maximum representable value. The num_blocks calculation can also overflow due to the implementation of divup(). This patch prevents these overflows and allows the kernel to work correctly for the full representable range of tensor sizes. Also adds relevant tests.	2021-11-05 16:39:37 +11:00
Rasmus Munk Larsen	55e3ae02ac	Compare summation results against forward error bound.	2021-11-04 18:04:04 -07:00
Antonio Sanchez	8f8c2ba2fe	Remove bad "take" impl that causes g++-11 crash. For some reason, having `take<n, numeric_list<T>>` for `n > 0` causes g++-11 to ICE with `sorry, unimplemented: unexpected AST of kind nontype_argument_pack`. It does work with other versions of gcc, and with clang. I filed a GCC bug [here](https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102999). Technically we should never actually run into this case, since you can't take n > 0 elements from an empty list. Commenting it out allows our Eigen tests to pass.	2021-11-01 17:04:41 +00:00
Antonio Sanchez	f6c8cc0e99	Fix TensorReduction warnings and error bound for sum accuracy test. The sum accuracy test currently uses the default test precision for the given scalar type. However, scalars are generated via a normal distribution, and given a large enough count and strong enough random generator, the expected sum is zero. This causes the test to periodically fail. Here we estimate an upper-bound for the error as `sqrt(N) * prec` for summing N values, with each having an approximate epsilon of `prec`. Also fixed a few warnings generated by MSVC when compiling the reduction test.	2021-10-30 14:59:00 -07:00
Rasmus Munk Larsen	b3bea43a2d	Don't use unrolled loops for stateful reducers. The problem is the combination step, e.g. `reducer0.reducePacket(accum1, accum0); reducer0.reducePacket(accum2, accum0); reducer0.reducePacket(accum3, accum0);`. For the mean reducer this will increment the count as well as adding together the accumulators and result in the wrong count being divided into the sum at the end.	2021-10-28 23:52:54 +00:00
Fabian Keßler	19cacd3ecb	optimize cmake scripts for subproject use	2021-10-28 16:08:02 +02:00
Rohit Santhanam	48e40b22bf	Preliminary HIP bfloat16 GPU support.	2021-10-27 18:36:45 +00:00
Antonio Sánchez	185ad0e610	Revert "Avoid integer overflow in EigenMetaKernel indexing" This reverts commit `100d7caf92`	2021-10-27 14:55:25 +00:00
Ben Barsdell	100d7caf92	Avoid integer overflow in EigenMetaKernel indexing. The current implementation computes `size + total_threads`, which can overflow and cause CUDA_ERROR_ILLEGAL_ADDRESS when size is close to the maximum representable value. The num_blocks calculation can also overflow due to the implementation of divup(). This patch prevents these overflows and allows the kernel to work correctly for the full representable range of tensor sizes. Also adds relevant tests.	2021-10-26 00:04:28 +00:00
Antonio Sanchez	a500da1dc0	Fix broadcasting oob error. For vectorized 1-dimensional inputs that do not take the special blocking path (e.g. `std::complex<...>`), there was an index-out-of-bounds error causing the broadcast size to be computed incorrectly. Here we fix this, and make other minor cleanup changes. Fixes #2351.	2021-10-25 19:31:12 +00:00
Nico	b17bcddbca	Fix -Wbitwise-instead-of-logical clang warning. & and | don't short-circuit, && and || do. When both arguments to those are boolean, the short-circuiting version is usually the desired one, so clang warns on this. Here, it is inconsequential, so switch to && and || to suppress the warning.	2021-10-21 23:32:45 -04:00
Antonio Sanchez	24ebb37f38	Disable Tree reduction for GPU. For moderately sized inputs, running the Tree reduction quickly fills/overflows the GPU thread stack space, leading to memory errors. This was happening in the `cxx11_tensor_complex_gpu` test, for example. Disabling tree reduction on GPU fixes this.	2021-10-20 20:42:37 +00:00
Rasmus Munk Larsen	360290fc42	Improve accuracy of full tensor reduction for half and bfloat16 by reducing leaf size in tree reduction. Add more unit tests for summation accuracy.	2021-10-20 19:54:06 +00:00
Antonio Sanchez	d0d34524a1	Move CUDA/Complex.h to GPU/Complex.h, remove TensorReductionCuda.h. The `Complex.h` file applies equally to HIP/CUDA, so placing under the generic `GPU` folder. The `TensorReductionCuda.h` has already been deprecated, now removing for the next Eigen version.	2021-10-20 12:00:19 -07:00
Rasmus Munk Larsen	1d75fab368	Speed up tensor reduction	2021-10-02 14:58:23 +00:00
Antonio Sanchez	be9e7d205f	Reduce tensor_contract_gpu test. The original test times out after 60 minutes on Windows, even when setting flags to optimize for speed. Reducing the number of contractions performed from 3600->27 for subtests 8,9 allows the two to run in just over a minute each.	2021-10-02 04:36:15 +00:00
Antonio Sanchez	701f5d1c91	Fix gpu special function tests. Some checks used incorrect values, partly from copy-paste errors, partly from the change in behaviour introduced in !398. Modified results to match scipy, simplified tests by updating `VERIFY_IS_CWISE_APPROX` to work for scalars.	2021-10-01 10:20:50 -07:00
Antonio Sanchez	de218b471d	Add -arch=<arch> argument for nvcc. Without this flag, when compiling with nvcc, if the compute architecture of a card does not exactly match any of those listed for `-gencode arch=compute_<arch>,code=sm_<arch>`, then the kernel will fail to run with: `cudaErrorNoKernelImageForDevice: no kernel image is available for execution on the device.` This can happen, for example, when compiling with an older cuda version that does not support a newer architecture (e.g. T4 is `sm_75`, but cuda 9.2 only supports up to `sm_70`). With the `-arch=<arch>` flag, the code will compile and run at the supplied architecture.	2021-09-24 20:48:01 -07:00
Antonio Sanchez	846d34384a	Rename EIGEN_CUDA_FLAGS to EIGEN_CUDA_CXX_FLAGS. Also add a missing space for clang.	2021-09-24 20:15:55 -07:00

3098 Commits