CFD/eigen

Author	SHA1	Message	Date
Eugene Zhulenev	e3dec4dcc1	ThreadLocal container that does not rely on thread local storage	2019-09-09 15:18:14 -07:00
Srinivas Vasudevan	e38dd48a27	PR 681: Add ndtri function, the inverse of the normal distribution function.	2019-08-12 19:26:29 -04:00
Eugene Zhulenev	47fefa235f	Allow move-only done callback in TensorAsyncDevice	2019-09-03 17:20:56 -07:00
Eugene Zhulenev	a8d264fa9c	Add test for const TensorMap underlying data mutation	2019-09-03 11:38:39 -07:00
Eugene Zhulenev	f0b36fb9a4	evalSubExprsIfNeededAsync + async TensorContractionThreadPool	2019-08-30 15:13:38 -07:00
Eugene Zhulenev	66665e7e76	Asynchronous expression evaluation with TensorAsyncDevice	2019-08-30 14:49:40 -07:00
Eugene Zhulenev	bc40d4522c	Const correctness in TensorMap<const Tensor<T, ...>> expressions	2019-08-28 17:46:05 -07:00
Eugene Zhulenev	071311821e	Remove XSMM support from Tensor module	2019-08-19 11:44:25 -07:00
Rasmus Munk Larsen	facc4e4536	Disable tests for contraction with output kernels when using libxsmm, which does not support this.	2019-08-07 14:11:15 -07:00
Eugene Zhulenev	6e7c76481a	Merge with Eigen head	2019-06-28 11:22:46 -07:00
Eugene Zhulenev	878845cb25	Add block access to TensorReverseOp and make sure that TensorForcedEval uses block access when preferred	2019-06-28 11:13:44 -07:00
Mehdi Goli	7d08fa805a	[SYCL] This PR adds the minimum modifications to the Eigen unsupported module required to run it on devices supporting SYCL. * Abstracting the pointer type so that both SYCL memory and pointer can be captured. * Converting SYCL virtual pointer to SYCL device memory in Eigen evaluator class. * Binding SYCL placeholder accessor to command group handler by using bind method in Eigen evaluator node. * Adding SYCL macro for controlling loop unrolling. * Modifying the TensorDeviceSycl.h and SYCL executor method to adopt the above changes.	2019-06-28 10:08:23 +01:00
tra	b4c49bf00e	Minor build improvements * Allow specifying multiple GPU architectures. E.g.: cmake -DEIGEN_CUDA_COMPUTE_ARCH="60;70" * Pass CUDA SDK path to clang. Without it it will default to /usr/local/cuda which may not be the right location, if cmake was invoked with -DCUDA_TOOLKIT_ROOT_DIR=/some/other/CUDA/path	2019-05-31 14:08:34 -07:00
Rasmus Larsen	c8d8d5c0fc	Merged in rmlarsen/eigen_threadpool (pull request PR-640) Fix deadlocks in thread pool. Approved-by: Eugene Zhulenev <ezhulenev@google.com>	2019-05-13 20:04:35 +00:00
Christoph Hertzberg	4ccd1ece92	bug #1707 : Fix deprecation warnings, or disable warnings when testing deprecated functions	2019-05-10 14:57:05 +02:00
Rasmus Munk Larsen	e5ac8cbd7a	A) fix deadlocks in thread pool caused by EventCount This fixed 2 deadlocks caused by sloppiness in the EventCount logic. Both most likely were introduced by cl/236729920 which includes the new EventCount algorithm: `01da8caf00` bug #1 (Prewait): Prewait must not consume existing signals. Consider the following scenario. There are 2 thread pool threads (1 and 2) and 1 external thread (3). RunQueue is empty. Thread 1 checks the queue, calls Prewait, checks RunQueue again and now is going to call CommitWait. Thread 2 checks the queue and now is going to call Prewait. Thread 3 submits 2 tasks, EventCount signals is set to 1 because only 1 waiter is registered (the second signal is discarded). Now thread 2 resumes and calls Prewait and takes away the signal. Thread 1 resumes and calls CommitWait, there are no pending signals anymore, so it blocks. As the result we have 2 tasks, but only 1 thread is running. bug #2 (CancelWait): CancelWait must not take away a signal if it's not sure that the signal was meant for this thread. When one thread blocks and another submits a new task concurrently, the EventCount protocol guarantees only the following properties (similar to the Dekker's algorithm): (a) the registered waiter notices presence of the new task and does not block (b) the signaler notices presence of the waiters and wakes it (c) both the waiter notices presence of the new task and signaler notices presence of the waiter [the only case that must not be possible is that neither of them notices the other, because it would lead to a deadlock] CancelWait is called for cases (a) and (c). For case (c) it is OK to take the notification signal away, but it's not OK for (a) because nobody queued a signal for us and we take away a signal meant for somebody else. Consider: Thread 1 calls Prewait, checks RunQueue, it's empty, now it's going to call CommitWait. 
Thread 3 submits 2 tasks, EventCount signals is set to 1 because only 1 waiter is registered (the second signal is discarded). Thread 2 calls Prewait, checks RunQueue, discovers the tasks, calls CancelWait and consumes the pending signal (meant for thread 1). Now Thread 1 resumes and calls CommitWait, since there are no signals it blocks. As the result we have 2 tasks, but only 1 thread is running. Both deadlocks are only a problem if the tasks require parallelism. Most computational tasks do not require parallelism, i.e. a single thread will run task 1, finish it and then dequeue and run task 2. This fix undoes some of the sloppiness in the EventCount that was meant to reduce CPU consumption by idle threads, because we now have more threads running in these corner cases. But we still don't have pthread_yield's and maybe the strictness introduced by this change will actually help to reduce tail latency because we will have threads running when we actually need them running. B) fix deadlock in thread pool caused by RunQueue This fixed a deadlock caused by sloppiness in the RunQueue logic. Most likely this was introduced with the non-blocking thread pool. The deadlock only affects workloads that require parallelism. Most computational tasks don't require parallelism. PopBack must not fail spuriously. If it does, it can effectively lead to a single thread consuming several wake up signals. Consider 2 worker threads are blocked. External thread submits a task. One of the threads is woken. It tries to steal the task, but fails due to a spurious failure in PopBack (external thread submits another task and holds the lock). The thread executes blocking protocol again (it won't block because NonEmptyQueueIndex is precise and the thread will discover pending work, but it has called PrepareWait). Now external thread submits another task and signals EventCount again. The signal is consumed by the first thread again. 
But now we have 2 tasks pending but only 1 worker thread running. It may be possible to fix this in a different way: make EventCount::CancelWait forward wakeup signal to a blocked thread rather than consuming it. But this looks more complex and I am not 100% sure that it will fix the bug. It's also possible to have 2 versions of PopBack: one will do try_to_lock and another won't. Then worker threads could first opportunistically check all queues with try_to_lock, and only use the blocking version before blocking. But let's first fix the bug with the simpler change.	2019-05-08 10:16:46 -07:00
Eugene Zhulenev	5d9a6686ed	Block evaluation for TensorGeneratorOp	2019-03-05 16:35:21 -08:00
Eugene Zhulenev	b1a8627493	Do not create Tensor<const T> in cxx11_tensor_forced_eval test	2019-03-05 11:19:25 -08:00
Eugene Zhulenev	b95941e5c2	Add tiled evaluation for TensorForcedEvalOp	2019-03-04 16:02:22 -08:00
Rasmus Munk Larsen	6560692c67	Improve EventCount used by the non-blocking threadpool. The current algorithm requires threads to commit/cancel waiting in order they called Prewait. Spinning caused by that serialization can consume lots of CPU time on some workloads. Restructure the algorithm to not require that serialization and remove spin waits from Commit/CancelWait. Note: this reduces max number of threads from 2^16 to 2^14 to leave more space for ABA counter (which is now 22 bits). Implementation details are explained in comments.	2019-02-22 13:56:26 -08:00
Christoph Hertzberg	934b8a1304	Avoid `I` as an identifier, since it may clash with the C-header complex.h	2019-01-25 14:54:39 +01:00
Rasmus Munk Larsen	ee550a2ac3	Fix flaky test for tensor fft.	2019-01-16 14:03:12 -08:00
Gael Guennebaud	d812f411c3	bug #1654 : fix compilation with cuda and no c++11	2019-01-09 18:00:05 +01:00
Gael Guennebaud	450dc97c6b	Various fixes in polynomial solver and its unit tests: - cleanup noise in imaginary part of real roots - take into account the magnitude of the derivative to check roots. - use <= instead of < at appropriate places	2018-12-09 22:54:39 +01:00
Christoph Hertzberg	0ec8afde57	Fixed most conversion warnings in MatrixFunctions module	2018-11-20 16:23:28 +01:00
luz.paz	f67b19a884	[PATCH 1/2] Misc. typos From 68d431b4c14ad60a778ee93c1f59ecc4b931950e Mon Sep 17 00:00:00 2001 Found via `codespell -q 3 -I ../eigen-word-whitelist.txt` where the whitelist consists of: ``` als ans cas dum lastr lowd nd overfl pres preverse substraction te uint whch ``` --- CMakeLists.txt \| 26 +++++++++---------- Eigen/src/Core/GenericPacketMath.h \| 2 +- Eigen/src/SparseLU/SparseLU.h \| 2 +- bench/bench_norm.cpp \| 2 +- doc/HiPerformance.dox \| 2 +- doc/QuickStartGuide.dox \| 2 +- .../Eigen/CXX11/src/Tensor/TensorChipping.h \| 6 ++--- .../Eigen/CXX11/src/Tensor/TensorDeviceGpu.h \| 2 +- .../src/Tensor/TensorForwardDeclarations.h \| 4 +-- .../src/Tensor/TensorGpuHipCudaDefines.h \| 2 +- .../Eigen/CXX11/src/Tensor/TensorReduction.h \| 2 +- .../CXX11/src/Tensor/TensorReductionGpu.h \| 2 +- .../test/cxx11_tensor_concatenation.cpp \| 2 +- unsupported/test/cxx11_tensor_executor.cpp \| 2 +- 14 files changed, 29 insertions(+), 29 deletions(-)	2018-09-18 04:15:01 -04:00
Rasmus Munk Larsen	07fcdd1438	Merged in ezhulenev/eigen-02 (pull request PR-534) Fix cxx11_tensor_{block_access, reduction} tests	2018-10-25 18:34:35 +00:00
Eugene Zhulenev	8a977c1f46	Fix cxx11_tensor_{block_access, reduction} tests	2018-10-25 11:31:29 -07:00
Christoph Hertzberg	40fa6f98bf	bug #1606 : Explicitly set the standard before find_package(StandardMathLibrary). Also replace EIGEN_COMPILER_SUPPORT_CXX11 in favor of EIGEN_COMPILER_SUPPORT_CPP11. Grafted manually from `a4afa90d16`	2018-10-19 17:20:51 +02:00
Eugene Zhulenev	900c7c61bb	Check if it's allowed to squeeze inner dimensions in TensorBlockIO	2018-10-15 16:52:33 -07:00
Gael Guennebaud	d835a0bf53	relax number of iterations checks to avoid false negatives	2018-10-15 10:23:32 +02:00
Gael Guennebaud	8214cf1896	Make sparse_basic includable from sparse_extra, but disable it since sparse_basic(DynamicSparseMatrix) does not compile at all anyways	2018-10-11 10:27:23 +02:00
Gael Guennebaud	93a6192e98	fix mpreal for mpfr<4.0.0	2018-10-09 09:15:22 +02:00
Rasmus Munk Larsen	1a737e1d6a	Fix contraction test.	2018-10-08 16:37:07 -07:00
Gael Guennebaud	2eda9783de	typo	2018-10-08 21:37:46 +02:00
Gael Guennebaud	6cc9b2c831	fix warning in mpreal.h	2018-10-08 18:25:37 +02:00
Gael Guennebaud	e29bfe8479	Update included mpreal header to 3.6.5 and fix deprecated warnings.	2018-10-08 17:09:23 +02:00
Christoph Hertzberg	c5f1d0a72a	Fix shadow warning	2018-10-02 19:01:08 +02:00
Rasmus Munk Larsen	7c1b47840a	Merged in ezhulenev/eigen-01 (pull request PR-514) Add tests for evalShardedByInnerDim contraction + fix bugs	2018-09-28 18:37:54 +00:00
Eugene Zhulenev	524c81f3fa	Add tests for evalShardedByInnerDim contraction + fix bugs	2018-09-28 11:24:08 -07:00
Eugene Zhulenev	9f4988959f	Remove explicit mkldnn support and redundant TensorContractionKernelBlocking	2018-09-27 11:49:19 -07:00
Eugene Zhulenev	b314376f9c	Test mkldnn pack for doubles	2018-09-26 18:22:24 -07:00
Eugene Zhulenev	22ed98a331	Conditionally add mkldnn test	2018-09-26 17:57:37 -07:00
Eugene Zhulenev	71cd3fbd6a	Support multiple contraction kernel types in TensorContractionThreadPool	2018-09-26 11:08:47 -07:00
Christoph Hertzberg	0a3356f4ec	Don't deactivate BVH test for clang (probably, this was failing for very old versions of clang)	2018-09-25 20:26:16 +02:00
Christoph Hertzberg	2c083ace3e	Provide EIGEN_OVERRIDE and EIGEN_FINAL macros to mark virtual function overrides	2018-09-24 18:01:17 +02:00
Eugene Zhulenev	719e438a20	Collapsed revision * Split cxx11_tensor_executor test * Register test parts with EIGEN_SUFFIXES * Fix EIGEN_SUFFIXES in cxx11_tensor_executor test	2018-09-20 15:19:12 -07:00
Christoph Hertzberg	c50250cb24	Avoid warning "suggest braces around initialization of subobject". This test is not run in C++03 mode, so no compatibility is lost.	2018-09-20 17:03:42 +02:00
Eugene Zhulenev	218a7b9840	Enable DSizes type promotion with c++03 compilers	2018-09-18 10:57:00 -07:00
Ravi Kiran	1f0c941c3d	Collapsed revision * Merged eigen/eigen into default	2018-09-17 18:29:12 -07:00
Rasmus Munk Larsen	03a88c57e1	Merged in ezhulenev/eigen-02 (pull request PR-498) Add DSizes index type promotion	2018-09-17 21:58:38 +00:00
Eugene Zhulenev	a5cd4e9ad1	Replace deprecated Eigen::DenseIndex with Eigen::Index in TensorIndexList	2018-09-17 10:58:07 -07:00
Eugene Zhulenev	66f056776f	Add DSizes index type promotion	2018-09-15 15:17:38 -07:00
Rasmus Munk Larsen	14e35855e1	Merged in chtz/eigen-maxsizevector (pull request PR-490) Let MaxSizeVector respect alignment of objects Approved-by: Rasmus Munk Larsen <rmlarsen@google.com>	2018-09-14 23:29:24 +00:00
Eugene Zhulenev	1b8d70a22b	Support reshaping with static shapes and dimensions conversion in tensor broadcasting	2018-09-14 15:25:27 -07:00
Christoph Hertzberg	007f165c69	bug #1598 : Let MaxSizeVector respect alignment of objects and add a unit test Also revert `8b3d9ed081`	2018-09-14 20:21:56 +02:00
Rasmus Munk Larsen	9b864cdb37	Merged in rmlarsen/eigen3 (pull request PR-480) Avoid compilation error in C++11 test when EIGEN_AVOID_STL_ARRAY is set.	2018-09-14 00:05:09 +00:00
Rasmus Munk Larsen	d0eef5fe6c	Don't use bracket syntax in ctor.	2018-09-13 17:04:05 -07:00
Rasmus Munk Larsen	6313dde390	Fix merge error.	2018-09-13 16:42:05 -07:00
Rasmus Munk Larsen	0db590d22d	Backed out changeset `01197e4452`	2018-09-13 16:20:57 -07:00
Rasmus Munk Larsen	b3f4c067d9	Merge	2018-09-13 16:18:52 -07:00
Eugene Zhulenev	01197e4452	Fix warnings	2018-09-13 15:03:36 -07:00
Eugene Zhulenev	d138fe341d	Fix static_assert in test to conform to the C++11 standard	2018-09-11 17:23:18 -07:00
Eugene Zhulenev	55bb7e7935	Merge with upstream eigen/default	2018-09-11 13:33:06 -07:00
Eugene Zhulenev	81b38a155a	Fix compilation of tiled evaluation code with c++03	2018-09-11 13:32:32 -07:00
Rasmus Munk Larsen	46f88fc454	Use numerically stable tree reduction in TensorReduction.	2018-09-11 10:08:10 -07:00
Rasmus Munk Larsen	3d057e0453	Avoid compilation error in C++11 test when EIGEN_AVOID_STL_ARRAY is set.	2018-09-06 12:59:36 -07:00
Christoph Hertzberg	ff4e835d6b	"sparse_product.cpp" must be included before "sparse_basic.cpp", otherwise EIGEN_SPARSE_CREATE_TEMPORARY_PLUGIN has no effect	2018-08-30 20:10:11 +02:00
Christoph Hertzberg	20ba2eee6d	gcc thinks this may not be initialized	2018-08-28 18:33:24 +02:00
Christoph Hertzberg	73ca600bca	Fix numerous shadow-warnings for GCC<=4.8	2018-08-28 18:32:39 +02:00
Eugene Zhulenev	c144bb355b	Merge with upstream eigen/default	2018-08-27 14:34:07 -07:00
Christoph Hertzberg	42123ff38b	Make unit test C++03 compatible	2018-08-25 11:53:28 +02:00
Christoph Hertzberg	117bc5d505	Fix some shadow warnings	2018-08-25 09:06:08 +02:00
Christoph Hertzberg	f155e97adb	Previous fix broke compilation for clang	2018-08-25 00:10:46 +02:00
Christoph Hertzberg	209b4972ec	Fix conversion warning	2018-08-25 00:02:46 +02:00
Christoph Hertzberg	495f6c3c3a	Fix missing-braces warnings	2018-08-24 23:56:13 +02:00
Christoph Hertzberg	8295f02b36	Hide "maybe uninitialized" warning on gcc	2018-08-24 23:22:20 +02:00
Christoph Hertzberg	f7675b826b	Fix several integer conversion and sign-compare warnings	2018-08-24 22:58:55 +02:00
Benoit Steiner	ff8e0ecc2f	Updated one more line of code to avoid making the test dependent on cxx11 features.	2018-08-17 15:15:52 -07:00
Benoit Steiner	43d9dd9b28	Removed more dependencies on cxx11.	2018-08-17 08:49:32 -07:00
Benoit Steiner	ede580ccda	Avoid using the auto keyword to make the tensor block access test more portable	2018-08-16 10:49:47 -07:00
Benoit Steiner	4181556907	Fixed the tensor contraction code.	2018-08-15 09:34:47 -07:00
Benoit Steiner	b6f96cf7dd	Removed dependencies on cxx11 language features from the tensor_block_access test	2018-08-15 08:54:31 -07:00
Benoit Steiner	6bb3f1b43e	Made the tensor_block_access test compile again	2018-08-14 14:26:59 -07:00
Benoit Steiner	43ec0082a6	Made the kronecker_product test compile again	2018-08-14 14:08:36 -07:00
Rasmus Munk Larsen	aebdb06424	Fix a few compiler warnings in CXX11 tests.	2018-08-14 12:06:39 -07:00
Eugene Zhulenev	f2209d06e4	Add block evaluation to CwiseUnaryOp and add PreferBlockAccess enum to all evaluators	2018-08-10 16:53:36 -07:00
Benoit Steiner	3d3711f22f	Fixed compilation errors.	2018-08-13 15:16:06 -07:00
Eugene Zhulenev	cfaedb38cd	Fix bug in a test + compilation errors	2018-08-09 09:44:07 -07:00
Eugene Zhulenev	1c8b9e10a7	Merged with upstream eigen	2018-08-08 16:57:58 -07:00
Benoit Steiner	dd5875e30d	Merged in codeplaysoftware/eigen-upstream-pure/constructor_error_clang (pull request PR-451) Fixing ambiguous constructor error for Clang compiler.	2018-08-02 20:46:03 +00:00
Mehdi Goli	516d2621b9	Fixing compilation error for cxx11_tensor_trace.cpp on Microsoft Visual Studio.	2018-08-02 14:30:48 +01:00
Mehdi Goli	40d6d020a0	Fixing ambiguous constructor error for Clang compiler.	2018-08-02 13:34:53 +01:00
Benoit Steiner	93b9e36e10	Merged in paultucker/eigen (pull request PR-431) Optional ThreadPoolDevice allocator Approved-by: Benoit Steiner <benoit.steiner.goog@gmail.com>	2018-08-01 19:14:34 +00:00
Eugene Zhulenev	385b3ff12f	Merged latest changes from upstream/eigen	2018-08-01 11:59:04 -07:00
Eugene Zhulenev	83c0a16baf	Add block evaluation support to TensorOps	2018-07-31 15:56:31 -07:00
Gael Guennebaud	678a0dcb12	Merged in ezhulenev/eigen/tiling_3 (pull request PR-438) Tiled tensor executor	2018-07-31 08:13:00 +00:00
Eugene Zhulenev	966c2a7bb6	Rename Index to StorageIndex + use Eigen::Array and Eigen::Map when possible	2018-07-27 12:45:17 -07:00
Eugene Zhulenev	6913221c43	Add tiled evaluation support to TensorExecutor	2018-07-25 13:51:10 -07:00
Eugene Zhulenev	d55efa6f0f	TensorBlockIO	2018-07-23 15:50:55 -07:00
Eugene Zhulenev	34a75c3c5c	Initial support of TensorBlock	2018-07-20 17:37:20 -07:00
Gustavo Lima Chaves	02eaaacbc5	Move cxx11_tensor_uint128 test under an EIGEN_TEST_CXX11 guarded block Builds configured without the -DEIGEN_TEST_CXX11=ON flag would fail right away without this, as this test seems to rely on those language features. The skip under compilation with MSVC was kept.	2018-07-20 16:08:40 -07:00
Paul Tucker	d4afccde5a	Add test coverage for ThreadPoolDevice optional allocator.	2018-07-19 17:43:44 -07:00
Gael Guennebaud	add5757488	Simplify handling of non-split tests and include split_test_helper.h instead of re-generating it. This also allows us to modify it without breaking existing build folder.	2018-07-16 18:55:40 +02:00
Gael Guennebaud	901c7d31f0	Fix usage of EIGEN_SPLIT_LARGE_TESTS=ON: some unit tests, such as indexed_view have to be split unconditionally.	2018-07-16 18:35:05 +02:00
Gael Guennebaud	7ccb623746	bug #1569 : fix Tensor<half>::mean() on AVX with respective unit test.	2018-07-19 13:15:40 +02:00
Gael Guennebaud	44ea5f7623	Add unit test for -Tensor<complex> on GPU	2018-07-12 17:19:38 +02:00
Thales Sabino	9a6a43319f	Fix cxx11_tensor_fft not building on Windows. The type used in Eigen::DSizes needs to be at least 8 bytes long. Internally Tensor tries to convert this to an __int64 on Windows and this fails to build. On Linux, long and long long are both 8 byte integer types. * * * Changing from "long long" to "std::int64_t".	2018-07-12 11:20:59 +01:00
Gustavo Lima Chaves	705f66a9ca	Account for missing change on commit "Remove SimpleThreadPool and..." "... always use {NonBlocking}ThreadPool". It seems the non-blocking implementation was made the default/only one, but a reference to the old name was left unmodified. Fix that.	2018-07-23 16:29:09 -07:00
Deven Desai	38807a2575	merging updates from upstream	2018-07-11 09:17:33 -04:00
Gael Guennebaud	6190aa5632	bug #1567 : add optimized path for tensor broadcasting and 'Channel First' shape	2018-07-09 11:23:16 +02:00
Deven Desai	1bb6fa99a3	merging the CUDA and HIP implementation for the Tensor directory and the unit tests	2018-06-20 16:44:58 -04:00
Deven Desai	cfdabbcc8f	removing the *Hip files from the unsupported/Eigen/CXX11/src/Tensor and unsupported/test directories	2018-06-20 12:57:02 -04:00
Deven Desai	7e41c8f1a9	renaming Cuda files to Gpu in the unsupported/Eigen/CXX11/src/Tensor and unsupported/test directories	2018-06-20 12:52:30 -04:00
Deven Desai	b6cc0961b1	updates based on PR feedback There are two major changes (and a few minor ones which are not listed here...see PR discussion for details) 1. Eigen::half implementations for HIP and CUDA have been merged. This means that - `CUDA/Half.h` and `HIP/hcc/Half.h` got merged to a new file `GPU/Half.h` - `CUDA/PacketMathHalf.h` and `HIP/hcc/PacketMathHalf.h` got merged to a new file `GPU/PacketMathHalf.h` - `CUDA/TypeCasting.h` and `HIP/hcc/TypeCasting.h` got merged to a new file `GPU/TypeCasting.h` After this change the `HIP/hcc` directory only contains one file `math_constants.h`. That will go away too once that file becomes a part of the HIP install. 2. new macros EIGEN_GPUCC, EIGEN_GPU_COMPILE_PHASE and EIGEN_HAS_GPU_FP16 have been added and the code has been updated to use them where appropriate. - `EIGEN_GPUCC` is the same as `(EIGEN_CUDACC \|\| EIGEN_HIPCC)` - `EIGEN_GPU_DEVICE_COMPILE` is the same as `(EIGEN_CUDA_ARCH \|\| EIGEN_HIP_DEVICE_COMPILE)` - `EIGEN_HAS_GPU_FP16` is the same as `(EIGEN_HAS_CUDA_FP16 or EIGEN_HAS_HIP_FP16)`	2018-06-14 10:21:54 -04:00
Deven Desai	d1d22ef0f4	syncing this fork with upstream	2018-06-13 12:09:52 -04:00
Gael Guennebaud	67ec37f7b0	Activate dgmres unit test	2018-07-02 12:54:14 +02:00
Michael Figurnov	30fa3d0454	Merge from eigen/eigen	2018-06-07 17:57:56 +01:00
Michael Figurnov	6c71c7d360	Merge from eigen/eigen.	2018-06-07 15:54:18 +01:00
Michael Figurnov	aa813d417b	Fix compilation of special functions without C99 math. The commit with Bessel functions i0e and i1e placed the ifdef/endif incorrectly, causing i0e/i1e to be undefined when EIGEN_HAS_C99_MATH=0. These functions do not actually require C99 math, so now they are always available.	2018-06-07 14:35:07 +01:00
Gael Guennebaud	b3fd93207b	Fix typos found using codespell	2018-06-07 14:43:02 +02:00
Michael Figurnov	4bd158fa37	Derivative of the incomplete Gamma function and the sample of a Gamma random variable. In addition to igamma(a, x), this code implements: * igamma_der_a(a, x) = d igamma(a, x) / da -- derivative of igamma with respect to the parameter * gamma_sample_der_alpha(alpha, sample) -- reparameterization derivative of a Gamma(alpha, 1) random variable sample with respect to the alpha parameter The derivatives are computed by forward mode differentiation of the igamma(a, x) code. Although gamma_sample_der_alpha can be implemented via igamma_der_a, a separate function is more accurate and efficient due to analytical cancellation of some terms. All three functions are implemented by a method parameterized with "mode" that always computes the derivatives, but does not return them unless required by the mode. The compiler is expected to (and, based on benchmarks, does) skip the unnecessary computations depending on the mode.	2018-06-06 18:49:26 +01:00
Deven Desai	8fbd47052b	Adding support for using Eigen in HIP kernels. This commit enables the use of Eigen on HIP kernels / AMD GPUs. Support has been added along the same lines as what already exists for using Eigen in CUDA kernels / NVidia GPUs. Application code needs to explicitly define EIGEN_USE_HIP when using Eigen in HIP kernels. This is because some of the CUDA headers get picked up by default during Eigen compile (irrespective of whether or not the underlying compiler is CUDACC/NVCC, for e.g. Eigen/src/Core/arch/CUDA/Half.h). In order to maintain this behavior, the EIGEN_USE_HIP macro is used to switch to using the HIP version of those header files (see Eigen/Core and unsupported/Eigen/CXX11/Tensor) Use the "-DEIGEN_TEST_HIP" cmake option to enable the HIP specific unit tests.	2018-06-06 10:12:58 -04:00
Michael Figurnov	f216854453	Exponentially scaled modified Bessel functions of order zero and one. The functions are conventionally called i0e and i1e. The exponentially scaled version is more numerically stable. The standard Bessel functions can be obtained as i0(x) = exp(\|x\|) i0e(x) The code is ported from Cephes and tested against SciPy.	2018-05-31 15:34:53 +01:00
Vamsi Sripathi	6293ad3f39	Performance improvements to tensor broadcast operation 1. Added new packet functions using SIMD for NByOne, OneByN cases 2. Modified existing packet functions to reduce index calculations when input stride is non-SIMD 3. Added 4 test cases to cover the new packet functions	2018-05-23 14:02:05 -07:00
Rasmus Munk Larsen	afec3021f7	Use numext::maxi & numext::mini.	2018-05-14 16:35:39 -07:00
Rasmus Munk Larsen	b8c8e5f436	Add vectorized clip functor for Eigen Tensors.	2018-05-14 16:07:13 -07:00
Gael Guennebaud	2f3287da7d	Fix "used uninitialized" warnings	2018-04-24 17:17:25 +02:00
Gael Guennebaud	3ffd449ef5	Workaround warning	2018-04-24 17:11:51 +02:00
Christoph Hertzberg	84dcd998a9	Recent Adolc versions require C++11	2018-04-13 19:10:23 +02:00
Viktor Csomor	000840cae0	Added a move constructor and move assignment operator to Tensor and wrote some tests.	2018-02-07 19:10:54 +01:00
Deven Desai	f124f07965	applying EIGEN_DECLARE_TEST to gpu tests Also, a few minor fixes for GPU tests running in HIP mode. 1. Adding an include for hip/hip_runtime.h in the Macros.h file For HIP __host__ and __device__ are macros which are defined in hip headers. Their definitions need to be included before their use in the file. 2. Fixing the compile failure in TensorContractionGpu introduced by the commit to "Fuse computations into the Tensor contractions using output kernel" 3. Fixing a HIP/clang specific compile error by making the struct-member assignment explicit	2018-07-17 14:16:48 -04:00
Gael Guennebaud	82f0ce2726	Get rid of EIGEN_TEST_FUNC, unit tests must now be declared with EIGEN_DECLARE_TEST(mytest) { /* code */ }. This provides several advantages: - more flexibility in designing unit tests - unit tests can be glued to speed up compilation - unit tests are compiled with same predefined macros, which is a requirement for zapcc	2018-07-17 14:46:15 +02:00
Eugene Zhulenev	01fd4096d3	Fuse computations into the Tensor contractions using output kernel	2018-07-10 13:16:38 -07:00
Benoit Steiner	8f55956a57	Update the padding computation for PADDING_SAME to be consistent with TensorFlow.	2018-01-30 20:22:12 +00:00
RJ Ryan	59985cfd26	Disable use of recurrence for computing twiddle factors. Fixes FFT precision issues for large FFTs. https://github.com/tensorflow/tensorflow/issues/10749#issuecomment-354557689	2017-12-31 10:44:56 -05:00
Yangzihao Wang	3122477c86	Update the padding computation for PADDING_SAME to be consistent with TensorFlow.	2017-12-12 11:15:24 -08:00
Benoit Steiner	a6d875bac8	Removed unnecessary #include	2017-10-22 08:12:45 -07:00
Gael Guennebaud	a91918a105	Merged in infinitei/eigen (pull request PR-328) bug #1464 : Fixes construction of EulerAngles from 3D vector expression. Approved-by: Tal Hadad <tal_hd@hotmail.com> Approved-by: Abhijit Kundu <abhijit.kundu@gatech.edu>	2017-09-06 08:42:14 +00:00
Benoit Steiner	a4089991eb	Added support for CUDA 9.0.	2017-08-31 02:49:39 +00:00
Abhijit Kundu	6d991a9595	bug #1464 : Fixes construction of EulerAngles from 3D vector expression.	2017-08-30 13:26:30 -04:00
Gael Guennebaud	304ef29571	Handle min/max/inf/etc issue in cuda_fp16.h directly in test/main.h	2017-08-24 11:26:41 +02:00
Gael Guennebaud	21633e585b	bug #1462 : remove all occurrences of the deprecated __CUDACC_VER__ macro by introducing EIGEN_CUDACC_VER	2017-08-24 11:06:47 +02:00
Benoit Steiner	c5a241ab9b	Merged in benoitsteiner/opencl (pull request PR-323) Improved support for OpenCL	2017-07-07 16:27:33 +00:00
Benoit Steiner	9daed67952	Merged in tntnatbry/eigen (pull request PR-319) Tensor Trace op	2017-07-07 04:18:03 +00:00
Benoit Steiner	62b4634ebe	Merged in mehdi_goli/upstr_benoit/TensorSYCLImageVolumePatchFixed (pull request PR-14) Applying Benoit's comment for Fixing ImageVolumePatch. * Applying Benoit's comment for Fixing ImageVolumePatch. Fixing conflict on cmake file. * Fixing deallocation of the memory in ImagePatch test for SYCL. * Fixing the automerge issue.	2017-07-06 05:08:13 +00:00
Benoit Steiner	b8e805497e	Merged in benoitsteiner/opencl (pull request PR-318) Improved support for OpenCL	2017-06-13 05:01:10 +00:00
Gael Guennebaud	8640093af1	fix compilation in C++98	2017-06-09 12:45:01 +02:00
Benoit Steiner	9dee55ec33	Merged eigen/eigen into default	2017-05-26 09:01:04 -07:00
a-doumoulakis	7a8ba565f8	Merge changes from upstream	2017-05-24 17:45:29 +01:00
Mmanu Chaturvedi	2971503fed	Specializing numeric_limits For AutoDiffScalar	2017-05-23 17:12:36 -04:00
Mehdi Goli	61d7f3664a	Fixing Cmake Dependency for SYCL	2017-05-22 14:58:28 +01:00
a-doumoulakis	052426b824	Add support for triSYCL Eigen is now able to use triSYCL with EIGEN_SYCL_TRISYCL and TRISYCL_INCLUDE_DIR options Fix contraction kernel with correct nd_item dimension	2017-05-05 19:26:27 +01:00
RJ Ryan	949a2da38c	Use scalar_sum_op and scalar_quotient_op instead of operator+ and operator/ in MeanReducer. Improves support for std::complex types when compiling for CUDA. Expands on `e2e9cdd169` and `2bda1b0d93` .	2017-04-14 13:23:35 -07:00
Benoit Steiner	068cc09708	Preserve file naming conventions	2017-04-04 10:09:10 -07:00
Mehdi Goli	bd64ee8555	Fixing TensorArgMaxSycl.h; Removing warning related to the hardcoded type of dims to be int in Argmax.	2017-03-28 16:50:34 +01:00
Benoit Steiner	f8a622ef3c	Merged eigen/eigen into default	2017-03-15 20:06:19 -07:00
Luke Iwanski	9597d6f6ab	Temporary: Disables cxx11_tensor_argmax_sycl test since it is causing zombie thread	2017-03-15 19:28:09 +00:00
Rasmus Munk Larsen	344c2694a6	Make the non-blocking threadpool more flexible and less wasteful of CPU cycles for high-latency use-cases. * Adds a hint to ThreadPool allowing us to turn off spin waiting. Currently each reader and record yielder op in a graph creates a threadpool with a thread that spins for 1000 iterations through the work stealing loop before yielding. This is wasteful for such ops that process I/O. * This also changes the number of iterations through the steal loop to be inversely proportional to the number of threads. Since the time of each iteration is proportional to the number of threads, this yields roughly a constant spin time. * Implement a separate worker loop for the num_threads == 1 case since there is no point in going through the expensive steal loop. Moreover, since Steal() calls PopBack() on the victim queues it might reverse the order in which ops are executed, compared to the order in which they are scheduled, which is usually counter-productive for the types of I/O workloads the single thread pools tend to be used for. * Store num_threads in a member variable for simplicity and to avoid a data race between the thread creation loop and worker threads calling threads_.size().	2017-03-09 15:41:03 -08:00
Mehdi Goli	f84963ed95	Adding TensorIndexTuple and TensorTupleReduceOP backend (ArgMax/Min) for sycl; fixing the address space issue for const TensorMap; converting all discard_write to write due to data mismatch.	2017-03-07 14:27:10 +00:00
Mehdi Goli	8296b87d7b	Adding sycl backend for TensorCustomOp; fixing the partial lhs modification issue on sycl when the rhs is TensorContraction, reduction or convolution; Fixing the partial modification for memset when sycl backend is used.	2017-02-28 17:16:14 +00:00
Mehdi Goli	2fa2b617a9	Adding TensorVolumePatchOP.h for sycl	2017-02-24 19:16:24 +00:00
Mehdi Goli	89dfd51fae	Adding Sycl Backend for TensorGenerator.h.	2017-02-22 16:36:24 +00:00
Mehdi Goli	4f07ac16b0	Reducing the number of warnings.	2017-02-21 10:09:47 +00:00
Mehdi Goli	79ebc8f761	Adding Sycl backend for TensorImagePatchOP.h; adding Sycl backend for TensorInflation.h.	2017-02-20 12:11:05 +00:00
Mehdi Goli	91982b91c0	Adding TensorLayoutSwapOp for sycl.	2017-02-15 16:28:12 +00:00
Mehdi Goli	b1e312edd6	Adding TensorPatch.h for sycl backend.	2017-02-15 10:13:01 +00:00
Mehdi Goli	0d153ded29	Adding TensorChippingOP for sycl backend; fixing the index value in the verification operation for cxx11_tensorChipping.cpp test	2017-02-13 17:25:12 +00:00
Mehdi Goli	0ee97b60c2	Adding mean to TensorReductionSycl.h	2017-02-07 15:43:17 +00:00
Mehdi Goli	42bd5c4e7b	Fixing TensorReductionSycl for min and max.	2017-02-06 18:05:23 +00:00
Mehdi Goli	ff53050034	Converting ptrdiff_t type to int64_t type in cxx11_tensor_contract_sycl.cpp in order to be the same as other tests.	2017-02-01 15:36:03 +00:00
Mehdi Goli	bab29936a1	Reducing warnings in Sycl backend.	2017-02-01 15:29:53 +00:00
Benoit Steiner	fbc39fd02c	Merge latest changes from upstream	2017-01-30 15:25:57 -08:00
Rasmus Munk Larsen	5e144bbaa4	Make NaN propagation consistent between the pmax/pmin and std::max/std::min. This makes the NaN propagation consistent between the scalar and vectorized code paths of Eigen's scalar_max_op and scalar_min_op. See #1373 for details.	2017-01-24 13:32:50 -08:00
Mehdi Goli	6bdd15f572	Adding non-deferrenciable pointer track for ComputeCpp backend; Adding TensorConvolutionOp for ComputeCpp; fixing typos. modifying TensorDeviceSycl to use the LegacyPointer class.	2017-01-19 11:30:59 +00:00
Mehdi Goli	e46e722381	Adding Tensor ReverseOp; TensorStriding; TensorConversionOp; Modifying Tensor Contractsycl to be located in any place in the expression tree.	2017-01-16 13:58:49 +00:00
Benoit Steiner	0657228569	Simplified the way we link libxsmm	2016-12-21 14:40:08 -08:00
Benoit Steiner	c19fe5e9ed	Added support for libxsmm in the eigen makefiles	2016-12-21 10:43:40 -08:00
Benoit Steiner	0f577d4744	Merged eigen/eigen into default	2016-12-20 17:02:06 -08:00
Gael Guennebaud	e8d6862f14	Properly adjust precision when saving to Market format.	2016-12-20 22:10:33 +01:00
Gael Guennebaud	e2f4ee1c2b	Speed up parsing of sparse Market file.	2016-12-20 21:56:21 +01:00
Benoit Steiner	548ed30a1c	Added an OpenCL regression test	2016-12-19 18:56:26 -08:00
Benoit Steiner	27ceb43bf6	Fixed race condition in the tensor_shuffling_sycl test	2016-12-19 15:34:42 -08:00
Mehdi Goli	35bae513a0	Converting all parallel for lambda to functor in order to prevent kernel duplication name error; adding tensorConcatinationOp backend for sycl.	2016-12-16 19:46:45 +00:00
Benoit Steiner	9ff5d0f821	Merged eigen/eigen into default	2016-12-14 17:32:16 -08:00
Mehdi Goli	730eb9fe1c	Adding asynchronous execution as it improves the performance.	2016-12-14 17:38:53 +00:00
Mehdi Goli	2d4a091beb	Adding tensor contraction operation backend for Sycl; adding test for contractionOp sycl backend; adding temporary solution to prevent memory leak in buffer; cleaning up cxx11_tensor_buildins_sycl.h	2016-12-14 15:30:37 +00:00
Benoit Steiner	4deafd35b7	Introduce a portable EIGEN_SLEEP macro.	2016-12-09 14:52:15 -08:00
Benoit Steiner	2f5b7a199b	Reworked the threadpool cancellation mechanism to not depend on pthread_cancel since it turns out that pthread_cancel doesn't work properly on numerous platforms.	2016-12-09 13:05:14 -08:00
Benoit Steiner	3d59a47720	Added a message to ease the detection of platforms on which thread cancellation isn't supported.	2016-12-08 14:51:46 -08:00
Benoit Steiner	7bfff85355	Added support for thread cancellation on Linux	2016-12-08 08:12:49 -08:00
Srinivas Vasudevan	218764ee1f	Added support for expm1 in Eigen.	2016-12-02 14:13:01 -08:00
Mehdi Goli	79aa2b784e	Adding sycl backend for TensorPadding.h; disabling __unit128 for sycl in TensorIntDiv.h; disabling cashsize for sycl in tensorDeviceDefault.h; adding sycl backend for StrideSliceOP; removing sycl compiler warning for creating an array of size 0 in CXX11Meta.h; cleaning up the sycl backend code.	2016-12-01 13:02:27 +00:00
Benoit Steiner	fd1dc3363e	Merged eigen/eigen into default	2016-11-30 20:16:17 -08:00
Mehdi Goli	577ce78085	Adding TensorShuffling backend for sycl; adding TensorReshaping backend for sycl; cleaning up the sycl backend.	2016-11-29 15:30:42 +00:00
Benoit Steiner	3011dc94ef	Call internal::array_prod to compute the total size of the tensor.	2016-11-28 09:00:31 -08:00
Benoit Steiner	9f8fbd9434	Merged eigen/eigen into default	2016-11-26 11:28:25 -08:00
Mehdi Goli	7318daf887	Fixing LLVM error on TensorMorphingSycl.h on GPU; fixing int64_t crash for tensor_broadcast_sycl on GPU; adding get_sycl_supported_devices() on syclDevice.h.	2016-11-25 16:19:07 +00:00
Mehdi Goli	b8cc5635d5	Removing unsupported device from test case; cleaning the tensor device sycl.	2016-11-23 16:30:41 +00:00
Gael Guennebaud	7f6333c32b	Merged in tal500/eigen-eulerangles (pull request PR-237) Euler angles	2016-11-23 15:17:38 +00:00

1228 Commits