The TensorScanOp implementation was missing a CUDA kernel launch. This adds a simple placeholder implementation.
This is the initial implementation of a generic scan operation. Based on this, cumsum and cumprod methods have been added to TensorBase.
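
For readers unfamiliar with scans, here is a minimal sketch of how cumsum and cumprod can both be expressed through one generic scan parameterized by a binary reducer. It is not the TensorScanOp code from this change: the names inclusive_scan_sketch, cumsum_sketch and cumprod_sketch are hypothetical, the sketch works on a 1-D std::vector on the CPU rather than along a tensor axis, and it assumes an inclusive scan (each output includes the current element).

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical helper (not Eigen's API): sequential inclusive scan over a
// 1-D sequence. output[i] = op(...op(op(identity, input[0]), input[1])..., input[i]).
template <typename T, typename BinaryOp>
std::vector<T> inclusive_scan_sketch(const std::vector<T>& input, T identity,
                                     BinaryOp op) {
  std::vector<T> output(input.size());
  T accum = identity;  // running reduction of everything seen so far
  for (std::size_t i = 0; i < input.size(); ++i) {
    accum = op(accum, input[i]);
    output[i] = accum;
  }
  return output;
}

// Cumulative sum: scan with addition, identity 0.
template <typename T>
std::vector<T> cumsum_sketch(const std::vector<T>& input) {
  return inclusive_scan_sketch(input, T(0), std::plus<T>());
}

// Cumulative product: scan with multiplication, identity 1.
template <typename T>
std::vector<T> cumprod_sketch(const std::vector<T>& input) {
  return inclusive_scan_sketch(input, T(1), std::multiplies<T>());
}
```

With this shape, adding another cumulative operation only requires a new reducer and its identity value; the scan loop itself stays unchanged.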