This PR fixes the build with Kokkos + OMP offload, supports OMP offload without linking CUDA libraries, and supports OMP offload on Intel GPUs.