In my last post, I mentioned that Campy was able to JIT quite a bit, but failed to JIT many kernels because a NET base class library for the GPU was required. I’m happy to say this is now corrected. The DotNetAnywhere runtime has been ported to the NVIDIA GPU, and the simple string search example I mentioned in the last blog post now works.
The GPU BCL consists of 13 thousand lines of CUDA code (in 41 .CU files, which is compiled by NVCC to generate 56 thousand lines of PTX code that is loaded when executing a kernel), and 24 thousand lines of C# code. When a kernel is run, Campy rewrites the kernel to use the GPU BCL. I still haven’t gotten over first seeing the string search kernel compile and run with all of this baggage–uh, runtime!
While an important step, there is much work to do, the least of which is a parallel memory allocator/garbage collector that will work on the GPU.