Campy JIT’s a C#/NET lambda expression into NVIDIA PTX, which is then linked with the GPU base class library runtime, loaded into a cuModule, and executed. C++ global variables are fine, accessible and stable between calls to different kernels via Cuda.cuLaunchKernel(). Unfortunately, global variables are not shared between cuModule’s. As noted in the CUDA C Programming Guide, “The names for all symbols, including functions, global variables, and texture or surface references, are maintained at module scope so that modules written by independent third parties may interoperate in the same CUDA context.” Further, “Any user defined __device__, __constant__ or __managed__ device global variables present in the module that owns the CUfunction being launched are independently instantiated on every device. The user is responsible for initializing such device global variables appropriately.”
At this moment, each call to Parallel.For() results in a separate JIT and CUDA module. The runtime is initialized with each Parallel.For(). Unfortunately, initialization of the base class layer is quite time-consuming, especially on the GPU. So it should be done only once, either on the CPU or GPU. Therefore, C++ global variables need to be eliminated. But, the GPU base class layer library contains dozens of global variables and hundreds of references. Another mass edit of the DotNetAnywhere code, and bump on the road to full C# support on the GPU.