Now that Campy can compile simple value and reference types, the next issue is the runtime. Even on simple examples, when Campy decompiles a kernel in order to JIT it for the GPU, it rapidly discovers that the kernel references the NET runtime in most C# code.
In all NET runtimes, while most of the source is written in C#–which can be JIT’ed by Campy–there is a small portion of the Base Class Library (see ECMA-334, Appendix D) that is also written in C source code, and therefore cannot be JIT’ed by Campy. Furthermore, depending on how a program that uses Campy is run, Campy decompiles different runtimes for the same kernel source code.
Let’s take a simple string search example, which searches for all occurrences of a pattern in another using a brute-force algorithm.
string text = ...; string pattern = ...; int[] found = ...; Campy.Parallel.For(n, i => { int j; for (j = 0; j < m && i + j < n; j++) if (text[i + j] != pattern[j]) break; // mismatch found, break the inner loop if (j == m) { // match found found[i] = 1; } });
When built with NET Framework 4.7, Campy discovers the kernel method and the BCL method String.get_Chars(). However, get_Chars() has no CIL body as it is a native C function, which we also know because it is tagged “Is Internal” when using reflection.
If the program is run under Mono, Campy discovers a different implementation, all being in CIL, which can be JIT’ed.
The result is a richly textured landscape of different runtimes that need to be accommodated. That said, much of the runtime just doesn’t make any sense on the GPU: networking, graphics drawing, etc.
After reading some of the source code for Coreclr, Corert, Mono, NET Micro Framework, Dot Net Anywhere, I’ve come to the conclusion that I cannot simply replace the function calls to C functions of the runtime with unsafe C# code, which I have been doing so far. This cannot work as there must be an agreement between the implementation of the data structures in the BCL presumed between C# and C source.
Therefore, Campy must implement a BCL specific for it. Fortunately, there are plenty of examples to start from. Unfortunately, while CUDA C++ on Windows can compile C++11/14, it requires all device-runnable functions to be tagged with the __device__ qualifier, a rather annoying requirement in CUDA.