August 2018 – Campy

(Update: Campy 0.0.14 is now released, Sept 2, 2018. –Ken)

The next release of Campy, which will happen sometime this week (8/27-9/3), will have many changes in order to compile and run a call to System.Console.WriteLine() on a GPU. There are also changes for correctness and stability of the compiler, on its ever-so-slow march towards the ECMA 335 standard.

As noted in a previous blog post, using System.Console.WriteLine() in kernel code exercises quite a bit of the compiler, pulling in a large chunk of the GPU runtime to compile, starting from the user’s kernel code containing just a WriteLine call. In developing this release, it was taking over 6 minutes to compile due to the number of types/virtual functions pulled in. This was a very long time in the debug-fix-recompile cycle.

Consequently, I’ve started looking at the performance of the compiler. For example, a classic problem in compilers is computing the back edges of a graph, and subsequently creating a topological ordering of the nodes without back-edges. For a C# method, it’s guaranteed that the basic blocks of the method can’t have branches outside the method, so this computation need only consider the basic blocks for the method, not the whole graph–which is what Campy’s IMPORTER class would do for each method discovered! This fix improves the importer run time from 5m 48s to 4m 5s for the WriteLine() example. Another aspect central to the compiler is the extension method ToMonoTypeReference(this System.Type type). This method converts System.Types into Mono Cecil types in code analysis. The performance of this method was improved by memoization. This fix improves the importer run time from 4m 5s to just 5s! Note: performance analysis was done with DotTrace, an incredibly valuable tool. The fixes in this release vastly improve the performance of the compiler, but I’m sure there are more.
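
As a rough illustration of the two fixes described above (not Campy's actual code, which is C#), here is a minimal Python sketch. It assumes a method's control-flow graph is given as a dictionary from basic-block name to successor names; the function names `back_edges`, `topological_order`, and `to_type_reference` are hypothetical. The first part restricts the back-edge search and the ordering to one method's blocks, and the second part memoizes a stand-in for an expensive type conversion.

```python
from functools import lru_cache


def back_edges(blocks, entry):
    """Return the back edges (src, dst) found by a depth-first search that
    starts at `entry` and only follows edges inside this method's `blocks`."""
    WHITE, GREY, BLACK = 0, 1, 2          # unvisited, on the DFS stack, finished
    color = {b: WHITE for b in blocks}
    found = set()

    def visit(node):
        color[node] = GREY
        for succ in blocks[node]:
            if color[succ] == GREY:       # successor is an ancestor: back edge
                found.add((node, succ))
            elif color[succ] == WHITE:
                visit(succ)
        color[node] = BLACK

    visit(entry)
    return found


def topological_order(blocks, entry):
    """Order the blocks reachable from `entry`, ignoring back edges."""
    skip = back_edges(blocks, entry)
    seen, postorder = set(), []

    def visit(node):
        seen.add(node)
        for succ in blocks[node]:
            if (node, succ) not in skip and succ not in seen:
                visit(succ)
        postorder.append(node)

    visit(entry)
    return postorder[::-1]                # reverse postorder of the acyclic graph


@lru_cache(maxsize=None)
def to_type_reference(type_name):
    """Hypothetical stand-in for an expensive System.Type -> Cecil conversion.
    The cache makes repeated requests for the same type a dictionary lookup."""
    return ("TypeReference", type_name)


if __name__ == "__main__":
    # One method with a simple loop: entry -> loop -> body -> loop, loop -> exit.
    method = {"entry": ["loop"], "loop": ["body", "exit"],
              "body": ["loop"], "exit": []}
    print(back_edges(method, "entry"))         # {('body', 'loop')}
    print(topological_order(method, "entry"))  # ['entry', 'loop', 'exit', 'body']
    to_type_reference("System.String")
    to_type_reference("System.String")
    print(to_type_reference.cache_info())
```

The recursive search is fine for a sketch; a method with a very deep control-flow graph would want an explicit stack instead.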

However, the following clip (taken with OBS Studio) shows the debugger stepping into the GPU runtime for WriteLine() on an NVIDIA GPU.

The code fails because Campy does not yet generate code to execute the .cctors (static constructors), such as the one that initializes System.String.Empty. I expect this fix to be one of the last changes before the release.

With these changes, Campy now supports a lot of C# and a bit of the runtime, surpassing the capability of Alea GPU, Altimesh, and ILGPU, at least in what it can support (reference types, value types, new operator, generics, static and instance fields, static and instance methods, virtual methods, strings, multidimensional arrays, .cctors, enums, switch statements, conversions, native code).

In the 50 days since the last release on July 6, over 500 hours of programming work was done (including a week for vacation).

This version compiles and runs System.Console.WriteLine() within kernels.
The new operator in kernels (newarr, newobj, initobj) is now supported.
ldtoken and constrained CIL instructions are now supported.
Reflection is now supported.
C# try/catch/finally (including with nesting) is now supported. However, exceptions are not supported because CUDA cannot catch exceptions in kernel code. Instead, Campy generates code that executes try/finally code in order.
Static fields of structs and classes are now supported.
There were numerous fixes to CIL instructions: compare instructions, brfalse/brtrue, ldobj, stobj.
There were numerous fixes for generic types and generic methods. Method TypeReference.Deresolve() was written to extend Mono Cecil to convert a generic back to a generic instance given type information computed by the compiler.
Fields within a class/struct hierarchy are now supported.
Moved to stack allocation runtime model for ldarga, ldloca, ldarg, ldloc. In the future, I may add an optimization whereby args and locals are simply copied directly from the stack–if the method does not contain ldarga or ldloca.
Changes to get the compiler and runtime to use types and sizes as specified in ECMA 335 for the abstract machine/compilation (storage type, intermediate type, etc.).
Compilation and execution of .cctor methods on the GPU prior to kernel execution. Initialization of array data implemented (e.g., see System.Console.Empty).
“Intelligent virtual method compilation”–the compiler will deduce the types/virtual methods for a callvirt CIL instruction, pulling in all possibilities. Users do not need to tag methods/types with attributes to know what can run on the GPU.
Support for switch CIL instruction.
Much better integration of compiler with the runtime meta system.
Numerous bug fixes.
Much general cleanup and refactoring. Removal of some kludges.
Rolling forward to LLVM 7.0 (pre-release).
Rolling forward to CUDA 9.2.148.
All unit tests pass.
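
One way to picture the "intelligent virtual method compilation" item above is class hierarchy analysis: for a callvirt on a given static type, collect the implementation each possible subtype would dispatch to. The Python sketch below is an assumption about the general technique, not a description of Campy's implementation. The names `subtypes`, `resolve_override`, and `callvirt_targets` are hypothetical, and the model ignores interfaces, abstract types, and whether a type is ever instantiated.

```python
def subtypes(hierarchy, base):
    """Return `base` plus every type deriving from it, directly or indirectly.
    `hierarchy` maps each type name to its base type name (None for the root)."""
    result = []
    for t in hierarchy:
        cur = t
        while cur is not None:
            if cur == base:
                result.append(t)
                break
            cur = hierarchy[cur]
    return result


def resolve_override(hierarchy, methods, type_name, method):
    """Walk up from `type_name` to the nearest type that declares `method`."""
    cur = type_name
    while cur is not None:
        if method in methods.get(cur, ()):
            return (cur, method)
        cur = hierarchy[cur]
    return None


def callvirt_targets(hierarchy, methods, static_type, method):
    """All implementations a callvirt on `static_type` could dispatch to."""
    targets = set()
    for t in subtypes(hierarchy, static_type):
        impl = resolve_override(hierarchy, methods, t, method)
        if impl is not None:
            targets.add(impl)
    return targets


if __name__ == "__main__":
    hierarchy = {"Object": None, "Shape": "Object", "Circle": "Shape",
                 "Square": "Shape", "Text": "Object"}
    methods = {"Object": {"ToString"}, "Shape": {"Area"},
               "Circle": {"Area"}, "Text": {"ToString"}}
    # Square inherits Shape.Area, so only two bodies need compiling here.
    print(callvirt_targets(hierarchy, methods, "Shape", "Area"))
```

Every implementation in the returned set would have to be compiled for the GPU, which is why a single WriteLine() call can pull in so much of the runtime.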

Again, please note that Campy is still in development. However, I am proud of this release, even with all the bugs still in play (“i.ToString()” crashes, so there is a lot of work still to do). The compiler is really coming into its own, and running in reasonable time. It was a ton of work, many late nights, long hours. But, the job is, of course, never quite done.

Once Campy works with the full range of CIL instructions and data types in the GPU runtime, I’ll be working on separate compilation, then targeting the compiler to AMD GPUs. In addition, while the .cctor initialization code currently runs on the GPU in a single thread before the main kernel of the Parallel.For(), Campy should compile and run constructors on the CPU. Fortunately, Campy uses LLVM, so it could potentially target x64 and run that code on the CPU. Finally, something has to be done with the deep copying of the C# data structures to shared memory.

If you have any questions, or problems getting Campy working, please let me know.

Ken Domino (ken.domino AT gmail.com)

In the latest round of edits for Campy, I am trying to get System.Console.WriteLine() working on a GPU–a task easier said than done. The changes have taken quite a bit of time, since as noted in the previous blog post, Mono Cecil has some quirkiness (e.g., ElementType and GetElementType() do not return the same types; Resolve() of an Array type returns the element type, not an Array type, because the resolver uses GetElementType(); Resolve() of a generic instance type returns the type definition with all type arguments thrown away). Mono Cecil has the virtual method Resolve() to turn a type/method reference into a type/method definition. But, MemberReference doesn’t have a Deresolve() method that turns a generic type/method definition into a type/method reference with specific types, e.g., convert List<T> into List<int>.

Mono Cecil defines a class hierarchy for types, e.g., MethodReference, TypeReference, GenericParameter, GenericInstanceType, etc. Many of these classes in the type hierarchy are sealed and not partial. Unfortunately, C# method parametric polymorphism is not a substitute for virtual methods. Since the class hierarchy cannot be extended with a Deresolve() virtual method, I have to code a function that takes a MemberReference and performs the type-specific computations in a large if-then-else.
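
To make the idea concrete, here is a small Python sketch of a Deresolve-style substitution written as one function with an if-then-else over node kinds, as described above. It is only a model: the classes `NamedType`, `GenericParameter`, `ArrayType`, and `GenericInstance` and the function `substitute` are hypothetical and much simpler than Mono Cecil's real types (the actual code is C# and works on Cecil's MemberReference hierarchy).

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class NamedType:            # a plain type such as int or string
    name: str


@dataclass(frozen=True)
class GenericParameter:     # the T in List<T>
    name: str


@dataclass(frozen=True)
class ArrayType:            # an array such as T[]
    element: object


@dataclass(frozen=True)
class GenericInstance:      # List<T> or List<int>
    definition: str
    arguments: Tuple[object, ...]


def substitute(t, bindings):
    """Replace generic parameters in `t` using `bindings` (name -> type).
    The dispatch lives in one function because, in this model, the node
    classes cannot be given a virtual method of their own."""
    if isinstance(t, GenericParameter):
        return bindings.get(t.name, t)    # unbound parameters stay as they are
    elif isinstance(t, ArrayType):
        return ArrayType(substitute(t.element, bindings))
    elif isinstance(t, GenericInstance):
        args = tuple(substitute(a, bindings) for a in t.arguments)
        return GenericInstance(t.definition, args)
    elif isinstance(t, NamedType):
        return t
    raise TypeError(f"unhandled type node: {t!r}")


if __name__ == "__main__":
    list_of_t = GenericInstance("List", (GenericParameter("T"),))
    # Turn List<T> into List<int>.
    print(substitute(list_of_t, {"T": NamedType("int")}))
```

The final raise matters in this style: with no virtual method to force an override, a newly added node kind would otherwise fall through silently.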

In order to make my life easier, I exported the class diagram for the Mono.Cecil MemberReference type hierarchy from Visual Studio (although I almost never visualize code, or code with pictures, this time it came in handy). Note: to create the diagram, I opened the Mono.Cecil solution, right-clicked in the Solution Explorer and chose “View -> View Class Diagram”, printed to “Microsoft Print to PDF”, edited the PDF file in Inkscape, and exported it to an SVG file.

–Ken
