News

Correcting the building of new releases

After taking a short break to accompany my 93-year-old dad to Japan, and hike Mount Fuji, I’m going to address the build issues for releases. Undoubtedly one reason why some haven’t gotten Campy to work (https://sigma.software/about/media/gp-gpu-computing-c) is that the releases are from my build machine. Yes, I am not naive. I knew this was a problem but hadn’t any money to buy a new cheap machine to do builds. This is finally being corrected with the purchase of new AMD Ryzen hardware and Quadro card.

New release, same ol’ same ol’

I have been trying a few things, and unfortunately, Campy crashes and burns in some cases due to CCTOR initialization order. E.g.,

Executing cctor System.Void System.Console::.cctor()
Executing cctor System.Void System.Environment::.cctor()
Executing cctor System.Void System.String::.cctor()
Executing cctor System.Void System.NumberFormatter::.cctor()
Executing cctor System.Void System.Globalization.NumberFormatInfo::.cctor()
Executing cctor System.Void System.NumberFormatter/NumberStore::.cctor()
Executing cctor System.Void System.Globalization.CultureInfo::.cctor()

To check, you can set a compiler option to turn on tracing of the CCTOR calls calling Campy.Utils.Options.Set(“trace-cctors”); right before calling Campy.Parallel.For(). If it doesn’t say “Done with cctors”, you can probably be sure that’s the problem. Anyways, I fixed that problem by reordering the calls. I then found out that despite my best efforts in generics, the compiler messes up on generics (in List.cs). In this code, it creates an array with the wrong element type:

public List(int capacity) {
	if (capacity < 0) {
		throw new ArgumentOutOfRangeException("capacity");
	}
	this.items = new T[capacity];
	this.size = 0;
}

I know the fix, but I just won’t be able to address it for a couple weeks–going on vaca. But, I checked in some code changes that improve on the compiler generated names of methods.

Enjoy. –Ken

Release v0.0.14

(Update: Campy 0.0.14 is now released, Sept 2, 2018. –Ken)

The next release of Campy, which will happen sometime this week (8/27-9/3), will have many changes in order to compile and run a call to System.Console.WriteLine() on a GPU. There are also changes for correctness and stability of the compiler, on its ever-so-slow march towards the ECMA 335 standard.

As noted in a previous blog post, using System.Console.WriteLine() in kernel code exercises quite a bit of the compiler, pulling in a large chunk of the GPU runtime to compile, starting from the user’s kernel code containing just a WriteLine call. In developing this release, it was taking over 6 minutes to compile due to the number of types/virtual functions pulled in. This was a very long time in the debug-fix-recompile cycle.

Consequently, I’ve started looking at the performance of the compiler. For example, a classic problem in compilers is computing the back edges of a graph, and subsequently creating a topological ordering of the nodes without back-edges. For a C# method, it’s guaranteed that the basic blocks of the method can’t have branches outside the method, so this computation need only consider the basic blocks for the method, not the whole graph–which is what Campy’s IMPORTER class would do for each method discovered! This fix improves the importer run time from 5m 48s to 4m 5s for the WriteLine() example.  Another aspect central to the compiler is the extension method ToMonoTypeReference(this System.Type type). This method converts System.Types into Mono Cecil types in code analysis. The performance of this method was improved by memoization. This fix improves the importer run time from 4m 5s to just 5s! Note: performance analysis was done with DotTrace, an incredibly valuable tool. The fixes in this release vastly improve the performance of the compiler, but I’m sure there are more.

However, the following clip (taken with OBS Studio) shows the debugger stepping into the GPU runtime for WriteLine() on an NVIDIA GPU.

The code fails because Campy does not yet generate code to execute the .cctor’s, of which System.String.Empty is one. I expect this fix to be one of the last changes before the release.

With these changes, Campy now supports a lot of C# and a bit of the runtime, surpassing the capability of Alea GPU, Altimesh, and ILGPU, at least in what it can support (reference types, value types, new operator, generics, static and instance fields, static and instance methods, virtual methods, strings, multidimensional arrays, .cctors, enums, switch statements, conversions, native code).

In the 50 day since the last release on July 6, over 500 hours of programming work was done (including a week for vacation).

  • This version compiles and runs System.Console.WriteLine() within kernels.
  • The new operator in kernel (newarr, newobjinitobj) in now supported.
  • ldtoken and constrained CIL instructions are now supported.
  • Reflection is now supported.
  • C# try/catch/finally (including with nesting) is now supported. However, exceptions are not supported because CUDA cannot catch exceptions in kernel code. Instead, Campy generates code that executes try/finally code in order.
  • Static fields of structs and classes are now supported.
  • There were numerous fixes to CIL instructions: compare instructions, brfalse/brtrue, ldobj, stobj.
  • There were numerous fixes for generic types and generic methods. Method TypeReference.Deresolve() was written to extends Mono Cecil to convert a generic back to a generic instance given type information computed by the compiler.
  • Fields within a class/struct hierarchy are now supported.
  • Moved to stack allocation runtime model for ldarga, ldloca, ldarg, ldloc. In the future, I may add an optimization whereby args and locals are simply copied directly from the stack–if the method does not contain ldarga or ldloca.
  • Changes to get the compiler and runtime to use types and sizes as specified in ECMA 335 for the abstract machine/compilation (storage type, intermediate type, etc.
  • Compilation and execution of .cctor methods on the GPU prior to kernel execution. Initialization of array data implemented (e.g., see System.Console.Empty).
  • “Intelligent virtual method compilation”–the compiler will deduce the types/virtual methods for a callvirt CIL instruction, pulling in all possibilities. Users do not need to tag methods/types with attributes to know what can run on the GPU.
  • Support for switch CIL instruction.
  • Much better integration of compiler with the runtime meta system.
  • Numerous bug fixes.
  • Much general clean up and refactoring. Removing of some kludges.
  • Rolling forward to LLVM 7.0 (pre-release).
  • Rolling forward to CUDA 9.2.148.
  • All unit tests pass.

Again, please note that Campy is still in development. However, I am proud of this release, even with all the bugs still in play (“i.ToString()” crashes, so there is a lot of work still to do). The compiler is really coming into its own, and running in reasonable time. It was a ton of work, many late nights, long hours. But, the job is, of course, never quite done.

Once Campy works with the full range of CIL instructions and data types in the GPU runtime, I’ll be working on separate compilation, then targeting the compiler to AMD GPUs. In addition, while the cctor initialization code currently runs on the GPU in a single thread before the main kernel of the Parallel.For(), Campy should compile and run constructors on the CPU. Fortunately, Campy uses LLVM, so potentially it could compile and target the x64 to run on the CPU. Finally, something has to be done with the deep copying of the C# data structures to shared memory.

If you have any questions, or problems getting Campy working, please let me know.

Ken Domino (ken.domino AT gmail.com)

Lest I forget…

In the latest round of edits for Campy, I am trying to get System.Console.WriteLine() working on a GPU–a task easier said than done. The changes have taken quite a bit of time, since as noted in the previous blog post, Mono Cecil has some quirkiness (e.g., ElementType and GetElementType() do not return the same types; Resolve() of an Array type returns the element type, not an Array type, because the resolver uses GetElementType()Resolve() of a generic instance type returns the type definition with all type arguments thrown away). Mono Cecil has the virtual method Resolve() to turn a type/method reference into a type/method definition. But, MemberReference doesn’t have a Deresolve() method that turns a generic type/method definition into a type/method reference with specific types, e.g., convert List<T> into List<int>.

Mono Cecil defines a class hierarchy for types, e.g., MethodReference, TypeReference, GenericParameter, GenericInstanceType, etc. Many of these classes in the type hierarchy are sealed and not partial. Unfortunately, C# method parametric polymorphism is not a substitute for virtual methods. Since the class hierarchy cannot be altered with a Deresolve() virtual method, I have to code a function that takes MethodReference and computes type specific computations in a large if-then-else.

In order to make my life easier, I exported the class diagram for the Mono.Cecil MemberReference type hierarchy from Visual Studio (although I almost never visualize code, or code with pictures, this time it comes in handy). Note: to create the diagram, I opened the Mono.Cecil solution, then in the Solution Explorer, right click on “View -> View Class Diagram”, print to “Microsoft Print to PDF”, edit the PDF file in Inkscape, and export to an SVG file.

–Ken

The problem with “IL_0009: callvirt instance void class [mscorlib]System.Collections.Generic.List`1::set_Item(int32, !0/*int32*/)”

Consider a program that uses System.Collections.Generic.List<>:

List<int> x = new List<int>();
x.Add(1);
x[0] = 2;

After compiling the program, we find CIL instructions to create the generic List<int>, add 1 to the list, and reset the first element in the list to be 2:

IL_0001: newobj instance void class [mscorlib]System.Collections.Generic.List`1<int32>::.ctor()
IL_0009: callvirt instance void class [mscorlib]System.Collections.Generic.List`1<int32>::Add(!0/*int32*/)
IL_0012: callvirt instance void class [mscorlib]System.Collections.Generic.List`1<int32>::set_Item(int32, !0/*int32*/)

Notice that in each call, a generic instance is referenced. The signature of the method in the instruction contains the instantiated generic parameters, not the actual generic instance arguments, e.g., “0!”. You may ask: “Why aren’t the generic parameters substituted with the actual argument System.Int32?” The only reason I can think of is so the name/signature encoding can be found in the assembly mscorlib. It also allows for the system to JIT the CIL code with various generic arguments when executed. You can see using DotPeek that in the MemberRef table for the set_Item method, there are three fields used to define the method called: (1) the declaring type System.Collections.Generic.List`<System.Int32>, which is an instantiated generic; (2) the name of the method, set_Item; and (3) the signature blob System.Void (System.Int32, !0). In order to find the CIL for the method in mscorlib, a compiler would need to find a method with the same name and same signature. It’s easier to get a match when the generic parameter is used, and not the generic argument.

The problem with these incomplete signatures is that the generic parameter is already typed. Campy fixes this problem by creating new MethodReference values that fully type the method parameters. It performs unification of signatures instead of a simple string comparison for matching. Thus, System.Collections.Generic.List`1<>::set_Item(Int32, !0) matches System.Collections.Generic.List`1<Int32>::set_Item(Int32, Int32). This change required quite a bit of jumping through hoops because the Mono Cecil’s assembly and metadata resolvers could not be used. I had to write new ones. The next release of Campy will add in this new code.

 

Waz up?

After a year plus some, Campy is starting to work on some practical examples. But, when things go sour in an executing kernel, there’s not much I can do but single step and look at disassemblies and registers of the GPU. I know what things should look like because you’d expect that from a compiler writer. But, for the average user, they’re not going to understand much. Before I get LLVM debugging information really working, the first step is good ol’ WriteLine() calls. What I should be able to do is this little ditty:


using System;
namespace test
{
    class Program
    {
        static void Main(string[] args)
        {
            Campy.Parallel.For(4, i => { System.Console.WriteLine(i); });
        }
    }
}

This simple kernel does quite a bit. First thing to note is the code generated :


Node: 1 
    Method System.Void ConsoleApp4.Program/<>c::b__1_0(System.Int32) ConsoleApp4.exe C:\Users\kenne\Documents\Campy2\ConsoleApp4\bin\Debug\ConsoleApp4.exe
    Method System.Void ConsoleApp4.Program/<>c::b__1_0(System.Int32) ConsoleApp4.exe .\ConsoleApp4.exe
    HasThis   False
    Args   0
    Locals 0
    Return (reuse) False
    Instructions:
        IL_0000: nop    
        IL_0001: ldarg.1    
        IL_0002: call System.Void System.Console::WriteLine(System.String)    
        IL_0007: nop    
        IL_0008: ret    

In this example, there is no expect call to “ToString()” the value after the ldarg.1, so Campy must know to convert the integer to a string. I’m a little surprised when I see crap like this coming out of the C# compiler; it would have made my life a little easier if it generated code to convert to the appropriate parameter type. It’s likely there are many other such implicit type conversions: the rules for implicit argument coercion is in ECMA 335 (page 305), although it does not mention int to string conversion. Does anyone know where this is in the spec?

Second, while a lot of the infrastructure for compiling this test works, there are still a number of problems preventing it from working. Looking through the output of the compiler, the generated LLVM code isn’t correct for newarr:

        IL_0040: newarr System.Char    

This will be fixed. I’m hoping the next release will have WriteLine finally working.

Third, I’m noticing that there are lots of try-catch-finally blocks in the NET runtime to compile. I’ve been holding off on this, as it appears that CUDA does not allow try/catch exception handling whatsoever. For the moment, I can try to string together basic blocks so that the finally clauses are executed at least from the try clause. I might be able to implement some sort of exception handling, but it’s not at all clear at the moment.

BTW, does anyone else hate how Windows OS ignores case for file or directory names? I just found out that there’s a “Corlib” and a “corlib” in the Campy Git repository. Undoubtedly I added using the CLI for Git and typed in by mistake both ways. Unfortunately, to correct it, I’ll have to use Linux less I repeat the same mistakes on Windows.

Next: Virtual functions and boxing

As indicated, for the next release (v0.0.13) I will be adding code to Campy to compile callvirt and box/unbox CIL instructions. Right now, compilation of callvirt actually calls only the base class method. For virtual functions to work (and, actually, a computed “late-bound method on an object”), the JITed code must be stored in the BCL meta. Then, for a callvirt, a pointer corresponding to the virtual function is loaded and called for the object. For box and unbox, I currently have an implementation of box for Int32, but will add in the other basic value types. Unbox will not be in this release–too much other work to do. It will probably take a couple weeks to get all this working.

But, with these changes, an ever larger amount of C# and NET framework should start to work as expected on a GPU. However, I have noticed many functions in the DNA runtime that are attributed “[MethodImpl(MethodImplOptions.InternalCall)]” which do not have a C++/C implementation. Clearly, DNA has some shortcomings that need to be fixed. I will deal with these missing methods first on a case-by-case basis. Then, at some point, a methodical check must be done to verify that there is an implementation for all such methods.

Release v0.0.12

This next release fixes a number of problems with Campy for a more complex example: steepest descent. This example encompasses a number of advanced capabilities of Campy and C#, which is best explained with the implementation shown below. In this example, you will note use of value types, reference types, generics, and multiple Parallel.For() calls.


using System;
using System.Text;
using System.Collections.Generic;
using System.Collections.ObjectModel;

namespace test
{
    class Program
    {
        static void Main(string[] args)
        {
            var A = new SquareMatrix(new Collection<double>() { 3, 2, 2, 6 });
            var b = new Vector(new Collection<double>() {2, -8});
            var x = new Vector(new Collection<double>() {-2, -2});
	    var r = SD.SteepestDescent(A, b, x);
            System.Console.WriteLine(r.ToString());
        }
    }

    class SquareMatrix
    {
        public int N { get; private set; }
        private List<double> data;
        public SquareMatrix(int n)
        {
            N = n;
            data = new List<double>();
            for (int i = 0; i < n*n; ++i) data.Add(0);
        }

        public SquareMatrix(Collection<double> c)
        {
            data = new List<double>(c);
            var s = Math.Sqrt(c.Count);
            N = (int)Math.Floor(s);
            if (s != (double)N)
            {
                throw new Exception("Need to provide square matrix sized initializer.");
            }
        }

        public static Vector operator *(SquareMatrix a, Vector b)
        {
            Vector result = new Vector(a.N);
            Campy.Parallel.For(result.N, i =>
            {
                for (int j = 0; j < result.N; ++j)
                    result[i] += a.data[i * result.N + j] * b[j];
            });
            return result;
        }
    }

    class Vector
    {
        public int N { get; private set; }
        private List<double> data;

        public Vector(int n)
        {
            N = n;
            data = new List<double>();
            for (int i = 0; i < n; ++i) data.Add(0);
        }

        public double this[int i]
        {
            get
            {
                return data[i];
            }
            set
            {
                data[i] = value;
            }
        }

        public Vector(Collection<double> c)
        {
            data = new List<double>(c);
            N = c.Count;
        }

        public static double operator *(Vector a, Vector b)
        {
            double result = 0;
            for (int i = 0; i < a.N; ++i) result += a[i] * b[i]; return result; } public static Vector operator *(double a, Vector b) { Vector result = new Vector(b.N); Campy.Parallel.For(b.N, i => { result[i] = a * b[i]; });
            return result;
        }

        public static Vector operator -(Vector a, Vector b)
        {
            Vector result = new Vector(a.N);
            Campy.Parallel.For(a.N, i => { result[i] = a[i] - b[i]; });
            return result;
        }

        public static Vector operator +(Vector a, Vector b)
        {
            Vector result = new Vector(a.N);
            Campy.Parallel.For(a.N, i => { result[i] = a[i] + b[i]; });
            return result;
        }

        public override string ToString()
        {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < data.Count; ++i)
            {
                sb.Append(data[i] + " ");
            }
            return sb.ToString();
        }
    }

    class SD
    {
        public static Vector SteepestDescent(SquareMatrix A, Vector b, Vector x)
        {
            // Similar to http://ta.twi.tudelft.nl/nw/users/mmbaumann/projects/Projekte/MPI2_slides.pdf
            for (;;)
            {
                Vector r = b - A * x;
                double rr = r * r;
                double rAr = r * (A * r);
		if (Math.Abs(rAr) <= 1.0e-10) break;
                double a = (double) rr / (double) rAr;
                x = x + (a * r);
            }
            return x;
        }

        // https://www.coursera.org/learn/predictive-analytics/lecture/RhkFB/parallelizing-gradient-descent
        // "Hogwild! A lock-free approach to parallelizing stochastic gradient descent"
        // https://arxiv.org/abs/1106.5730

        // Parallelize vector and matrix operations
        // http://www.dcs.warwick.ac.uk/pmbs/pmbs14/PMBS14/Workshop_Schedule_files/8-CUDAHPCG.pdf

        // An introduction to the conjugate gradient method without the agonizing pain
        // https://www.cs.cmu.edu/~quake-papers/painless-conjugate-gradient.pdf

        // https://github.com/gmarkall/cuda_cg/blob/master/gpu_solve.cu
    }
}

 

Release v0.0.11

After a considerable amount of hacking, I’ve released v0.0.11 of Campy. This version fixes a number of problems with reading PE files. Again, most of the problems I have been encountering go back to the old DotNetAnywhere code and its lack of support for anything that has happened in .NET over the last 10+ years, e.g., additional metadata type tables, 64-bit code targets, type and assembly resolution, low-level metadata access, etc. Some changes are for undocumented Net Core hacks, such as a PE machine version 0xFD1D. A search in all of Github.com indicates that this occurs for native libraries, such as System.Collections.dll on Ubuntu 16.04. However, even though it is x64 native code, and you may think the assembly useless, it seems to contain metadata type information that is crucial in the analysis for type/assembly resolution, which you can verify using DotPeek. Incidentally, assembly resolution–the steps used by Net to figure out where and what assembly to load for a program–is somewhat fixed in Campy with the addition of lots of probing of the standard locations for assemblies, and checking “public key” for the correct version. Generics and String finally work again with changes to rewrite the stack types during compilation of a method. I fear, however, that it may be inadequate, for generics are quite complicated. There was a problem with Swigged.CUDA on Ubuntu, but that is now fixed.

So, slowly, Campy is coming up to speed with respect to being platform independent and able to work with a lot of C# (value types, reference types, generics). But, it still has a way to go: boxing, virtual methods, IOS, Mono assemblies, etc. And, it still has a number of bugs with C# generics, which can make it unstable, if not impossible, to use.

Note: While I appreciate why Steve Sanderson et al. switched from DotNetAnywhere to Mono with Blazor in late 2017, I am glad I chose DNA for Campy. Trading DNA for Mono is just trading one set of problems for another. If you’ve been in the business as long as I have (30+ years), you realize that you may think you and your code hot stuff, but someone can always improve on it–or rewrite it completely. That old programmer who wrote that original crapy code that you improved may come back and bite your ass off. That’s the nature of software.