News

The problem with “IL_0012: callvirt instance void class [mscorlib]System.Collections.Generic.List`1<int32>::set_Item(int32, !0/*int32*/)”

When a C# program that uses generics is compiled, the generated CIL contains references to generic instances. For example:

List<int> x = new List<int>();
x.Add(1);
x[0] = 2;

After compiling, we find CIL calls to create the generic List<int>, add 1 to the list, and set the first element of the list to 2:

IL_0001: newobj instance void class [mscorlib]System.Collections.Generic.List`1<int32>::.ctor()
IL_0009: callvirt instance void class [mscorlib]System.Collections.Generic.List`1<int32>::Add(!0/*int32*/)
IL_0012: callvirt instance void class [mscorlib]System.Collections.Generic.List`1<int32>::set_Item(int32, !0/*int32*/)

Notice that each call references a generic instance, while the method signature still contains the uninstantiated generic parameter (!0) rather than the substituted generic argument. You may ask: “Why isn’t the generic parameter replaced with the actual argument System.Int32?” Probably because the method is encoded by name and signature, so that it can be found in the referenced assembly (mscorlib). Using DotPeek, you can see that the MemberRef table entry for the set_Item method uses three fields to define the method called: (1) the declaring type System.Collections.Generic.List`1<System.Int32>, which is an instantiated generic; (2) the name of the method, set_Item; and (3) the signature blob System.Void (System.Int32, !0). To find the CIL for the method in mscorlib, a compiler needs to find a method with the same name and the same signature, and it is easier to get a match when the signature uses the generic parameter rather than the generic argument.

The problem with these incomplete signatures is that the generic parameter has, in effect, already been given a type by the declaring type, yet the signature does not show it. Campy fixes this by creating new MethodReference values that fully type the method parameters, and it matches signatures by unification instead of simple string comparison. Thus, System.Collections.Generic.List`1<>::set_Item(Int32, !0) matches System.Collections.Generic.List`1<Int32>::set_Item(Int32, Int32). This change required quite a bit of jumping through hoops because Mono.Cecil’s assembly and metadata resolvers could not be used, so I had to write new ones. The next release of Campy will include this new code.
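To make the matching idea concrete, here is a minimal C sketch (C being the language DNA is written in). Campy’s actual code works on Mono.Cecil MethodReference objects in C#, so everything below is illustrative: signatures are flattened into hypothetical SigType arrays using a few ECMA-335 element type codes, and only a one-way form of unification is shown, in which !n in the MemberRef signature may bind to a concrete type and every later use of !n must agree.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A few ECMA-335 element type codes. */
#define ELEMENT_TYPE_VOID   0x01
#define ELEMENT_TYPE_I4     0x08
#define ELEMENT_TYPE_STRING 0x0E
#define ELEMENT_TYPE_VAR    0x13  /* !n: generic parameter of the declaring type */

/* Hypothetical flattened signature element. Only element codes are compared,
   so nested types (classes, arrays, generic instances) are out of scope here. */
typedef struct {
    uint8_t elem;  /* element type code; 0 is used below to mean "unbound" */
    uint32_t var;  /* parameter number n, meaningful only when elem == ELEMENT_TYPE_VAR */
} SigType;

#define MAX_VARS 8

/* Bindings for !0 .. !(MAX_VARS-1). */
typedef struct {
    SigType bound[MAX_VARS];
} Bindings;

static void bindings_init(Bindings *b)
{
    for (size_t i = 0; i < MAX_VARS; ++i) {
        b->bound[i].elem = 0;
        b->bound[i].var = 0;
    }
}

/* One-way unification: `pat` may contain !n, `con` is fully instantiated.
   The first occurrence of !n binds it; later occurrences must agree. */
static int unify_type(SigType pat, SigType con, Bindings *b)
{
    assert(b != NULL);
    if (pat.elem != ELEMENT_TYPE_VAR)
        return pat.elem == con.elem;
    if (pat.var >= MAX_VARS)
        return 0;
    SigType *slot = &b->bound[pat.var];
    if (slot->elem == 0) {
        *slot = con;
        return 1;
    }
    return slot->elem == con.elem;
}

/* Signatures are flattened as [return type, param 0, param 1, ...]. */
static int unify_sig(const SigType *pat, const SigType *con, size_t n, Bindings *b)
{
    for (size_t i = 0; i < n; ++i)
        if (!unify_type(pat[i], con[i], b))
            return 0;
    return 1;
}
```

Pre-binding !0 to Int32 from the declaring type List`1<Int32> before calling unify_sig simply turns the first occurrence of !0 into a check instead of a binding.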

Waz up?

After a year plus some, Campy is starting to work on some practical examples. But when things go sour in an executing kernel, there’s not much I can do but single-step and look at the disassembly and the registers of the GPU. I know what things should look like, as you’d expect from a compiler writer, but the average user is not going to understand much of it. Before I get LLVM debugging information really working, the first step is good ol’ WriteLine() calls. What I should be able to do is this little ditty:


using System;
namespace test
{
    class Program
    {
        static void Main(string[] args)
        {
            Campy.Parallel.For(4, i => { System.Console.WriteLine(i); });
        }
    }
}

This simple kernel does quite a bit. The first thing to note is the generated code:


Node: 1 
    Method System.Void ConsoleApp4.Program/<>c::b__1_0(System.Int32) ConsoleApp4.exe C:\Users\kenne\Documents\Campy2\ConsoleApp4\bin\Debug\ConsoleApp4.exe
    Method System.Void ConsoleApp4.Program/<>c::b__1_0(System.Int32) ConsoleApp4.exe .\ConsoleApp4.exe
    HasThis   False
    Args   0
    Locals 0
    Return (reuse) False
    Instructions:
        IL_0000: nop    
        IL_0001: ldarg.1    
        IL_0002: call System.Void System.Console::WriteLine(System.String)    
        IL_0007: nop    
        IL_0008: ret    

In this example, there is no call to ToString() after the ldarg.1, so Campy must know to convert the integer to a string. I’m a little surprised when I see crap like this coming out of the C# compiler; it would have made my life a little easier if it generated code to convert to the appropriate parameter type. There are likely many other such implicit type conversions: the rules for implicit argument coercion are in ECMA 335 (page 305), although they do not mention int to string conversion. Does anyone know where this is in the spec? (It is worth double-checking where this call comes from, though: for an int argument, the C# compiler normally binds to the Console.WriteLine(Int32) overload, so the WriteLine(String) reference may come from how the call is resolved against Campy’s BCL rather than from the compiler.)

Second, while a lot of the infrastructure for compiling this test works, a number of problems still prevent it from working. Looking through the compiler output, I can see that the generated LLVM code isn’t correct for newarr:

        IL_0040: newarr System.Char    

This will be fixed. I’m hoping the next release will have WriteLine finally working.

Third, I’m noticing that there are lots of try-catch-finally blocks in the NET runtime that need to be compiled. I’ve been holding off on this, as it appears that CUDA does not support try/catch exception handling in device code at all. For the moment, I can try to string together basic blocks so that the finally clauses are at least executed when control leaves the try clause. I might be able to implement some sort of exception handling, but it’s not at all clear how at the moment.
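One way to string the basic blocks together is to work out, for each leave instruction, which finally handlers it passes through. The C sketch below assumes a hypothetical FinallyClause struct holding IL offsets from the method’s exception-handling table; it is not Campy code. ECMA-335 requires more deeply nested try blocks to be listed before the blocks that enclose them, so walking the table in order yields the handlers innermost-first. Catch clauses, and leaves from inside handlers, are ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One finally clause from a method's exception-handling table (hypothetical struct). */
typedef struct {
    uint32_t try_start;      /* first IL offset of the protected block */
    uint32_t try_end;        /* one past its last IL offset */
    uint32_t handler_start;  /* IL offset of the finally handler */
} FinallyClause;

static int in_try(const FinallyClause *c, uint32_t off)
{
    return off >= c->try_start && off < c->try_end;
}

/* For a `leave` at IL offset `from` targeting `target`, write the finally
   handlers that must run, in execution order, to `out`. Because nested
   clauses precede enclosing ones, table order is innermost-first.
   Returns the number of handlers written (at most `cap`). */
static size_t finally_chain(const FinallyClause *clauses, size_t n,
                            uint32_t from, uint32_t target,
                            uint32_t *out, size_t cap)
{
    assert(out != NULL || cap == 0);
    size_t count = 0;
    for (size_t i = 0; i < n && count < cap; ++i) {
        /* A finally runs when the leave exits its try block. */
        if (in_try(&clauses[i], from) && !in_try(&clauses[i], target))
            out[count++] = clauses[i].handler_start;
    }
    return count;
}
```

The block ending in leave would then branch to the first handler, each endfinally to the next handler in the list, and the last one to the leave target. Since the same finally block can be reached from several leaves, each path needs either its own copy of the handler blocks or a selector variable recording where to continue.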

BTW, does anyone else hate how Windows ignores case in file and directory names? I just found out that there’s both a “Corlib” and a “corlib” in the Campy Git repository. Undoubtedly I added files using the Git CLI and typed the name both ways by mistake. Unfortunately, to correct it, I’ll have to use Linux, lest I repeat the same mistake on Windows.

Next: Virtual functions and boxing

As indicated, for the next release (v0.0.13) I will be adding code to Campy to compile the callvirt and box/unbox CIL instructions. Right now, compiling callvirt actually calls only the base class method. For virtual functions to work (and, actually, any computed “late-bound method on an object”), the JITed code must be recorded in the BCL meta. Then, for a callvirt, the pointer corresponding to the virtual function is loaded from the object’s type and called. For box and unbox, I currently have an implementation of box for Int32, but will add the other basic value types. Unbox will not be in this release–too much other work to do. It will probably take a couple of weeks to get all this working.
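The dispatch and boxing described above can be sketched in C as follows. The TypeInfo layout, the slot numbering, and the function names are assumptions for illustration, not DNA’s or Campy’s actual structures: each object starts with a pointer to its type, each type carries a table of JITed method pointers, and callvirt indexes that table through the object’s runtime type rather than its static type. Boxing an Int32 copies the value into a heap object with the same kind of header, so a virtual call on the boxed value works the same way.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct Object Object;
typedef int32_t (*VirtualFn)(const Object *self);

/* Hypothetical per-type runtime metadata holding JITed virtual method pointers. */
typedef struct TypeInfo {
    const char *name;
    size_t num_slots;
    const VirtualFn *vtable;  /* slot i: most-derived implementation of virtual method i */
} TypeInfo;

/* Every heap object starts with a pointer to its runtime type. */
struct Object {
    const TypeInfo *type;
};

/* callvirt: find the slot through the object's own type, not the static type. */
static int32_t callvirt(const Object *obj, size_t slot)
{
    assert(obj != NULL && slot < obj->type->num_slots);
    return obj->type->vtable[slot](obj);
}

/* box Int32: copy the value into a new heap object with a type header. */
typedef struct {
    Object header;
    int32_t value;
} BoxedInt32;

static Object *box_int32(const TypeInfo *int32_type, int32_t v)
{
    BoxedInt32 *b = malloc(sizeof *b);
    if (b == NULL)
        return NULL;
    b->header.type = int32_type;
    b->value = v;
    return &b->header;
}

/* Slot 0 plays the role of GetHashCode() in this sketch. */
static int32_t base_hash(const Object *self)    { (void)self; return 1; }
static int32_t derived_hash(const Object *self) { (void)self; return 2; }
static int32_t int32_hash(const Object *self)   { return ((const BoxedInt32 *)self)->value; }

static const VirtualFn base_vt[]    = { base_hash };
static const VirtualFn derived_vt[] = { derived_hash };
static const VirtualFn int32_vt[]   = { int32_hash };

static const TypeInfo BaseType    = { "Base", 1, base_vt };
static const TypeInfo DerivedType = { "Derived", 1, derived_vt };
static const TypeInfo Int32Type   = { "System.Int32", 1, int32_vt };
```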

But, with these changes, an ever-larger portion of C# and the NET framework should start to work as expected on a GPU. However, I have noticed many functions in the DNA runtime marked “[MethodImpl(MethodImplOptions.InternalCall)]” that have no C/C++ implementation. Clearly, DNA has some shortcomings that need to be fixed. I will deal with these missing methods first on a case-by-case basis. Then, at some point, a methodical check must be done to verify that every such method has an implementation.
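A methodical check could be as simple as comparing the list of InternalCall methods found in the BCL metadata against the table of native implementations. The sketch below assumes a hypothetical registry keyed by type and method name; it is not DNA’s actual internal-call table, and a real check would also have to compare signatures, since overloads share a name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

typedef void (*InternalFn)(void);

/* Hypothetical registry entry: a native implementation for an InternalCall method. */
typedef struct {
    const char *type_name;
    const char *method_name;
    InternalFn fn;
} InternalCall;

/* A method marked [MethodImpl(MethodImplOptions.InternalCall)] found in the BCL metadata. */
typedef struct {
    const char *type_name;
    const char *method_name;
} RequiredCall;

static InternalFn find_internal(const InternalCall *table, size_t n,
                                const char *type_name, const char *method_name)
{
    for (size_t i = 0; i < n; ++i)
        if (strcmp(table[i].type_name, type_name) == 0 &&
            strcmp(table[i].method_name, method_name) == 0)
            return table[i].fn;
    return NULL;
}

/* Report every required method that has no native entry; returns the count. */
static size_t report_missing(const InternalCall *table, size_t n,
                             const RequiredCall *req, size_t nreq, FILE *log)
{
    assert(log != NULL);
    size_t missing = 0;
    for (size_t i = 0; i < nreq; ++i) {
        if (find_internal(table, n, req[i].type_name, req[i].method_name) == NULL) {
            fprintf(log, "missing InternalCall: %s::%s\n",
                    req[i].type_name, req[i].method_name);
            ++missing;
        }
    }
    return missing;
}

/* Placeholder standing in for a real native implementation. */
static void native_math_sqrt(void) { }
```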

Release v0.0.12

This release fixes a number of problems with Campy for a more complex example: steepest descent. The example exercises a number of advanced capabilities of Campy and C#, which are best explained with the implementation shown below. Note the use of value types, reference types, generics, and multiple Parallel.For() calls.


using System;
using System.Text;
using System.Collections.Generic;
using System.Collections.ObjectModel;

namespace test
{
    class Program
    {
        static void Main(string[] args)
        {
            var A = new SquareMatrix(new Collection<double>() { 3, 2, 2, 6 });
            var b = new Vector(new Collection<double>() {2, -8});
            var x = new Vector(new Collection<double>() {-2, -2});
            var r = SD.SteepestDescent(A, b, x);
            System.Console.WriteLine(r.ToString());
        }
    }

    class SquareMatrix
    {
        public int N { get; private set; }
        private List<double> data;
        public SquareMatrix(int n)
        {
            N = n;
            data = new List<double>();
            for (int i = 0; i < n*n; ++i) data.Add(0);
        }

        public SquareMatrix(Collection<double> c)
        {
            data = new List<double>(c);
            var s = Math.Sqrt(c.Count);
            N = (int)Math.Floor(s);
            if (s != (double)N)
            {
                throw new Exception("Need to provide square matrix sized initializer.");
            }
        }

        public static Vector operator *(SquareMatrix a, Vector b)
        {
            Vector result = new Vector(a.N);
            Campy.Parallel.For(result.N, i =>
            {
                for (int j = 0; j < result.N; ++j)
                    result[i] += a.data[i * result.N + j] * b[j];
            });
            return result;
        }
    }

    class Vector
    {
        public int N { get; private set; }
        private List<double> data;

        public Vector(int n)
        {
            N = n;
            data = new List<double>();
            for (int i = 0; i < n; ++i) data.Add(0);
        }

        public double this[int i]
        {
            get
            {
                return data[i];
            }
            set
            {
                data[i] = value;
            }
        }

        public Vector(Collection<double> c)
        {
            data = new List<double>(c);
            N = c.Count;
        }

        public static double operator *(Vector a, Vector b)
        {
            double result = 0;
            for (int i = 0; i < a.N; ++i) result += a[i] * b[i];
            return result;
        }

        public static Vector operator *(double a, Vector b)
        {
            Vector result = new Vector(b.N);
            Campy.Parallel.For(b.N, i => { result[i] = a * b[i]; });
            return result;
        }

        public static Vector operator -(Vector a, Vector b)
        {
            Vector result = new Vector(a.N);
            Campy.Parallel.For(a.N, i => { result[i] = a[i] - b[i]; });
            return result;
        }

        public static Vector operator +(Vector a, Vector b)
        {
            Vector result = new Vector(a.N);
            Campy.Parallel.For(a.N, i => { result[i] = a[i] + b[i]; });
            return result;
        }

        public override string ToString()
        {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < data.Count; ++i)
            {
                sb.Append(data[i] + " ");
            }
            return sb.ToString();
        }
    }

    class SD
    {
        public static Vector SteepestDescent(SquareMatrix A, Vector b, Vector x)
        {
            // Similar to http://ta.twi.tudelft.nl/nw/users/mmbaumann/projects/Projekte/MPI2_slides.pdf
            for (;;)
            {
                Vector r = b - A * x;
                double rr = r * r;
                double rAr = r * (A * r);
                if (Math.Abs(rAr) <= 1.0e-10) break;
                double a = (double) rr / (double) rAr;
                x = x + (a * r);
            }
            return x;
        }

        // https://www.coursera.org/learn/predictive-analytics/lecture/RhkFB/parallelizing-gradient-descent
        // "Hogwild! A lock-free approach to parallelizing stochastic gradient descent"
        // https://arxiv.org/abs/1106.5730

        // Parallelize vector and matrix operations
        // http://www.dcs.warwick.ac.uk/pmbs/pmbs14/PMBS14/Workshop_Schedule_files/8-CUDAHPCG.pdf

        // An introduction to the conjugate gradient method without the agonizing pain
        // https://www.cs.cmu.edu/~quake-papers/painless-conjugate-gradient.pdf

        // https://github.com/gmarkall/cuda_cg/blob/master/gpu_solve.cu
    }
}
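For comparison, here is the same steepest descent iteration written serially in C, with no Campy and no GPU. It is a sketch, not a translation of Campy’s output. One deliberate difference: it stops when the squared residual norm r·r falls below a tolerance, rather than testing |r·Ar| as the C# version does, which ties the stopping rule directly to the residual (r·Ar also scales with the eigenvalues of A). For the 2×2 system in Main, the exact solution is x = (2, -2), since 3·2 + 2·(-2) = 2 and 2·2 + 6·(-2) = -8.

```c
#include <assert.h>

#define DIM 2

static void matvec(double A[DIM][DIM], const double x[DIM], double y[DIM])
{
    for (int i = 0; i < DIM; ++i) {
        y[i] = 0.0;
        for (int j = 0; j < DIM; ++j)
            y[i] += A[i][j] * x[j];
    }
}

static double dot(const double a[DIM], const double b[DIM])
{
    double s = 0.0;
    for (int i = 0; i < DIM; ++i)
        s += a[i] * b[i];
    return s;
}

/* Steepest descent for symmetric positive-definite A:
   r = b - A x;  alpha = (r.r) / (r.A r);  x = x + alpha r.
   Stops when r.r <= tol^2. Returns the iterations used (max_iter if not converged). */
static int steepest_descent(double A[DIM][DIM], const double b[DIM], double x[DIM],
                            double tol, int max_iter)
{
    assert(tol > 0.0);
    for (int k = 0; k < max_iter; ++k) {
        double Ax[DIM], r[DIM], Ar[DIM];
        matvec(A, x, Ax);
        for (int i = 0; i < DIM; ++i)
            r[i] = b[i] - Ax[i];
        double rr = dot(r, r);
        if (rr <= tol * tol)
            return k;
        matvec(A, r, Ar);
        double alpha = rr / dot(r, Ar);  /* r.Ar > 0 because A is SPD and r != 0 */
        for (int i = 0; i < DIM; ++i)
            x[i] += alpha * r[i];
    }
    return max_iter;
}
```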

Release v0.0.11

After a considerable amount of hacking, I’ve released v0.0.11 of Campy. This version fixes a number of problems with reading PE files. Again, most of the problems I have been encountering go back to the old DotNetAnywhere code and its lack of support for anything that has happened in .NET over the last 10+ years, e.g., additional metadata type tables, 64-bit code targets, type and assembly resolution, low-level metadata access, etc.

Some changes are for undocumented Net Core hacks, such as PE machine type 0xFD1D. A search of GitHub indicates that this occurs in native libraries, such as System.Collections.dll on Ubuntu 16.04. However, even though the file contains x64 native code, and you might think the assembly useless, it seems to contain metadata type information that is crucial for type/assembly resolution, which you can verify using DotPeek.

Incidentally, assembly resolution–the steps Net uses to figure out where and what assembly to load for a program–is somewhat fixed in Campy with the addition of lots of probing of the standard locations for assemblies, and checking the “public key” for the correct version. Generics and String finally work again thanks to changes that rewrite the stack types during compilation of a method. I fear, however, that this may be inadequate, for generics are quite complicated. There was a problem with Swigged.CUDA on Ubuntu, but that is now fixed.
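The machine value 0xFD1D is, as far as I can tell, not arbitrary: CoreCLR ReadyToRun images appear to store the real machine type XOR-ed with an OS-specific constant, and 0x8664 ^ 0x7B79 = 0xFD1D, where 0x7B79 is the constant I believe is used for Linux. Treat that constant as an assumption. A PE reader could then accept such files with a check like the following C sketch; only the Linux constant is included, and other operating systems would use other values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define IMAGE_FILE_MACHINE_I386  0x014C
#define IMAGE_FILE_MACHINE_AMD64 0x8664
#define IMAGE_FILE_MACHINE_ARM64 0xAA64

/* Assumption: ReadyToRun images built for Linux XOR the machine field with
   this value (0x8664 ^ 0x7B79 == 0xFD1D). */
#define MACHINE_OS_OVERRIDE_LINUX 0x7B79

static int is_known_machine(uint16_t m)
{
    return m == IMAGE_FILE_MACHINE_I386 || m == IMAGE_FILE_MACHINE_AMD64 ||
           m == IMAGE_FILE_MACHINE_ARM64;
}

/* Returns the underlying machine type, or 0 if unrecognized.
   *os_override is set to "Linux" for an XOR-ed value, NULL otherwise. */
static uint16_t pe_decode_machine(uint16_t raw, const char **os_override)
{
    assert(os_override != NULL);
    *os_override = NULL;
    if (is_known_machine(raw))
        return raw;
    uint16_t m = (uint16_t)(raw ^ MACHINE_OS_OVERRIDE_LINUX);
    if (is_known_machine(m)) {
        *os_override = "Linux";
        return m;
    }
    return 0;
}
```

Note that the plain 0x8664 case matters too: accepting it is what lets a reader open x64 Net Core assemblies at all.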

So, slowly, Campy is coming up to speed with respect to being platform independent and able to work with a lot of C# (value types, reference types, generics). But it still has a way to go: boxing, virtual methods, iOS, Mono assemblies, etc. And it still has a number of bugs with C# generics, which can make it unstable, if not impossible, to use.

Note: While I appreciate why Steve Sanderson et al. switched from DotNetAnywhere to Mono with Blazor in late 2017, I am glad I chose DNA for Campy. Trading DNA for Mono is just trading one set of problems for another. If you’ve been in the business as long as I have (30+ years), you realize that you may think you and your code are hot stuff, but someone can always improve on it–or rewrite it completely. That old programmer who wrote the original crappy code that you improved may come back and bite your ass off. That’s the nature of software.

Generics limping along…and into DNA

Well, I finally have generics working again…sort of, and currently only in Net Framework apps. Unfortunately, I’m back in dll/assembly hell. When I try to find List<> in Net Core’s System.Collections.dll, it isn’t there. Where is it? And, why am I even looking in the file if Campy BCL has a replacement?

Starting with a specific example, and using DotPeek on the Campy Net Core test program ConsoleApp1/bin/Debug/netcoreapp2.0/win-x64/publish/ConsoleApp1.dll, this is what I can figure out:

  • The metadata for TypeRef table containing List`1 in “ConsoleApp1.dll” says the type is in AssemblyRef 0x23000004, major version 4, minor version 1, name “System.Collections”. Since “publish” is a self-contained app, I open “System.Collections.dll” in the same directory.
  • DotPeek of “System.Collections.dll” indicates it does not define a type “List`1”. So, looking at the TypeRefs and AssemblyRefs in the metadata, it should be in AssemblyRef 0x23000001, System.Private.CoreLib, major version 4, minor version 0.
  • DotPeek of “System.Private.CoreLib.dll” indicates it defines a type “List`1”, and this is in fact where the implementation of List<> lives. (Note, List`1 is way down in the TypeDef table with token 0x0200040e, so using DotPeek to find the type is basically impossible, because the “find” function in DotPeek crashes on this file. I’m writing a program called Campy.Find that will help with these kinds of queries.)
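The lookup chain above amounts to following type forwarders until some assembly actually defines the type. The C sketch below assumes that each assembly’s TypeDef rows and forwarding rows (ExportedType, in ECMA-335 terms) have already been read into a hypothetical in-memory form; loading and probing for missing files is left out, and a hop limit guards against forwarding cycles.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory view of one assembly's TypeDef and forwarder rows. */
typedef struct {
    const char *full_name;     /* e.g. "System.Collections.Generic.List`1" */
    const char *forwarded_to;  /* NULL: defined here; otherwise the target assembly */
} TypeEntry;

typedef struct {
    const char *name;          /* simple assembly name, e.g. "System.Collections" */
    const TypeEntry *types;
    size_t ntypes;
} AssemblyInfo;

static const AssemblyInfo *find_assembly(const AssemblyInfo *all, size_t n, const char *name)
{
    for (size_t i = 0; i < n; ++i)
        if (strcmp(all[i].name, name) == 0)
            return &all[i];
    return NULL;  /* a real runtime would probe for and load the file here */
}

#define MAX_FORWARD_HOPS 8  /* guards against forwarding cycles */

/* Starting from the assembly named by the TypeRef's AssemblyRef, follow
   forwarders until an assembly that defines the type is found. Returns that
   assembly's name, or NULL if the type cannot be resolved. */
static const char *resolve_type(const AssemblyInfo *all, size_t n,
                                const char *asm_name, const char *type_name)
{
    assert(type_name != NULL);
    for (int hop = 0; hop < MAX_FORWARD_HOPS && asm_name != NULL; ++hop) {
        const AssemblyInfo *a = find_assembly(all, n, asm_name);
        if (a == NULL)
            return NULL;
        const char *next = NULL;
        for (size_t i = 0; i < a->ntypes; ++i) {
            if (strcmp(a->types[i].full_name, type_name) != 0)
                continue;
            if (a->types[i].forwarded_to == NULL)
                return a->name;  /* found the defining TypeDef */
            next = a->types[i].forwarded_to;
            break;
        }
        asm_name = next;  /* NULL if the type is not mentioned at all */
    }
    return NULL;
}
```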

In fact, there’s already a bit of kludgy code in Campy that does something like this, between “public static IntPtr GetBclType(Type type)” in C# code that looks up the type hierarchy to load in specific files and types, and “function_space_specifier tMetaData* CLIFile_GetMetaDataForAssembly(char * fileName)” in C code within DNA that loads an assembly, probing if needed. But, clearly, it should all be done in DNA. Fortunately, DNA can load files from the host OS file system since I do provide a wrapper for DNA that is called from C#.

Once I have this search coded up in DNA–and DNA patched to read FieldMarshal tables, which are in System.Private.CoreLib.dll–I’m hoping generics will generally start to work, and the code will be a little cleaner to boot.

NVIDIA GPU support

An important note…

Since 2006, NVIDIA has named their GPU microarchitectures after various scientists and inventors throughout history: Tesla, Fermi, Kepler, Maxwell, Pascal, and Volta. My laptop, for instance, has a GeForce GT 635M, which has compute capability 2.1 (sm_21). I assumed that it was a Kepler, but it turns out that was wrong. It’s actually a Fermi.

Unfortunately, you really cannot believe what you read sometimes (GPUBoss, accessed May 28, 2018), which led to my confusion:

“These chips are still based on Kepler (600-series), but feature more CUDA cores, more memory, a wider memory bus, and faster clockspeeds.” by Tim-Verry (Jun, 2013)

“The range is powered by Kepler from bottom to top and brings great performance to mobile platforms.” by Trace-Hagan (May, 2013)

Techpowerup.com seems to have the correct information listed.

Campy’s runtime is based on DotNetAnywhere. During the port, it became apparent that my old GPUs weren’t going to work, because Campy needed to be compiled with “compute_30,sm_30”. sm_30 is Kepler, not Maxwell. So Campy will be able to run on any GPU whose architecture is sm_30 or newer.
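The rule above reduces to a comparison on the compute capability, which the CUDA runtime reports as the major and minor fields of cudaDeviceProp. The C sketch below is only an illustration; its architecture table covers just the generations named in this note, and anything else is reported as unknown.

```c
#include <assert.h>

/* Architecture names for the generations named above (Volta covers 7.0 and 7.2). */
static const char *sm_architecture(int major, int minor)
{
    assert(major >= 0 && minor >= 0);
    switch (major) {
    case 1: return "Tesla";
    case 2: return "Fermi";
    case 3: return "Kepler";
    case 5: return "Maxwell";
    case 6: return "Pascal";
    case 7: return minor < 5 ? "Volta" : "unknown";
    default: return "unknown";
    }
}

/* Campy's runtime is built for compute_30,sm_30, so it needs 3.0 or newer. */
static int campy_can_run(int major, int minor)
{
    return major * 10 + minor >= 30;
}
```

For example, the GT 635M (2.1) maps to Fermi and fails campy_can_run, while any Kepler or later part passes.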

Release v0.0.10

After a lot of work on the metadata subsystem, I decided to release a new version of Campy. This release fixes a lot of issues with programs that use Net Core and Net Standard, with how Campy reads assemblies, and with how it finds and allocates objects used in kernels. The memory allocation subsystem was also improved, although it is still just a first-fit free block allocator. There are some corrections for various CIL instructions, like ldlen, ldnull, and newobj.

Generics still do not work. After some thought, I’ve concluded that rewriting a generic instance like “List<int>” into a non-generic Mono.Cecil.TypeDefinition named “List<int>”, and rewriting every damn CIL instruction that references a generic argument, isn’t going to work once System.Reflection is added.

FFT finally works again, although through that test case I found out more than I bargained for. When building a Net Core app, it links with System.Numerics.dll in Net Core (C:\Program Files\dotnet\shared\Microsoft.NETCore.App\2.0.7\System.Numerics.dll). That DLL does not contain CIL, which you can verify yourself using DotPeek. It turns out that System.Numerics.dll, as well as netstandard.dll, “forwards types” to System.Runtime.Numerics.dll, which actually contains the CIL for methods, e.g., “Complex operator +(Complex, Complex)”, which is what FFT uses. Unfortunately, I found this out just as I was about to release Campy.

Further, I also found out that the runtime framework DotNetAnywhere does not read x64 Net Core assemblies on Ubuntu. It turns out that DNA, which was written quite long ago, does not read PE files with machine type 0x8664. So there were many last-minute changes to get the Ubuntu platform working. It all means that there is still a lot to change in DNA to bring it up to snuff with respect to Mono, Net Core, Net Standard, and Net Framework.
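For readers unfamiliar with the term, a first-fit free block allocator walks the blocks in address order and takes the first free one that is large enough. The C sketch below shows the idea over a fixed arena with an implicit block list; the arena size, header layout, and function names are assumptions, and it is not Campy’s allocator. For brevity it only merges a freed block with the free block that follows it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ARENA_SIZE 4096
#define ALIGN 16

/* Header in front of every block; `size` counts payload bytes only. */
typedef struct {
    size_t size;
    int free;
} BlockHeader;

/* Header size rounded up so payloads stay ALIGN-aligned. */
#define HDR ((sizeof(BlockHeader) + ALIGN - 1) & ~(size_t)(ALIGN - 1))

static _Alignas(ALIGN) unsigned char arena[ARENA_SIZE];

static size_t round_up(size_t n)
{
    return (n + ALIGN - 1) & ~(size_t)(ALIGN - 1);
}

static void arena_init(void)
{
    BlockHeader *h = (BlockHeader *)arena;
    h->size = ARENA_SIZE - HDR;
    h->free = 1;
}

/* Blocks are contiguous, so the next header follows this block's payload. */
static BlockHeader *next_block(BlockHeader *h)
{
    unsigned char *p = (unsigned char *)h + HDR + h->size;
    return p < arena + ARENA_SIZE ? (BlockHeader *)p : NULL;
}

/* First fit: take the first free block that is large enough, splitting off the rest. */
static void *ff_alloc(size_t n)
{
    size_t need = round_up(n ? n : 1);
    for (BlockHeader *h = (BlockHeader *)arena; h != NULL; h = next_block(h)) {
        if (!h->free || h->size < need)
            continue;
        if (h->size >= need + HDR + ALIGN) {
            BlockHeader *tail = (BlockHeader *)((unsigned char *)h + HDR + need);
            tail->size = h->size - need - HDR;
            tail->free = 1;
            h->size = need;
        }
        h->free = 0;
        return (unsigned char *)h + HDR;
    }
    return NULL;
}

/* Mark the block free and merge it with a free successor. */
static void ff_free(void *p)
{
    if (p == NULL)
        return;
    assert((unsigned char *)p >= arena + HDR && (unsigned char *)p < arena + ARENA_SIZE);
    BlockHeader *h = (BlockHeader *)((unsigned char *)p - HDR);
    h->free = 1;
    BlockHeader *next = next_block(h);
    if (next != NULL && next->free)
        h->size += HDR + next->size;
}
```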

PE, metadata, signatures, blobs, oh my!

After using the DNA code for a while, I’ve identified some of the problems with the implementation that need to be corrected. Other problems were noted in Matt Warren’s article, and in the original DNA Git repository. Several problems mentioned have already been fixed.

  • DNA does not conform to ECMA 335. There are missing table types, described below. The problem is that if DNA reads any PE/assembly that contains one of these missing table types, it will not work, and likely you won’t even know! For example, in the original code, when reading a table that followed a missing table, I recall it would segfault because null would be passed to strlen. The following table illustrates the current state of DNA.
Table number (base 10) Type name In ECMA 335 6th Ed. June ‘12 In CodeProject 12585 In original DNA In Blazor DNA In GPU DNA so far
00 Module x x x x x
01 TypeRef x x x x x
02 TypeDef x x x x x
03 FieldPtr x
04 Field x x x x x
05 MethodPtr x
06 MethodDef x x x x x
07
08 Param x x x x x
09 InterfaceImpl x x x x x
10 MemberRef x x x x x
11 Constant x x x x x
12 CustomAttribute x x x x x
13 FieldMarshal x x
14 DeclSecurity x x x x x
15 ClassLayout x x x x x
16 FieldLayout x x x
17 StandAloneSig x x x x x
18 EventMap x x x x x
19
20 Event x x x x x
21 PropertyMap x x x x x
22
23 Property x x x x x
24 MethodSemantics x x x x x
25 MethodImpl x x x x x
26 ModuleRef x x x x x
27 TypeSpec x x x x x
28 ImplMap x x x x x
29 FieldRVA x x x x x
30
31
32 Assembly x x x x x
33 AssemblyProcessor x x
34 AssemblyOS x x
35 AssemblyRef x x x x x
36 AssemblyRefProcessor x x
37 AssemblyRefOS x x
38 File x x
39 ExportedType x x x
40 ManifestResource x x x
41 NestedClass x x x x x
42 GenericParam x x x x x
43 MethodSpec x x x x
44 GenericParamConstraint x x x x x

  • The parser for signatures is just terrible. It should be an LL-style parser; at first glance it seems to resemble one, but it actually isn’t. For example, MetaData_DecodeSigEntry() is used to decode a signature entry field, but it is also called in many other places just to get a 32-bit unsigned integer. IT SHOULD NOT! That’s not how parsers should ever be written! The parser should follow the syntax descriptions in section II.23.2 of the ECMA 335 spec, and, from those, a clean implementation should be written using the techniques of the Dragon Book. This code needs to be completely rewritten.
  • There is no tool for a human readable print out of the PE file metadata tables for debugging. I have added “CampyPeek” to fix this problem.
  • Old Blazor code changed MetaData_DecodeSigEntry() in metadata.c, but it isn’t clear why. I will need to chase this down.
  • Assembly resolution in DNA is a problem for the GPU. In DNA, assembly “resolution” is sort of done by the function CLIFile_Load() in CLIFile.c. “Probing” occurs here, just opening the file in the current directory. Unfortunately, probing can only work if the files are pre-loaded into the GPU file system, so assembly resolution doesn’t follow probing in the standard sense of the term. For the moment, I will assume that all assemblies are placed in the directory of the executable. For Net Core programs, a “publish” already does this. I will need to figure out a good solution for Net Framework programs.
  • DNA does not seem to handle a number of Net Standard and Net Core assemblies: netstandard.dll (contains table type ExportedType), System.Numerics.dll (machine type 0x8664). This is the most critical problem, since it blocks execution of Net Core–and hence, an important aspect of Campy.
  • DNA does not implement type forwarding within its metadata reader. So, a Net Standard library may reference a type in netstandard.dll, but DNA cannot resolve the type to its implementation in a referenced assembly. I’ve identified MetaData_GetTypeDefFromName() in MetaData_Search.c as the function in DNA that should be modified.
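As an illustration of the parser structure argued for above, here is a small recursive-descent sketch in C that follows the II.23.2 grammar: one function per production, with compressed integers decoded by their own routine rather than by the signature-entry decoder. It handles only a subset of types (primitives, string, object, !n, !!n) and flags everything else as an error; the struct and function names are hypothetical, not DNA’s.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Cursor over a signature blob; every read is bounds-checked. */
typedef struct {
    const uint8_t *p;
    const uint8_t *end;
    int error;
} SigCursor;

/* ECMA-335 II.23.2: compressed unsigned integer, 1, 2, or 4 bytes. */
static uint32_t sig_u32(SigCursor *c)
{
    if (c->p >= c->end) { c->error = 1; return 0; }
    uint8_t b0 = c->p[0];
    if ((b0 & 0x80) == 0) {
        c->p += 1;
        return b0;
    }
    if ((b0 & 0xC0) == 0x80 && c->end - c->p >= 2) {
        uint32_t v = ((uint32_t)(b0 & 0x3F) << 8) | c->p[1];
        c->p += 2;
        return v;
    }
    if ((b0 & 0xE0) == 0xC0 && c->end - c->p >= 4) {
        uint32_t v = ((uint32_t)(b0 & 0x1F) << 24) | ((uint32_t)c->p[1] << 16) |
                     ((uint32_t)c->p[2] << 8) | c->p[3];
        c->p += 4;
        return v;
    }
    c->error = 1;  /* truncated blob or invalid lead byte */
    return 0;
}

typedef struct {
    uint8_t elem;    /* ELEMENT_TYPE_* code */
    uint32_t index;  /* n for VAR (!n, 0x13) or MVAR (!!n, 0x1E) */
} SigType;

static int is_simple(uint8_t e)
{
    /* VOID..STRING, I, U, OBJECT */
    return (e >= 0x01 && e <= 0x0E) || e == 0x18 || e == 0x19 || e == 0x1C;
}

/* Type ::= simple | VAR number | MVAR number. Classes, value types, arrays,
   generic instances and custom modifiers are flagged as errors in this sketch. */
static SigType sig_type(SigCursor *c)
{
    SigType t = {0, 0};
    if (c->p >= c->end) { c->error = 1; return t; }
    t.elem = *c->p++;
    if (t.elem == 0x13 || t.elem == 0x1E)
        t.index = sig_u32(c);
    else if (!is_simple(t.elem))
        c->error = 1;
    return t;
}

#define SIG_GENERIC 0x10
#define MAX_PARAMS 16

typedef struct {
    uint8_t conv;         /* calling convention byte, e.g. HASTHIS = 0x20 */
    uint32_t gen_params;  /* GenParamCount when GENERIC is set */
    uint32_t nparams;
    SigType ret;
    SigType params[MAX_PARAMS];
} MethodSig;

/* MethodDefSig ::= conv [GenParamCount] ParamCount RetType Param* */
static int parse_method_sig(const uint8_t *blob, size_t len, MethodSig *out)
{
    assert(blob != NULL && out != NULL);
    SigCursor c = { blob, blob + len, 0 };
    if (len == 0)
        return 0;
    out->conv = *c.p++;
    out->gen_params = (out->conv & SIG_GENERIC) ? sig_u32(&c) : 0;
    out->nparams = sig_u32(&c);
    if (c.error || out->nparams > MAX_PARAMS)
        return 0;
    out->ret = sig_type(&c);
    for (uint32_t i = 0; i < out->nparams && !c.error; ++i)
        out->params[i] = sig_type(&c);
    return !c.error && c.p == c.end;
}
```

For example, the set_Item signature System.Void (System.Int32, !0) is the blob 0x20 0x02 0x01 0x08 0x13 0x00 (HASTHIS, two parameters, void, int32, VAR 0), which parse_method_sig accepts.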

References

https://www.codeproject.com/Articles/42649/NET-file-format-Signatures-under-the-hood-Part#FieldSig4.1

http://www.ecma-international.org/publications/files/ECMA-ST/ECMA-335.pdf

http://www.ecma-international.org/publications/standards/Ecma-335.htm

https://www.codeproject.com/Articles/42655/NET-file-format-Signatures-under-the-hood-Part

https://www.codeproject.com/Articles/12585/The-NET-File-Format

https://www.codeproject.com/Articles/12585/The-NET-File-Format#MetaTables

What happens when an unstoppable force meets an immovable object?

Back in October 2017–which seems so long ago, but has been only 8 months–I was looking around for a NET runtime to use for Campy. It was apparent that in order to support C# on a GPU beyond value types, I was going to need a NET framework runtime. Why? It turned out there were many calls into C code, which depended on what runtime the program was compiled against. Even if you ignore this, you still need a meta on the C# side of Campy in order to get the size and alignment of fields in value and reference types when you allocate and copy objects from the CPU to GPU. The JIT compiler has this sort of baked into the code already, but it still needs to be formally added.
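Computing field sizes and alignment is mostly mechanical once the metadata is available. The C sketch below shows sequential layout, which is the default for C# structs; classes default to auto layout, which a runtime is free to reorder, so this covers only part of the problem. The FieldDesc struct and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical field descriptor; sizes and alignments would come from type metadata. */
typedef struct {
    const char *name;
    size_t size;
    size_t align;  /* must be a power of two */
} FieldDesc;

static size_t align_up(size_t n, size_t a)
{
    return (n + a - 1) & ~(a - 1);
}

/* Sequential layout: place each field at the next offset that is a multiple of
   its alignment, then round the total up to the largest field alignment so that
   arrays of the type stay aligned. Writes offsets[i]; returns the total size. */
static size_t layout_sequential(const FieldDesc *f, size_t n,
                                size_t *offsets, size_t *type_align)
{
    size_t off = 0, max_align = 1;
    for (size_t i = 0; i < n; ++i) {
        assert(f[i].align != 0 && (f[i].align & (f[i].align - 1)) == 0);
        off = align_up(off, f[i].align);
        offsets[i] = off;
        off += f[i].size;
        if (f[i].align > max_align)
            max_align = f[i].align;
    }
    if (type_align != NULL)
        *type_align = max_align;
    return align_up(off, max_align);
}
```

For a struct of a byte, a double, and an int, this places the fields at offsets 0, 8, and 16 and gives a total size of 24 bytes.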

So, like any good programmer, I looked around. What I found were big, bloated packages: Mono, CoreCLR, etc. I assumed the NET framework Campy needed would be a very small substitute for only the lowest layer of classes. Understand that GPUs don’t have file IO, don’t have threads in the classic OS sense, and lack many other things. So, the assumption here is that the lowest level layer isn’t changing, and hasn’t changed for a long time. Therefore, any class that uses the lowest level layer isn’t going to have problems calling into that layer, because it is probably the same everywhere. Whether this assumption remains valid, only time will tell. And I can always use one of those bloated frameworks if my assumption is incorrect. But there were greater problems–like writing a compiler for CIL–so I went fishing.

I came across an article in CodeProject, DotNetAnywhere: An Alternative .NET Runtime. Despite its not having been modified in six years, I was heartened to learn that another project called Blazor was using DNA. (I learned a few months ago that Blazor switched to Mono two weeks after the CodeProject post.) So, I decided to port DotNetAnywhere (DNA) to CUDA. That turned out to be not terribly hard, but then I discovered the really big problems: DNA does not work in 64 bits, and there are quite a few bugs in reading the metadata tables. While I congratulate Chris Bacon for writing a good tool, DNA has a lot of problems. I fixed the code so that it runs on a 64-bit target. But if an assembly contains metadata tables that DNA doesn’t support, it craps out. And I just found out that if I declare a field as an array of System.Numerics.Complex, DNA says the type of the field is an SByte!
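Crashing on unsupported tables is avoidable, because the header of the #~ metadata stream has a 64-bit Valid mask with one bit per table that is present. A reader can compare that mask against the tables it knows how to parse and fail with a clear message instead. The C sketch below shows the check; the supported-table mask and the function names are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Little-endian 64-bit read, independent of host byte order. */
static uint64_t read_u64_le(const uint8_t *p)
{
    uint64_t v = 0;
    for (int i = 7; i >= 0; --i)
        v = (v << 8) | p[i];
    return v;
}

/* #~ stream header (ECMA-335 II.24.2.6): Reserved(4), MajorVersion(1),
   MinorVersion(1), HeapSizes(1), Reserved(1), Valid(8), Sorted(8), Rows[]...
   Bit i of Valid is set when table i is present. Returns the number of present
   tables not in `supported` (or -1 if the header is truncated) and stores the
   lowest such table number in *first_unsupported (-1 if none). */
static int check_tables(const uint8_t *stream, size_t len, uint64_t supported,
                        int *first_unsupported)
{
    assert(first_unsupported != NULL);
    *first_unsupported = -1;
    if (len < 24)
        return -1;
    uint64_t unsupported = read_u64_le(stream + 8) & ~supported;
    int count = 0;
    for (int t = 0; t < 64; ++t) {
        if (unsupported & ((uint64_t)1 << t)) {
            if (*first_unsupported < 0)
                *first_unsupported = t;
            ++count;
        }
    }
    return count;
}
```

For example, a reader that supports only Module (0) and TypeDef (2) would report ExportedType (39) as the first unsupported table in netstandard.dll-style metadata, rather than misreading every table after it.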

At this point, I’m kind of committed to using DNA for Campy. I will be fixing the code that reads PE files, including code to read all tables in the ECMA 335 spec and to parse the signature blobs robustly. I will also be writing a tool to read NET assemblies and print them in a human-readable format, similar to DotPeek but with output to stdout, so it can be used as a regression tool. As an old coworker said long ago about software: sometimes you just have to pound it into submission.