Campy

First things first

Posted on February 7, 2019 (updated February 18, 2019) by kaby76

In order to rewrite the VM of Net Core, I have to get the VM source code to compile under Clang. Then, using Piggy, I can start to convert the C++ code to C#. The first block of code I plan to work on is the PE file reader.

While I can build Coreclr on Windows through a VS Developer Command Prompt, CMake fails to generate build files within a MinGW Bash shell. In fact, I had no idea whether Clang was even used on Linux, because build.sh is opaque: the script has no option to display the compile commands (at least not that I can see). Years ago, make had such an option; with the CMake-generated Makefiles, it is now done via “make VERBOSE=1”.

However, using the Ubuntu WSL shell (c:/Windows/System32/bash.exe), I was able to get the build working and examine the compile commands. Note that it’s important to install the various prerequisites first; otherwise, Coreclr won’t build.

The question is very simple: what compiler options do I need for Clang++ in order to compile VM/*.cpp using Piggy? After a bit of work, a typical compile is this:

pushd /mnt/c/Users/Kenne/Documents/coreclr/bin/obj/Linux.x64.Debug/src/vm/wks; /usr/bin/clang++ -DAMD64 -DBIT64=1 -DBUILDENV_CHECKED=1 -DDBG_TARGET_64BIT=1 -DDBG_TARGET_AMD64=1 -DDBG_TARGET_AMD64_UNIX -DDBG_TARGET_WIN64=1 -DDEBUG -DDEBUGGING_SUPPORTED -DDISABLE_CONTRACTS -DFEATURE_ARRAYSTUB_AS_IL -DFEATURE_CODE_VERSIONING -DFEATURE_COLLECTIBLE_TYPES -DFEATURE_CORECLR -DFEATURE_CORESYSTEM -DFEATURE_CORRUPTING_EXCEPTIONS -DFEATURE_DBGIPC_TRANSPORT_DI -DFEATURE_DBGIPC_TRANSPORT_VM -DFEATURE_DEFAULT_INTERFACES=1 -DFEATURE_EVENTSOURCE_XPLAT=1 -DFEATURE_EVENT_TRACE=1 -DFEATURE_HIJACK -DFEATURE_ICASTABLE -DFEATURE_ISYM_READER -DFEATURE_JUMPSTAMP -DFEATURE_LEGACYNETCF_DBG_HOST_CONTROL -DFEATURE_MANAGED_ETW -DFEATURE_MANAGED_ETW_CHANNELS -DFEATURE_MANUALLY_MANAGED_CARD_BUNDLES -DFEATURE_MULTICASTSTUB_AS_IL -DFEATURE_MULTICOREJIT -DFEATURE_MULTIREG_RETURN -DFEATURE_PAL -DFEATURE_PAL_ANSI -DFEATURE_PAL_SXS -DFEATURE_PERFMAP -DFEATURE_PERFTRACING=1 -DFEATURE_PREJIT -DFEATURE_READYTORUN -DFEATURE_REJIT -DFEATURE_STANDALONE_GC -DFEATURE_STANDALONE_SN -DFEATURE_STRONGNAME_DELAY_SIGNING_ALLOWED -DFEATURE_STRONGNAME_MIGRATION -DFEATURE_STUBS_AS_IL -DFEATURE_SVR_GC -DFEATURE_SYMDIFF -DFEATURE_TIERED_COMPILATION -DFEATURE_USE_ASM_GC_WRITE_BARRIERS -DFEATURE_USE_SOFTWARE_WRITE_WATCH_FOR_GC_HEAP -DFEATURE_WINDOWSPHONE -DFEATURE_WINMD_RESILIENT -DLINUX64 -DPLATFORM_UNIX=1 -DPROFILING_SUPPORTED -DUNICODE -DUNIX_AMD64_ABI -DUNIX_AMD64_ABI_ITF -DURTBLDENV_FRIENDLY=Checked -DWRITE_BARRIER_CHECK=1 -D_AMD64_ -D_BLD_CLR -D_DBG -D_DEBUG -D_SECURE_SCL=0 -D_TARGET_64BIT_=1 -D_TARGET_AMD64_=1 -D_UNICODE -D_WIN64 -I/mnt/c/Users/Kenne/Documents/coreclr/bin/obj/Linux.x64.Debug/src/vm/wks -I/mnt/c/Users/Kenne/Documents/coreclr/src/vm/wks -I/mnt/c/Users/Kenne/Documents/coreclr/src/vm -I/mnt/c/Users/Kenne/Documents/coreclr/src/pal/prebuilt/inc -I/mnt/c/Users/Kenne/Documents/coreclr/bin/obj -I/mnt/c/Users/Kenne/Documents/coreclr/src/pal/inc -I/mnt/c/Users/Kenne/Documents/coreclr/src/pal/inc/rt -I/mnt/c/Users/Kenne/Documents/coreclr/src/pal/src/safecrt -I/mnt/c/Users/Kenne/Documents/coreclr/src/inc -I/mnt/c/Users/Kenne/Documents/coreclr/src/strongname/inc -I/mnt/c/Users/Kenne/Documents/coreclr/src/inc/winrt -I/mnt/c/Users/Kenne/Documents/coreclr/src/debug/inc -I/mnt/c/Users/Kenne/Documents/coreclr/src/debug/inc/amd64 -I/mnt/c/Users/Kenne/Documents/coreclr/src/debug/inc/dump -I/mnt/c/Users/Kenne/Documents/coreclr/src/md/inc -I/mnt/c/Users/Kenne/Documents/coreclr/src/classlibnative/bcltype -I/mnt/c/Users/Kenne/Documents/coreclr/src/classlibnative/cryptography -I/mnt/c/Users/Kenne/Documents/coreclr/src/classlibnative/inc -I/mnt/c/Users/Kenne/Documents/coreclr/bin/obj/Linux.x64.Debug/src/inc -I/mnt/c/Users/Kenne/Documents/coreclr/src/pal/inc/rt/cpp -I/mnt/c/Users/Kenne/Documents/coreclr/src/nativeresources -I/mnt/c/Users/Kenne/Documents/coreclr/src/vm/amd64 -Wall -Wno-null-conversion -std=c++11 -g -O0 -fno-omit-frame-pointer -fms-extensions -fwrapv -fstack-protector-strong -ferror-limit=4096 -Werror -Wno-unused-private-field -Wno-unused-variable -Wno-microsoft -Wno-tautological-compare -Wno-constant-logical-operand -Wno-pragma-pack -Wno-unknown-warning-option -Wno-invalid-offsetof -Wno-incompatible-ms-struct -fsigned-char -nostdinc -fPIC -o CMakeFiles/cee_wks.dir/__/peimage.cpp.o -c /mnt/c/Users/Kenne/Documents/coreclr/src/vm/peimage.cpp; popd

This long, rambling command contains a wealth of information necessary to use Clang. With this in hand, I can now try to use Piggy.

Unfortunately, compiling the code with Clang on Windows doesn’t work well. It seems easier to get Piggy to work under Linux than to get Coreclr to compile with Clang on Windows.

Note that a lot of work will be needed to develop the patterns required for the conversion. Piggy itself will need to be extended to support both the Listener and the Visitor patterns, because the Visitor pattern alone is insufficient.
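For readers who haven’t met the two patterns, here is a minimal sketch of the difference in C++, the language of the VM sources. It is not Piggy’s API: `Node`, `countKind`, `Listener`, `walk`, and `TagPrinter` are hypothetical names over a toy tree. A visitor drives the recursion itself and returns a value; a listener receives enter/exit callbacks from a generic walker, which makes it easy to emit output both before and after a node’s children.

```cpp
#include <cassert>
#include <string>
#include <vector>

// A toy tree node (hypothetical; stands in for a Clang AST node).
struct Node {
    std::string kind;
    std::vector<Node> children;
};

// Visitor style: the function drives the recursion and returns a result.
// Here it counts how many nodes in the subtree have the given kind.
int countKind(const Node& n, const std::string& kind) {
    int total = (n.kind == kind) ? 1 : 0;
    for (const Node& c : n.children) total += countKind(c, kind);
    return total;
}

// Listener style: a generic walker drives the DFS and calls hooks.
struct Listener {
    virtual ~Listener() = default;
    virtual void enter(const Node&) {}
    virtual void exit(const Node&) {}
};

void walk(const Node& n, Listener& l) {
    l.enter(n);
    for (const Node& c : n.children) walk(c, l);
    l.exit(n);
}

// A listener that wraps each subtree in open/close tags: output is
// produced both before and after the children are walked.
struct TagPrinter : Listener {
    std::string out;
    void enter(const Node& n) override { out += "<" + n.kind + ">"; }
    void exit(const Node& n) override { out += "</" + n.kind + ">"; }
};
```

For a `Struct` node with two `Field` children, `countKind` returns 2, and walking it with a `TagPrinter` produces `<Struct><Field></Field><Field></Field></Struct>`.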

Ken

A few notes on Coreclr

Posted on February 2, 2019 (updated February 7, 2019) by kaby76

I’m starting to go through the details of Coreclr, first with System.Private.CoreLib. Here are a few notes…

I do not understand how people at Microsoft can actually modify Coreclr without using an IDE, but it’s not apparent whether they do or not. There are .sln files for Visual Studio 2017, but I can’t open any of them after a “git clone”, for two reasons: (a) missing generated build files and code; (b) unset environment variables. For …/coreclr/bin/obj/Windows_NT.x64.Debug/CoreCLR.sln, I can open the solution after doing a build. For …\coreclr\src\System.Private.CoreLib\System.Private.CoreLib.sln, I am able to hack my way past this problem after setting an environment variable. Steps: (a) after checking out the source from the Github.com repository, run “./build.cmd” at the top-level directory of Coreclr from within a “Developer Command Prompt for VS 2017” cmd.exe; (b) set the environment variable PYTHON to the path of the Python interpreter (try “command -v python”). With these changes, I can open the solution and start using an IDE to investigate the code. It is completely impractical to navigate quickly around the source with a plain editor. Instructions should be somewhere, e.g., in the same directory as the .sln file, or even in the .sln file itself. Bogus, but there you have it.
The attribute [MethodImplAttribute(MethodImplOptions.InternalCall)] is used in 556 places, some nested inside PLATFORM_WINDOWS conditionals. For Campy, most InternalCalls should be rewritten in C# code, and what can’t be rewritten should be made available in a DLL for each platform. For example, for an Array create, we have “[MethodImplAttribute(MethodImplOptions.InternalCall)] private static extern unsafe Array InternalCreate(void* elementType, int rank, int* pLengths, int* pLowerBounds);”. Some form of array create must be offered by the runtime, and it must work on all the different platforms. The plan is to rewrite the entirety (the VM, type system, GC, etc.) in C# code. There are 19 InternalCall methods for the Array class, all of which need to be rewritten.
The method “[MethodImplAttribute(MethodImplOptions.InternalCall)] private static extern unsafe void InternalSetValue(void* target, object value);” can easily be rewritten in unsafe C# code, and should be. I initially wrote it that way for Campy, but changed it after integrating the DNA runtime. This is just one example of how this code is not platform independent! Other examples are Array.Length, LongLength, and GetLength(int).
A few weeks ago, I decided to check whether Net Core and Net Framework actually conform to Net Standard–as I don’t believe anyone or anything, and am afraid of zilch (I ice climb for relaxation). It took a little while, but I am mostly convinced that they do (http://codinggorilla.com/?p=1578). And I now know what I have to do to get Campy to conform to Net Standard.
Whatever changes I make to Coreclr, I’ll want to easily pick up all the latest changes. So, I’ll be cloning the repo and working with that.
The guts of Net Core are the VM and type system, which will need to be entirely rewritten because, yes, they are not platform-independent code (they don’t work on a GPU, for example). The VM is described in the Net Core documentation. The VM and type system source is here.
I was finally able to get a change to Array.Copy into a local System.Private.CoreLib.dll and run it. The instructions MS provides are pretty much junk–they don’t work. Instructions for debugging CoreRun.exe are given here, but they’re pretty convoluted. Just open VS2017, then File|Open Project/Solution, then navigate to and select the CoreRun.exe executable. Click on Properties, and fill in the arguments, which will be the Net Core console app you want to run. Then go. The steps to create a console app that tests your Net Core changes are as follows:
1. Create a Net Core test program using the normal Dotnet.exe.
2. Build the test program, and “publish” it as a win-x64 self-contained executable. The output should be in <published directory>.
3. Build a local Coreclr with your modifications in System.Private.CoreLib.
  1. .\build.cmd
4. cp …\coreclr\bin\Product\Windows_NT.x64.Debug\System.Private.CoreLib.dll <published directory>
5. Run CoreRun.exe <published directory>\application.exe.
I’ve noticed that when stepping through CoreRun.exe, F11 sometimes doesn’t step into a function, like CoreHost2::ExecuteAssembly(). You might need to step at the assembly level to get to C++ functions, or set up your breakpoints beforehand.
Let the fun begin! For starters, I’ll continue to make changes to Array and other types that use InternalCall functions, and see if I can make a local Net Core that works. Independently, I’ll also start translating all the C++ code in the VM to C# using several automated translation tools, and check what platform issues exist there. I have a lot of tools at my disposal, including Piggy, a powerful transformational system which I wrote, and which I will be extending to handle automatic translation of C++.

–Ken

Campy release v0.0.16

Posted on January 14, 2019 by kaby76

So, I’m getting back into the swing of things here. I’ve made a release of Campy v0.0.16. This release adds the change for CUDA whereby Campy interfaces directly with the graphics card driver. The NVIDIA GPU Toolkit is no longer required, except when building Campy from scratch. The NuGet package is also smaller because it contains only one target–netstandard2.0.

In order to save time, starting from this release, I’m suspending the builds of the Linux target for now. It only adds to the development time for Campy–which is still just “alpha”. I’ll add in the Linux target once the entire runtime and compiler are completely cleaned up.

For the runtime, I’m rewriting all the nasty C/C++ code that the “experts” wrote over the years in Net Core, replacing that code with C#, or unsafe C# code. It will first require a complete rewrite of the meta system. The goal is to make the compiler and runtime portable to any CPU or GPU using a bootstrap process.

Rewriting the runtime will require a lot of work–“years” according to the “experts”. But I don’t shy away from a difficult task, and I don’t think it will require “years”. Besides, I have a whole lot of free time and an axe to grind.

Campy now using Piggy to access CUDA

Posted on January 13, 2019 (updated January 14, 2019) by kaby76

Finally, some good news. After three months of work, which included writing from scratch a new, powerful transformational system called Piggy, the SWIG-generated CUDA interface that Campy used has been removed, and a Piggy-generated CUDA interface substituted in its place. A release of Piggy was added to NuGet, which contains the Clang AST serializer, the Piggy tool, a basic set of templates, and the build rules to call the tool on the fly in a C# project. As a bonus, because the dependency is now directly on the CUDA driver (nvcuda.dll) rather than the NVIDIA GPU Computing Toolkit, nothing is needed beyond the driver for the graphics card, unless you intend to build Campy itself from scratch. So, Campy is compatible with CUDA 10, 9, and 8 (and probably earlier versions).

Sometimes small steps are the most difficult. But with this fix, I can move on to fixing the SWIG wrapper for LLVM, and then finally get back to Campy. After this, I intend to start rewriting much of the native code in the Net Core runtime so that it is completely platform independent. Once that is done, C# should be portable to any CPU or GPU.

Back in the saddle

Posted on December 30, 2018 by kaby76

After a fairly large diversion into pinvoke generators, I’m starting to refocus on Campy. Checking that things are where I left them, all seems okay in the source tree, with a few minor changes. I’ve changed the code to not have the “using Swigged.Cuda;” declarations. This forces references to Swigged.Cuda in the code to be explicitly prefixed with the namespace. I’ll now be adding code to generate the interface to nvcuda, the low-level CUDA driver library that is installed on all machines with an NVIDIA graphics card. The GPU toolkit shouldn’t have to be installed.

As an aside, the pinvoke generator I wrote is called Piggy. It goes beyond other generators like SWIG, ClangSharp, or CppSharp. It works by turning “inside-out” the depth-first traversals of the abstract syntax tree that those tools perform. Instead of writing code to print out declarations, you write patterns that contain embedded C# code or just plain text. The idea is roughly akin to what JSP did for HTML. You can read a lot more on my blog (1, 2, 3).
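To make the “inside-out” idea a little more concrete, here is a toy sketch in C++. It is only an illustration under simplifying assumptions: Piggy’s real patterns are tree regular expressions over Clang ASTs with embedded C# code, whereas here a pattern is just a text template keyed by node kind, with `$name` substituted from the node. `Decl`, `Patterns`, `expand`, and `generate` are hypothetical names. The traversal is written once and never changes; all of the output comes from the patterns.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// A toy declaration node (hypothetical stand-in for a Clang AST node).
struct Decl {
    std::string kind;            // e.g. "FunctionDecl"
    std::string name;            // e.g. "cuInit"
    std::vector<Decl> children;
};

// Pattern table: node kind -> output template. Kinds without a pattern
// produce no output, but their children are still visited.
using Patterns = std::map<std::string, std::string>;

// Replace every "$name" in the template with the node's name.
std::string expand(const std::string& tmpl, const Decl& d) {
    std::string out = tmpl;
    const std::string key = "$name";
    for (std::size_t pos = out.find(key); pos != std::string::npos;
         pos = out.find(key, pos + d.name.size())) {
        out.replace(pos, key.size(), d.name);
    }
    return out;
}

// Generic depth-first traversal: it knows nothing about the output format.
void generate(const Decl& d, const Patterns& p, std::string& out) {
    auto it = p.find(d.kind);
    if (it != p.end()) out += expand(it->second, d);
    for (const Decl& c : d.children) generate(c, p, out);
}
```

Changing what gets generated, or handling another kind of declaration, means editing the pattern table rather than the traversal code.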

A diversion back to CUDA

Posted on October 5, 2018 (updated November 27, 2018) by kaby76

On the day I was going to release version 0.0.15 of Campy, NVIDIA released version 10 of the GPU Computing Toolkit. Scrambling, I tried to make an update to Swigged.CUDA that day, but in the end I didn’t include CUDA 10 in the release of Campy. The new Toolkit contained a lot of changes that made an update of Swigged.CUDA difficult. I was planning to work on a new interface with CUDA later on, but it now seems I have to address the issue at this point.

I went back to ManagedCuda, written by Michael Kunz, to see if I could use that for Campy. Last year, I looked at ManagedCuda and hoped that I could use it, but I couldn’t, and wrote Swigged.CUDA instead. Unfortunately, not much has changed. Although Kunz partially updated his code in February 2018 for CUDA 9.1, there is no release on NuGet for that version; the latest is still for CUDA 8, from May 2017. And the update is only partial–it is missing several changes in CUDA 9.1 that should be there. It is also stuck on Net Framework.

Sebastian Urban made a fork of ManagedCuda in September 2017 and updated it as a Net Standard package. Unfortunately, Urban introduced a number of portability issues. Last week, I forked Urban’s ManagedCuda and partially updated it for CUDA 10. Older versions of ManagedCuda still work because they reference the driver, nvcuda.dll, not the Toolkit per se. But for Cublas and the other packages associated with ManagedCuda, the DLLs are tied to the Toolkit version.

Unfortunately, ManagedCuda is a hand-crafted API, requiring one to carefully examine the API headers for CUDA to see what needs to be changed–which is clearly error prone. This isn’t how software should be crafted.

The only viable solution is with a tool that reads the C++ header files and outputs nice clean C# code. Swigged.CUDA tries to follow this paradigm using SWIG, but it’s difficult to keep up with the changes in CUDA using the arcane rules of SWIG.

An alternative tool is ClangSharp, a tool written by Mukul Sabharwal. The tool is fast and looks promising. But, like SWIG, it also has problems. For example,

when applied to the GPU Toolkit header file cuda.h, the generated code does not compile due to anonymous struct declarations;
when generating an interface for Cublas, it pulls in bits of the CUDA runtime, because Cublas depends on the CUDA runtime;
C unions are not handled: it generates sequential fields for each member of the union;
C struct layout in C# is not correct. It should generate attributes to force explicit offsets, using the [StructLayout(LayoutKind.Explicit)] and [FieldOffset(…)] attributes. ManagedCuda was written correctly with this in mind (see https://github.com/kunzmi/managedCuda/blob/master/ManagedCUDA/BasicTypes.cs#L1650).
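To see why sequential layout is wrong for unions, here is a small C/C++ sketch. The declarations are made up for illustration (they are not from cuda.h), and `fieldOffsetAttr` is a hypothetical helper that shows the attribute text a generator would need to emit. The offsets are taken from the compiler via `offsetof`, which is the kind of information Clang’s record layout can give a generator directly.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical C declarations of the kind found in driver headers:
// a tag field followed by a union whose members overlap in memory.
union Payload {
    std::int64_t i;
    double d;
    void* p;
};

struct Value {
    std::int32_t kind;
    Payload u;
};

// Formats the C# attribute for a field at the given byte offset
// (hypothetical helper, for illustration only).
std::string fieldOffsetAttr(std::size_t offset) {
    return "[FieldOffset(" + std::to_string(offset) + ")]";
}

// All members of a union start at the same address, so each one needs the
// SAME FieldOffset in C#. Emitting i, d and p as sequential fields would
// place them at three different offsets, which is the bug described above.
static_assert(offsetof(Payload, i) == 0, "union members share offset 0");
static_assert(offsetof(Payload, d) == 0, "union members share offset 0");
static_assert(offsetof(Payload, p) == 0, "union members share offset 0");

// The union sits after `kind`, padded to its alignment. The exact value is
// ABI-dependent, which is why it should come from the compiler.
static_assert(offsetof(Value, u) >= sizeof(std::int32_t), "union follows kind");
```

On x86-64, for example, `kind` is at offset 0 and the union at offset 8, so the C# struct needs [FieldOffset(0)] on `kind` and [FieldOffset(8)] on each of `i`, `d`, and `p`.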

I am planning to fork ClangSharp, fix the above problems, and incorporate some of the ideas of SWIG into this tool, e.g., %ignore (so it does not generate CUDA runtime p/invokes for Cublas).

This will, unfortunately, be a drag on making significant progress on Campy, likely for several weeks, if not longer. Meanwhile, I am receiving bug reports for Campy, so it’s an ever-increasing juggle for attention.

Ken Domino

Update: November 5, 2018 — To solve this problem, I am writing an entirely new pinvoke generator called Piggy, which uses Clang ASTs and tree regular expressions. Every pinvoke generator available is either inflexible or difficult to use. I hope to have this working by the end of the month (November 2018). It’s unfortunate that the people working on Clang removed the XML serializer from libclang.

Update November 26, 2018 — Finally, after two months on this problem, Piggy is starting to work. It has taken an extraordinary effort–not unique since it’s just one guy working on something that is normally done by dozens of people.

Release v0.0.15

Posted on September 23, 2018 by kaby76

The next release of Campy, available in the next couple of days, focuses on improving stability. For this release, I made over 20 Git commits, fixing problems found with the WriteLine() code.

Structs are passed by reference in methods (CIL ldarga), but weren’t always treated correctly as such.
The CIL newarr still used Mono.TypeReference.GetElementType() which does not work–it is not the same as casting the type to Mono.ArrayType, then calling ElementType!
Native code for runtime Array.Resize() was faulty–lingering DNA hangover.
System.Byte not handled in Campy. Fixed.
Struct deep copy fixed.
Adding in implementations for System.Math.
Fixing System_String_InternalIndexOfAny, and other functions in runtime–lingering DNA hangover.
Fixing deep copy to CPU of arrays created on GPU.
Adding lock-free managed object pointer table for runtime.
Rolling forward with fixed Swigged.cuda and Swigged.llvm for Ubuntu, and upgrading LLVM to official version 7.0.0.
General code clean up.
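The post doesn’t describe the design of the managed object pointer table. As a generic illustration of the technique only, here is a minimal sketch of an insert-only, fixed-capacity, lock-free pointer set in C++, using open addressing and compare-and-swap (on the GPU, CUDA’s atomicCAS would play the role of `compare_exchange_strong`). `PointerTable`, `insert`, and `contains` are hypothetical names, not Campy’s.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Insert-only, fixed-capacity, lock-free set of object pointers using open
// addressing with linear probing. A slot holds 0 while empty; once claimed
// by a successful compare-and-swap it never changes again.
class PointerTable {
public:
    explicit PointerTable(std::size_t capacity) : slots_(capacity) {
        assert(capacity > 0);
        for (auto& s : slots_) s.store(0, std::memory_order_relaxed);
    }

    // Returns true if ptr was newly inserted; false if it was already
    // present or the table is full. ptr must be non-null (0 marks "empty").
    bool insert(const void* ptr) {
        const std::uintptr_t key = reinterpret_cast<std::uintptr_t>(ptr);
        assert(key != 0 && "null is the empty-slot marker");
        const std::size_t n = slots_.size();
        std::size_t i = hash(key) % n;
        for (std::size_t probes = 0; probes < n; ++probes, i = (i + 1) % n) {
            std::uintptr_t expected = 0;
            if (slots_[i].compare_exchange_strong(expected, key,
                                                  std::memory_order_acq_rel)) {
                return true;                    // claimed an empty slot
            }
            if (expected == key) return false;  // already inserted
            // Slot holds a different pointer: keep probing.
        }
        return false;                           // table is full
    }

    bool contains(const void* ptr) const {
        const std::uintptr_t key = reinterpret_cast<std::uintptr_t>(ptr);
        const std::size_t n = slots_.size();
        std::size_t i = hash(key) % n;
        for (std::size_t probes = 0; probes < n; ++probes, i = (i + 1) % n) {
            const std::uintptr_t v = slots_[i].load(std::memory_order_acquire);
            if (v == key) return true;
            if (v == 0) return false;           // an empty slot ends the probe chain
        }
        return false;
    }

private:
    static std::size_t hash(std::uintptr_t k) {
        k >>= 3;  // objects are usually aligned, so the low bits carry little information
        return static_cast<std::size_t>(k * 0x9E3779B97F4A7C15ull);
    }
    std::vector<std::atomic<std::uintptr_t>> slots_;
};
```

Never deleting or resizing is what keeps this simple: because a claimed slot never changes, a probe that reaches an empty slot can stop, since the key was not in the table at that point.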

Moving forward, there is much work to do.

I am considering how best to handle upgrading CUDA and LLVM to new releases while still handling older versions. SWIG is not robust enough.
The builds still need to be automated. I’m not sure how to handle the GPU aspect of this. I’m hoping to get some free time donated by a large hosting company.
The runtime must be rewritten so that it’s Net Standard 2+ conforming, and free of native (C/C++) code. That native code is the big obstacle to moving forward with an AMDGPU target. Alternatively, SPIR might work.
There should be an AOT compiler tool for compiling any C# directly.
Retarget the compiler to x86_64, as an alternative to Net Core!

Campy is moving forward. However, it is just one person–me–writing basically an entire CoreRT/Mono/… all by myself, whereas you have a whole army working on each of those Microsoft projects. I can only go so fast. But, trust me. I will get there!

–Ken Domino

Correcting the building of new releases

Posted on September 12, 2018 by kaby76

After taking a short break to accompany my 93-year-old dad to Japan and hike Mount Fuji, I’m going to address the build issues for releases. Undoubtedly one reason why some haven’t gotten Campy to work (https://sigma.software/about/media/gp-gpu-computing-c) is that the releases are built on my own development machine. No, I am not naive. I knew this was a problem, but didn’t have the money to buy a cheap new machine to do builds. This is finally being corrected with the purchase of new AMD Ryzen hardware and a Quadro card.

New release, same ol’ same ol’

Posted on September 3, 2018 by kaby76

I have been trying a few things, and unfortunately, Campy crashes and burns in some cases due to CCTOR initialization order. E.g.,

Executing cctor System.Void System.Console::.cctor()
Executing cctor System.Void System.Environment::.cctor()
Executing cctor System.Void System.String::.cctor()
Executing cctor System.Void System.NumberFormatter::.cctor()
Executing cctor System.Void System.Globalization.NumberFormatInfo::.cctor()
Executing cctor System.Void System.NumberFormatter/NumberStore::.cctor()
Executing cctor System.Void System.Globalization.CultureInfo::.cctor()
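In the trace above, the .cctor for System.Globalization.CultureInfo runs last; if an earlier .cctor such as NumberFormatInfo’s depended on CultureInfo being initialized, this order would break it. One general way to avoid that, sketched here in C++, is to run .cctors in dependency order: a depth-first post-order (a topological sort) over a “reads the static state of” graph. The dependency edges in the usage are hypothetical, and this is not necessarily how Campy reorders its calls.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// deps[T] lists the types whose static state T's .cctor depends on.
using Deps = std::map<std::string, std::vector<std::string>>;

static void visit(const std::string& t, const Deps& deps,
                  std::set<std::string>& done, std::set<std::string>& active,
                  std::vector<std::string>& order) {
    if (done.count(t) || active.count(t)) return;  // already placed, or a cycle
    active.insert(t);
    auto it = deps.find(t);
    if (it != deps.end())
        for (const std::string& d : it->second) visit(d, deps, done, active, order);
    active.erase(t);
    done.insert(t);
    order.push_back(t);  // post-order: t's dependencies were appended first
}

// Returns an order in which each .cctor runs after the .cctors it depends on.
// Cycles (which CIL permits) are broken at the point where they are detected.
std::vector<std::string> cctorOrder(const std::vector<std::string>& types,
                                    const Deps& deps) {
    std::set<std::string> done, active;
    std::vector<std::string> order;
    for (const std::string& t : types) visit(t, deps, done, active, order);
    return order;
}
```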

To check, you can turn on tracing of the CCTOR calls by calling Campy.Utils.Options.Set("trace-cctors"); right before calling Campy.Parallel.For(). If it doesn’t say “Done with cctors”, you can be fairly sure that’s the problem. Anyway, I fixed that problem by reordering the calls. I then found out that, despite my best efforts, the compiler still mishandles generics (in List.cs). In this code, it creates an array with the wrong element type:

public List(int capacity) {
	if (capacity < 0) {
		throw new ArgumentOutOfRangeException("capacity");
	}
	this.items = new T[capacity];
	this.size = 0;
}

I know the fix, but I just won’t be able to address it for a couple of weeks–going on vaca. But I checked in some code changes that improve the compiler-generated method names.

Enjoy. –Ken

Release v0.0.14

Posted on August 25, 2018 (updated September 3, 2018) by kaby76

(Update: Campy 0.0.14 is now released, Sept 2, 2018. –Ken)

The next release of Campy, which will happen sometime this week (8/27-9/3), will have many changes in order to compile and run a call to System.Console.WriteLine() on a GPU. There are also changes for correctness and stability of the compiler, on its ever-so-slow march towards the ECMA 335 standard.

As noted in a previous blog post, using System.Console.WriteLine() in kernel code exercises quite a bit of the compiler, pulling in a large chunk of the GPU runtime to compile, starting from the user’s kernel code containing just a WriteLine call. In developing this release, it was taking over 6 minutes to compile due to the number of types/virtual functions pulled in. This was a very long time in the debug-fix-recompile cycle.

Consequently, I’ve started looking at the performance of the compiler. For example, a classic problem in compilers is computing the back edges of a graph, and subsequently creating a topological ordering of the nodes with the back edges removed. For a C# method, it’s guaranteed that the basic blocks of the method can’t branch outside the method, so this computation need only consider the basic blocks of that method, not the whole graph–which is what Campy’s IMPORTER class was doing for each method discovered! This fix improves the importer run time from 5m 48s to 4m 5s for the WriteLine() example. Another aspect central to the compiler is the extension method ToMonoTypeReference(this System.Type type), which converts System.Types into Mono Cecil types during code analysis. The performance of this method was improved by memoization. This fix improves the importer run time from 4m 5s to just 5s! Note: performance analysis was done with DotTrace, an incredibly valuable tool. The fixes in this release vastly improve the performance of the compiler, but I’m sure more improvements are possible.
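The per-method computation can be sketched as follows. This is C++ rather than Campy’s C#, and `Cfg`, `Ordering`, and `orderBlocks` are hypothetical names, not the IMPORTER’s code. The input is the control-flow graph of a single method; a depth-first search from the entry block marks an edge as a back edge when it targets a block still on the DFS stack (for reducible control flow, these are exactly the loop back edges), and the reverse post-order of that same search is a topological order of the graph with those edges ignored.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Control-flow graph of ONE method: succ[b] lists the successors of basic
// block b, and block 0 is the entry. CIL branches cannot leave a method,
// so nothing outside this graph needs to be examined.
using Cfg = std::vector<std::vector<int>>;

struct Ordering {
    std::vector<std::pair<int, int>> backEdges;  // edges (u, v) that close a loop
    std::vector<int> topo;                       // reachable blocks, topologically ordered
};

Ordering orderBlocks(const Cfg& succ) {
    enum Color { White, Gray, Black };  // unvisited / on DFS stack / finished
    std::vector<Color> color(succ.size(), White);
    std::vector<int> post;
    Ordering result;

    // Recursive DFS for brevity; very large methods would want an explicit stack.
    auto dfs = [&](auto& self, int u) -> void {
        color[u] = Gray;
        for (int v : succ[u]) {
            if (color[v] == Gray) result.backEdges.push_back({u, v});
            else if (color[v] == White) self(self, v);
        }
        color[u] = Black;
        post.push_back(u);
    };
    if (!succ.empty()) dfs(dfs, 0);

    // Reverse post-order; blocks unreachable from the entry are omitted.
    result.topo.assign(post.rbegin(), post.rend());
    return result;
}
```

For a method whose blocks form 0 → 1 → 2 → 3 with a loop edge 2 → 1, the only back edge found is (2, 1), and the topological order is 0, 1, 2, 3.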

The following clip (taken with OBS Studio) shows the debugger stepping into the GPU runtime for WriteLine() on an NVIDIA GPU.

The code fails because Campy does not yet generate code to execute the .cctors, such as the one that initializes System.String.Empty. I expect this fix to be one of the last changes before the release.

With these changes, Campy now supports a lot of C# and a bit of the runtime, surpassing the capability of Alea GPU, Altimesh, and ILGPU, at least in what it can support (reference types, value types, new operator, generics, static and instance fields, static and instance methods, virtual methods, strings, multidimensional arrays, .cctors, enums, switch statements, conversions, native code).

In the 50 days since the last release on July 6, over 500 hours of programming work were done (including a week of vacation).

This version compiles and runs System.Console.WriteLine() within kernels.
The new operator in kernel code (newarr, newobj, initobj) is now supported.
ldtoken and constrained CIL instructions are now supported.
Reflection is now supported.
C# try/catch/finally (including with nesting) is now supported. However, exceptions are not supported because CUDA cannot catch exceptions in kernel code. Instead, Campy generates code that executes try/finally code in order.
Static fields of structs and classes are now supported.
There were numerous fixes to CIL instructions: compare instructions, brfalse/brtrue, ldobj, stobj.
There were numerous fixes for generic types and generic methods. Method TypeReference.Deresolve() was written to extend Mono Cecil to convert a generic back to a generic instance, given type information computed by the compiler.
Fields within a class/struct hierarchy are now supported.
Moved to stack allocation runtime model for ldarga, ldloca, ldarg, ldloc. In the future, I may add an optimization whereby args and locals are simply copied directly from the stack–if the method does not contain ldarga or ldloca.
Changes to get the compiler and runtime to use types and sizes as specified in ECMA 335 for the abstract machine/compilation (storage type, intermediate type, etc.).
Compilation and execution of .cctor methods on the GPU prior to kernel execution. Initialization of array data implemented (e.g., see System.Console.Empty).
“Intelligent virtual method compilation”–the compiler will deduce the types/virtual methods for a callvirt CIL instruction, pulling in all possibilities. Users do not need to tag methods/types with attributes to know what can run on the GPU.
Support for switch CIL instruction.
Much better integration of compiler with the runtime meta system.
Numerous bug fixes.
Much general clean up and refactoring. Removing of some kludges.
Rolling forward to LLVM 7.0 (pre-release).
Rolling forward to CUDA 9.2.148.
All unit tests pass.
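The “intelligent virtual method compilation” item is, in spirit, a class hierarchy analysis. Here is a minimal C++ sketch of that general technique, not Campy’s implementation: for a `callvirt StaticType::M`, every known type derived from StaticType is resolved to the nearest definition of M up its base chain, and the union of those targets is what must be compiled. `TypeInfo`, `Hierarchy`, `derivesFrom`, `resolve`, and `possibleTargets` are hypothetical names, and types and methods are plain strings.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

struct TypeInfo {
    std::string base;               // empty for a root type such as System.Object
    std::set<std::string> methods;  // virtual methods given a body in this type
};
using Hierarchy = std::map<std::string, TypeInfo>;

// True if `t` is `ancestor` or derives from it.
bool derivesFrom(const Hierarchy& h, std::string t, const std::string& ancestor) {
    while (!t.empty()) {
        if (t == ancestor) return true;
        auto it = h.find(t);
        if (it == h.end()) return false;
        t = it->second.base;
    }
    return false;
}

// The implementation a receiver of dynamic type `t` dispatches to: the
// nearest type up the base chain that defines `method` ("" if none).
std::string resolve(const Hierarchy& h, std::string t, const std::string& method) {
    while (!t.empty()) {
        auto it = h.find(t);
        if (it == h.end()) break;
        if (it->second.methods.count(method)) return t + "::" + method;
        t = it->second.base;
    }
    return "";
}

// Every implementation a `callvirt staticType::method` might reach.
std::set<std::string> possibleTargets(const Hierarchy& h, const std::string& staticType,
                                      const std::string& method) {
    std::set<std::string> targets;
    for (const auto& entry : h) {
        if (!derivesFrom(h, entry.first, staticType)) continue;
        std::string target = resolve(h, entry.first, method);
        if (!target.empty()) targets.insert(target);
    }
    return targets;
}
```

Because every derived type is considered, users don’t need to annotate which overrides may run; the trade-off is that some targets may be compiled that never execute.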

Again, please note that Campy is still in development. However, I am proud of this release, even with all the bugs still in play (“i.ToString()” crashes, so there is a lot of work still to do). The compiler is really coming into its own, and running in reasonable time. It was a ton of work, many late nights, long hours. But, the job is, of course, never quite done.

Once Campy works with the full range of CIL instructions and data types in the GPU runtime, I’ll be working on separate compilation, then targeting the compiler to AMD GPUs. In addition, while the cctor initialization code currently runs on the GPU in a single thread before the main kernel of the Parallel.For(), Campy should compile and run these constructors on the CPU. Fortunately, Campy uses LLVM, so it could potentially target x64 and run them on the CPU. Finally, something has to be done about the deep copying of C# data structures to shared memory.

If you have any questions, or problems getting Campy working, please let me know.

Ken Domino (ken.domino AT gmail.com)