April 2018 – Campy

Upcoming Releases

Posted on April 29, 2018May 3, 2018 by kaby76

The next release or two of Campy will be hammered out over the following weeks.

One will deal with the implementation of C# generics, which regressed a few months ago after the move to the GPU BCL reference type allocation. It kind of didn’t work all that well, and was a kludge, so it needed to be rewritten. Further, much of the BCL uses generics, e.g., System.Console.WriteLine(), so this must be sorted out as soon as possible.

The other will deal with Campy on Ubuntu. There isn’t any really good reason why Campy cannot be run on Ubuntu, so that also will be fixed. There is already a build for Swigged.LLVM for Ubuntu, and there will be a build of Swigged.CUDA for Ubuntu shortly. I’ll also need to get the GPU BCL of Campy to compile on Ubuntu, but it shouldn’t be any harder than the previously mentioned libraries.

I’m not sure which feature will come first, but generally speaking, a new version of Campy should be available every few weeks.

Support of enum types (13).
Performance improvement in basic block discovery of kernel code (77cee89).
Fix to GPU BCL type system initialization (14).
Partitioning the build of the runtime from the compiler so that it can be built for Linux. Adding in Linux build. There are a number of ways I’m looking into how to do the build, including the Linux C++ build feature in Visual Studio.
Rewriting the compiler so that phases are chained methods and renaming the phases that indicate what each does.

Release 0.0.8

Posted on April 18, 2018April 21, 2018 by kaby76

Campy version 0.0.8 has been released. The changes to Campy since release 0.0.7 have centered around the integration of the GPU BCL (i.e., the “Dot Net Anywhere” runtime that is being used on the GPU after porting to CUDA C) into the compiler. Unfortunately, the effort has set into motion a large number of changes. Some of those changes I expected, but many were not.

Up to now, C# objects on the GPU were allocated using a “malloc” of pinned memory. This memory was allocated in C# on the host CPU using Cuda.cuMemHostAlloc(), and is accessible on the CPU and GPU. But, C# objects are managed, meaning that the BCL should know the type of the object when a pointer is passed to it. With the recent changes to Campy, C# objects accessible on the GPU are now allocated using the GPU BCL. (987209c and others).
The GPU BCL needs to be accessible on both the GPU and CPU because the memory allocation on the CPU needs to be recorded by the GPU BCL. Considerable time was devoted to figure out how to write C# code to call unmanaged C code in a DLL that contains the GPU BCL (example). For the GPU, a static .LIB file is generated that contains pre-linked code (via nvcc -dlink). For the CPU, an unmanaged layer written in C/C++ is provided in a DLL. C# code calls the DLL API using P/Invoke.
The assembly containing the kernel needs to be loaded by the GPU BCL. Campy “worked” before but used the meta only on the CPU side (using Mono.Cecil). The GPU BCL now reads the meta for any assemblies referenced.
Even though Campy is supposed to be Net Standard 2.0 code, “dotnet build” of Campy wouldn’t build. As it turned out, Swigged.LLVM and Swigged.CUDA contained references to native libraries which prevented building Campy via Dotnet.exe. Those packages have been updated so that the native libraries are now in the proper sub-directory (Swigged.LLVM 6.0.5; Swigged.CUDA 9.185.5).
The pre-build code in the .TARGETS file of Swigged.LLVM and Swigged.CUDA don’t work with “dotnet build” because Dotnet does not create output directories before the running the pre-build steps. The build now performs a copy using a completely different Msbuild mechanism.
This release fixes line-oriented debugging of kernel code (11). Due to quirkiness of Mono Cecil (2116ef7), method references in CIL call instructions would not have debugging symbols loaded. A problem in instruction discovery (IMPORTER) existed with CIL call rewrite: the offset of the instruction was not set. These problems are now fixed.
Many compiler warnings were cleaned up. A Dotnet build of Campy is completely error and warning free.
Note, Dotnet works in a different directory from the application that you build. In order to find all dependent dlls and libs, you will need to change directory to the application, or “publish -r win10-x64” the application. Finding dlls is still a mess, but Campy with Net Core and Net Framework does work.
Nsight does not work with Net Core apps. I have no idea why Nsight is so messed up. Build the application as a Net Framework app, and it’ll all work as expected. Make sure it’s a 64-bit app you are building; Campy only works with 64-bit apps.

The release is in Nuget.org. You will need Net Core 2.0 and have CUDA GPU Toolkit 9.1.45 installed on Windows. To take full advantage of Campy, e.g., debugging with Nsight, you will need Visual Studio 15. You should be able to use the latest version of Visual Studio, although I haven’t tried because the GPU toolkit compiles C++ with VS version 15.4. Dotnet published Net Core 2.0 apps should run with only the GPU Toolkit installed.

Release 0.0.7

Posted on April 4, 2018April 5, 2018 by kaby76

Release 0.0.7 of Campy fixes multidimensional arrays and adds simple line/column debugging information to the generated code.

Implement line debugging of kernel code (4c15bda, b7). Note, there are bugs still in the implementation: it works only for straight line code, no branching (see bug entry). This will be fixed in the next release. Also, you will probably need to use the “Start CUDA debugging (legacy)” menu command of the NVIDIA Nsight debugger version 5.5. The “Next Gen” debugger works only in TCC mode. Looking forward to NVIDIA allowing for combined CPU/GPU debugging in the future. Make sure to follow the instructions for Nsight. Set breakpoints in your C# kernel code before you start.
Fixing 2D arrays (1284d27).
Fix “ceq” instruction code generation (75b990d, b9). In certain situations, the compiler would generate incorrect code.

The release is in Nuget.org. You will need VS2017 15.4.5 and CUDA GPU Toolkit 9.1.45 installed on Windows. However, once developed, you only need the CUDA GPU Toolkit installed on the system that has the GPU card.