A diversion back to CUDA

On the day I was going to release version 0.0.15 of Campy, NVIDIA released version 10 of the GPU Computing Toolkit. Scrambling, I tried to make an update to Swigged.CUDA that day, but in the end I didn’t include CUDA 10 in the release of Campy. The new Toolkit contained a lot of changes that made an update of Swigged.CUDA difficult. I was planning to work on a new interface with CUDA later on, but it now seems I have to address the issue at this point.

I went back to ManagedCuda, written by Michael Kunz, to see if I could use that for Campy. Last year, I looked at ManagedCuda and hoped that I could use it, but I couldn’t, and wrote Swigged.CUDA instead. Unfortunately, not much has changed. Although Kunz partially updated his code in February 2018 for CUDA 9.1, there is no release on NuGet for that version; the latest is still CUDA 8 from May 2017. And the update is only partial: it is missing several changes in CUDA 9.1 that should be there. It is also stuck on .NET Framework.

Sebastian Urban made a fork of ManagedCuda in September 2017 and updated it as a .NET Standard package. Unfortunately, Urban introduced a number of portability issues. Last week, I made a fork of ManagedCuda from Urban's fork and partially updated it for CUDA version 10. Older versions of ManagedCuda still work because they reference the driver nvcuda.dll, not the Toolkit per se. But for Cublas and the other packages associated with ManagedCuda, the DLLs are tied to the Toolkit version.

Unfortunately, ManagedCuda is a hand-crafted API, requiring one to carefully examine the API headers for CUDA to see what needs to be changed, which is clearly error-prone. This isn’t how software should be crafted.

The only viable solution is a tool that reads the C++ header files and outputs nice clean C# code. Swigged.CUDA tries to follow this paradigm using SWIG, but it’s difficult to keep up with the changes in CUDA using the arcane rules of SWIG.

An alternative tool is ClangSharp, a tool written by Mukul Sabharwal. The tool is fast and looks promising. But, like SWIG, it also has problems. For example,

  • when applied to the GPU Toolkit header file cuda.h, the generated code does not compile due to anonymous struct declarations;
  • when generating an interface for Cublas, it pulls in bits of the CUDA runtime, because Cublas depends on the CUDA runtime;
  • C unions are not handled: it generates sequential fields, one for each member of the union;
  • C struct layout in the generated C# is not correct. It should emit the [StructLayout(LayoutKind.Explicit)] and [FieldOffset(…)] attributes to force explicit offsets. ManagedCuda was written correctly with this in mind (see https://github.com/kunzmi/managedCuda/blob/master/ManagedCUDA/BasicTypes.cs#L1650).

I am planning to fork ClangSharp, fix the above problems, and incorporate some of the ideas of SWIG into this tool, e.g., %ignore (so it does not generate CUDA runtime p/invokes for Cublas).

This will, unfortunately, be a drag on making significant progress on Campy, likely for several weeks, if not longer. I am also receiving bug reports for Campy, so there is an ever-increasing juggle for my attention.

Ken Domino

Update (November 5, 2018): To solve this problem, I am writing an entirely new p/invoke generator called Piggy, which uses Clang ASTs and tree regular expressions. Every p/invoke generator available is either inflexible or difficult to use. I hope to have this working by the end of the month (November 2018). It’s unfortunate that the people working on Clang removed the XML serializer from libclang.

Update (November 26, 2018): Finally, after two months on this problem, Piggy is starting to work. It has taken an extraordinary effort, which is not surprising, since it’s just one guy working on something that is normally done by dozens of people.

3 thoughts on “A diversion back to CUDA”

    1. I started Campy several years ago and restarted working on it last year, before I saw ILGPU. ILGPU hasn’t been updated for 8 months (https://github.com/m4rs-mt/ILGPU), so it is unclear whether that project is continuing. Right now, Campy includes a .NET framework, which puts it further ahead of the other C# GPU compilers (ILGPU, Alea, Hybridizer). I am planning on getting back to Campy this month and starting to fix the chief complaint: the performance. I will then work on the runtime: forking the .NET Core standard library and removing virtually all C/C++ code, rewriting that code in C# with the GPU in mind. The goal is to make it completely platform independent for any target, including the CPU. Unfortunately, I’m still stuck in p/invoke generator hell. All the current generators (SWIG, ClangSharp, CppSharp, Pinvoke Interop Assistant, xInterop) are either too difficult to use, inflexible, or unavailable. I have to add the caveat that it’s only me working on this huge project, so it will take some time. Thank you Vlad for the encouragement!
