Architecture and Design Considerations
This document is incomplete; more will be added when time permits. In the meantime, the following notes cover some general considerations.
What are the major components of Campy?
Campy is partitioned into the following projects:
- Campy: the user API, containing Campy.Parallel.For()
- Campy.Compiler: the JIT compiler, including code to import the user's CIL assembly code. Right now, LLVM and CUDA calls are sprinkled throughout it.
- Campy.Runtime: the native code (C++/CUDA code that runs on the GPU), the CPU native code (the same C++/CUDA code compiled for the CPU, with a very thin export layer to expose a few functions), and the C# framework for Campy (the C# code that replaces .NET Core, .NET Framework, or Mono).
- Campy.Graphs: an implementation of graphs containing only the essentials used by Campy. (It used to be much bigger.)
- Campy.Utils: odds and ends.
- Tests/: various unit tests for Campy, including reduction, scan, sort, etc.
- Swigged.CUDA: a separate NuGet package that exposes a C# API to the CUDA Driver API.
- Swigged.LLVM: a separate NuGet package that exposes a C# API to the LLVM-C API of LLVM. Separately, builds of LLVM are released for static linking at https://github.com/kaby76/llvm, since LLVM.org does not release any pre-built binaries of the core libraries.
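The "very thin export layer" in Campy.Runtime follows the usual pattern for making native C++ callable from C#: functions are given C linkage with extern "C", so their names are not mangled and P/Invoke ([DllImport]) can bind to them by name. The following is a minimal sketch of that general pattern, not Campy's actual exports; runtime::sum and rt_sum are hypothetical names chosen for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

namespace runtime {
// Hypothetical internal C++ routine, standing in for real runtime code.
inline std::int64_t sum(const std::int32_t* data, std::size_t n) {
    assert(data != nullptr || n == 0);  // null is allowed only for empty input
    std::int64_t total = 0;
    for (std::size_t i = 0; i < n; ++i) total += data[i];
    return total;
}
}  // namespace runtime

// extern "C" gives this function C linkage (no C++ name mangling), so a C#
// caller can bind to the symbol "rt_sum" by name via [DllImport]. Only plain
// C types cross the boundary. A Windows DLL build would also need
// __declspec(dllexport) (or a .def file) to actually export the symbol.
extern "C" std::int64_t rt_sum(const std::int32_t* data, std::size_t n) {
    return runtime::sum(data, n);
}
```

On the C# side, such a function would be declared as a static extern method carrying a [DllImport] attribute with a matching signature. Keeping the exported surface small and C-typed is what makes the layer "thin".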
What open source software does Campy use?
Any project, especially one developed by a single person, must rely on the work of others. Campy uses the following:
- C# with .NET Framework and .NET Core
- LLVM
- DotNetAnywhere
- Mono.Cecil
- SWIG
Why does Campy use LLVM?
Campy uses LLVM for JIT compilation of CIL code. There are several alternatives, but none offers the same range of targets (both NVIDIA and AMD GPUs) or depth (optimizations and debugging support). LLVM is also under active development by many people.
One alternative, which Campy used many years ago, is to translate CIL into CUDA/C++ code and then compile that code. Cudafy, Altimesh, and others seem to take this approach. However, it requires the compiler tool chain to be installed.
Why use DotNetAnywhere instead of Mono, .NET Core, or something else?
Campy requires a small .NET framework to support execution of CIL on a GPU. Unfortunately, CoreCLR, CoreRT, Mono, and the .NET Micro Framework (now nanoFramework) are all large. In any case, I don’t want to supplant the framework that an app runs in: Campy substitutes only the very lowest-level classes of the framework, those containing highly OS-dependent code, for the GPU base class layer, and leaves the rest of the framework that the app uses unmodified. This works only because the data structures are deep-copied to the metadata system on the GPU. The CIL is rewritten using Mono.Cecil, an excellent metadata library.
DotNetAnywhere, with all its problems, was the best choice to get “up and running”, as I did not want to rewrite an entire framework from scratch. The main problem with any existing framework is that CUDA requires C/C++ functions to be declared with the __device__ modifier in order to run on a GPU, and Campy also requires a host (CPU) version of each function. None of the frameworks declare their functions with such a modifier, so adding the declarations requires a lot of edits.
Further, as I haven’t been officially employed as a software developer since 2000, I seriously doubt anyone would allow me to modify any of those code bases. However, considering all the problems I’ve been having with DotNetAnywhere, I will likely explore other frameworks, or write my own.
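The __device__ requirement in this section is commonly handled in CUDA code with a macro that expands to __host__ __device__ when nvcc compiles the file (nvcc defines __CUDACC__) and to nothing otherwise, so a single definition serves both the GPU and CPU builds. This is a minimal sketch of that general idiom, not necessarily what Campy does; the macro name RT_CALLABLE and the function clamp_index are hypothetical. Every function in a framework's native code would still need the annotation, which is why retrofitting an existing framework takes so many edits.

```cpp
#include <cassert>

// Hypothetical macro: under nvcc, mark functions callable from both host and
// device code; under an ordinary C++ compiler, expand to nothing so the same
// source builds as plain CPU code.
#if defined(__CUDACC__)
#define RT_CALLABLE __host__ __device__
#else
#define RT_CALLABLE
#endif

// Example helper written once and compiled for both CPU and GPU.
// Clamps index i into the range [0, length - 1].
RT_CALLABLE inline int clamp_index(int i, int length) {
    assert(length > 0);  // assert is also supported in CUDA device code
    if (i < 0) return 0;
    if (i >= length) return length - 1;
    return i;
}
```

Compiled with nvcc, clamp_index becomes callable from kernels as well as host code; compiled with g++ or clang++, it is an ordinary inline function.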
Why does Campy need Mono.Cecil?
Although DotNetAnywhere includes a metadata type system capable of reading any assembly, Campy uses Mono.Cecil to read the CIL code of the user’s app. Cecil is a well-developed API, superior to the DotNetAnywhere metadata system and to anything else out there.
What is the interface between C# and CUDA and LLVM?
Campy interfaces with CUDA and LLVM through a SWIG-generated layer. Why SWIG? The CUDA API is well organized and contains roughly 400 functions, of which Campy actually uses only about 20. Likewise, LLVM has a very large API, of which Campy uses only a small part. Although one could write the bindings by hand with C++ Interop (implicit P/Invoke), I did not know in advance which functions Campy would need, so, to allow maximal flexibility, the interface is generated by SWIG. Further, LLVM is not delivered as a shared library (DLL or .so files), so the bindings must be built against static LLVM libraries. Because SWIG is unstable and kludgy, it would probably be good to work out an alternative, or to use C++ Interop alone.
What space are GPU objects allocated in?
Campy copies all C# data structures used in the closure object of a Campy.Parallel.For() call into memory that the GPU can access. All objects are allocated in pinned memory on the CPU so that both the CPU and the GPU can modify them. Eventually, I expect the API to change so that device-only GPU memory can be used as an alternative, for better performance.
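To make the idea of a deep copy concrete, here is a minimal C++ sketch of copying a linked object graph into one contiguous block and rewriting its internal pointers so the copy is self-contained. It illustrates the general technique under simplifying assumptions and is not Campy's implementation: std::vector stands in for pinned host memory (which a real implementation would obtain from the CUDA driver, e.g. with cuMemHostAlloc), the Node type and deep_copy function are hypothetical, and only an acyclic singly linked list is handled.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <vector>

// Hypothetical host-side object type: a singly linked list node.
struct Node {
    int value;
    Node* next;
};

// Deep-copy the (acyclic) list starting at `head` into `buffer`, a contiguous
// block standing in for pinned memory. Pointers inside the copy are rewritten
// to point at the copies, never at the originals.
Node* deep_copy(const Node* head, std::vector<Node>& buffer) {
    // Count nodes first so the buffer never reallocates; reallocation would
    // invalidate the pointers stored below.
    std::size_t count = 0;
    for (const Node* n = head; n != nullptr; n = n->next) ++count;
    buffer.clear();
    buffer.reserve(count);

    // Pass 1: copy values and remember original -> copy. A map (rather than
    // an index) is what a general graph with shared references would need.
    std::unordered_map<const Node*, Node*> copied;
    for (const Node* n = head; n != nullptr; n = n->next) {
        buffer.push_back(Node{n->value, nullptr});
        copied[n] = &buffer.back();
    }
    // Pass 2: fix up internal pointers to refer to the copies.
    for (const Node* n = head; n != nullptr; n = n->next) {
        copied[n]->next = n->next ? copied[n->next] : nullptr;
    }
    assert(buffer.size() == count);
    return count ? &buffer.front() : nullptr;
}
```

After the copy, the original objects can change without affecting the copied block, and every pointer in the block refers only to memory inside it.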
Campy does not seem to include any explicit methods to copy data to or from the GPU. Why not?
As part of its requirements, Campy does not expose functions like cudaMemcpy, cudaMalloc, and cudaFree, because C# data objects are managed objects. While an API could expose that functionality, doing so is not in keeping with managed languages. That said, Campy offers Sticky(), Readonly(), and Sync() to improve performance. Other mechanisms will be explored in time.
Campy’s API doesn’t seem to include a way to access GPU shared memory or to use thread cooperation. Why not?
Campy will offer that at some point; I just haven’t had time to implement it yet.
Why aren’t you selling this?
Campy is not for sale. It is open source software. Although I am unemployed, haven’t had a regular programming job for many years, and have almost no money to live on, I can’t sell it: the software is too specialized, and other software does something similar (Alea, Altimesh, and ILGPU).