Documentation

Costs, License, Privacy

Campy is free and open source software under the MIT license. No information is collected, except for comments on this website. For further information on privacy, see this page.

Campy in under a minute

To try out Campy, you will need to be running Windows 10 or Ubuntu 16.04 on an x64 processor. Further, I assume you have an NVIDIA GPU of the Kepler (sm_30) or a newer architecture (Maxwell/sm_50, Pascal/sm_60, Volta/sm_70), as well as the CUDA GPU Toolkit 9.1.85, installed. I recommend you use .NET Core 2.1 (install the .NET SDK, ~5 minutes to install), and a Bash shell with the script given below. On Windows, you can use, for example, Cygwin, or MinGW, which is installed when you install Git (~5 minutes to install). If you prefer, you could use PowerShell or even Cmd to perform the equivalent of the commands below. Note, I haven’t tried Campy in the Windows Subsystem for Linux, but I suspect it won’t work because of the issues in sharing the GPU with a Windows host. Campy works under the Mono system.

Finally, within Bash, copy and paste the following code.


#!/bin/bash
mkdir test
cd test
dotnet new console
cat - << HERE > Program.cs
namespace test
{
    class Program
    {
        static void Main(string[] args)
        {
            int n = 4;
            int[] x = new int[n];
            Campy.Parallel.For(n, i => x[i] = i);
            for (int i = 0; i < n; ++i)
                System.Console.WriteLine(x[i]);
        }
    }
}
HERE
dotnet add package Campy
dotnet build
unameOut="$(uname -s)"
case "${unameOut}" in
    Linux*)
        dotnet publish -r ubuntu.16.04-x64
        cd bin/Debug/netcoreapp2.1/ubuntu.16.04-x64/publish/
        ./test
        ;;
    Darwin*)
        echo Cannot target Mac yet.
        exit 1
        ;;
    CYGWIN*|MINGW*)
        dotnet publish -r win-x64
        cd bin/Debug/netcoreapp2.1/win-x64/publish/
        ./test.exe
        ;;
    *)
        echo Unknown machine.
        exit 1
        ;;
esac
echo Output should be four lines of integers, 0 to 3.

Once an app is “published” as a self-contained deployment, it is completely self-sufficient. Apps that are not self-contained run the risk of Campy being unable to resolve assemblies used by the program. I currently do not implement the rules outlined by Microsoft, but I will at some point.

As an alternative to a .NET Core 2.1 app, you can install MS Visual Studio 2017 for development and Nsight for debugging, and create a .NET Framework 4.7.1 app. Note: Nsight does not work with .NET Core apps.

Examples

Examples of Campy are on GitHub at https://github.com/kaby76/Campy/tree/master/Tests, including reduction, various sorting algorithms, FFT, etc.

The API

Philosophy of the API

  • The API must be very small. A larger API may be useful in optimizing the implementation for the GPU, but once it grows beyond a handful of methods, it is impossible to remember.
  • GPU independent. There should not be any CUDA-specific code in the API. There should be a simple, idealized model of a GPU.
  • Memory management should be determined by the compiler. The user should not be burdened with knowing what to transfer to GPU global memory, or CPU pinned memory. The C# operator new should work on the GPU.
  • All of C# should work within kernel code, with the exception of features that are clearly CPU bound, such as:
    • Thread: a thread is a CPU artifact. I haven’t decided how to support dynamic parallelism, but it probably won’t be through the System.Threading API.
    • Marshal.AllocHGlobal: this is CPU oriented, since it indicates the memory pool characteristics. Use the C# new operator instead.

Namespace: Campy

Classes

Parallel      Provides support for parallel loops.
Sequential    Provides support for sequential loops in the same syntax as with Parallel.

Delegates

KernelType    Encapsulates a basic GPU kernel that takes one parameter (an integer index) and does not return a value.

 

Syntax

public delegate void KernelType(int idx)

You can use this delegate to pass a method as a parameter without explicitly declaring a custom delegate. The encapsulated method must correspond to the method signature that is defined by this delegate. This means that the encapsulated method must have one parameter and no return value. For information on delegates, see the Microsoft documentation.
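As a rough illustration of the paragraph above, the following sketch stores a lambda in a KernelType variable and hands it to Parallel.For. It assumes that KernelType and Parallel are both directly accessible in the Campy namespace with the signatures listed on this page, and that the Campy package has been added to the project; the namespace KernelTypeDemo and the array name squares are made up for the example.

```csharp
using System;

namespace KernelTypeDemo
{
    class Program
    {
        static void Main(string[] args)
        {
            int n = 8;
            int[] squares = new int[n];

            // A lambda with one int parameter and no return value
            // matches the KernelType signature shown above.
            Campy.KernelType kernel = i => { squares[i] = i * i; };

            // Pass the delegate to Parallel.For; each index i is
            // handled by one invocation of the kernel.
            Campy.Parallel.For(n, kernel);

            // Back on the CPU, print what the kernel wrote.
            for (int i = 0; i < n; ++i)
                Console.WriteLine(squares[i]);
        }
    }
}
```

In most code the lambda would probably be written inline in the call to Parallel.For, as in the one-minute example; naming the delegate is only useful when the same kernel is launched more than once.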

 

public class Parallel

static void For(Int32, KernelType)    Executes a for loop in which iterations may run in parallel on a GPU.
static void Readonly(object)          Indicates that the object should never be copied from the GPU back to the CPU.
static void Sticky(object)            Indicates that the object should be kept on the GPU until Sync() is called.
static void Sync()                    Copies objects on the GPU back to the CPU.
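The sketch below shows one way the four methods might be combined. It is based only on the one-line descriptions above, so the exact copy behavior is an assumption: input is marked Readonly because the kernels only read it, and data is marked Sticky because two kernels use it back to back. The namespace StickyDemo and the array names are made up for the example.

```csharp
using System;

namespace StickyDemo
{
    class Program
    {
        static void Main(string[] args)
        {
            int n = 4;
            int[] input = new int[] { 10, 20, 30, 40 };
            int[] data = new int[n];

            // Assumption: input is only read by the kernels, so it
            // never needs to be copied back from the GPU.
            Campy.Parallel.Readonly(input);

            // Assumption: data is reused by two kernels in a row, so
            // keep it on the GPU until Sync() is called.
            Campy.Parallel.Sticky(data);

            Campy.Parallel.For(n, i => data[i] = input[i] + i);
            Campy.Parallel.For(n, i => data[i] = data[i] * 2);

            // Copy GPU-resident objects back before reading on the CPU.
            Campy.Parallel.Sync();

            for (int i = 0; i < n; ++i)
                Console.WriteLine(data[i]);
        }
    }
}
```

Without Sticky and Readonly the program should compute the same values; the calls are hints about data movement, not about what the kernels compute.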

 

public class Sequential

static void For(Int32, KernelType)    Executes a for loop in which iterations run sequentially on a GPU.
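A minimal sketch of Sequential.For follows, using the same call shape as Parallel.For. It assumes that "sequentially" means the iterations run one after another in ascending index order, which this page does not state explicitly; the namespace SequentialDemo and the array name sums are made up for the example.

```csharp
using System;

namespace SequentialDemo
{
    class Program
    {
        static void Main(string[] args)
        {
            int n = 5;
            int[] sums = new int[n];

            // Each iteration reads the result of the previous one, so
            // this loop depends on the iterations running in index
            // order (an assumption about Sequential.For).
            Campy.Sequential.For(n, i =>
            {
                if (i == 0)
                    sums[i] = 0;
                else
                    sums[i] = sums[i - 1] + i;
            });

            for (int i = 0; i < n; ++i)
                Console.WriteLine(sums[i]);
        }
    }
}
```

A loop with this kind of dependency between iterations would not be safe to pass to Parallel.For, where iterations may run in any order.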

 

NB: Sorry, cooperative threads, the great power of GPUs, are currently not supported but will be when Campy is far enough along and stable. Expect them to be added by July 2018.

Architecture

Please see this page for some documentation on the organization of Campy.

 

Comparison with other GPU/C# systems

Please see this page for a comparison of various GPU/C# software.

 

Stackexchange.com GPU/C#

[C#] GPU

GPGPU

Aleagpu

Altimesh

Dotnet GPU

 

CIL Instructions Implemented in Campy (Sep 13, 2018)

(1 = implemented, 0 = not implemented)

                Add 1
                Add_Ovf 1
                Add_Ovf_Un 1
                And 1
                Arglist 0
                Beq 1
                Beq_S 1
                Bge 1
                Bge_S 1
                Bge_Un 1
                Bge_Un_S 1
                Bgt 1
                Bgt_S 1
                Bgt_Un 1
                Bgt_Un_S 1
                Ble 1
                Ble_S 1
                Ble_Un 1
                Ble_Un_S 1
                Blt 1
                Blt_S 1
                Blt_Un 1
                Blt_Un_S 1
                Bne_Un 1
                Bne_Un_S 1
                Box 1
                Br 1
                Br_S 1
                Break 0
                Brfalse 1
                Brfalse_S 1
                Brtrue 1
                Brtrue_S 1
                Call 1
                Calli 1
                Callvirt 1
                Castclass 1
                Ceq 1
                Cgt 1
                Cgt_Un 1
                Ckfinite 0
                Clt 1
                Clt_Un 1
                Constrained 1
                Conv_I1 1
                Conv_I2 1
                Conv_I4 1
                Conv_I8 1
                Conv_I 1
                Conv_Ovf_I1 1
                Conv_Ovf_I1_Un 1
                Conv_Ovf_I2 1
                Conv_Ovf_I2_Un 1
                Conv_Ovf_I4 1
                Conv_Ovf_I4_Un 1
                Conv_Ovf_I8 1
                Conv_Ovf_I8_Un 1
                Conv_Ovf_I 1
                Conv_Ovf_I_Un 1
                Conv_Ovf_U1 1
                Conv_Ovf_U1_Un 1
                Conv_Ovf_U2 1
                Conv_Ovf_U2_Un 1
                Conv_Ovf_U4 1
                Conv_Ovf_U4_Un 1
                Conv_Ovf_U8 1
                Conv_Ovf_U8_Un 1
                Conv_Ovf_U 1
                Conv_Ovf_U_Un 1
                Conv_R4 1
                Conv_R8 1
                Conv_R_Un 1
                Conv_U1 1
                Conv_U2 1
                Conv_U4 1
                Conv_U8 1
                Conv_U 1
                Cpblk 0
                Cpobj 0
                Div 1
                Div_Un 1
                Dup 1
                Endfilter 0
                Endfinally 1
                Initblk 0
                Initobj 1
                Isinst 1
                Jmp 1
                Ldarg 1
                Ldarg_0 1
                Ldarg_1 1
                Ldarg_2 1
                Ldarg_3 1
                Ldarg_S 1
                Ldarga 1
                Ldarga_S 1
                Ldc_I4 1
                Ldc_I4_0 1
                Ldc_I4_1 1
                Ldc_I4_2 1
                Ldc_I4_3 1
                Ldc_I4_4 1
                Ldc_I4_5 1
                Ldc_I4_6 1
                Ldc_I4_7 1
                Ldc_I4_8 1
                Ldc_I4_M1 1
                Ldc_I4_S 1
                Ldc_I8 1
                Ldc_R4 1
                Ldc_R8 1
                Ldelem_Any 1
                Ldelem_I1 1
                Ldelem_I2 1
                Ldelem_I4 1
                Ldelem_I8 1
                Ldelem_I 1
                Ldelem_R4 1
                Ldelem_R8 1
                Ldelem_Ref 1
                Ldelem_U1 1
                Ldelem_U2 1
                Ldelem_U4 1
                Ldelema 1
                Ldfld 1
                Ldflda 1
                Ldftn 1
                Ldind_I1 1
                Ldind_I2 1
                Ldind_I4 1
                Ldind_I8 1
                Ldind_I 1
                Ldind_R4 1
                Ldind_R8 1
                Ldind_Ref 1
                Ldind_U1 1
                Ldind_U2 1
                Ldind_U4 1
                Ldlen 1
                Ldloc 1
                Ldloc_0 1
                Ldloc_1 1
                Ldloc_2 1
                Ldloc_3 1
                Ldloc_S 1
                Ldloca 1
                Ldloca_S 1
                Ldnull 1
                Ldobj 1
                Ldsfld 1
                Ldsflda 1
                Ldstr 1
                Ldtoken 1
                Ldvirtftn 0
                Leave 1
                Leave_S 1
                Localloc 0
                Mkrefany 0
                Mul 1
                Mul_Ovf 1
                Mul_Ovf_Un 1
                Neg 1
                Newarr 1
                Newobj 1
                No 0
                Nop 1
                Not 0
                Or 1
                Pop 1
                Readonly 0
                Refanytype 0
                Refanyval 0
                Rem 1
                Rem_Un 1
                Ret 1
                Rethrow 0
                Shl 1
                Shr 1
                Shr_Un 1
                Sizeof 1
                Starg 1
                Starg_S 1
                Stelem_Any 1
                Stelem_I1 1
                Stelem_I2 1
                Stelem_I4 1
                Stelem_I8 1
                Stelem_I 1
                Stelem_R4 1
                Stelem_R8 1
                Stelem_Ref 1
                Stfld 1
                Stind_I1 1
                Stind_I2 1
                Stind_I4 1
                Stind_I8 1
                Stind_I 1
                Stind_R4 1
                Stind_R8 1
                Stind_Ref 1
                Stloc 1
                Stloc_0 1
                Stloc_1 1
                Stloc_2 1
                Stloc_3 1
                Stloc_S 1
                Stobj 1
                Stsfld 1
                Sub 1
                Sub_Ovf 1
                Sub_Ovf_Un 1
                Switch 1
                Tail 0
                Throw 1
                Unaligned 0
                Unbox 1
                Unbox_Any 1
                Volatile 0
                Xor 1
Total: 200 of 219 instructions implemented (91.3242%)

 

 
