Here is the latest on Campy:
- The base class layer for the GPU has been fixed so that there are no longer any global variables. C++ global variables are initialized when Campy JITs a kernel. In order to avoid re-initializing the base class layer, all but one C++ global variables are now placed in a structure. The runtime is now initialized only once, a critical performance improvement.
- An API for managing GPU/CPU memory synchronization has been added. Previously, after each Parallel.For() call, memory was copied back to C# data structures on the CPU. With an explicit API to synchronize memory copying, certain algorithms, e.g., FFT and Reduction, which nest Parallel.For() calls with another loop, are now much faster.
- I have spent a lot of time making changes to Swigged.LLVM, which is used by Campy. Swigged.LLVM is now fully built automatically in Appveyor. And, it has been updated to the latest release of LLVM, version 6.0.
- I will be making a release of Campy in the next month if all goes well.