VexCL
This is the first post in my C++ accelerator library series. It is about VexCL, a header-only C++ library for accelerator programming, developed by Denis Demidov and others. The library is released under the MIT licence.
VexCL supports OpenCL and CUDA as accelerator backends. For the CUDA backend it is important to note that the CUDA SDK must be installed on every system that runs VexCL code, because kernel code is generated at runtime and compiled with the SDK's compiler.
The central memory abstraction in VexCL is the vex::vector&lt;T&gt; container, which can be partitioned across all devices of the context it is created with.
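As a minimal sketch (assuming at least one double-precision capable device; the calls follow the VexCL documentation), allocating such a vector and moving data looks like this:

#include <vector>
#include <vexcl/vexcl.hpp>

int main() {
    // Select all devices that support double precision; a vector
    // created with this context is partitioned across all of them.
    vex::Context ctx(vex::Filter::DoublePrecision);

    std::vector<double> host(1024, 1.0);
    vex::vector<double> x(ctx, host); // allocate on the device(s) and upload
    x = 2 * x;                        // computed on the accelerator(s)
    vex::copy(x, host);               // download the result
}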
Concurrent execution of multiple kernels, or of copy and kernel operations, is partially supported by VexCL. Copy operations are synchronous by default but can be configured to be asynchronous. Each container has associated command_queues that are used to enqueue operations, and a command_queue has a finish() method that blocks until all commands in the queue have completed. This is not the most elegant way to handle concurrency, but VexCL does not abstract away the option of executing operations in parallel, which is nice.
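Continuing the snippet above, a sketch of explicit synchronization (I am assuming the queue accessors of vex::Context found in current VexCL versions):

vex::vector<double> y(ctx, 1024);
y = 2 * x;                 // kernel launches return immediately

for (unsigned d = 0; d < ctx.size(); ++d)
    ctx.queue(d).finish(); // block until each device queue has drained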
VexCL supports numerous parallel primitives, such as inclusive_scan, exclusive_scan, sort, sort_by_key and reduce_by_key (a short sketch follows the generated kernel below). But the core functionality of VexCL is its mechanism for generating kernels from vector expressions. If X, Y and Z are of type vex::vector&lt;double&gt;, the expression
X = 2 * Y - sin(Z);
generates a single kernel that is automatically executed on all accelerators that the vectors occupy:
kernel void vexcl_vector_kernel(
    ulong n,
    global double * prm_1,
    int prm_2,
    global double * prm_3,
    global double * prm_4)
{
    for (size_t i = get_global_id(0); i < n; i += get_global_size(0)) {
        prm_1[i] = ( ( prm_2 * prm_3[i] ) - sin( prm_4[i] ) );
    }
}
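As for the parallel primitives mentioned above, a short sketch (assuming a single-device context, which some of these primitives require):

vex::vector<int>    keys(ctx, 1024);
vex::vector<double> vals(ctx, 1024);

vex::sort_by_key(keys, vals);       // sort vals according to keys

vex::vector<double> scanned(ctx, 1024);
vex::inclusive_scan(vals, scanned); // prefix sum of vals

vex::Reductor<double, vex::SUM> sum(ctx);
double total = sum(vals);           // reduce to a scalar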
This expression template mechanism has many features, including numerous built-in functions, support for constants, access to vector indices, inclusion of user-defined functions, tagging of data to avoid reading the same memory more than once, temporary values, random numbers, permutations, slicing, reducing, reshaping, scattered data interpolation and fast Fourier transform. A performance improvement can be expected from expression-template-generated kernels, since such fused kernels save on memory reads and writes compared to individual calls to BLAS functions.
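Two of these features sketched out, following examples from the VexCL documentation (the function itself is a made-up example):

VEX_FUNCTION(double, squared_radius, (double, x)(double, y),
    return x * x + y * y;
);
Z = sqrt(squared_radius(X, Y));

// Tagging X marks both occurrences as the same data, so the generated
// kernel reads it from global memory only once:
Z = vex::tag<1>(X) * (vex::tag<1>(X) + Y);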
Programmers may also generate kernels by feeding a vex::symbolic&lt;T&gt; through an arbitrary algorithm: the operations performed on the symbolic value are recorded and assembled into a single kernel.
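A sketch of this mechanism, following the pattern from the VexCL documentation (the recorded algorithm here is deliberately trivial):

#include <sstream>

std::ostringstream body;
vex::generator::set_recorder(body);

vex::symbolic<double> sym_x(vex::symbolic<double>::VectorParameter);

// Operations on the symbolic value are recorded instead of executed:
sym_x = 2 * sym_x + sin(sym_x);

// Build a single kernel from the recording and run it on a real vector:
auto step = vex::generator::build_kernel(ctx, "step", body.str(), sym_x);
step(X);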
VexCL is interoperable with ViennaCL, Boost.Compute and CLOGS. The library is available on GitHub, where the documentation can also be found.
Denis Demidov mentions that performance results for VexCL are published in one of his papers and included in one of his recent talks; see references [1] and [2] in the comments below.
I would categorize VexCL as a high-productivity prototyping library that can also be used in scientific production code. It is particularly well suited to implementing numerical analysis algorithms. I see minor problems when it comes to kernel caching, the requirement of the CUDA SDK at runtime, and the lack of elegant concurrency constructs. Nevertheless, the number of features in VexCL is enormous. VexCL is an excellent example of how sophisticated C++ programming techniques can lead to efficient code as well as a beautiful and concise interface.
4 Comments

Great write-up and overall coverage. It would be great to see more detailed coverage, but this is very helpful and a great start to a series.

Jim, thank you for your comment. You are right, of course. On Twitter, @relativetoyou also asked for more examples and performance measurements. With regard to performance measurements, I suspect there are no big surprises when using VexCL. It might of course be interesting to do a platform comparison, but that is for now out of the scope of my reviews.
Edit: by “no big surprises” with regard to performance I meant that VexCL will most probably not reduce performance, so no negative surprises.
Some performance results may be found in [1] (already cited in the comments to the parent post) and [2].
[1] Programming CUDA and OpenCL: A Case Study Using Modern C++ Libraries, http://arxiv.org/abs/1212.6326
[2] VexCL – a Vector Expression Template Library for OpenCL, https://speakerdeck.com/ddemidov/vexcl-at-pecos-university-of-texas-2013
Thanks Denis, I updated the review and linked to your paper as well as your slides.