
Performance: Boost.Compute vs. OpenCL C++ wrapper

Updated: 2022-06-27 02:40:38

The kernel code generated by the transform() function in Boost.Compute should be almost identical to the kernel code you use in the C++ wrapper version (though Boost.Compute will do some unrolling).
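To make the comparison concrete, here is a minimal sketch (the square kernel and the lambda expression are illustrative assumptions, not the asker's original code) of the two routes being compared: a hand-written kernel in the wrapper style, and the same element-wise operation through boost::compute::transform(), which generates and compiles a nearly identical kernel behind the scenes.

```cpp
// Sketch (assumed, not the original question's code): the same element-wise
// square computed two ways on the default OpenCL device.
#include <boost/compute/core.hpp>
#include <boost/compute/algorithm/transform.hpp>
#include <boost/compute/container/vector.hpp>
#include <boost/compute/lambda.hpp>

namespace compute = boost::compute;

int main()
{
    compute::device dev = compute::system::default_device();
    compute::context ctx(dev);
    compute::command_queue queue(ctx, dev);

    compute::vector<float> v(1024, ctx);

    // Route 1: hand-written kernel, as with the plain C++ wrapper API.
    const char source[] =
        "__kernel void square(__global float *v)"
        "{ size_t i = get_global_id(0); v[i] = v[i] * v[i]; }";
    compute::program prog = compute::program::create_with_source(source, ctx);
    prog.build();
    compute::kernel k(prog, "square");
    k.set_arg(0, v.get_buffer());
    queue.enqueue_1d_range_kernel(k, 0, v.size(), 0);

    // Route 2: transform() with a lambda expression; Boost.Compute
    // generates and compiles an equivalent kernel for you.
    using compute::lambda::_1;
    compute::transform(v.begin(), v.end(), v.begin(), _1 * _1, queue);

    queue.finish();
    return 0;
}
```
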

The reason you see a difference in timings is that in the first version you are only measuring the time it takes to enqueue the kernel and map the results back to the host. In the Boost.Compute version you are also measuring the amount of time it takes to create the transform() kernel, compile it, and then execute it. If you want a more realistic comparison, you should measure the total execution time for the first example, including the time it takes to set up and compile the OpenCL program.
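A fair measurement for the hand-written version might look like the following sketch (the no-op kernel and sizes are assumptions): the clock starts before the program is built, so this version pays the same run-time compilation cost inside the timed region that transform() pays.

```cpp
// Sketch (assumed setup): include program creation and compilation
// inside the measured region, not just the enqueue.
#include <chrono>
#include <iostream>
#include <boost/compute/core.hpp>

namespace compute = boost::compute;

int main()
{
    compute::device dev = compute::system::default_device();
    compute::context ctx(dev);
    compute::command_queue queue(ctx, dev);

    const char source[] =
        "__kernel void scale(__global float *v)"
        "{ size_t i = get_global_id(0); v[i] *= 2.0f; }";

    auto t0 = std::chrono::steady_clock::now();

    // Measured: program creation and build (run-time compilation)...
    compute::program prog = compute::program::create_with_source(source, ctx);
    prog.build();

    // ...plus kernel setup and execution.
    compute::kernel k(prog, "scale");
    compute::buffer buf(ctx, 1024 * sizeof(float));
    k.set_arg(0, buf);
    queue.enqueue_1d_range_kernel(k, 0, 1024, 0);
    queue.finish();

    auto t1 = std::chrono::steady_clock::now();
    std::cout << std::chrono::duration_cast<std::chrono::microseconds>(t1 - t0).count()
              << " us total (build + execute)\n";
    return 0;
}
```
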

This initialization penalty (which is inherent in OpenCL's run-time compilation model) is somewhat mitigated in Boost.Compute by automatically caching compiled kernels during run-time (and also optionally caching them offline for reuse the next time the program is run). Calling transform() multiple times will be much faster after the first invocation.
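The caching effect can be observed with a sketch like this (setup assumed): timing two identical transform() calls, where the first pays kernel generation and compilation and the second hits Boost.Compute's in-memory program cache. Defining BOOST_COMPUTE_USE_OFFLINE_CACHE before including the headers additionally persists compiled binaries on disk for reuse across program runs.

```cpp
// Sketch (assumed setup): the second transform() call reuses the
// cached compiled kernel and should be much faster than the first.
#include <chrono>
#include <iostream>
#include <boost/compute/core.hpp>
#include <boost/compute/algorithm/transform.hpp>
#include <boost/compute/container/vector.hpp>
#include <boost/compute/functional/math.hpp>

namespace compute = boost::compute;

int main()
{
    compute::device dev = compute::system::default_device();
    compute::context ctx(dev);
    compute::command_queue queue(ctx, dev);

    compute::vector<float> v(1 << 20, ctx);

    for (int run = 0; run < 2; ++run) {
        auto t0 = std::chrono::steady_clock::now();
        compute::transform(v.begin(), v.end(), v.begin(),
                           compute::sqrt<float>(), queue);
        queue.finish();
        auto t1 = std::chrono::steady_clock::now();
        // Run 0 includes kernel generation + compilation;
        // run 1 is served from the program cache.
        std::cout << "run " << run << ": "
                  << std::chrono::duration_cast<std::chrono::microseconds>(t1 - t0).count()
                  << " us\n";
    }
    return 0;
}
```
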

P.S. You can also just use the core wrapper classes in Boost.Compute (like device and context) along with the container classes (like vector<T>) and still run your own custom kernels.
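A sketch of that mixed approach (the triple kernel is a hypothetical example): Boost.Compute's device, context, and command_queue wrappers plus a compute::vector<T> container, with the container's get_buffer() handing the underlying OpenCL buffer to a hand-written kernel.

```cpp
// Sketch (assumed kernel): core wrapper classes and compute::vector<T>
// combined with a custom, hand-written kernel.
#include <vector>
#include <boost/compute/core.hpp>
#include <boost/compute/container/vector.hpp>
#include <boost/compute/algorithm/copy.hpp>

namespace compute = boost::compute;

int main()
{
    compute::device dev = compute::system::default_device();
    compute::context ctx(dev);
    compute::command_queue queue(ctx, dev);

    std::vector<float> host(256, 2.0f);
    compute::vector<float> dvec(host.size(), ctx);
    compute::copy(host.begin(), host.end(), dvec.begin(), queue);

    // Custom kernel run against the container's underlying buffer.
    const char source[] =
        "__kernel void triple(__global float *v)"
        "{ size_t i = get_global_id(0); v[i] *= 3.0f; }";
    compute::program prog = compute::program::create_with_source(source, ctx);
    prog.build();
    compute::kernel k(prog, "triple");
    k.set_arg(0, dvec.get_buffer());
    queue.enqueue_1d_range_kernel(k, 0, dvec.size(), 0);

    // Copy results back to the host with the STL-style algorithm.
    compute::copy(dvec.begin(), dvec.end(), host.begin(), queue);
    return 0;
}
```
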