且构网

Polymorphism and derived classes in CUDA / CUDA Thrust

Updated: 2022-05-20 23:53:08

I am not going to attempt to answer everything in this question; it is just too large. Having said that, here are some observations about the code you posted which might help:

  • The GPU-side new operator allocates memory from a private runtime heap. As of CUDA 6, that memory cannot be accessed by the host-side CUDA APIs. You can access it from within kernels and device functions, but not from the host. So using new inside a Thrust device functor is a broken design that can never work. That is why your "vector of pointers" model fails.
  • Thrust is fundamentally intended to allow data-parallel versions of typical STL algorithms to be applied to POD types. Building a codebase using complex polymorphic objects and trying to cram those through Thrust containers and algorithms might be made to work, but it isn't what Thrust was designed for, and I wouldn't recommend it. If you do, don't be surprised if you break Thrust in unexpected ways.
  • CUDA supports a lot of C++ features, but the compilation and object models are much simpler than even the C++98 standard upon which they are based. CUDA lacks several key features (RTTI, for example) which make complex polymorphic object designs workable in C++. My suggestion is to use C++ features sparingly. Just because you can do something in CUDA doesn't mean you should. The GPU is a simple architecture, and simple data structures and code are almost always more performant than functionally similar complex objects.
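
To make the "simple data structures" advice concrete, here is a minimal sketch of one possible alternative to a vector of base-class pointers: a flat, trivially copyable record with a type tag, processed by a single functor. Everything in it (Shape, ShapeKind, AreaOp, compute_areas, the HOST_DEVICE macro) is hypothetical and not taken from the code in the question. It is written so that it compiles with a plain C++ compiler and uses std::vector and std::transform as host-side stand-ins; the assumption is that under nvcc the same functor could be passed to thrust::transform over a thrust::device_vector<Shape>.

```cpp
// Sketch only: all names below are illustrative assumptions, not the
// original poster's code and not part of any library.
#include <algorithm>
#include <cassert>
#include <type_traits>
#include <vector>

// Expands to CUDA qualifiers under nvcc and to nothing in plain C++.
#ifdef __CUDACC__
#define HOST_DEVICE __host__ __device__
#else
#define HOST_DEVICE
#endif

// The tag replaces the dynamic type of a polymorphic object.
enum ShapeKind { CIRCLE, RECTANGLE };

// Flat record: no pointers, no virtual functions, no heap allocation.
struct Shape {
    ShapeKind kind;
    float a;  // radius for CIRCLE, width for RECTANGLE
    float b;  // unused for CIRCLE, height for RECTANGLE
};

static_assert(std::is_trivially_copyable<Shape>::value,
              "Shape should be copyable byte-for-byte between host and device");

// A switch on the tag replaces virtual dispatch.
struct AreaOp {
    HOST_DEVICE float operator()(const Shape& s) const {
        switch (s.kind) {
            case CIRCLE:    return 3.14159265f * s.a * s.a;
            case RECTANGLE: return s.a * s.b;
        }
        return 0.0f;  // unknown tag
    }
};

// Host-only stand-in for a thrust::transform call over device vectors.
inline std::vector<float> compute_areas(const std::vector<Shape>& shapes) {
    std::vector<float> areas(shapes.size());
    std::transform(shapes.begin(), shapes.end(), areas.begin(), AreaOp());
    assert(areas.size() == shapes.size());
    return areas;
}
```

Calling compute_areas on a vector holding a 2 by 3 rectangle and a unit circle yields one area per element, in input order. The design choice is that no pointer ever crosses the host/device boundary and nothing is allocated with device-side new, so the container holds only plain values of the kind Thrust is intended for.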

Having skim-read the code you posted, my overall recommendation is to go back to the drawing board. If you want to look at some very elegant CUDA/C++ designs, spend some time reading the code bases of CUB and CUSP. They are very different from each other, but there is a lot to learn from both (and CUSP is built on top of Thrust, which I suspect makes it even more relevant to your use case).