Applying a function to a vector in CUDA

Updated: 2023-11-10 18:06:52

Two possible approaches are:

  1. Write your own CUDA kernel to perform the operation
  2. Use thrust (e.g. thrust::for_each()).

Here is a worked example of both approaches:

$ cat t934.cu
#include <iostream>
#include <iterator>   // for std::ostream_iterator
#include <thrust/host_vector.h>
#include <thrust/device_vector.h>
#include <thrust/copy.h>
#include <thrust/for_each.h>

#define DSIZE 4

#define nTPB 256

template <typename T>
__host__ __device__ T myfunc(T &d){

  return d + 5;  // define your own function here
}

struct mytfunc
{
template <typename T>
__host__ __device__
 void operator()(T &d){

  d = myfunc(d);
  }
};

template <typename T>
__global__ void mykernel(T *dvec, size_t dsize){

  int idx = threadIdx.x+blockDim.x*blockIdx.x;
  if (idx < dsize) dvec[idx] = myfunc(dvec[idx]);
}

int main(){

  // first using kernel
  float *h_data, *d_data;
  h_data = new float[DSIZE];
  cudaMalloc(&d_data, DSIZE*sizeof(float));
  for (int i = 0; i < DSIZE; i++) h_data[i] = i;
  cudaMemcpy(d_data, h_data, DSIZE*sizeof(float), cudaMemcpyHostToDevice);
  mykernel<<<(DSIZE+nTPB-1)/nTPB,nTPB>>>(d_data, DSIZE);
  cudaMemcpy(h_data, d_data, DSIZE*sizeof(float), cudaMemcpyDeviceToHost);
  for (int i = 0; i < DSIZE; i++) std::cout << h_data[i] << ",";
  std::cout << std::endl;

  // then using thrust
  thrust::host_vector<float>   hvec(h_data, h_data+DSIZE);
  thrust::device_vector<float> dvec = hvec;
  thrust::for_each(dvec.begin(), dvec.end(), mytfunc());
  thrust::copy_n(dvec.begin(), DSIZE, std::ostream_iterator<float>(std::cout, ","));
  std::cout << std::endl;
}

$ nvcc -o t934 t934.cu
$ ./t934
5,6,7,8,
10,11,12,13,
$

Note that in order to provide a complete example, I'm starting with a vector definition in host memory. If you already have the vector in device memory (perhaps as a result of computing y=Ax), then you can work directly on that, either by passing that vector to the CUDA kernel, or by using it directly in the thrust function via a thrust::device_ptr wrapper (this method is covered in the thrust quick start guide previously linked).
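As a sketch of that thrust::device_ptr approach (assuming d_data already holds n floats in device memory, and reusing the mytfunc functor from the example above):

```cuda
#include <thrust/device_ptr.h>
#include <thrust/for_each.h>

// Wrap a raw device pointer so thrust algorithms can operate on it
// directly, without copying through a host_vector first.
void apply_on_device(float *d_data, size_t n){
  thrust::device_ptr<float> d_begin = thrust::device_pointer_cast(d_data);
  thrust::for_each(d_begin, d_begin + n, mytfunc());
}
```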

The assumption I've made here is that you want to use an arbitrary function of one variable. This should handle pretty much any function defined in myfunc. However, for some categories of functions that you may be interested in, you may be able to realize them with one or more CUBLAS calls as well.
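For instance, if the function happened to be a simple in-place scaling d = 5*d (rather than the d + 5 used above), it could be realized with a single CUBLAS call. A minimal sketch, with error checking omitted:

```cuda
#include <cublas_v2.h>

// Scale a device vector in place: d_data[i] *= 5.0f
// (only works for this special case f(x) = 5x; error checking omitted)
void scale_on_device(float *d_data, int n){
  cublasHandle_t handle;
  cublasCreate(&handle);
  const float alpha = 5.0f;
  cublasSscal(handle, n, &alpha, d_data, 1);  // x = alpha * x
  cublasDestroy(handle);
}
```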