How do I allocate arrays of arrays in a structure with CUDA Fortran?

Updated: 2023-11-18 19:02:58


With CUDA, I'm trying to allocate arrays in a structure, but I'm having an issue and I don't know why. Here is a short code (stored in a file called struct.cuf) that describes my problem. I'm compiling with PGI version 16.10, using the following options: -O3 -Mcuda=cc60 -tp=x64 struct.cuf -o struct_out

module structure

type mytype
 integer :: alpha,beta,gamma
 real,dimension(:),pointer :: a
end type mytype

type mytypeDevice
 integer :: alpha,beta,gamma
 real,dimension(:),pointer,device :: a
end type mytypeDevice

end module structure

program main
 use cudafor
 use structure

 type(mytype) :: T(3)
 type(mytypeDevice),device :: T_Device(3)

 ! For the host
 do i=1,3
  allocate(T(i)%a(10))
 end do
 T(1)%a=1; T(2)%a=2; T(3)%a=3

 ! For the device
 print *, 'Everything from now is ok'
 do i=1,3
  allocate(T_Device(i)%a(10))
 end do
 !do i=1,3
 ! T_Device(i)%a=T(i)%a
 !end do

end program main

The error output:

 Everything from now is ok
Segmentation fault

What am I doing wrong here?

The only working solution I have found is to store the values in different arrays and transfer them to the GPU, but that is very heavy, especially if I use a lot of structures like mytype.

EDIT: The code has been modified to use Vladimir F's solution. If I remove the device attribute from the T_Device(3) declaration, the allocation seems fine, and so does assigning values (the commented lines below the allocation). But I need the device attribute on T_Device(3), because I'm going to use it in kernels.

Thanks!

The problem here is how you have declared T_Device. To use host-side allocation, you first populate a host-memory copy of the device structure and then copy it to device memory. This:

 type(mytypeDevice) :: T_Device(3)

 do i=1,3
  allocate(T_Device(i)%a(10))
 end do

will work correctly. This is a very standard design pattern in C++ based CUDA code, and the principle here is identical.
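To make the two steps concrete, here is a minimal sketch of the whole pattern. It is not tested against PGI 16.10, and it assumes that assigning a host array of mytypeDevice to a device array of the same type performs a shallow copy of the structures, including their device pointers. The names structure_sketch, T_h, T_d, buf and the kernel double_all are hypothetical and do not appear in the question.

```fortran
module structure_sketch
  use cudafor
  implicit none

  type mytypeDevice
    integer :: alpha, beta, gamma
    real, dimension(:), pointer, device :: a   ! member data lives in device memory
  end type mytypeDevice

contains

  ! Hypothetical kernel: one block per structure, one thread per element.
  attributes(global) subroutine double_all(T_d, nstruct, nelem)
    integer, value :: nstruct, nelem
    type(mytypeDevice) :: T_d(nstruct)
    integer :: i, j
    i = blockIdx%x
    j = threadIdx%x
    if (i <= nstruct .and. j <= nelem) T_d(i)%a(j) = 2.0 * T_d(i)%a(j)
  end subroutine double_all

end module structure_sketch

program main
  use cudafor
  use structure_sketch
  implicit none

  integer, parameter :: nstruct = 3, nelem = 10
  type(mytypeDevice)         :: T_h(nstruct)   ! host copy of the structures
  type(mytypeDevice), device :: T_d(nstruct)   ! device copy, filled by assignment
  real :: buf(nelem)
  integer :: i, istat

  ! Step 1: populate the host copy. The allocation runs on the host,
  ! but the member array itself is placed in device memory.
  do i = 1, nstruct
    allocate(T_h(i)%a(nelem))
    buf = real(i)
    T_h(i)%a = buf              ! host-to-device copy of the member data
  end do

  ! Step 2: copy the structures (and the device pointers they hold) to the device.
  ! Assumption: this assignment is a shallow copy of the structure array.
  T_d = T_h

  call double_all<<<nstruct, nelem>>>(T_d, nstruct, nelem)
  istat = cudaDeviceSynchronize()

  ! Read results back through the host copy, which points at the same device arrays.
  do i = 1, nstruct
    buf = T_h(i)%a
    print *, i, buf(1)
  end do

  do i = 1, nstruct
    deallocate(T_h(i)%a)
  end do
end program main
```

The host copy T_h stays alive for the whole run because it is the only handle through which the host can allocate, fill, read and free the member arrays; T_d is only ever passed to kernels.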
