[PyCUDA] dynamic parallelism compilation

Bruno Villasenor

2015-05-31 01:58:32 UTC

Hello pycuda-users:

I'm trying to compile the simplest possible code that uses dynamic
parallelism with the regular SourceModule. My code:

------------------------------------------------------------------------
import numpy as np
import pycuda.driver as cuda
from pycuda.compiler import SourceModule
import pycuda.autoinit

cudaCodeString = """
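__global__ void ChildKernel(void *data){
    // Operate on data
}

__global__ void ParentKernel(void *data){
    if (threadIdx.x == 0) {
        // Launch the child grid from a single thread of the block
        ChildKernel<<<1, 32>>>(data);
        // Device-side wait for the child grid (the device runtime
        // provides cudaDeviceSynchronize, not cudaThreadSynchronize)
        cudaDeviceSynchronize();
    }
    __syncthreads();
    // Operate on data
}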
"""
cudaCode = SourceModule(cudaCodeString, options=['-rdc=true', '-lcudart'],
                        arch='compute_35')

-------------------------------------------------------------------------------

I get the following error:
---------------------------------------------------------------------------------
pycuda.driver.CompileError: nvcc compilation of /tmp/tmpJJo9kU/kernel.cu failed
[command: nvcc --cubin -rdc=true -lcudart -arch compute_35 -I/usr/local/lib/python2.7/dist-packages/pycuda-2014.1-py2.7-linux-x86_64.egg/pycuda/cuda kernel.cu]
[stderr:
nvcc fatal : Option '--cubin (-cubin)' is not allowed when compiling for a virtual compute architecture

-----------------------------------------------------------------------------------
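
As far as I can tell from the message, nvcc distinguishes virtual
architectures (names like compute_35) from real ones (names like sm_35), and
--cubin is only accepted for the latter. Below is a minimal sketch of that
distinction in plain Python, with no PyCUDA or GPU needed. The helper names
(is_virtual_arch, cubin_allowed, to_real_arch) are hypothetical, made up for
illustration, and are not part of PyCUDA or nvcc. The sketch only classifies
the arch string; it does not check whether switching the arch is enough to
get dynamic parallelism compiling.

```python
import re


def is_virtual_arch(arch):
    """True for virtual architecture names such as 'compute_35'."""
    return re.match(r"^compute_\d+[a-z]?$", arch) is not None


def is_real_arch(arch):
    """True for real (GPU binary) architecture names such as 'sm_35'."""
    return re.match(r"^sm_\d+[a-z]?$", arch) is not None


def cubin_allowed(arch):
    """Mirror the rule in the error above: --cubin needs a real architecture."""
    return is_real_arch(arch)


def to_real_arch(arch):
    """Map 'compute_XX' to 'sm_XX'; leave real architecture names untouched.

    Assumption: the real architecture with the same number is the one wanted.
    """
    if is_virtual_arch(arch):
        return "sm_" + arch[len("compute_"):]
    if is_real_arch(arch):
        return arch
    raise ValueError("unrecognized architecture name: %r" % (arch,))


if __name__ == "__main__":
    for name in ("compute_35", "sm_35"):
        print(name, "-> --cubin allowed:", cubin_allowed(name))
```

With this, to_real_arch('compute_35') gives 'sm_35', which is the kind of
value the arch argument above would need for --cubin to be accepted.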

CUDA version:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2013 NVIDIA Corporation
Built on Wed_Jul_17_18:36:13_PDT_2013
Cuda compilation tools, release 5.5, V5.5.0

Driver version: 331.38

--------------------------------------------------------------------------------------

Any ideas?
Is anyone successfully using dynamic parallelism with pycuda?

Thanks in advance.
Bruno