Frank Ihle
2016-05-11 17:26:28 UTC
Hello CUDA,
I am trying to speed up my Python program with a not-so-trivial algorithm, so
I need to know:
What is the correct way of transferring a list of lists of floats to a
(Py)CUDA kernel?
*An example*
Given the following example list
|listToProc = [[-1, -2, -3, -4, -5], [1, 2, 3, 4, 5, 6, 7, 8.1, 9]]|
it shall be transferred to a PyCUDA kernel for further processing. I
would then proceed with the usual functions for transferring a list of values
(not a list of lists), like this:
|listToProcAr = np.array(listToProc, dtype=np.object)
listToProcAr_gpu = cuda.mem_alloc(listToProcAr.nbytes)
cuda.memcpy_htod(listToProcAr_gpu, listToProcAr)|
*However, this results in two problems:*
1) |listToProcAr.nbytes = 2| - i.e. too little memory is reserved. I
believe this can be solved by
|listBytes = 0
for currentList in listToProc:
    listBytes += np.array(currentList, dtype=np.float32).nbytes|
and replacing the size argument here:
|listToProcAr_gpu = cuda.mem_alloc(listBytes)|
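For reference, here is the byte-counting loop from step 1 as a self-contained, NumPy-only sketch, run on the example data from the post (no GPU needed to check the arithmetic):

```python
import numpy as np

# Example data from the post: two inner lists of lengths 5 and 9
listToProc = [[-1, -2, -3, -4, -5], [1, 2, 3, 4, 5, 6, 7, 8.1, 9]]

# Sum the byte sizes of each inner list when stored as float32
listBytes = 0
for currentList in listToProc:
    listBytes += np.array(currentList, dtype=np.float32).nbytes

print(listBytes)  # (5 + 9) floats * 4 bytes = 56
```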
2) and the *actual problem*:
|cuda.memcpy_htod(listToProcAr_gpu, listToProcAr)| still seems to create
a wrong pointer in the kernel, because trying to access the last
element of the second list (listToProc[1][8]) raises a
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
So I'm a little clueless at the moment.
------------------------------------------------------------------------
*The PyCUDA code*
|__global__ void procTheListKernel(float **listOfLists)
{
    listOfLists[0][0] = 0;
    listOfLists[1][8] = 0;
    __syncthreads();
}|
Can anyone help me out?
Kind Regards
Frank
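[Editor's note: one common host-side workaround for ragged lists (not taken from this thread; variable names are illustrative) is to flatten everything into one contiguous float32 buffer plus an offsets array, rather than copying a pointer-to-pointer structure. A NumPy-only sketch of the host-side preparation:]

```python
import numpy as np

listToProc = [[-1, -2, -3, -4, -5], [1, 2, 3, 4, 5, 6, 7, 8.1, 9]]

# Concatenate all inner lists into one contiguous float32 buffer
flat = np.concatenate([np.asarray(l, dtype=np.float32) for l in listToProc])

# offsets[i] marks where inner list i starts inside `flat`;
# offsets[i + 1] marks where it ends (one extra sentinel entry)
lengths = np.array([len(l) for l in listToProc], dtype=np.int32)
offsets = np.concatenate(([0], np.cumsum(lengths))).astype(np.int32)

# Element [1][8] of the original structure is flat[offsets[1] + 8]
print(flat[offsets[1] + 8])
```

The device side would then receive `flat` and `offsets` as two separate kernel arguments and index with `flat[offsets[i] + j]` instead of dereferencing host pointers, which avoids the invalid `float **` dereference entirely.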