Zhangsheng Lai
2018-04-19 01:34:08 UTC
I'm encountering this error when I run my code in the same Docker environment but on different workstations.
```
Traceback (most recent call last):
  File "simple_peer.py", line 76, in <module>
    tslr_gpu, lr_gpu = mp.initialise()
  File "/root/distributed-mpp/naive/mccullochpitts.py", line 102, in initialise
    """, arch='sm_60')
  File "/root/anaconda3/lib/python3.6/site-packages/pycuda/compiler.py", line 294, in __init__
    self.module = module_from_buffer(cubin)
pycuda._driver.LogicError: cuModuleLoadDataEx failed: device kernel image is invalid -
```
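For reference, here is the small check I plan to run on both workstations to compare the device's compute capability and driver version against the hard-coded `arch='sm_60'` in my code below (just a sketch, assuming `pycuda.autoinit` can create a context on each machine):
```
import pycuda.autoinit          # creates a context on the default device
import pycuda.driver as cuda

dev = pycuda.autoinit.device
major, minor = dev.compute_capability()
print("Device:", dev.name())
print("Compute capability: sm_%d%d" % (major, minor))
print("CUDA driver version:", cuda.get_driver_version())
```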
A quick search only turned up https://github.com/inducer/pycuda/issues/45 , but it doesn't seem relevant to my problem, since the same code runs fine on my original workstation. Can anyone see what the issue is?
Below is the code I'm trying to run:
```
def initialise(self):
    """
    Documentation here
    """
    mod = SourceModule("""
    #include <math.h>
    __global__ void initial(float *tslr_out, float *lr_out, float *W_gpu,
                            float *b_gpu, int *x_gpu, int d, float temp)
    {
        int tx = threadIdx.x;
        // Wx stores the W_ji x_i product value
        float Wx = 0;
        // Matrix multiplication of W and x
        for (int k = 0; k < d; ++k)
        {
            float W_element = W_gpu[tx * d + k];
            float x_element = x_gpu[k];
            Wx += W_element * x_element;
        }
        // Computing the linear response, signed linear response with temp
        lr_out[tx] = Wx + b_gpu[tx];
        tslr_out[tx] = (0.5 / temp) * (1 - 2 * x_gpu[tx]) * (Wx + b_gpu[tx]);
    }
    """, arch='sm_60')
    func = mod.get_function("initial")
    # format for prepare() defined at https://docs.python.org/2/library/struct.html
    func.prepare("PPPPPif")
    dsize_nparray = np.zeros((self.d,), dtype=np.float32)
    lr_gpu = cuda.mem_alloc(dsize_nparray.nbytes)
    slr_gpu = cuda.mem_alloc(dsize_nparray.nbytes)
    tslr_gpu = cuda.mem_alloc(dsize_nparray.nbytes)
    grid = (1, 1)
    block = (self.d, 1, 1)
    # block = (self.d, self.d, 1)
    func.prepared_call(grid, block, tslr_gpu, lr_gpu, self.W_gpu,
                       self.b_gpu, self.x_gpu, self.d, self.temp)
    return tslr_gpu, lr_gpu
```
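One thing I'm wondering about is whether the hard-coded `arch='sm_60'` is the culprit. A minimal sketch of what I'd try, letting PyCUDA pick the architecture from the current context's device (here `kernel_source` stands for the same CUDA string as above):
```
# Sketch: omit arch= so pycuda.compiler targets the device the current
# context lives on, instead of always building a cubin for sm_60.
mod = SourceModule(kernel_source)
func = mod.get_function("initial")
```
I haven't confirmed whether this changes anything, so any pointers are appreciated.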