Fil Peters
2015-02-17 16:28:31 UTC
Hello,
I am new to PyCUDA and have just started testing it. I was wondering whether it is possible to use the gpuarray functions inside a SourceModule.
For example, I was trying to convert the following code into a PyCUDA SourceModule:
numpy code:
fac1 = np.float32(0.5)
fac2 = np.float32(1.001)
for i in range(niter):
    c = a
    d = b
    a = (c + d)*fac1
    b = c*fac2
    f += a*b + a*a
    k = np.dot(a, b)
Up to the line "f += a*b + a*a", the conversion works well:
mod = SourceModule("""
__global__ void vecmul(float *dest, float *in1, float *in2, float *in3,
                       float *in4, int niter)
{
    const int i = blockDim.x*blockIdx.x + threadIdx.x;
    for (int n = 0; n < niter; n++) {
        in3[i] = in1[i];
        in4[i] = in2[i];
        in1[i] = (in3[i] + in4[i])*0.5;
        in2[i] = in3[i]*1.001;
        dest[i] += in1[i]*in2[i] + in1[i]*in1[i];
    }
}
""")
(Of course I realize that the dot product is not very useful in this loop, but in my final program I will need to reuse this value inside the loop.)
So for this specific case, the question is how to incorporate the function gpuarray.dot() into the kernel code or, if that is not possible, how to include a reduction kernel in the SourceModule.
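For context on what such a reduction kernel would have to do: gpuarray.dot() is a host-side Python call, so it cannot be invoked from device code inside a SourceModule. A hand-written replacement typically multiplies element-wise, sums within each thread block, and then combines the per-block sums. The sketch below only emulates that two-stage pattern on the host with NumPy so the logic can be checked against np.dot. The function name blockwise_dot and the block_size parameter are illustrative assumptions, not part of PyCUDA, and block_size is assumed to be a power of two.

```python
import numpy as np


def blockwise_dot(a, b, block_size=256):
    """Host-side emulation of a two-stage GPU reduction for dot(a, b).

    Hypothetical helper for illustration only; block_size is assumed
    to be a power of two, as in a typical shared-memory reduction.
    """
    n = a.size
    n_blocks = (n + block_size - 1) // block_size

    # Map step: element-wise products, padded with the neutral element 0
    # so that every emulated block is completely filled.
    padded = np.zeros(n_blocks * block_size, dtype=np.float32)
    padded[:n] = a * b
    partial = padded.reshape(n_blocks, block_size)

    # Stage 1: tree reduction inside each block. Each pass halves the
    # number of active "threads", as a shared-memory loop would.
    stride = block_size // 2
    while stride > 0:
        partial[:, :stride] += partial[:, stride:2 * stride]
        stride //= 2

    # Stage 2: combine the per-block partial sums into one scalar.
    return np.float32(partial[:, 0].sum())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a = rng.random(1000, dtype=np.float32)
    b = rng.random(1000, dtype=np.float32)
    print(blockwise_dot(a, b), np.dot(a, b))
```

In a real kernel, stage 1 would run in shared memory with a __syncthreads() between passes. Stage 2 needs every block to have finished, which a single kernel launch does not guarantee across blocks, so reusing the dot product inside the iteration loop would likely mean driving the loop from the host, one launch per step.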
Many thanks in advance,
Fil
_______________________________________________
PyCUDA mailing list