archana sapkota
2017-04-25 19:49:38 UTC
Hello,
I just started working with PyCUDA, and CUDA as a whole is new to me. I was
trying to use the GPU to compute dot products of a large number of vectors,
because doing it on multiple CPU cores was taking several days. But on my
first try I did not see the speed boost I was hoping for. Below is the code
I am currently running, just to see how much speedup I can expect. My
vectors of interest have a dimension of around 3000, and eventually I will
be computing the dot product (or L2 norm) of many such vectors.
I would highly appreciate it if someone could suggest what I am missing and
how I could achieve my goal.
I also see some difference between the numpy results and the GPU results.
That is not a big concern right now, but I am curious why it happens.
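For example, a quick check (just a sketch on my side, separate from the
script below) suggests the gap is about what float32 rounding alone would
give:

import numpy

a = numpy.random.randn(3000)
b = numpy.random.randn(3000)
dot64 = numpy.dot(a, b)                    # float64 reference, like numpy_dot
dot32 = numpy.dot(a.astype(numpy.float32),
                  b.astype(numpy.float32)) # float32, like the GPU kernel
print(dot64, dot32, abs(dot64 - dot32))    # they differ around the 7th digit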
Here is the sample code I am working with:
import pycuda.gpuarray as gpuarray
import pycuda.reduction as reduction
import pycuda.driver as cuda
import pycuda.autoinit
from pycuda.compiler import SourceModule
import numpy
import time

# Dot product: multiply elementwise, then sum.
krnl = reduction.ReductionKernel(numpy.float32, neutral="0",
        reduce_expr="a+b", map_expr="x[i]*y[i]",
        arguments="float *x, float *y")

# Sum of squared differences (squared L2 distance).
ssd = reduction.ReductionKernel(numpy.float32, neutral="0",
        reduce_expr="a+b", map_expr="(x[i] - y[i])*(x[i] - y[i])",
        arguments="float *x, float *y")

for i in range(10):
    a = numpy.random.randn(3000)
    b = numpy.random.randn(3000)
    a_gpu = gpuarray.to_gpu(a.astype(numpy.float32))
    b_gpu = gpuarray.to_gpu(b.astype(numpy.float32))

    start = time.time()
    numpy_dot = numpy.dot(a, b)
    end = time.time()
    dt = end - start
    numpy_ssd = numpy.sum((a - b) ** 2)   # not timed, printed for comparison
    print("CPU time", dt)
    print("numpy_dot", numpy_dot)
    print("numpy_ssd", numpy_ssd)

    start = time.time()
    my_dot_prod = krnl(a_gpu, b_gpu).get()
    end = time.time()
    dt = end - start
    my_ssd = ssd(a_gpu, b_gpu).get()      # not timed, printed for comparison
    print("GPU time", dt)
    print("my dot product", my_dot_prod)
    print("my ssd", my_ssd)
    print("\n")
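A single GPU call like the one above probably mostly measures kernel-launch
overhead plus the .get() copy back to the host, so I also put together a
rough averaged version (just a sketch, reusing krnl, a_gpu, and b_gpu from
the snippet above; the result stays on the device until the end):

import time

reps = 1000
start = time.time()
for _ in range(reps):
    res_gpu = krnl(a_gpu, b_gpu)   # result stays on the GPU, no host copy
res = res_gpu.get()                # one synchronizing copy at the very end
print("GPU time per call", (time.time() - start) / reps)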
Example timings from the loop above are:
CPU time 5.9604644775390625e-06
numpy_dot -19.7736554062
numpy_ssd 5975.41368065
GPU time 0.0009388923645019531
my dot product -19.77365493774414
my ssd 5975.4140625
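Since my real workload is a large number of ~3000-dimensional vectors, I am
also wondering whether I should batch them instead of looping one pair at a
time. Something like the sketch below is what I have in mind (names and the
batch size are just illustrative; it assumes the vectors can be stacked into
(n, 3000) float32 arrays):

import numpy

n = 10000                                  # illustrative batch size
A = numpy.random.randn(n, 3000).astype(numpy.float32)
B = numpy.random.randn(n, 3000).astype(numpy.float32)
dots = numpy.einsum('ij,ij->i', A, B)      # all row-wise dot products at once
ssds = ((A - B) ** 2).sum(axis=1)          # all squared L2 distances at once

One call per batch would amortize the per-call overhead that seems to
dominate my current numbers.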
Thanks,
Arch