archana sapkota

2017-04-25 19:49:38 UTC

Hello,

I just started working with PyCUDA; CUDA as a whole is new to me. I am
trying to use the GPU to compute dot products of a large number of
vectors, because the computation was taking several days on multiple CPU
cores.

On my first try, though, I did not see any speedup. Further down is a
piece of code that I am currently running, just to see how much speedup I
can expect. My vectors of interest have a dimension of around 3000, so
eventually I will be computing the dot product (or L2 norm) of such
vectors.
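To make the goal concrete, here is a minimal CPU-only sketch of the two quantities I am after, computed for a whole batch of vector pairs at once. The names X, Y and n_pairs, the batch size of 100, and the fixed seed are made up for illustration; my real data is not laid out exactly like this.

```python
import numpy

rng = numpy.random.default_rng(0)  # fixed seed, only for repeatability
n_pairs, dim = 100, 3000           # batch size is a made-up illustration value

# Each row is one vector; row k of X is paired with row k of Y.
X = rng.standard_normal((n_pairs, dim)).astype(numpy.float32)
Y = rng.standard_normal((n_pairs, dim)).astype(numpy.float32)

# Row-wise dot products: dots[k] = sum_i X[k, i] * Y[k, i]
dots = numpy.einsum("ij,ij->i", X, Y)

# Row-wise sum of squared differences (the squared L2 distance)
diff = X - Y
ssds = numpy.einsum("ij,ij->i", diff, diff)

print(dots.shape, ssds.shape)
```

Each entry of dots and ssds corresponds to one call of the per-pair computation in my test loop further down.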

I would highly appreciate it if someone could suggest what I am missing
and how I could achieve my goal.

I also see small differences between the numpy and GPU results. That is
not a big concern right now, but I am curious why.

Here is the sample code I am working with:

import pycuda.gpuarray as gpuarray
import pycuda.reduction as reduction
import pycuda.driver as cuda
import pycuda.autoinit  # creates the CUDA context on import
from pycuda.compiler import SourceModule
import numpy
import time

# Dot product: multiply element-wise, then sum
krnl = reduction.ReductionKernel(numpy.float32, neutral="0",
        reduce_expr="a+b", map_expr="x[i]*y[i]",
        arguments="float *x, float *y")

# Sum of squared differences (squared L2 distance)
ssd = reduction.ReductionKernel(numpy.float32, neutral="0",
        reduce_expr="a+b", map_expr="(x[i] - y[i])*(x[i] - y[i])",
        arguments="float *x, float *y")

for i in range(10):
    a = numpy.random.randn(3000)
    b = numpy.random.randn(3000)
    # Copy to the GPU as float32 (not included in the timing below)
    a_gpu = gpuarray.to_gpu(a.astype(numpy.float32))
    b_gpu = gpuarray.to_gpu(b.astype(numpy.float32))

    start = time.time()
    numpy_dot = numpy.dot(a, b)
    end = time.time()
    dt = end - start
    numpy_ssd = numpy.sum((a - b) ** 2)  # computed outside the timed region
    print("CPU time", dt)
    print("numpy_dot", numpy_dot)
    print("numpy_ssd", numpy_ssd)

    start = time.time()
    my_dot_prod = krnl(a_gpu, b_gpu).get()
    end = time.time()
    dt = end - start
    my_ssd = ssd(a_gpu, b_gpu).get()  # computed outside the timed region
    print("GPU time", dt)
    print("my dot product", my_dot_prod)
    print("my ssd", my_ssd)
    print("\n")

Example timings are:

CPU time 5.9604644775390625e-06
numpy_dot -19.7736554062
numpy_ssd 5975.41368065
GPU time 0.0009388923645019531
my dot product -19.77365493774414
my ssd 5975.4140625
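On the differing results: the numpy and GPU values above agree to about seven significant digits, which is roughly what float32 carries. My guess (an assumption, not something I have verified on the GPU) is that the gap comes from numpy.dot running on the float64 arrays while the GPU kernels work in float32. A minimal CPU-only sketch of that effect, with a fixed seed chosen only for repeatability:

```python
import numpy

rng = numpy.random.default_rng(0)  # fixed seed, only for repeatability
a = rng.standard_normal(3000)
b = rng.standard_normal(3000)

# Reference result in double precision, as numpy.dot(a, b) gives it.
dot64 = numpy.dot(a, b)

# Same data rounded to single precision, as it is sent to the GPU.
a32 = a.astype(numpy.float32)
b32 = b.astype(numpy.float32)
dot32 = numpy.dot(a32, b32)

print("float64 dot", dot64)
print("float32 dot", dot32)
print("abs difference", abs(dot64 - float(dot32)))
```

If the difference printed here is of the same order as the one between my numpy and GPU numbers, precision alone would explain it; the order in which the reduction sums the terms may contribute as well.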

Thanks,

Arch
