Discussion:
[PyCUDA] pycuda and intersection of list (or sets) of strings
Luigi Assom
2014-12-20 09:33:57 UTC
Hello!

I need to parallelize the computation of intersections of sets of keywords
on the GPU.

As an example, I will take a cosine similarity computed from the
intersection of two sets (see also this post:
http://stackoverflow.com/questions/22381939/python-calculate-cosine-similarity-of-two-dicts-faster
).
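
Concretely, for two made-up toy sets the proxy I mean is:

import math

a = set(['alpha', 'beta', 'gamma'])   # toy keyword sets
b = set(['gamma', 'delta', 'omega'])

# cosine-style similarity of two sets: |a & b| / sqrt(|a| * |b|)
proxy = len(a & b) / math.sqrt(len(a) * len(b))   # 1 / 3 ~ 0.33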

I want to compute this similarity for each key-value pair of large
dictionaries.

The value of each key is a set of thousands of elements, and the elements
can be strings.

Using multiprocessing I was able to get a 4x improvement, but I would like
to try the GPU to really speed up the computation.

In the SourceModule I don't know how to declare my parameters, because
they are not floats, and I haven't found a tutorial that uses data
structures other than numerical numpy arrays. That's why I converted my
lists of keywords with np.asarray() and tried the following:



# convert list of strings into numpy array
key1 = 'key1'
array1 = np.asarray(D[key1])

# convert list of strings into numpy array
array2 = np.asarray(D[key2])

# assign memory to cuda
array1_cuda = cuda.mem_alloc(sys.getsizeof(array1))
array2_cuda = cuda.mem_alloc(sys.getsizeof(array2))

# and tried
mod = SourceModule("""
__global__ void cosine(*a, *b)
{
    int idx = threadIdx.x + threadIdx.y*4;
    proxy = len(set(a[idx])&set(b[idx]))/math.sqrt(len(set(a[idx]))*len(set(b[idx])))
}
""")

a_gpu = gpuarray.to_gpu(array1)
b_gpu = gpuarray.to_gpu(array2)

proxy = len(set(a_gpu)&set(b_gpu))/math.sqrt(len(set(a_gpu))*len(set(b_gpu)))




but I get

TypeError: GPUArrays are not hashable.


Is it a problem with the data structure, or am I making a conceptual mistake?


With multiprocessing (without PyCUDA) my code is:

## Measuring performance: 4x!
with Timer() as t:
    key = 'key1'
    setParent = D[key]
    ngbrProxy = set([])
    p = Pool()
    for ngbr in p.imap_unordered(cosine, setParent):
        ngbrProxy.add(ngbr)

print "=> elapsed lpush: %s s" % t.secs

I wonder how I could exploit the GPU for this type of computation: I am
not working with numerical matrices. In the PyCUDA documentation I read
that it is possible to pass any type of data structure, even str, but I
couldn't find an example.

Could you please help me work this out?
Andreas Kloeckner
2014-12-21 20:44:19 UTC
Luigi,

here are a few problems with your approach:

- The contents of your SourceModule are not valid C (as in, C the
  programming language)

- 'set' is a Python data structure. PyCUDA will not magically swap out
the code of 'set' and execute its operations on the GPU.

- Working with arrays of variable-size objects (such as strings) on the
GPU is somewhat tricky. You'll have to come up with a good data
structure. In particular, just copying over a Python data structure
will not help--if it succeeds, the pointers in the structure will
point to CPU memory and be entirely useless on the GPU.
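
For example, just a rough sketch with made-up data: map every keyword to
an integer ID on the host, and store each set as a sorted int32 array,
which has fixed-size elements and can be copied to the GPU as-is.

import numpy as np

# made-up example data
D = {'key1': ['alpha', 'beta', 'gamma'],
     'key2': ['gamma', 'delta', 'omega']}

# assign every distinct keyword an integer ID on the host
vocab = {w: i for i, w in enumerate(sorted(set().union(*D.values())))}

# each set becomes a sorted int32 array: fixed-size elements, no pointers
encoded = {k: np.sort(np.array([vocab[w] for w in v], dtype=np.int32))
           for k, v in D.items()}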

Andreas
Luigi Assom
2014-12-21 21:28:38 UTC
Hello Andreas,

Thank you for your feedback.

What prerequisites must a data structure have to be suitable for the GPU?
Should I allocate the exact amount of memory for each array?

Is it OK to use numpy data structures and execute the operations on the
GPU, instead of the Python 'set' data structure?
E.g. could np.intersect1d(['a','beta','gamma'], ['gamma','delta','omega'])
be parallelized?
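
On the CPU the full proxy with numpy would be something like this (toy
data); what I don't know is whether this kind of operation can be moved
onto the GPU:

import numpy as np

a = np.asarray(['a', 'beta', 'gamma'])
b = np.asarray(['gamma', 'delta', 'omega'])

# CPU-only: numpy's set intersection on arrays of strings
proxy = len(np.intersect1d(a, b)) / np.sqrt(float(len(a)) * len(b))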

As an approach, is it better to try to parallelize the intersection
between two keys of the dictionary, or rather to import the whole
dictionary (or a partition of it) into the GPU?
Andreas Kloeckner
2014-12-21 22:07:45 UTC
I hate to say it, but let me just state two facts: (1) There's no canned
functionality for what you'd like to accomplish (intersection of string
lists, IIUC). You'd have to piece this together out of parallel
primitives (scan/sort) or just handwritten (in C) kernels. String
sorting on the GPU is something that people publish legitimate research
papers on. (2) Learning to do these things (unfortunately) takes a
considerable time investment--think months. So please understand that
the friendly folks on the mailing list won't be able to give you a quick
tutorial, or even provide you with an 'easy answer'.
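
Just to give a flavor of what the handwritten-kernel route could look
like, here is a rough, untested sketch (names and data made up) that
assumes each keyword set has already been encoded as a sorted int32 ID
array on the host:

import numpy as np
import pycuda.autoinit
import pycuda.gpuarray as gpuarray
from pycuda.compiler import SourceModule

mod = SourceModule("""
__global__ void count_common(const int *a, int n_a,
                             const int *b, int n_b,
                             int *count)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n_a) return;

    /* binary search for a[i] in the sorted array b */
    int key = a[i], lo = 0, hi = n_b - 1;
    while (lo <= hi) {
        int mid = (lo + hi) / 2;
        if (b[mid] == key) { atomicAdd(count, 1); return; }
        if (b[mid] < key) lo = mid + 1; else hi = mid - 1;
    }
}
""")
count_common = mod.get_function("count_common")

# made-up encoded sets (sorted integer IDs)
a = np.array([1, 3, 5, 7], dtype=np.int32)
b = np.array([3, 4, 5, 9], dtype=np.int32)
a_gpu, b_gpu = gpuarray.to_gpu(a), gpuarray.to_gpu(b)
count = gpuarray.zeros(1, dtype=np.int32)

count_common(a_gpu.gpudata, np.int32(len(a)), b_gpu.gpudata, np.int32(len(b)),
             count.gpudata, block=(128, 1, 1),
             grid=((len(a) + 127) // 128, 1))

# |a & b| / sqrt(|a| * |b|): here 2 / 4 = 0.5
proxy = count.get()[0] / np.sqrt(float(len(a)) * len(b))

Even that only becomes possible after the string-to-integer mapping has
been done on the host; the string handling itself is the hard part.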

Sorry,
Andreas
Luigi Assom
2014-12-21 23:57:05 UTC
Uhh, that's unfortunate.

I don't have expertise in this area; I was researching on the web and
eventually found:
http://atbrox.com/2010/08/20/word-count-with-mapreduce-on-a-gpu-a-python-example/

which sounded promising to me, but as I said I don't have the expertise to
judge it properly.

Thank you anyway, to you and the community.
However, if someone is researching this topic (intersection of lists of
keywords), could you please keep the community informed?

Thank you so much!