Discussion: [PyCUDA] ReductionKernel.__call__ needs an allocator kwarg
Simon Perkins
2014-10-10 12:45:43 UTC
Hi there

Would it be possible to add an allocator keyword argument to
ReductionKernel.__call__ and gpuarray.sum etc.?

At the moment we have:

krnl = ReductionKernel(...)
result = krnl(a, stream=stream)

Now __call__() uses a.allocator to make device allocations, but unless a
has been allocated from a DeviceMemoryPool, a fresh device allocation and
deallocation occur for the returned value. Additionally, this serialises
asynchronous stream calls, since plain cuMemAlloc/cuMemFree synchronise
with the device. One possible work-around is:

pool = pycuda.tools.DeviceMemoryPool()
tmp_alloc = a.allocator
a.allocator = pool.allocate
result = krnl(a, stream)
a.allocator = tmp_alloc
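
With the proposed kwarg this would collapse to something like the following
(a sketch of the suggested interface; the allocator argument does not exist
yet, and stream is assumed to be passed by keyword):

pool = pycuda.tools.DeviceMemoryPool()
# hand the pool's allocator to the reduction directly,
# leaving a.allocator untouched
result = krnl(a, stream=stream, allocator=pool.allocate)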

thanks!
Simon
Andreas Kloeckner
2014-10-10 15:41:44 UTC
Post by Simon Perkins
Would it be possible to add an allocator keyword argument to
ReductionKernel.__call__ and gpuarray.sum etc.?
I'd be happy to take a patch.

Andreas
Simon Perkins
2014-10-15 15:33:29 UTC
Hi Andreas

Here's the patch!

best
Simon

On Fri, Oct 10, 2014 at 5:41 PM, Andreas Kloeckner <lists at informa.tiker.net> wrote:
Post by Andreas Kloeckner
I'd be happy to take a patch.
[Attachment: reduction_allocator.patch, text/x-patch, 4952 bytes]
URL: <http://lists.tiker.net/pipermail/pycuda/attachments/20141015/dec53176/attachment.bin>
Andreas Kloeckner
2014-10-16 05:15:34 UTC
Hi Simon,
Post by Simon Perkins
Here's the patch!
The patch looks good. One minor complaint: in the absence of an allocator
kwarg, your patch changes existing behavior. Specifically, the allocator
that was previously used was that of the array passed to the reduction,
whereas now it is pycuda.driver.mem_alloc.
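
Something along these lines inside __call__ would keep the old default while
still honoring an explicit allocator (just a sketch to illustrate the point,
not a prescription for how the patch should do it):

allocator = kwargs.get("allocator", None)
if allocator is None:
    # previous behavior: fall back to the allocator of the array being reduced
    allocator = args[0].allocator

with allocator then used wherever the result array is allocated.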

Andreas
Simon Perkins
2014-10-16 09:16:44 UTC
Hi Andreas

I've modified the patch to take the existing behaviour into account.

best
Simon

On Thu, Oct 16, 2014 at 7:15 AM, Andreas Kloeckner <lists at informa.tiker.net> wrote:
Post by Andreas Kloeckner
The patch looks good. One minor complaint: in the absence of an allocator
kwarg, your patch changes existing behavior.
[Attachment: reduction_allocator.patch, text/x-patch, 5037 bytes]
URL: <http://lists.tiker.net/pipermail/pycuda/attachments/20141016/1008ff02/attachment.bin>
Andreas Kloeckner
2014-10-16 17:08:42 UTC
Post by Simon Perkins
I've modified the patch to take the existing behaviour into account.
Applied to git. Thanks for your contribution!
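
For anyone finding this thread later, usage along these lines should now be
possible (assuming the gpuarray wrapper functions such as gpuarray.sum pass
the new kwarg through to the reduction, as the original request asked for):

import pycuda.autoinit
import pycuda.gpuarray as gpuarray
import pycuda.curandom as curandom
from pycuda.tools import DeviceMemoryPool

pool = DeviceMemoryPool()
a = curandom.rand(100000)
# the reduction's result is allocated from the pool,
# without touching a.allocator
s = gpuarray.sum(a, allocator=pool.allocate)
print(s.get())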

Andreas