Discussion: [PyCUDA] pycuda garbage collection
Matthias Lee
2014-07-23 15:59:58 UTC
Hi all,

I noticed something interesting today.
I am working on an image processing tool which loops several times over
each of a series of images. Everything is done in place and I should not be
growing my memory footprint between iterations.

Now when I tracked the actual GPU memory consumption, I found that I would
ultimately run out of GPU memory (just a short excerpt):
http://i.imgur.com/AjmmpEk.png

I double- and triple-checked that everything is happening in place, started
trying to delete GPU objects as soon as I'm finished with them to try to
trigger the GC, but that only had limited success. I would expect the GC to
kick in before the GPU runs out of memory.

I then started manually calling gc.collect() every few iterations, and
suddenly everything started behaving and is now relatively stable. See here
(note the scale difference): http://i.imgur.com/Zzq5YdC.png
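
For reference, the loop looks roughly like this (a simplified sketch; the
real kernels and names are different):

import gc
import numpy as np
import pycuda.autoinit               # sets up a CUDA context
import pycuda.gpuarray as gpuarray

def run_inplace_kernels(img_gpu):
    # Stand-in for the actual in-place processing kernels.
    img_gpu *= 0.5
    img_gpu += 1.0

def process_all(images, n_passes=10):
    for img in images:
        img_gpu = gpuarray.to_gpu(np.asarray(img, dtype=np.float32))
        for i in range(n_passes):
            run_inplace_kernels(img_gpu)
            if i % 5 == 0:
                gc.collect()         # the manual collection that keeps memory stable
        yield img_gpu.get()          # copy the result back to the host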

Is this normal? Is this a bug?

Thanks,

Matthias
--
Matthias Lee
IDIES/Johns Hopkins University
Performance @ Rational/IBM

Matthias.A.Lee at gmail.com
MatthiasLee at jhu.edu
(320) 496 6293

To know recursion, you must first know recursion.
Andreas Kloeckner
2014-07-23 16:46:24 UTC
Hi Matthias,
Post by Matthias Lee
I noticed something interesting today.
I am working on an image processing tool which loops several times over
each of a series of images. Everything is done in place and I should not be
growing my memory footprint between iterations.
Now when I tracked the actual GPU memory consumption, I found that I would
ultimately run out of GPU memory (just a short excerpt):
http://i.imgur.com/AjmmpEk.png
I double- and triple-checked that everything is happening in place, started
trying to delete GPU objects as soon as I'm finished with them to try to
trigger the GC, but that only had limited success. I would expect the GC to
kick in before the GPU runs out of memory.
I then started manually calling gc.collect() every few iterations, and
suddenly everything started behaving and is now relatively stable. See here
(note the scale difference): http://i.imgur.com/Zzq5YdC.png
Is this normal? Is this a bug?
First off, you can force-free GPU memory using this, if all else fails:

http://documen.tician.de/pycuda/driver.html#pycuda.driver.DeviceAllocation.free
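
For a GPUArray, the underlying DeviceAllocation is exposed as its .gpudata
attribute, so an explicit free looks roughly like this (minimal sketch):

import numpy as np
import pycuda.autoinit
import pycuda.gpuarray as gpuarray

a_gpu = gpuarray.to_gpu(np.random.rand(4096, 4096).astype(np.float32))
# ... use a_gpu ...
a_gpu.gpudata.free()   # releases the device memory right away
del a_gpu              # the handle must not be used after the free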

Next, the behavior you're seeing means that a reference cycle of some
sort must exist within the object that's holding on to your GPU
memory. (It could be PyCUDA's GPUArray--that has happened before, but I'd
consider it a bug; if that's what it is, let me know and I'll go poke around.) A
reference cycle means that Python will only free these objects upon a GC
run (since the refcount will never return to zero on its own). Unless
told explicitly otherwise (see above), PyCUDA will only free GPU memory
once the associated Python handle objects have been pronounced unused by
the Python runtime.
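
To illustrate the mechanism in plain Python (nothing PyCUDA-specific here;
the names are made up):

import gc

class Holder:
    # Pretend 'buf' is a DeviceAllocation that only gets freed once
    # this object is deallocated.
    def __init__(self, buf):
        self.buf = buf
        self.other = None

a, b = Holder("buffer A"), Holder("buffer B")
a.other, b.other = b, a   # reference cycle: a <-> b

del a, b                  # refcounts stay nonzero, nothing is freed yet
gc.collect()              # only the cycle collector reclaims them (and 'buf')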

PyCUDA is smart enough to force a GC run before declaring defeat on a
memory allocation, so if you're the only one allocating on the GPU, this
shouldn't pose an issue. If you're using other libraries that also (try
to) allocate GPU memory, it might, because they *won't* know to try
GC'ing.

Hope that helps,
Andreas

Stanley Seibert
2014-07-23 16:53:02 UTC
This has also been my experience when dealing with long-running programs that allocate large fractions of the GPU memory. However, I'm not sure why normal Python reference counting is insufficient to free GPU memory as soon as the containing Python object goes out of scope.

The fact that gc.collect() fixes the problem suggests that there is a reference cycle associated with each GPU memory allocation, which is why garbage collection is required to free the memory. In my application, all of my GPU arrays were attributes in instances of a Python class, so I added a __del__ method to my class to call gc.collect() for me whenever a class instance was deallocated.
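
Roughly like this (a sketch--the class and attribute names here are made up):

import gc
import numpy as np
import pycuda.autoinit
import pycuda.gpuarray as gpuarray

class ImageProcessor:
    def __init__(self, image):
        # GPU arrays held as instance attributes.
        self.image_gpu = gpuarray.to_gpu(np.asarray(image, dtype=np.float32))
        self.scratch_gpu = gpuarray.empty_like(self.image_gpu)

    def __del__(self):
        # Run the cycle collector when an instance goes away, so any
        # cycles keeping GPU allocations alive get cleaned up promptly.
        gc.collect()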
Post by Matthias Lee
Hi all,
I noticed something interesting today.
I am working on an image processing tool which loops several times over each of a series of images. Everything is done in place and I should not be growing my memory footprint between iterations.
Now when I tracked the actual GPU memory consumption, I found that I would ultimately run out of GPU memory (just a short excerpt): http://i.imgur.com/AjmmpEk.png
I double- and triple-checked that everything is happening in place, started trying to delete GPU objects as soon as I'm finished with them to try to trigger the GC, but that only had limited success. I would expect the GC to kick in before the GPU runs out of memory.
I then started manually calling gc.collect() every few iterations, and suddenly everything started behaving and is now relatively stable. See here (note the scale difference): http://i.imgur.com/Zzq5YdC.png
Is this normal? Is this a bug?
Thanks,
Matthias
Matthias Lee
2014-07-24 01:11:14 UTC
Andreas,

I am using GPUArray very heavily; I will see how it behaves if I explicitly
free my GPUArrays.

Thanks,

Matthias


On Wed, Jul 23, 2014 at 12:53 PM, Stanley Seibert <sseibert at hep.upenn.edu> wrote:
Post by Stanley Seibert
This has also been my experience when dealing with long-running programs
that allocate large fractions of the GPU memory. However, I'm not sure why
normal Python reference counting is insufficient to free GPU memory as soon
as the Python object container goes out of scope.
The fact that gc.collect() fixes the problem suggests that there is a
reference cycle associated with each GPU memory allocation, which is why
garbage collection is required to free the memory. In my application, all
of my GPU arrays were attributes in instances of a Python class, so I added
a __del__ method to my class to call gc.collect() for me whenever a class
instance was deallocated.
--
Matthias Lee
IDIES/Johns Hopkins University
Performance @ Rational/IBM

Matthias.A.Lee at gmail.com
MatthiasLee at jhu.edu
(320) 496 6293

To know recursion, you must first know recursion.