[PyCUDA] Shutting down/Re-initializing PyCUDA

Discussion:

Thomas Unterthiner

2014-09-24 08:58:51 UTC

Hi!

I have a program that makes extensive use of pycuda, but also calls out
to a C library which also uses CUDA internally (it does not share any
state or memory with the pycuda code, and uses the CUDA runtime API).
However, after the call to the C library ends, all my PyCUDA calls fail
with a "LogicError: cuFuncSetBlockShape failed: invalid handle". (The
same call with the same parameters worked fine before calling out to the
C library).

I have tried explicitly initializing/shutting down the PyCUDA contexts,
but I still can't get stuff to work. The relevant parts of my program
look as follows:

# initialize PyCUDA
def init_pycuda(gpu_id):
    import pycuda.driver as pycuda_drv
    global __pycuda_context, __pycuda_device
    pycuda_drv.init()
    __pycuda_device = pycuda_drv.Device(gpu_id)
    __pycuda_context = __pycuda_device.make_context()
    import scikits.cuda.misc
    scikits.cuda.misc.init()

init_pycuda(0)
use_pycuda()

# trying to shut down PyCUDA
import scikits.cuda.misc
from pycuda.tools import clear_context_caches
cuda_memory_pool.free_held()
cuda_hostmemory_pool.free_held()
scikits.cuda.misc.shutdown()
__pycuda_context.pop()
clear_context_caches()
__pycuda_context = None
__pycuda_device = None

# ... now I'm calling out to the other library
call_external_library()

init_pycuda(0)
use_pycuda() # this will now fail with a LogicError

As said before, the C library uses the CUDA runtime API, so it uses
cudaSetDevice to initialize and calls cudaDeviceReset at the end. Is
there something I'm overlooking wrt. how to (de)initialize PyCUDA?
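
A minimal sketch of the setup/teardown pattern described above, wrapped in a context manager so that the teardown runs even if the PyCUDA code raises. The helper name `scoped_context` is hypothetical, and the driver module is passed in as `drv` (an assumption made so the sketch can be exercised without a GPU; in practice it would be `pycuda.driver`). It relies only on `init()`, `Device`, `make_context()`, `pop()` and `detach()`. It is not a confirmed fix for the error above; it only makes the context lifetime explicit.

```python
from contextlib import contextmanager


@contextmanager
def scoped_context(drv, gpu_id=0):
    """Create a CUDA context on entry; pop and detach it on exit.

    `drv` is assumed to behave like the `pycuda.driver` module.
    """
    drv.init()                               # initialize the driver API
    ctx = drv.Device(gpu_id).make_context()  # creates the context and makes it current
    try:
        yield ctx
    finally:
        ctx.pop()     # remove the context from this thread's context stack
        ctx.detach()  # drop our reference so the context can be destroyed
```

With PyCUDA installed, usage would look like `with scoped_context(pycuda.driver, 0): use_pycuda()`, with `call_external_library()` invoked outside the block. Any module or kernel handle created inside the block belongs to that context and must not be reused once the block has exited.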

Cheers

Thomas

Andreas Kloeckner

2014-09-24 15:19:25 UTC

Does the C library document what it does in regards to context
management?

Also, to the best of my knowledge, the CUDA driver library offers no
possibility to "shut down" CUDA once it's initialized, so PyCUDA can't
do that either.

HTH,
Andreas

Thomas Unterthiner

2014-09-25 09:06:01 UTC

[sending to pycuda at tiker.net again; I think I replied to the wrong
address last time, so I'm not sure my replies ended up on the list]

The C library uses the runtime API, thus it does not do any explicit
context management. It calls cudaDeviceReset() before it returns to
Python (which as far as I understand should undo any implicit context
allocations it did before, according to
http://developer.download.nvidia.com/compute/cuda/4_1/rel/toolkit/docs/online/group__CUDART__DRIVER.html
).
If needed, I can see if I can provide a minimal C library that exhibits
the behavior. I just wanted to make sure I didn't have any mistakes in
my PyCUDA code beforehand.

Post by Andreas Kloeckner
I seem to remember that what Nvidia hacked together in terms of their
runtime/driver API interoperability requires that the runtime API (in
your case the C library) be in charge of managing CUDA contexts.
Andreas

From what little I found by googling around, mixing the runtime API with
driver-API context management seems to be quite a mess. I found no clear
indication of how to handle this within the C library. Do you by any
chance have any pointers? (I forgot to point this out before: I do have the
source code to the C library and can modify it if there's something I
can do from there. But I'd rather not rewrite the whole thing using the
driver API if it can be avoided).
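
One direction worth noting, offered as a sketch under stated assumptions rather than a confirmed answer: the runtime API works on the device's primary context, and newer PyCUDA releases document `Device.retain_primary_context()` for obtaining that same context instead of creating a separate one with `make_context()`. The sketch below assumes the installed PyCUDA and CUDA versions provide that method. The helper name `primary_context` is hypothetical, and the driver module is again passed in as `drv` so the sketch can be exercised without a GPU.

```python
from contextlib import contextmanager


@contextmanager
def primary_context(drv, gpu_id=0):
    """Use the device's primary context (the one the runtime API uses).

    `drv` is assumed to behave like `pycuda.driver` and to provide
    Device.retain_primary_context(), which older releases lack.
    """
    drv.init()
    ctx = drv.Device(gpu_id).retain_primary_context()
    ctx.push()        # the retained context is not made current automatically
    try:
        yield ctx
    finally:
        ctx.pop()     # deactivate it for this thread
        ctx.detach()  # release our reference to the primary context
```

Even with this approach, a `cudaDeviceReset()` in the C library would presumably still tear down the shared primary context, so handles created by PyCUDA before the call should be treated as invalid afterwards, or the reset removed from the library if its source can be changed.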

Ideally what I'd want is to shut down PyCUDA completely before calling
the C library, and re-initialize it from the ground up again
afterwards. But for some reason this doesn't seem to work the way I
envisioned. So either

1) I forgot some steps when shutting down PyCUDA
2) I forgot some steps when re-initializing PyCUDA
3) the C library doesn't clean up properly before exiting
4) as both PyCUDA and the C library operate within the same
process/thread, I can't avoid some sort of co-dependence between the two

I was hoping it would be one of the cases 1-3, as these are probably
easiest to remedy. Can you confirm that I did 1 and 2 correctly (the
code I used for these two steps is included in the first email to the
list)?

Cheers

Thomas