Discussion:
[PyCUDA] odd segfault when using pycuda with mpi4py and CUDA-enabled OpenMPI
Lev Givon
2014-12-23 23:45:08 UTC
(Not sure if this is more of an mpi4py or a pycuda issue at this point.)

I recently tried running a gist I wrote some time ago [1] to test communication of
data stored in GPU memory with pycuda, using mpi4py compiled against OpenMPI
1.8.* (which contains CUDA support). Using the latest mpi4py revision (9a70e69)
compiled against OpenMPI 1.8.4 (which was in turn compiled against CUDA 6.5 on
Ubuntu 14.04.1) and installed in a Python 2.7.6 virtualenv along with pycuda
2014.1 (also manually compiled against CUDA 6.5), I was able to run the gist
without any problems. However, when I changed line 55 from

x_gpu = gpuarray.arange(100, 200, 10, dtype=np.double)

to

x_gpu = gpuarray.to_gpu(np.arange(100, 200, 10, dtype=np.double))

the data transfer succeeded but was immediately followed by the following error:

[avicenna:32494] *** Process received signal ***
[avicenna:32494] Signal: Segmentation fault (11)
[avicenna:32494] Signal code: Address not mapped (1)
[avicenna:32494] Failing at address: (nil)
[avicenna:32494] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0x10340)[0x2ba2e8fe2340]
[avicenna:32494] [ 1] /usr/lib/x86_64-linux-gnu/libcuda.so.1(+0x1f60f5)[0x2ba2fd19b0f5]
[avicenna:32494] [ 2] /usr/lib/x86_64-linux-gnu/libcuda.so.1(+0x20470b)[0x2ba2fd1a970b]
[avicenna:32494] [ 3] /usr/lib/x86_64-linux-gnu/libcuda.so.1(+0x17ac02)[0x2ba2fd11fc02]
[avicenna:32494] [ 4] /usr/lib/x86_64-linux-gnu/libcuda.so.1(cuStreamDestroy_v2+0x52)[0x2ba2fd0eeb32]
[avicenna:32494] [ 5] /opt/openmpi-1.8.4/lib/libmpi.so.1(mca_common_cuda_fini+0x1c3)[0x2ba2f57718a3]
[avicenna:32494] [ 6] /opt/openmpi-1.8.4/lib/libmpi.so.1(+0xf5e3e)[0x2ba2f57aee3e]
[avicenna:32494] [ 7] /opt/openmpi-1.8.4/lib/libopen-pal.so.6(mca_base_component_close+0x19)[0x2ba2f6122099]
[avicenna:32494] [ 8] /opt/openmpi-1.8.4/lib/libopen-pal.so.6(mca_base_components_close+0x42)[0x2ba2f6122112]
[avicenna:32494] [ 9] /opt/openmpi-1.8.4/lib/libmpi.so.1(+0xd7515)[0x2ba2f5790515]
[avicenna:32494] [10] /opt/openmpi-1.8.4/lib/libopen-pal.so.6(mca_base_framework_close+0x63)[0x2ba2f612b3c3]
[avicenna:32494] [11] /opt/openmpi-1.8.4/lib/libopen-pal.so.6(mca_base_framework_close+0x63)[0x2ba2f612b3c3]
[avicenna:32494] [12] /opt/openmpi-1.8.4/lib/libmpi.so.1(ompi_mpi_finalize+0x56d)[0x2ba2f573693d]
[avicenna:32494] [13] /home/lev/Work/virtualenvs/PYTHON/lib/python2.7/site-packages/mpi4py/MPI.so(+0x2e694)[0x2ba2f53b2694]
[avicenna:32494] [14] python(Py_Finalize+0x1a6)[0x42fb0f]
[avicenna:32494] [15] python(Py_Main+0xbed)[0x46ac10]
[avicenna:32494] [16] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5)[0x2ba2e9211ec5]
[avicenna:32494] [17] python[0x57497e]
[avicenna:32494] *** End of error message ***

I also tried replacing line 55 with

x_gpu = gpuarray.zeros(10, dtype=np.double)
x_gpu.set(np.arange(100, 200, 10, dtype=np.double))

which resulted in no error, and

x_gpu = gpuarray.empty(10, dtype=np.double)
x_gpu.set(np.arange(100, 200, 10, dtype=np.double))

which resulted in the same error as mentioned earlier.

Any ideas as to what could be going on?

[1] https://gist.github.com/8514d3456a94a6c73e6d
--
Lev Givon
Bionet Group | Neurokernel Project
http://www.columbia.edu/~lev/
http://lebedov.github.io/
http://neurokernel.github.io/
Ashwin Srinath
2014-12-24 19:51:42 UTC
Hi Lev,

This code worked for me (even after changing line 55 to use
gpuarray.to_gpu(np.arange(...))). I'm in an environment very similar to
yours. Just a couple of suggestions:

1. Insert MPI.Finalize() at the end of your code.
2. If you're not already, pass the parameter '--mca pml ob1' to your
mpiexec command.
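
Concretely, suggestion 2 amounts to launching with something like the
following (the script name here is a hypothetical placeholder, not the actual
filename from the gist):

```shell
# Explicitly select the ob1 point-to-point messaging layer, which carries
# OpenMPI 1.8.x's CUDA-aware support; gpu_transfer_test.py stands in for
# the gist's script.
mpiexec --mca pml ob1 -n 2 python gpu_transfer_test.py
```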

Thanks,
Ashwin
Ashwin Srinath
2014-12-24 20:20:16 UTC
Lev,

The code ran without MPI.Finalize(), but I got an error about
cuMemHostUnregister. I'm not sure why this happens, but I've mentioned it
before on the mpi4py forum:

https://groups.google.com/forum/#!msg/mpi4py/xd-SR1b6GZ0/CdyHFWUNhskJ

Thanks,
Ashwin
Post by Lev Givon
Post by Ashwin Srinath
This code worked for me (even after changing line 55 to use
'gpuarray.to_gpu(np.arange...'). I'm in an environment very similar to
yours.
Did the code run without error on your system after modifying line 55, even
without MPI.Finalize() added to the end of the code?
Post by Ashwin Srinath
1. Insert MPI.Finalize() at the end of your code.
2. If you're not already, pass the parameter '--mca pml ob1' to your
mpiexec command.
Adding the call to MPI.Finalize() made the error go away even when using
gpuarray.to_gpu(); adding the extra MCA parameters didn't appear to have
any effect.
My understanding is that the call to MPI.Finalize() should be automatically
registered to be executed when the processes exit; this makes me wonder
whether my explicitly registering the pycuda method that cleans up the
current context is causing problems. I'll see what the folks on the mpi4py
list have to say.
Thanks,
--
Lev Givon
Bionet Group | Neurokernel Project
http://www.columbia.edu/~lev/
http://lebedov.github.io/
http://neurokernel.github.io/
Freddie Witherden
2014-12-24 20:37:15 UTC
Post by Lev Givon
Adding the call to MPI.Finalize() made the error go away even when using
gpuarray.to_gpu(); adding the extra mca parameters didn't appear to have any effect.
My understanding is that the call to MPI.Finalize() should be automatically
registered to be executed when the processes exit; this makes me wonder whether
my explicitly registering the pycuda method that cleans up the current context
is causing problems. I'll see what the folks on the mpi4py list have to say.
The order is almost certainly important. If the MPI library allocates
some CUDA resources -- or expects to be able to call the CUDA API during
MPI_Finalize() -- then it is important that the CUDA context is still
valid. Therefore, one must ensure that MPI_Finalize() is called before
PyCUDA begins its cleanup.

In my experience it is better to manage these things manually and
explicitly through a single atexit handler function.

Regards, Freddie.
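
A minimal sketch of the single-handler approach described above, using only
the standard library. Here finalize_mpi and teardown_cuda_context are
hypothetical stand-ins for MPI.Finalize() and PyCUDA's context
pop()/detach() (the real calls require mpi4py and pycuda); the point is
only that one handler makes the cleanup order explicit:

```python
import atexit

# Instead of letting mpi4py and pycuda each register their own exit
# handlers (whose relative execution order is hard to control), register a
# single handler that enforces the required order: MPI finalization first,
# while the CUDA context is still valid, then CUDA context teardown.

cleanup_log = []

def finalize_mpi():
    # stand-in for MPI.Finalize(); may call into the CUDA API, so the
    # CUDA context must still exist when this runs
    cleanup_log.append("mpi_finalize")

def teardown_cuda_context():
    # stand-in for popping/detaching the pycuda context
    cleanup_log.append("cuda_teardown")

def cleanup():
    # one handler, explicit order: MPI first, then CUDA
    finalize_mpi()
    teardown_cuda_context()

atexit.register(cleanup)
```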
