Lev Givon
2014-12-23 23:45:08 UTC
(Not sure if this is more of an mpi4py or a pycuda issue at this point.)
I recently tried running a gist I wrote a while back [1] to test communication
of data stored in GPU memory with pycuda via mpi4py compiled against OpenMPI
1.8.* (which includes CUDA support). Using the latest mpi4py revision (9a70e69)
compiled against OpenMPI 1.8.4 (which was in turn compiled against CUDA 6.5 on
Ubuntu 14.04.1) and installed in a Python 2.7.6 virtualenv along with pycuda
2014.1 (also manually compiled against CUDA 6.5), I was able to run the gist
without any problems. However, when I changed line 55 from
x_gpu = gpuarray.arange(100, 200, 10, dtype=np.double)
to
x_gpu = gpuarray.to_gpu(np.arange(100, 200, 10, dtype=np.double))
the data transfer succeeded but was immediately followed by the following error:
[avicenna:32494] *** Process received signal ***
[avicenna:32494] Signal: Segmentation fault (11)
[avicenna:32494] Signal code: Address not mapped (1)
[avicenna:32494] Failing at address: (nil)
[avicenna:32494] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0x10340)[0x2ba2e8fe2340]
[avicenna:32494] [ 1] /usr/lib/x86_64-linux-gnu/libcuda.so.1(+0x1f60f5)[0x2ba2fd19b0f5]
[avicenna:32494] [ 2] /usr/lib/x86_64-linux-gnu/libcuda.so.1(+0x20470b)[0x2ba2fd1a970b]
[avicenna:32494] [ 3] /usr/lib/x86_64-linux-gnu/libcuda.so.1(+0x17ac02)[0x2ba2fd11fc02]
[avicenna:32494] [ 4] /usr/lib/x86_64-linux-gnu/libcuda.so.1(cuStreamDestroy_v2+0x52)[0x2ba2fd0eeb32]
[avicenna:32494] [ 5] /opt/openmpi-1.8.4/lib/libmpi.so.1(mca_common_cuda_fini+0x1c3)[0x2ba2f57718a3]
[avicenna:32494] [ 6] /opt/openmpi-1.8.4/lib/libmpi.so.1(+0xf5e3e)[0x2ba2f57aee3e]
[avicenna:32494] [ 7] /opt/openmpi-1.8.4/lib/libopen-pal.so.6(mca_base_component_close+0x19)[0x2ba2f6122099]
[avicenna:32494] [ 8] /opt/openmpi-1.8.4/lib/libopen-pal.so.6(mca_base_components_close+0x42)[0x2ba2f6122112]
[avicenna:32494] [ 9] /opt/openmpi-1.8.4/lib/libmpi.so.1(+0xd7515)[0x2ba2f5790515]
[avicenna:32494] [10] /opt/openmpi-1.8.4/lib/libopen-pal.so.6(mca_base_framework_close+0x63)[0x2ba2f612b3c3]
[avicenna:32494] [11] /opt/openmpi-1.8.4/lib/libopen-pal.so.6(mca_base_framework_close+0x63)[0x2ba2f612b3c3]
[avicenna:32494] [12] /opt/openmpi-1.8.4/lib/libmpi.so.1(ompi_mpi_finalize+0x56d)[0x2ba2f573693d]
[avicenna:32494] [13] /home/lev/Work/virtualenvs/PYTHON/lib/python2.7/site-packages/mpi4py/MPI.so(+0x2e694)[0x2ba2f53b2694]
[avicenna:32494] [14] python(Py_Finalize+0x1a6)[0x42fb0f]
[avicenna:32494] [15] python(Py_Main+0xbed)[0x46ac10]
[avicenna:32494] [16] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5)[0x2ba2e9211ec5]
[avicenna:32494] [17] python[0x57497e]
[avicenna:32494] *** End of error message ***
I also tried replacing line 55 with
x_gpu = gpuarray.zeros(10, dtype=np.double)
x_gpu.set(np.arange(100, 200, 10, dtype=np.double))
which resulted in no error, and with
x_gpu = gpuarray.empty(10, dtype=np.double)
x_gpu.set(np.arange(100, 200, 10, dtype=np.double))
which resulted in the same error as mentioned earlier.
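For context, the communication in the gist boils down to something like the
following sketch (condensed for this email rather than the exact gist code;
the bufint() helper is just one illustrative way of exposing the device
allocation to mpi4py's buffer-based Send/Recv, here via pycuda's
DeviceAllocation.as_buffer):

    import numpy as np
    from mpi4py import MPI
    import pycuda.autoinit  # initializes the CUDA context
    import pycuda.gpuarray as gpuarray

    def bufint(arr):
        # Expose the GPU allocation as a buffer of arr.nbytes bytes so that
        # CUDA-aware OpenMPI can read/write device memory directly.
        return arr.gpudata.as_buffer(arr.nbytes)

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()

    if rank == 0:
        # the line 55 variant that triggers the segfault at exit:
        x_gpu = gpuarray.to_gpu(np.arange(100, 200, 10, dtype=np.double))
        comm.Send([bufint(x_gpu), MPI.DOUBLE], dest=1)
        print 'rank 0 sent:     ', x_gpu.get()
    elif rank == 1:
        x_gpu = gpuarray.zeros(10, dtype=np.double)
        comm.Recv([bufint(x_gpu), MPI.DOUBLE], source=0)
        print 'rank 1 received: ', x_gpu.get()

(This is meant to be run with two MPI processes, e.g. via mpiexec -n 2.)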
Any ideas as to what could be going on?
[1] https://gist.github.com/8514d3456a94a6c73e6d
--
Lev Givon
Bionet Group | Neurokernel Project
http://www.columbia.edu/~lev/
http://lebedov.github.io/
http://neurokernel.github.io/