[PyCUDA] Autoinit failing after driver update

Discussion:

Josh Willis

2017-01-25 22:05:42 UTC

Hi,

After updating the NVIDIA driver from 367.48 to 375.26, I can no longer get PyCUDA to run. I have tried a fresh build of PyCUDA-2016.1.2, and the configure/make/make install steps seem to proceed fine. However, if I do:

$ python
Python 2.7.5 (default, Nov 3 2016, 22:05:29)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-11)] on linux2
Type "help", "copyright", "credits" or "license" for more information.

>>> import pycuda
>>> import pycuda.autoinit
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/jwillis/envs/er10/lib/python2.7/site-packages/pycuda-2016.1.2-py2.7-linux-x86_64.egg/pycuda/autoinit.py", line 5, in <module>
    cuda.init()
pycuda._driver.Error: cuInit failed: unknown error

If I look to make sure that kernel modules are loaded, I see the following (though I’m not sure what I *should* see, this just seemed to be a common source of this kind of problem after an upgrade):

$ lsmod | grep nvi
nvidia 11944366 0
i2c_core 40756 7 ast,drm,igb,i2c_i801,drm_kms_helper,i2c_algo_bit,nvidia
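
For anyone wanting to script the lsmod check above, here is a minimal sketch. The helper names `loaded_modules` and `missing_modules` are made up for illustration, and the default list of expected modules is an assumption: many Linux CUDA installs also rely on a separate `nvidia_uvm` module, but whether its absence explains this particular error is not established in this thread.

```python
def loaded_modules(lsmod_text):
    """Return the set of module names listed in `lsmod` output."""
    names = set()
    for line in lsmod_text.splitlines():
        fields = line.split()
        # Skip blank lines and the "Module  Size  Used by" header row.
        if not fields or fields[0] == "Module":
            continue
        names.add(fields[0])
    return names


def missing_modules(lsmod_text, expected=("nvidia", "nvidia_uvm")):
    """List expected modules that do not appear in the lsmod output.

    The default `expected` tuple is an assumption, not a rule from PyCUDA.
    """
    present = loaded_modules(lsmod_text)
    return [name for name in expected if name not in present]


# The two lines reported in the post above.
sample = """\
nvidia 11944366 0
i2c_core 40756 7 ast,drm,igb,i2c_i801,drm_kms_helper,i2c_algo_bit,nvidia
"""
print(missing_modules(sample))  # prints ['nvidia_uvm']
```

Because the original command was `grep nvi`, a loaded `nvidia_uvm` module would have matched the filter, so the sample really does show it as not loaded at that moment.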

Does anyone have any suggestions on what to try next in debugging the source of this error? I can compile a “hello world” kernel directly with nvcc and run it with no problem.
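
One possible next step is to take PyCUDA out of the picture and call the driver's `cuInit` directly. The sketch below does that through `ctypes`; the helper name `raw_cuinit` is hypothetical, and it assumes the driver library is installed under the usual Linux name `libcuda.so.1`. If the raw call also fails, the problem most likely sits below PyCUDA, in the driver installation itself.

```python
import ctypes


def raw_cuinit(library="libcuda.so.1"):
    """Call cuInit(0) from the CUDA driver library.

    Returns the integer CUresult, or None if the library cannot be loaded.
    """
    try:
        libcuda = ctypes.CDLL(library)
    except OSError:
        # Library not found or not loadable on this machine.
        return None
    # CUresult cuInit(unsigned int Flags); Flags must currently be 0.
    libcuda.cuInit.argtypes = [ctypes.c_uint]
    libcuda.cuInit.restype = ctypes.c_int
    return libcuda.cuInit(0)


code = raw_cuinit()
if code is None:
    print("could not load libcuda.so.1")
elif code == 0:
    print("cuInit succeeded (CUDA_SUCCESS)")
else:
    print("cuInit failed with CUresult", code)
```

In the driver API headers, 0 is `CUDA_SUCCESS` and 999 is `CUDA_ERROR_UNKNOWN`, which is the code PyCUDA reports as "unknown error".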

Thanks,

Josh

Andreas Kloeckner

2017-01-25 22:13:09 UTC

Check the output of 'dmesg'. You may need to reboot.

Andreas

Josh Willis

2017-01-25 22:49:29 UTC

Hi Andreas,

Thanks for the quick response. The machine had already been rebooted (its uptime is less than the time since the driver upgrade was applied), and when I look through the output of dmesg, I see:

[ 11.645200] nvidia-nvlink: Nvlink Core is being initialized, major device number 243
[ 11.645234] NVRM: loading NVIDIA UNIX x86_64 Kernel Module 375.26 Thu Dec 8 18:36:43 PST 2016 (using threaded interrupts)

which to me would indicate the new kernel module has in fact been loaded. Is there something else I should be looking for there?
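
A small sketch for making the dmesg check above repeatable: it pulls every `NVRM:` line out of saved dmesg text and extracts the version from the module-load message. The function names are hypothetical, the regular expression is only modelled on the line quoted above, and which other `NVRM:` lines matter (for example a report of a version mismatch between the user-space libraries and the kernel module) is an assumption to verify against your own log.

```python
import re

# Pattern modelled on the "NVRM: loading NVIDIA UNIX ..." line quoted above.
NVRM_LOAD = re.compile(
    r"NVRM: loading NVIDIA UNIX \S+ Kernel Module\s+(\d+(?:\.\d+)+)"
)


def nvrm_lines(dmesg_text):
    """Return every dmesg line emitted with the NVRM: prefix."""
    return [line for line in dmesg_text.splitlines() if "NVRM:" in line]


def loaded_driver_version(dmesg_text):
    """Return the kernel module version from the load message, or None."""
    match = NVRM_LOAD.search(dmesg_text)
    return match.group(1) if match else None


# The two lines reported in the post above.
dmesg_sample = """\
[ 11.645200] nvidia-nvlink: Nvlink Core is being initialized, major device number 243
[ 11.645234] NVRM: loading NVIDIA UNIX x86_64 Kernel Module 375.26 Thu Dec 8 18:36:43 PST 2016 (using threaded interrupts)
"""
print(loaded_driver_version(dmesg_sample))  # prints 375.26
print(len(nvrm_lines(dmesg_sample)))        # prints 1
```

Running `dmesg > dmesg.txt` and passing the file contents to these functions lets you compare the reported version with the one you intended to install, and read any additional `NVRM:` lines in one place.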

Thanks,

Josh

--
Josh Willis
***@acu.edu

Associate Professor of Engineering & Physics
Abilene Christian University
Onstead Science Center Room 321C

Phone: (325) 674-2527
Fax: (325) 674-2146
Dept: (325) 674-2165