Discussion:
[PyCUDA] Precompiling kernels?
samie abdul
2015-02-19 14:49:38 UTC
Hi,

Is it possible to "precompile" the invoked kernels beforehand? My code uses several CUDA kernels, which are called within a "fit" function. Profiling the code with cProfile yields:

42272 function calls (42228 primitive calls) in 1.662 seconds
...

11 0.000 0.000 0.344 0.031 compiler.py:185(compile)
11 0.002 0.000 0.346 0.031 compiler.py:245(__init__)
4 0.000 0.000 0.317 0.079 compiler.py:33(preprocess_source)
11 0.000 0.000 0.342 0.031 compiler.py:66(compile_plain)
...

Thus, about 0.344 of the 1.662 seconds are spent compiling the code. When "fit" is executed twice, the code is not compiled again, so the second call of "fit" saves these 0.344 seconds. I would like to precompile all involved kernels as soon as the object that the "fit" function belongs to is initialized...


Can one invoke the overall compilation process beforehand?

Thanks!
Fabian
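One way to check from code whether a particular call still triggers compilation, as in the profile above, is to run it under the standard `cProfile` module and count calls to PyCUDA's `compiler.py:compile`. The helper below is a hypothetical sketch (the name `count_calls` and its parameters are not part of any library). It reads `pstats.Stats.stats`, whose keys are `(filename, lineno, funcname)` tuples and whose values start with the primitive and total call counts.

```python
import cProfile
import pstats


def count_calls(target, *args, file_suffix="compiler.py", func_name="compile", **kwargs):
    """Run target(*args, **kwargs) under cProfile and return how many times
    a function named `func_name`, defined in a file whose path ends with
    `file_suffix`, was called. The defaults target PyCUDA's compiler.py:compile
    (hypothetical helper, not part of PyCUDA)."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        target(*args, **kwargs)
    finally:
        profiler.disable()

    stats = pstats.Stats(profiler)
    total = 0
    # stats.stats maps (filename, lineno, funcname) to
    # (primitive calls, total calls, tottime, cumtime, callers).
    for (filename, _lineno, name), (_cc, nc, _tt, _ct, _callers) in stats.stats.items():
        if filename.endswith(file_suffix) and name == func_name:
            total += nc
    return total
```

For example, `count_calls(model.fit, a_gpu, b_gpu)` returning 0 after the model has been constructed would indicate that no PyCUDA compilation happened inside `fit`.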
Andreas Kloeckner
2015-02-19 16:08:40 UTC
Post by samie abdul
Hi,
42272 function calls (42228 primitive calls) in 1.662 seconds
...
11 0.000 0.000 0.344 0.031 compiler.py:185(compile)
11 0.002 0.000 0.346 0.031 compiler.py:245(__init__)
4 0.000 0.000 0.317 0.079 compiler.py:33(preprocess_source)
11 0.000 0.000 0.342 0.031 compiler.py:66(compile_plain)
...
Thus, about 0.344 of the 1.662 seconds are spent on compiling the
code. When executing the function "fit" twice, the code is not
compiled again (hence, saving these 0.344 seconds for the second call
of "fit"). I would like to somehow precompile all involved kernels as
soon as the object the "fit" function belongs to is initialized...
Can one invoke the overall compilation process beforehand?
Sure! That's what the SourceModule constructor does. Just keep the
instance around.

Andreas
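A minimal sketch of this suggestion, assuming a single hypothetical `scale` kernel in place of the real ones behind `fit`: the `SourceModule` is built in the constructor, and both the module and the function handle obtained with `get_function` are stored on the instance, so `fit` only launches the kernel. The `compile_fn` parameter is an addition for testing without a GPU; by default it is `pycuda.compiler.SourceModule`, and a CUDA context (for example via `import pycuda.autoinit`) is assumed to exist already.

```python
import numpy as np

# Hypothetical kernel standing in for the ones used by "fit".
KERNEL_SOURCE = """
__global__ void scale(float *x, float alpha, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        x[i] *= alpha;
}
"""


class Model(object):
    """Compiles every kernel once, in the constructor, and keeps the result."""

    def __init__(self, compile_fn=None):
        if compile_fn is None:
            # Assumes a CUDA context already exists, e.g. the caller did
            # `import pycuda.autoinit`.
            from pycuda.compiler import SourceModule
            compile_fn = SourceModule
        # Compilation happens here, once per Model instance.
        self._module = compile_fn(KERNEL_SOURCE)
        # Keep the function handle too, so fit() only launches the kernel.
        self._scale = self._module.get_function("scale")

    def fit(self, x_gpu, alpha=2.0, block_size=256):
        """Scale a float32 pycuda.gpuarray.GPUArray in place."""
        n = x_gpu.size
        grid_size = (n + block_size - 1) // block_size
        self._scale(x_gpu.gpudata, np.float32(alpha), np.int32(n),
                    block=(block_size, 1, 1), grid=(grid_size, 1))
        return x_gpu
```

With PyCUDA available, `model = Model()` pays the compilation cost once, and every later `model.fit(x_gpu)` on a float32 `GPUArray` reuses the stored kernel.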
samie abdul
2015-02-20 13:34:27 UTC
Hi Andreas,

thanks for the quick answer! It seems that my question is related to scikits.cuda. Running the code attached to this email sketches the issue: when I uncomment the line "#model.fit(a_gpu, b_gpu)", the entry "compiler.py:185(compile)" vanishes from the profiling output. The hand-written PyCUDA kernel seems to be compiled beforehand (during model initialization). The call to "culinalg.dot", however, seems to trigger compilation after the model has been initialized...


For this toy example, the second call of "fit" takes about the same time whether or not the first "fit" line is uncommented (about 0.41 seconds on my machine in both cases). For the project I am working on (which is too big to share), however, it makes a difference: 1.35 seconds instead of the 1.662 seconds mentioned in my previous email.

Cheers
Fabian
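The observation that one earlier call of `fit` makes the compile entry vanish suggests a workaround for kernels that a library builds lazily on first use: call `fit` once on tiny dummy inputs at the end of `__init__`. The sketch below only illustrates the pattern in pure Python; `get_kernel` is a hypothetical stand-in for such a library helper, which compiles a kernel the first time a dtype is seen and caches it. Whether the real code path behind `culinalg.dot` is cached per dtype rather than per shape is an assumption, so the dummy inputs should match the real data's dtypes and go through the same code path.

```python
import functools

# Records every (simulated) compilation so the effect is observable.
COMPILE_LOG = []


@functools.lru_cache(maxsize=None)
def get_kernel(dtype):
    """Hypothetical stand-in for a library helper that compiles a kernel the
    first time a dtype is seen and returns the cached kernel afterwards."""
    COMPILE_LOG.append(dtype)
    return lambda values: [2 * v for v in values]


class Model(object):
    def __init__(self, warmup_dtypes=("float32",)):
        # Warm-up: one tiny fit() per dtype forces any lazy compilation
        # to happen now instead of during the first real call.
        for dtype in warmup_dtypes:
            self.fit([0], dtype)

    def fit(self, values, dtype="float32"):
        # Reuses the cached kernel if this dtype was already warmed up.
        return get_kernel(dtype)(values)
```

The design trades a slightly slower constructor for a first `fit` call that no longer pays the compilation cost.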
Lev Givon
2015-02-20 14:55:03 UTC
Post by samie abdul
Hi Andreas,
thanks for the quick answer! It seems that my question is somehow related to
scikits.cuda. Executing the code attached to this email sketches the issue: By
uncommenting the line "#model.fit(a_gpu, b_gpu)", the profiling output
"compiler.py:185(compile)" vanishes. The manual PyCUDA kernel seems to be
compiled beforehand (during model initialization). The call
"culinalg.dot", however, seems to cause the compiler output after having
initialized the model ...
What version of scikits.cuda are you using? The latest version of the dot()
function (either in GitHub or in scikits.cuda 0.5.0a2) relies entirely on the
cublas library and hence shouldn't trigger any PyCUDA kernel compilation itself.
--
Lev Givon
Bionet Group | Neurokernel Project
http://www.columbia.edu/~lev/
http://lebedov.github.io/
http://neurokernel.github.io/
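To find out which call actually reaches the PyCUDA compiler (for example, whether it is `culinalg.dot` or something else inside `fit`), `pstats` can list the callers of a profiled function. The helper below is a hypothetical sketch; its `pattern` argument is a regular expression matched against pstats' `file:line(function)` strings, and the default assumes the compiler entry appears as `compiler.py:...(compile)`, as in the profile above.

```python
import cProfile
import io
import pstats


def show_compile_callers(target, *args, pattern=r"compiler\.py:\d+\(compile\)", **kwargs):
    """Profile target(*args, **kwargs) and return the pstats report listing
    the direct callers of every function matching `pattern`
    (hypothetical helper, not part of PyCUDA or scikits.cuda)."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        target(*args, **kwargs)
    finally:
        profiler.disable()

    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    # Restrict the report to functions whose "file:line(name)" matches.
    stats.print_callers(pattern)
    return out.getvalue()
```

`print(show_compile_callers(model.fit, a_gpu, b_gpu))` would list the direct callers of each matching `compile` entry; repeating it with a caller's name as the pattern walks further up the call chain.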
Fasram
2015-02-21 10:29:38 UTC
Hi Lev,

I think I am using the newest scikits.cuda version (Ubuntu 14.04, 64-bit; several Nvidia GPUs; CUDA 6.5):


Python 2.7.6 (default, Mar 22 2014, 22:59:56)
[GCC 4.8.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import scikits.cuda
>>> scikits.cuda.__version__
'0.5.0a2'



Thanks
Fabian