[PyCUDA] Questions about PyCUDA from a former CS450 student

Andreas Kloeckner

2016-08-29 15:17:04 UTC

Hi Andreas,
I am a former student of your CS 450 and am now an incoming PhD student in
operations research at Northwestern.
Since I am interested in applying parallel computing, preferably using
Python, to my future research, I have been looking for software which
combines Python with CUDA. Then I found PyCUDA on your website. And I found
NumbaPro. It seems that these two are the most popular choices for people
with needs like mine.
So my question is: which one should I learn and use first? Could you
give some comments on the pros and cons of the two?

Cc'ing the PyCUDA list for archival/searchability.

- PyCUDA lets you/forces you to write CUDA C for your kernels.

- Numba lets you write (a narrow subset of) Python for your kernels,
including arrays, I believe.

- The code you write for both will be roughly equivalent modulo
spelling, since you'll have to

- PyCUDA exposes (nearly) the entire CUDA driver API, including streams,
profiling, textures, ... Numba is more restricted.

- PyCUDA comes with an on-device array type. I'm not sure if Numba's
arrays stay on-device after the computation finishes--i.e. you may
have some implicit copying.

- PyCUDA comes with some pre-made parallel algorithms such as scans
and reductions.

- You may also want to take a look at

- https://documen.tician.de/pyopencl/
- https://documen.tician.de/loopy/
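
As a rough illustration of the PyCUDA side of this comparison, here is a
minimal sketch: the kernel is written in CUDA C, the data lives in PyCUDA's
on-device array type, and one of the pre-made reductions sums the result.
The kernel name `scale` and the helpers `grid_size` and `scale_on_gpu` are
made up for this example and are not part of PyCUDA. Running `scale_on_gpu`
assumes a CUDA-capable GPU with PyCUDA and NumPy installed.

```python
# Illustrative sketch only: the kernel and helper names are made up.

# The kernel is plain CUDA C, kept in a Python string.
KERNEL_SOURCE = r"""
__global__ void scale(float *dst, const float *src, float factor, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        dst[i] = factor * src[i];
}
"""


def grid_size(n, block_size):
    """Number of thread blocks needed to cover n elements (ceiling division)."""
    return (n + block_size - 1) // block_size


def scale_on_gpu(values, factor, block_size=256):
    """Scale `values` by `factor` on the GPU; return the scaled array and its sum.

    Needs a CUDA-capable GPU plus PyCUDA and NumPy. The imports are deferred
    so that this file can still be imported on a machine without them.
    """
    import numpy as np
    import pycuda.autoinit  # noqa: F401  (sets up a CUDA context on import)
    import pycuda.gpuarray as gpuarray
    from pycuda.compiler import SourceModule

    # Compile the CUDA C source at run time and look up the kernel.
    scale = SourceModule(KERNEL_SOURCE).get_function("scale")

    # GPUArray is PyCUDA's on-device array type.
    x_gpu = gpuarray.to_gpu(np.asarray(values, dtype=np.float32))
    out_gpu = gpuarray.empty_like(x_gpu)

    n = x_gpu.size
    scale(out_gpu, x_gpu, np.float32(factor), np.int32(n),
          block=(block_size, 1, 1), grid=(grid_size(n, block_size), 1))

    # out_gpu is still on the device here; gpuarray.sum reduces it there.
    total = gpuarray.sum(out_gpu).get()

    # Copy back to the host only at the very end.
    return out_gpu.get(), float(total)
```

On a machine with a GPU, `scale_on_gpu([1, 2, 3], 2.0)` should hand back the
doubled array together with its sum. A Numba version would express the same
kernel as a decorated Python function rather than a CUDA C string.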

Hope that helps,
Andreas