[PyCUDA] General question about CUDA compiler and early returns

Andreas Kloeckner

2015-10-26 15:02:57 UTC

Post by Walter White
Hello,
I have a question and hope that you can help me.
I am trying to find the bottleneck in my code, but I can't pin it
down at the moment.
For a while I thought it was the writes to global memory.
At the moment I am using an early "return" statement in my
code to skip parts of it, e.g. a for-loop.
Now I am wondering whether this works at all.
Could it be that the code exits well before
the "return" statement when the compiler recognizes that
the results of calculations done in a for-loop are neither written
to global memory nor used anywhere else?

The real way to tell is to look at the PTX. But, generally, yes: if you
never write results to global memory, I think the NVIDIA compiler will
treat the computation as dead code and may get rid of the body of your
entire kernel.
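
As a rough illustration of "looking at the PTX", here is a minimal Python sketch that counts instructions and global-memory stores per kernel in a PTX listing. The function name summarize_ptx and the two sample PTX strings are hypothetical and written by hand for this example; they are not real compiler output. The store check is only a heuristic (it looks for st.global, atom.global and red.global), so treat the result as a hint, not proof.

```python
import re

# Hand-written PTX fragments for illustration only (assumption: NOT real nvcc output).
SAMPLE_PTX_LIVE = """
.visible .entry live_kernel(.param .u64 out)
{
    ld.param.u64 %rd1, [out];
    cvta.to.global.u64 %rd2, %rd1;
    mov.f32 %f1, 0f3F800000;
    st.global.f32 [%rd2], %f1;
    ret;
}
"""

SAMPLE_PTX_DEAD = """
.visible .entry dead_kernel(.param .u64 out)
{
    ret;
}
"""

# Matches the kernel name that follows a .entry directive.
ENTRY_RE = re.compile(r"\.entry\s+([A-Za-z_$][\w$]*)")
# Matches an (optionally predicated) instruction that writes to global memory.
GLOBAL_WRITE_RE = re.compile(r"(?:@!?%\w+\s+)?(?:st|atom|red)\.global\b")


def summarize_ptx(ptx_text):
    """Return {kernel_name: {"instructions": n, "global_stores": m}}."""
    summary = {}
    current = None
    for raw in ptx_text.splitlines():
        line = raw.strip()
        match = ENTRY_RE.search(line)
        if match:
            # A new kernel starts here; begin counting for it.
            current = match.group(1)
            summary[current] = {"instructions": 0, "global_stores": 0}
            continue
        # Skip text outside kernels, braces, labels, and parameter lines.
        if current is None or not line.endswith(";"):
            continue
        # Skip directives such as ".reg .f32 %f<2>;".
        if line.startswith("."):
            continue
        summary[current]["instructions"] += 1
        if GLOBAL_WRITE_RE.match(line):
            summary[current]["global_stores"] += 1
    return summary
```

A kernel that reports zero global stores and only a "ret" has most likely been optimized away. To get a real PTX listing, "nvcc -ptx kernel.cu" writes one to a file; with PyCUDA, I believe pycuda.compiler.compile(source, target="ptx") returns the same text as bytes, but check that against your PyCUDA version.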

Andreas