Bruce Labitt
2015-03-28 15:29:01 UTC
From reading the documentation, I am unsure whether parallelizing this kind
of function is worth doing in PyCUDA.
I'm trying to add the effect of phase noise to a radar simulation. The
simulation is written in SciPy/NumPy. Currently I am using joblib to run
on multiple cores, but it is too slow for the scenarios I wish to try. It
does work for a small number of targets and reduced phase noise array sizes.
The following is the current approach:
Function to parallelize
from numpy import cos, pi

def MSIN( farray, Mf, tf, jj ):
    """
    farray, Mf, tf, jj
    farray  array of frequencies (size = 10000)
    Mf      array of coefficients (size = 10000)
    tf      2D array ~[2048 x 256] of time
    jj      list of indices (fraction of the problem to solve)
    """
    Msin = 0.0
    for ii in jj:
        # add one weighted cosine term, evaluated at every time sample
        Msin = Msin + Mf[ii] * 2.0*cos( 2.0*pi*farray[ii]*tf )
    return Msin
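For comparison, here is a minimal NumPy sketch of the same accumulation that evaluates several frequency terms per call via broadcasting. The names msin_loop, msin_chunked and chunk are hypothetical, and the array sizes are shrunk so it runs quickly; it is only meant to show the shape of the computation, not a tuned implementation.

```python
import numpy as np

def msin_loop(farray, Mf, tf, jj):
    """Reference: same accumulation as MSIN, one frequency term at a time."""
    out = np.zeros_like(tf, dtype=np.float64)
    for ii in jj:
        out += Mf[ii] * 2.0 * np.cos(2.0 * np.pi * farray[ii] * tf)
    return out

def msin_chunked(farray, Mf, tf, jj, chunk=16):
    """Same sum, evaluating `chunk` frequency terms per NumPy call."""
    jj = np.asarray(jj)
    out = np.zeros_like(tf, dtype=np.float64)
    for start in range(0, len(jj), chunk):
        idx = jj[start:start + chunk]
        # phase has shape (len(idx), rows, cols)
        phase = 2.0 * np.pi * farray[idx][:, None, None] * tf
        # weighted sum over the frequency axis gives a (rows, cols) array
        out += 2.0 * np.tensordot(Mf[idx], np.cos(phase), axes=1)
    return out

# Small synthetic inputs (assumed values, far smaller than the real problem)
rng = np.random.default_rng(0)
farray = rng.uniform(1.0, 1e3, size=200)
Mf = rng.normal(size=200)
tf = rng.uniform(0.0, 1e-2, size=(64, 32))
jj = np.arange(200)

ref = msin_loop(farray, Mf, tf, jj)
fast = msin_chunked(farray, Mf, tf, jj)
print(np.allclose(ref, fast))
```

The chunk size trades memory for fewer Python-level iterations: each chunk allocates a temporary of chunk x rows x cols values.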
Current method to call the function in parallel (multiprocessing via joblib)
from functools import reduce
from operator import add
from joblib import Parallel, delayed

"""
====================================================
Computes the function MSIN in parallel on njobs cores
====================================================
"""
MMM = Parallel(n_jobs=njobs, max_nbytes=None)\
    (delayed(MSIN)( f, aa, tf1, ii ) for ii in idx)
Msin = reduce(add, MMM) # add the results from all cores together
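The reduce(add, MMM) step relies on the sum over all frequency indices being splittable into partial sums over disjoint index chunks. A small serial sketch of that property follows (no joblib, so it runs anywhere); partial_sum is a hypothetical stand-in for MSIN and the sizes are assumed.

```python
from functools import reduce
from operator import add
import numpy as np

def partial_sum(farray, Mf, tf, jj):
    """Accumulate the weighted cosine terms for the indices in jj."""
    out = np.zeros_like(tf, dtype=np.float64)
    for ii in jj:
        out += Mf[ii] * 2.0 * np.cos(2.0 * np.pi * farray[ii] * tf)
    return out

rng = np.random.default_rng(1)
n_terms = 120
farray = rng.uniform(1.0, 1e3, size=n_terms)
Mf = rng.normal(size=n_terms)
tf = rng.uniform(0.0, 1e-2, size=(32, 16))

# Split the index range into 4 disjoint chunks, one per (hypothetical) worker
idx = np.array_split(np.arange(n_terms), 4)
parts = [partial_sum(farray, Mf, tf, ii) for ii in idx]
total = reduce(add, parts)          # element-wise sum of the partial arrays

full = partial_sum(farray, Mf, tf, range(n_terms))
print(np.allclose(total, full))
```

The two results agree only to floating-point tolerance, since the order of additions differs between the chunked and the single-pass sum.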
Any suggestions for porting this to PyCUDA? Is it a reasonable candidate?
In essence, it accumulates a scalar-weighted cos function for every
element of a 2D array. It 'feels' like it should be portable. Are any
roadblocks foreseen? The 2D array of times is contiguous in memory (in the
sense of stride), but there are discontinuous jumps in the time values in
the array, which I do not think is a problem.
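A quick NumPy check of the two points above: whether the time array is contiguous in memory, and whether jumps in the time values matter. Each output element depends only on its own time value, so shuffling the flattened times and undoing the shuffle afterwards should reproduce the same result. The time grid and the name msin are made up for illustration.

```python
import numpy as np

def msin(farray, Mf, t):
    """Weighted cosine sum evaluated independently at every entry of t."""
    out = np.zeros_like(t, dtype=np.float64)
    for f, m in zip(farray, Mf):
        out += m * 2.0 * np.cos(2.0 * np.pi * f * t)
    return out

rng = np.random.default_rng(2)
farray = rng.uniform(1.0, 1e3, size=50)
Mf = rng.normal(size=50)

# Assumed time grid: 16 rows of 32 samples, with a jump halfway through the rows
fast_time = np.arange(32) * 1e-6
row_start = np.arange(16) * 1e-3
row_start[8:] += 0.5                      # discontinuous jump in time values
tf = row_start[:, None] + fast_time[None, :]

print(tf.flags['C_CONTIGUOUS'])           # memory layout, unrelated to the values

# Evaluate on a shuffled, flattened copy, then undo the shuffle
flat = np.ascontiguousarray(tf).ravel()
perm = rng.permutation(flat.size)
shuffled = msin(farray, Mf, flat[perm])
restored = np.empty_like(shuffled)
restored[perm] = shuffled

direct = msin(farray, Mf, tf)
print(np.allclose(restored.reshape(tf.shape), direct))
```

Because no element reads its neighbours, the same structure would map to one independent unit of work per array element.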
DumpProperties.py reports the following:
Device #0: GeForce GTX 680M
Compute Capability: 3.0
Total Memory: 4193984 KB
CAN_MAP_HOST_MEMORY: 1
CLOCK_RATE: 758000
MAX_BLOCK_DIM_X: 1024
MAX_BLOCK_DIM_Y: 1024
MAX_BLOCK_DIM_Z: 64
MAX_GRID_DIM_X: 2147483647
MAX_GRID_DIM_Y: 65535
MAX_GRID_DIM_Z: 65535
I am using CUDA 6.5.
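For scale, a rough back-of-the-envelope calculation using the sizes quoted above. It assumes the time array is exactly 2048 x 256 and stored as float64 (the NumPy default); neither is stated explicitly, so treat the numbers as estimates.

```python
n_freq = 10000                 # number of frequency terms (from the docstring)
rows, cols = 2048, 256         # approximate shape of the time array
device_kb = 4193984            # "Total Memory" reported for the GTX 680M

n_elements = rows * cols                 # output elements
n_cos = n_freq * n_elements              # cosine evaluations for the full sum
bytes_f64 = n_elements * 8               # one float64 array of that shape
fraction = (bytes_f64 / 1024) / device_kb

print("output elements:    ", n_elements)
print("cosine evaluations: ", n_cos)
print("time array (KiB):   ", bytes_f64 // 1024)
print("share of device RAM: %.4f%%" % (100 * fraction))
```

Under these assumptions the arrays themselves are small next to the device memory; the cost is dominated by the number of cosine evaluations.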
Thanks in advance for any insight or suggestions on how to attack the
problem.
-Bruce