I am trying to write a CUDA kernel that computes the following piece of Python code. I need to sort the data inside the kernel, but as far as I can tell there is no built-in function for sorting within a kernel, so I am not sure what the best way to implement this would be. Any suggestions would be appreciated. Feel free to post sample code or point me to online resources that deal with a similar problem.
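import numpy as np

window_size_squared = window_size * window_size
for i in range(0, img_size_x - window_size):
    for j in range(0, img_size_y - window_size):
        # Window whose top-left corner is pixel (i, j)
        kernel = img[i:i+window_size, j:j+window_size]
        kernel_flat = np.sort(kernel.flatten())
        # Calculate the rank: index of the first occurrence of img[i, j]
        # in the sorted window (np.where returns a tuple of index arrays)
        rank = np.where(kernel_flat == img[i, j])[0][0]
        img_mod[i, j] = int((rank * 255) / window_size_squared)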
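One thing that might simplify the port: since the rank is the index of the first occurrence of `img[i,j]` in the sorted window, it should equal the number of window values strictly smaller than `img[i,j]`, so a full sort may not be necessary at all. Below is a minimal NumPy sketch of that idea. The helper name `rank_transform` and its signature are assumptions made for illustration and are not part of the code above.

```python
import numpy as np

def rank_transform(img, window_size):
    """Sort-free variant of the loop above (hypothetical helper name)."""
    img_size_x, img_size_y = img.shape
    window_size_squared = window_size * window_size
    img_mod = np.zeros_like(img)
    for i in range(img_size_x - window_size):
        for j in range(img_size_y - window_size):
            window = img[i:i + window_size, j:j + window_size]
            # The number of values strictly below img[i, j] equals the index
            # of its first occurrence in the sorted window.
            rank = int(np.count_nonzero(window < img[i, j]))
            img_mod[i, j] = (rank * 255) // window_size_squared
    return img_mod
```

If this equivalence holds for the real data, each CUDA thread could presumably handle one `(i, j)` pixel and run the same counting loop over its window, which would avoid needing a per-thread scratch array for sorting.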