I am trying to parallelize the following code, which includes numerous NumPy array operations:
```cython
# fft_fit.pyx
import cython
import numpy as np
cimport numpy as np
from cython.parallel cimport prange
from libc.stdlib cimport malloc, free

dat1 = np.genfromtxt('/home/bagchilab/sumanta_files/fourier_ecology_sample_data_set.csv', delimiter=',')
dat = np.delete(dat1, 0, 0)
yr = np.unique(dat[:, 0])
fit_dat = np.empty([1, 2])

def fft_fit_yr(np.ndarray[double, ndim=1] yr, np.ndarray[double, ndim=2] dat, int yr_idx, int pix_idx):
    cdef np.ndarray[double, ndim=2] yr_dat1
    cdef np.ndarray[double, ndim=2] yr_dat
    cdef np.ndarray[double, ndim=2] fft_dat
    cdef np.ndarray[double, ndim=2] fft_imp_dat
    cdef int len_yr = len(yr)
    for i in prange(len_yr, nogil=True):
        with gil:
            yr_dat1 = dat[dat[:, yr_idx] == yr[i]]
            yr_dat = yr_dat1[~np.isnan(yr_dat1).any(axis=1)]
            print("index", i)
            y_fft = np.fft.fft(yr_dat[:, pix_idx])
            y_fft_abs = np.abs(y_fft)
            y_fft_freq = np.fft.fftfreq(len(y_fft), 1)
            x_fft = range(len(y_fft))
            fft_dat = np.column_stack((y_fft, y_fft_abs))
            cut_off_freq = np.percentile(y_fft_abs, 25)
            imp_freq = np.array(y_fft_abs[y_fft_abs > cut_off_freq])
            fft_imp_dat = np.empty((1, 2))
            for j in range(len(imp_freq)):
                freq_dat = fft_dat[fft_dat[:, 1] == imp_freq[j]]
                fft_imp_dat = np.vstack((fft_imp_dat, freq_dat[0, :]))
            fft_imp_dat = np.delete(fft_imp_dat, 0, 0)
            fit_dat1 = np.fft.ifft(fft_imp_dat[:, 0])
            fit_dat2 = np.column_stack((fit_dat1.real, [yr[i]] * len(fit_dat1)))
            fit_dat = np.concatenate((fit_dat, fit_dat2), axis=0)
```
I have used the following setup.py:

```python
# setup.py
from distutils.core import setup
from distutils.extension import Extension
from Cython.Distutils import build_ext
import numpy as np

setup(
    cmdclass={'build_ext': build_ext},
    ext_modules=[Extension("fft_fit_yr", ["fft_fit.pyx"],
                           include_dirs=[np.get_include()],  # needed for `cimport numpy`
                           extra_compile_args=['-fopenmp'],
                           extra_link_args=['-fopenmp'])],
)
```
But I am getting the following error when I compile fft_fit.pyx with Cython:

```
for i in prange(len_yr, nogil=True):
target may not be a Python object as we don't have the GIL
```

Please let me know what is going wrong in my use of the prange function. Thanks.
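You can't (at least not using Cython).

NumPy functions operate on Python objects and therefore require the GIL, which prevents multiple native threads from executing in parallel. If you compile your code using `cython -a`, the annotated HTML file shows where Python C-API calls are being made (and therefore where the GIL can't be released).

The specific error you see arises because `i` is never declared with `cdef`, so Cython treats the loop target as a Python object, which is not allowed without the GIL. Declaring `i` as a C integer would not solve the underlying problem: the loop body consists entirely of NumPy calls that must hold the GIL, so at best the iterations would run one at a time.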
Cython is useful when you have a specific bottleneck in your code that cannot be sped up using vectorization. If your code is spending most of its time in NumPy function calls, then calling exactly the same functions from Cython is not going to result in a significant performance improvement. In order to see a noticeable difference you would need to write some or all of your array operations as explicit `for` loops. It looks to me as though there are simpler optimizations that could be made to your code.
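As one illustration of such a simpler optimization, here is a minimal single-threaded sketch that keeps the question's per-year logic but replaces the inner `for j` loop and its repeated `np.vstack` calls with one boolean mask, and builds the output with a single `np.concatenate`. The names `fit_one_year` and `fit_all_years` are hypothetical. The mask is not a bit-for-bit match for the original loop: when several coefficients share exactly the same magnitude (as conjugate pairs from a real-valued signal can), the loop repeats the first match, while the mask keeps each coefficient once.

```python
import numpy as np


def fit_one_year(yr_dat, pix_idx, pct=25):
    """Hypothetical helper: FFT-filter one year's rows, as in the question."""
    # Drop any row containing a NaN, as the original code does.
    yr_dat = yr_dat[~np.isnan(yr_dat).any(axis=1)]
    y_fft = np.fft.fft(yr_dat[:, pix_idx])
    y_fft_abs = np.abs(y_fft)
    cut_off = np.percentile(y_fft_abs, pct)
    # One boolean mask replaces the inner `for j` loop and its np.vstack calls.
    kept = y_fft[y_fft_abs > cut_off]
    return np.fft.ifft(kept).real


def fit_all_years(dat, yr_idx, pix_idx, pct=25):
    """Hypothetical driver: fit every year and stack (fitted value, year) rows."""
    pieces = []
    for year in np.unique(dat[:, yr_idx]):
        fitted = fit_one_year(dat[dat[:, yr_idx] == year], pix_idx, pct)
        pieces.append(np.column_stack((fitted, np.full(len(fitted), year))))
    # Concatenate once at the end instead of growing an array on every iteration.
    return np.concatenate(pieces, axis=0)
```

A call such as `fit_all_years(dat, 0, pix_idx)` on the question's data would give a single-threaded baseline to profile before reaching for any parallelism.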
I suggest the following:

- Profile your original Python code (e.g. using `line_profiler`) to see where the bottlenecks are.
- Focus your attention on speeding up these bottlenecks in a single-threaded version. You should ask a separate question if you want help with this.
- If your optimized single-threaded version is still too slow for your needs, then parallelize it using `joblib` or `multiprocessing`. Parallelization is the last tool you should reach for, once you've tried everything else you can think of.
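`line_profiler` is a third-party package that reports time per line. As a swapped-in alternative that ships with Python, `cProfile` plus `pstats` gives a per-function breakdown, which is coarser but often enough to spot the hot spot. The sketch below profiles a hypothetical `grow_with_vstack` workload that mimics the question's pattern of appending rows with `np.vstack` inside a loop; `profile_call` is likewise a hypothetical helper.

```python
import cProfile
import io
import pstats

import numpy as np


def grow_with_vstack(n):
    """Hypothetical workload: append one row per iteration, as the question does."""
    out = np.empty((0, 2))
    for k in range(n):
        # Each np.vstack copies the whole array, so the loop is quadratic overall.
        out = np.vstack((out, [k, k * k]))
    return out


def profile_call(func, *args, top=10):
    """Run func(*args) under cProfile; return (result, text of the top entries)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    buf = io.StringIO()
    # Sort by cumulative time so the most expensive call chains come first.
    pstats.Stats(profiler, stream=buf).sort_stats("cumulative").print_stats(top)
    return result, buf.getvalue()


if __name__ == "__main__":
    _, report = profile_call(grow_with_vstack, 2000)
    print(report)
```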
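If the optimized single-threaded version is still too slow, the per-year fits are independent of each other, so they map naturally onto a process pool: each worker is a separate process with its own interpreter, so NumPy's need for the GIL no longer serializes the work. Below is a minimal sketch using the standard-library `multiprocessing.Pool` (joblib would look similar); `fit_year_task` and `fit_all_years_parallel` are hypothetical names. It prefers the `fork` start method where available so workers inherit the module instead of re-importing it; on platforms that only offer `spawn`, the calling code must sit under the usual `if __name__ == "__main__":` guard.

```python
import multiprocessing as mp

import numpy as np


def fit_year_task(task):
    """Hypothetical worker: FFT-filter one year's series (one Pool.map item)."""
    year, series = task
    y_fft = np.fft.fft(series)
    mags = np.abs(y_fft)
    # Keep coefficients above the 25th-percentile magnitude, as in the question.
    kept = y_fft[mags > np.percentile(mags, 25)]
    fitted = np.fft.ifft(kept).real
    return np.column_stack((fitted, np.full(len(fitted), year)))


def fit_all_years_parallel(dat, yr_idx, pix_idx, processes=None):
    """Split the data by year in the parent, then fit the years in parallel."""
    tasks = []
    for year in np.unique(dat[:, yr_idx]):
        rows = dat[dat[:, yr_idx] == year]
        rows = rows[~np.isnan(rows).any(axis=1)]  # drop rows containing NaN
        tasks.append((year, rows[:, pix_idx]))
    # Prefer fork (POSIX) so workers inherit this module instead of re-importing it.
    methods = mp.get_all_start_methods()
    ctx = mp.get_context("fork" if "fork" in methods else None)
    with ctx.Pool(processes=processes) as pool:
        pieces = pool.map(fit_year_task, tasks)
    return np.concatenate(pieces, axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    demo = np.column_stack((np.repeat([2000.0, 2001.0, 2002.0], 64), rng.normal(size=192)))
    print(fit_all_years_parallel(demo, 0, 1).shape)
```

Process start-up and pickling the per-year arrays have a cost of their own, so this only pays off when each year's work is substantial.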