python - NumPy mask array limiting the frequency of masked values


starting array:

a = np.array([1,1,1,2,3,4,5,5]) 

and filter:

m = np.array([1,5]) 

I am building the mask with:

b = np.in1d(a,m) 

that correctly returns:

array([ True,  True,  True, False, False, False,  True,  True], dtype=bool)

I need to limit the number of boolean Trues for each unique value to a maximum of 2 (e.g. 1 is masked 2 times instead of three). The resulting mask could appear as any of the following (no matter which of the real True values are kept):

array([ True,  True, False, False, False, False,  True,  True], dtype=bool)

or

array([ True, False,  True, False, False, False,  True,  True], dtype=bool)

or

array([False,  True,  True, False, False, False,  True,  True], dtype=bool)

Ideally this would be a kind of "random" masking with a limited frequency per value. So far I have tried randomly selecting from the original unique elements in the array, but the mask then selects the True values no matter their frequency.

For the generic case of an unsorted input array, here's one approach based on np.searchsorted -

n = 2 # parameter to decide how many duplicates are allowed

sortidx = a.argsort()
idx = np.searchsorted(a,m,sorter=sortidx)[:,None] + np.arange(n)
lim_counts = (a[:,None] == m).sum(0).clip(max=n)
idx_clipped = idx[lim_counts[:,None] > np.arange(n)]
out = np.in1d(np.arange(a.size),idx_clipped)[sortidx.argsort()]
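To make the intermediate steps of the snippet above concrete, here is a minimal sketch that traces them on the arrays from the question. Two things in it are my own assumptions rather than part of the original snippet: it uses np.isin (the newer spelling of np.in1d, so a reasonably recent NumPy is assumed), and it asks for a stable sort so that the occurrences kept are the first n in the original order.

```python
import numpy as np

a = np.array([1, 1, 1, 2, 3, 4, 5, 5])
m = np.array([1, 5])
n = 2  # at most n True entries per value of m

# Stable sort (an addition here) so equal values keep their original order.
sortidx = a.argsort(kind="stable")

# Position in sorted order where each value of m starts: [0, 6].
start = np.searchsorted(a, m, sorter=sortidx)

# Candidate sorted positions for the first n occurrences: [[0, 1], [6, 7]].
idx = start[:, None] + np.arange(n)

# Occurrences of each m value in a, capped at n: [2, 2].
lim_counts = (a[:, None] == m).sum(0).clip(max=n)

# Keep only the candidates that correspond to real occurrences.
idx_clipped = idx[lim_counts[:, None] > np.arange(n)]

# Mark those sorted positions, then map back to the original order.
out = np.isin(np.arange(a.size), idx_clipped)[sortidx.argsort()]
print(out)
```

The clipping step matters when a value of m occurs fewer than n times in a: without it, idx would spill over into positions that belong to the next larger value.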

sample run -

In [37]: a
Out[37]: array([5, 1, 4, 2, 1, 3, 5, 1])

In [38]: m
Out[38]: [1, 2, 5]

In [39]: n
Out[39]: 2

In [40]: out
Out[40]: array([ True,  True, False,  True,  True, False,  True, False], dtype=bool)
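The question also asks for the kept occurrences to be chosen at "random", which the approach above does not attempt. One possible way to get that, sketched here under my own assumptions, is to shuffle the positions before a stable sort, so that the first n entries of each group of equal values are a random pick. The function name limited_random_mask and its rng parameter are hypothetical, introduced only for this illustration.

```python
import numpy as np

def limited_random_mask(a, m, n, rng=None):
    """Boolean mask over a: True where a is in m, with at most n True
    entries per distinct value, the kept occurrences chosen at random."""
    rng = np.random.default_rng(rng)  # accepts None, a seed, or a Generator
    a = np.asarray(a)

    # Shuffle positions first; a stable sort of the shuffled values then
    # groups equal values together in random order within each group.
    perm = rng.permutation(a.size)
    order = perm[np.argsort(a[perm], kind="stable")]
    sorted_a = a[order]

    # Rank of each element inside its group of equal values (0, 1, 2, ...).
    group_start = np.searchsorted(sorted_a, sorted_a, side="left")
    rank = np.arange(a.size) - group_start

    # Keep values that are in m and are among the first n of their group.
    keep_sorted = np.isin(sorted_a, m) & (rank < n)

    out = np.zeros(a.size, dtype=bool)
    out[order[keep_sorted]] = True
    return out

a = np.array([1, 1, 1, 2, 3, 4, 5, 5])
mask = limited_random_mask(a, [1, 5], n=2, rng=0)
print(mask)  # two of the first three entries and both 5s are True
```

Passing a fixed seed as rng makes the choice reproducible; leaving it as None gives a different pick on each call.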
