I'm new to Python and need advice on the following. I have a file with several fields, for example:
# duplicates
name1 14019 3 0.5564 0.0929 0.6494
name1 14022 0 0.5557 0.0990 0.6547
name1 14016 0 0.5511 0.0984 0.6495
name2 11 8 0.5119 0.0938 0.6057
name2 12 18 0.5331 0.0876 0.6206
name3 16 20 0.5172 0.0875 0.6047
name3 17 29 0.5441 0.0657 0.6098

# without duplicates
name1 14022 0 0.5557 0.0990 0.6547
name2 12 18 0.5331 0.0876 0.6206
name3 17 29 0.5441 0.0657 0.6098
The first field is a name; the other fields are numbers (from a prediction). There are duplicate predictions that share the same name but have different values. The task is to remove the duplicates by comparing the last field: the line with the maximum value in the last column should be kept.
I'm stuck at the step of comparing the last fields of the duplicate entries. Should I use a lambda, or is direct filtering possible? Is using lists the right approach, or can it be done on the fly while reading the file row by row?
Any help is appreciated!
import csv

fi = open("filein.txt", "rb")
fo = open("fileout.txt", "wb")
reader = csv.reader(fi, delimiter=' ')
writer = csv.writer(fo, delimiter=' ')
names = set()
datum = []   # first row seen for each name
datum2 = []  # later (duplicate) rows
for row in reader:
    row_new = [row[0], row[3], row[4], row[5]]
    if row[0] not in names:
        names.add(row[0])
        datum.append(row_new)
        writer.writerow(row_new)
    else:
        datum2.append(row_new)  # duplicates collected here; the comparison step is missing
The code below may be of use; I did it using a dictionary:
import csv

fi = open("filein.txt", "rb")
reader = csv.reader(fi, delimiter=' ')
best = {}  # renamed from `dict` to avoid shadowing the built-in
for row in reader:
    if row[0] in best:
        # keep the entry whose last field is larger
        if float(best[row[0]][-1]) < float(row[-1]):
            best[row[0]] = row[1:]
    else:
        best[row[0]] = row[1:]
print best
This outputs:
{'name2': ['12', '18', '0.5331', '0.0876', '0.6206'], 'name3': ['17', '29', '0.5441', '0.0657', '0.6098'], 'name1': ['14022', '0', '0.5557', '0.0990', '0.6547']}
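If you prefer the lambda-based filtering the question asks about, the same deduplication can also be written by sorting the rows and grouping them by name, then taking the maximum of each group keyed on the last field. A minimal Python 3 sketch (the `dedupe` helper and the inline sample rows are my own illustration, not part of the original code):

```python
import csv
from itertools import groupby
from operator import itemgetter

def dedupe(rows):
    """For each name (field 0), keep the row whose last field is largest."""
    rows = sorted(rows, key=itemgetter(0))  # groupby needs sorted input
    return [max(group, key=lambda r: float(r[-1]))
            for _, group in groupby(rows, key=itemgetter(0))]

if __name__ == "__main__":
    # In practice the rows would come from csv.reader(open(...), delimiter=' ')
    rows = [
        ["name1", "14019", "3", "0.5564", "0.0929", "0.6494"],
        ["name1", "14022", "0", "0.5557", "0.0990", "0.6547"],
        ["name2", "11", "8", "0.5119", "0.0938", "0.6057"],
    ]
    for row in dedupe(rows):
        print(" ".join(row))
```

The dictionary approach above is still preferable for large files, since it works in a single pass without sorting; this variant is mainly useful if you already have all rows in a list.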