Python - optimal method to parse a JSON object in a datafile


I am trying to set up a simple data file format, and I am working with these files in Python for analysis. The format consists of header information, followed by the data. For syntax and future extensibility reasons, I want to use a JSON object for the header information. An example file looks like this:

{   "name": "my material",   "sample-id": null,   "description": "some material",   "funit": "mhz",   "filetype": "material_data" } 18  6.269311533 0.128658208 0.962033017 0.566268827 18.10945274 6.268810641 0.128691962 0.961950095 0.565591807 18.21890547 6.268312637 0.128725463 0.961814928 0.564998228... 

If the data length/structure is always the same, this is not hard to parse. However, it brought to mind the question of the most flexible way to parse out the JSON object, given an unknown number of lines, an unknown number of nested curly braces, and potentially more than one JSON object in the file.

If there is only one JSON object in the file, one can use a regular expression:

import re

with open(fname, 'r') as fp:
    fstring = fp.read()

# re.S lets '.' match newlines, so the pattern can span the whole object
json_string = re.search('{.*}', fstring, flags=re.S).group()

However, if there is more than one JSON string and I want to grab the first one, I need to use something like this:

def grab_json(mystring):
    lbracket = 0
    rbracket = 0
    lbracket_pos = 0
    rbracket_pos = 0

    # find the first opening brace
    for i in range(len(mystring)):
        if mystring[i] == '{':
            lbracket = 1
            lbracket_pos = i
            break

    # scan forward, counting braces until they balance
    for i in range(lbracket_pos + 1, len(mystring)):
        if mystring[i] == '}':
            rbracket += 1
            if rbracket == lbracket:
                rbracket_pos = i
                break
        elif mystring[i] == '{':
            lbracket += 1

    json_string = mystring[lbracket_pos : rbracket_pos + 1]
    return json_string, lbracket_pos, rbracket_pos

json_string, beg_pos, end_pos = grab_json(fstring)

I guess the question is always: is there a better way to do this? Better meaning simpler code, more flexible code, more robust code, or really anything?

The easiest solution, as Klaus suggested, is to just use JSON for the entire file. That makes life much simpler because writing is just json.dump and reading is just json.load.
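A minimal sketch of that approach, assuming the numeric rows can simply live under a key in the same object (the key name "data" and the file name are just illustrative):

import json

# sketch only: header fields plus the data rows in one JSON object
record = {
    "name": "my material",
    "funit": "mhz",
    "data": [
        [18.0,        6.269311533, 0.128658208, 0.962033017, 0.566268827],
        [18.10945274, 6.268810641, 0.128691962, 0.961950095, 0.565591807],
    ],
}

# writing
with open('myfile.json', 'w') as fp:
    json.dump(record, fp)

# reading
with open('myfile.json', 'r') as fp:
    record = json.load(fp)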

A second solution would be to put the metadata in a separate file, which keeps reading and writing simple at the expense of having multiple files for each data set.
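A sketch of that, where the paired file names ('mydata.json' / 'mydata.txt') are purely illustrative and metadata is a dict while data is the raw text block:

import json

# write the header to one file and the raw data block to another
with open('mydata.json', 'w') as fp:
    json.dump(metadata, fp)
with open('mydata.txt', 'w') as fp:
    fp.write(data)

# read them back
with open('mydata.json', 'r') as fp:
    metadata = json.load(fp)
with open('mydata.txt', 'r') as fp:
    data = fp.read()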

A third solution would be, when writing the file to disk, to prepend the length of the JSON data. Then writing might look something like:

metadata_json = json.dumps(metadata)
myfile.write('%d\n' % len(metadata_json))
myfile.write(metadata_json)
myfile.write(data)

Then reading looks something like:

with open('myfile') as fd:
    header_len = fd.readline()
    metadata_json = fd.read(int(header_len))
    metadata = json.loads(metadata_json)
    data = fd.read()

A fourth option would be to adopt an existing storage format (maybe HDF?) that already has the features you are looking for in terms of storing both data and metadata in the same file.
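For example, a rough sketch with h5py, where the dataset name 'data', the attribute keys, and the file names are just placeholders, and the numeric columns are assumed to be loadable into a NumPy array:

import h5py
import numpy as np

values = np.loadtxt('mydata.txt')  # hypothetical source of the numeric columns

# metadata goes into HDF5 attributes, data into a dataset in the same file
with h5py.File('mydata.h5', 'w') as f:
    dset = f.create_dataset('data', data=values)
    dset.attrs['name'] = 'my material'
    dset.attrs['funit'] = 'mhz'

# reading both back from the single file
with h5py.File('mydata.h5', 'r') as f:
    values = f['data'][...]
    name = f['data'].attrs['name']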

