Big data image processing in Python (numpy)


I have a 7 GB .tgz archive of thousands of high-res photos that I'd like to work with in Python. I'm able to do the following for a single image, but I'm not sure how to work with such a large amount of data, or with the .tgz file format. I have googled, but perhaps I'm not using the best search terms. Explicit code would help me understand.

How do I load the .tgz data into Python? (pickle, numpy, tarfile? `pip install tarfile` fails.) I want to convert the images to numpy arrays.

How do I make all of the images a set resolution?

How do I convert all of the images to greyscale?

The goal is to manipulate the data for use in a convolutional neural network (CNN).

I'm not sure why handling the archive is a problem. It's quite obvious that a .tgz file should be handled using tarfile. tarfile is a built-in module in Python, so you do not need to pip install it.

    #!/usr/bin/env python
    import tarfile

    # open the tar archive for reading (gzip-compressed)
    itgz = tarfile.open("photos.tgz", "r:gz")

    # open a second archive for saving the edited images
    otgz = tarfile.open("photos_edited.tgz", "w:gz")

    # handle the images one by one
    for img_name in itgz.getnames():
        # extract whatever you want
        itgz.extract(img_name)

        # ... image processing with numpy, PIL, or the tool of your choice ...

        # if you want to save the edited images to a tar file
        otgz.add(img_name)

    itgz.close()
    otgz.close()
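To address the other two questions (fixed resolution and greyscale), here is one way the processing step could look, a sketch assuming Pillow (`pip install Pillow`) and an archive of ordinary image files. It reads each member in memory via `extractfile` instead of extracting to disk, and stacks the results into a single numpy array for a CNN. The `TARGET_SIZE` and the demo archive it builds first are assumptions for illustration; with your real 7 GB photos.tgz you would skip the demo-building part:

    import io
    import tarfile

    import numpy as np
    from PIL import Image  # pip install Pillow

    # --- build a tiny stand-in archive so the example is self-contained ---
    # (with a real photos.tgz, skip this block)
    with tarfile.open("photos.tgz", "w:gz") as demo:
        for i in range(3):
            rgb = np.random.randint(0, 256, (64, 48, 3), dtype=np.uint8)
            buf = io.BytesIO()
            Image.fromarray(rgb).save(buf, format="PNG")
            buf.seek(0)
            info = tarfile.TarInfo(name=f"photo_{i}.png")
            info.size = buf.getbuffer().nbytes
            demo.addfile(info, buf)

    TARGET_SIZE = (128, 128)  # assumed; pick whatever your CNN expects

    arrays = []
    with tarfile.open("photos.tgz", "r:gz") as tgz:
        for member in tgz.getmembers():
            if not member.isfile():
                continue
            # read the member in memory instead of extracting to disk
            data = tgz.extractfile(member).read()
            img = Image.open(io.BytesIO(data))
            # "L" mode = 8-bit greyscale; resize to one fixed resolution
            img = img.convert("L").resize(TARGET_SIZE)
            arrays.append(np.asarray(img, dtype=np.float32) / 255.0)

    # stack everything into one (N, H, W) array for the CNN
    dataset = np.stack(arrays)
    print(dataset.shape)  # (3, 128, 128) for the demo archive

Scaling to 0..1 floats is a common CNN convention, not a requirement; streaming one member at a time this way also keeps you from ever needing the whole 7 GB decompressed on disk at once.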
