I have to download multiple files from FTP links. The download stops with the above error after 5 files, irrespective of their order. Can anyone suggest a solution?
import os
import urllib
import zipfile

import pandas as pd

# XBRL archives to download from the SEC EDGAR FTP server
zipfilepath = ['ftp://ftp.sec.gov/edgar/data/1000069/000089418911000620/0000894189-11-000620-xbrl.zip',
               'ftp://ftp.sec.gov/edgar/data/1000180/000100018011000006/0001000180-11-000006-xbrl.zip',
               'ftp://ftp.sec.gov/edgar/data/1000228/000100022811000014/0001000228-11-000014-xbrl.zip',
               'ftp://ftp.sec.gov/edgar/data/1000229/000100022911000015/0001000229-11-000015-xbrl.zip',
               'ftp://ftp.sec.gov/edgar/data/1000351/000089418911000615/0000894189-11-000615-xbrl.zip',
               'ftp://ftp.sec.gov/edgar/data/1000351/000089418911000655/0000894189-11-000655-xbrl.zip',
               'ftp://ftp.sec.gov/edgar/data/1000697/000095012311018381/0000950123-11-018381-xbrl.zip',
               'ftp://ftp.sec.gov/edgar/data/1000753/000114036111008714/0001140361-11-008714-xbrl.zip',
               'ftp://ftp.sec.gov/edgar/data/1001039/000119312511027450/0001193125-11-027450-xbrl.zip',
               'ftp://ftp.sec.gov/edgar/data/1001082/000110465911009436/0001104659-11-009436-xbrl.zip',
               'ftp://ftp.sec.gov/edgar/data/100122/000095012311020431/0000950123-11-020431-xbrl.zip',
               'ftp://ftp.sec.gov/edgar/data/1001250/000110465911005139/0001104659-11-005139-xbrl.zip',
               'ftp://ftp.sec.gov/edgar/data/1001288/000095012311019815/0000950123-11-019815-xbrl.zip',
               'ftp://ftp.sec.gov/edgar/data/1001604/000100160411000022/0001001604-11-000022-xbrl.zip',
               'ftp://ftp.sec.gov/edgar/data/1001838/000110465911011083/0001104659-11-011083-xbrl.zip',
               'ftp://ftp.sec.gov/edgar/data/1002047/000119312511056223/0001193125-11-056223-xbrl.zip',
               'ftp://ftp.sec.gov/edgar/data/1002517/000095012311011086/0000950123-11-011086-xbrl.zip',
               'ftp://ftp.sec.gov/edgar/data/1002638/000119312511022882/0001193125-11-022882-xbrl.zip',
               'ftp://ftp.sec.gov/edgar/data/1002718/000119312511040571/0001193125-11-040571-xbrl.zip',
               'ftp://ftp.sec.gov/edgar/data/1002718/000119312511042365/0001193125-11-042365-xbrl.zip']

tempfolderpath = "<give path>"
tempdownloadpath = os.path.join(tempfolderpath, "xbrl.zip")
xbrlfinal = pd.DataFrame()

for inds, paths in enumerate(zipfilepath):
    print "processing xmls " + str(inds+1) + " of " + str(len(zipfilepath))
    # Download the archive to the temporary path
    urllib.urlretrieve(paths, tempdownloadpath)
    # Extract the first member of the archive into the temp folder
    fh = open(tempdownloadpath, 'rb')
    z = zipfile.ZipFile(fh)
    files = z.extract(z.namelist()[0], tempfolderpath)
    z.close()
    fh.close()
I figured out the answer. The download works fine in R, so the site itself is not restricting requests. I tried different packages in Python: urllib, wget, and requests did not work, but urllib2 worked. The code is given below:
import urllib2

# Inside the loop, in place of the urllib.urlretrieve call
response = urllib2.urlopen(paths)
zipcontent = response.read()
with open(tempdownloadpath, 'wb') as f:
    f.write(zipcontent)
Also, urllib2 was 5 times faster than the rest.
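For anyone on Python 3, where urllib2 no longer exists, the same idea can be sketched with urllib.request, its standard-library successor. This is only a minimal sketch, not the code from the answer: the function name download_and_extract_first and its zip_name parameter are made up for illustration, and the post does not confirm that this variant avoids the stall after 5 files.

```python
import os
import shutil
import urllib.request
import zipfile


def download_and_extract_first(url, dest_folder, zip_name="xbrl.zip"):
    """Download one zip archive and extract its first member into dest_folder.

    Hypothetical helper: the name and the zip_name default are assumptions.
    """
    zip_path = os.path.join(dest_folder, zip_name)
    # Open the URL and stream the response to disk; both the connection
    # and the file are closed before the next download starts.
    with urllib.request.urlopen(url) as response, open(zip_path, "wb") as f:
        shutil.copyfileobj(response, f)
    # Extract only the first member, as the original loop does.
    with zipfile.ZipFile(zip_path) as z:
        return z.extract(z.namelist()[0], dest_folder)
```

In the loop from the question, this would be called once per entry, for example download_and_extract_first(paths, tempfolderpath). The with blocks close each connection and file handle explicitly instead of leaving them open between iterations.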