I want to load a lot of data from Google Datastore.
So, step 1: run a query (with keys_only=True), loop through the cursors, each one pointing to the start of a page of 600 objects, and store the cursors in a local variable.
Step 2: spin off one thread per cursor, loading and processing the 600 objects in each thread.
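A minimal sketch of the two-step pattern described above. It does not use the real Datastore API; `DATA`, `collect_cursors`, `load_page`, the page size, and the worker count are all stand-in assumptions, with a cursor modeled simply as the offset of the first object on a page:

```python
from concurrent.futures import ThreadPoolExecutor

PAGE_SIZE = 600

# Hypothetical stand-in for the Datastore: in reality each thread would
# re-issue the identical keys-only query with its own start cursor.
DATA = list(range(5000))

def collect_cursors(page_size=PAGE_SIZE):
    """Step 1: walk the result set once and record the start of each page."""
    return list(range(0, len(DATA), page_size))

def load_page(cursor, page_size=PAGE_SIZE):
    """Step 2: load and process one page, starting from a stored cursor."""
    return DATA[cursor:cursor + page_size]

def load_all_parallel():
    cursors = collect_cursors()
    with ThreadPoolExecutor(max_workers=8) as pool:
        pages = list(pool.map(load_page, cursors))
    # Flatten the pages back into one list of objects.
    return [obj for page in pages for obj in page]
```

With a deterministic stub like this every object is loaded exactly once; the inconsistency described below would show up against the real Datastore as overlapping or missing pages.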
That is not the usual way cursors are used.
However, it looks correct to me. The actual query strings in step 1 and step 2 are identical. It resembles the usual stateless web use case, where a user may ask for the next page, go back, or reload a previous page; there is no requirement that a cursor come directly from the result of the immediately preceding cursor query.
I don't want to step through the cursors sequentially and spin off threads only to parallelize the processing of the objects loaded by a given cursor query, because I want to parallelize the actual IO-intensive querying of the DB.
I am getting inconsistencies in the results that seem to involve missed pages and duplicate loading of objects. Is this the correct way to multithread the loading of large amounts of data from Google Datastore? And if not, what is?
I recommend a different approach: run one query that cycles through all of the entities. This happens quickly (don't forget to set the batch size to 500; the default is 10). You may still need to use cursors, if the query is huge.
For every entity, create a task using the Task Queue API and add it to a task queue. These tasks can be executed in parallel, and you can set parameters on the queue.
With this approach you don't have to worry about threads; you can have tasks retry automatically when they fail, and so on. I find that an important part of App Engine's appeal: you write your own logic, and let App Engine worry about the execution part.
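On App Engine the tasks would be created with `taskqueue.add()` and executed by the platform as HTTP requests. The sketch below only mimics that fan-out locally with a worker pool; the queue, the doubling "work" in `worker`, and the worker count are assumptions for illustration, not App Engine API:

```python
import queue
import threading

def enqueue_tasks(entity_keys, task_queue):
    """Local stand-in for taskqueue.add(): one task per entity key."""
    for key in entity_keys:
        task_queue.put(key)

def worker(task_queue, results, lock):
    """Drain the queue; safe because all tasks are enqueued before workers start."""
    while True:
        try:
            key = task_queue.get_nowait()
        except queue.Empty:
            return
        processed = key * 2  # placeholder for the real per-entity work
        with lock:
            results.append(processed)
        task_queue.task_done()

def process_entities(entity_keys, num_workers=4):
    task_queue = queue.Queue()
    enqueue_tasks(entity_keys, task_queue)
    results, lock = [], threading.Lock()
    threads = [threading.Thread(target=worker, args=(task_queue, results, lock))
               for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

The point of the real Task Queue version is that retries, rate limiting, and parallelism are configured on the queue rather than coded by hand.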