I have a number of jobs to execute. Each job consists of a buffer write, a kernel execution, and a buffer read, and these operations must of course be executed in that order. The jobs are independent of each other, so they can be executed concurrently.
Is there a performance difference between using multiple in-order command queues (like CUDA streams) and a single out-of-order queue with equivalent synchronization? Which is better?
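To make the setup concrete, here is a toy model of the multiple-in-order-queue approach. It is not OpenCL code: `run_in_order_queues` and the `(job, step)` tuples are hypothetical, and the round-robin loop stands in for a device that is free to interleave independent queues. The point it shows is that within each in-order queue the write, kernel, read order holds without any extra synchronization.

```python
from collections import deque

# Toy model only (hypothetical names, not the OpenCL API): every job is
# write -> kernel -> read, and each job is submitted to its own in-order queue.
def run_in_order_queues(num_jobs):
    queues = [deque([(j, "write"), (j, "kernel"), (j, "read")])
              for j in range(num_jobs)]
    trace = []
    # Round-robin over the queues to mimic a device interleaving independent
    # queues; within a single queue, commands always leave in FIFO order.
    while any(queues):
        for q in queues:
            if q:
                trace.append(q.popleft())
    return trace

trace = run_in_order_queues(3)
print(trace)
```

Commands from different jobs interleave, but no job's read can ever overtake its own write or kernel.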
Some implementations don't support out-of-order command queues.
Based on your description, I'd use multiple in-order queues. With a single out-of-order queue you have to use events to enforce the ordering within each job's "virtual queue", which is extra work for you.
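The extra work on a single out-of-order queue can be illustrated with another toy model. In real OpenCL host code this corresponds to creating the queue with `CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE` and passing each command's predecessor event in the `event_wait_list` of `clEnqueueNDRangeKernel` / `clEnqueueReadBuffer`. The sketch below does not call OpenCL: `enqueue_jobs`, `run_out_of_order` and `respects_job_order` are hypothetical, and the scheduler deliberately picks the most recently enqueued ready command. A real runtime may pick any ready command.

```python
# Toy model only (hypothetical, not the OpenCL API): an out-of-order queue runs
# any command whose wait list is satisfied, regardless of submission order.
def enqueue_jobs(num_jobs, use_events=True):
    commands = []  # list of (command, wait_list)
    for j in range(num_jobs):
        w, k, r = (j, "write"), (j, "kernel"), (j, "read")
        commands.append((w, []))
        # With events, the kernel waits on the write and the read on the kernel.
        commands.append((k, [w] if use_events else []))
        commands.append((r, [k] if use_events else []))
    return commands

def run_out_of_order(commands):
    trace, finished, pending = [], set(), list(commands)
    while pending:
        # Scan from the newest command to show submission order is ignored.
        for i in range(len(pending) - 1, -1, -1):
            name, waits = pending[i]
            if all(w in finished for w in waits):
                trace.append(name)
                finished.add(name)
                del pending[i]
                break
        else:
            raise RuntimeError("no runnable command: unsatisfiable wait list")
    return trace

def respects_job_order(trace):
    pos = {name: i for i, name in enumerate(trace)}
    jobs = {j for j, _ in trace}
    return all(pos[(j, "write")] < pos[(j, "kernel")] < pos[(j, "read")]
               for j in jobs)

print(run_out_of_order(enqueue_jobs(2, use_events=True)))
print(run_out_of_order(enqueue_jobs(2, use_events=False)))
```

Without the wait lists, the model happily runs a job's read before its write. That is exactly the dependency bookkeeping an in-order queue per job gives you implicitly.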