Posted to issues@impala.apache.org by "Tim Armstrong (JIRA)" <ji...@apache.org> on 2017/07/24 18:01:00 UTC

[jira] [Created] (IMPALA-5705) Parallelise read I/O by prefetching pages when iterating over unpinned BufferedTupleStream

Tim Armstrong created IMPALA-5705:
-------------------------------------

             Summary: Parallelise read I/O by prefetching pages when iterating over unpinned BufferedTupleStream
                 Key: IMPALA-5705
                 URL: https://issues.apache.org/jira/browse/IMPALA-5705
             Project: IMPALA
          Issue Type: Sub-task
          Components: Backend
    Affects Versions: Impala 2.10.0
            Reporter: Tim Armstrong


We could improve read I/O performance when iterating over unpinned streams in the hash join and hash aggregation by using additional memory to prefetch pages ahead of the current read position. Currently, iterating over an unpinned stream uses only a single buffer and issues a read I/O only after the previous page has been fully processed, so the read I/O never overlaps with row processing.

This slows down processing of spilled probe rows in the hash join and spilled unaggregated rows in the hash aggregation.
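To make the idea concrete, below is a minimal, self-contained C++ sketch (illustration only, not Impala code) of overlapping read I/O with row processing by keeping a window of in-flight page reads ahead of the read position. Names like Page, ReadPageFromDisk and kReadAheadBytes are made up for the example, and std::async stands in for whatever asynchronous read mechanism would actually be used.

#include <cstdint>
#include <deque>
#include <future>
#include <vector>

struct Page {
  int64_t len = 0;
  std::vector<uint8_t> data;
};

// Stand-in for a blocking disk read of one spilled page.
Page ReadPageFromDisk(int page_idx, int64_t page_len) {
  Page p;
  p.len = page_len;
  p.data.resize(static_cast<size_t>(page_len));  // Pretend this came from the spill file.
  return p;
}

void ProcessPage(const Page& p) { /* e.g. probe the hash table with the page's rows */ }

int main() {
  const int kNumPages = 64;                         // Pages in the unpinned stream.
  const int64_t kPageLen = 2 * 1024 * 1024;         // Page size.
  const int64_t kReadAheadBytes = 8 * 1024 * 1024;  // Extra memory spent on prefetching.

  std::deque<std::future<Page>> in_flight;  // Reads issued ahead of the read position.
  int64_t in_flight_bytes = 0;
  int next_to_issue = 0;

  while (next_to_issue < kNumPages || !in_flight.empty()) {
    // Issue reads ahead of the current position until the read-ahead budget
    // is used up. With a budget of 0 this degenerates to today's behaviour of
    // reading one page at a time.
    while (next_to_issue < kNumPages && in_flight_bytes <= kReadAheadBytes) {
      in_flight.push_back(std::async(std::launch::async, ReadPageFromDisk,
                                     next_to_issue, kPageLen));
      in_flight_bytes += kPageLen;
      ++next_to_issue;
    }
    // Process the oldest page; the prefetched reads overlap with this work.
    Page page = in_flight.front().get();
    in_flight.pop_front();
    in_flight_bytes -= page.len;
    ProcessPage(page);
  }
  return 0;
}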

We'd need to figure out how to expose this in the BufferedTupleStream interface. Probably, when preparing to read a stream, the client could specify a number of bytes to read ahead in the stream; this would cost additional memory but improve performance.
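As a strawman for the interface change (a hypothetical sketch, not the real BufferedTupleStream signatures), the read-ahead budget could be an extra argument passed when preparing the stream for reading, with 0 preserving current behaviour:

// Hypothetical sketch only -- not the actual BufferedTupleStream API.
#include <cstdint>

class UnpinnedStreamReaderSketch {
 public:
  // 'read_ahead_bytes' is memory beyond the single read buffer that the
  // stream may spend prefetching upcoming pages; 0 keeps today's behaviour
  // of fetching one page at a time. Returns false if the extra buffer
  // reservation cannot be obtained.
  bool PrepareForRead(bool delete_on_read, int64_t read_ahead_bytes) {
    delete_on_read_ = delete_on_read;
    read_ahead_bytes_ = read_ahead_bytes;
    return true;  // A real implementation would try to increase its reservation.
  }

 private:
  bool delete_on_read_ = false;
  int64_t read_ahead_bytes_ = 0;
};

int main() {
  UnpinnedStreamReaderSketch probe_stream;
  // e.g. the hash join's spilled probe stream asking for 8MB of read-ahead.
  probe_stream.PrepareForRead(/*delete_on_read=*/true, /*read_ahead_bytes=*/8 << 20);
  return 0;
}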



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)