Posted to issues@impala.apache.org by "Tim Armstrong (JIRA)" <ji...@apache.org> on 2017/07/24 18:01:00 UTC
[jira] [Created] (IMPALA-5705) Parallelise read I/O by prefetching pages when iterating over unpinned BufferedTupleStream
Tim Armstrong created IMPALA-5705:
-------------------------------------
Summary: Parallelise read I/O by prefetching pages when iterating over unpinned BufferedTupleStream
Key: IMPALA-5705
URL: https://issues.apache.org/jira/browse/IMPALA-5705
Project: IMPALA
Issue Type: Sub-task
Components: Backend
Affects Versions: Impala 2.10.0
Reporter: Tim Armstrong
We could improve read I/O performance when iterating over unpinned streams in the hash join and hash aggregation by using additional memory to prefetch pages ahead of the current read position. Currently iterating over the unpinned stream only uses a single buffer, and only issues a read I/O when it has finished processing the previous page.
This slows down processing of spilled probe rows in the hash join and spilled unaggregated rows in the hash aggregation.
We'd need to figure out how to expose this in the BufferedTupleStream interface. Most likely, when preparing to read a stream, the client would specify a number of bytes to read ahead; this would cost additional memory but improve read throughput.
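A minimal sketch of the idea in C++ follows. This is not Impala's actual BufferedTupleStream API; the class and method names (PrefetchingStreamReader, PrepareForRead, GetNextPage) are hypothetical, and std::async stands in for handing reads to the disk I/O manager. The point is the shape of the interface: the client picks a read-ahead window, and each completed page read tops the window back up so I/O overlaps with processing of the current page.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <future>
#include <vector>

// Hypothetical sketch: a spilled stream stored as fixed-size "pages",
// iterated with a configurable read-ahead window so the next read I/O
// overlaps with processing of the current page.
class PrefetchingStreamReader {
 public:
  PrefetchingStreamReader(std::vector<std::vector<int>> pages,
                          size_t read_ahead_pages)
      : pages_(std::move(pages)), read_ahead_(read_ahead_pages) {}

  // Issue the initial read-ahead window before iteration begins.
  // In the real interface this is where the client's read-ahead
  // budget would translate into reserved buffer memory.
  void PrepareForRead() {
    while (in_flight_.size() < read_ahead_ + 1 &&
           next_to_issue_ < pages_.size()) {
      IssueRead(next_to_issue_++);
    }
  }

  // Returns false when the stream is exhausted; otherwise fills *page
  // with the next page and issues one more read to keep the window full.
  bool GetNextPage(std::vector<int>* page) {
    if (in_flight_.empty()) return false;
    // Blocks only if the oldest in-flight read has not completed yet.
    *page = in_flight_.front().get();
    in_flight_.pop_front();
    if (next_to_issue_ < pages_.size()) IssueRead(next_to_issue_++);
    return true;
  }

 private:
  void IssueRead(size_t idx) {
    // Simulated asynchronous read I/O; a real implementation would
    // enqueue the read with the disk I/O subsystem instead.
    in_flight_.push_back(std::async(std::launch::async,
                                    [this, idx] { return pages_[idx]; }));
  }

  std::vector<std::vector<int>> pages_;
  size_t read_ahead_;
  size_t next_to_issue_ = 0;
  std::deque<std::future<std::vector<int>>> in_flight_;
};
```

With read_ahead_pages = 0 this degenerates to today's behaviour (a single buffer, one read at a time); each extra page of read-ahead trades one page of memory for more I/O/compute overlap.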
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)