You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues-all@impala.apache.org by "Sahil Takiar (Jira)" <ji...@apache.org> on 2019/09/09 18:31:00 UTC
[jira] [Updated] (IMPALA-8525) preads should use hdfsPreadFully
rather than hdfsPread
[ https://issues.apache.org/jira/browse/IMPALA-8525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sahil Takiar updated IMPALA-8525:
---------------------------------
Description:
Impala preads (only enabled if {{use_hdfs_pread}} is true) use the {{hdfsPread}} API from libhdfs, which ultimately invokes {{PositionedReadable#read(long position, byte[] buffer, int offset, int length)}} in the HDFS-client.
{{PositionedReadable}} also exposes the method {{readFully(long position, byte[] buffer, int offset, int length)}}. The difference is that {{#read}} will "Read up to the specified number of bytes" whereas {{#readFully}} will "Read the specified number of bytes". So there is no guarantee that {{#read}} will read *all* of the request bytes.
Impala calls {{hdfsPread}} inside {{hdfs-file-reader.cc}} and invokes it inside a while loop until all the requested bytes have been read from the file. This can cause a few performance issues:
(1) if the underlying {{FileSystem}} does not support ByteBuffer reads (HDFS-2834) (e.g. S3A does not support this feature) then {{hdfsPread}} will allocate a Java array equal in size to specified length of the buffer; the call to {{PositionedReadable#read}} may only fill up the buffer partially; Impala will repeat the call to {{hdfsPread}} since the buffer was not filled, which will cause another large array allocation; this can result in a lot of wasted time doing unnecessary array allocations
(2) given that Impala calls {{hdfsPread}} in a while loop, there is no point in continuously calling {{hdfsPread}} when a single call to {{hdfsPreadFully}} will achieve the same thing (this doesn't actually affect performance much, but is unnecessary)
Prior solutions to this problem have been to introduce a "chunk-size" to Impala reads (https://gerrit.cloudera.org/#/c/63/ - S3: DiskIoMgr related changes for S3). However, with the migration to {{hdfsPreadFully}} the chunk-size is no longer necessary.
Furthermore, preads are most effective when the data is read all at once (e.g. in 8 MB chunks as specified by {{read_size}}) rather than in smaller chunks (typically 128K). For example, {{DFSInputStream#read(long position, byte[] buffer, int offset, int length)}} opens up remote block readers with a byte range determined by the value of {{length}} passed into the {{#read}} call. Similarly, {{S3AInputStream#readFully}} will issue an HTTP GET request with the size of the read specified by the given {{length}} (although fadvise must be set to RANDOM for this to work).
This work is dependent on exposing {{readFully}} via libhdfs first: HDFS-14564
was:
Impala preads (only enabled if {{use_hdfs_pread}} is true) use the {{hdfsPread}} API from libhdfs, which ultimately invokes {{PositionedReadable#read(long position, byte[] buffer, int offset, int length)}} in the HDFS-client.
{{PositionedReadable}} also exposes the method {{readFully(long position, byte[] buffer, int offset, int length)}}. The difference is that {{#read}} will "Read up to the specified number of bytes" whereas {{#readFully}} will "Read the specified number of bytes". So there is no guarantee that {{#read}} will read *all* of the request bytes.
Impala calls {{hdfsPread}} inside {{hdfs-file-reader.cc}} and invokes it inside a while loop until all the requested bytes have been read from the file. This can cause a few performance issues:
(1) if the underlying {{FileSystem}} does not support ByteBuffer reads (HDFS-2834) (e.g. S3A does not support this feature) then {{hdfsPread}} will allocate a Java array equal in size to specified length of the buffer; the call to {{PositionedReadable#read}} may only fill up the buffer partially; Impala will repeat the call to {{hdfsPread}} since the buffer was not filled, which will cause another large array allocation; this can result in a lot of wasted time doing unnecessary array allocations
(2) given that Impala calls {{hdfsPread}} in a while loop, there is no point in continuously calling {{hdfsPread}} when a single call to {{hdfsPreadFully}} will achieve the same thing (this doesn't actually affect performance much, but is unnecessary)
Prior solutions to this problem have been to introduce a "chunk-size" to Impala reads (https://gerrit.cloudera.org/#/c/63/ - S3: DiskIoMgr related changes for S3). However, with the migration to {{hdfsPreadFully}} the chunk-size is no longer necessary.
Furthermore, preads are most effective when the data is read all at once (e.g. in 8 MB chunks as specified by {{read_size}}) rather than in smaller chunks (typically 128K). For example, {{DFSInputStream#read(long position, byte[] buffer, int offset, int length)}} opens up remote block readers with a byte range determined by the value of {{length}} passed into the {{#read}} call. Similarly, {{S3AInputStream#readFully}} will issue an HTTP GET request with the size of the read specified by the given {{length}} (although fadvise must be set to RANDOM for this to work).
This work is dependent on exposing {{readFully}} via libhdfs first: HDFS-14478
> preads should use hdfsPreadFully rather than hdfsPread
> ------------------------------------------------------
>
> Key: IMPALA-8525
> URL: https://issues.apache.org/jira/browse/IMPALA-8525
> Project: IMPALA
> Issue Type: Improvement
> Components: Backend
> Reporter: Sahil Takiar
> Assignee: Sahil Takiar
> Priority: Major
>
> Impala preads (only enabled if {{use_hdfs_pread}} is true) use the {{hdfsPread}} API from libhdfs, which ultimately invokes {{PositionedReadable#read(long position, byte[] buffer, int offset, int length)}} in the HDFS-client.
> {{PositionedReadable}} also exposes the method {{readFully(long position, byte[] buffer, int offset, int length)}}. The difference is that {{#read}} will "Read up to the specified number of bytes" whereas {{#readFully}} will "Read the specified number of bytes". So there is no guarantee that {{#read}} will read *all* of the request bytes.
> Impala calls {{hdfsPread}} inside {{hdfs-file-reader.cc}} and invokes it inside a while loop until all the requested bytes have been read from the file. This can cause a few performance issues:
> (1) if the underlying {{FileSystem}} does not support ByteBuffer reads (HDFS-2834) (e.g. S3A does not support this feature) then {{hdfsPread}} will allocate a Java array equal in size to specified length of the buffer; the call to {{PositionedReadable#read}} may only fill up the buffer partially; Impala will repeat the call to {{hdfsPread}} since the buffer was not filled, which will cause another large array allocation; this can result in a lot of wasted time doing unnecessary array allocations
> (2) given that Impala calls {{hdfsPread}} in a while loop, there is no point in continuously calling {{hdfsPread}} when a single call to {{hdfsPreadFully}} will achieve the same thing (this doesn't actually affect performance much, but is unnecessary)
> Prior solutions to this problem have been to introduce a "chunk-size" to Impala reads (https://gerrit.cloudera.org/#/c/63/ - S3: DiskIoMgr related changes for S3). However, with the migration to {{hdfsPreadFully}} the chunk-size is no longer necessary.
> Furthermore, preads are most effective when the data is read all at once (e.g. in 8 MB chunks as specified by {{read_size}}) rather than in smaller chunks (typically 128K). For example, {{DFSInputStream#read(long position, byte[] buffer, int offset, int length)}} opens up remote block readers with a byte range determined by the value of {{length}} passed into the {{#read}} call. Similarly, {{S3AInputStream#readFully}} will issue an HTTP GET request with the size of the read specified by the given {{length}} (although fadvise must be set to RANDOM for this to work).
> This work is dependent on exposing {{readFully}} via libhdfs first: HDFS-14564
--
This message was sent by Atlassian Jira
(v8.3.2#803003)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-all-unsubscribe@impala.apache.org
For additional commands, e-mail: issues-all-help@impala.apache.org