You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues-all@impala.apache.org by "Sahil Takiar (Jira)" <ji...@apache.org> on 2019/09/09 18:31:00 UTC
[jira] [Updated] (IMPALA-8525) preads should use hdfsPreadFully rather than hdfsPread

     [ https://issues.apache.org/jira/browse/IMPALA-8525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sahil Takiar updated IMPALA-8525:
---------------------------------
    Description: 
Impala preads (only enabled if {{use_hdfs_pread}} is true) use the {{hdfsPread}} API from libhdfs, which ultimately invokes {{PositionedReadable#read(long position, byte[] buffer, int offset, int length)}} in the HDFS-client.

{{PositionedReadable}} also exposes the method {{readFully(long position, byte[] buffer, int offset, int length)}}. The difference is that {{#read}} will "Read up to the specified number of bytes" whereas {{#readFully}} will "Read the specified number of bytes". So there is no guarantee that {{#read}} will read *all* of the request bytes.

Impala calls {{hdfsPread}} inside {{hdfs-file-reader.cc}} and invokes it inside a while loop until all the requested bytes have been read from the file. This can cause a few performance issues:

(1) if the underlying {{FileSystem}} does not support ByteBuffer reads (HDFS-2834) (e.g. S3A does not support this feature) then {{hdfsPread}} will allocate a Java array equal in size to specified length of the buffer; the call to {{PositionedReadable#read}} may only fill up the buffer partially; Impala will repeat the call to {{hdfsPread}} since the buffer was not filled, which will cause another large array allocation; this can result in a lot of wasted time doing unnecessary array allocations

(2) given that Impala calls {{hdfsPread}} in a while loop, there is no point in continuously calling {{hdfsPread}} when a single call to {{hdfsPreadFully}} will achieve the same thing (this doesn't actually affect performance much, but is unnecessary)

Prior solutions to this problem have been to introduce a "chunk-size" to Impala reads (https://gerrit.cloudera.org/#/c/63/ - S3: DiskIoMgr related changes for S3). However, with the migration to {{hdfsPreadFully}} the chunk-size is no longer necessary.

Furthermore, preads are most effective when the data is read all at once (e.g. in 8 MB chunks as specified by {{read_size}}) rather than in smaller chunks (typically 128K). For example, {{DFSInputStream#read(long position, byte[] buffer, int offset, int length)}} opens up remote block readers with a byte range determined by the value of {{length}} passed into the {{#read}} call. Similarly, {{S3AInputStream#readFully}} will issue an HTTP GET request with the size of the read specified by the given {{length}} (although fadvise must be set to RANDOM for this to work).

This work is dependent on exposing {{readFully}} via libhdfs first: HDFS-14564

  was:
Impala preads (only enabled if {{use_hdfs_pread}} is true) use the {{hdfsPread}} API from libhdfs, which ultimately invokes {{PositionedReadable#read(long position, byte[] buffer, int offset, int length)}} in the HDFS-client.

{{PositionedReadable}} also exposes the method {{readFully(long position, byte[] buffer, int offset, int length)}}. The difference is that {{#read}} will "Read up to the specified number of bytes" whereas {{#readFully}} will "Read the specified number of bytes". So there is no guarantee that {{#read}} will read *all* of the request bytes.

Impala calls {{hdfsPread}} inside {{hdfs-file-reader.cc}} and invokes it inside a while loop until all the requested bytes have been read from the file. This can cause a few performance issues:

(1) if the underlying {{FileSystem}} does not support ByteBuffer reads (HDFS-2834) (e.g. S3A does not support this feature) then {{hdfsPread}} will allocate a Java array equal in size to specified length of the buffer; the call to {{PositionedReadable#read}} may only fill up the buffer partially; Impala will repeat the call to {{hdfsPread}} since the buffer was not filled, which will cause another large array allocation; this can result in a lot of wasted time doing unnecessary array allocations

(2) given that Impala calls {{hdfsPread}} in a while loop, there is no point in continuously calling {{hdfsPread}} when a single call to {{hdfsPreadFully}} will achieve the same thing (this doesn't actually affect performance much, but is unnecessary)

Prior solutions to this problem have been to introduce a "chunk-size" to Impala reads (https://gerrit.cloudera.org/#/c/63/ - S3: DiskIoMgr related changes for S3). However, with the migration to {{hdfsPreadFully}} the chunk-size is no longer necessary.

Furthermore, preads are most effective when the data is read all at once (e.g. in 8 MB chunks as specified by {{read_size}}) rather than in smaller chunks (typically 128K). For example, {{DFSInputStream#read(long position, byte[] buffer, int offset, int length)}} opens up remote block readers with a byte range determined by the value of {{length}} passed into the {{#read}} call. Similarly, {{S3AInputStream#readFully}} will issue an HTTP GET request with the size of the read specified by the given {{length}} (although fadvise must be set to RANDOM for this to work).

This work is dependent on exposing {{readFully}} via libhdfs first: HDFS-14478


> preads should use hdfsPreadFully rather than hdfsPread
> ------------------------------------------------------
>
>                 Key: IMPALA-8525
>                 URL: https://issues.apache.org/jira/browse/IMPALA-8525
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Backend
>            Reporter: Sahil Takiar
>            Assignee: Sahil Takiar
>            Priority: Major
>
> Impala preads (only enabled if {{use_hdfs_pread}} is true) use the {{hdfsPread}} API from libhdfs, which ultimately invokes {{PositionedReadable#read(long position, byte[] buffer, int offset, int length)}} in the HDFS-client.
> {{PositionedReadable}} also exposes the method {{readFully(long position, byte[] buffer, int offset, int length)}}. The difference is that {{#read}} will "Read up to the specified number of bytes" whereas {{#readFully}} will "Read the specified number of bytes". So there is no guarantee that {{#read}} will read *all* of the request bytes.
> Impala calls {{hdfsPread}} inside {{hdfs-file-reader.cc}} and invokes it inside a while loop until all the requested bytes have been read from the file. This can cause a few performance issues:
> (1) if the underlying {{FileSystem}} does not support ByteBuffer reads (HDFS-2834) (e.g. S3A does not support this feature) then {{hdfsPread}} will allocate a Java array equal in size to specified length of the buffer; the call to {{PositionedReadable#read}} may only fill up the buffer partially; Impala will repeat the call to {{hdfsPread}} since the buffer was not filled, which will cause another large array allocation; this can result in a lot of wasted time doing unnecessary array allocations
> (2) given that Impala calls {{hdfsPread}} in a while loop, there is no point in continuously calling {{hdfsPread}} when a single call to {{hdfsPreadFully}} will achieve the same thing (this doesn't actually affect performance much, but is unnecessary)
> Prior solutions to this problem have been to introduce a "chunk-size" to Impala reads (https://gerrit.cloudera.org/#/c/63/ - S3: DiskIoMgr related changes for S3). However, with the migration to {{hdfsPreadFully}} the chunk-size is no longer necessary.
> Furthermore, preads are most effective when the data is read all at once (e.g. in 8 MB chunks as specified by {{read_size}}) rather than in smaller chunks (typically 128K). For example, {{DFSInputStream#read(long position, byte[] buffer, int offset, int length)}} opens up remote block readers with a byte range determined by the value of {{length}} passed into the {{#read}} call. Similarly, {{S3AInputStream#readFully}} will issue an HTTP GET request with the size of the read specified by the given {{length}} (although fadvise must be set to RANDOM for this to work).
> This work is dependent on exposing {{readFully}} via libhdfs first: HDFS-14564



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-all-unsubscribe@impala.apache.org
For additional commands, e-mail: issues-all-help@impala.apache.org