Posted to issues@impala.apache.org by "Sahil Takiar (Jira)" <ji...@apache.org> on 2019/11/20 18:09:00 UTC

[jira] [Resolved] (IMPALA-8525) preads should use hdfsPreadFully rather than hdfsPread

     [ https://issues.apache.org/jira/browse/IMPALA-8525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sahil Takiar resolved IMPALA-8525.
----------------------------------
    Fix Version/s: Impala 3.4.0
       Resolution: Fixed

Done. Did some additional benchmarking to confirm the expected perf improvement. For the query {{select * from tpcds_parquet.inventory order by inv_quantity_on_hand limit 10}} on a 10 TB TPC-DS dataset on S3, this change improves performance by 70% (12.06s to 5.84s).

The expected performance improvement depends on the workload, but is generally a function of the Parquet file size and the amount of sequential data scanned from the file. If the Parquet files are small (e.g. less than the chunk size of 128K), then this change doesn't make a big difference. For larger Parquet files, especially for large scan ranges, this change makes a significant difference.
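
As a rough illustration of why file size matters, the sketch below counts how many read calls a scan range needs when it is split into 128K chunks versus read in one call. The helper name {{reads_needed}} and the 64K sample range are hypothetical; only the 128K chunk size and the 8 MB read size come from this issue. It is back-of-the-envelope arithmetic, not a model of Impala's actual I/O path.

```python
import math

CHUNK_SIZE = 128 * 1024        # chunk size mentioned in this issue (128K)
READ_SIZE = 8 * 1024 * 1024    # read_size mentioned in the issue description (8 MB)


def reads_needed(scan_range_bytes: int, bytes_per_read: int) -> int:
    """Hypothetical helper: number of read calls needed to cover a scan range."""
    return math.ceil(scan_range_bytes / bytes_per_read)


# A scan range smaller than the chunk size needs a single call either way,
# so removing the chunking changes little for small files.
small = reads_needed(64 * 1024, CHUNK_SIZE)

# A full 8 MB read split into 128K chunks needs many calls; unchunked it needs one.
chunked = reads_needed(READ_SIZE, CHUNK_SIZE)
whole = reads_needed(READ_SIZE, READ_SIZE)
```

On S3, each of those chunked calls can translate into its own request, which is one plausible reason the benefit grows with the size of the scan range.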

> preads should use hdfsPreadFully rather than hdfsPread
> ------------------------------------------------------
>
>                 Key: IMPALA-8525
>                 URL: https://issues.apache.org/jira/browse/IMPALA-8525
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Backend
>            Reporter: Sahil Takiar
>            Assignee: Sahil Takiar
>            Priority: Major
>             Fix For: Impala 3.4.0
>
>
> Impala preads (only enabled if {{use_hdfs_pread}} is true) use the {{hdfsPread}} API from libhdfs, which ultimately invokes {{PositionedReadable#read(long position, byte[] buffer, int offset, int length)}} in the HDFS-client.
> {{PositionedReadable}} also exposes the method {{readFully(long position, byte[] buffer, int offset, int length)}}. The difference is that {{#read}} will "Read up to the specified number of bytes" whereas {{#readFully}} will "Read the specified number of bytes". So there is no guarantee that {{#read}} will read *all* of the requested bytes.
> Impala calls {{hdfsPread}} inside {{hdfs-file-reader.cc}} and invokes it inside a while loop until all the requested bytes have been read from the file. This can cause a few performance issues:
> (1) if the underlying {{FileSystem}} does not support ByteBuffer reads (HDFS-2834) (e.g. S3A does not support this feature), then {{hdfsPread}} will allocate a Java array equal in size to the specified length of the buffer; the call to {{PositionedReadable#read}} may only partially fill the buffer; Impala will then repeat the call to {{hdfsPread}} because the buffer was not filled, which causes another large array allocation; this can result in a lot of time wasted on unnecessary array allocations
> (2) given that Impala calls {{hdfsPread}} in a while loop, there is no point in continuously calling {{hdfsPread}} when a single call to {{hdfsPreadFully}} will achieve the same thing (this doesn't actually affect performance much, but is unnecessary)
> Prior solutions to this problem have been to introduce a "chunk-size" to Impala reads (https://gerrit.cloudera.org/#/c/63/ - S3: DiskIoMgr related changes for S3). However, with the migration to {{hdfsPreadFully}} the chunk-size is no longer necessary.
> Furthermore, preads are most effective when the data is read all at once (e.g. in 8 MB chunks as specified by {{read_size}}) rather than in smaller chunks (typically 128K). For example, {{DFSInputStream#read(long position, byte[] buffer, int offset, int length)}} opens up remote block readers with a byte range determined by the value of {{length}} passed into the {{#read}} call. Similarly, {{S3AInputStream#readFully}} will issue an HTTP GET request with the size of the read specified by the given {{length}} (although fadvise must be set to RANDOM for this to work).
> This work is dependent on exposing {{readFully}} via libhdfs first: HDFS-14564
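
The difference between "read up to" and "read fully" semantics described above can be sketched as follows. This is a minimal illustration, not Impala's or the HDFS client's code: {{PartialReader}} and {{read_fully}} are hypothetical names, and the per-call limit simply simulates a stream that returns short reads.

```python
class PartialReader:
    """Hypothetical stream whose read() may return fewer bytes than requested."""

    def __init__(self, data: bytes, max_per_call: int):
        self.data = data
        self.max_per_call = max_per_call
        self.calls = 0  # number of read() calls made so far

    def read(self, position: int, length: int) -> bytes:
        # "Read up to the specified number of bytes": the result may be short.
        self.calls += 1
        n = min(length, self.max_per_call)
        return self.data[position:position + n]


def read_fully(reader: PartialReader, position: int, length: int) -> bytes:
    """Keep calling read() until exactly `length` bytes have been collected."""
    out = bytearray()
    while len(out) < length:
        chunk = reader.read(position + len(out), length - len(out))
        if not chunk:
            raise EOFError("reached end of data before reading the requested bytes")
        out += chunk
    return bytes(out)


reader = PartialReader(b"x" * 1000, max_per_call=300)
result = read_fully(reader, 0, 1000)  # needs several short reads to complete
```

With a "read up to" API, the caller has to write the loop in {{read_fully}} itself and pays any per-call cost on every iteration. A "read fully" API moves that loop into the client, so the caller makes a single call for the whole range.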


