Posted to issues-all@impala.apache.org by "Alex Rodoni (JIRA)" <ji...@apache.org> on 2018/08/30 18:53:00 UTC

[jira] [Updated] (IMPALA-3885) Parquet files with multiple blocks cause remote reads

     [ https://issues.apache.org/jira/browse/IMPALA-3885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alex Rodoni updated IMPALA-3885:
--------------------------------
    Docs Text:   (was: Explain this.)

> Parquet files with multiple blocks cause remote reads
> -----------------------------------------------------
>
>                 Key: IMPALA-3885
>                 URL: https://issues.apache.org/jira/browse/IMPALA-3885
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend
>    Affects Versions: Impala 2.7.0
>            Reporter: Lars Volker
>            Assignee: Sailesh Mukil
>            Priority: Critical
>
> For Parquet files with multiple blocks, we schedule the scan ranges across all replicas of the blocks, aiming for local execution on the impalads running on the datanodes where the blocks are located. However, there seems to be a high number of remote reads in these scenarios.
> The scheduler makes local assignments:
> {noformat}
> I0613 16:43:01.741288 36424 simple-scheduler.cc:605] Total remote scan volume = 0
> I0613 16:43:01.741441 36424 simple-scheduler.cc:607] Total local scan volume = 1426.95 GB
> I0613 16:43:01.741576 36424 simple-scheduler.cc:609] Total cached scan volume = 0
> {noformat}
> However, the profile shows remote scans:
> {noformat}
>          - RemoteScanRanges: 283 (283)
>          - RowsRead: 139.36M (139355074)
>          - RowsReturned: 239 (239)
>          - RowsReturnedRate: 127.00 /sec
>          - ScanRangesComplete: 304 (304)
> {noformat}
> Somehow Impala seems to read data from the wrong datanodes, possibly reading everything from the datanode it read the footer from.
> [~sailesh] - We briefly talked about this before in person. Do you have time to look at this? If not, feel free to reassign it to me.
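
To put a number on the mismatch described in the report, the profile counters can be compared directly. The following is a minimal Python sketch, not part of Impala: the helper names parse_counters and remote_fraction are hypothetical, the counter lines are copied from the profile excerpt quoted in the report, and it assumes that dividing RemoteScanRanges by ScanRangesComplete is a reasonable way to express the share of remote scan ranges.

```python
import re

# Counter lines copied from the profile excerpt quoted in the report.
PROFILE_EXCERPT = """\
         - RemoteScanRanges: 283 (283)
         - RowsRead: 139.36M (139355074)
         - RowsReturned: 239 (239)
         - RowsReturnedRate: 127.00 /sec
         - ScanRangesComplete: 304 (304)
"""

# Matches "- Name: <pretty value> (<raw integer>)". Lines without a
# parenthesized raw value (such as rates) are skipped.
COUNTER_RE = re.compile(r"-\s+(\w+):\s+.*?\((\d+)\)")


def parse_counters(text):
    """Return a dict mapping counter name to its raw integer value."""
    counters = {}
    for line in text.splitlines():
        match = COUNTER_RE.search(line)
        if match:
            counters[match.group(1)] = int(match.group(2))
    return counters


def remote_fraction(counters):
    """Share of completed scan ranges counted as remote (assumed metric)."""
    total = counters.get("ScanRangesComplete", 0)
    if total == 0:
        return 0.0
    return counters.get("RemoteScanRanges", 0) / total


if __name__ == "__main__":
    counters = parse_counters(PROFILE_EXCERPT)
    print(f"remote fraction: {remote_fraction(counters):.2%}")
```

For the excerpt in the report, 283 of the 304 completed scan ranges are counted as remote, which stands in contrast to the scheduler log's remote scan volume of 0.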


