
Posted to issues@impala.apache.org by "Zoltán Borók-Nagy (Jira)" <ji...@apache.org> on 2022/09/19 11:18:00 UTC

[jira] [Resolved] (IMPALA-11539) Mitigate intra-node skew of HDFS scans with MT_DOP

     [ https://issues.apache.org/jira/browse/IMPALA-11539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zoltán Borók-Nagy resolved IMPALA-11539.
----------------------------------------
    Fix Version/s: Impala 4.2.0
       Resolution: Fixed

> Mitigate intra-node skew of HDFS scans with MT_DOP
> --------------------------------------------------
>
>                 Key: IMPALA-11539
>                 URL: https://issues.apache.org/jira/browse/IMPALA-11539
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend
>            Reporter: Zoltán Borók-Nagy
>            Assignee: Zoltán Borók-Nagy
>            Priority: Major
>             Fix For: Impala 4.2.0
>
>
> Before IMPALA-9655, scan ranges were statically assigned to intra-node fragment instances using the Longest-Processing-Time (LPT) algorithm:
> https://github.com/apache/impala/blame/a7866a94578be6289bbac31686de4d9032ad9261/be/src/scheduling/scheduler.cc#L499-L501
> Since IMPALA-9655, we use dynamic intra-node load balancing for HDFS scans: the fragment instances share a queue of scan ranges, and each instance takes the next scan range to read from this queue.
> IMPALA-9655 removed the LPT algorithm, so the scan ranges are in no particular order in the queue. This can cause skew if large scan ranges are at the end of the queue: the instances that pick them up late are still reading while the others are already idle.
> We could combine the two approaches by using a priority queue for the scan ranges, so that each fragment instance always grabs the largest scan range remaining in the queue. This would further mitigate intra-node skew.
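
The static LPT assignment described in the issue can be sketched as follows. This is a simplified illustration, not Impala's scheduler code (the linked scheduler.cc has the real implementation): it assumes that scan range size is a good proxy for processing time, and the function name lpt_assign is hypothetical.

```python
import heapq
from typing import List, Tuple


def lpt_assign(range_sizes: List[int], num_instances: int) -> List[List[int]]:
    """Hypothetical sketch of Longest-Processing-Time (LPT) assignment.

    Scan ranges are sorted largest first, and each one is given to the
    fragment instance with the smallest total assigned size so far.
    Assumption: range size stands in for processing time.
    """
    # Min-heap of (total assigned size, instance index).
    heap: List[Tuple[int, int]] = [(0, i) for i in range(num_instances)]
    assignment: List[List[int]] = [[] for _ in range(num_instances)]
    for size in sorted(range_sizes, reverse=True):
        load, idx = heapq.heappop(heap)  # least-loaded instance
        assignment[idx].append(size)
        heapq.heappush(heap, (load + size, idx))
    return assignment
```

For example, lpt_assign([100, 10, 10, 10, 10, 60], 2) puts the 100 range on one instance and the 60 range plus the four 10 ranges on the other, so both instances end up with a total of 100. The assignment is fixed up front, so it cannot react if one instance turns out to read more slowly than expected.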

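To illustrate how the shared queue introduced by IMPALA-9655 can skew when large scan ranges come last, here is a small simulation. It is a simplified model, not Impala code: every fragment instance is assumed to read at the same constant speed, and the instance that becomes idle first takes the next range from the front of the queue. The function name simulate_shared_queue is hypothetical.

```python
import heapq
from collections import deque
from typing import Iterable, List


def simulate_shared_queue(range_sizes: Iterable[int], num_instances: int) -> int:
    """Return the time at which the last instance finishes (the makespan).

    Simplified model (assumption): each instance reads one size unit per
    time unit, and ranges are taken from a shared FIFO queue in the order
    they were enqueued.
    """
    queue = deque(range_sizes)
    # Min-heap of the times at which each instance becomes idle.
    # A list of equal values is already a valid heap.
    idle_at: List[int] = [0] * num_instances
    while queue:
        t = heapq.heappop(idle_at)  # the instance that is idle first
        heapq.heappush(idle_at, t + queue.popleft())
    return max(idle_at)
```

With two instances, the order [10, 10, 10, 10, 100] finishes at time 120, while [100, 10, 10, 10, 10] finishes at time 100, even though the total amount of work is identical.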

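The proposed priority queue of scan ranges could look roughly like this. It is a hypothetical sketch: the class LargestFirstScanRangeQueue and the helper simulate_largest_first are illustrative names, plain sizes stand in for scan ranges, and the simulation assumes every instance reads at the same speed. The lock reflects that several fragment instances pull from the same queue concurrently.

```python
import heapq
import threading
from typing import Iterable, List, Optional


class LargestFirstScanRangeQueue:
    """Hypothetical shared queue that always hands out the largest remaining
    scan range. Python's heapq is a min-heap, so sizes are stored negated."""

    def __init__(self, range_sizes: Iterable[int]) -> None:
        self._heap: List[int] = [-s for s in range_sizes]
        heapq.heapify(self._heap)
        self._lock = threading.Lock()  # instances share this queue

    def next_range(self) -> Optional[int]:
        """Pop the largest remaining range, or return None when empty."""
        with self._lock:
            if not self._heap:
                return None
            return -heapq.heappop(self._heap)


def simulate_largest_first(range_sizes: Iterable[int], num_instances: int) -> int:
    """Makespan when the instance that is idle first takes the largest range.

    Assumption: every instance reads one size unit per time unit.
    """
    queue = LargestFirstScanRangeQueue(range_sizes)
    idle_at: List[int] = [0] * num_instances  # min-heap of idle times
    while (size := queue.next_range()) is not None:
        t = heapq.heappop(idle_at)
        heapq.heappush(idle_at, t + size)
    return max(idle_at)
```

Under the uniform-speed assumption, handing the largest remaining range to whichever instance becomes idle first is the same greedy rule as LPT, so [10, 10, 10, 10, 100] on two instances finishes at time 100 regardless of the initial order. Unlike static LPT, the dynamic version can still adapt when some instances read more slowly than others.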
