You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@impala.apache.org by "Lars Volker (JIRA)" <ji...@apache.org> on 2018/04/10 22:48:00 UTC

[jira] [Created] (IMPALA-6834) Enforce consistent, pseudo-random replica order during local, non-random scheduling

Lars Volker created IMPALA-6834:
-----------------------------------

             Summary: Enforce consistent, pseudo-random replica order during local, non-random scheduling
                 Key: IMPALA-6834
                 URL: https://issues.apache.org/jira/browse/IMPALA-6834
             Project: IMPALA
          Issue Type: Improvement
          Components: Backend, Frontend
    Affects Versions: Impala 2.12.0
            Reporter: Lars Volker


When scheduling local, non-cached reads, the scheduler breaks ties between otherwise equivalent replicas by picking the first one in the list of candidates as reported by the frontend ([scheduler.h:375|https://github.com/apache/impala/blob/master/be/src/scheduling/scheduler.h#L375]). We do this to optimize the use of OS buffer caches. We also provide the random_replica query option to improve cases where the default behavior leads to CPU hotspots.

The frontend merely passes the replicas of a block to the backend in the order they are returned from HDFS. This can result in poor load distribution for scans with many small scan ranges,  depending on the order in which HDFS returns block locations.

To alleviate this we should add another level of pseudo-random shuffling. The main idea is to change the order in which replicas are picked, but to keep the order consistent across backends. 

For example, we can compute a hash for each replica over (replica address, THdfsFileSplit::file_name, THdfsFileSplit::offset) and use this as a key to find the min_element in the list of candidates [scheduler.cc:L772|https://github.com/apache/impala/blob/master/be/src/scheduling/scheduler.cc#L772]. Then we use this element instead of the first one.

We need to make sure that we run sufficiently comprehensive performance tests on this change.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)