You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues-all@impala.apache.org by "Tim Armstrong (Jira)" <ji...@apache.org> on 2020/04/09 20:54:00 UTC

[jira] [Created] (IMPALA-9637) Scan range load-balancing within backend

Tim Armstrong created IMPALA-9637:
-------------------------------------

             Summary: Scan range load-balancing within backend
                 Key: IMPALA-9637
                 URL: https://issues.apache.org/jira/browse/IMPALA-9637
             Project: IMPALA
          Issue Type: Improvement
          Components: Distributed Exec
    Affects Versions: Impala 4.0
            Reporter: Tim Armstrong


Currently the scheduler statically divides scan ranges between fragment instances, Since IMPALA-9015 it statically load-balances scan ranges based on file size using the LPT algorithm in the schedule.

This has various pitfalls:
 * It interacts badly with dynamic partition pruning, which can filter
 * Different files that have the same byte size may involve different amounts of work to process for any number of reasons.

Those can cause both inter-node load balance problems and intra-node load balance problems. This Jira is about fixing the intra-node load balance problem, so that the situation is no worse than before mt_dop.

The proposed solution is to have a queue of scan ranges per backend, sorted from largest to smallest, and have each instance pull scan ranges off that queue. The DiskIOMgr ReaderContext probably is already sufficient to solve this problem, and we'll need to add a different mechanism for Kudu, Hbase, etc.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-all-unsubscribe@impala.apache.org
For additional commands, e-mail: issues-all-help@impala.apache.org