Posted to issues-all@impala.apache.org by "ASF subversion and git services (Jira)" <ji...@apache.org> on 2023/05/03 22:53:00 UTC
[jira] [Commented] (IMPALA-10973) Empty scan nodes are scheduled to the (exclusive) coordinator
[ https://issues.apache.org/jira/browse/IMPALA-10973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17719075#comment-17719075 ]
ASF subversion and git services commented on IMPALA-10973:
----------------------------------------------------------
Commit 69da2ff86ec0914419e3b2d2ce755d1bb73c46aa in impala's branch refs/heads/master from Riza Suminto
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=69da2ff86 ]
IMPALA-12106: Fix overparallelization of Union fragment by 1
IMPALA-10973 has a bug where a union fragment without a scan node can be
over-parallelized by the backend scheduler by 1. It is reproducible by
running TPC-DS Q11 with MT_DOP=1. This patch additionally checks that
such a fragment does not have an input fragment before randomizing the
host assignment.
Testing:
Add TPC-DS Q11 to test_mt_dop.py::TestMtDopScheduling::test_scheduling
and verify the number of fragment instances scheduled in the
ExecSummary.
Change-Id: Ic69e7c8c0cadb4b07ee398aff362fbc6513eb08d
Reviewed-on: http://gerrit.cloudera.org:8080/19816
Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>
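
The check described in the commit message can be pictured with a small Python model. This is only a sketch of the stated rule, not Impala's scheduler code: the `Fragment` class, the `choose_hosts` function, the host names, and the assumption that a fragment with an input fragment simply follows its input's hosts are all hypothetical.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Fragment:
    """Hypothetical stand-in for a plan fragment; not an Impala type."""
    name: str
    scan_range_hosts: List[str]               # hosts holding scan ranges; may be empty
    input_hosts: Optional[List[str]] = None   # hosts of an input fragment, if any


def choose_hosts(fragment, executors, rng):
    # Fragments with scan ranges run where their data is.
    if fragment.scan_range_hosts:
        return sorted(set(fragment.scan_range_hosts))
    # Assumed behaviour: a fragment fed by an input fragment follows it,
    # so no extra, randomly placed instance is added.
    if fragment.input_hosts:
        return sorted(set(fragment.input_hosts))
    # Only a fragment with neither scan ranges nor an input fragment
    # is assigned a randomly chosen executor.
    return [rng.choice(executors)]


executors = ["exec-1", "exec-2"]
rng = random.Random(0)
union_with_input = Fragment("union", [], input_hosts=["exec-1", "exec-2"])
empty_leaf = Fragment("empty-scan", [])

print(choose_hosts(union_with_input, executors, rng))  # ['exec-1', 'exec-2']
print(choose_hosts(empty_leaf, executors, rng))        # one randomly chosen executor
```

In this model, skipping the input-fragment check would give the union fragment one randomly placed instance on top of the instances that follow its input, which is one plausible reading of "over-parallelized by 1".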
> Empty scan nodes are scheduled to the (exclusive) coordinator
> -------------------------------------------------------------
>
> Key: IMPALA-10973
> URL: https://issues.apache.org/jira/browse/IMPALA-10973
> Project: IMPALA
> Issue Type: Bug
> Components: Backend
> Reporter: Csaba Ringhofer
> Assignee: Csaba Ringhofer
> Priority: Critical
> Labels: scalability, scheduler
> Fix For: Impala 4.1.0
>
>
> Currently, fragments whose scan nodes have no scan ranges are scheduled to the coordinator, even if it is an exclusive coordinator:
> https://github.com/apache/impala/blob/master/be/src/scheduling/scheduler.cc#L805
> As "parent" fragments are often scheduled to be collocated with their children, the condition of "being scheduled to the coordinator" can spread through the plan tree.
> This can be disastrous for scalability in clusters with many executors but few coordinators, and it is also very counter-intuitive: scanning an empty table should not have a major effect on the query.
>
> To reproduce locally:
> bin/start-impala-cluster.py --use_exclusive_coordinators -c 1
> in Impala shell:
> select id from functional.alltypes;
> profile; -- scan nodes will be scheduled to 2 hosts
> select f2 from functional.emptytable union all select id from functional.alltypes;
> profile; -- scan nodes will be scheduled to 3 hosts
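
The way coordinator placement spreads up the plan tree, as described in the issue, can be illustrated with a toy Python model. It is a sketch under simplifying assumptions, not Impala's planner or scheduler: the `Fragment` class, the `hosts` function, and the host names are hypothetical, and a parent is assumed to run on every host its children run on.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Fragment:
    """Hypothetical plan-tree node; not an Impala type."""
    name: str
    leaf_hosts: Set[str] = field(default_factory=set)   # hosts assigned directly
    children: List["Fragment"] = field(default_factory=list)


def hosts(fragment):
    # Assumption: a parent is collocated with all of its children,
    # so its host set is the union of theirs.
    result = set(fragment.leaf_hosts)
    for child in fragment.children:
        result |= hosts(child)
    return result


# One exclusive coordinator plus two executors, as in the reproduction steps.
alltypes = Fragment("scan alltypes", {"exec-1", "exec-2"})
# Before the fix, a scan with no scan ranges landed on the coordinator.
emptytable = Fragment("scan emptytable", {"coordinator"})
union = Fragment("union all", children=[emptytable, alltypes])

print(len(hosts(alltypes)))  # 2
print(len(hosts(union)))     # 3
```

The two counts mirror the 2-host and 3-host profiles in the reproduction steps: the empty scan alone is enough to pull the coordinator into the union's host set.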