Posted to issues-all@impala.apache.org by "ASF subversion and git services (Jira)" <ji...@apache.org> on 2023/05/03 22:53:00 UTC

[jira] [Commented] (IMPALA-10973) Empty scan nodes are scheduled to the (exclusive) coordinator

    [ https://issues.apache.org/jira/browse/IMPALA-10973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17719075#comment-17719075 ] 

ASF subversion and git services commented on IMPALA-10973:
----------------------------------------------------------

Commit 69da2ff86ec0914419e3b2d2ce755d1bb73c46aa in impala's branch refs/heads/master from Riza Suminto
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=69da2ff86 ]

IMPALA-12106: Fix overparallelization of Union fragment by 1

IMPALA-10973 has a bug where a union fragment without a scan node can be
over-parallelized by the backend scheduler by 1. It is reproducible by
running TPC-DS Q11 with MT_DOP=1. This patch additionally checks that
such a fragment does not have an input fragment before randomizing the
host assignment.

Testing:
Add TPC-DS Q11 to test_mt_dop.py::TestMtDopScheduling::test_scheduling
and verify the number of fragment instances scheduled in the
ExecSummary.

Change-Id: Ic69e7c8c0cadb4b07ee398aff362fbc6513eb08d
Reviewed-on: http://gerrit.cloudera.org:8080/19816
Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>
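A toy model may make the fix easier to follow. This is a simplified illustration under stated assumptions, not the logic in be/src/scheduling/scheduler.cc. The function schedule_fragment, its parameters, and the host names are hypothetical. The model assumes three things:

- A fragment runs on the union of the hosts holding its scan ranges and the hosts of its input fragments.
- A fragment with no scan ranges gets one randomly chosen executor rather than the coordinator (the IMPALA-10973 behaviour).
- With MT_DOP=1, each host runs one instance.

Passing check_inputs=False models the behaviour before this patch. The default applies the additional "no input fragment" check.

```python
import random
from typing import Callable, Iterable, Sequence, Set


def schedule_fragment(
    scan_hosts: Iterable[str],
    input_hosts: Iterable[str],
    executors: Sequence[str],
    check_inputs: bool = True,
    pick: Callable[[Sequence[str]], str] = random.choice,
) -> Set[str]:
    """Hosts a fragment would run on in this toy model (hypothetical, not Impala's API)."""
    scan_hosts = set(scan_hosts)
    input_hosts = set(input_hosts)
    # Collocate with the scan ranges and with the input fragments.
    hosts = scan_hosts | input_hosts
    # No scan ranges: randomize the host among the executors instead of using
    # the coordinator. With check_inputs=True (the extra check), this only
    # happens when the fragment has no input fragment either.
    if not scan_hosts and (not check_inputs or not input_hosts):
        hosts.add(pick(executors))
    return hosts


if __name__ == "__main__":
    executors = ["exec-1", "exec-2", "exec-3"]
    last = lambda hosts: hosts[-1]  # deterministic stand-in for random.choice
    # Union fragment without scan nodes whose inputs run on two executors.
    print(sorted(schedule_fragment([], ["exec-1", "exec-2"], executors, check_inputs=False, pick=last)))
    # -> ['exec-1', 'exec-2', 'exec-3']  (one host too many)
    print(sorted(schedule_fragment([], ["exec-1", "exec-2"], executors, pick=last)))
    # -> ['exec-1', 'exec-2']
```

In this model, the extra instance appears only when the random pick lands on an executor that is not already running an input fragment. That is one way an "off by 1" over-parallelization could surface intermittently.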


> Empty scan nodes are scheduled to the (exclusive) coordinator
> -------------------------------------------------------------
>
>                 Key: IMPALA-10973
>                 URL: https://issues.apache.org/jira/browse/IMPALA-10973
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend
>            Reporter: Csaba Ringhofer
>            Assignee: Csaba Ringhofer
>            Priority: Critical
>              Labels: scalability, scheduler
>             Fix For: Impala 4.1.0
>
>
> Currently fragments with scan nodes that have no scan ranges are scheduled to the coordinator, even if it is an exclusive coordinator:
> https://github.com/apache/impala/blob/master/be/src/scheduling/scheduler.cc#L805
> As "parent" fragments are often scheduled to be collocated with their children, the condition of "being scheduled to the coordinator" can spread through the plan tree.
> This can be disastrous to scalability in clusters with many executors but few coordinators, and it is also very counter-intuitive, as scanning an empty table shouldn't have a major effect on the query. 
>  
> To reproduce locally:
> bin/start-impala-cluster.py --use_exclusive_coordinators -c 1
> In the Impala shell:
> select id from functional.alltypes;
> profile; -- scan nodes will be scheduled to 2 hosts
> select f2 from functional.emptytable union all select id from functional.alltypes;
> profile; -- scan nodes will be scheduled to 3 hosts
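The spreading effect described in the issue can be illustrated with a small model. It is a hypothetical simplification, not Impala's scheduler. Fragment, hosts_before_fix, COORDINATOR, and the host names are made up for illustration. The model takes two rules from the description: a scan node without scan ranges goes to the coordinator, and a fragment without scan nodes is collocated with its input fragments. It also assumes that a fragment with several scan nodes runs on the union of their hosts, and that functional.alltypes has scan ranges on two executors.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import List, Set

COORDINATOR = "coordinator"  # hypothetical name for the exclusive coordinator


@dataclass
class Fragment:
    # One entry per scan node: hosts holding its scan ranges ([] = empty table).
    scans: List[List[str]] = field(default_factory=list)
    # Child fragments feeding this one through exchanges.
    inputs: List[Fragment] = field(default_factory=list)


def hosts_before_fix(frag: Fragment) -> Set[str]:
    """Hosts a fragment runs on under the pre-IMPALA-10973 behaviour (toy model)."""
    hosts: Set[str] = set()
    for ranges in frag.scans:
        # A scan node with no scan ranges falls back to the coordinator.
        hosts |= set(ranges) if ranges else {COORDINATOR}
    if not frag.scans:
        # No scan node: collocate with the input fragments, so a coordinator
        # placement anywhere below this fragment spreads upward.
        for child in frag.inputs:
            hosts |= hosts_before_fix(child)
    return hosts or {COORDINATOR}


if __name__ == "__main__":
    alltypes = ["exec-1", "exec-2"]  # assumed scan-range locations
    print(len(hosts_before_fix(Fragment(scans=[alltypes]))))      # -> 2
    print(len(hosts_before_fix(Fragment(scans=[[], alltypes]))))  # -> 3
    # Two levels of parents above an empty scan all land on the coordinator.
    print(hosts_before_fix(Fragment(inputs=[Fragment(inputs=[Fragment(scans=[[]])])])))
    # -> {'coordinator'}
```

Under these assumptions, the host counts match the 2 and 3 hosts seen in the reproduction above. The last case shows how an empty scan can pull an entire subtree of the plan onto the coordinator.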



--
This message was sent by Atlassian Jira
(v8.20.10#820010)