
Posted to issues-all@impala.apache.org by "Tim Armstrong (Jira)" <ji...@apache.org> on 2020/03/11 18:50:00 UTC

[jira] [Commented] (IMPALA-8964) Increase runtime filter wait timeout for mt_dop

    [ https://issues.apache.org/jira/browse/IMPALA-8964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17057311#comment-17057311 ] 

Tim Armstrong commented on IMPALA-8964:
---------------------------------------

Problematic places:
* Coordinator::BackendState::PublishFilter()
* BackendState::PublishFilterCompleteCb()
* RuntimeFilterBank::UpdateFilterFromLocal()
* RuntimeFilterBank::UpdateFilterCompleteCb()

In all of the above, the RPC error is logged and the query is not cancelled.
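The call sites listed above log RPC errors without cancelling the query. Below is a minimal sketch of the difference between that behaviour and treating the failure as fatal. RpcStatus, QueryState and both callback names are hypothetical stand-ins, not Impala's Status or QueryState classes. In the sketch, the first error to arrive wins and later errors are ignored.

```cpp
#include <cassert>
#include <iostream>
#include <mutex>
#include <string>

// Hypothetical RPC status; not Impala's Status class.
struct RpcStatus {
  bool ok = true;
  std::string msg;
};

// Hypothetical per-query state. The first error cancels the query;
// subsequent errors do not overwrite the recorded cause.
class QueryState {
 public:
  void Cancel(const RpcStatus& cause) {
    assert(!cause.ok);  // only failures should cancel a query
    std::lock_guard<std::mutex> l(mu_);
    if (cancelled_) return;
    cancelled_ = true;
    cause_ = cause.msg;
  }
  bool cancelled() const {
    std::lock_guard<std::mutex> l(mu_);
    return cancelled_;
  }
  std::string cause() const {
    std::lock_guard<std::mutex> l(mu_);
    return cause_;
  }

 private:
  mutable std::mutex mu_;
  bool cancelled_ = false;
  std::string cause_;
};

// Behaviour described in the comment: log the failure and keep running.
void OnFilterRpcDoneLogOnly(const RpcStatus& s) {
  if (!s.ok) std::cerr << "filter RPC failed: " << s.msg << '\n';
}

// Alternative: make the failure fatal to the query.
void OnFilterRpcDoneFatal(const RpcStatus& s, QueryState* query) {
  if (!s.ok) query->Cancel(s);
}
```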

I think the GetProxy() failures could just be fatal to the query; that's what KrpcDataStreamSender does. The bigger problem is that failed RPCs are not retried, whereas DataStreamSender has HandleFailedRPC to handle this case.
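The comment names HandleFailedRPC but does not describe its internals. Below is a sketch of the general pattern it points at: retry transient RPC failures a bounded number of times, and fail fast on errors such as a GetProxy() failure. RpcResult, SendWithRetry and the exponential-backoff policy are assumptions made for illustration. They do not reproduce what HandleFailedRPC actually does.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical outcome of one RPC attempt.
enum class RpcResult { kOk, kRetryable, kFatal };

// Calls send_rpc up to max_attempts times, doubling the sleep between
// retryable failures. Returns true on success. A false return means the
// caller should fail the query instead of only logging the error.
bool SendWithRetry(const std::function<RpcResult()>& send_rpc, int max_attempts,
                   std::chrono::milliseconds initial_backoff) {
  assert(max_attempts > 0);
  std::chrono::milliseconds backoff = initial_backoff;
  for (int attempt = 1; attempt <= max_attempts; ++attempt) {
    RpcResult r = send_rpc();
    if (r == RpcResult::kOk) return true;
    if (r == RpcResult::kFatal) return false;  // e.g. no proxy: don't retry
    if (attempt < max_attempts) {
      std::this_thread::sleep_for(backoff);
      backoff *= 2;
    }
  }
  return false;  // retries exhausted
}
```

A false return is the point where the caller would cancel the query rather than log the error and continue.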

It's probably best just to bump the timeout to 5-10s in light of that. That would also potentially allow more parallelism if there's a serial bottleneck somewhere in the plan (e.g. because of skew or large input files).
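The timeout under discussion bounds how long a scan blocks waiting for a runtime filter before it proceeds without one. In Impala it is exposed as the RUNTIME_FILTER_WAIT_TIME_MS query option. Below is a minimal sketch of such a bounded wait. It assumes a hypothetical FilterSlot class built on std::condition_variable and is not Impala's RuntimeFilterBank code.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>

// Hypothetical slot a scan waits on until the join's build side publishes
// a runtime filter, or until the wait times out.
class FilterSlot {
 public:
  // Called by whatever delivers the filter (e.g. an RPC handler).
  void Publish() {
    {
      std::lock_guard<std::mutex> l(mu_);
      arrived_ = true;
    }
    cv_.notify_all();
  }

  // Blocks for at most `timeout`. Returns true if the filter arrived in
  // time. False means the scan should proceed unfiltered.
  bool WaitFor(std::chrono::milliseconds timeout) {
    assert(timeout.count() >= 0);
    std::unique_lock<std::mutex> l(mu_);
    return cv_.wait_for(l, timeout, [this] { return arrived_; });
  }

 private:
  std::mutex mu_;
  std::condition_variable cv_;
  bool arrived_ = false;
};
```

Raising the timeout to 5-10s only changes the argument passed to WaitFor(). A scan whose filter arrives early is still released immediately by Publish().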

> Increase runtime filter wait timeout for mt_dop
> -----------------------------------------------
>
>                 Key: IMPALA-8964
>                 URL: https://issues.apache.org/jira/browse/IMPALA-8964
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Distributed Exec
>            Reporter: Tim Armstrong
>            Priority: Major
>              Labels: multithreading
>
> When we enable joins for multithreaded plans, we should adjust the runtime filter wait time.
> A large part of the motivation for the timeout was to allow parallelism between the different sides of the join: there was some concern that having a scan block indefinitely would effectively reduce the amount of parallelism the plan executed with.
> With multithreading, we want to get parallelism across multiple copies of the same fragment, rather than parallelism across different fragments. So this motivation no longer applies. Making the filter wait time unlimited would make query execution more predictable.
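The quoted description goes further than a 5-10s bump. It proposes an unlimited wait once multithreaded plans get their parallelism from multiple instances of the same fragment. Below is a sketch of that option under several assumptions:

* SharedFilter and RunScanInstances are hypothetical names.
* An empty std::optional timeout stands for "wait indefinitely".
* Each thread stands in for one fragment instance at a given mt_dop.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <optional>
#include <thread>
#include <vector>

// Hypothetical filter shared by all instances of one scan fragment.
class SharedFilter {
 public:
  void Publish() {
    {
      std::lock_guard<std::mutex> l(mu_);
      arrived_ = true;
    }
    cv_.notify_all();
  }

  // std::nullopt: block until the filter arrives, however long that takes.
  // Otherwise: wait at most *timeout and report whether the filter arrived.
  bool Wait(std::optional<std::chrono::milliseconds> timeout) {
    std::unique_lock<std::mutex> l(mu_);
    auto ready = [this] { return arrived_; };
    if (!timeout) {
      cv_.wait(l, ready);
      return true;
    }
    return cv_.wait_for(l, *timeout, ready);
  }

 private:
  std::mutex mu_;
  std::condition_variable cv_;
  bool arrived_ = false;
};

// Starts mt_dop threads that each wait on `filter`, publishes the filter
// after publish_delay, and returns how many instances saw the filter.
int RunScanInstances(SharedFilter& filter, int mt_dop,
                     std::optional<std::chrono::milliseconds> timeout,
                     std::chrono::milliseconds publish_delay) {
  assert(mt_dop > 0);
  std::atomic<int> filtered{0};
  std::vector<std::thread> instances;
  for (int i = 0; i < mt_dop; ++i) {
    instances.emplace_back([&] {
      if (filter.Wait(timeout)) ++filtered;
    });
  }
  std::this_thread::sleep_for(publish_delay);
  filter.Publish();
  for (auto& t : instances) t.join();
  return filtered.load();
}
```

With an unlimited wait, every instance applies the filter regardless of how long the build side takes. The trade-off is that a lost filter RPC would leave the scan blocked forever. That is why the comment above favours a bounded 5-10s timeout while failed filter RPCs are neither retried nor fatal.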



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
