Posted to issues-all@impala.apache.org by "ASF subversion and git services (Jira)" <ji...@apache.org> on 2020/04/14 16:53:00 UTC

[jira] [Commented] (IMPALA-9643) Local runtime filters can go missing when mt_dop > 1

    [ https://issues.apache.org/jira/browse/IMPALA-9643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17083427#comment-17083427 ] 

ASF subversion and git services commented on IMPALA-9643:
---------------------------------------------------------

Commit 76e4a17fb379bb232618dccb4ad3504dbe5c945c in impala's branch refs/heads/master from Tim Armstrong
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=76e4a17 ]

IMPALA-9643: fix runtime filter race for mt_dop

This patch avoids the race with registration of a
consumer filter by registering all filters upfront
when the filter bank is constructed. Then registration
of producers and consumers hands out references to the
pre-constructed filters.

A nice bonus of this change is that RegisterConsumer()
and RegisterProducer() don't mutate anything and
we can avoid lock acquisitions.
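
To make the idea concrete, here is a minimal sketch of upfront
registration. It is not the code from this commit: SketchFilter,
SketchFilterBank, Publish() and has_payload are hypothetical names
made up for illustration, and the real filter bank carries far more
state. Only RegisterProducer() and RegisterConsumer() are names taken
from the commit message, and their signatures here are assumed.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <memory>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for a runtime filter; not Impala's actual class.
struct SketchFilter {
  explicit SketchFilter(int32_t id) : id(id) {}
  const int32_t id;
  // Set once the producer's payload has been published.
  std::atomic<bool> has_payload{false};
};

// Hypothetical stand-in for the filter bank.
class SketchFilterBank {
 public:
  // Every filter is constructed here, before any fragment instance
  // thread can call the Register*() methods or Publish().
  explicit SketchFilterBank(const std::vector<int32_t>& filter_ids) {
    for (int32_t id : filter_ids) {
      bool inserted =
          filters_.emplace(id, std::make_unique<SketchFilter>(id)).second;
      assert(inserted && "duplicate filter id");
      (void)inserted;  // Silence unused-variable warnings under NDEBUG.
    }
  }

  // Both calls are read-only lookups in a map that never changes after
  // construction, so concurrent callers need no lock.
  SketchFilter* RegisterProducer(int32_t id) const { return Lookup(id); }
  SketchFilter* RegisterConsumer(int32_t id) const { return Lookup(id); }

  // Publishing finds the filter whether or not a consumer has
  // registered yet, so an early publish is not lost.
  bool Publish(int32_t id) const {
    SketchFilter* filter = Lookup(id);
    if (filter == nullptr) return false;
    filter->has_payload.store(true, std::memory_order_release);
    return true;
  }

 private:
  // Returns nullptr for an id that was not supplied at construction.
  SketchFilter* Lookup(int32_t id) const {
    auto it = filters_.find(id);
    return it == filters_.end() ? nullptr : it->second.get();
  }

  std::unordered_map<int32_t, std::unique_ptr<SketchFilter>> filters_;
};
```

In this sketch, a bank built with filter id 22 accepts Publish(22)
before any call to RegisterConsumer(22), and the consumer that
registers later receives the same object with has_payload already
set. That is the ordering seen on the failing daemon in the logs
quoted below.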

Also adds test infrastructure and fixes TestRuntimeRowFilters to
work with mt_dop=4 (it was accidentally not enabled before). That
mostly involved modifying the tests to use aggregates of counters
instead of picking out lines with regexes.

Testing:
Added a regression test that reliably failed before this
fix. This relies on extending debug actions to allow longer
delays, plus a minor extension to the RUNTIME_PROFILE .test
file parser to handle spaces in counter names.

Ran exhaustive tests.

Change-Id: I194c0d2515b6a0e5474e1c0c8647f0e54dc94397
Reviewed-on: http://gerrit.cloudera.org:8080/15715
Reviewed-by: Tim Armstrong <ta...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>


> Local runtime filters can go missing when mt_dop > 1
> ----------------------------------------------------
>
>                 Key: IMPALA-9643
>                 URL: https://issues.apache.org/jira/browse/IMPALA-9643
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Backend
>    Affects Versions: Impala 4.0
>            Reporter: Tim Armstrong
>            Assignee: Tim Armstrong
>            Priority: Blocker
>              Labels: multithreading, performance
>             Fix For: Impala 4.0
>
>         Attachments: profile_50467cb8e73eeac4_853461b400000000, tpcds-q77.sql
>
>
> On some TPC-DS queries with mt_dop > 0, LOCAL runtime filters go missing: the scan waits for RUNTIME_FILTER_WAIT_TIME_MS and the filters never show up. I can reproduce this in my minicluster on tpcds_parquet; see [^tpcds-q77.sql] and [^profile_50467cb8e73eeac4_853461b400000000].
> Interestingly, on this one run, one impalad received the filters fine and the others didn't get them. I set -vmodule=runtime-filter-bank=3 and it looks like it might be related to whether the consumer filter is registered before the producer publishes. Here are logs from the good and bad daemons.
> {noformat}
> tarmstrong@tarmstrong-Precision-7540:~/impala/impala$ grep 50467cb8e73eeac4 logs/cluster/impalad.INFO | grep filter-bank | grep 'filter 22'
> I0410 15:32:10.422222 29384 runtime-filter-bank.cc:124] 50467cb8e73eeac4:853461b400000022] registered consumer filter 22
> I0410 15:32:10.433528 29387 runtime-filter-bank.cc:129] 50467cb8e73eeac4:853461b400000023] re-registered consumer filter 22
> I0410 15:32:10.460548 29389 runtime-filter-bank.cc:129] 50467cb8e73eeac4:853461b400000024] re-registered consumer filter 22
> I0410 15:32:10.482293 29392 runtime-filter-bank.cc:129] 50467cb8e73eeac4:853461b400000025] re-registered consumer filter 22
> I0410 15:32:12.627218 29558 runtime-filter-bank.cc:186] 50467cb8e73eeac4:853461b4000000f3] Setting broadcast filter 22
> tarmstrong@tarmstrong-Precision-7540:~/impala/impala$ grep 50467cb8e73eeac4 logs/cluster/impalad_node1.INFO | grep filter-bank | grep 'filter 22'
> I0410 15:32:12.018474 29402 runtime-filter-bank.cc:186] 50467cb8e73eeac4:853461b4000000f2] Setting broadcast filter 22
> I0410 15:32:12.182348 29580 runtime-filter-bank.cc:124] 50467cb8e73eeac4:853461b40000001e] registered consumer filter 22
> I0410 15:32:12.212008 29581 runtime-filter-bank.cc:129] 50467cb8e73eeac4:853461b40000001f] re-registered consumer filter 22
> I0410 15:32:12.236542 29582 runtime-filter-bank.cc:129] 50467cb8e73eeac4:853461b400000020] re-registered consumer filter 22
> I0410 15:32:12.250748 29583 runtime-filter-bank.cc:129] 50467cb8e73eeac4:853461b400000021] re-registered consumer filter 22
>  {noformat}
> It looks like with mt_dop=0, this works because the producer and consumer are both registered in Prepare() of the same fragment. But with mt_dop>1, the fragments start up independently and the filter might be published before the consumer registers. That ordering doesn't appear to be handled.
> Thanks to [~drorke] for finding this.