Posted to issues@impala.apache.org by "Riza Suminto (Jira)" <ji...@apache.org> on 2023/09/12 15:53:00 UTC

[jira] [Resolved] (IMPALA-12357) Skip scheduling runtime filter from PK-FK join with full build scan

     [ https://issues.apache.org/jira/browse/IMPALA-12357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Riza Suminto resolved IMPALA-12357.
-----------------------------------
    Fix Version/s: Impala 4.3.0
         Assignee: Riza Suminto
       Resolution: Fixed

> Skip scheduling runtime filter from PK-FK join with full build scan
> -------------------------------------------------------------------
>
>                 Key: IMPALA-12357
>                 URL: https://issues.apache.org/jira/browse/IMPALA-12357
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Frontend
>            Reporter: Riza Suminto
>            Assignee: Riza Suminto
>            Priority: Major
>              Labels: bloom-filter, runtime-filters
>             Fix For: Impala 4.3.0
>
>         Attachments: Screen Shot 2023-08-04 at 3.13.56 PM.png
>
>
> A PK-FK inner join between a dimension table and a fact table is common in queries. Often, such a join has no predicate on the dimension table. The runtime filter built from this kind of dimension table scan (PK) therefore likely contains every value of the fact table column (FK). Generating this filter is wasteful because it is unlikely to reject any rows.
> The attached screenshot visualizes RF 50, 52, 60, and 62 targeting 49:SCAN in TPC-DS Q64. These runtime filters come from full dimension table scans on PK-FK joins. In theory, these filters should not reject any probe rows. The query profile, however, shows that they can still reject some probe rows that have NULL values in their target column. Unfortunately, because NULL rows are few relative to non-NULL rows, all of those filters still ended up disabled by the scanners because 49:SCAN deemed them ineffective.
> We can skip generating runtime filters that match all these criteria:
>  # The build side is a full table scan.
>  # No runtime filter targets the build scan.
>  # There is a PK-FK constraint between the runtime filter origin column in the build side and the target column in the probe side.
> If the PK-FK constraint is not declared in the table schema, which happens most of the time, criterion 3 can be replaced by checking the runtime filter’s false positive probability (eliminating filters whose false positive probability is high).
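
The three criteria above, plus the false positive fallback, can be sketched roughly as follows. This is an illustrative Python sketch only, not Impala's planner code: the `FilterCandidate` fields, the `should_skip` helper, the `max_fpp` threshold of 0.75, and the choice of 8 hash functions are all assumptions made for the example. The false positive estimate uses the textbook Bloom filter formula, which may differ from the estimate Impala actually computes.

```python
import math
from dataclasses import dataclass


@dataclass
class FilterCandidate:
    """Hypothetical planner-side summary of one runtime filter."""
    build_is_full_scan: bool              # criterion 1: build scan has no predicates
    build_scan_has_incoming_filter: bool  # criterion 2 fails when this is True
    pk_fk_declared: bool                  # criterion 3: declared PK-FK constraint
    build_ndv: int                        # estimated distinct build-side values
    filter_bits: int                      # Bloom filter size in bits


def bloom_fpp(ndv: int, bits: int, num_hashes: int = 8) -> float:
    """Textbook Bloom filter false positive probability: (1 - e^(-k*n/m))^k."""
    return (1.0 - math.exp(-num_hashes * ndv / bits)) ** num_hashes


def should_skip(c: FilterCandidate, max_fpp: float = 0.75) -> bool:
    """Return True when the filter is unlikely to reject any probe rows."""
    # Criteria 1 and 2 must both hold before the filter can be skipped.
    if not c.build_is_full_scan or c.build_scan_has_incoming_filter:
        return False
    # Criterion 3: a declared PK-FK constraint is sufficient on its own.
    if c.pk_fk_declared:
        return True
    # Fallback when no constraint is declared: skip filters whose
    # estimated false positive probability exceeds the (assumed) threshold.
    return bloom_fpp(c.build_ndv, c.filter_bits) > max_fpp
```

For example, under these assumptions a filter built from an unfiltered dimension scan with a declared PK-FK constraint is skipped regardless of its size, while an undeclared one is skipped only if the build side has so many distinct values that the filter saturates.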


