Posted to issues-all@impala.apache.org by "ASF subversion and git services (Jira)" <ji...@apache.org> on 2021/08/23 12:46:01 UTC

[jira] [Commented] (IMPALA-3430) Runtime filter : Extend runtime filter to support Min/Max values for HDFS scans

    [ https://issues.apache.org/jira/browse/IMPALA-3430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17403164#comment-17403164 ] 

ASF subversion and git services commented on IMPALA-3430:
---------------------------------------------------------

Commit cd902d8c22d22ea8ebdb85e31ddc143ee0d0bf69 in impala's branch refs/heads/master from Qifan Chen
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=cd902d8 ]

IMPALA-3430: Runtime filter : Extend runtime filter to support Min/Max values for HDFS scans

This patch enables min/max filtering for non-correlated subqueries
that return one value. In this case, the filters are built from the
results of the subqueries, and the filtering target is the scan node
qualified by one of the subqueries. Shown below is one such query,
which normally gets compiled into a nested loop join. The filtering
limits the values from column store_sales.ss_sales_price to the range
[-infinity, min(ss_wholesale_cost)].

  select count(*) from store_sales
    where ss_sales_price <=
      (select min(ss_wholesale_cost) from store_sales);
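
To illustrate the range described above, here is a minimal sketch. It is
not Impala's actual code: RangeFilter, MakeUpperBoundFilter and
MayContainMatch are hypothetical names, and the sketch assumes the scan
knows the minimum and maximum value of each chunk of data (for example a
row group or a partition), so it can skip chunks that cannot overlap the
filter range.

```cpp
#include <cassert>
#include <limits>

// Hypothetical closed range [min, max]; a chunk must overlap it to be read.
struct RangeFilter {
  double min;
  double max;
};

// For the predicate `target <= subquery_value`, every qualifying target
// value lies in [-infinity, subquery_value].
inline RangeFilter MakeUpperBoundFilter(double subquery_value) {
  return RangeFilter{-std::numeric_limits<double>::infinity(), subquery_value};
}

// Returns true if a chunk whose values span [chunk_min, chunk_max] may
// contain a qualifying row; false means the chunk can be skipped.
inline bool MayContainMatch(const RangeFilter& f, double chunk_min,
                            double chunk_max) {
  assert(chunk_min <= chunk_max);  // Sanity check on the chunk statistics.
  return chunk_min <= f.max && chunk_max >= f.min;
}
```

With a subquery result of 10.0, a chunk spanning [5.0, 20.0] still has to
be read, while a chunk spanning [10.5, 20.0] can be skipped.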

In the FE, the presence of such a scalar subquery is recorded in a flag
in InlineViewRef during analysis and later transferred to the
AggregationNode in the planner.

In BE, the min/max filtering infrastructure is integrated with the
nested loop join as follows.

 1. NljBuilderConfig is populated with filter descriptors from the
    nested loop join plan node via NljBuilder::CreateEmbeddedBuilder()
    (similar to hash join), or in NljBuilderConfig::Init() when the
    sink config is created (for the separate builder case);
 2. NljBuilder is populated with filter contexts utilizing the filter
    descriptors in NljBuilderConfig. Filter contexts are the interface
    to actual min/max filters;
 3. New insertion methods InsertFor<op>(), where <op> is LE, LT, GE or
    GT, are added to the MinMaxFilter class hierarchy. They are used
    for join predicates of the form target <op> src_expr;
 4. RuntimeContext::InsertPerCompareOp() calls one of the new
    insertion methods above based on the comparison op saved in the
    filter descriptor;
 5. NljBuilder::InsertRuntimeFilters() calls the new methods.
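
Steps 3 and 4 above can be sketched as follows. This is a simplified
illustration under stated assumptions, not the code in the patch:
Int64MinMaxFilter, CompareOp and the free function InsertPerCompareOp()
are hypothetical stand-ins, and the sketch assumes that when several
build-side values are inserted, the resulting ranges are unioned and
strict comparisons are over-approximated by a closed range.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical comparison ops for a join predicate `target <op> src_expr`.
enum class CompareOp { LE, LT, GE, GT };

// Simplified stand-in for an integer min/max filter.
class Int64MinMaxFilter {
 public:
  // target <= v: qualifying targets lie in [lowest, v]. The union over
  // several build values only ever raises the upper bound.
  void InsertForLE(int64_t v) {
    max_ = empty_ ? v : std::max(max_, v);
    min_ = kLowest;
    empty_ = false;
  }
  // target < v: the closed range [lowest, v] is a safe over-approximation.
  void InsertForLT(int64_t v) { InsertForLE(v); }
  // target >= v: qualifying targets lie in [v, highest].
  void InsertForGE(int64_t v) {
    min_ = empty_ ? v : std::min(min_, v);
    max_ = kHighest;
    empty_ = false;
  }
  // target > v: over-approximated by [v, highest].
  void InsertForGT(int64_t v) { InsertForGE(v); }

  bool empty() const { return empty_; }
  int64_t min() const { assert(!empty_); return min_; }
  int64_t max() const { assert(!empty_); return max_; }

 private:
  static constexpr int64_t kLowest = std::numeric_limits<int64_t>::lowest();
  static constexpr int64_t kHighest = std::numeric_limits<int64_t>::max();
  bool empty_ = true;
  int64_t min_ = kHighest;
  int64_t max_ = kLowest;
};

// Picks the insertion method from the comparison op, which in this sketch
// is passed in directly rather than read from a filter descriptor.
inline void InsertPerCompareOp(Int64MinMaxFilter* filter, CompareOp op,
                               int64_t v) {
  switch (op) {
    case CompareOp::LE: filter->InsertForLE(v); break;
    case CompareOp::LT: filter->InsertForLT(v); break;
    case CompareOp::GE: filter->InsertForGE(v); break;
    case CompareOp::GT: filter->InsertForGT(v); break;
  }
}
```

For a scalar subquery only one value is inserted, so the union logic is
not exercised; it is kept here so the sketch stays correct if more than
one build row reaches the filter.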

By default, the feature is turned on only for sorted or partitioned
join columns.

Testing:
 1. Add single range insertion tests in min-max-filter-test.cc;
 2. Add positive and negative plan tests in
    overlap_min_max_filters.test;
 3. Add tests in overlap_min_max_filters_on_partition_columns.test;
 4. Add tests in overlap_min_max_filters_on_sorted_columns.test;
 5. Run core tests.

TODO in follow-up patches:
 1. Extend min/max filter for inequality subquery for other use cases
    (IMPALA-10869).

Change-Id: I7c2bb5baad622051d1002c9c162c672d428e5446
Reviewed-on: http://gerrit.cloudera.org:8080/17706
Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>


> Runtime filter : Extend runtime filter to support Min/Max values for HDFS scans
> -------------------------------------------------------------------------------
>
>                 Key: IMPALA-3430
>                 URL: https://issues.apache.org/jira/browse/IMPALA-3430
>             Project: IMPALA
>          Issue Type: New Feature
>          Components: Backend
>    Affects Versions: Impala 2.6.0
>            Reporter: Mostafa Mokhtar
>            Assignee: Qifan Chen
>            Priority: Minor
>              Labels: performance, runtime-filters
>
> Annotating runtime filters with Min/Max values can help with:
> * Inequality joins
> * Pushing more efficient filters to the scan
> * Skipping reads of Parquet blocks, reducing IO.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)