Posted to issues-all@impala.apache.org by "ASF subversion and git services (Jira)" <ji...@apache.org> on 2021/09/27 09:52:00 UTC

[jira] [Commented] (IMPALA-7560) Better selectivity estimate for != (not equals) binary predicate

    [ https://issues.apache.org/jira/browse/IMPALA-7560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17420630#comment-17420630 ] 

ASF subversion and git services commented on IMPALA-7560:
---------------------------------------------------------

Commit 8862719d87ac5dc214985025463f002d41b15672 in impala's branch refs/heads/branch-4.0.1 from liuyao
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=8862719 ]

IMPALA-7560: Set selectivity of Not-equal

Calculate the selectivity of a binary predicate when one of its
children is a SlotRef and all other children are constants,
e.g. "col != 5", but not "2 * col != 10".

selectivity = 1 - 1/ndv
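
The rule above can be sketched in Java as follows. This is a minimal illustration, not the code from the commit: the class name, the Op enum, and the estimate() helper are hypothetical, and returning -1 for "no estimate" is an assumption based on the test note below. Clamping to [0, 1] mirrors the existing EQ branch quoted in the issue description.

```java
// Hypothetical sketch; none of these names come from the Impala code base.
final class NeSelectivitySketch {
    // Stand-in for the planner's operator enum (assumed subset).
    enum Op { EQ, NOT_DISTINCT, NE, LT }

    // Returns the estimated selectivity, or -1 when no estimate is available.
    // ndv is the number of distinct values of the single column in the predicate.
    static double estimate(Op op, long ndv) {
        if (ndv <= 0) return -1;  // no usable column statistics
        double eqSelectivity = 1.0 / ndv;
        switch (op) {
            case EQ:
            case NOT_DISTINCT:
                return clamp(eqSelectivity);
            case NE:
                // Complement of the equality estimate: 1 - 1/ndv.
                return clamp(1.0 - eqSelectivity);
            default:
                return -1;  // other operators keep the default estimate
        }
    }

    // Keeps the estimate inside the valid probability range.
    static double clamp(double s) {
        return Math.max(0.0, Math.min(1.0, s));
    }

    public static void main(String[] args) {
        System.out.println("EQ, ndv=5: " + estimate(Op.EQ, 5));
        System.out.println("NE, ndv=5: " + estimate(Op.NE, 5));
    }
}
```

With ndv = 1 the NE estimate is 0, since a column with a single distinct value cannot differ from that value (ignoring NULLs, which this sketch does not model).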

Testing:
Modified testNeSelectivity() in ExprCardinalityTest.java to expect
the correct value instead of -1.

Change-Id: Icd6f5945840ea2a8194d72aa440ddfa6915cbb3a
Reviewed-on: http://gerrit.cloudera.org:8080/17344
Reviewed-by: Qifan Chen <qc...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>
Reviewed-by: Zoltan Borok-Nagy <bo...@cloudera.com>


> Better selectivity estimate for != (not equals) binary predicate
> ----------------------------------------------------------------
>
>                 Key: IMPALA-7560
>                 URL: https://issues.apache.org/jira/browse/IMPALA-7560
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Frontend
>    Affects Versions: Impala 2.8.0, Impala 2.9.0, Impala 2.10.0, Impala 2.12.0, Impala 2.13.0
>            Reporter: Bharath Vissapragada
>            Assignee: liuyao
>            Priority: Major
>             Fix For: Impala 4.1.0
>
>
> Currently we use the default selectivity estimate for any binary predicate with an operator other than EQ / NOT_DISTINCT.
> {noformat}
>     // Determine selectivity
>     // TODO: Compute selectivity for nested predicates.
>     // TODO: Improve estimation using histograms.
>     Reference<SlotRef> slotRefRef = new Reference<SlotRef>();
>     if ((op_ == Operator.EQ || op_ == Operator.NOT_DISTINCT)
>         && isSingleColumnPredicate(slotRefRef, null)) {
>       long distinctValues = slotRefRef.getRef().getNumDistinctValues();
>       if (distinctValues > 0) {
>         selectivity_ = 1.0 / distinctValues;
>         selectivity_ = Math.max(0, Math.min(1, selectivity_));
>       }
>     }
> {noformat}
> This can give very conservative estimates. For example:
> {noformat}
> [localhost:21000] tpch> select * from nation where n_regionkey != 1;
> [localhost:21000] tpch> summary;
> +--------------+--------+----------+----------+-------+------------+-----------+---------------+-------------+
> | Operator     | #Hosts | Avg Time | Max Time | #Rows | Est. #Rows | Peak Mem  | Est. Peak Mem | Detail      |
> +--------------+--------+----------+----------+-------+------------+-----------+---------------+-------------+
> | 00:SCAN HDFS | 1      | 3.32ms   | 3.32ms   | 20    | 3          | 143.00 KB | 16.00 MB      | tpch.nation |
> +--------------+--------+----------+----------+-------+------------+-----------+---------------+-------------+
> [localhost:21000] tpch> 
> {noformat}
> Ideally we could have inverted the selectivity to 4/5 (= 1 - 1/5), which would give a better estimate. For the 25-row nation table, that yields 20 estimated rows, matching the actual row count.
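
The arithmetic behind the example can be checked with a short Java sketch. It assumes the standard 25-row TPC-H nation table with 5 distinct n_regionkey values, a default selectivity of 0.1, and rounding to the nearest row; the last two are assumptions chosen to be consistent with the "Est. #Rows" of 3 shown in the summary, and the class and method names are hypothetical.

```java
// Hypothetical sketch of the cardinality arithmetic for the nation example.
final class NationCardinalitySketch {
    // Estimated output rows for a scan: input rows scaled by selectivity,
    // rounded to the nearest whole row (assumed rounding rule).
    static long estimateRows(long tableRows, double selectivity) {
        return Math.round(tableRows * selectivity);
    }

    public static void main(String[] args) {
        long tableRows = 25;             // assumed: standard TPC-H nation table
        long ndv = 5;                    // distinct n_regionkey values
        double defaultSelectivity = 0.1; // assumed planner default
        double neSelectivity = 1.0 - 1.0 / ndv;

        System.out.println("default estimate: "
            + estimateRows(tableRows, defaultSelectivity));
        System.out.println("1 - 1/ndv estimate: "
            + estimateRows(tableRows, neSelectivity));
    }
}
```

Under these assumptions the default estimate is 3 rows and the 1 - 1/ndv estimate is 20 rows, which lines up with the "Est. #Rows" and "#Rows" columns of the summary respectively.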


