Posted to issues@impala.apache.org by "Paul Rogers (JIRA)" <ji...@apache.org> on 2019/02/14 03:48:00 UTC

[jira] [Resolved] (IMPALA-8039) Incorrect selectivity estimate for not-equals predicate

     [ https://issues.apache.org/jira/browse/IMPALA-8039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Paul Rogers resolved IMPALA-8039.
---------------------------------
    Resolution: Duplicate

> Incorrect selectivity estimate for not-equals predicate
> -------------------------------------------------------
>
>                 Key: IMPALA-8039
>                 URL: https://issues.apache.org/jira/browse/IMPALA-8039
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Frontend
>    Affects Versions: Impala 3.1.0
>            Reporter: Paul Rogers
>            Assignee: Paul Rogers
>            Priority: Major
>
> Suppose we write a query that uses the not-equals predicate:
> {code:sql}
> select *
> from functional.alltypestiny
> where id != 10
> {code}
> How many rows will we get? Let's reason it out. Suppose we do this:
> {code:sql}
> select *
> from functional.alltypestiny
> where id = 10
> {code}
> We know that {{id}} is unique and the table has 8 rows. So, in the second query, we'll get only one row: the one where {{id = 10}}. Using this, we can see that the first query will return all the rows that the second one did not, that is, {{8 - 1 = 7}}.
> Let's see what the planner says:
> {noformat}
> PLAN-ROOT SINK
> |  mem-estimate=0B mem-reservation=0B thread-reservation=0
> |
> 00:SCAN HDFS [functional.alltypestiny]
>    partitions=4/4 files=4 size=460B
>    predicates: id != CAST(10 AS INT)
>    tuple-ids=0 row-size=89B cardinality=1
> {noformat}
> So, the planner says that equality and inequality both return the same number of rows. Clearly, this is wrong. It is, in fact, a symptom of the fact that Impala does not attempt to calculate selectivity for predicates other than equality (IMPALA-7601).
> The correct selectivity estimate for inequality is:
> {noformat}
> sel(c != x) = 1 - 1/ndv(c)
> {noformat}
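
As a quick check of the formula against the example above, here is a minimal sketch in Python. It is not Impala's planner code: the function names are hypothetical, the rounding rule is an assumption, and {{ndv(id) = 8}} is assumed because {{id}} is unique in an 8-row table.

```python
def not_equals_selectivity(ndv: int) -> float:
    """Selectivity of `c != x`, i.e. 1 - 1/ndv(c).

    Assumes values are uniformly distributed over the ndv distinct values.
    Hypothetical helper for illustration, not an Impala API.
    """
    if ndv <= 0:
        raise ValueError("ndv must be positive")
    return 1.0 - 1.0 / ndv


def estimated_cardinality(row_count: int, selectivity: float) -> int:
    """Estimated output rows: input rows scaled by selectivity.

    Rounding to the nearest integer is an assumption made for this sketch.
    """
    return round(row_count * selectivity)


# functional.alltypestiny: 8 rows, id assumed unique, so ndv(id) = 8.
sel = not_equals_selectivity(8)
rows = estimated_cardinality(8, sel)
print(sel, rows)
```

Under these assumptions the estimate for {{id != 10}} comes out at 7 rows, which matches the reasoning in the description, rather than the {{cardinality=1}} shown in the plan.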



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)