Posted to issues@hive.apache.org by "Jesus Camacho Rodriguez (JIRA)" <ji...@apache.org> on 2016/06/15 16:23:09 UTC

[jira] [Updated] (HIVE-14018) Make IN clause row selectivity estimation customizable

     [ https://issues.apache.org/jira/browse/HIVE-14018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jesus Camacho Rodriguez updated HIVE-14018:
-------------------------------------------
    Status: Patch Available  (was: In Progress)

> Make IN clause row selectivity estimation customizable
> ------------------------------------------------------
>
>                 Key: HIVE-14018
>                 URL: https://issues.apache.org/jira/browse/HIVE-14018
>             Project: Hive
>          Issue Type: Improvement
>          Components: Statistics
>    Affects Versions: 2.1.0, 2.2.0
>            Reporter: Jesus Camacho Rodriguez
>            Assignee: Jesus Camacho Rodriguez
>            Priority: Minor
>         Attachments: HIVE-14018.patch
>
>
> After HIVE-13287 went in, we compute IN clause estimates natively (instead of just dividing the incoming number of rows by 2). However, because the values of each column are assumed to be uniformly distributed, we might end up heavily underestimating or overestimating the resulting number of rows.
> This issue adds a factor that multiplies the IN clause estimate so that we can alleviate this problem. The solution is not very elegant, but it is the best we can do until we have histograms to improve our estimates.
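
The idea described above can be illustrated with a minimal sketch. This is not the code from the attached patch: the class name, method name, and parameters below are hypothetical, and the fallback to "rows / 2" when no distinct-value count is available is an assumption based on the old heuristic mentioned in the description. Under a uniform-distribution assumption, the fraction of rows matched by an IN list is roughly the number of distinct IN values divided by the column's number of distinct values (NDV); the tunable factor then scales that estimate.

```java
// Hypothetical illustration only; names and structure are not taken from Hive.
class InClauseEstimate {

    /**
     * Estimates the number of rows passing "col IN (...)".
     *
     * @param numRows          incoming row count
     * @param distinctInValues number of distinct values in the IN list
     * @param columnNdv        number of distinct values in the column (<= 0 if unknown)
     * @param factor           user-tunable multiplier applied to the uniform estimate
     */
    static long estimateInRows(long numRows, long distinctInValues,
                               long columnNdv, double factor) {
        if (columnNdv <= 0) {
            // Assumed fallback: the old heuristic of halving the input.
            return numRows / 2;
        }
        // Uniform assumption: every distinct value covers the same share of rows.
        double uniformSelectivity =
            Math.min(1.0, (double) distinctInValues / columnNdv);
        double estimate = numRows * uniformSelectivity * factor;
        // Clamp so the estimate never goes negative or exceeds the input size.
        return Math.max(0L, Math.min(numRows, Math.round(estimate)));
    }

    public static void main(String[] args) {
        // 5 IN values over a column with 100 distinct values, 1000 input rows.
        System.out.println(estimateInRows(1000, 5, 100, 1.0));
        // Same query, but the factor doubles the estimate to offset skew.
        System.out.println(estimateInRows(1000, 5, 100, 2.0));
    }
}
```

A factor above 1 compensates when the IN values are known to be more frequent than the uniform assumption suggests, and a factor below 1 does the opposite. The clamp keeps an aggressive factor from producing an estimate larger than the input.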



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)