Posted to dev@hive.apache.org by "Prasanth J (JIRA)" <ji...@apache.org> on 2013/12/03 20:57:35 UTC

[jira] [Commented] (HIVE-5921) Better heuristics for worst case statistics estimates for join, limit and filter operator

    [ https://issues.apache.org/jira/browse/HIVE-5921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13838099#comment-13838099 ] 

Prasanth J commented on HIVE-5921:
----------------------------------

The FILTER rule now evaluates each predicate expression. The JOIN rule now takes hints from the user in the form of a Hive config. In the absence of basic statistics (row count and data size), the estimated row count and data size are computed from the average row size, which is derived from the schema. Regenerated all affected tests.
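
To illustrate the fallback described above, here is a minimal sketch, not the code in the patch. It assumes the raw on-disk size of the table is known, derives an average row size from the schema, and divides to get a row count. It then scales that count once per filter predicate. The per-type widths, the 100-byte width for variable-length types, the 0.5 per-predicate factor, and all class and method names are hypothetical and chosen only for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class WorstCaseStatsSketch {

    // Hypothetical per-type widths in bytes; these are NOT Hive's actual constants.
    static long typeWidth(String type) {
        switch (type) {
            case "boolean":
            case "tinyint":
                return 1;
            case "smallint":
                return 2;
            case "int":
            case "float":
                return 4;
            case "bigint":
            case "double":
                return 8;
            default:
                return 100; // assumed average for variable-length types such as string
        }
    }

    // Average row size is the sum of the assumed widths of all columns in the schema.
    static long avgRowSize(Map<String, String> schema) {
        long size = 0;
        for (String type : schema.values()) {
            size += typeWidth(type);
        }
        return size;
    }

    // Row count estimate: raw bytes divided by the schema-derived row size (at least 1 row).
    static long estimateRowCount(long rawBytes, long avgRowSize) {
        if (avgRowSize <= 0) {
            return 0;
        }
        return Math.max(1, rawBytes / avgRowSize);
    }

    // Filter estimate: apply an assumed selectivity once for every predicate expression.
    static long estimateFilterRows(long inputRows, int numPredicates, double selectivity) {
        double rows = inputRows;
        for (int i = 0; i < numPredicates; i++) {
            rows *= selectivity;
        }
        return Math.max(1, (long) Math.ceil(rows));
    }

    public static void main(String[] args) {
        Map<String, String> schema = new LinkedHashMap<>();
        schema.put("id", "int");
        schema.put("name", "string");
        schema.put("price", "double");

        long rowSize = avgRowSize(schema);                  // 4 + 100 + 8 = 112
        long rows = estimateRowCount(1_120_000L, rowSize);  // 1120000 / 112 = 10000
        long filtered = estimateFilterRows(rows, 2, 0.5);   // 10000 * 0.5 * 0.5 = 2500

        System.out.println(rowSize + " " + rows + " " + filtered); // prints "112 10000 2500"
    }
}
```

Scaling once per predicate is the point of the sketch: a filter with more predicates produces a smaller estimate than a single flat reduction would, whatever factor is actually used.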

> Better heuristics for worst case statistics estimates for join, limit and filter operator
> -----------------------------------------------------------------------------------------
>
>                 Key: HIVE-5921
>                 URL: https://issues.apache.org/jira/browse/HIVE-5921
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Query Processor, Statistics
>    Affects Versions: 0.13.0
>            Reporter: Prasanth J
>            Assignee: Prasanth J
>             Fix For: 0.13.0
>
>         Attachments: HIVE-5921.1.patch
>
>
> This is a subtask of HIVE-5369. In the worst case (i.e., in the absence of column statistics), HIVE-5849 improved the basic statistics with heuristics, but those heuristics failed to provide better estimates in a few cases. For example, the FILTER operator heuristics did not take into account the number of predicates or whether a predicate involves a partition column. Also, the JOIN estimates were too aggressive and were not user-configurable.



--
This message was sent by Atlassian JIRA
(v6.1#6144)