Posted to issues@hive.apache.org by "Chao Sun (JIRA)" <ji...@apache.org> on 2016/12/20 22:42:58 UTC

[jira] [Updated] (HIVE-15477) Provide options to adjust filter stats when column stats are not available

     [ https://issues.apache.org/jira/browse/HIVE-15477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chao Sun updated HIVE-15477:
----------------------------
    Status: Patch Available  (was: Open)

> Provide options to adjust filter stats when column stats are not available
> --------------------------------------------------------------------------
>
>                 Key: HIVE-15477
>                 URL: https://issues.apache.org/jira/browse/HIVE-15477
>             Project: Hive
>          Issue Type: Bug
>          Components: Statistics
>    Affects Versions: 2.2.0
>            Reporter: Chao Sun
>            Assignee: Chao Sun
>         Attachments: HIVE-15477.1.patch
>
>
> Currently, when column stats are not available, Hive assumes the "worst" case and sets the number of output rows of each predicate expression to 1/2 of its number of input rows. This estimate can be inaccurate, especially when multiple predicates are chained by AND, because the halving is applied once per predicate: k chained predicates shrink the estimate to 1/2^k of the input. We have found cases where this causes map joins to be ordered incorrectly and then fail with memory issues.
> One suggestion is to provide a configuration property (such as {{hive.stats.filter.factor}}) that controls the percentage of rows emitted by each predicate expression.
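To make the compounding effect concrete, here is a minimal Python sketch of the estimation rule described in the ticket. It assumes each predicate in an AND chain is applied to the output of the previous one. The function names and the filter_factor parameter are hypothetical: filter_factor only stands in for the proposed hive.stats.filter.factor setting, and this is not Hive's actual implementation. With the default factor of 0.5, five chained predicates reduce a 1,000,000-row input to 31,250 rows.

```python
# Hypothetical sketch of the filter row-count estimate discussed in HIVE-15477
# when column statistics are missing. Not Hive's actual code: the function
# names and the filter_factor parameter are assumptions for illustration.


def estimate_predicate_rows(input_rows: int, filter_factor: float = 0.5) -> int:
    """Rows emitted by one predicate when no column stats are available.

    The default of 0.5 mirrors the current "worst case" behavior described
    in the ticket: each predicate is assumed to keep half of its input rows.
    """
    if not 0.0 < filter_factor <= 1.0:
        raise ValueError("filter_factor must be in (0, 1]")
    # Truncate to a whole number of rows.
    return int(input_rows * filter_factor)


def estimate_and_chain_rows(input_rows: int, num_predicates: int,
                            filter_factor: float = 0.5) -> int:
    """Rows emitted by num_predicates predicates chained with AND.

    Each predicate is applied to the output of the previous one, so the
    estimate shrinks geometrically: about input_rows * filter_factor ** k.
    """
    rows = input_rows
    for _ in range(num_predicates):
        rows = estimate_predicate_rows(rows, filter_factor)
    return rows


if __name__ == "__main__":
    n = 1_000_000
    for k in (1, 3, 5):
        halved = estimate_and_chain_rows(n, k)      # fixed 1/2 per predicate
        tuned = estimate_and_chain_rows(n, k, 0.9)  # hypothetical configured factor
        print(f"{k} predicates: factor 0.5 -> {halved}, factor 0.9 -> {tuned}")
```

A larger factor makes each predicate less selective in the estimate, so the row count for a heavily filtered input stays closer to its original size instead of collapsing toward zero.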



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)