Posted to issues@hive.apache.org by "Mostafa Mokhtar (JIRA)" <ji...@apache.org> on 2015/04/08 18:03:12 UTC

[jira] [Commented] (HIVE-10261) Data size can be underestimated when computed with partial column stats

    [ https://issues.apache.org/jira/browse/HIVE-10261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14485439#comment-14485439 ] 

Mostafa Mokhtar commented on HIVE-10261:
----------------------------------------

[~lirui]

Can you please attach the explain plan, along with the query and the actual number of rows for the operator whose data size is underestimated?

> Data size can be underestimated when computed with partial column stats
> -----------------------------------------------------------------------
>
>                 Key: HIVE-10261
>                 URL: https://issues.apache.org/jira/browse/HIVE-10261
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Rui Li
>
> With {{hive.stats.fetch.column.stats=true}}, we estimate data size from column stats when annotating operators with statistics. However, when column stats are partial, we're likely to underestimate data size, which may hurt performance, e.g. by picking an inappropriate small table for a map join.
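
To illustrate the kind of underestimation described above, here is a minimal Python sketch. It is a simplified model and not Hive's actual code: it assumes data size is estimated as the row count times the sum of the average column widths, and that a column with no stats contributes nothing. The function name, column names, widths, and row count are all hypothetical.

```python
def estimate_data_size(num_rows, avg_col_widths):
    """Estimate data size in bytes as rows * sum of known average column widths.

    Assumption (not taken from Hive's source): a column whose stats are
    missing (None) contributes nothing to the estimate.
    """
    known = [w for w in avg_col_widths.values() if w is not None]
    return num_rows * sum(known)


# Hypothetical table: 1,000,000 rows, three columns (average widths in bytes).
full_stats = {"id": 8, "name": 40, "payload": 200}
# Same table, but only the "id" column has stats.
partial_stats = {"id": 8, "name": None, "payload": None}

full_estimate = estimate_data_size(1_000_000, full_stats)
partial_estimate = estimate_data_size(1_000_000, partial_stats)
```

Under this model the estimate from partial stats is 31 times smaller than the one from full stats, which is the sort of gap that could make a large table look small enough for a map join.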



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)