Posted to dev@hive.apache.org by "Lefty Leverenz (JIRA)" <ji...@apache.org> on 2014/03/15 23:32:43 UTC

[jira] [Commented] (HIVE-3777) add a property in the partition to figure out if stats are accurate

    [ https://issues.apache.org/jira/browse/HIVE-3777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13936327#comment-13936327 ] 

Lefty Leverenz commented on HIVE-3777:
--------------------------------------

Documented in the wiki's Configuration Properties, please review:

{quote}
hive.stats.reliable
Default Value: false
Added In: Hive 0.10.0 with HIVE-1653
New Behavior In:  Hive 0.13.0 with HIVE-3777

Whether queries fail when statistics cannot be collected completely accurately. If this is set to true, reading from or writing to a partition or unpartitioned table may fail because the statistics could not be computed accurately. If it is set to false, the operation will succeed.

In Hive 0.13.0 and later, if hive.stats.reliable is false and statistics could not be computed correctly, the operation can still succeed and update the statistics, but it sets the partition property "areStatsAccurate" to false. If the application needs accurate statistics, they can then be obtained in the background.
{quote}
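
To make the wording above easier to check, here is a minimal Python sketch of the behavior as I currently understand it from the jira description. It is a toy model, not Hive code: the names Partition, StatsCollectionError, and finish_write are hypothetical, and marking the property "true" after a successful stats update is my assumption, since neither the jira nor the wikidoc says so.

```python
from dataclasses import dataclass, field


class StatsCollectionError(Exception):
    """Hypothetical error standing in for a failed operation."""


@dataclass
class Partition:
    """Toy stand-in for a partition and its property map."""
    name: str
    properties: dict = field(default_factory=dict)


def finish_write(partition, stats_ok, stats_reliable):
    """Model the outcome of a write once the stats task has run.

    stats_ok       -- whether statistics were computed accurately
    stats_reliable -- the value of hive.stats.reliable
    """
    if stats_ok:
        # Assumption: accurate stats are flagged as accurate.
        partition.properties["areStatsAccurate"] = "true"
        return True
    if stats_reliable:
        # hive.stats.reliable=true: the operation fails.
        raise StatsCollectionError(
            "statistics could not be computed accurately for " + partition.name
        )
    # hive.stats.reliable=false: the operation succeeds,
    # but the partition is flagged as having inaccurate stats.
    partition.properties["areStatsAccurate"] = "false"
    return True


if __name__ == "__main__":
    part = Partition("ds=2014-03-15")
    finish_write(part, stats_ok=False, stats_reliable=False)
    print(part.properties)
```

If this model is wrong, in particular if the flag is meant to be set when hive.stats.reliable is true, then the wikidoc needs the same correction.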

Questions: 

# Does an unpartitioned table have the "areStatsAccurate" property too?
# Does the new behavior happen when hive.stats.reliable is false, not true?  (I ask because the jira description implies that this is a fix for the problem of long-running queries failing when statistics aren't accurate, but as I understand it the query doesn't fail when hive.stats.reliable is false.  Perhaps I'm confused, so please make sure the wikidoc is correct.)

Quick ref:
* [Language Manual -- Configuration Properties:  hive.stats.reliable |https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.stats.reliable]

> add a property in the partition to figure out if stats are accurate
> -------------------------------------------------------------------
>
>                 Key: HIVE-3777
>                 URL: https://issues.apache.org/jira/browse/HIVE-3777
>             Project: Hive
>          Issue Type: Improvement
>          Components: Query Processor
>    Affects Versions: 0.13.0
>            Reporter: Namit Jain
>            Assignee: Ashutosh Chauhan
>             Fix For: 0.13.0
>
>         Attachments: HIVE-3777.2.patch, HIVE-3777.2.patch, HIVE-3777.3.patch, HIVE-3777.4.patch, HIVE-3777.5.patch, HIVE-3777.patch
>
>
> Currently, stats task tries to update the statistics in the table/partition
> being updated after the table/partition is loaded. In case of a failure to 
> update these stats (for any reason), the operation either succeeds
> (writing inaccurate stats) or fails depending on whether hive.stats.reliable
> is set to true. This can be bad for applications that do not always care about
> reliable stats, since the query may have taken a long time to execute and then
> fail eventually.
> Another property should be added to the partition: areStatsAccurate. If hive.stats.reliable is
> set to false, and stats could not be computed correctly, the operation would
> still succeed, update the stats, but set areStatsAccurate to false.
> If the application cares about accurate stats, they can be obtained in the
> background.



--
This message was sent by Atlassian JIRA
(v6.2#6252)