Posted to dev@hive.apache.org by "Namit Jain (JIRA)" <ji...@apache.org> on 2012/12/07 05:43:20 UTC

[jira] [Updated] (HIVE-3777) add a property in the partition to figure out if stats are accurate

     [ https://issues.apache.org/jira/browse/HIVE-3777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Namit Jain updated HIVE-3777:
-----------------------------

    Description: 
Currently, the stats task tries to update the statistics of the table/partition
being updated after the table/partition is loaded. If updating these stats
fails (for any reason), the operation either succeeds (writing inaccurate
stats) or fails, depending on whether hive.stats.reliable is set to true.
Failing can be bad for applications that do not always care about reliable
stats, since the query may have taken a long time to execute, only to fail at
the end.

Another property should be added to the partition: areStatsAccurate. If
hive.stats.reliable is set to false and the stats could not be computed
correctly, the operation would still succeed and update the stats, but set
areStatsAccurate to false. If the application cares about accurate stats, they
can be obtained in the background.

  was:
Currently, the stats task tries to update the statistics of the table/partition
being updated after the table/partition is loaded. If updating these stats
fails (for any reason), the operation either succeeds (writing inaccurate
stats) or fails, depending on whether hive.stats.reliable is set to true.
Failing can be bad for applications that do not always care about reliable
stats, since the query may have taken a long time to execute, only to fail at
the end.

Another option should be added: hive.accurate.stats. If hive.stats.reliable is
set to false and the stats could not be computed correctly, the operation would
still succeed and update the stats, but set hive.accurate.stats to false.
If the application cares about accurate stats, they can be obtained in the
background.

        Summary: add a property in the partition to figure out if stats are accurate  (was: add hive.stats.accurate in the partition)
    
> add a property in the partition to figure out if stats are accurate
> -------------------------------------------------------------------
>
>                 Key: HIVE-3777
>                 URL: https://issues.apache.org/jira/browse/HIVE-3777
>             Project: Hive
>          Issue Type: Improvement
>          Components: Query Processor
>            Reporter: Namit Jain
>
> Currently, the stats task tries to update the statistics of the table/partition
> being updated after the table/partition is loaded. If updating these stats
> fails (for any reason), the operation either succeeds (writing inaccurate
> stats) or fails, depending on whether hive.stats.reliable is set to true.
> Failing can be bad for applications that do not always care about reliable
> stats, since the query may have taken a long time to execute, only to fail at
> the end.
>
> Another property should be added to the partition: areStatsAccurate. If
> hive.stats.reliable is set to false and the stats could not be computed
> correctly, the operation would still succeed and update the stats, but set
> areStatsAccurate to false. If the application cares about accurate stats, they
> can be obtained in the background.
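
The proposed behavior can be illustrated with a minimal sketch. This is not
Hive code: Partition, StatsError, run_stats_task and the collect_stats callback
are hypothetical names invented for the illustration, and only the
areStatsAccurate property and the hive.stats.reliable setting (modeled here as
the stats_reliable argument) come from the description above.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


class StatsError(Exception):
    """Hypothetical error raised when stats cannot be computed correctly."""

    def __init__(self, message: str, partial: Optional[Dict[str, str]] = None):
        super().__init__(message)
        # Whatever stats were gathered before the failure (possibly inaccurate).
        self.partial = dict(partial or {})


@dataclass
class Partition:
    """Hypothetical stand-in for partition metadata, not Hive's own class."""

    name: str
    params: Dict[str, str] = field(default_factory=dict)


def run_stats_task(
    partition: Partition,
    collect_stats: Callable[[Partition], Dict[str, str]],
    stats_reliable: bool,
) -> bool:
    """Update partition stats; return True if the stats are accurate."""
    try:
        stats = collect_stats(partition)
        accurate = True
    except StatsError as err:
        if stats_reliable:
            # Models hive.stats.reliable=true: the whole operation fails.
            raise
        # Models hive.stats.reliable=false: succeed, but flag the stats.
        stats = err.partial
        accurate = False
    partition.params.update(stats)
    partition.params["areStatsAccurate"] = "true" if accurate else "false"
    return accurate
```

Under this sketch, an application that needs accurate stats could look for
partitions whose areStatsAccurate is "false" and recompute their stats in the
background, without failing the original load.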
