Posted to issues@hive.apache.org by "Yongzhi Chen (JIRA)" <ji...@apache.org> on 2015/06/02 18:45:49 UTC

[jira] [Commented] (HIVE-8955) alter partition should check for "hive.stats.autogather" in hiveConf

    [ https://issues.apache.org/jira/browse/HIVE-8955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14569394#comment-14569394 ] 

Yongzhi Chen commented on HIVE-8955:
------------------------------------

[~pankit], I think it is by design. The code in HiveAlterHandler.java uses requireCalStats to decide whether it needs to call MetaStoreUtils.updatePartitionStatsFast.
requireCalStats returns true if the new partition's parameters contain STATS_GENERATED_VIA_STATS_TASK:

{noformat}
    if (newPart.getParameters().containsKey(StatsSetupConst.STATS_GENERATED_VIA_STATS_TASK)) {
      return true;
    }
{noformat}
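To make the check concrete, here is a minimal standalone sketch of that one branch. It is not the Hive code: the class and method names are hypothetical, a plain Map stands in for the partition parameters, and the string value of the constant is assumed to match StatsSetupConst.STATS_GENERATED_VIA_STATS_TASK. The other branches of requireCalStats are not modeled.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in; not part of Hive.
class StatsMarkerSketch {
    // Assumed to match the value of StatsSetupConst.STATS_GENERATED_VIA_STATS_TASK.
    static final String STATS_GENERATED_VIA_STATS_TASK = "STATS_GENERATED_VIA_STATS_TASK";

    // Models only the branch quoted above: the marker key alone
    // is enough to request a stats update.
    static boolean markerRequestsStatsUpdate(Map<String, String> partitionParams) {
        return partitionParams != null
                && partitionParams.containsKey(STATS_GENERATED_VIA_STATS_TASK);
    }

    public static void main(String[] args) {
        Map<String, String> fromStatsTask = new HashMap<>();
        fromStatsTask.put(STATS_GENERATED_VIA_STATS_TASK, "true");

        Map<String, String> withoutMarker = new HashMap<>();
        withoutMarker.put("numFiles", "1");

        System.out.println(markerRequestsStatsUpdate(fromStatsTask)); // true
        System.out.println(markerRequestsStatsUpdate(withoutMarker)); // false
    }
}
```

Only the presence of the key matters in this branch; its value is not inspected.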

The insert overwrite statement uses STATS_GENERATED_VIA_STATS_TASK, and Hive only creates a StatsTask when HIVESTATSAUTOGATHER is true, as shown in the GenMapRedUtils.isMergeRequired code:
{noformat}
...
if (mvTask != null && isInsertTable && hconf.getBoolVar(ConfVars.HIVESTATSAUTOGATHER)) {
  GenMapRedUtils.addStatsTask(fsOp, mvTask, currTask, hconf);
}
...
{noformat}

Even though updatePartitionStatsFast is called in HiveAlterHandler.java, there is no StatsTask when HIVESTATSAUTOGATHER is false, so the stats are not updated.
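The two pieces can be put together in a small standalone sketch. It is only a model of the reasoning above, not Hive code: all names are hypothetical, the marker string is assumed to match StatsSetupConst.STATS_GENERATED_VIA_STATS_TASK, and it assumes the marker reaches the partition parameters only when a StatsTask was planned. Other branches of requireCalStats are left out.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the flow described in this comment; not part of Hive.
class AutogatherFlowSketch {
    // Assumed to match the value of StatsSetupConst.STATS_GENERATED_VIA_STATS_TASK.
    static final String STATS_MARKER = "STATS_GENERATED_VIA_STATS_TASK";

    // Planner side: same shape as the condition quoted above.
    static boolean planAddsStatsTask(boolean hasMoveTask, boolean isInsertTable,
                                     boolean autogather) {
        return hasMoveTask && isInsertTable && autogather;
    }

    // Assumption: the marker is set only when a StatsTask was planned.
    static Map<String, String> partitionParamsAfterInsert(boolean autogather) {
        Map<String, String> params = new HashMap<>();
        if (planAddsStatsTask(true, true, autogather)) {
            params.put(STATS_MARKER, "true");
        }
        return params;
    }

    // Metastore side: models only the marker branch of requireCalStats.
    static boolean alterRecomputesStats(Map<String, String> params) {
        return params.containsKey(STATS_MARKER);
    }

    public static void main(String[] args) {
        for (boolean autogather : new boolean[] {true, false}) {
            boolean recompute = alterRecomputesStats(partitionParamsAfterInsert(autogather));
            System.out.println("autogather=" + autogather + " -> recompute=" + recompute);
        }
    }
}
```

In this model the alter partition path never reads the flag itself; the flag decides upstream whether the marker exists at all.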
Test result:
{noformat}
0: jdbc:hive2://localhost:10000> set hive.stats.autogather= true;
set hive.stats.autogather= true;
No rows affected (0.003 seconds)
0: jdbc:hive2://localhost:10000> insert overwrite table test_part2 partition (x) select description , code from jspsrcsmall;
insert overwrite table test_part2 partition (x) 
 select description , code from jspsrcsmall;
INFO  : Number of reduce tasks is set to 0 since there's no reduce operator
INFO  : Job running in-process (local Hadoop)
INFO  : 2015-06-02 11:27:22,944 Stage-1 map = 0%,  reduce = 0%
INFO  : Ended Job = job_local688274287_0009
INFO  : Stage-4 is selected by condition resolver.
INFO  : Stage-3 is filtered out by condition resolver.
INFO  : Stage-5 is filtered out by condition resolver.
INFO  : Moving data to: file:/user/hive/warehouse/test_part2/.hive-staging_hive_2015-06-02_11-27-21_770_3880518525165134918-1/-ext-10000 from file:/user/hive/warehouse/test_part2/.hive-staging_hive_2015-06-02_11-27-21_770_3880518525165134918-1/-ext-10002
INFO  : Loading data to table default.test_part2 partition (x=null) from file:/user/hive/warehouse/test_part2/.hive-staging_hive_2015-06-02_11-27-21_770_3880518525165134918-1/-ext-10000
INFO  : 	 Time taken for load dynamic partitions : 532
INFO  : 	Loading partition {x=11-1031}
INFO  : 	Loading partition {x=11-1011}
INFO  : 	Loading partition {x=00-0000}
INFO  : 	Loading partition {x=11-1021}
INFO  : 	Loading partition {x=11-0000}
INFO  : 	 Time taken for adding to write entity : 1
INFO  : Partition default.test_part2{x=00-0000} stats: [numFiles=1, numRows=1, totalSize=16, rawDataSize=15]
INFO  : Partition default.test_part2{x=11-0000} stats: [numFiles=1, numRows=1, totalSize=23, rawDataSize=22]
INFO  : Partition default.test_part2{x=11-1011} stats: [numFiles=1, numRows=1, totalSize=17, rawDataSize=16]
INFO  : Partition default.test_part2{x=11-1021} stats: [numFiles=1, numRows=1, totalSize=32, rawDataSize=31]
INFO  : Partition default.test_part2{x=11-1031} stats: [numFiles=1, numRows=1, totalSize=12, rawDataSize=11]
No rows affected (564.874 seconds)
0: jdbc:hive2://localhost:10000> insert overwrite table test_part2 partition (x) select description , code from jspsrcsmall;
insert overwrite table test_part2 partition (x) 
 select description , code from jspsrcsmall;
INFO  : Number of reduce tasks is set to 0 since there's no reduce operator
INFO  : Job running in-process (local Hadoop)
INFO  : 2015-06-02 11:42:00,687 Stage-1 map = 0%,  reduce = 0%
INFO  : Ended Job = job_local1912867068_0011
INFO  : Stage-3 is selected by condition resolver.
INFO  : Stage-2 is filtered out by condition resolver.
INFO  : Stage-4 is filtered out by condition resolver.
INFO  : Moving data to: file:/user/hive/warehouse/test_part2/.hive-staging_hive_2015-06-02_11-41-19_090_6995116246086250594-1/-ext-10000 from file:/user/hive/warehouse/test_part2/.hive-staging_hive_2015-06-02_11-41-19_090_6995116246086250594-1/-ext-10002
INFO  : Loading data to table default.test_part2 partition (x=null) from file:/user/hive/warehouse/test_part2/.hive-staging_hive_2015-06-02_11-41-19_090_6995116246086250594-1/-ext-10000
INFO  : 	 Time taken for load dynamic partitions : 535
INFO  : 	Loading partition {x=11-1011}
INFO  : 	Loading partition {x=11-1031}
INFO  : 	Loading partition {x=11-1021}
INFO  : 	Loading partition {x=00-0000}
INFO  : 	Loading partition {x=11-0000}
INFO  : 	 Time taken for adding to write entity : 1
No rows affected (42.263 seconds)
{noformat}


> alter partition should check for "hive.stats.autogather" in hiveConf
> --------------------------------------------------------------------
>
>                 Key: HIVE-8955
>                 URL: https://issues.apache.org/jira/browse/HIVE-8955
>             Project: Hive
>          Issue Type: Improvement
>          Components: Metastore
>    Affects Versions: 0.13.1
>            Reporter: Pankit Thapar
>            Assignee: Yongzhi Chen
>
> When the alter partition code path is triggered, it should check the flag "hive.stats.autogather" and update the stats only if it is true; otherwise it should skip the update.
> This is already done in the append_partition code flow.
> Is there any specific reason that alter_partition does not respect this conf variable?
> //code snippet : HiveMetastore.java 
>  private Partition append_partition_common(RawStore ms, String dbName, String tableName,
>         List<String> part_vals, EnvironmentContext envContext) throws InvalidObjectException,
>         AlreadyExistsException, MetaException {
> ...
> ....
>         if (HiveConf.getBoolVar(hiveConf, HiveConf.ConfVars.HIVESTATSAUTOGATHER) &&
>             !MetaStoreUtils.isView(tbl)) {
>           MetaStoreUtils.updatePartitionStatsFast(part, wh, madeDir);
>         }
> ...
> ...
> }
> The above code snippet checks for the variable but this same check is absent in 
> //code snippet : HiveAlterHandler.java 
> public Partition alterPartition(final RawStore msdb, Warehouse wh, final String dbname,
>       final String name, final List<String> part_vals, final Partition new_part)
>       throws InvalidOperationException, InvalidObjectException, AlreadyExistsException,
>       MetaException {
> ....
> ...
> }


