
Posted to dev@hive.apache.org by "Ashutosh Chauhan (JIRA)" <ji...@apache.org> on 2013/12/03 00:06:35 UTC

[jira] [Comment Edited] (HIVE-5916) No need to aggregate statistics collected via counter mechanism

    [ https://issues.apache.org/jira/browse/HIVE-5916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13837038#comment-13837038 ] 

Ashutosh Chauhan edited comment on HIVE-5916 at 12/2/13 11:04 PM:
------------------------------------------------------------------

Currently, TableScanOp publishes statistics to the JobTracker with aggrKey as the counter group name, the statistics type (numRows etc.) as the counter name, and the value of the statistic as the counter value. We can instead use the partition spec as the counter group name and the statistics type as the counter name; then the Hive client no longer needs to invoke stats aggregation when the query finishes. This has the following advantages:
* The client doesn't need to do any aggregation. After retrieving statistics from the JobTracker (via JobClient), it can add them directly to the metastore.
* It lowers the memory footprint on the JobTracker: instead of holding counters per task per partition, it will hold counters per partition only.
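
To illustrate the idea, here is a minimal, self-contained sketch in plain Java. It is not Hive or Hadoop code: the class CounterSketch, the Update type, and the partition specs such as "ds=2013-12-01" are hypothetical names made up for this example. sumByGroup only mimics the framework-side behaviour this proposal relies on, namely that counter values reported by different tasks under the same group and counter name are summed. It assumes, as the second bullet implies, that the current aggrKey carries a per-task component.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for framework-side counter summation; not Hive or Hadoop code.
final class CounterSketch {

    // One counter update reported by a task: (group name, counter name, value).
    static final class Update {
        final String group;
        final String name;
        final long value;

        Update(String group, String name, long value) {
            this.group = group;
            this.name = name;
            this.value = value;
        }
    }

    // Sums all updates that share the same group and counter name,
    // regardless of which task reported them.
    static Map<String, Map<String, Long>> sumByGroup(List<Update> updates) {
        Map<String, Map<String, Long>> totals = new LinkedHashMap<>();
        for (Update u : updates) {
            totals.computeIfAbsent(u.group, g -> new LinkedHashMap<>())
                  .merge(u.name, u.value, Long::sum);
        }
        return totals;
    }

    // Two tasks scan the first partition, one task scans the second.
    // Every task uses the partition spec (assumed format) as the group name.
    static List<Update> sampleUpdates() {
        return Arrays.asList(
            new Update("ds=2013-12-01", "numRows", 10L),
            new Update("ds=2013-12-01", "numRows", 20L),
            new Update("ds=2013-12-02", "numRows", 5L));
    }

    public static void main(String[] args) {
        // Each group already holds the final per-partition statistics.
        System.out.println(sumByGroup(sampleUpdates()));
    }
}
```

Because the group name contains no per-task component, the summed map has exactly one entry per partition, and the client could pass those values to the metastore without a further aggregation step.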


was (Author: ashutoshc):
Currently, TableScanOp publishes statistics to the JobTracker with aggrKey as the counter group name, the statistics type (numRows etc.) as the counter name, and the value of the statistic as the counter value. We can instead use the statistics type as both the counter group name and the counter name; then the Hive client no longer needs to invoke stats aggregation when the query finishes. This has the following advantages:
* The client doesn't need to do any aggregation. After retrieving statistics from the JobTracker (via JobClient), it can add them directly to the metastore.
* It lowers the memory footprint on the JobTracker: instead of holding counters per task per partition, it will hold counters per partition only.

> No need to aggregate statistics collected via counter mechanism 
> ----------------------------------------------------------------
>
>                 Key: HIVE-5916
>                 URL: https://issues.apache.org/jira/browse/HIVE-5916
>             Project: Hive
>          Issue Type: Bug
>          Components: Statistics
>    Affects Versions: 0.13.0
>            Reporter: Ashutosh Chauhan
>
> This results in unnecessary computation and wasted cluster resources, since the JobTracker already aggregates the counters anyway.



--
This message was sent by Atlassian JIRA
(v6.1#6144)