Posted to dev@hive.apache.org by "Prasanth J (JIRA)" <ji...@apache.org> on 2014/09/19 09:55:33 UTC

[jira] [Commented] (HIVE-8188) ExprNodeGenericFuncEvaluator::_evaluate() loads class annotations in a tight loop

    [ https://issues.apache.org/jira/browse/HIVE-8188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14140154#comment-14140154 ] 

Prasanth J commented on HIVE-8188:
----------------------------------

I think it's because hash aggregation needs to estimate the size of the hash map. The hash map values hold UDAF aggregation buffers, and a buffer's size can only be estimated if the aggregation buffer class carries the annotation "@AggregationType(estimable = true)". GroupByOperator.shouldBeFlushed() is called for every row that is added to the hash map, and shouldBeFlushed() calls the isEstimable() helper function, which uses reflection every time to check whether the aggregation function is estimable. Not sure why it is done this way, but yes, this will be slow as hell. This needs to be fixed.
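A minimal sketch of the obvious fix, assuming the estimable flag of each aggregation buffer class never changes at runtime: read the annotation once when the operator is set up and keep the result in a boolean array, so the per-row flush check does no reflection. The annotation below is a simplified stand-in declared locally so the example compiles on its own, and GroupByFlushSketch, SumBuffer and OpaqueBuffer are hypothetical names; this is not Hive's actual GroupByOperator code.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

public class GroupByFlushSketch {

    // Simplified stand-in for Hive's @AggregationType (hypothetical local declaration).
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    @interface AggregationType {
        boolean estimable() default false;
    }

    // Hypothetical aggregation buffer classes, for illustration only.
    @AggregationType(estimable = true)
    static class SumBuffer {
        long sum;
    }

    static class OpaqueBuffer {
        Object state;
    }

    // Reflection-based check: the kind of lookup that is costly when repeated for every row.
    static boolean readEstimableAnnotation(Class<?> bufferClass) {
        AggregationType type = bufferClass.getAnnotation(AggregationType.class);
        return type != null && type.estimable();
    }

    // One flag per aggregation, filled in once when the operator is set up.
    private final boolean[] estimable;

    GroupByFlushSketch(Class<?>[] bufferClasses) {
        estimable = new boolean[bufferClasses.length];
        for (int i = 0; i < bufferClasses.length; i++) {
            estimable[i] = readEstimableAnnotation(bufferClasses[i]);
        }
    }

    // Hot path, called per row: a plain array read instead of an annotation lookup.
    boolean isEstimable(int aggIndex) {
        return estimable[aggIndex];
    }

    public static void main(String[] args) {
        GroupByFlushSketch op = new GroupByFlushSketch(
                new Class<?>[] {SumBuffer.class, OpaqueBuffer.class});
        System.out.println(op.isEstimable(0)); // true
        System.out.println(op.isEstimable(1)); // false
    }
}
```

If the buffer classes are not known up front, java.lang.ClassValue is another way to memoize a per-class result so the reflective lookup runs at most once per class.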

> ExprNodeGenericFuncEvaluator::_evaluate() loads class annotations in a tight loop
> ---------------------------------------------------------------------------------
>
>                 Key: HIVE-8188
>                 URL: https://issues.apache.org/jira/browse/HIVE-8188
>             Project: Hive
>          Issue Type: Bug
>          Components: UDF
>    Affects Versions: 0.14.0
>            Reporter: Gopal V
>         Attachments: udf-deterministic.png
>
>
> When running a near-constant UDF, most of the CPU is burnt within the VM trying to read the class annotations for every row.
> [image: udf-deterministic.png]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)