Posted to dev@hive.apache.org by "Ashish Thusoo (JIRA)" <ji...@apache.org> on 2008/12/12 20:51:46 UTC

[jira] Commented: (HIVE-170) map-side aggregations does not work properly

    [ https://issues.apache.org/jira/browse/HIVE-170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12656134#action_12656134 ] 

Ashish Thusoo commented on HIVE-170:
------------------------------------

A few comments on this:

1. I think we should not have the number of rows as a session-level parameter. Instead, we should estimate the row size at run time (rather than at configuration time), and use that estimate together with the session-level maximum memory size set by the user to derive the number of rows.
2. In order to estimate the size of the hash table, we should run some simple experiments to measure the (hash table size / number of rows) factor and then use that factor to scale our estimates.
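
The two points above could be combined roughly as follows. This is only an illustrative sketch, not Hive code: the class name, method names, and the overhead factor value are all hypothetical, and the factor would come from the experiments suggested in point 2.

```java
// Hypothetical sketch (not actual Hive APIs): derive the hash-table flush
// threshold from a run-time row-size estimate and the user's memory budget,
// instead of a fixed session-level row count.
public class HashAggregationSizing {

    // Assumed empirical factor: bytes of hash-table footprint per byte of
    // raw row data. Point 2 above proposes measuring this experimentally;
    // 3.0 here is a placeholder, not a measured value.
    private static final double HASH_TABLE_OVERHEAD_FACTOR = 3.0;

    /**
     * @param maxMemoryBytes      session-level memory budget set by the user
     * @param estimatedRowSize    average row size estimated at run time
     * @return maximum number of rows to hold before flushing the hash table
     */
    public static long maxRowsBeforeFlush(long maxMemoryBytes, long estimatedRowSize) {
        // Scale the raw row size by the overhead factor to approximate the
        // true per-row cost inside the hash table.
        long perRowCost = (long) (estimatedRowSize * HASH_TABLE_OVERHEAD_FACTOR);
        // Always allow at least one row so aggregation can make progress.
        return Math.max(1, maxMemoryBytes / perRowCost);
    }

    public static void main(String[] args) {
        // e.g. a 100 MB budget with 64-byte average rows
        System.out.println(maxRowsBeforeFlush(100L * 1024 * 1024, 64));
    }
}
```

The key design point is that both inputs are observed at run time, so the flush threshold adapts to the actual data rather than relying on a configured guess.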

Otherwise, this looks like a nice change that can give us a lot of oomph on the performance side.

> map-side aggregations does not work properly
> --------------------------------------------
>
>                 Key: HIVE-170
>                 URL: https://issues.apache.org/jira/browse/HIVE-170
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>            Reporter: Namit Jain
>            Assignee: Namit Jain
>         Attachments: 170.patch, patch2
>
>
> map-side aggregation depends on Runtime.freeMemory(), which is not guaranteed to return the amount of freeable memory - it depends on when the garbage collector was last invoked.
> It might be a good idea to estimate the number of rows that can fit in the hash table and then flush the hash table based on that count.
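
The unreliability described in the quoted issue can be seen directly: Runtime.freeMemory() counts dead-but-uncollected objects as used, so its reading depends on GC timing. The small demo below is illustrative only (the class and method names are made up for this sketch); the exact values printed will vary by JVM.

```java
// Demonstrates that Runtime.freeMemory() reflects garbage-collector timing
// rather than truly reclaimable memory: the reading typically jumps after
// an explicit GC, even though the garbage was already unreachable before it.
public class FreeMemoryDemo {

    /** Returns {freeMemory before GC, freeMemory after GC}. */
    public static long[] freeMemorySamples() {
        Runtime rt = Runtime.getRuntime();

        // Allocate ~32 MB of soon-to-be-garbage.
        byte[][] garbage = new byte[32][];
        for (int i = 0; i < garbage.length; i++) {
            garbage[i] = new byte[1 << 20];
        }
        garbage = null;                  // now unreachable, but not yet collected

        long before = rt.freeMemory();   // may still look low
        System.gc();                     // hint the collector (not guaranteed)
        long after = rt.freeMemory();    // typically much higher now
        return new long[] { before, after };
    }

    public static void main(String[] args) {
        long[] samples = freeMemorySamples();
        System.out.printf("free before GC: %d, after GC: %d%n",
                samples[0], samples[1]);
    }
}
```

This is why flushing based on an estimated row count, as proposed, is more predictable than polling free memory.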

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.