
Posted to issues@hive.apache.org by "Sergey Shelukhin (JIRA)" <ji...@apache.org> on 2015/08/12 00:42:46 UTC

[jira] [Commented] (HIVE-10600) optimize group by for GC

    [ https://issues.apache.org/jira/browse/HIVE-10600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14692417#comment-14692417 ] 

Sergey Shelukhin commented on HIVE-10600:
-----------------------------------------

[~mmccline] could one say you are doing this as part of vectorized group by? :)

> optimize group by for GC
> ------------------------
>
>                 Key: HIVE-10600
>                 URL: https://issues.apache.org/jira/browse/HIVE-10600
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Sergey Shelukhin
>
> Quoting [~gopalv]:
> {noformat}
> So, something like a sum() GROUP BY will create a few hundred thousand
> AbstractAggregationBuffer objects all of which will suddenly go out of
> scope when the map.aggr flushes it down to the sort buffer.
> That particular GC collection takes forever because the tiny buffers take
> a lot of time to walk over and then they leave the memory space
> fragmented, which requires a compaction pass (which btw, writes to a
> page-interleaved NUMA zone).
> And to make things worse, the pre-allocated sort buffers with absolutely
> zero data in them take up most of the tenured regions causing these chunks
> of memory to be visited more and more often as they are part of the Eden
> space.
> {noformat}
> We need flat data structures (a few large arrays rather than many tiny per-group objects) so that group by is GC-friendly.
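
A minimal sketch of what a "flat" layout could look like for a sum() GROUP BY, assuming a single long grouping key and a long running sum. The class name FlatSumAggregator, the GroupSink callback and the open-addressing layout are hypothetical illustrations, not Hive's actual aggregation code. Instead of allocating one AbstractAggregationBuffer-style object per group, all per-group state lives in three parallel primitive arrays. The collector therefore sees a handful of large objects regardless of how many groups exist, and flush() clears the arrays in place rather than turning hundreds of thousands of small objects into garbage at once.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch (not Hive's implementation): sum() GROUP BY on a long key,
 * keeping all per-group state in parallel primitive arrays instead of one
 * buffer object per group.
 */
public final class FlatSumAggregator {
    /** Flush callback; primitive parameters avoid boxing each group. */
    public interface GroupSink {
        void accept(long key, long sum);
    }

    private long[] keys;     // grouping key stored in each slot
    private long[] sums;     // running sum for each slot
    private boolean[] used;  // whether a slot currently holds a group
    private int size;        // number of distinct groups
    private int mask;        // capacity - 1 (capacity is a power of two)

    public FlatSumAggregator(int expectedGroups) {
        int capacity = 16;
        while (capacity < expectedGroups * 2) {
            capacity <<= 1;
        }
        allocate(capacity);
    }

    private void allocate(int capacity) {
        keys = new long[capacity];
        sums = new long[capacity];
        used = new boolean[capacity];
        mask = capacity - 1;
        size = 0;
    }

    /** MurmurHash3 64-bit finalizer, used to spread sequential keys across slots. */
    private static long mix(long k) {
        k ^= k >>> 33;
        k *= 0xff51afd7ed558ccdL;
        k ^= k >>> 33;
        k *= 0xc4ceb9fe1a85ec53L;
        k ^= k >>> 33;
        return k;
    }

    /** Returns the slot holding key, or the first free slot on its linear-probe path. */
    private int slotOf(long key) {
        int slot = (int) (mix(key) & mask);
        while (used[slot] && keys[slot] != key) {
            slot = (slot + 1) & mask;
        }
        return slot;
    }

    /** Adds value to the group for key, creating the group if needed. */
    public void add(long key, long value) {
        int slot = slotOf(key);
        if (!used[slot]) {
            if ((size + 1) * 2 > keys.length) { // keep load factor <= 0.5
                grow();
                slot = slotOf(key);
            }
            used[slot] = true;
            keys[slot] = key;
            size++;
        }
        sums[slot] += value;
    }

    /** Doubles capacity and re-inserts existing groups (arrays are replaced, not per-group objects). */
    private void grow() {
        long[] oldKeys = keys;
        long[] oldSums = sums;
        boolean[] oldUsed = used;
        allocate(oldKeys.length * 2);
        for (int i = 0; i < oldKeys.length; i++) {
            if (oldUsed[i]) {
                int slot = slotOf(oldKeys[i]);
                used[slot] = true;
                keys[slot] = oldKeys[i];
                sums[slot] = oldSums[i];
                size++;
            }
        }
    }

    /** Returns the running sum for key, or 0 if the key has not been seen. */
    public long get(long key) {
        int slot = slotOf(key);
        return used[slot] ? sums[slot] : 0L;
    }

    public int size() {
        return size;
    }

    /** Emits every group, then clears the arrays in place so they are reused, not collected. */
    public void flush(GroupSink sink) {
        for (int i = 0; i < keys.length; i++) {
            if (used[i]) {
                sink.accept(keys[i], sums[i]);
            }
        }
        Arrays.fill(used, false);
        Arrays.fill(sums, 0L);
        size = 0;
    }
}
```

The trade-off in this sketch is that it only handles fixed-width keys and values; variable-length keys such as strings would need an additional flat byte region, which is outside what the issue text specifies.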



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)