Posted to dev@pig.apache.org by "Ashutosh Chauhan (JIRA)" <ji...@apache.org> on 2010/04/27 00:43:32 UTC

[jira] Updated: (PIG-1395) Mapside cogroup runs out of memory

     [ https://issues.apache.org/jira/browse/PIG-1395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ashutosh Chauhan updated PIG-1395:
----------------------------------

    Attachment: cogrp_mem.patch

While doing cogroup, we first put tuples from all the relations in a heap, then drain the heap and generate output tuples as appropriate. We need to look ahead at least one tuple from every relation before generating an output tuple, to be sure that we have all the tuples belonging to a key. Currently, we look too far ahead, and tuples start to accumulate in the heap faster than we drain them. At a certain point we already have enough information to generate the output tuple, instead of waiting and putting another tuple in the heap. This patch generates the output tuple at that point.
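
To illustrate the idea (this is not the code in cogrp_mem.patch, only a minimal sketch of a heap-based merge cogroup), the Python below assumes each relation is an iterable of (key, value) pairs already sorted by key. The name merge_cogroup and the data layout are hypothetical. The heap holds at most one look-ahead tuple per relation, and a group is emitted as soon as the smallest key in the heap differs from the current key, so the heap never grows beyond the number of relations.

```python
import heapq


def merge_cogroup(relations):
    """Cogroup key-sorted relations of (key, value) pairs.

    Yields (key, bags), where bags[i] lists the values from relation i
    for that key. Hypothetical helper, not Pig's implementation.
    """
    iters = [iter(r) for r in relations]
    heap = []

    # Look ahead exactly one tuple from every relation.
    for idx, it in enumerate(iters):
        first = next(it, None)
        if first is not None:
            # idx is unique per heap entry, so values are never compared.
            heapq.heappush(heap, (first[0], idx, first[1]))

    current_key = None
    bags = None
    while heap:
        key, idx, value = heapq.heappop(heap)

        # The smallest remaining key differs from the current one, so no
        # relation can still hold tuples for current_key: emit it now.
        if bags is not None and key != current_key:
            yield current_key, bags
            bags = None

        if bags is None:
            current_key = key
            bags = [[] for _ in iters]
        bags[idx].append(value)

        # Refill the look-ahead slot only for the relation just consumed.
        nxt = next(iters[idx], None)
        if nxt is not None:
            heapq.heappush(heap, (nxt[0], idx, nxt[1]))

    if bags is not None:
        yield current_key, bags
```

With two relations, for example [(1, "a1"), (2, "a2")] and [(2, "b2")], the generator yields one entry per distinct key, and each entry carries one bag per relation (empty when that relation has no tuple for the key).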

> Mapside cogroup runs out of memory
> ----------------------------------
>
>                 Key: PIG-1395
>                 URL: https://issues.apache.org/jira/browse/PIG-1395
>             Project: Pig
>          Issue Type: Improvement
>          Components: impl
>    Affects Versions: 0.8.0
>            Reporter: Ashutosh Chauhan
>            Assignee: Ashutosh Chauhan
>             Fix For: 0.8.0
>
>         Attachments: cogrp_mem.patch
>
>
> In a particular scenario, when a relation does not have many tuples with the same key (i.e. there aren't many repeating keys), map tasks doing cogroup fail with a GC overhead exception.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.