Posted to dev@pig.apache.org by "Ashutosh Chauhan (JIRA)" <ji...@apache.org> on 2010/04/27 00:41:32 UTC

[jira] Created: (PIG-1395) Mapside cogroup runs out of memory

Mapside cogroup runs out of memory
----------------------------------

                 Key: PIG-1395
                 URL: https://issues.apache.org/jira/browse/PIG-1395
             Project: Pig
          Issue Type: Improvement
          Components: impl
    Affects Versions: 0.8.0
            Reporter: Ashutosh Chauhan
            Assignee: Ashutosh Chauhan
             Fix For: 0.8.0


In a particular scenario, when a relation does not have many tuples with the same key (i.e. there aren't many repeating keys), map tasks doing cogroup fail with a GC overhead exception.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (PIG-1395) Mapside cogroup runs out of memory

Posted by "Pradeep Kamath (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12861611#action_12861611 ] 

Pradeep Kamath commented on PIG-1395:
-------------------------------------

+1. The comment can be updated to reflect the nature of the comparison in the code; currently the comment and the code seem to differ. Otherwise the change looks good.




[jira] Updated: (PIG-1395) Mapside cogroup runs out of memory

Posted by "Ashutosh Chauhan (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ashutosh Chauhan updated PIG-1395:
----------------------------------

    Attachment: cogrp_mem.patch

While doing cogroup, we first put tuples from all the relations in a heap, then drain the heap and generate output tuples as appropriate. We need to look ahead at least one tuple in each relation before generating an output tuple, to be sure that we have all the tuples belonging to a key. Currently, we look too far ahead, and tuples start to accumulate in the heap faster than we drain them. At a certain point we already have enough information to generate the output tuple, instead of waiting and putting another tuple in the heap. This patch generates the output tuple at that point.
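
The bounded look-ahead idea can be illustrated with a minimal Java sketch. This is not the code in cogrp_mem.patch: the class MergeCogroupSketch, the Pending helper, and the choice of integer keys with string values are all assumptions made for illustration. It also assumes every input relation is already sorted by key. The heap holds at most one pending tuple per relation, and an output tuple is emitted as soon as the smallest key left in the heap differs from the key being grouped.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical illustration only; names and types are not taken from Pig.
class MergeCogroupSketch {

    // A tuple waiting in the heap, tagged with the relation it was read from.
    static final class Pending {
        final int key;
        final String value;
        final int relation;

        Pending(int key, String value, int relation) {
            this.key = key;
            this.value = value;
            this.relation = relation;
        }
    }

    // Reads at most one more tuple from relation r into the heap.
    private static void refill(PriorityQueue<Pending> heap,
                               List<Iterator<Map.Entry<Integer, String>>> readers,
                               int r) {
        Iterator<Map.Entry<Integer, String>> it = readers.get(r);
        if (it.hasNext()) {
            Map.Entry<Integer, String> t = it.next();
            heap.add(new Pending(t.getKey(), t.getValue(), r));
        }
    }

    // Cogroups relations that are each sorted by key.
    // Returns key -> one bag of values per relation, in key order.
    static Map<Integer, List<List<String>>> cogroup(
            List<List<Map.Entry<Integer, String>>> relations) {
        int n = relations.size();
        List<Iterator<Map.Entry<Integer, String>>> readers = new ArrayList<>();
        PriorityQueue<Pending> heap =
                new PriorityQueue<>(Comparator.comparingInt((Pending p) -> p.key));

        // Look ahead exactly one tuple in every relation.
        for (int r = 0; r < n; r++) {
            readers.add(relations.get(r).iterator());
            refill(heap, readers, r);
        }

        Map<Integer, List<List<String>>> out = new LinkedHashMap<>();
        while (!heap.isEmpty()) {
            int key = heap.peek().key;
            List<List<String>> bags = new ArrayList<>();
            for (int r = 0; r < n; r++) {
                bags.add(new ArrayList<>());
            }
            // Each tuple removed is replaced by at most one tuple from the
            // same relation, so the heap never holds more than n tuples.
            while (!heap.isEmpty() && heap.peek().key == key) {
                Pending p = heap.poll();
                bags.get(p.relation).add(p.value);
                refill(heap, readers, p.relation);
            }
            // The smallest pending key is now larger than `key` in every
            // relation, so the group is complete and can be emitted.
            out.put(key, bags);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Map.Entry<Integer, String>> a = List.of(
                Map.entry(1, "a1"), Map.entry(2, "a2"),
                Map.entry(2, "a3"), Map.entry(4, "a4"));
        List<Map.Entry<Integer, String>> b = List.of(
                Map.entry(2, "b2"), Map.entry(3, "b3"), Map.entry(4, "b4"));
        System.out.println(cogroup(List.of(a, b)));
    }
}
```

In this sketch the memory held in the heap depends only on the number of relations, not on how many distinct keys the inputs contain. The bags for a single key still grow with the number of tuples sharing that key.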




[jira] Updated: (PIG-1395) Mapside cogroup runs out of memory

Posted by "Ashutosh Chauhan (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ashutosh Chauhan updated PIG-1395:
----------------------------------

        Status: Resolved  (was: Patch Available)
    Resolution: Fixed

Patch checked in with updated comment.




[jira] Commented: (PIG-1395) Mapside cogroup runs out of memory

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12861223#action_12861223 ] 

Hadoop QA commented on PIG-1395:
--------------------------------

+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12442908/cogrp_mem.patch
  against trunk revision 937570.

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 3 new or modified tests.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    +1 core tests.  The patch passed core unit tests.

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/302/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/302/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/302/console

This message is automatically generated.




[jira] Updated: (PIG-1395) Mapside cogroup runs out of memory

Posted by "Ashutosh Chauhan (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ashutosh Chauhan updated PIG-1395:
----------------------------------

    Status: Patch Available  (was: Open)

