Posted to dev@pig.apache.org by "Ashutosh Chauhan (JIRA)" <ji...@apache.org> on 2010/02/08 20:33:28 UTC

[jira] Created: (PIG-1230) Streaming input in POJoinPackage should use nonspillable bag to collect tuples

Streaming input in POJoinPackage should use nonspillable bag to collect tuples
------------------------------------------------------------------------------

                 Key: PIG-1230
                 URL: https://issues.apache.org/jira/browse/PIG-1230
             Project: Pig
          Issue Type: Bug
          Components: impl
    Affects Versions: 0.6.0
            Reporter: Ashutosh Chauhan
            Assignee: Ashutosh Chauhan
             Fix For: 0.7.0


The last table of a join statement is streamed through instead of having all of its tuples collected in a bag. As a further optimization, the tuples of that relation are collected in a bag in chunks. Since we don't want to spill the tuples from this bag, a non-spillable bag (NonSpillableDataBag) should be used to hold the tuples for this relation. Initially, DefaultDataBag was used; it was later changed to InternalCachedBag as part of PIG-1209.
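
The chunked collection described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the POJoinPackage code: ChunkedJoinSketch, InMemoryBag, join and the chunkSize parameter are hypothetical names, tuples are modelled as plain strings, and InMemoryBag only stands in for a bag that is never spilled to disk.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch of the idea in this issue; it does not use any Pig classes.
class ChunkedJoinSketch {

    // Stand-in for a non-spillable bag: a plain in-memory buffer that is
    // never handed to a memory manager, so its contents never go to disk.
    static final class InMemoryBag {
        private final List<String> tuples = new ArrayList<>();
        void add(String tuple) { tuples.add(tuple); }
        int size() { return tuples.size(); }
        void clear() { tuples.clear(); }
        List<String> contents() { return tuples; }
    }

    // Joins the fully materialized inputs (left) with the streamed last input,
    // holding at most chunkSize tuples of the last input in memory at a time.
    static List<String> join(List<String> left, Iterator<String> lastInput, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<String> out = new ArrayList<>();
        InMemoryBag chunk = new InMemoryBag();
        while (lastInput.hasNext()) {
            chunk.add(lastInput.next());
            // Flush when the chunk is full or the stream is exhausted.
            if (chunk.size() == chunkSize || !lastInput.hasNext()) {
                emitCross(left, chunk, out);
                chunk.clear(); // the buffer is small and short-lived, so spilling is pointless
            }
        }
        return out;
    }

    // Emits the cross product of the materialized tuples and the current chunk.
    private static void emitCross(List<String> left, InMemoryBag chunk, List<String> out) {
        for (String l : left) {
            for (String r : chunk.contents()) {
                out.add(l + "-" + r);
            }
        }
    }

    public static void main(String[] args) {
        List<String> left = Arrays.asList("a", "b");
        Iterator<String> last = Arrays.asList("x", "y", "z").iterator();
        System.out.println(join(left, last, 2));
        // prints [a-x, a-y, b-x, b-y, a-z, b-z]
    }
}
```

Because the buffer for the last input never holds more than chunkSize tuples, a spillable bag would add overhead without ever having a reason to spill; that is the motivation the description gives for a non-spillable bag.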

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (PIG-1230) Streaming input in POJoinPackage should use nonspillable bag to collect tuples

Posted by "Ashutosh Chauhan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12831585#action_12831585 ] 

Ashutosh Chauhan commented on PIG-1230:
---------------------------------------

This patch switches POJoinPackage to use NonSpillableDataBag for the last bag instead of the currently used InternalCachedBag. Both bag implementations are already covered by existing unit tests, so this patch needs no new tests.



[jira] Updated: (PIG-1230) Streaming input in POJoinPackage should use nonspillable bag to collect tuples

Posted by "Ashutosh Chauhan (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ashutosh Chauhan updated PIG-1230:
----------------------------------

    Attachment: pig-1230_2.patch

As per the review comment, changed lastBagIndex to numInputs - 1; no other changes.



[jira] Updated: (PIG-1230) Streaming input in POJoinPackage should use nonspillable bag to collect tuples

Posted by "Ashutosh Chauhan (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ashutosh Chauhan updated PIG-1230:
----------------------------------

    Attachment: pig-1230_1.patch

Fixed findbugs warnings. Result of test-patch:
{code}
     [exec]     +1 @author.  The patch does not contain any @author tags.
     [exec] 
     [exec]     -1 tests included.  The patch doesn't appear to include any new or modified tests.
     [exec]                         Please justify why no tests are needed for this patch.
     [exec] 
     [exec]     +1 javadoc.  The javadoc tool did not generate any warning messages.
     [exec] 
     [exec]     +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
     [exec] 
     [exec]     +1 findbugs.  The patch does not introduce any new Findbugs warnings.
     [exec] 
     [exec]     +1 release audit.  The applied patch does not increase the total number of release audit warnings.

{code}

It is hard to write a unit test case for this change. The patch is ready for review.



[jira] Updated: (PIG-1230) Streaming input in POJoinPackage should use nonspillable bag to collect tuples

Posted by "Ashutosh Chauhan (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ashutosh Chauhan updated PIG-1230:
----------------------------------

    Attachment: pig-1230.patch

Patch as per description.



[jira] Commented: (PIG-1230) Streaming input in POJoinPackage should use nonspillable bag to collect tuples

Posted by "Olga Natkovich (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12831591#action_12831591 ] 

Olga Natkovich commented on PIG-1230:
-------------------------------------

The patch looks good. One comment: when iterating through the bags, we should use numInputs - 1 rather than lastBagIndex (which happens to have the right value) to make the code more readable and the intent clearer. Once that change is made, the patch can be committed.
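
As a rough illustration of this readability point, and not the code from the patch, the loop bound could be written as below. BagSetupSketch, describeInputs and the role labels are hypothetical; only the names numInputs and lastBagIndex come from this thread.

```java
import java.util.Arrays;

// Hypothetical illustration of the review comment; this is not the patch itself.
class BagSetupSketch {

    // Labels each join input: every input except the last is materialized,
    // and the last one is streamed.
    static String[] describeInputs(int numInputs) {
        if (numInputs < 1) {
            throw new IllegalArgumentException("numInputs must be at least 1");
        }
        String[] roles = new String[numInputs];
        // The bound numInputs - 1 states the intent directly: "all inputs but
        // the last". A variable such as lastBagIndex could hold the same value,
        // but the reader would have to check how it was computed.
        for (int i = 0; i < numInputs - 1; i++) {
            roles[i] = "materialized";
        }
        roles[numInputs - 1] = "streamed";
        return roles;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(describeInputs(3)));
        // prints [materialized, materialized, streamed]
    }
}
```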



[jira] Updated: (PIG-1230) Streaming input in POJoinPackage should use nonspillable bag to collect tuples

Posted by "Ashutosh Chauhan (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ashutosh Chauhan updated PIG-1230:
----------------------------------

    Status: Patch Available  (was: Open)



[jira] Commented: (PIG-1230) Streaming input in POJoinPackage should use nonspillable bag to collect tuples

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12831224#action_12831224 ] 

Hadoop QA commented on PIG-1230:
--------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12435203/pig-1230.patch
  against trunk revision 907760.

    +1 @author.  The patch does not contain any @author tags.

    -1 tests included.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no tests are needed for this patch.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    -1 findbugs.  The patch appears to introduce 2 new Findbugs warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    +1 core tests.  The patch passed core unit tests.

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/204/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/204/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/204/console

This message is automatically generated.



[jira] Updated: (PIG-1230) Streaming input in POJoinPackage should use nonspillable bag to collect tuples

Posted by "Ashutosh Chauhan (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ashutosh Chauhan updated PIG-1230:
----------------------------------

    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

Patch checked in.



[jira] Closed: (PIG-1230) Streaming input in POJoinPackage should use nonspillable bag to collect tuples

Posted by "Daniel Dai (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai closed PIG-1230.
---------------------------

