Posted to dev@pig.apache.org by "Olga Natkovich (JIRA)" <ji...@apache.org> on 2009/05/07 20:20:45 UTC

[jira] Created: (PIG-802) PERFORMANCE: not creating bags for ORDER BY

PERFORMANCE: not creating bags for ORDER BY
-------------------------------------------

                 Key: PIG-802
                 URL: https://issues.apache.org/jira/browse/PIG-802
             Project: Pig
          Issue Type: Improvement
            Reporter: Olga Natkovich


Order by should be changed to not use POPackage to put all of the tuples in a bag on the reduce side, as the bag is just immediately flattened. It can instead work like join does for the last input in the join. 


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (PIG-802) PERFORMANCE: not creating bags for ORDER BY

Posted by "Pradeep Kamath (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12713294#action_12713294 ] 

Pradeep Kamath commented on PIG-802:
------------------------------------

+1



[jira] Assigned: (PIG-802) PERFORMANCE: not creating bags for ORDER BY

Posted by "Alan Gates (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Gates reassigned PIG-802:
------------------------------

    Assignee: Rakesh Setty



[jira] Updated: (PIG-802) PERFORMANCE: not creating bags for ORDER BY

Posted by "Rakesh Setty (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rakesh Setty updated PIG-802:
-----------------------------

    Attachment: OrderByOptimization.patch

Attaching the new patch.



[jira] Updated: (PIG-802) PERFORMANCE: not creating bags for ORDER BY

Posted by "Rakesh Setty (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rakesh Setty updated PIG-802:
-----------------------------

    Attachment:     (was: OrderByOptimization.patch)



[jira] Updated: (PIG-802) PERFORMANCE: not creating bags for ORDER BY

Posted by "Rakesh Setty (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rakesh Setty updated PIG-802:
-----------------------------

    Attachment:     (was: OrderByOptimization.patch)



[jira] Updated: (PIG-802) PERFORMANCE: not creating bags for ORDER BY

Posted by "Olga Natkovich (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Olga Natkovich updated PIG-802:
-------------------------------

    Status: Open  (was: Patch Available)



[jira] Commented: (PIG-802) PERFORMANCE: not creating bags for ORDER BY

Posted by "Pradeep Kamath (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12707064#action_12707064 ] 

Pradeep Kamath commented on PIG-802:
------------------------------------

PIG-744 is a duplicate - will be marking that one as duplicate.

Pasting the summary from PIG-744, which has a little more detail:
Currently, ORDER BY results in multiple MapReduce jobs (two or three, depending on the script), of which the last one does the actual ordering. In the reduce phase of this last job, POPackage creates a bag of values for each sort key (or combination of sort keys); each value is the entire tuple being sorted. The foreach that follows the package then immediately flattens the bag, so the bag is not really needed. To stay generic and keep using the existing operators, we can instead tag the POPackage so that it creates bags backed by the Hadoop iterator itself. That avoids copying each tuple from the Hadoop iterator into a bag, and it should help both performance and scalability by making better use of memory.
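
The difference between copying the reduce-side values into a bag and handing out a bag backed by the iterator can be sketched roughly as follows. This is only an illustration in plain Java, not the code from the patch: the class and method names (IteratorBackedBagSketch, copyIntoBag, ReadOnceView, flatten) are made up for this example, and strings stand in for Pig tuples.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical names throughout; this is not Pig's actual implementation.
final class IteratorBackedBagSketch {

    /** Copying approach: every value from the reduce-side iterator is stored in a new list. */
    static <T> List<T> copyIntoBag(Iterator<T> values) {
        List<T> bag = new ArrayList<>();
        while (values.hasNext()) {
            bag.add(values.next());
        }
        return bag;
    }

    /** Iterator-backed approach: a view that hands out the underlying iterator exactly once. */
    static final class ReadOnceView<T> implements Iterable<T> {
        private Iterator<T> source;

        ReadOnceView(Iterator<T> source) {
            this.source = source;
        }

        @Override
        public Iterator<T> iterator() {
            if (source == null) {
                throw new IllegalStateException("values can be read only once");
            }
            Iterator<T> it = source;
            source = null; // a second pass is impossible because nothing was copied
            return it;
        }
    }

    /** Stand-in for the flattening foreach: emits each element of the bag as its own record. */
    static <T> List<T> flatten(Iterable<T> bag) {
        List<T> emitted = new ArrayList<>();
        for (T value : bag) {
            emitted.add(value);
        }
        return emitted;
    }

    public static void main(String[] args) {
        List<String> sortedValues = Arrays.asList("a", "b", "c");
        // Both paths emit the same records; only the second skips the intermediate bag copy.
        System.out.println(flatten(copyIntoBag(sortedValues.iterator())));
        System.out.println(flatten(new ReadOnceView<>(sortedValues.iterator())));
    }
}
```

The trade-off in this sketch is that the iterator-backed view can be traversed only once, which is acceptable when the consumer is a flatten that reads each value a single time.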



[jira] Updated: (PIG-802) PERFORMANCE: not creating bags for ORDER BY

Posted by "Rakesh Setty (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rakesh Setty updated PIG-802:
-----------------------------

    Attachment:     (was: OrderByOptimization.patch)



[jira] Commented: (PIG-802) PERFORMANCE: not creating bags for ORDER BY

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12714594#action_12714594 ] 

Hadoop QA commented on PIG-802:
-------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12409408/OrderByOptimization.patch
  against trunk revision 779788.

    +1 @author.  The patch does not contain any @author tags.

    -1 tests included.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no tests are needed for this patch.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    +1 core tests.  The patch passed core unit tests.

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/64/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/64/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/64/console

This message is automatically generated.



[jira] Updated: (PIG-802) PERFORMANCE: not creating bags for ORDER BY

Posted by "Rakesh Setty (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rakesh Setty updated PIG-802:
-----------------------------

    Attachment: OrderByOptimization.patch

Attaching the modified patch. The detachInput method in POPackageLite sets key and tupIter to null, so ReadOnceBag maintains its own references to them. POPackageLite overloads getValueTuple with an additional key parameter so that it can use the key supplied by ReadOnceBag. The implementation of POPackage is untouched.
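
As a rough illustration of why the bag needs its own references, here is a simplified sketch in plain Java. It is not the code from the patch: PackageLiteSketch and ReadOnceBagSketch are hypothetical stand-ins for POPackageLite and ReadOnceBag, strings stand in for keys and tuples, and the attachInput and getNext signatures are assumptions made only for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical, simplified stand-ins; the real Pig classes have different signatures.
final class DetachInputSketch {

    static final class PackageLiteSketch {
        private String key;
        private Iterator<String> tupIter;

        void attachInput(String key, Iterator<String> tupIter) {
            this.key = key;
            this.tupIter = tupIter;
        }

        /** Hands out a bag holding its own references, then clears the operator's fields. */
        ReadOnceBagSketch getNext() {
            ReadOnceBagSketch bag = new ReadOnceBagSketch(this, key, tupIter);
            detachInput();
            return bag;
        }

        void detachInput() {
            key = null;
            tupIter = null;
        }

        /** Overload that takes the key explicitly, so it does not read the cleared field. */
        String getValueTuple(String value, String key) {
            return key + ":" + value;
        }
    }

    static final class ReadOnceBagSketch implements Iterable<String> {
        private final PackageLiteSketch pkg;
        private final String key;
        private final Iterator<String> tupIter;

        ReadOnceBagSketch(PackageLiteSketch pkg, String key, Iterator<String> tupIter) {
            this.pkg = pkg;
            this.key = key;
            this.tupIter = tupIter;
        }

        @Override
        public Iterator<String> iterator() {
            return new Iterator<String>() {
                @Override
                public boolean hasNext() {
                    return tupIter.hasNext();
                }

                @Override
                public String next() {
                    // Uses the bag's own key, not the operator's (already null) field.
                    return pkg.getValueTuple(tupIter.next(), key);
                }
            };
        }
    }

    /** Reads the bag only after the operator has detached its input. */
    static List<String> run() {
        PackageLiteSketch pkg = new PackageLiteSketch();
        pkg.attachInput("k1", Arrays.asList("x", "y").iterator());
        ReadOnceBagSketch bag = pkg.getNext();
        List<String> out = new ArrayList<>();
        for (String tuple : bag) {
            out.add(tuple);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints [k1:x, k1:y]
    }
}
```

In this sketch the bag is read after detachInput has run, which only works because the bag captured the key and the iterator at construction time.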



[jira] Updated: (PIG-802) PERFORMANCE: not creating bags for ORDER BY

Posted by "Rakesh Setty (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rakesh Setty updated PIG-802:
-----------------------------

    Attachment:     (was: OrderByOptimization.patch)



[jira] Updated: (PIG-802) PERFORMANCE: not creating bags for ORDER BY

Posted by "Rakesh Setty (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rakesh Setty updated PIG-802:
-----------------------------

    Attachment:     (was: OrderByOptimization.patch)



[jira] Updated: (PIG-802) PERFORMANCE: not creating bags for ORDER BY

Posted by "Olga Natkovich (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Olga Natkovich updated PIG-802:
-------------------------------

    Status: Patch Available  (was: Open)



[jira] Commented: (PIG-802) PERFORMANCE: not creating bags for ORDER BY

Posted by "Pradeep Kamath (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12713181#action_12713181 ] 

Pradeep Kamath commented on PIG-802:
------------------------------------

The changes look good. I still have one comment about the change in MRCompiler.java: does POPackageLite also need to be used in the following block?

{noformat}
if (limit != -1) {
    POPackage pkg_c = new POPackage(new OperatorKey(scope, nig.getNextNodeId(scope)));
    ...
}
{noformat}





[jira] Updated: (PIG-802) PERFORMANCE: not creating bags for ORDER BY

Posted by "Olga Natkovich (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Olga Natkovich updated PIG-802:
-------------------------------

    Status: Patch Available  (was: Open)



[jira] Commented: (PIG-802) PERFORMANCE: not creating bags for ORDER BY

Posted by "Rakesh Setty (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12711772#action_12711772 ] 

Rakesh Setty commented on PIG-802:
----------------------------------

I considered passing a POPackageLite instance to the ReadOnceBag constructor, but that would tie ReadOnceBag to POPackageLite and keep it from being generic. So I think the choice is between keeping ReadOnceBag as generic as possible and avoiding duplicate code. What do you suggest?



[jira] Updated: (PIG-802) PERFORMANCE: not creating bags for ORDER BY

Posted by "Rakesh Setty (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rakesh Setty updated PIG-802:
-----------------------------

    Attachment: OrderByOptimization.patch

Latest patch, with the correction for the LIMIT clause.



[jira] Updated: (PIG-802) PERFORMANCE: not creating bags for ORDER BY

Posted by "Alan Gates (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Gates updated PIG-802:
---------------------------

       Resolution: Fixed
    Fix Version/s: 0.4.0
           Status: Resolved  (was: Patch Available)

Fix checked in 30 May 2009



[jira] Updated: (PIG-802) PERFORMANCE: not creating bags for ORDER BY

Posted by "Rakesh Setty (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rakesh Setty updated PIG-802:
-----------------------------

    Attachment:     (was: OrderByOptimization.patch)



[jira] Updated: (PIG-802) PERFORMANCE: not creating bags for ORDER BY

Posted by "Rakesh Setty (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rakesh Setty updated PIG-802:
-----------------------------

    Affects Version/s: 0.2.0
               Status: Patch Available  (was: Open)



[jira] Commented: (PIG-802) PERFORMANCE: not creating bags for ORDER BY

Posted by "Pradeep Kamath (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12711811#action_12711811 ] 

Pradeep Kamath commented on PIG-802:
------------------------------------

I think that even if ReadOnceBags are used in places other than ORDER BY in the future, they would still need to be used immediately after a POPackageLite. So tying the two together is not bad, and it would reduce code duplication.



[jira] Updated: (PIG-802) PERFORMANCE: not creating bags for ORDER BY

Posted by "Rakesh Setty (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rakesh Setty updated PIG-802:
-----------------------------

    Attachment: OrderByOptimization.patch

Attaching a new patch to remove a Findbugs warning. Unit tests are not required, as this is not new functionality, just a different implementation of existing functionality.



[jira] Updated: (PIG-802) PERFORMANCE: not creating bags for ORDER BY

Posted by "Rakesh Setty (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rakesh Setty updated PIG-802:
-----------------------------

    Attachment: OrderByOptimization.patch

Modified the patch to remove the javac and Findbugs warnings.



[jira] Commented: (PIG-802) PERFORMANCE: not creating bags for ORDER BY

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12714266#action_12714266 ] 

Hadoop QA commented on PIG-802:
-------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12409087/OrderByOptimization.patch
  against trunk revision 779788.

    +1 @author.  The patch does not contain any @author tags.

    -1 tests included.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no tests are needed for this patch.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    -1 javac.  The applied patch generated 225 javac compiler warnings (more than the trunk's current 224 warnings).

    -1 findbugs.  The patch appears to introduce 2 new Findbugs warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    +1 core tests.  The patch passed core unit tests.

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/62/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/62/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/62/console

This message is automatically generated.



[jira] Updated: (PIG-802) PERFORMANCE: not creating bags for ORDER BY

Posted by "Rakesh Setty (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rakesh Setty updated PIG-802:
-----------------------------

    Attachment: OrderByOptimization.patch

Attaching the patch file.



[jira] Commented: (PIG-802) PERFORMANCE: not creating bags for ORDER BY

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12714701#action_12714701 ] 

Hudson commented on PIG-802:
----------------------------

Integrated in Pig-trunk #458 (See [http://hudson.zones.apache.org/hudson/job/Pig-trunk/458/])
    : PERFORMANCE: not creating bags for ORDER BY (serakesh via olgan)




[jira] Commented: (PIG-802) PERFORMANCE: not creating bags for ORDER BY

Posted by "Pradeep Kamath (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12711769#action_12711769 ] 

Pradeep Kamath commented on PIG-802:
------------------------------------

Review comments:
In MRCompiler, does POPackageLite need to be used in the following too:
{noformat}
if (limit!=-1) {
             POPackage pkg_c = new POPackage(new OperatorKey(scope,nig.getNextNodeId(scope)));
...
}
{noformat}

In POPackage, the following declarations :
{noformat}
Iterator<NullableTuple> tupIter; 

Object key; 
{noformat}
should have the "protected" access specifier, to make explicit that they are meant to be used in POPackageLite.

In ReadOnceBag.equals() you could also check if the keyInfo maps are equal.

The getValueTuple() in ReadOnceBag duplicates code from POPackage.getValueTuple(). Instead of having the same code in two places, could you construct ReadOnceBag with a POPackageLite instance passed to the constructor? If you then make the POPackageLite.getValueTuple() method public, you can invoke it from the ReadOnceBag code, so the logic stays in one place.
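
A rough sketch of the delegation suggested above, using hypothetical stand-in classes rather than the real Pig types. The names PackageLiteSketch and DelegatingBagSketch, and the (key, value) parameters of getValueTuple(), are assumptions made for illustration; the actual signature in the patch may differ.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for POPackageLite: owns the only copy of the
// tuple-building logic.
class PackageLiteSketch {
    // Public so that the bag can call it instead of duplicating the code.
    // The (key, value) parameters are an assumption for this sketch.
    public List<Object> getValueTuple(Object key, Object value) {
        List<Object> tuple = new ArrayList<>();
        tuple.add(key);
        tuple.add(value);
        return tuple;
    }
}

// Hypothetical stand-in for ReadOnceBag: receives the package operator in
// its constructor and delegates tuple construction to it.
class DelegatingBagSketch {
    private final PackageLiteSketch pkg;
    private final Object key;
    private final Iterator<Object> values;

    DelegatingBagSketch(PackageLiteSketch pkg, Object key, Iterator<Object> values) {
        this.pkg = pkg;
        this.key = key;
        this.values = values;
    }

    boolean hasNext() {
        return values.hasNext();
    }

    List<Object> next() {
        // No tuple-building code lives here; it stays in PackageLiteSketch.
        return pkg.getValueTuple(key, values.next());
    }
}
```

With this shape, any later change to how tuples are rebuilt only has to be made in the package class.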



[jira] Commented: (PIG-802) PERFORMANCE: not creating bags for ORDER BY

Posted by "Pradeep Kamath (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12708551#action_12708551 ] 

Pradeep Kamath commented on PIG-802:
------------------------------------

Adding some more details:
A new kind of bag, ReadOnceBag, needs to be implemented. This bag will hold a reference to the "key" currently being processed and to the iterator over values that Hadoop provides in reduce(). The ReadOnceBag's iterator will simply advance the Hadoop iterator on each call and construct a tuple from the key and value (see POPackage.java for details on how this is done). POPackage should also be changed, or a new class introduced, to create ReadOnceBags instead of regular bags. Creating the bag should only initialize it with the key and the iterator.
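
The idea above can be sketched roughly as follows. This is a minimal illustration and not the code from the patch: SketchTuple and ReadOnceBagSketch are hypothetical stand-ins, and the tuple is built as a plain (key, value) pair, whereas POPackage reconstructs the original tuple layout from the key and value.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for a Pig tuple: just an ordered list of fields.
final class SketchTuple {
    final List<Object> fields;

    SketchTuple(List<Object> fields) {
        this.fields = fields;
    }
}

// Hypothetical read-once bag: it stores only the current key and the value
// iterator handed to reduce(), and never materializes the tuples.
final class ReadOnceBagSketch implements Iterable<SketchTuple> {
    private final Object key;
    private final Iterator<Object> values;
    private boolean consumed = false;

    // Construction only records the key and the iterator; nothing is read yet.
    ReadOnceBagSketch(Object key, Iterator<Object> values) {
        this.key = key;
        this.values = values;
    }

    @Override
    public Iterator<SketchTuple> iterator() {
        // The underlying iterator cannot be rewound, so allow a single pass.
        if (consumed) {
            throw new IllegalStateException("bag can be iterated only once");
        }
        consumed = true;
        return new Iterator<SketchTuple>() {
            @Override
            public boolean hasNext() {
                return values.hasNext();
            }

            @Override
            public SketchTuple next() {
                // Build each tuple lazily from the key and the next value.
                List<Object> fields = new ArrayList<>();
                fields.add(key);
                fields.add(values.next());
                return new SketchTuple(fields);
            }
        };
    }
}
```

Because the bag holds no tuples of its own, memory use stays constant regardless of how many values arrive for a key; the trade-off is that a second pass over the bag is not possible.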



[jira] Updated: (PIG-802) PERFORMANCE: not creating bags for ORDER BY

Posted by "Rakesh Setty (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rakesh Setty updated PIG-802:
-----------------------------

    Attachment: OrderByOptimization.patch

New patch file that removes two more FindBugs warnings. The remaining one appears to be unavoidable.



[jira] Commented: (PIG-802) PERFORMANCE: not creating bags for ORDER BY

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12710511#action_12710511 ] 

Hadoop QA commented on PIG-802:
-------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12408398/OrderByOptimization.patch
  against trunk revision 775340.

    -1 @author.  The patch appears to contain 1 @author tags which the Pig community has agreed to not allow in code contributions.

    -1 tests included.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no tests are needed for this patch.

    -1 javadoc.  The javadoc tool appears to have generated 1 warning messages.

    -1 javac.  The applied patch generated 227 javac compiler warnings (more than the trunk's current 226 warnings).

    -1 findbugs.  The patch appears to introduce 4 new Findbugs warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    +1 core tests.  The patch passed core unit tests.

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/47/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/47/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/47/console

This message is automatically generated.



[jira] Updated: (PIG-802) PERFORMANCE: not creating bags for ORDER BY

Posted by "Olga Natkovich (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Olga Natkovich updated PIG-802:
-------------------------------

    Status: Open  (was: Patch Available)
