Posted to dev@pig.apache.org by "Pradeep Kamath (JIRA)" <ji...@apache.org> on 2009/01/21 19:39:59 UTC

[jira] Created: (PIG-629) PERFORMANCE: Eliminate use of TargetedTuple for each input tuple in the map()

PERFORMANCE: Eliminate use of TargetedTuple for each input tuple in the map()
-----------------------------------------------------------------------------

                 Key: PIG-629
                 URL: https://issues.apache.org/jira/browse/PIG-629
             Project: Pig
          Issue Type: Improvement
    Affects Versions: types_branch
            Reporter: Pradeep Kamath
            Assignee: Pradeep Kamath
             Fix For: types_branch


Currently, each Tuple read in by Pig is wrapped in a TargetedTuple, which carries a list of operator keys identifying the root operators the tuple is targeted at. For example, in a cogroup query the tuple is destined for one of the two roots of the plan, depending on which input it came from. Carrying this information in the TargetedTuple adds unnecessary overhead in the map at load time: the extra list has to be attached to every tuple, and on every entry into map() the operators corresponding to the operator keys in the list have to be looked up in the map plan.

This overhead can be eliminated by serializing the list of target operators once at the Record Reader level and deserializing it in the configure() of the map. After deserialization, the actual operators corresponding to the operator keys can also be looked up in configure() itself. The setup is then done once in configure(), rather than adding overhead to each input tuple and each map() call.
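
The idea above can be illustrated with a minimal, self-contained sketch. It is not the attached patch and does not use Pig or Hadoop classes: Operator, Mapper, the Map<String, String> standing in for the job configuration, and the property name "sketch.map.target.ops" are all hypothetical names chosen for illustration. The list of operator keys is serialized once into the configuration, and the mapper resolves the keys to operators once in configure(), so map() only has to hand each tuple to the already-resolved targets.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Base64;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins only; none of these are Pig's or Hadoop's real classes.
class TargetSetupSketch {
    // Hypothetical configuration property holding the serialized operator keys.
    static final String TARGETS_PROPERTY = "sketch.map.target.ops";

    /** Stand-in for a root operator of the map plan. */
    static class Operator {
        final String key;
        final List<String> received = new ArrayList<>();
        Operator(String key) { this.key = key; }
        void attachInput(String tuple) { received.add(tuple); }
    }

    /** Stand-in for a map task: configure() runs once, map() runs per tuple. */
    static class Mapper {
        private final Map<String, Operator> mapPlan;
        private List<Operator> targets = new ArrayList<>();
        int lookups = 0; // counts plan lookups, to show they happen only in configure()

        Mapper(Map<String, Operator> mapPlan) { this.mapPlan = mapPlan; }

        void configure(Map<String, String> conf) {
            List<Operator> resolved = new ArrayList<>();
            for (String key : deserialize(conf.get(TARGETS_PROPERTY))) {
                Operator op = mapPlan.get(key);
                lookups++;
                if (op == null) {
                    throw new IllegalStateException("no operator for key " + key);
                }
                resolved.add(op);
            }
            targets = resolved;
        }

        void map(String tuple) {
            // No per-tuple wrapper and no per-call lookup: targets are already resolved.
            for (Operator op : targets) {
                op.attachInput(tuple);
            }
        }
    }

    /** Serializes the operator keys to a string that fits in a configuration value. */
    static String serialize(List<String> keys) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(new ArrayList<>(keys));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return Base64.getEncoder().encodeToString(bytes.toByteArray());
    }

    @SuppressWarnings("unchecked")
    static List<String> deserialize(String encoded) {
        byte[] raw = Base64.getDecoder().decode(encoded);
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(raw))) {
            return (List<String>) in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("cannot decode target operator keys", e);
        }
    }

    public static void main(String[] args) {
        Map<String, Operator> plan = new HashMap<>();
        plan.put("scope-1", new Operator("scope-1"));
        plan.put("scope-2", new Operator("scope-2"));

        // Reader side: record once which root operators this input feeds.
        Map<String, String> conf = new HashMap<>();
        conf.put(TARGETS_PROPERTY, serialize(List.of("scope-1")));

        Mapper mapper = new Mapper(plan);
        mapper.configure(conf);
        for (String tuple : List.of("t1", "t2", "t3")) {
            mapper.map(tuple);
        }
        System.out.println("plan lookups: " + mapper.lookups);
        System.out.println("tuples at scope-1: " + plan.get("scope-1").received.size());
    }
}
```

In this sketch the number of plan lookups depends only on how many target operators the input has, not on how many tuples are read, which is the saving the description is after.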

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (PIG-629) PERFORMANCE: Eliminate use of TargetedTuple for each input tuple in the map()

Posted by "Pradeep Kamath (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pradeep Kamath updated PIG-629:
-------------------------------

    Attachment: PIG-629.patch

Attached a patch that implements the changes described in the issue description.



[jira] Commented: (PIG-629) PERFORMANCE: Eliminate use of TargetedTuple for each input tuple in the map()

Posted by "Olga Natkovich (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12666280#action_12666280 ] 

Olga Natkovich commented on PIG-629:
------------------------------------

Patch committed. Thanks, Pradeep.



[jira] Resolved: (PIG-629) PERFORMANCE: Eliminate use of TargetedTuple for each input tuple in the map()

Posted by "Olga Natkovich (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Olga Natkovich resolved PIG-629.
--------------------------------

    Resolution: Fixed
