Posted to dev@crunch.apache.org by "Gabriel Reid (JIRA)" <ji...@apache.org> on 2012/09/20 14:28:07 UTC

[jira] [Created] (CRUNCH-71) PType mapping functions are not initialized before being used for deep copying

Gabriel Reid created CRUNCH-71:
----------------------------------

             Summary: PType mapping functions are not initialized before being used for deep copying
                 Key: CRUNCH-71
                 URL: https://issues.apache.org/jira/browse/CRUNCH-71
             Project: Crunch
          Issue Type: Bug
    Affects Versions: 0.3.0
            Reporter: Gabriel Reid
            Assignee: Gabriel Reid


The PType#getDetachedValue method performs a deep copy (if needed) in order to allow DoFns to hold on to values that have been passed through them (for example, in join functions).
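To illustrate why detaching matters, here is a minimal Java sketch. It assumes a framework that reuses a single mutable object for every input value, as Hadoop MapReduce typically does with Writables. MutableText and ObjectReuseDemo are hypothetical names, and deepCopy() only stands in for the kind of copy getDetachedValue performs. It is not Crunch code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical mutable value object that the "framework" reuses across calls.
class MutableText {
  private String value;

  void set(String value) { this.value = value; }

  String get() { return value; }

  // Stand-in for a deep copy such as the one getDetachedValue performs.
  MutableText deepCopy() {
    MutableText copy = new MutableText();
    copy.set(value);
    return copy;
  }
}

class ObjectReuseDemo {
  // Feeds every input through one shared instance, as a DoFn would see it,
  // and either holds the shared reference or a detached copy.
  static List<String> collect(List<String> inputs, boolean detach) {
    MutableText reused = new MutableText();
    List<MutableText> held = new ArrayList<>();
    for (String s : inputs) {
      reused.set(s); // the framework overwrites the same object each time
      held.add(detach ? reused.deepCopy() : reused);
    }
    List<String> out = new ArrayList<>();
    for (MutableText t : held) {
      out.add(t.get());
    }
    return out;
  }

  public static void main(String[] args) {
    List<String> in = Arrays.asList("a", "b", "c");
    System.out.println(collect(in, false)); // [c, c, c]
    System.out.println(collect(in, true));  // [a, b, c]
  }
}
```

Without detaching, every held reference points at the same object, so a join-style buffer ends up seeing only the last value.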

The WritablePType class uses the built-in input and output MapFns in the PType to handle this deep copying, but the input and output MapFns don't get initialized (i.e. initialize isn't called on them) after they are deserialized along with the DoFn that is using them. In some rare cases (at least for tuples), this can result in NullPointerExceptions or other nastiness.
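A minimal Java sketch of the failure mode, under stated assumptions: SketchMapFn, PairToStringFn, StringToPairFn, SketchPType and UninitializedMapFnDemo are hypothetical simplifications, not the real Crunch classes. Deep copying is modeled as a round trip through the output MapFn and then the input MapFn, and the output MapFn creates a helper only in initialize(), like state that does not survive serialization.

```java
import java.util.function.Function;

// Hypothetical, simplified stand-in for a MapFn; not the real Crunch API.
abstract class SketchMapFn<S, T> {
  // Setup hook that should run on the task side after deserialization.
  void initialize() {}

  abstract T map(S input);
}

// Output fn for a two-field tuple; its helper only exists after initialize().
class PairToStringFn extends SketchMapFn<String[], String> {
  private Function<String, String> fieldEncoder; // null until initialize()

  @Override
  void initialize() {
    fieldEncoder = String::trim;
  }

  @Override
  String map(String[] pair) {
    return fieldEncoder.apply(pair[0]) + "," + fieldEncoder.apply(pair[1]);
  }
}

// Input fn: rebuilds a fresh tuple from the serialized form.
class StringToPairFn extends SketchMapFn<String, String[]> {
  @Override
  String[] map(String encoded) {
    return encoded.split(",", 2);
  }
}

class SketchPType {
  private final SketchMapFn<String, String[]> inputFn = new StringToPairFn();
  private final SketchMapFn<String[], String> outputFn = new PairToStringFn();

  void initialize() {
    inputFn.initialize();
    outputFn.initialize();
  }

  // Deep copy by round-tripping through the output and input MapFns.
  String[] getDetachedValue(String[] value) {
    return inputFn.map(outputFn.map(value));
  }
}

class UninitializedMapFnDemo {
  static String describe(boolean initialize) {
    SketchPType ptype = new SketchPType(); // think: freshly deserialized
    if (initialize) {
      ptype.initialize();
    }
    try {
      String[] copy = ptype.getDetachedValue(new String[] {"a", "b"});
      return copy[0] + "," + copy[1];
    } catch (NullPointerException e) {
      return "NPE";
    }
  }

  public static void main(String[] args) {
    System.out.println(describe(false)); // NPE
    System.out.println(describe(true));  // a,b
  }
}
```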

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CRUNCH-71) PType mapping functions are not initialized before being used for deep copying

Posted by "Gabriel Reid (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CRUNCH-71?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13459832#comment-13459832 ] 

Gabriel Reid commented on CRUNCH-71:
------------------------------------

Ok, committed.
                
> PType mapping functions are not initialized before being used for deep copying
> ------------------------------------------------------------------------------
>
>                 Key: CRUNCH-71
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-71
>             Project: Crunch
>          Issue Type: Bug
>    Affects Versions: 0.3.0
>            Reporter: Gabriel Reid
>            Assignee: Gabriel Reid
>             Fix For: 0.4.0
>
>         Attachments: CRUNCH-71.patch
>
>
> The PType#getDetachedValue method performs a deep copy (if needed) in order to allow DoFns to hold on to values that have been passed through them (for example, in join functions).
> The WritablePType class uses the built-in input and output MapFns in the PType to handle this deep copying, but the input and output MapFns don't get initialized (i.e. initialize isn't called on them) after they are deserialized along with the DoFn that is using them. In some rare cases (at least for tuples), this can result in NullPointerExceptions or other nastiness.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Resolved] (CRUNCH-71) PType mapping functions are not initialized before being used for deep copying

Posted by "Gabriel Reid (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CRUNCH-71?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gabriel Reid resolved CRUNCH-71.
--------------------------------

       Resolution: Fixed
    Fix Version/s: 0.4.0
    
> PType mapping functions are not initialized before being used for deep copying
> ------------------------------------------------------------------------------
>
>                 Key: CRUNCH-71
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-71
>             Project: Crunch
>          Issue Type: Bug
>    Affects Versions: 0.3.0
>            Reporter: Gabriel Reid
>            Assignee: Gabriel Reid
>             Fix For: 0.4.0
>
>         Attachments: CRUNCH-71.patch
>
>
> The PType#getDetachedValue method performs a deep copy (if needed) in order to allow DoFns to hold on to values that have been passed through them (for example, in join functions).
> The WritablePType class uses the built-in input and output MapFns in the PType to handle this deep copying, but the input and output MapFns don't get initialized (i.e. initialize isn't called on them) after they are deserialized along with the DoFn that is using them. In some rare cases (at least for tuples), this can result in NullPointerExceptions or other nastiness.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CRUNCH-71) PType mapping functions are not initialized before being used for deep copying

Posted by "Josh Wills (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CRUNCH-71?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13459644#comment-13459644 ] 

Josh Wills commented on CRUNCH-71:
----------------------------------

+1.
                
> PType mapping functions are not initialized before being used for deep copying
> ------------------------------------------------------------------------------
>
>                 Key: CRUNCH-71
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-71
>             Project: Crunch
>          Issue Type: Bug
>    Affects Versions: 0.3.0
>            Reporter: Gabriel Reid
>            Assignee: Gabriel Reid
>         Attachments: CRUNCH-71.patch
>
>
> The PType#getDetachedValue method performs a deep copy (if needed) in order to allow DoFns to hold on to values that have been passed through them (for example, in join functions).
> The WritablePType class uses the built-in input and output MapFns in the PType to handle this deep copying, but the input and output MapFns don't get initialized (i.e. initialize isn't called on them) after they are deserialized along with the DoFn that is using them. In some rare cases (at least for tuples), this can result in NullPointerExceptions or other nastiness.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (CRUNCH-71) PType mapping functions are not initialized before being used for deep copying

Posted by "Gabriel Reid (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CRUNCH-71?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gabriel Reid updated CRUNCH-71:
-------------------------------

    Attachment: CRUNCH-71.patch

Patch to resolve the issue attached.

I'm not 100% happy with this solution, as I would prefer that the PType would be supplied to the DoFn at runtime instead of the DoFn being responsible for calling PType#initialize.

However, that approach could add a lot of extra work to job setup, as there is not a 1-to-1 relationship between DoFns and PTypes.

Because object reuse issues are fairly isolated in MapReduce contexts (joins are the main place where I see them occur), this fix feels OK to me for now. Any objections to this patch?
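A minimal Java sketch of the pattern described in the patch comment, where the DoFn itself calls initialize on the PType it uses for detaching, from its own setup hook, before buffering any values. DetachingType, IntArrayType and BufferingJoinFn are hypothetical simplifications; the real Crunch classes and signatures may differ.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-in for a PType; not the real Crunch API.
interface DetachingType<T> {
  void initialize();            // prepare internal MapFns on the task side
  T getDetachedValue(T value);  // return a copy that is safe to hold on to
}

// Type for int[] values that refuses to copy until initialized, standing in
// for MapFns that need initialize() after deserialization.
class IntArrayType implements DetachingType<int[]> {
  private boolean initialized;

  @Override
  public void initialize() {
    initialized = true;
  }

  @Override
  public int[] getDetachedValue(int[] value) {
    if (!initialized) {
      throw new IllegalStateException("type used before initialize()");
    }
    return value.clone();
  }
}

// DoFn-like class that buffers values, as a join does, and is therefore
// responsible for initializing the type it uses for detaching.
class BufferingJoinFn {
  private final DetachingType<int[]> valueType;
  private final List<int[]> buffer = new ArrayList<>();

  BufferingJoinFn(DetachingType<int[]> valueType) {
    this.valueType = valueType;
  }

  // Plays the role of DoFn#initialize: runs once before any process() call.
  void initialize() {
    valueType.initialize();
  }

  void process(int[] value) {
    buffer.add(valueType.getDetachedValue(value));
  }

  // Feeds 1..n through a single reused array and reports what was buffered.
  static List<Integer> run(int n, boolean callInitialize) {
    BufferingJoinFn fn = new BufferingJoinFn(new IntArrayType());
    if (callInitialize) {
      fn.initialize();
    }
    int[] reused = new int[1];
    for (int i = 1; i <= n; i++) {
      reused[0] = i;
      fn.process(reused);
    }
    List<Integer> firsts = new ArrayList<>();
    for (int[] v : fn.buffer) {
      firsts.add(v[0]);
    }
    return firsts;
  }

  public static void main(String[] args) {
    System.out.println(run(3, true)); // [1, 2, 3]
  }
}
```

The cost of this design is visible in the sketch: every DoFn that detaches values has to remember to initialize its type, which is the trade-off the patch comment accepts for now.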
                
> PType mapping functions are not initialized before being used for deep copying
> ------------------------------------------------------------------------------
>
>                 Key: CRUNCH-71
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-71
>             Project: Crunch
>          Issue Type: Bug
>    Affects Versions: 0.3.0
>            Reporter: Gabriel Reid
>            Assignee: Gabriel Reid
>         Attachments: CRUNCH-71.patch
>
>
> The PType#getDetachedValue method performs a deep copy (if needed) in order to allow DoFns to hold on to values that have been passed through them (for example, in join functions).
> The WritablePType class uses the built-in input and output MapFns in the PType to handle this deep copying, but the input and output MapFns don't get initialized (i.e. initialize isn't called on them) after they are deserialized along with the DoFn that is using them. In some rare cases (at least for tuples), this can result in NullPointerExceptions or other nastiness.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira