Posted to dev@crunch.apache.org by "Gabriel Reid (JIRA)" <ji...@apache.org> on 2012/07/21 12:50:33 UTC
[jira] [Created] (CRUNCH-22) Join functions break when non-mapped
types are used as a join key
Gabriel Reid created CRUNCH-22:
----------------------------------
Summary: Join functions break when non-mapped types are used as a join key
Key: CRUNCH-22
URL: https://issues.apache.org/jira/browse/CRUNCH-22
Project: Crunch
Issue Type: Bug
Reporter: Gabriel Reid
Attachments: CRUNCH-22.patch
When a non-mapped type (i.e. a custom Writable or Avro record) is used as the join key in any of the join functions, the join does not operate correctly: because of object re-use in Hadoop, its output comes close to a Cartesian product.
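The object re-use pitfall described above can be illustrated without Hadoop at all. The following is a minimal sketch, not the code from CRUNCH-22.patch and not Crunch's join implementation: `MutableKey`, `reusingIterator`, `collect` and `distinctIds` are hypothetical names made up for this example. It assumes only that the framework hands out one mutable key instance and overwrites it for each record, so that any references buffered without a copy all end up showing the last key.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.Set;

public class ObjectReuseDemo {

    // Hypothetical mutable key, standing in for a custom Writable or Avro record.
    static final class MutableKey {
        private int id;
        MutableKey(int id) { this.id = id; }
        void set(int id) { this.id = id; }
        int get() { return id; }
        MutableKey copy() { return new MutableKey(id); }
    }

    // Iterator that returns the SAME instance on every call, mutated in place.
    // This mimics (as an assumption) how a framework may re-use key objects.
    static Iterator<MutableKey> reusingIterator(final int[] ids) {
        final MutableKey shared = new MutableKey(0);
        return new Iterator<MutableKey>() {
            private int pos = 0;
            @Override public boolean hasNext() { return pos < ids.length; }
            @Override public MutableKey next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                shared.set(ids[pos++]);
                return shared;
            }
            @Override public void remove() { throw new UnsupportedOperationException(); }
        };
    }

    // Buffers the keys, either by reference or as defensive copies.
    static List<MutableKey> collect(Iterator<MutableKey> it, boolean defensiveCopy) {
        List<MutableKey> out = new ArrayList<MutableKey>();
        while (it.hasNext()) {
            MutableKey key = it.next();
            out.add(defensiveCopy ? key.copy() : key);
        }
        return out;
    }

    // Counts how many different key values the buffered list still holds.
    static int distinctIds(List<MutableKey> keys) {
        Set<Integer> ids = new HashSet<Integer>();
        for (MutableKey key : keys) {
            ids.add(key.get());
        }
        return ids.size();
    }

    public static void main(String[] args) {
        int[] ids = {1, 2, 3};
        // Without copying, every buffered reference points at the one shared object.
        System.out.println(distinctIds(collect(reusingIterator(ids), false)));
        // With a defensive copy, each buffered key keeps its own value.
        System.out.println(distinctIds(collect(reusingIterator(ids), true)));
    }
}
```

When buffered keys collapse into a single value like this, records that should be kept apart by key all appear to match one another, which is one plausible route to output that resembles a Cartesian product. Copying the key before holding on to it is the usual defence.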
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (CRUNCH-22) Join functions break when non-mapped
types are used as a join key
Posted by "Gabriel Reid (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/CRUNCH-22?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Gabriel Reid updated CRUNCH-22:
-------------------------------
Attachment: CRUNCH-22.patch
Patch to correct issue, as well as unit tests for basic functionality of all join types.
[jira] [Commented] (CRUNCH-22) Join functions break when non-mapped
types are used as a join key
Posted by "Matthias Friedrich (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/CRUNCH-22?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419811#comment-13419811 ]
Matthias Friedrich commented on CRUNCH-22:
------------------------------------------
Does anybody know the reason for Hadoop's object pooling? MR applications are typically I/O-bound, so garbage collector overhead shouldn't matter much. I'm just curious; I've seen lots of bugs because of this.
[jira] [Resolved] (CRUNCH-22) Join functions break when non-mapped
types are used as a join key
Posted by "Gabriel Reid (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/CRUNCH-22?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Gabriel Reid resolved CRUNCH-22.
--------------------------------
Resolution: Fixed
Fix Version/s: 0.3.0
Pushed to master
[jira] [Commented] (CRUNCH-22) Join functions break when non-mapped
types are used as a join key
Posted by "Josh Wills (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/CRUNCH-22?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419876#comment-13419876 ]
Josh Wills commented on CRUNCH-22:
----------------------------------
Textbook case of premature optimization.
[jira] [Assigned] (CRUNCH-22) Join functions break when non-mapped
types are used as a join key
Posted by "Gabriel Reid (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/CRUNCH-22?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Gabriel Reid reassigned CRUNCH-22:
----------------------------------
Assignee: Gabriel Reid