Posted to dev@hive.apache.org by "Namit Jain (JIRA)" <ji...@apache.org> on 2010/02/05 18:13:28 UTC
[jira] Created: (HIVE-1134) bucketing mapjoin where the big table
contains more than 1 big table
bucketing mapjoin where the big table contains more than 1 big table
--------------------------------------------------------------------
Key: HIVE-1134
URL: https://issues.apache.org/jira/browse/HIVE-1134
Project: Hadoop Hive
Issue Type: New Feature
Components: Query Processor
Reporter: Namit Jain
Assignee: He Yongqiang
Fix For: 0.6.0
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1134) bucketing mapjoin where the big table
contains more than 1 big partition
Posted by "He Yongqiang (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-1134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
He Yongqiang updated HIVE-1134:
-------------------------------
Attachment: hive-1134-2010-02-17.patch
The attached patch also fixes a bug in the HIVE-917 patch:
it should use MOD instead of DIV.
// if the big table has more buckets than the current small table,
// use "MOD" to get small table bucket names. For example, if the big
// table has 4 buckets and the small table has 2 buckets, then the
// mapping should be 0->0, 1->1, 2->0, 3->1.
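The MOD rule in the comment above can be sketched as follows. This is only an illustration of the arithmetic, not the code from the patch: the class name BucketMapping and both method names are made up for this example, and it covers only the case the comment describes (the big table has at least as many buckets as the small table).

```java
final class BucketMapping {
  private BucketMapping() {
  }

  /**
   * Maps a big-table bucket index to a small-table bucket index with MOD.
   * Hypothetical helper; the name and signature are assumptions.
   */
  static int smallTableBucket(int bigTableBucket, int smallTableBucketCount) {
    if (smallTableBucketCount <= 0) {
      throw new IllegalArgumentException("smallTableBucketCount must be positive");
    }
    if (bigTableBucket < 0) {
      throw new IllegalArgumentException("bigTableBucket must be non-negative");
    }
    return bigTableBucket % smallTableBucketCount;
  }

  /** Returns the small-table bucket for every big-table bucket, indexed by big-table bucket. */
  static int[] mapping(int bigTableBucketCount, int smallTableBucketCount) {
    int[] result = new int[bigTableBucketCount];
    for (int i = 0; i < bigTableBucketCount; i++) {
      result[i] = smallTableBucket(i, smallTableBucketCount);
    }
    return result;
  }
}
```

With 4 big-table buckets and 2 small-table buckets, BucketMapping.mapping(4, 2) yields 0, 1, 0, 1, which matches the 0->0, 1->1, 2->0, 3->1 mapping in the comment.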
[jira] Updated: (HIVE-1134) bucketing mapjoin where the big table
contains more than 1 big partition
Posted by "He Yongqiang (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-1134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
He Yongqiang updated HIVE-1134:
-------------------------------
Attachment: hive-1134-2010-02-20.patch
Thanks Namit. Updated the patch.
[jira] Commented: (HIVE-1134) bucketing mapjoin where the big table
contains more than 1 big partition
Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-1134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12835898#action_12835898 ]
Namit Jain commented on HIVE-1134:
----------------------------------
A few minor comments:
1. CheckStyle: a lot of code needs { }.
For example:
if (!checkBucketNumberAgainstBigTable(aliasToBucketNumberMapping,
bucketNumberInPart))
return null;
2. Modify existing tests to run the test without the hint and then compare the results.
3. Cleanup GenMapredUtils.setupBucketMapJoinInfo
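For item 1, the braced form that CheckStyle asks for would look roughly like the sketch below. The method body of checkBucketNumberAgainstBigTable is not shown in this thread, so the version here is a placeholder written only to make the example compile; the parameter types, the enclosing class BraceStyleSketch, and the caller planOrNull are likewise assumptions.

```java
import java.util.Map;

final class BraceStyleSketch {
  private BraceStyleSketch() {
  }

  // Placeholder stand-in: the real method is not shown in this thread, so
  // this body (reject non-positive bucket counts) is an assumption.
  static boolean checkBucketNumberAgainstBigTable(
      Map<String, Integer> aliasToBucketNumberMapping, int bucketNumberInPart) {
    for (int smallTableBuckets : aliasToBucketNumberMapping.values()) {
      if (smallTableBuckets <= 0 || bucketNumberInPart <= 0) {
        return false;
      }
    }
    return true;
  }

  // Hypothetical caller: the single-statement 'if' body is wrapped in braces.
  static String planOrNull(
      Map<String, Integer> aliasToBucketNumberMapping, int bucketNumberInPart) {
    if (!checkBucketNumberAgainstBigTable(aliasToBucketNumberMapping,
        bucketNumberInPart)) {
      return null;
    }
    return "bucket map join";
  }
}
```

Only the braces around "return null;" matter here; everything else is scaffolding.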
[jira] Updated: (HIVE-1134) bucketing mapjoin where the big table
contains more than 1 big partition
Posted by "He Yongqiang (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-1134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
He Yongqiang updated HIVE-1134:
-------------------------------
Attachment: hive-1134-2010-02-18.patch
Fixed some diff. Thanks Namit.
[jira] Updated: (HIVE-1134) bucketing mapjoin where the big table
contains more than 1 big partition
Posted by "He Yongqiang (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-1134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
He Yongqiang updated HIVE-1134:
-------------------------------
Summary: bucketing mapjoin where the big table contains more than 1 big partition (was: bucketing mapjoin where the big table contains more than 1 big table)
[jira] Commented: (HIVE-1134) bucketing mapjoin where the big table
contains more than 1 big partition
Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-1134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12836212#action_12836212 ]
Namit Jain commented on HIVE-1134:
----------------------------------
The changes look good, but there is a problem with the tests.
I did not debug it, but it seems that some test does not drop the
tables bucketmapjoin_has_result1 and 2. As a result, the tests
input2/3 fail intermittently (depending on the order in which the
tests run).
Can you update the tests?
[jira] Updated: (HIVE-1134) bucketing mapjoin where the big table
contains more than 1 big partition
Posted by "He Yongqiang (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-1134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
He Yongqiang updated HIVE-1134:
-------------------------------
Attachment: hive-1134-2010-02-19.patch
Integrated Namit's comments. Thanks Namit.
[jira] Updated: (HIVE-1134) bucketing mapjoin where the big table
contains more than 1 big partition
Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-1134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Namit Jain updated HIVE-1134:
-----------------------------
Resolution: Fixed
Hadoop Flags: [Reviewed]
Status: Resolved (was: Patch Available)
Committed. Thanks, Yongqiang.
[jira] Commented: (HIVE-1134) bucketing mapjoin where the big table
contains more than 1 big table
Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-1134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12834446#action_12834446 ]
Namit Jain commented on HIVE-1134:
----------------------------------
Some pending work from https://issues.apache.org/jira/browse/HIVE-917 (you can do it in a separate JIRA if you want to):
1. Add the mapping to the explain plan so that it can be compared; see
https://issues.apache.org/jira/browse/HIVE-976
2. Add a negative test in which the numbers of buckets in the 2 tables are not exact multiples of each other,
so that bucketed map join is not used.
3. Instead of checking at runtime, set the default bucket matcher in the plan and initialize it using reflection.
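The condition behind the negative test in item 2 can be sketched like this. It is an assumption-laden illustration, not Hive's actual check: the class BucketCountCheck and the method bucketCountsCompatible are invented names, and the rule encoded is only what the comment states, namely that one bucket count must be an exact multiple of the other.

```java
final class BucketCountCheck {
  private BucketCountCheck() {
  }

  /**
   * Returns true when one bucket count is an exact multiple of the other.
   * Hypothetical helper; when it returns false, the negative test would
   * expect a plan that does not use bucketed map join.
   */
  static boolean bucketCountsCompatible(int bigTableBuckets, int smallTableBuckets) {
    if (bigTableBuckets <= 0 || smallTableBuckets <= 0) {
      return false;
    }
    return bigTableBuckets % smallTableBuckets == 0
        || smallTableBuckets % bigTableBuckets == 0;
  }
}
```

For example, bucket counts of 4 and 2 pass this check, while 4 and 3 do not, so a pair of tables bucketed 4 and 3 ways would be a candidate for the negative test.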
[jira] Updated: (HIVE-1134) bucketing mapjoin where the big table
contains more than 1 big partition
Posted by "He Yongqiang (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-1134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
He Yongqiang updated HIVE-1134:
-------------------------------
Status: Patch Available (was: Open)