Posted to dev@hive.apache.org by "Namit Jain (JIRA)" <ji...@apache.org> on 2010/02/05 18:13:28 UTC

[jira] Created: (HIVE-1134) bucketing mapjoin where the big table contains more than 1 big table

bucketing mapjoin where the big table contains more than 1 big table
--------------------------------------------------------------------

                 Key: HIVE-1134
                 URL: https://issues.apache.org/jira/browse/HIVE-1134
             Project: Hadoop Hive
          Issue Type: New Feature
          Components: Query Processor
            Reporter: Namit Jain
            Assignee: He Yongqiang
             Fix For: 0.6.0




-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HIVE-1134) bucketing mapjoin where the big table contains more than 1 big partition

Posted by "He Yongqiang (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-1134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

He Yongqiang updated HIVE-1134:
-------------------------------

    Attachment: hive-1134-2010-02-17.patch

The attached patch also fixes a bug in the patch for HIVE-917.

The bucket mapping should use MOD instead of DIV:
// if the big table has more buckets than the current small table,
// use "MOD" to get small table bucket names. For example, if the big
// table has 4 buckets and the small table has 2 buckets, then the
// mapping should be 0->0, 1->1, 2->0, 3->1.
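
The mapping described in the comment above can be sketched in Java as follows. This is only a minimal illustration, not the code from the patch: the names `BucketMappingSketch` and `smallTableBucketFor` are hypothetical, and the sketch assumes that rows are assigned to buckets by hash(key) mod bucket count and that the big table's bucket count is an exact multiple of the small table's.

```java
public class BucketMappingSketch {

    // Hypothetical helper: maps a big-table bucket index to the small-table
    // bucket that can hold matching keys. Assumes the big table's bucket
    // count is an exact multiple of smallBuckets (for example 4 and 2).
    static int smallTableBucketFor(int bigBucketIndex, int smallBuckets) {
        // MOD, not DIV: with 4 big buckets and 2 small buckets, DIV would
        // give 0->0, 1->0, 2->1, 3->1, which pairs the wrong buckets.
        return bigBucketIndex % smallBuckets;
    }

    public static void main(String[] args) {
        int bigBuckets = 4;
        int smallBuckets = 2;
        for (int i = 0; i < bigBuckets; i++) {
            // Prints 0->0, 1->1, 2->0, 3->1 (one mapping per line).
            System.out.println(i + "->" + smallTableBucketFor(i, smallBuckets));
        }
    }
}
```

MOD is the consistent choice under the hashing assumption: if a key's hash is congruent to b modulo 4, it is congruent to b mod 2 modulo 2, so the matching rows sit in small-table bucket b mod 2.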





[jira] Updated: (HIVE-1134) bucketing mapjoin where the big table contains more than 1 big partition

Posted by "He Yongqiang (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-1134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

He Yongqiang updated HIVE-1134:
-------------------------------

    Attachment: hive-1134-2010-02-20.patch

Thanks Namit.  Updated the patch.





[jira] Commented: (HIVE-1134) bucketing mapjoin where the big table contains more than 1 big partition

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-1134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12835898#action_12835898 ] 

Namit Jain commented on HIVE-1134:
----------------------------------

A few minor comments:

1. Checkstyle: a lot of the code needs braces { }.

For example:

 if (!checkBucketNumberAgainstBigTable(aliasToBucketNumberMapping,
     bucketNumberInPart))
   return null;

2. Modify existing tests to run the test without the hint and then compare the results.

3. Clean up GenMapredUtils.setupBucketMapJoinInfo.
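
For item 1, the braced form that the comment asks for would look roughly like the sketch below. Only the `if` condition comes from the review comment. The surrounding class `BracesSketch`, the method `convert`, the parameter types, and the placeholder body of `checkBucketNumberAgainstBigTable` are assumptions added so that the example compiles, and they are not taken from the patch.

```java
import java.util.HashMap;
import java.util.Map;

public class BracesSketch {

    // Placeholder with assumed parameter types; the real check lives in the
    // patch. Here it only rejects a non-positive bucket count.
    static boolean checkBucketNumberAgainstBigTable(
            Map<String, Integer> aliasToBucketNumberMapping, int bucketNumberInPart) {
        return bucketNumberInPart > 0;
    }

    // Hypothetical caller that shows the braced early return.
    static Map<String, Integer> convert(
            Map<String, Integer> aliasToBucketNumberMapping, int bucketNumberInPart) {
        // Braces around the single-statement body, even though Java allows
        // them to be omitted.
        if (!checkBucketNumberAgainstBigTable(aliasToBucketNumberMapping,
                bucketNumberInPart)) {
            return null;
        }
        return aliasToBucketNumberMapping;
    }

    public static void main(String[] args) {
        Map<String, Integer> mapping = new HashMap<>();
        mapping.put("small", 2);
        System.out.println(convert(mapping, 4) != null);
        System.out.println(convert(mapping, 0) == null);
    }
}
```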





[jira] Updated: (HIVE-1134) bucketing mapjoin where the big table contains more than 1 big partition

Posted by "He Yongqiang (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-1134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

He Yongqiang updated HIVE-1134:
-------------------------------

    Attachment: hive-1134-2010-02-18.patch

Fixed some diff. Thanks Namit.





[jira] Updated: (HIVE-1134) bucketing mapjoin where the big table contains more than 1 big partition

Posted by "He Yongqiang (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-1134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

He Yongqiang updated HIVE-1134:
-------------------------------

    Summary: bucketing mapjoin where the big table contains more than 1 big partition  (was: bucketing mapjoin where the big table contains more than 1 big table)





[jira] Commented: (HIVE-1134) bucketing mapjoin where the big table contains more than 1 big partition

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-1134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12836212#action_12836212 ] 

Namit Jain commented on HIVE-1134:
----------------------------------

The changes look good, but there is a problem with the tests.
I did not debug it, but it seems that some test does not drop the
tables bucketmapjoin_has_result1 and 2. As a result, the tests
input2/3 fail intermittently (depending on the order in which the
tests run).

Can you update the tests?





[jira] Updated: (HIVE-1134) bucketing mapjoin where the big table contains more than 1 big partition

Posted by "He Yongqiang (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-1134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

He Yongqiang updated HIVE-1134:
-------------------------------

    Attachment: hive-1134-2010-02-19.patch

Integrated Namit's comments. Thanks Namit.





[jira] Updated: (HIVE-1134) bucketing mapjoin where the big table contains more than 1 big partition

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-1134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Namit Jain updated HIVE-1134:
-----------------------------

      Resolution: Fixed
    Hadoop Flags: [Reviewed]
          Status: Resolved  (was: Patch Available)

Committed. Thanks Yongqiang





[jira] Commented: (HIVE-1134) bucketing mapjoin where the big table contains more than 1 big table

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-1134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12834446#action_12834446 ] 

Namit Jain commented on HIVE-1134:
----------------------------------

Some pending work from https://issues.apache.org/jira/browse/HIVE-917 - you can do that in a separate JIRA if you want to.

1. Add the mapping to the explain plan so that it can be compared - look at
    https://issues.apache.org/jira/browse/HIVE-976

2. Add a negative test in which the bucket counts of the 2 tables are not exact multiples of each other,
    so that the bucketed map join is not used.

3. Instead of checking at runtime, set the default bucket matcher in the plan and initialize it using reflection.
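
The condition behind the negative test in item 2 can be sketched as a small Java check. This is an illustration only, under the assumption that a bucketed map join applies when one table's bucket count is an exact multiple of the other's. `BucketCountCheckSketch` and `bucketCountsCompatible` are hypothetical names, not Hive APIs.

```java
public class BucketCountCheckSketch {

    // Hypothetical check: true when one bucket count is an exact multiple
    // of the other, false otherwise (including non-positive counts).
    static boolean bucketCountsCompatible(int bigBuckets, int smallBuckets) {
        if (bigBuckets <= 0 || smallBuckets <= 0) {
            return false;
        }
        return bigBuckets % smallBuckets == 0 || smallBuckets % bigBuckets == 0;
    }

    public static void main(String[] args) {
        // 4 and 2 are multiples, so the bucketed map join could be used.
        System.out.println(bucketCountsCompatible(4, 2));
        // 4 and 3 are not multiples: this is the negative case.
        System.out.println(bucketCountsCompatible(4, 3));
    }
}
```

A negative test built on this idea would join, for example, a 4-bucket table with a 3-bucket table and confirm that the plan falls back to a regular map join.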





[jira] Updated: (HIVE-1134) bucketing mapjoin where the big table contains more than 1 big partition

Posted by "He Yongqiang (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-1134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

He Yongqiang updated HIVE-1134:
-------------------------------

    Status: Patch Available  (was: Open)


