Posted to dev@hive.apache.org by "Namit Jain (JIRA)" <ji...@apache.org> on 2012/07/16 09:02:33 UTC

[jira] [Created] (HIVE-3260) support bucketed mapjoin where the small table has different number of buckets for different partitons

Namit Jain created HIVE-3260:
--------------------------------

             Summary: support bucketed mapjoin where the small table has different number of buckets for different partitons
                 Key: HIVE-3260
                 URL: https://issues.apache.org/jira/browse/HIVE-3260
             Project: Hive
          Issue Type: Bug
            Reporter: Namit Jain


Consider the following scenario:

A (1 partition) join B (2 partitions)

A has 2 buckets, whereas B has 2 and 4 buckets for different partitions.

The bucketed mapjoin should still work.
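As a rough illustration of why this scenario can work, here is a minimal sketch of pairing bucket files by index when one bucket count is a multiple of the other. It assumes both tables bucket the join key with the same non-negative hash and place a row in bucket (hash mod bucketCount). The class BucketMatchSketch and the method matchingSmallBuckets are hypothetical names used only for this example; this is not Hive's implementation.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of Hive: pairs bucket files by bucket index. */
final class BucketMatchSketch {

  /**
   * Returns the small-table bucket indexes that can contain rows joining with
   * the given big-table bucket. Assumes bucket = hash mod bucketCount on both
   * sides, with the same non-negative hash of the join key.
   */
  static List<Integer> matchingSmallBuckets(int bigBucket, int bigCount, int smallCount) {
    if (bigCount <= 0 || smallCount <= 0 || bigBucket < 0 || bigBucket >= bigCount) {
      throw new IllegalArgumentException("invalid bucket index or count");
    }
    if (bigCount % smallCount != 0 && smallCount % bigCount != 0) {
      throw new IllegalArgumentException("bucket counts must be multiples of each other");
    }
    List<Integer> result = new ArrayList<>();
    if (smallCount >= bigCount) {
      // hash % smallCount == j implies hash % bigCount == j % bigCount,
      // so every small bucket j with j % bigCount == bigBucket is a match.
      for (int j = bigBucket; j < smallCount; j += bigCount) {
        result.add(j);
      }
    } else {
      // hash % bigCount == i implies hash % smallCount == i % smallCount.
      result.add(bigBucket % smallCount);
    }
    return result;
  }

  public static void main(String[] args) {
    int bigCount = 2;                     // table A: 2 buckets
    int[] smallPartitionCounts = {2, 4};  // table B: one partition with 2, one with 4
    for (int smallCount : smallPartitionCounts) {
      for (int bigBucket = 0; bigBucket < bigCount; bigBucket++) {
        System.out.println("A bucket " + bigBucket + " -> B(" + smallCount
            + " buckets) buckets " + matchingSmallBuckets(bigBucket, bigCount, smallCount));
      }
    }
  }
}
```

Under these assumptions, bucket 0 of A pairs with bucket 0 of B's 2-bucket partition and with buckets 0 and 2 of its 4-bucket partition, so the pairing has to be computed per partition rather than once per table.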

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HIVE-3260) support bucketed mapjoin where the small table has different number of buckets for different partitons

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-3260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13415004#comment-13415004 ] 

Namit Jain commented on HIVE-3260:
----------------------------------

I agree - this code is not the difficult part.

But we need to check every place where partition metadata is used, such as sampling.

Let us defer it for now and get back to it later.
Do you want to open the jira for now?
                


        

[jira] [Commented] (HIVE-3260) support bucketed mapjoin where the small table has different number of buckets for different partitons

Posted by "Navis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-3260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13414916#comment-13414916 ] 

Navis commented on HIVE-3260:
-----------------------------

I couldn't find any DDL for creating a partition with its own number of buckets.

From org.apache.hadoop.hive.ql.metadata.Partition:
{code}
  /**
   * The number of buckets is a property of the partition. However - internally
   * we are just storing it as a property of the table as a short term measure.
   */
  public int getBucketCount() {
    return table.getNumBuckets();
  }
{code}
Does this mean that it's not possible?
                


        

[jira] [Resolved] (HIVE-3260) support bucketed mapjoin where the small table has different number of buckets for different partitons

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-3260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Namit Jain resolved HIVE-3260.
------------------------------

    Resolution: Won't Fix

The correct fix would be https://issues.apache.org/jira/browse/HIVE-3261
                


        

[jira] [Commented] (HIVE-3260) support bucketed mapjoin where the small table has different number of buckets for different partitons

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-3260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13414958#comment-13414958 ] 

Namit Jain commented on HIVE-3260:
----------------------------------

I agree - there is an ALTER TABLE command:

ALTER TABLE <TBL_NAME> CLUSTERED BY (<COLS>) INTO <n> BUCKETS;

But the per-partition number of buckets is not used anywhere.

Should we just disallow changing the number of buckets with ALTER TABLE in strict mode if the table already has partitions?
                


        

[jira] [Commented] (HIVE-3260) support bucketed mapjoin where the small table has different number of buckets for different partitons

Posted by "Navis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-3260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13414980#comment-13414980 ] 

Navis commented on HIVE-3260:
-----------------------------

That is the right thing to do.

But why not support buckets per partition? Would it be enough to change the code above to 
{code}
int bucketNum = sd.getNumBuckets();
return bucketNum < 0 ? table.getNumBuckets() : bucketNum;
{code}
and add some DDL syntax for partitions?
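
To make the proposed fallback concrete, here is a self-contained sketch. The classes TableInfo, StorageInfo and PartitionInfo are hypothetical stand-ins for the table, the partition's storage descriptor (sd) and the partition; they are not the Hive classes. Treating a negative value as "no bucket count set for this partition" is an assumption taken from the bucketNum < 0 check above.

```java
/** Hypothetical stand-ins for the metadata classes; not the Hive API. */
final class PartitionBucketCountSketch {

  /** Table-level metadata: the bucket count declared on the table. */
  static final class TableInfo {
    final int numBuckets;
    TableInfo(int numBuckets) { this.numBuckets = numBuckets; }
  }

  /** Partition storage descriptor; a negative numBuckets means "not set" (assumption). */
  static final class StorageInfo {
    final int numBuckets;
    StorageInfo(int numBuckets) { this.numBuckets = numBuckets; }
  }

  /** Partition metadata that prefers its own bucket count over the table's. */
  static final class PartitionInfo {
    final TableInfo table;
    final StorageInfo sd;

    PartitionInfo(TableInfo table, StorageInfo sd) {
      this.table = table;
      this.sd = sd;
    }

    int getBucketCount() {
      int bucketNum = sd.numBuckets;
      // Fall back to the table-level count when the partition has none.
      return bucketNum < 0 ? table.numBuckets : bucketNum;
    }
  }

  public static void main(String[] args) {
    TableInfo b = new TableInfo(2);
    PartitionInfo unset = new PartitionInfo(b, new StorageInfo(-1));
    PartitionInfo four = new PartitionInfo(b, new StorageInfo(4));
    System.out.println("partition without its own count: " + unset.getBucketCount());
    System.out.println("partition with its own count:    " + four.getBucketCount());
  }
}
```

With this shape, existing partitions that carry no bucket count of their own keep behaving as they do today, while a partition written with a different count reports that count.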
                


        

[jira] [Reopened] (HIVE-3260) support bucketed mapjoin where the small table has different number of buckets for different partitons

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-3260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Namit Jain reopened HIVE-3260:
------------------------------

    
