Posted to dev@hive.apache.org by "Mark Grover (JIRA)" <ji...@apache.org> on 2012/06/01 06:24:22 UTC

[jira] [Created] (HIVE-3077) Insert overwrite table doesn't fail for bucketed tables and breaks bucketing

Mark Grover created HIVE-3077:
---------------------------------

             Summary: Insert overwrite table doesn't fail for bucketed tables and breaks bucketing
                 Key: HIVE-3077
                 URL: https://issues.apache.org/jira/browse/HIVE-3077
             Project: Hive
          Issue Type: Bug
          Components: CLI
    Affects Versions: 0.9.0, 0.8.1, 0.8.0, 0.10.0, 0.9.1
         Environment: java version "1.6.0_30"
hive version 0.9.0
hadoop version 0.20.205.0
            Reporter: Mark Grover


If table my_table is bucketed, the command "insert into table my_table ..." is supposed to give an error stating "Bucketized tables do not support INSERT INTO".

However, it doesn't seem to do that in all cases.
Consider the following example on Hive 0.9.0:
create table src(x string) clustered by(x) sorted by (x) into 32 buckets; 
create table dest(x string) clustered by(x) sorted by (x) into 32 buckets; 

Now, put some data into src (after setting hive.enforce.bucketing and hive.enforce.sorting to true).

Then, do:
insert into table dest select * from src; 

This should fail since dest is a bucketized table. However, it succeeds, creating a 33rd file inside the HDFS folder for the table and thereby breaking its bucketing.

This happens regardless of whether the src table is bucketed or not.
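
For convenience, the steps above can be collected into one script for the Hive CLI. This is only a sketch of the reproduction: the table name staging is hypothetical (any already-populated table with a string column x would do), populating dest before the final insert is an assumption made so that it starts with one file per bucket, and the dfs path assumes the default warehouse location.

```sql
-- Repro sketch for the report above (Hive 0.9.0).
-- Assumption: `staging` is a hypothetical, already-populated table with a string column x.
set hive.enforce.bucketing=true;
set hive.enforce.sorting=true;

create table src(x string) clustered by (x) sorted by (x) into 32 buckets;
create table dest(x string) clustered by (x) sorted by (x) into 32 buckets;

-- Populate both bucketed tables; with enforcement on, each load is expected
-- to write one file per bucket.
insert overwrite table src select x from staging;
insert overwrite table dest select x from staging;

-- Expected: "Bucketized tables do not support INSERT INTO".
-- Observed per this report: the statement succeeds.
insert into table dest select * from src;

-- List the table directory; the path assumes the default
-- hive.metastore.warehouse.dir (/user/hive/warehouse).
dfs -ls /user/hive/warehouse/dest;
```

If the bug is present, the final listing should show more files than the 32 buckets declared for dest.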

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HIVE-3077) Insert overwrite table doesn't fail for bucketed tables and breaks bucketing

Posted by "Mark Grover (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HIVE-3077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Grover updated HIVE-3077:
------------------------------

    Description: 
If table my_table is bucketed, the command "insert into table my_table ..." is supposed to give an error stating "Bucketized tables do not support INSERT INTO".

However, it doesn't seem to do that in all cases.
Consider the following example on Hive 0.9.0:
create table src(x string) clustered by( x ) sorted by ( x ) into 32 buckets; 
create table dest(x string) clustered by( x ) sorted by ( x ) into 32 buckets; 

Now, put some data into src (after setting hive.enforce.bucketing and hive.enforce.sorting to true).

Then, do:
insert into table dest select * from src; 

This should fail since dest is a bucketized table. However, it succeeds, creating a 33rd file inside the HDFS folder for the table and thereby breaking its bucketing.

This happens regardless of whether the src table is bucketed or not.

  was:
If table my_table is bucketed, the command "insert into table my_table ..." is supposed to give an error stating "Bucketized tables do not support INSERT INTO".

However, it doesn't seem to do that in all cases.
Consider the following example on Hive 0.9.0:
create table src(x string) clustered by(x) sorted by (x) into 32 buckets; 
create table dest(x string) clustered by(x) sorted by (x) into 32 buckets; 

Now, put some data into src (after setting hive.enforce.bucketing and hive.enforce.sorting to true).

Then, do:
insert into table dest select * from src; 

This should fail since dest is a bucketized table. However, it succeeds, creating a 33rd file inside the HDFS folder for the table and thereby breaking its bucketing.

This happens regardless of whether the src table is bucketed or not.

    
