You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@hive.apache.org by "Ning Zhang (JIRA)" <ji...@apache.org> on 2009/11/17 02:39:39 UTC

[jira] Created: (HIVE-936) dynamic partitions creation based on values

dynamic partitions creation based on values
-------------------------------------------

                 Key: HIVE-936
                 URL: https://issues.apache.org/jira/browse/HIVE-936
             Project: Hadoop Hive
          Issue Type: New Feature
            Reporter: Ning Zhang


If a Hive table is created as partitioned, DML could only inserted into one partitioin per query. Ideally partitions should be created on the fly based on the value of the partition columns. As an example:
{{{
  create table T (a int, b string) partitioned by (ds string);
  insert overwrite table T select a, b, ds from S where ds >= '2009-11-01' and ds <= '2009-11-16';
}}}
should be able to execute in one DML rather than possibley 16 DML for each distinct ds values. CTAS and alter table should be able to do the same thing:
{{{
  create table T partitioned by (ds string) as select * from S where ds >= '2009-11-01' and ds <= '2009-11-16';
}}}
 and
{{{
  create table T(a int, b string, ds string);
  insert overwrite table T select * from S where ds >= '2009-11-1' and ds <= '2009-11-16';
  alter table T partitioned by (ds);
}}}
should all return the same results.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HIVE-936) dynamic partitions creation based on values

Posted by "Jay Booth (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HIVE-936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12778929#action_12778929 ] 

Jay Booth commented on HIVE-936:
--------------------------------

This would be super useful..  only issue I can see is inserting super sparse data, we've had issues with unbounded MultipleOutputs, eventually you run out of RAM, XCeivers or both.  I suppose it could just be a client responsibility to make sure your query doesn't generate enough outputs to break it..  still better than not having the functionality at all.

> dynamic partitions creation based on values
> -------------------------------------------
>
>                 Key: HIVE-936
>                 URL: https://issues.apache.org/jira/browse/HIVE-936
>             Project: Hadoop Hive
>          Issue Type: New Feature
>            Reporter: Ning Zhang
>
> If a Hive table is created as partitioned, DML could only inserted into one partitioin per query. Ideally partitions should be created on the fly based on the value of the partition columns. As an example:
> {{{
>   create table T (a int, b string) partitioned by (ds string);
>   insert overwrite table T select a, b, ds from S where ds >= '2009-11-01' and ds <= '2009-11-16';
> }}}
> should be able to execute in one DML rather than possibley 16 DML for each distinct ds values. CTAS and alter table should be able to do the same thing:
> {{{
>   create table T partitioned by (ds string) as select * from S where ds >= '2009-11-01' and ds <= '2009-11-16';
> }}}
>  and
> {{{
>   create table T(a int, b string, ds string);
>   insert overwrite table T select * from S where ds >= '2009-11-1' and ds <= '2009-11-16';
>   alter table T partitioned by (ds);
> }}}
> should all return the same results.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HIVE-936) dynamic partitions creation based on values

Posted by "Prasad Chakka (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HIVE-936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12778935#action_12778935 ] 

Prasad Chakka commented on HIVE-936:
------------------------------------

@Namit: you can't change the partition keys of a table.

> dynamic partitions creation based on values
> -------------------------------------------
>
>                 Key: HIVE-936
>                 URL: https://issues.apache.org/jira/browse/HIVE-936
>             Project: Hadoop Hive
>          Issue Type: New Feature
>            Reporter: Ning Zhang
>            Assignee: Ning Zhang
>
> If a Hive table is created as partitioned, DML could only inserted into one partitioin per query. Ideally partitions should be created on the fly based on the value of the partition columns. As an example:
> {{{
>   create table T (a int, b string) partitioned by (ds string);
>   insert overwrite table T select a, b, ds from S where ds >= '2009-11-01' and ds <= '2009-11-16';
> }}}
> should be able to execute in one DML rather than possibley 16 DML for each distinct ds values. CTAS and alter table should be able to do the same thing:
> {{{
>   create table T partitioned by (ds string) as select * from S where ds >= '2009-11-01' and ds <= '2009-11-16';
> }}}
>  and
> {{{
>   create table T(a int, b string, ds string);
>   insert overwrite table T select * from S where ds >= '2009-11-1' and ds <= '2009-11-16';
>   alter table T partitioned by (ds);
> }}}
> should all return the same results.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HIVE-936) dynamic partitions creation based on values

Posted by "Jeff Hammerbacher (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HIVE-936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12778836#action_12778836 ] 

Jeff Hammerbacher commented on HIVE-936:
----------------------------------------

Probably the most requested feature from our customers. Getting a patch in would be a big usability improvement for Hive!

> dynamic partitions creation based on values
> -------------------------------------------
>
>                 Key: HIVE-936
>                 URL: https://issues.apache.org/jira/browse/HIVE-936
>             Project: Hadoop Hive
>          Issue Type: New Feature
>            Reporter: Ning Zhang
>
> If a Hive table is created as partitioned, DML could only inserted into one partitioin per query. Ideally partitions should be created on the fly based on the value of the partition columns. As an example:
> {{{
>   create table T (a int, b string) partitioned by (ds string);
>   insert overwrite table T select a, b, ds from S where ds >= '2009-11-01' and ds <= '2009-11-16';
> }}}
> should be able to execute in one DML rather than possibley 16 DML for each distinct ds values. CTAS and alter table should be able to do the same thing:
> {{{
>   create table T partitioned by (ds string) as select * from S where ds >= '2009-11-01' and ds <= '2009-11-16';
> }}}
>  and
> {{{
>   create table T(a int, b string, ds string);
>   insert overwrite table T select * from S where ds >= '2009-11-1' and ds <= '2009-11-16';
>   alter table T partitioned by (ds);
> }}}
> should all return the same results.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HIVE-936) dynamic partitions creation based on values

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HIVE-936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12778938#action_12778938 ] 

Namit Jain commented on HIVE-936:
---------------------------------

I think, for 936, we dont need alter table - we can see if we want to support something like that in 938.
I will copy your comments there for now.

> dynamic partitions creation based on values
> -------------------------------------------
>
>                 Key: HIVE-936
>                 URL: https://issues.apache.org/jira/browse/HIVE-936
>             Project: Hadoop Hive
>          Issue Type: New Feature
>            Reporter: Ning Zhang
>            Assignee: Ning Zhang
>
> If a Hive table is created as partitioned, DML could only inserted into one partitioin per query. Ideally partitions should be created on the fly based on the value of the partition columns. As an example:
> {{{
>   create table T (a int, b string) partitioned by (ds string);
>   insert overwrite table T select a, b, ds from S where ds >= '2009-11-01' and ds <= '2009-11-16';
> }}}
> should be able to execute in one DML rather than possibley 16 DML for each distinct ds values. CTAS and alter table should be able to do the same thing:
> {{{
>   create table T partitioned by (ds string) as select * from S where ds >= '2009-11-01' and ds <= '2009-11-16';
> }}}
>  and
> {{{
>   create table T(a int, b string, ds string);
>   insert overwrite table T select * from S where ds >= '2009-11-1' and ds <= '2009-11-16';
>   alter table T partitioned by (ds);
> }}}
> should all return the same results.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HIVE-936) dynamic partitions creation based on values

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HIVE-936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12778933#action_12778933 ] 

Namit Jain commented on HIVE-936:
---------------------------------

@Jay, we were thinking of having a configurable variable to control the number of partitions a statement can create. 

> dynamic partitions creation based on values
> -------------------------------------------
>
>                 Key: HIVE-936
>                 URL: https://issues.apache.org/jira/browse/HIVE-936
>             Project: Hadoop Hive
>          Issue Type: New Feature
>            Reporter: Ning Zhang
>            Assignee: Ning Zhang
>
> If a Hive table is created as partitioned, DML could only inserted into one partitioin per query. Ideally partitions should be created on the fly based on the value of the partition columns. As an example:
> {{{
>   create table T (a int, b string) partitioned by (ds string);
>   insert overwrite table T select a, b, ds from S where ds >= '2009-11-01' and ds <= '2009-11-16';
> }}}
> should be able to execute in one DML rather than possibley 16 DML for each distinct ds values. CTAS and alter table should be able to do the same thing:
> {{{
>   create table T partitioned by (ds string) as select * from S where ds >= '2009-11-01' and ds <= '2009-11-16';
> }}}
>  and
> {{{
>   create table T(a int, b string, ds string);
>   insert overwrite table T select * from S where ds >= '2009-11-1' and ds <= '2009-11-16';
>   alter table T partitioned by (ds);
> }}}
> should all return the same results.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Assigned: (HIVE-936) dynamic partitions creation based on values

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HIVE-936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Namit Jain reassigned HIVE-936:
-------------------------------

    Assignee: Ning Zhang

I agree this is a important feature and we will start working into it

> dynamic partitions creation based on values
> -------------------------------------------
>
>                 Key: HIVE-936
>                 URL: https://issues.apache.org/jira/browse/HIVE-936
>             Project: Hadoop Hive
>          Issue Type: New Feature
>            Reporter: Ning Zhang
>            Assignee: Ning Zhang
>
> If a Hive table is created as partitioned, DML could only inserted into one partitioin per query. Ideally partitions should be created on the fly based on the value of the partition columns. As an example:
> {{{
>   create table T (a int, b string) partitioned by (ds string);
>   insert overwrite table T select a, b, ds from S where ds >= '2009-11-01' and ds <= '2009-11-16';
> }}}
> should be able to execute in one DML rather than possibley 16 DML for each distinct ds values. CTAS and alter table should be able to do the same thing:
> {{{
>   create table T partitioned by (ds string) as select * from S where ds >= '2009-11-01' and ds <= '2009-11-16';
> }}}
>  and
> {{{
>   create table T(a int, b string, ds string);
>   insert overwrite table T select * from S where ds >= '2009-11-1' and ds <= '2009-11-16';
>   alter table T partitioned by (ds);
> }}}
> should all return the same results.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (HIVE-936) dynamic partitions creation based on values

Posted by "Ning Zhang (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HIVE-936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ning Zhang updated HIVE-936:
----------------------------

    Attachment: dp_design.txt

Uploading a design notes. Comments and suggestions are welcome. 

> dynamic partitions creation based on values
> -------------------------------------------
>
>                 Key: HIVE-936
>                 URL: https://issues.apache.org/jira/browse/HIVE-936
>             Project: Hadoop Hive
>          Issue Type: New Feature
>            Reporter: Ning Zhang
>            Assignee: Ning Zhang
>         Attachments: dp_design.txt
>
>
> If a Hive table is created as partitioned, DML could only inserted into one partitioin per query. Ideally partitions should be created on the fly based on the value of the partition columns. As an example:
> {{{
>   create table T (a int, b string) partitioned by (ds string);
>   insert overwrite table T select a, b, ds from S where ds >= '2009-11-01' and ds <= '2009-11-16';
> }}}
> should be able to execute in one DML rather than possibley 16 DML for each distinct ds values. CTAS and alter table should be able to do the same thing:
> {{{
>   create table T partitioned by (ds string) as select * from S where ds >= '2009-11-01' and ds <= '2009-11-16';
> }}}
>  and
> {{{
>   create table T(a int, b string, ds string);
>   insert overwrite table T select * from S where ds >= '2009-11-1' and ds <= '2009-11-16';
>   alter table T partitioned by (ds);
> }}}
> should all return the same results.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HIVE-936) dynamic partitions creation based on values

Posted by "Ning Zhang (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HIVE-936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12852506#action_12852506 ] 

Ning Zhang commented on HIVE-936:
---------------------------------

A patch for HIVE-1002 has been upload. That patch deals with insert only. Please take a look and review. I will upload a patch for CTAS on this JIRA. 

> dynamic partitions creation based on values
> -------------------------------------------
>
>                 Key: HIVE-936
>                 URL: https://issues.apache.org/jira/browse/HIVE-936
>             Project: Hadoop Hive
>          Issue Type: New Feature
>            Reporter: Ning Zhang
>            Assignee: Ning Zhang
>         Attachments: dp_design.txt
>
>
> If a Hive table is created as partitioned, DML could only inserted into one partitioin per query. Ideally partitions should be created on the fly based on the value of the partition columns. As an example:
> {{{
>   create table T (a int, b string) partitioned by (ds string);
>   insert overwrite table T select a, b, ds from S where ds >= '2009-11-01' and ds <= '2009-11-16';
> }}}
> should be able to execute in one DML rather than possibley 16 DML for each distinct ds values. CTAS and alter table should be able to do the same thing:
> {{{
>   create table T partitioned by (ds string) as select * from S where ds >= '2009-11-01' and ds <= '2009-11-16';
> }}}
>  and
> {{{
>   create table T(a int, b string, ds string);
>   insert overwrite table T select * from S where ds >= '2009-11-1' and ds <= '2009-11-16';
>   alter table T partitioned by (ds);
> }}}
> should all return the same results.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HIVE-936) dynamic partitions creation based on values

Posted by "Zheng Shao (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HIVE-936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12793306#action_12793306 ] 

Zheng Shao commented on HIVE-936:
---------------------------------

Another case not covered by previous discussions (copied from HIVE-1002):

{code}
CREATE TABLE (a STRING, b STRING, c STRING)
PARTITIONED BY (ds STRING, ts STRING);

INSERT OVERWRITE TABLE x PARTITION (ds = '2009-12-12')
SELECT a, b, c, ts FROM xxx;
{code}

Basically, allowing users to overwrite multiple partitions at a time.
The partition values specified in PARTITION part (if any) should be a prefix of the partition keys (to simply implementation).
The rest of the partition keys goes to the end of the SELECT expression list.

This query will remove any existing partitions of ds = '2009-12-12' (and any ts value), and create partitions with ds = '2009-12-12' and ts of dynamical values.



> dynamic partitions creation based on values
> -------------------------------------------
>
>                 Key: HIVE-936
>                 URL: https://issues.apache.org/jira/browse/HIVE-936
>             Project: Hadoop Hive
>          Issue Type: New Feature
>            Reporter: Ning Zhang
>            Assignee: Ning Zhang
>
> If a Hive table is created as partitioned, DML could only inserted into one partitioin per query. Ideally partitions should be created on the fly based on the value of the partition columns. As an example:
> {{{
>   create table T (a int, b string) partitioned by (ds string);
>   insert overwrite table T select a, b, ds from S where ds >= '2009-11-01' and ds <= '2009-11-16';
> }}}
> should be able to execute in one DML rather than possibley 16 DML for each distinct ds values. CTAS and alter table should be able to do the same thing:
> {{{
>   create table T partitioned by (ds string) as select * from S where ds >= '2009-11-01' and ds <= '2009-11-16';
> }}}
>  and
> {{{
>   create table T(a int, b string, ds string);
>   insert overwrite table T select * from S where ds >= '2009-11-1' and ds <= '2009-11-16';
>   alter table T partitioned by (ds);
> }}}
> should all return the same results.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (HIVE-936) dynamic partitions creation based on values

Posted by "Ning Zhang (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HIVE-936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ning Zhang updated HIVE-936:
----------------------------

    Attachment: dp_design.txt

Updated design notes after a group discussion.

> dynamic partitions creation based on values
> -------------------------------------------
>
>                 Key: HIVE-936
>                 URL: https://issues.apache.org/jira/browse/HIVE-936
>             Project: Hadoop Hive
>          Issue Type: New Feature
>            Reporter: Ning Zhang
>            Assignee: Ning Zhang
>         Attachments: dp_design.txt, dp_design.txt
>
>
> If a Hive table is created as partitioned, DML could only inserted into one partitioin per query. Ideally partitions should be created on the fly based on the value of the partition columns. As an example:
> {{{
>   create table T (a int, b string) partitioned by (ds string);
>   insert overwrite table T select a, b, ds from S where ds >= '2009-11-01' and ds <= '2009-11-16';
> }}}
> should be able to execute in one DML rather than possibley 16 DML for each distinct ds values. CTAS and alter table should be able to do the same thing:
> {{{
>   create table T partitioned by (ds string) as select * from S where ds >= '2009-11-01' and ds <= '2009-11-16';
> }}}
>  and
> {{{
>   create table T(a int, b string, ds string);
>   insert overwrite table T select * from S where ds >= '2009-11-1' and ds <= '2009-11-16';
>   alter table T partitioned by (ds);
> }}}
> should all return the same results.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HIVE-936) dynamic partitions creation based on values

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HIVE-936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12778932#action_12778932 ] 

Namit Jain commented on HIVE-936:
---------------------------------

{ create table T(a int, b string, ds string); insert overwrite table T select * from S where ds >= '2009-11-1' and ds <= '2009-11-16'; alter table T partitioned by (ds); }

is a change of behavior - currently, DDLs never perform a data operation - let us discuss this more in a follow-up jira - maybe add
a force option or something like that

> dynamic partitions creation based on values
> -------------------------------------------
>
>                 Key: HIVE-936
>                 URL: https://issues.apache.org/jira/browse/HIVE-936
>             Project: Hadoop Hive
>          Issue Type: New Feature
>            Reporter: Ning Zhang
>            Assignee: Ning Zhang
>
> If a Hive table is created as partitioned, DML could only inserted into one partitioin per query. Ideally partitions should be created on the fly based on the value of the partition columns. As an example:
> {{{
>   create table T (a int, b string) partitioned by (ds string);
>   insert overwrite table T select a, b, ds from S where ds >= '2009-11-01' and ds <= '2009-11-16';
> }}}
> should be able to execute in one DML rather than possibley 16 DML for each distinct ds values. CTAS and alter table should be able to do the same thing:
> {{{
>   create table T partitioned by (ds string) as select * from S where ds >= '2009-11-01' and ds <= '2009-11-16';
> }}}
>  and
> {{{
>   create table T(a int, b string, ds string);
>   insert overwrite table T select * from S where ds >= '2009-11-1' and ds <= '2009-11-16';
>   alter table T partitioned by (ds);
> }}}
> should all return the same results.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (HIVE-936) dynamic partitions creation based on values

Posted by "Ning Zhang (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HIVE-936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ning Zhang updated HIVE-936:
----------------------------

    Attachment:     (was: dp_design.txt)

> dynamic partitions creation based on values
> -------------------------------------------
>
>                 Key: HIVE-936
>                 URL: https://issues.apache.org/jira/browse/HIVE-936
>             Project: Hadoop Hive
>          Issue Type: New Feature
>            Reporter: Ning Zhang
>            Assignee: Ning Zhang
>         Attachments: dp_design.txt
>
>
> If a Hive table is created as partitioned, DML could only inserted into one partitioin per query. Ideally partitions should be created on the fly based on the value of the partition columns. As an example:
> {{{
>   create table T (a int, b string) partitioned by (ds string);
>   insert overwrite table T select a, b, ds from S where ds >= '2009-11-01' and ds <= '2009-11-16';
> }}}
> should be able to execute in one DML rather than possibley 16 DML for each distinct ds values. CTAS and alter table should be able to do the same thing:
> {{{
>   create table T partitioned by (ds string) as select * from S where ds >= '2009-11-01' and ds <= '2009-11-16';
> }}}
>  and
> {{{
>   create table T(a int, b string, ds string);
>   insert overwrite table T select * from S where ds >= '2009-11-1' and ds <= '2009-11-16';
>   alter table T partitioned by (ds);
> }}}
> should all return the same results.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.