Posted to dev@hive.apache.org by "Steve Hoffman (JIRA)" <ji...@apache.org> on 2014/03/08 05:07:42 UTC

[jira] [Commented] (HIVE-6589) Automatically add partitions for external tables

    [ https://issues.apache.org/jira/browse/HIVE-6589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13924704#comment-13924704 ] 

Steve Hoffman commented on HIVE-6589:
-------------------------------------

Having an alternate way to specify partitioning is a great idea.  Having a cron job add partitions to the metastore over and over, when there is such a clear programmatic pattern, is just silly -- yet that is how I deal with this today (see the sketch below), and it stinks.

I should only need to specify the root directory and some directory pattern mapping (as in Ken's example below).

This is very important for external tables, where Hive isn't managing the data directories, and for streaming data, which is ALWAYS creating new partitions.
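
For reference, the workaround is a scheduled job that pre-adds every partition before it gets queried.  MSCK REPAIR TABLE doesn't help here because the Flume directories aren't laid out as dt=.../hour=..., so the job ends up issuing statements like the one below.  The table and path names are taken from Ken's example; the date values are illustrative and would normally be generated by the script from the current time:

{code}
-- run periodically (e.g. hourly from cron) against the example table below
ALTER TABLE my_data ADD IF NOT EXISTS
  PARTITION (dt = '2014-03-02', hour = '03')
  LOCATION '/flume/my_data/2014/03/02/03';
{code}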

> Automatically add partitions for external tables
> ------------------------------------------------
>
>                 Key: HIVE-6589
>                 URL: https://issues.apache.org/jira/browse/HIVE-6589
>             Project: Hive
>          Issue Type: New Feature
>    Affects Versions: 0.10.0
>            Reporter: Ken Dallmeyer
>
> I have a data stream being loaded into Hadoop via Flume. It loads into a date partition folder in HDFS.  The path looks like this:
> {code}/flume/my_data/YYYY/MM/DD/HH
> /flume/my_data/2014/03/02/01
> /flume/my_data/2014/03/02/02
> /flume/my_data/2014/03/02/03{code}
> On top of it I create an EXTERNAL Hive table to do querying.  As of now, I have to manually add partitions.  What I want is for Hive to "discover" those partitions on EXTERNAL tables.  Additionally, I would like to specify a partition pattern so that when I query, Hive will know to use the partition pattern to find the HDFS folder.
> So something like this:
> {code}CREATE EXTERNAL TABLE my_data (
>   col1 STRING,
>   col2 INT
> )
> PARTITIONED BY (
>   dt STRING,
>   hour STRING
> )
> LOCATION 
>   '/flume/my_data'
> TBLPROPERTIES (
>   'hive.partition.spec' = 'dt=$Y-$M-$D, hour=$H',
>   'hive.partition.spec.location' = '$Y/$M/$D/$H'
> );
> {code}
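
With a spec like that in place, a query should prune straight to the matching HDFS directories without anyone having pre-added the partitions.  Something like this (illustrative only -- the table properties above are proposed, not implemented):

{code}
SELECT col1, COUNT(*) AS cnt
FROM my_data
WHERE dt = '2014-03-02' AND hour = '01'
GROUP BY col1;
{code}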


