You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@hive.apache.org by Nirav Patel <np...@xactlycorp.com> on 2017/08/03 18:00:35 UTC

Wildcard and Automatic partitioning

Hi, is there a way in hive when I create an external table I can specify
wild card in LOCATION and have hive automatically identify partitions.

I have opened HIVE-17236 <https://issues.apache.org/jira/browse/HIVE-17236> for
wildcard support. same time I also have issue of specifying partitions.  I
can use tedious ALTER TABLE to add partition directory. But since data
already exist in separate partition why can't hive identify it ?

Here's such example of a directory:
/user/mycomp/customers/*/departments/partition/*

I can have n number of customer and for each m number of partition for
departments object.

I think if I use following sql to create external table then it should be
able to identify all the partitions.

CREATE EXTERNAL TABLE testTable (val map<string, string>)
PARTITIONED BY (period string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
LOCATION '/user/mycomp/customers/*/departments/partition/*';

If my partition directory (lets say p-12345) have multiple files insides it
that doesn't start with "part-" prefix then I should be able to specify
that prefix so hive can find the right filesets.

Thanks

-- 


[image: What's New with Xactly] <http://www.xactlycorp.com/email-click/>

<https://www.nyse.com/quote/XNYS:XTLY>  [image: LinkedIn] 
<https://www.linkedin.com/company/xactly-corporation>  [image: Twitter] 
<https://twitter.com/Xactly>  [image: Facebook] 
<https://www.facebook.com/XactlyCorp>  [image: YouTube] 
<http://www.youtube.com/xactlycorporation>

Re: Wildcard and Automatic partitioning

Posted by Nirav Patel <np...@xactlycorp.com>.
Thanks for config for recursive read. However it is not working. Do the
file have to have to be in particular format?

I am using LOCATION '/user/mycomp/customers/228/departments/partition/';

my subdirectories are like 469990542/dept.tsv, 469990543/dept.tsv.



Also I am still not sure why hive need specific dir format to learn
partitions.

In Following example:

CREATE EXTERNAL TABLE testTable (id Int, period Int, amount Double)
LOCATION '/user/mycomp/customers/228/departments/partition/';
PARTITIONED BY (period string)

All my partition dirs are under
"/user/mycomp/customers/228/departments/partition/"
in form of Integer names(469990543, 469990544). When I say
"PARTITIONED BY period"
hive can assume that 'period' is the key for all those subdirectories and
can make sure that period is also defined in schema. What more hive can
learn from redundant key part in dir name(period= 469990543,
period=469990544) that it can not by the convention I mentiond. If there
are multiple partition columns it can follow same principal.



On Thu, Aug 3, 2017 at 12:38 PM, Sergey Shelukhin <se...@hortonworks.com>
wrote:

> How would Hive determine partition keys for partitioning from arbitrary
> directory structure? There has to be some format, and there already is.
> Also columns for keys need to be added to the table, with types.
> For reading it without partitions, Hive already supports mapred.input.dir.recursive,
> which would read all the nested directories. In fact iirc it’s on by
> default if Tez is used.
>
> From: Nirav Patel <np...@xactlycorp.com>
> Reply-To: "user@hive.apache.org" <us...@hive.apache.org>
> Date: Thursday, August 3, 2017 at 11:41
> To: "user@hive.apache.org" <us...@hive.apache.org>
> Subject: Re: Wildcard and Automatic partitioning
>
> What is the point if I have to rename hdfs directories to hive
> format  (key=value/key=value etc.).  Why it can't just be normal directory
> like everyone has. That entire directory can be considered as a key
> ("period" in my example) and hive add all values as partitions.
>
> On Thu, Aug 3, 2017 at 11:25 AM, Sergey Shelukhin <se...@hortonworks.com>
> wrote:
>
>> The typical, although not technically intended for the purpose, approach
>> is to use msck to “repair” the table and create the partitions. The
>> partitions have to be in the standard Hive format (key=value/key=value
>> etc.) and the table must be created with the corresponding partition keys.
>> It may actually be good to have a feature to do it in a standard manner
>> for external tables only, however it would probably be restricted to the
>> same format. So, the example below probably won’t work because of the star
>> in the middle.
>>
>>
>> From: Nirav Patel <np...@xactlycorp.com>
>> Reply-To: "user@hive.apache.org" <us...@hive.apache.org>
>> Date: Thursday, August 3, 2017 at 11:00
>> To: "user@hive.apache.org" <us...@hive.apache.org>
>> Subject: Wildcard and Automatic partitioning
>>
>> Hi, is there a way in hive when I create an external table I can specify
>> wild card in LOCATION and have hive automatically identify partitions.
>>
>> I have opened HIVE-17236
>> <https://issues.apache.org/jira/browse/HIVE-17236> for wildcard support.
>> same time I also have issue of specifying partitions.  I can use tedious
>> ALTER TABLE to add partition directory. But since data already exist in
>> separate partition why can't hive identify it ?
>>
>> Here's such example of a directory:
>> /user/mycomp/customers/*/departments/partition/*
>>
>> I can have n number of customer and for each m number of partition for
>> departments object.
>>
>> I think if I use following sql to create external table then it should be
>> able to identify all the partitions.
>>
>> CREATE EXTERNAL TABLE testTable (val map<string, string>)
>> PARTITIONED BY (period string)
>> ROW FORMAT DELIMITED
>> FIELDS TERMINATED BY '\t'
>> LOCATION '/user/mycomp/customers/*/departments/partition/*';
>>
>> If my partition directory (lets say p-12345) have multiple files insides
>> it that doesn't start with "part-" prefix then I should be able to specify
>> that prefix so hive can find the right filesets.
>>
>> Thanks
>>
>>
>>
>> [image: What's New with Xactly] <http://www.xactlycorp.com/email-click/>
>>
>> <https://www.nyse.com/quote/XNYS:XTLY>  [image: LinkedIn]
>> <https://www.linkedin.com/company/xactly-corporation>  [image: Twitter]
>> <https://twitter.com/Xactly>  [image: Facebook]
>> <https://www.facebook.com/XactlyCorp>  [image: YouTube]
>> <http://www.youtube.com/xactlycorporation>
>>
>
>
>
>
> [image: What's New with Xactly] <http://www.xactlycorp.com/email-click/>
>
> <https://www.nyse.com/quote/XNYS:XTLY>  [image: LinkedIn]
> <https://www.linkedin.com/company/xactly-corporation>  [image: Twitter]
> <https://twitter.com/Xactly>  [image: Facebook]
> <https://www.facebook.com/XactlyCorp>  [image: YouTube]
> <http://www.youtube.com/xactlycorporation>
>

-- 


[image: What's New with Xactly] <http://www.xactlycorp.com/email-click/>

<https://www.nyse.com/quote/XNYS:XTLY>  [image: LinkedIn] 
<https://www.linkedin.com/company/xactly-corporation>  [image: Twitter] 
<https://twitter.com/Xactly>  [image: Facebook] 
<https://www.facebook.com/XactlyCorp>  [image: YouTube] 
<http://www.youtube.com/xactlycorporation>

Re: Wildcard and Automatic partitioning

Posted by Sergey Shelukhin <se...@hortonworks.com>.
How would Hive determine partition keys for partitioning from arbitrary directory structure? There has to be some format, and there already is. Also columns for keys need to be added to the table, with types.
For reading it without partitions, Hive already supports mapred.input.dir.recursive, which would read all the nested directories. In fact iirc it’s on by default if Tez is used.

From: Nirav Patel <np...@xactlycorp.com>>
Reply-To: "user@hive.apache.org<ma...@hive.apache.org>" <us...@hive.apache.org>>
Date: Thursday, August 3, 2017 at 11:41
To: "user@hive.apache.org<ma...@hive.apache.org>" <us...@hive.apache.org>>
Subject: Re: Wildcard and Automatic partitioning

What is the point if I have to rename hdfs directories to hive format  (key=value/key=value etc.).  Why it can't just be normal directory like everyone has. That entire directory can be considered as a key ("period" in my example) and hive add all values as partitions.

On Thu, Aug 3, 2017 at 11:25 AM, Sergey Shelukhin <se...@hortonworks.com>> wrote:
The typical, although not technically intended for the purpose, approach is to use msck to “repair” the table and create the partitions. The partitions have to be in the standard Hive format (key=value/key=value etc.) and the table must be created with the corresponding partition keys.
It may actually be good to have a feature to do it in a standard manner for external tables only, however it would probably be restricted to the same format. So, the example below probably won’t work because of the star in the middle.


From: Nirav Patel <np...@xactlycorp.com>>
Reply-To: "user@hive.apache.org<ma...@hive.apache.org>" <us...@hive.apache.org>>
Date: Thursday, August 3, 2017 at 11:00
To: "user@hive.apache.org<ma...@hive.apache.org>" <us...@hive.apache.org>>
Subject: Wildcard and Automatic partitioning

Hi, is there a way in hive when I create an external table I can specify wild card in LOCATION and have hive automatically identify partitions.

I have opened HIVE-17236<https://issues.apache.org/jira/browse/HIVE-17236> for wildcard support. same time I also have issue of specifying partitions.  I can use tedious ALTER TABLE to add partition directory. But since data already exist in separate partition why can't hive identify it ?

Here's such example of a directory:
/user/mycomp/customers/*/departments/partition/*

I can have n number of customer and for each m number of partition for departments object.

I think if I use following sql to create external table then it should be able to identify all the partitions.

CREATE EXTERNAL TABLE testTable (val map<string, string>)
PARTITIONED BY (period string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
LOCATION '/user/mycomp/customers/*/departments/partition/*';

If my partition directory (lets say p-12345) have multiple files insides it that doesn't start with "part-" prefix then I should be able to specify that prefix so hive can find the right filesets.

Thanks



[What's New with Xactly]<http://www.xactlycorp.com/email-click/>

[https://www.xactlycorp.com/wp-content/uploads/2015/07/nyse_xtly_alt_24.png]<https://www.nyse.com/quote/XNYS:XTLY>  [LinkedIn] <https://www.linkedin.com/company/xactly-corporation>   [Twitter] <https://twitter.com/Xactly>   [Facebook] <https://www.facebook.com/XactlyCorp>   [YouTube] <http://www.youtube.com/xactlycorporation>




[What's New with Xactly]<http://www.xactlycorp.com/email-click/>

[https://www.xactlycorp.com/wp-content/uploads/2015/07/nyse_xtly_alt_24.png]<https://www.nyse.com/quote/XNYS:XTLY>  [LinkedIn] <https://www.linkedin.com/company/xactly-corporation>   [Twitter] <https://twitter.com/Xactly>   [Facebook] <https://www.facebook.com/XactlyCorp>   [YouTube] <http://www.youtube.com/xactlycorporation>

Re: Wildcard and Automatic partitioning

Posted by Nirav Patel <np...@xactlycorp.com>.
What is the point if I have to rename hdfs directories to hive
format  (key=value/key=value etc.).  Why it can't just be normal directory
like everyone has. That entire directory can be considered as a key
("period" in my example) and hive add all values as partitions.

On Thu, Aug 3, 2017 at 11:25 AM, Sergey Shelukhin <se...@hortonworks.com>
wrote:

> The typical, although not technically intended for the purpose, approach
> is to use msck to “repair” the table and create the partitions. The
> partitions have to be in the standard Hive format (key=value/key=value
> etc.) and the table must be created with the corresponding partition keys.
> It may actually be good to have a feature to do it in a standard manner
> for external tables only, however it would probably be restricted to the
> same format. So, the example below probably won’t work because of the star
> in the middle.
>
>
> From: Nirav Patel <np...@xactlycorp.com>
> Reply-To: "user@hive.apache.org" <us...@hive.apache.org>
> Date: Thursday, August 3, 2017 at 11:00
> To: "user@hive.apache.org" <us...@hive.apache.org>
> Subject: Wildcard and Automatic partitioning
>
> Hi, is there a way in hive when I create an external table I can specify
> wild card in LOCATION and have hive automatically identify partitions.
>
> I have opened HIVE-17236
> <https://issues.apache.org/jira/browse/HIVE-17236> for wildcard support.
> same time I also have issue of specifying partitions.  I can use tedious
> ALTER TABLE to add partition directory. But since data already exist in
> separate partition why can't hive identify it ?
>
> Here's such example of a directory:
> /user/mycomp/customers/*/departments/partition/*
>
> I can have n number of customer and for each m number of partition for
> departments object.
>
> I think if I use following sql to create external table then it should be
> able to identify all the partitions.
>
> CREATE EXTERNAL TABLE testTable (val map<string, string>)
> PARTITIONED BY (period string)
> ROW FORMAT DELIMITED
> FIELDS TERMINATED BY '\t'
> LOCATION '/user/mycomp/customers/*/departments/partition/*';
>
> If my partition directory (lets say p-12345) have multiple files insides
> it that doesn't start with "part-" prefix then I should be able to specify
> that prefix so hive can find the right filesets.
>
> Thanks
>
>
>
> [image: What's New with Xactly] <http://www.xactlycorp.com/email-click/>
>
> <https://www.nyse.com/quote/XNYS:XTLY>  [image: LinkedIn]
> <https://www.linkedin.com/company/xactly-corporation>  [image: Twitter]
> <https://twitter.com/Xactly>  [image: Facebook]
> <https://www.facebook.com/XactlyCorp>  [image: YouTube]
> <http://www.youtube.com/xactlycorporation>
>

-- 


[image: What's New with Xactly] <http://www.xactlycorp.com/email-click/>

<https://www.nyse.com/quote/XNYS:XTLY>  [image: LinkedIn] 
<https://www.linkedin.com/company/xactly-corporation>  [image: Twitter] 
<https://twitter.com/Xactly>  [image: Facebook] 
<https://www.facebook.com/XactlyCorp>  [image: YouTube] 
<http://www.youtube.com/xactlycorporation>

Re: Wildcard and Automatic partitioning

Posted by Sergey Shelukhin <se...@hortonworks.com>.
The typical, although not technically intended for the purpose, approach is to use msck to “repair” the table and create the partitions. The partitions have to be in the standard Hive format (key=value/key=value etc.) and the table must be created with the corresponding partition keys.
It may actually be good to have a feature to do it in a standard manner for external tables only, however it would probably be restricted to the same format. So, the example below probably won’t work because of the star in the middle.


From: Nirav Patel <np...@xactlycorp.com>>
Reply-To: "user@hive.apache.org<ma...@hive.apache.org>" <us...@hive.apache.org>>
Date: Thursday, August 3, 2017 at 11:00
To: "user@hive.apache.org<ma...@hive.apache.org>" <us...@hive.apache.org>>
Subject: Wildcard and Automatic partitioning

Hi, is there a way in hive when I create an external table I can specify wild card in LOCATION and have hive automatically identify partitions.

I have opened HIVE-17236<https://issues.apache.org/jira/browse/HIVE-17236> for wildcard support. same time I also have issue of specifying partitions.  I can use tedious ALTER TABLE to add partition directory. But since data already exist in separate partition why can't hive identify it ?

Here's such example of a directory:
/user/mycomp/customers/*/departments/partition/*

I can have n number of customer and for each m number of partition for departments object.

I think if I use following sql to create external table then it should be able to identify all the partitions.

CREATE EXTERNAL TABLE testTable (val map<string, string>)
PARTITIONED BY (period string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
LOCATION '/user/mycomp/customers/*/departments/partition/*';

If my partition directory (lets say p-12345) have multiple files insides it that doesn't start with "part-" prefix then I should be able to specify that prefix so hive can find the right filesets.

Thanks



[What's New with Xactly]<http://www.xactlycorp.com/email-click/>

[https://www.xactlycorp.com/wp-content/uploads/2015/07/nyse_xtly_alt_24.png]<https://www.nyse.com/quote/XNYS:XTLY>  [LinkedIn] <https://www.linkedin.com/company/xactly-corporation>   [Twitter] <https://twitter.com/Xactly>   [Facebook] <https://www.facebook.com/XactlyCorp>   [YouTube] <http://www.youtube.com/xactlycorporation>