Posted to user@pig.apache.org by abhishek <ab...@gmail.com> on 2012/12/18 23:39:03 UTC

Partitions in pig

Hi all,

I have a use case that is implemented in Hive with partitions.

Say
Customer_data/2012-12-18/....
                        /2012-12-17/....
                        /2012-12-16/....
                        /
                        /

I want to implement this in Pig.

How do partitions work in Pig?

Regards 
Abhishek 

Re: Partitions in pig

Posted by Cheolsoo Park <ch...@cloudera.com>.
To be clear, the next CDH release is going to include HCatalog.

Thanks,
Cheolsoo


On Tue, Dec 18, 2012 at 3:13 PM, Russell Jurney <ru...@gmail.com> wrote:

> This is what HCatalog and its Pig interfaces, HCatLoader and HCatStorer,
> are for: accessing Hive data from Pig. Unfortunately you are running CDH,
> which doesn't support the Apache HCatalog project. HDP includes Apache HCatalog:
> http://hortonworks.com/hdp/hdp-hcatalog-metadata-services/ More info
> on Apache HCatalog is available here:
> http://www.infoq.com/articles/HadoopMetadata
>
> However, there is an RCFile loader in Piggybank:
>
> http://svn.apache.org/viewvc/pig/trunk/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/HiveColumnarLoader.java?view=markup
>
> Russell Jurney http://datasyndrome.com

Re: Partitions in pig

Posted by abhishek <ab...@gmail.com>.
It works for me, thanks.

Regards
Abhi

Sent from my iPhone

On Dec 18, 2012, at 7:43 PM, Russell Jurney <ru...@gmail.com> wrote:

> It will work like so:
> http://stackoverflow.com/questions/3515481/pig-latin-load-multiple-files-from-a-date-range-part-of-the-directory-structur
> 
> Russell Jurney http://datasyndrome.com

Re: Partitions in pig

Posted by abhishek <ab...@gmail.com>.
Hi Russell,

I will try this and get back to you.

Regards
Abhishek 

Sent from my iPhone

On Dec 18, 2012, at 7:43 PM, Russell Jurney <ru...@gmail.com> wrote:

> It will work like so:
> http://stackoverflow.com/questions/3515481/pig-latin-load-multiple-files-from-a-date-range-part-of-the-directory-structur
> 
> Russell Jurney http://datasyndrome.com

Re: Partitions in pig

Posted by Russell Jurney <ru...@gmail.com>.
It will work like so:
http://stackoverflow.com/questions/3515481/pig-latin-load-multiple-files-from-a-date-range-part-of-the-directory-structur

Russell Jurney http://datasyndrome.com
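
For readers who cannot follow the link: the approach relies on the fact that the path given to a Pig LOAD is expanded with Hadoop's file-system globbing, so one statement can read several date directories at once. A minimal sketch, assuming tab-delimited files under the Customer_data layout from the original question; the field names and the PigStorage delimiter are illustrative assumptions, not details from this thread:

```pig
-- Load three daily "partitions" in one statement using a Hadoop brace glob.
-- The schema below is hypothetical; adjust it to the real files.
customers = LOAD 'Customer_data/{2012-12-16,2012-12-17,2012-12-18}'
            USING PigStorage('\t')
            AS (customer_id:chararray, name:chararray, amount:double);

-- A wildcard also works, for example every day in December 2012:
-- customers = LOAD 'Customer_data/2012-12-*' USING PigStorage('\t');

DUMP customers;
```

Unlike a Hive partition column, the date in the directory name does not show up as a field in the loaded records here; it only selects which directories are read. The date list could also be passed in through Pig's parameter substitution (-param) rather than hard-coded in the script.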

On Dec 18, 2012, at 4:27 PM, abhishek <ab...@gmail.com> wrote:

> Directory-based partitioning in Hive.
>
> Partitioned by date.
>
> Thanks
> Abhi

Re: Partitions in pig

Posted by abhishek <ab...@gmail.com>.
Directory-based partitioning in Hive.

Partitioned by date.

Thanks
Abhi

Sent from my iPhone

On Dec 18, 2012, at 7:20 PM, Russell Jurney <ru...@gmail.com> wrote:

> Are you doing a directory-based partition with Hive, or are you
> letting Hive's RCFile partition data for you?
> 
> Russell Jurney http://datasyndrome.com

Re: Partitions in pig

Posted by Russell Jurney <ru...@gmail.com>.
Are you doing a directory-based partition with Hive, or are you
letting Hive's RCFile partition data for you?

Russell Jurney http://datasyndrome.com

On Dec 18, 2012, at 4:12 PM, abhishek <ab...@gmail.com> wrote:

> Hi Russell,
>
> Thanks for the reply. How is the RCFile loader related to partitions?
>
> I did not get your point here.
>
> Regards
> Abhi

Re: Partitions in pig

Posted by abhishek <ab...@gmail.com>.
Hi Russell,

Thanks for the reply. How is the RCFile loader related to partitions?

I did not get your point here.

Regards
Abhi

Sent from my iPhone

On Dec 18, 2012, at 6:13 PM, Russell Jurney <ru...@gmail.com> wrote:

> This is what HCatalog and its Pig interfaces, HCatLoader and HCatStorer,
> are for: accessing Hive data from Pig. Unfortunately you are running CDH,
> which doesn't support the Apache HCatalog project. HDP includes Apache HCatalog:
> http://hortonworks.com/hdp/hdp-hcatalog-metadata-services/ More info
> on Apache HCatalog is available here:
> http://www.infoq.com/articles/HadoopMetadata
> 
> However, there is an RCFile loader in Piggybank:
> http://svn.apache.org/viewvc/pig/trunk/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/HiveColumnarLoader.java?view=markup
> 
> Russell Jurney http://datasyndrome.com

Re: Partitions in pig

Posted by Russell Jurney <ru...@gmail.com>.
This is what HCatalog and its Pig interfaces, HCatLoader and HCatStorer,
are for: accessing Hive data from Pig. Unfortunately you are running CDH,
which doesn't support the Apache HCatalog project. HDP includes Apache HCatalog:
http://hortonworks.com/hdp/hdp-hcatalog-metadata-services/ More info
on Apache HCatalog is available here:
http://www.infoq.com/articles/HadoopMetadata

However, there is an RCFile loader in Piggybank:
http://svn.apache.org/viewvc/pig/trunk/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/HiveColumnarLoader.java?view=markup

Russell Jurney http://datasyndrome.com
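
Where HCatalog is available, reading a partitioned Hive table from Pig might look like the sketch below. The table name customer_data and the partition column dt are hypothetical, and the loader's package name depends on the HCatalog version (org.apache.hcatalog.pig.HCatLoader in the standalone releases of this period, org.apache.hive.hcatalog.pig.HCatLoader after HCatalog merged into Hive), so check it against the installed version:

```pig
-- Run with the HCatalog jars on the classpath, e.g. `pig -useHCatalog`.
-- 'customer_data' and the partition column 'dt' are assumed names.
customers = LOAD 'customer_data' USING org.apache.hcatalog.pig.HCatLoader();

-- Filtering on the partition column right after LOAD can let HCatalog
-- read only the matching partitions instead of the whole table.
recent = FILTER customers BY dt >= '2012-12-16' AND dt <= '2012-12-18';

DUMP recent;
```

With this route the schema and the partition column come from the Hive metastore, so no AS clause is needed and the date is available as an ordinary field.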

On Dec 18, 2012, at 2:39 PM, abhishek <ab...@gmail.com> wrote:

> Hi all,
>
> I have a use case that is implemented in Hive with partitions.
>
> Say
> Customer_data/2012-12-18/....
>                        /2012-12-17/....
>                        /2012-12-16/....
>                        /
>                        /
>
> I want to implement this in Pig.
>
> How do partitions work in Pig?
>
> Regards
> Abhishek