Posted to user@pig.apache.org by abhishek <ab...@gmail.com> on 2012/12/18 23:39:03 UTC
Partitions in pig
Hi all,
I have a use case that is implemented in Hive with partitions.
Say:
Customer_data/2012-12-18/....
             /2012-12-17/....
             /2012-12-16/....
             /
             /
I want to implement this in Pig.
How do partitions work in Pig?
Regards
Abhishek
Re: Partitions in pig
Posted by Cheolsoo Park <ch...@cloudera.com>.
To be clear, the next CDH release is going to include HCatalog.
Thanks,
Cheolsoo
On Tue, Dec 18, 2012 at 3:13 PM, Russell Jurney <ru...@gmail.com> wrote:
> This is what HCatalog and its Pig interfaces (HCatLoader and HCatStorer)
> are for: accessing Hive data from Pig. Unfortunately you are running CDH,
> which doesn't support the Apache HCatalog project. HDP includes Apache
> HCatalog:
> http://hortonworks.com/hdp/hdp-hcatalog-metadata-services/
> More info on Apache HCatalog is available here:
> http://www.infoq.com/articles/HadoopMetadata
>
> However, there is an RCFile loader in Piggybank:
>
> http://svn.apache.org/viewvc/pig/trunk/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/HiveColumnarLoader.java?view=markup
>
> Russell Jurney http://datasyndrome.com
Re: Partitions in pig
Posted by abhishek <ab...@gmail.com>.
It works for me, thanks.
Regards
Abhi
Sent from my iPhone
On Dec 18, 2012, at 7:43 PM, Russell Jurney <ru...@gmail.com> wrote:
> It will work like so:
> http://stackoverflow.com/questions/3515481/pig-latin-load-multiple-files-from-a-date-range-part-of-the-directory-structur
>
> Russell Jurney http://datasyndrome.com
Re: Partitions in pig
Posted by abhishek <ab...@gmail.com>.
Hi Russell,
I will try this and get back to you.
Regards
Abhishek
Sent from my iPhone
On Dec 18, 2012, at 7:43 PM, Russell Jurney <ru...@gmail.com> wrote:
> It will work like so:
> http://stackoverflow.com/questions/3515481/pig-latin-load-multiple-files-from-a-date-range-part-of-the-directory-structur
>
> Russell Jurney http://datasyndrome.com
Re: Partitions in pig
Posted by Russell Jurney <ru...@gmail.com>.
It will work like so:
http://stackoverflow.com/questions/3515481/pig-latin-load-multiple-files-from-a-date-range-part-of-the-directory-structur
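
For anyone reading this in the archive, here is a minimal sketch of the glob-based idea, assuming the directory layout from the original post (Customer_data/<yyyy-MM-dd>/...). Pig's LOAD accepts Hadoop glob patterns, including {a,b,c} alternation, so a date range can be expanded into a single path. The helper names (partition_glob, pig_load_statement), the alias "customers", and the PigStorage(',') delimiter are assumptions made for illustration only; they do not come from this thread.

```python
from datetime import date, timedelta


def partition_glob(base_dir: str, start: date, end: date) -> str:
    """Build a Hadoop-style glob covering each daily directory from start to end, inclusive."""
    if end < start:
        raise ValueError("end must not be before start")
    days = [(start + timedelta(days=i)).isoformat()
            for i in range((end - start).days + 1)]
    if len(days) == 1:
        # A single day needs no alternation.
        return base_dir + "/" + days[0]
    return base_dir + "/{" + ",".join(days) + "}"


def pig_load_statement(path: str, alias: str = "customers") -> str:
    """Render a Pig LOAD statement for the path (the delimiter is an assumption)."""
    return f"{alias} = LOAD '{path}' USING PigStorage(',');"


if __name__ == "__main__":
    path = partition_glob("Customer_data", date(2012, 12, 16), date(2012, 12, 18))
    print(path)  # Customer_data/{2012-12-16,2012-12-17,2012-12-18}
    print(pig_load_statement(path))
```

Rather than generating the whole statement, the path could also be handed to a script through Pig's parameter substitution (pig -param input=... with '$input' in the LOAD), which keeps the script itself unchanged from day to day.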
Russell Jurney http://datasyndrome.com
On Dec 18, 2012, at 4:27 PM, abhishek <ab...@gmail.com> wrote:
> Directory-based partitioning in Hive.
>
> Partitioned by date.
>
> Thanks
> Abhi
>
> Sent from my iPhone
Re: Partitions in pig
Posted by abhishek <ab...@gmail.com>.
Directory-based partitioning in Hive.
Partitioned by date.
Thanks
Abhi
Sent from my iPhone
On Dec 18, 2012, at 7:20 PM, Russell Jurney <ru...@gmail.com> wrote:
> Are you doing a directory-based partition with Hive, or are you
> letting Hive's RCFile partition data for you?
>
> Russell Jurney http://datasyndrome.com
Re: Partitions in pig
Posted by Russell Jurney <ru...@gmail.com>.
Are you doing a directory-based partition with Hive, or are you
letting Hive's RCFile partition data for you?
Russell Jurney http://datasyndrome.com
On Dec 18, 2012, at 4:12 PM, abhishek <ab...@gmail.com> wrote:
> Hi Russell,
>
> Thanks for the reply. How is the RCFile loader related to partitions?
>
> I did not get your point here.
>
> Regards
> Abhi
>
> Sent from my iPhone
Re: Partitions in pig
Posted by abhishek <ab...@gmail.com>.
Hi Russell,
Thanks for the reply. How is the RCFile loader related to partitions?
I did not get your point here.
Regards
Abhi
Sent from my iPhone
On Dec 18, 2012, at 6:13 PM, Russell Jurney <ru...@gmail.com> wrote:
> This is what HCatalog and its Pig interfaces (HCatLoader and HCatStorer)
> are for: accessing Hive data from Pig. Unfortunately you are running CDH,
> which doesn't support the Apache HCatalog project. HDP includes Apache
> HCatalog:
> http://hortonworks.com/hdp/hdp-hcatalog-metadata-services/
> More info on Apache HCatalog is available here:
> http://www.infoq.com/articles/HadoopMetadata
>
> However, there is an RCFile loader in Piggybank:
> http://svn.apache.org/viewvc/pig/trunk/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/HiveColumnarLoader.java?view=markup
>
> Russell Jurney http://datasyndrome.com
Re: Partitions in pig
Posted by Russell Jurney <ru...@gmail.com>.
This is what HCatalog and its Pig interfaces (HCatLoader and HCatStorer)
are for: accessing Hive data from Pig. Unfortunately you are running CDH,
which doesn't support the Apache HCatalog project. HDP includes Apache
HCatalog:
http://hortonworks.com/hdp/hdp-hcatalog-metadata-services/
More info on Apache HCatalog is available here:
http://www.infoq.com/articles/HadoopMetadata
However, there is an RCFile loader in Piggybank:
http://svn.apache.org/viewvc/pig/trunk/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/HiveColumnarLoader.java?view=markup
Russell Jurney http://datasyndrome.com
On Dec 18, 2012, at 2:39 PM, abhishek <ab...@gmail.com> wrote:
> Hi all,
>
> I have a use case that is implemented in Hive with partitions.
>
> Say:
> Customer_data/2012-12-18/....
>              /2012-12-17/....
>              /2012-12-16/....
>              /
>              /
>
> I want to implement this in Pig.
>
> How do partitions work in Pig?
>
> Regards
> Abhishek