Posted to user@pig.apache.org by Abhishek <ab...@gmail.com> on 2012/10/05 00:18:58 UTC

Optimizations in pig

Hi all,

I am new to Pig.

In Hive we can optimize the code by using:

Indexing
Bucketing
Partitions
Storing the file in different formats, such as RCFile or SequenceFile

Overriding some properties in the Hive shell, by using

SET property.name=value;

Can some default properties be overridden in the Grunt shell in the same way?

How can I use these optimizations in Pig?

Regards
Abhi



Re: Optimizations in pig

Posted by abhishek dodda <ab...@gmail.com>.
Thanks for your detailed explanation. I have some questions, which are
below; please clarify them.

On Thu, Oct 4, 2012 at 4:59 PM, Dmitriy Ryaboy <dv...@gmail.com> wrote:
> bucketing and partitioning is just setting the files up right. you can
> do that explicitly.

-- How can I do bucketing explicitly? I don't get your point here.

> Pig also lets you push down any filtering and projection into the
> loader, as long as said loader is aware of how to deal with filters
> and projections. Using any such loader will give you the benefits.

-- Which loader are you talking about? Can you please elaborate on this?

> HCatLoader is one such implementation (and can use Hive's metastore to
> filter partitions).
>
> Optimized / custom stores and loads are supported via the StoreFunc
> and LoadFunc implementation

-- Can you please point me to some of the optimized store or load functions?

> -- write your own, or use one of the many
> existing ones. RCFile is supported via RCFileLoader in piggybank.
> There is extensive
> SequenceFile support (and some additional RCFile support) in the
> Elephant-Bird project from Twitter (disclaimer: that's my group's
> project).

> Indexing is a special case of filter pushdowns; not as well developed
> as Hive's, but the Elephant-Twin project can help if you aren't afraid
> of rolling up your sleeves. (same disclaimer).
>
> There are also multiple join and grouping strategies.
>
> Setting any properties can be achieved via "set property.name value;"

-- Generally, what kind of properties do you override in the Pig Grunt shell?
Which are the important properties to override?

Regards
Abhi

>
> D
>
>
> On Thu, Oct 4, 2012 at 4:35 PM, TianYi Zhu
> <ti...@facilitatedigital.com> wrote:
>> Hi Abhishek,
>>
>> http://archive.cloudera.com/cdh4/cdh/4/pig/perf.html
>> http://ofps.oreilly.com/titles/9781449302641/making_pig_fly.html

Re: Optimizations in pig

Posted by Dmitriy Ryaboy <dv...@gmail.com>.
Bucketing and partitioning are just a matter of setting the files up
right. You can do that explicitly.
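
A minimal sketch of setting the files up by hand, assuming the piggybank jar is on hand and using its MultiStorage store function. The jar path, input path, output path, and schema are all hypothetical and only there for illustration:

```pig
-- Assumption: piggybank.jar lives at this (hypothetical) path.
REGISTER /path/to/piggybank.jar;

-- Hypothetical input: tab-separated events with a date column first.
events = LOAD '/data/raw/events' USING PigStorage('\t')
         AS (dt:chararray, user_id:long, action:chararray);

-- Write one subdirectory per distinct value of field 0 (dt), which is
-- roughly the on-disk layout of a partitioned table.
STORE events INTO '/data/partitioned/events'
      USING org.apache.pig.piggybank.storage.MultiStorage('/data/partitioned/events', '0');

-- In a separate, later script: read only the partition that is needed
-- by pointing the load at its directory.
-- one_day = LOAD '/data/partitioned/events/2012-10-04' USING PigStorage('\t')
--           AS (dt:chararray, user_id:long, action:chararray);
```

The point is only that a "partition" here is a directory convention: whatever writes the data decides the layout, and later loads benefit by reading a narrower path.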

Pig also lets you push down any filtering and projection into the
loader, as long as said loader is aware of how to deal with filters
and projections. Using any such loader will give you the benefits.
HCatLoader is one such implementation (and can use Hive's metastore to
filter partitions).
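
As a rough illustration of what pushdown looks like from the script side, here is a sketch that assumes a Hive table web.logs partitioned by a string column datestamp (both hypothetical). The HCatLoader package name differs between HCatalog versions, so treat the class path as an assumption to adjust:

```pig
-- Assumption: Hive table web.logs, partitioned by datestamp (string).
-- Newer releases ship the loader as org.apache.hive.hcatalog.pig.HCatLoader.
logs = LOAD 'web.logs' USING org.apache.hcatalog.pig.HCatLoader();

-- Filter on the partition column right after the load, so the loader
-- can prune partitions through the metastore instead of reading them.
recent = FILTER logs BY datestamp == '20121004';

-- Only these two (hypothetical) columns are used downstream, so a
-- projection-aware loader can skip the rest.
slim = FOREACH recent GENERATE user_id, url;

STORE slim INTO '/tmp/recent_urls';
```

Nothing in the script asks for pushdown explicitly; keeping the filter and the projection close to the LOAD is what gives the optimizer the chance to hand them to the loader.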

Optimized / custom stores and loads are supported via the StoreFunc
and LoadFunc implementation -- write your own, or use one of the many
existing ones. RCFile is supported via RCFileLoader in piggybank.
There is extensive SequenceFile support (and some additional RCFile
support) in the Elephant-Bird project from Twitter (disclaimer: that's
my group's project).

Indexing is a special case of filter pushdowns; not as well developed
as Hive's, but the Elephant-Twin project can help if you aren't afraid
of rolling up your sleeves. (same disclaimer).

There are also multiple join and grouping strategies.
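
For reference, a short sketch of how a strategy is selected with the USING clause. The relations, paths, and schemas are made up for the example, and which strategy actually helps depends on the data:

```pig
-- Hypothetical inputs; paths and schemas are assumptions.
clicks = LOAD '/data/clicks' AS (user_id:long, url:chararray);
users  = LOAD '/data/users'  AS (user_id:long, country:chararray);

-- Replicated (map-side) join: every relation after the first is held
-- in memory, so users has to be small.
j_repl = JOIN clicks BY user_id, users BY user_id USING 'replicated';

-- Skewed join: spreads very frequent keys over several reducers.
j_skew = JOIN clicks BY user_id, users BY user_id USING 'skewed';

-- Merge join: assumes both inputs are already sorted on user_id.
j_merge = JOIN clicks BY user_id, users BY user_id USING 'merge';

-- Default group, with the number of reducers set explicitly.
by_country = GROUP j_repl BY country PARALLEL 10;
counts = FOREACH by_country GENERATE group AS country, COUNT(j_repl) AS n;
```

GROUP also accepts USING 'collected' for a map-side group, but that only works with a loader that guarantees all records for a key arrive in the same split, so it is left out of the sketch.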

Setting any properties can be achieved via "set property.name value;"
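
A few examples of the form, as they might be typed in Grunt or placed at the top of a script. The values are placeholders, and which properties are worth changing depends on the job and the cluster:

```pig
-- Default number of reducers for operators without a PARALLEL clause.
set default_parallel 20;

-- Name shown for the resulting Hadoop jobs (placeholder value).
set job.name 'nightly-report';

-- Compress the temporary files Pig writes between MapReduce jobs.
set pig.tmpfilecompression true;
set pig.tmpfilecompression.codec gz;

-- Hadoop properties can be passed through the same way.
set mapred.compress.map.output true;
```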

D


On Thu, Oct 4, 2012 at 4:35 PM, TianYi Zhu
<ti...@facilitatedigital.com> wrote:
> Hi Abhishek,
>
> http://archive.cloudera.com/cdh4/cdh/4/pig/perf.html
> http://ofps.oreilly.com/titles/9781449302641/making_pig_fly.html

Re: Optimizations in pig

Posted by abhishek dodda <ab...@gmail.com>.
Thanks for the information, Zhu.

Regards
abhishek

On Thu, Oct 4, 2012 at 4:35 PM, TianYi Zhu
<ti...@facilitatedigital.com> wrote:
> Hi Abhishek,
>
> http://archive.cloudera.com/cdh4/cdh/4/pig/perf.html
> http://ofps.oreilly.com/titles/9781449302641/making_pig_fly.html

Re: Optimizations in pig

Posted by TianYi Zhu <ti...@facilitatedigital.com>.
Hi Abhishek,

http://archive.cloudera.com/cdh4/cdh/4/pig/perf.html
http://ofps.oreilly.com/titles/9781449302641/making_pig_fly.html
