Posted to user@pig.apache.org by Aleksandr Elbakyan <ra...@yahoo.com> on 2012/02/02 02:04:51 UTC

issue with partitioning sdf

Hello All,

I am trying to understand how Pig's GROUP partitioning works. I was not able to find any documentation about what happens under the hood.


For example

B = GROUP A BY age;

Does Pig partition the data by age, or by something else?


Another question:
If I create a custom partitioner, can I pass it the fields I want the data
to be partitioned by, or will it always use the group-by key?


Regards,
Aleksandr 


Re: issue with partitioning sdf

Posted by Aniket Mokashi <an...@gmail.com>.
I think Pig will use the default partitioner for this.

To plug in a custom partitioner, you can use the following syntax:
A = load 'input_data';
B = group A by $0 PARTITION BY
org.apache.pig.test.utils.SimpleCustomPartitioner parallel 2;

Take a look at the JIRA issue that added this:
https://issues.apache.org/jira/browse/PIG-282
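
To give a rough idea of what a custom partitioner does, here is a minimal, self-contained sketch. It does not use the real Hadoop or Pig classes: the KeyPartitioner interface and the ModuloPartitioner class are hypothetical stand-ins made up for illustration, so that the example runs without Hadoop on the classpath. As far as I understand, a real partitioner for PARTITION BY has to extend Hadoop's Partitioner class instead. The point of the sketch is only that the partitioner is handed a key and the number of partitions, and returns a partition index.

```java
public class CustomPartitionerSketch {

    // Hypothetical stand-in for a partitioner API (NOT the Hadoop interface):
    // given a key and the number of partitions, return an index in [0, numPartitions).
    interface KeyPartitioner<K> {
        int getPartition(K key, int numPartitions);
    }

    // Illustrative partitioner: integer keys go to (key mod numPartitions),
    // anything else falls back to its hash code.
    static class ModuloPartitioner implements KeyPartitioner<Object> {
        @Override
        public int getPartition(Object key, int numPartitions) {
            if (key instanceof Integer) {
                int k = (Integer) key;
                // floorMod keeps the result non-negative even for negative keys
                return Math.floorMod(k, numPartitions);
            }
            return Math.floorMod(key.hashCode(), numPartitions);
        }
    }

    public static void main(String[] args) {
        KeyPartitioner<Object> p = new ModuloPartitioner();
        int numPartitions = 2; // corresponds to "parallel 2" in the statement above
        for (int key : new int[] {7, 10, -3}) {
            System.out.println(key + " -> partition " + p.getPartition(key, numPartitions));
        }
    }
}
```

Note that the sketch only ever sees the key, never the other fields of the record.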

Thanks,
Aniket



-- 
"...:::Aniket:::... Quetzalco@tl"

Re: issue with partitioning sdf

Posted by Alan Gates <ga...@hortonworks.com>.
On Feb 1, 2012, at 5:04 PM, Aleksandr Elbakyan wrote:

> Hello All,
> 
> I am trying to understand how does pig group partitioning work, I was not able to find any documentation regarding what happen under the hood. 
> 
> 
> For example
> 
> B = GROUP A BY age;
> 
> Does pig partition data by age? Or it will partition by something else?

It partitions by the group-by key (age in this case). Similarly, joins and order by partition by the join key and the sort key, respectively.
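
As a rough illustration of partitioning by the group key, here is a minimal sketch in plain Java. It assumes hash partitioning of the form (hashCode & Integer.MAX_VALUE) % numReducers, which is what Hadoop's default HashPartitioner computes. The class name, method names, and sample records are made up for the example, and the actual partition numbers inside Pig will differ because Pig wraps keys in its own writable types. What carries over is that every record with the same age lands in the same partition.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class GroupPartitionSketch {

    // Hash-partition a group key across numReducers partitions.
    // Masking with Integer.MAX_VALUE clears the sign bit so the result is never negative.
    static int partitionFor(Object groupKey, int numReducers) {
        return (groupKey.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    // Assign each (name, age) record to a partition based only on its age.
    static Map<Integer, List<String>> partition(String[][] records, int numReducers) {
        Map<Integer, List<String>> byPartition = new TreeMap<>();
        for (String[] record : records) {
            Integer age = Integer.valueOf(record[1]);
            int p = partitionFor(age, numReducers);
            byPartition.computeIfAbsent(p, k -> new ArrayList<>()).add(record[0]);
        }
        return byPartition;
    }

    public static void main(String[] args) {
        // Hypothetical sample of relation A with fields (name, age)
        String[][] a = {{"alice", "25"}, {"bob", "30"}, {"carol", "25"}};
        // alice and carol share age 25, so they end up in the same partition
        System.out.println(partition(a, 2)); // prints {0=[bob], 1=[alice, carol]}
    }
}
```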
> 
> 
> Other question:  
> If I want to create custom partitioner can I pass fields I want data be 
> partition by or it will be the same as group by key?

No, you cannot pass in other fields; a custom partitioner works on the same group-by key.

Alan.
