Posted to dev@pig.apache.org by "Olga Natkovich (JIRA)" <ji...@apache.org> on 2010/08/21 02:12:18 UTC
[jira] Updated: (PIG-282) Custom Partitioner
[ https://issues.apache.org/jira/browse/PIG-282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Olga Natkovich updated PIG-282:
-------------------------------
Release Note:
This feature allows you to specify a Hadoop Partitioner for the following operations: GROUP/COGROUP, CROSS, DISTINCT, and JOIN (except 'skewed' join). The Partitioner controls how the keys of the intermediate map outputs are divided among the reduce tasks. See http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapred/Partitioner.html for more details.
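To make the partitioner contract concrete, here is a minimal, Hadoop-free sketch. It assumes the same formula that Hadoop's default HashPartitioner uses, (key.hashCode() & Integer.MAX_VALUE) % numPartitions. The names HashPartitionDemo, partition and bucket are hypothetical and only illustrate two properties: every partition number falls between 0 and numPartitions-1, and all records with the same key land in the same partition (and therefore reach the same reducer).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, Hadoop-free illustration of hash partitioning.
final class HashPartitionDemo {

    // Same formula as Hadoop's default HashPartitioner: clear the sign bit,
    // then take the remainder, so the result is always in [0, numPartitions).
    static int partition(Object key, int numPartitions) {
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    // Groups keys by the partition (i.e. reduce task) they would be sent to.
    static Map<Integer, List<String>> bucket(List<String> keys, int numPartitions) {
        Map<Integer, List<String>> buckets = new TreeMap<>();
        for (String key : keys) {
            int p = partition(key, numPartitions);
            buckets.computeIfAbsent(p, unused -> new ArrayList<>()).add(key);
        }
        return buckets;
    }
}
```

For example, bucketing the keys "apple", "banana", "apple" into 2 partitions puts both "apple" entries in the same list, whichever partition that happens to be.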
To use this feature, add a PARTITION BY clause to the appropriate operator:
A = load 'input_data';
B = group A by $0 PARTITION BY org.apache.pig.test.utils.SimpleCustomPartitioner parallel 2;
.....
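Here is the code for SimpleCustomPartitioner. Integer keys are routed by value and all other keys by hash code; masking the sign bit keeps the result between 0 and numPartitions-1 even for negative values:
package org.apache.pig.test.utils;

import org.apache.hadoop.io.Writable;
import org.apache.hadoop.mapreduce.Partitioner;
import org.apache.pig.impl.io.PigNullableWritable;

public class SimpleCustomPartitioner extends Partitioner<PigNullableWritable, Writable> {
    @Override
    public int getPartition(PigNullableWritable key, Writable value, int numPartitions) {
        Object k = key.getValueAsPigType();
        if (k instanceof Integer) {
            // Integer keys: partition by value (sign bit cleared so negatives stay in range).
            return (((Integer) k).intValue() & Integer.MAX_VALUE) % numPartitions;
        } else {
            // Other keys: fall back to the hash code of the wrapped key, also masked.
            return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
        }
    }
}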
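The routing rule of SimpleCustomPartitioner can be exercised without a Hadoop or Pig runtime. The sketch below is a hypothetical stand-in (SimplePartitionRule is not a Pig class): it takes the already-unwrapped key, i.e. what getValueAsPigType() would return, as a plain Object, and for non-Integer keys it uses that object's own hashCode(), whereas the real class hashes the PigNullableWritable wrapper. It also shows why the sign bit must be cleared: in Java, -7 % 2 evaluates to -1, which is not a valid partition number.

```java
// Hypothetical, dependency-free mirror of SimpleCustomPartitioner's routing rule.
final class SimplePartitionRule {

    // Integer keys are routed by value, everything else by hash code.
    // Masking with Integer.MAX_VALUE keeps the result in [0, numPartitions).
    static int partition(Object key, int numPartitions) {
        int h = (key instanceof Integer) ? ((Integer) key).intValue() : key.hashCode();
        return (h & Integer.MAX_VALUE) % numPartitions;
    }

    // Unmasked variant, shown only to illustrate the bug:
    // negative inputs yield negative "partition numbers".
    static int unsafePartition(Object key, int numPartitions) {
        int h = (key instanceof Integer) ? ((Integer) key).intValue() : key.hashCode();
        return h % numPartitions;
    }
}
```

With numPartitions = 2, the Integer keys 4 and 5 go to partitions 0 and 1, and -7 goes to partition 1 instead of the invalid -1 that the unmasked version produces.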
> Custom Partitioner
> ------------------
>
> Key: PIG-282
> URL: https://issues.apache.org/jira/browse/PIG-282
> Project: Pig
> Issue Type: New Feature
> Affects Versions: 0.7.0
> Reporter: Amir Youssefi
> Assignee: Aniket Mokashi
> Priority: Minor
> Fix For: 0.8.0
>
> Attachments: CustomPartitioner.patch, CustomPartitionerFinale.patch, CustomPartitionerTest.patch
>
>
> By adding a custom partitioner, we can give control over which output partition a key (or key/value pair) goes to. We can add keywords to the language, e.g.
> PARTITION BY UDF(...)
> or a similar syntax. The UDF returns a number between 0 and n-1, where n is the number of output partitions.
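The 0 to n-1 contract proposed in the issue is easy to violate in user code (for example, by returning a raw remainder of a negative number), so a runtime range check is a natural safeguard. This is a hypothetical sketch: PartitionFunction and CheckedPartitioner are illustrative names, not part of Pig or Hadoop. Pig's PARTITION BY clause, as described in the release note, takes a Hadoop Partitioner class rather than a UDF.

```java
// Hypothetical user-supplied function: maps a key to a partition number.
interface PartitionFunction<K> {
    int partitionFor(K key, int numPartitions);
}

// Wraps a PartitionFunction and rejects results outside [0, numPartitions).
final class CheckedPartitioner<K> {
    private final PartitionFunction<K> fn;

    CheckedPartitioner(PartitionFunction<K> fn) {
        this.fn = fn;
    }

    int partition(K key, int numPartitions) {
        if (numPartitions <= 0) {
            throw new IllegalArgumentException("numPartitions must be positive: " + numPartitions);
        }
        int p = fn.partitionFor(key, numPartitions);
        if (p < 0 || p >= numPartitions) {
            throw new IllegalStateException(
                "partition " + p + " for key " + key + " is outside 0.." + (numPartitions - 1));
        }
        return p;
    }
}
```

Failing fast here turns a silently misrouted or rejected record into an error that names the offending key.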