Posted to dev@pig.apache.org by "liyunzhang_intel (JIRA)" <ji...@apache.org> on 2016/02/04 09:55:39 UTC
[jira] [Commented] (PIG-4766) Ensure GroupBy is optimized for all algebraic Operations
[ https://issues.apache.org/jira/browse/PIG-4766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15131978#comment-15131978 ]
liyunzhang_intel commented on PIG-4766:
---------------------------------------
[~pallavi.rao]: PIG-4766-1.patch looks good except for the following problem in
org.apache.pig.backend.hadoop.executionengine.spark.converter.ReduceByConverter.MergeValuesFunction:
{code}
public Tuple apply(Tuple v1, Tuple v2) {
    LOG.debug("MergeValuesFunction in : " + v1 + " , " + v2);
    Tuple result = tf.newTuple(2);
    DataBag bag = DefaultBagFactory.getInstance().newDefaultBag();
    Tuple t = new DefaultTuple();
    try {
        // Package the input tuples so they can be processed by Algebraic functions.
        Object key = v1.get(0);
        if (key == null) {
            key = "";
        } else {
            result.set(0, key);
        }
        ....
{code}
Is it OK that tuples with a null key are treated as having the same key? For example, the two tuples (,20) and (,20) will be considered to have the same key, and poReduce.getNext() will be executed on them.
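To make the question concrete, here is a minimal sketch in plain Java of what "null keys are treated as the same key" would mean for a reduce-by-key style merge. It does not use any Pig or Spark classes; Pair and sumByKey are hypothetical names chosen for illustration, and summing stands in for whatever the algebraic function actually does.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class NullKeyGrouping {

    /** Hypothetical stand-in for a (key, value) tuple; this is not a Pig class. */
    static final class Pair {
        final String key;  // may be null, like the first field of (,20)
        final long value;

        Pair(String key, long value) {
            this.key = key;
            this.value = value;
        }
    }

    /**
     * Sums the values per key. java.util.HashMap permits a single null key,
     * so every pair whose key is null is merged into the same entry.
     */
    static Map<String, Long> sumByKey(List<Pair> pairs) {
        Map<String, Long> sums = new HashMap<>();
        for (Pair p : pairs) {
            sums.merge(p.key, p.value, Long::sum);
        }
        return sums;
    }

    public static void main(String[] args) {
        List<Pair> pairs = new ArrayList<>();
        pairs.add(new Pair(null, 20));  // corresponds to (,20)
        pairs.add(new Pair(null, 20));  // corresponds to (,20)
        pairs.add(new Pair("a", 5));

        Map<String, Long> sums = sumByKey(pairs);
        System.out.println("null-key group: " + sums.get(null));
        System.out.println("number of groups: " + sums.size());
    }
}
```

In this sketch the two null-keyed pairs end up in one group, which is the behavior the question above asks about. Whether that is the intended semantics for the Spark ReduceBy path is the open point.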
> Ensure GroupBy is optimized for all algebraic Operations
> --------------------------------------------------------
>
> Key: PIG-4766
> URL: https://issues.apache.org/jira/browse/PIG-4766
> Project: Pig
> Issue Type: Sub-task
> Components: spark
> Reporter: Pallavi Rao
> Assignee: Pallavi Rao
> Labels: spork
> Fix For: spark-branch
>
> Attachments: PIG-4766-v1.patch, PIG-4766.patch
>
>