Posted to dev@pig.apache.org by "Vivek Padmanabhan (JIRA)" <ji...@apache.org> on 2011/03/17 06:11:29 UTC
[jira] Commented: (PIG-1911) Infinite loop with accumulator function in nested foreach
[ https://issues.apache.org/jira/browse/PIG-1911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13007815#comment-13007815 ]
Vivek Padmanabhan commented on PIG-1911:
----------------------------------------
In this case Pig calls the getValue() and cleanup() methods in an endless loop. The UDF source is below for reference:
{code}
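package org.udfs;

import java.io.IOException;
import java.util.Iterator;

import org.apache.pig.Accumulator;
import org.apache.pig.EvalFunc;
import org.apache.pig.data.DataBag;
import org.apache.pig.data.DataType;
import org.apache.pig.data.Tuple;
import org.apache.pig.impl.logicalLayer.schema.Schema;

// Counts the tuples in a bag whose first field is non-null.
public class MyCOUNT extends EvalFunc<Long> implements Accumulator<Long> {

    // Non-accumulator path: the whole bag arrives in a single call.
    @Override
    public Long exec(Tuple input) throws IOException {
        DataBag bag = (DataBag) input.get(0);
        Iterator<Tuple> it = bag.iterator();
        long cnt = 0;
        while (it.hasNext()) {
            Tuple t = it.next();
            if (t != null && t.size() > 0 && t.get(0) != null) {
                cnt++;
            }
        }
        return cnt;
    }

    @Override
    public Schema outputSchema(Schema input) {
        return new Schema(new Schema.FieldSchema(null, DataType.LONG));
    }

    // Running count, used only on the accumulator path.
    private long intermediateCount = 0L;

    // Accumulator path: called once per batch of tuples for the current key.
    @Override
    public void accumulate(Tuple b) throws IOException {
        DataBag bag = (DataBag) b.get(0);
        Iterator<Tuple> it = bag.iterator();
        while (it.hasNext()) {
            Tuple t = it.next();
            if (t != null && t.size() > 0 && t.get(0) != null) {
                intermediateCount += 1;
            }
        }
    }

    // Resets the running count so the next key starts from zero.
    @Override
    public void cleanup() {
        intermediateCount = 0L;
    }

    // Returns the count accumulated so far for the current key.
    @Override
    public Long getValue() {
        return intermediateCount;
    }
}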
{code}
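For comparison, here is a rough sketch of the call sequence I would expect for an accumulator UDF: accumulate() once per batch, then getValue() once and cleanup() once per key. It does not use Pig at all. SimpleAccumulator, CountingAccumulator and countPerKey are hypothetical names made up for illustration, and the driver loop is only my assumption of the intended contract, not Pig's actual implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class AccumulatorLifecycleSketch {

    /** Hypothetical stand-in for an accumulator-style UDF (not Pig's real interface). */
    interface SimpleAccumulator<T> {
        void accumulate(List<String> batch);
        T getValue();
        void cleanup();
    }

    /** Counts non-null values, mirroring the counting logic of MyCOUNT. */
    static class CountingAccumulator implements SimpleAccumulator<Long> {
        private long intermediateCount = 0L;

        @Override
        public void accumulate(List<String> batch) {
            for (String value : batch) {
                if (value != null) {
                    intermediateCount++;
                }
            }
        }

        @Override
        public Long getValue() {
            return intermediateCount;
        }

        @Override
        public void cleanup() {
            intermediateCount = 0L;
        }
    }

    /**
     * Hypothetical driver: for each key, feed every batch, read the result once,
     * then reset once. The loop for a key ends when its batches are exhausted.
     */
    public static List<Long> countPerKey(List<List<List<String>>> batchesPerKey) {
        SimpleAccumulator<Long> udf = new CountingAccumulator();
        List<Long> results = new ArrayList<>();
        for (List<List<String>> batches : batchesPerKey) {
            for (List<String> batch : batches) {
                udf.accumulate(batch);
            }
            results.add(udf.getValue()); // exactly one getValue() per key
            udf.cleanup();               // exactly one cleanup() per key
        }
        return results;
    }

    public static void main(String[] args) {
        // Two keys: the first has two batches (one null value), the second has one batch.
        List<List<List<String>>> input = Arrays.asList(
            Arrays.asList(Arrays.asList("a", "b"), Arrays.asList("c", null)),
            Arrays.asList(Arrays.asList("d")));
        System.out.println(countPerKey(input)); // prints [3, 1]
    }
}
```

In the failing case the getValue()/cleanup() pair keeps being invoked instead of running once per key as in this sketch.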
> Infinite loop with accumulator function in nested foreach
> ---------------------------------------------------------
>
> Key: PIG-1911
> URL: https://issues.apache.org/jira/browse/PIG-1911
> Project: Pig
> Issue Type: Bug
> Affects Versions: 0.8.0
> Reporter: Olga Natkovich
> Assignee: Thejas M Nair
> Fix For: 0.8.0
>
>
> Sample script:
> register v_udf.jar;
> a = load '2records' as (f1:chararray,f2:chararray);
> b = group a by f1;
> d = foreach b { sort = order a by f1;
> generate org.udfs.MyCOUNT(sort) as something ; }
> dump d;
> This causes an infinite loop if MyCOUNT implements the Accumulator interface.
> The workaround is to move the function out of the nested foreach into a separate foreach statement.