Posted to dev@pig.apache.org by "Daniel Dai (JIRA)" <ji...@apache.org> on 2014/09/18 20:49:33 UTC

[jira] [Created] (PIG-4184) UDF backward compatibility issue after POStatus.STATUS_NULL refactory

Daniel Dai created PIG-4184:
-------------------------------

             Summary: UDF backward compatibility issue after POStatus.STATUS_NULL refactory
                 Key: PIG-4184
                 URL: https://issues.apache.org/jira/browse/PIG-4184
             Project: Pig
          Issue Type: Bug
          Components: impl
            Reporter: Daniel Dai
            Assignee: Daniel Dai
             Fix For: 0.14.0


This is the same issue we discussed in PIG-3739 and PIG-3679. However, our previous fix did not solve the issue; in fact, it made things worse, and that is entirely my fault.

Consider the following UDF and script:
{code}
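    import java.io.IOException;
    import org.apache.pig.EvalFunc;
    import org.apache.pig.data.Tuple;

    public class IntToBool extends EvalFunc<Boolean> {
        @Override
        public Boolean exec(Tuple input) throws IOException {
            if (input == null || input.size() == 0)
                return null;
            // A null first field maps to false rather than null.
            Integer val = (Integer)input.get(0);
            return (val == null || val == 0) ? false : true;
        }
    }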
{code}
{code}
a = load '1.txt' as (i0:int, i1:int);
b = foreach a generate IntToBool(i0);
store b into 'output';
{code}
1.txt
{code}
1
2   3
{code}
With Pig 0.12, we get:
{code}
(false)
(true)
{code}
With Pig 0.13/0.14, we get:
{code}
()
(true)
{code}
The reason is that in 0.12, Pig passes the first row to IntToBool as a tuple with a null item, whereas in 0.13/0.14, Pig swallows the first row, which is not right. This wrong behavior was introduced by PIG-3739 and PIG-3679.
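
To make the difference concrete, here is a minimal sketch in plain Java. It is not Pig's actual code: a List<Object> stands in for a Pig Tuple, and the names NullInputDemo, callAlways and skipOnNull are hypothetical, chosen only to model the two behaviors described above.

```java
import java.util.Collections;
import java.util.List;
import java.util.function.Function;

// Simplified model, NOT Pig internals: a List<Object> stands in for a Tuple.
final class NullInputDemo {
    // Same logic as IntToBool.exec in the UDF above.
    static Boolean intToBool(List<Object> input) {
        if (input == null || input.isEmpty())
            return null;
        Integer val = (Integer) input.get(0);
        return (val == null || val == 0) ? false : true;
    }

    // Models the 0.12 behavior: the UDF is always called, even with a null argument.
    static Boolean callAlways(Function<List<Object>, Boolean> udf, List<Object> args) {
        return udf.apply(args);
    }

    // Models the 0.13/0.14 behavior (assumption): any null argument
    // short-circuits to a null result and the UDF is never called.
    static Boolean skipOnNull(Function<List<Object>, Boolean> udf, List<Object> args) {
        for (Object arg : args) {
            if (arg == null)
                return null;
        }
        return udf.apply(args);
    }

    public static void main(String[] args) {
        List<Object> nullRow = Collections.<Object>singletonList(null);
        List<Object> row = Collections.<Object>singletonList(2);
        System.out.println(callAlways(NullInputDemo::intToBool, nullRow)); // false
        System.out.println(skipOnNull(NullInputDemo::intToBool, nullRow)); // null
        System.out.println(callAlways(NullInputDemo::intToBool, row));     // true
        System.out.println(skipOnNull(NullInputDemo::intToBool, row));     // true
    }
}
```

Under this model, a UDF that deliberately maps a null field to a non-null value only sees that field in the first variant, which matches the (false) versus () difference in the outputs above.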

Before that (but after the POStatus.STATUS_NULL refactoring in PIG-3568), we did have a behavior change that made the e2e test StreamingPythonUDFs_10 fail with an NPE. However, I think the 0.12 behavior here is inconsistent. Consider the following scripts:
{code}
a = load '1.txt' as (name:chararray, age:int, gpa:double);
b = foreach a generate ROUND((gpa>3.0?gpa+1:gpa));
store b into 'output';
{code}
{code}
a = load '1.txt' as (name:chararray, age:int, gpa:double);
b = foreach a generate ROUND(gpa);
store b into 'output';
{code}
If the gpa field is null, script 1 skips the row and script 2 fails with an NPE, which does not make sense. So my thinking is:
1. Pig 0.12 is wrong, and the POStatus.STATUS_NULL refactoring fixes this behavior (we don't need the related fixes in PIG-3739/PIG-3679)
2. ROUND (and some other UDFs) is wrong anyway; we should fix it
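
For point 2, a null-safe rounding function could look roughly like the sketch below. This is plain Java rather than the actual ROUND builtin; the class name NullSafeRound is hypothetical, and the Long return type is an assumption. The only point is the explicit null check before unboxing, which is what avoids the NPE.

```java
// Hypothetical sketch, not Pig's ROUND implementation.
final class NullSafeRound {
    // Returns null for a null input instead of throwing an NPE on unboxing.
    static Long round(Double value) {
        if (value == null)
            return null;
        return Math.round(value);
    }

    public static void main(String[] args) {
        System.out.println(round(null)); // null
        System.out.println(round(3.6));  // 4
    }
}
```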



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)