Posted to dev@pig.apache.org by "Thejas M Nair (JIRA)" <ji...@apache.org> on 2011/03/19 02:35:29 UTC

[jira] Created: (PIG-1922) null is being treated as string constant in expressions

null is being treated as string constant in expressions
-------------------------------------------------------

                 Key: PIG-1922
                 URL: https://issues.apache.org/jira/browse/PIG-1922
             Project: Pig
          Issue Type: Bug
    Affects Versions: 0.8.0, 0.9.0
            Reporter: Thejas M Nair
             Fix For: 0.9.0


In the following statement, null gets translated to a string constant. The statement is invalid and should result in an error.
{code}
fil = filter l by a != null; -- This does not give an error; the correct usage is "a is not null"
fil = filter l by a != asdf; -- This does give an error message saying that there is no column asdf
{code}

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (PIG-1922) null is being treated as string constant in expressions

Posted by "Santhosh Srinivasan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13009275#comment-13009275 ] 

Santhosh Srinivasan commented on PIG-1922:
------------------------------------------

Can a != null be translated to a is not null ?


[jira] [Commented] (PIG-1922) null is being treated as string constant in expressions

Posted by "Daniel Dai (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13009313#comment-13009313 ] 

Daniel Dai commented on PIG-1922:
---------------------------------

I think we should follow SQL. In SQL, only "is not null" is allowed.


[jira] [Commented] (PIG-1922) null is being treated as string constant in expressions

Posted by "Santhosh Srinivasan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13009311#comment-13009311 ] 

Santhosh Srinivasan commented on PIG-1922:
------------------------------------------

Good point. Does (co)group treat all NULLs as equal?



[jira] [Commented] (PIG-1922) null is being treated as string constant in expressions

Posted by "Thejas M Nair (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13009296#comment-13009296 ] 

Thejas M Nair commented on PIG-1922:
------------------------------------

bq. Can a != null be translated to a is not null ?
That can be confusing: a null is not 'equal' to another null (for example, in a join condition). The "is null" and "is not null" syntax reinforces that idea better.


[jira] [Commented] (PIG-1922) null is being treated as string constant in expressions

Posted by "Thejas M Nair (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13009394#comment-13009394 ] 

Thejas M Nair commented on PIG-1922:
------------------------------------

bq. Based on the aforementioned points use of = and != should not error out but should result in UNKNOWN.
Yes, a filter expression such as (col1 == col2) should not result in an error if either of them is null. But that does not mean that we need to support (col1 == null) in the syntax, as it is unlikely that the user's intention is to get UNKNOWN as the result.



[jira] [Commented] (PIG-1922) null is being treated as string constant in expressions

Posted by "Daniel Dai (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13009400#comment-13009400 ] 

Daniel Dai commented on PIG-1922:
---------------------------------

Thanks for checking. 
1. This is consistent with Pig
2. This is consistent with Pig (I am wrong in my previous comment)
3. This we need to fix in this Jira. However, introducing another UNKNOWN data type in Pig seems like overkill.
4. This is consistent with Pig


[jira] [Commented] (PIG-1922) null is being treated as string constant in expressions

Posted by "Daniel Dai (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13009314#comment-13009314 ] 

Daniel Dai commented on PIG-1922:
---------------------------------

Yes, (co)group treats all NULLs as equal; this is similar to SQL join.


[jira] [Commented] (PIG-1922) null is being treated as string constant in expressions

Posted by "Alan Gates (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13009721#comment-13009721 ] 

Alan Gates commented on PIG-1922:
---------------------------------

bq. 3. Use of (in)equality operator with NULL results in UNKNOWN (which is not the same as NULL) 
For all intents and purposes, unknown is the same as null here.  Filters pass through only records that are true.  The result of a boolean comparison with a null does not pass through the filter.  So it is fine to say x == null results in null.

So I'm not sure there's a bug here at all.  x == null is probably not what the user meant, but as long as the filter passed no records through we did the right thing.
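
To illustrate the point above, here is a minimal sketch in Python. It is a model of the semantics being described, not Pig code, and the helper names (eq, ne, filter_rows) are hypothetical. It assumes a comparison with a null operand yields null (None), and that a filter keeps only rows whose predicate is exactly true.

```python
from typing import Any, Callable, Dict, List, Optional

Row = Dict[str, Any]


def eq(x: Any, y: Any) -> Optional[bool]:
    """Equality under the assumed null semantics: None if either operand is null."""
    if x is None or y is None:
        return None
    return x == y


def ne(x: Any, y: Any) -> Optional[bool]:
    """Inequality under the assumed null semantics: None if either operand is null."""
    if x is None or y is None:
        return None
    return x != y


def filter_rows(rows: List[Row], predicate: Callable[[Row], Optional[bool]]) -> List[Row]:
    """Keep only rows whose predicate is exactly True; False and None are dropped."""
    return [row for row in rows if predicate(row) is True]


rows = [{"a": "x"}, {"a": None}, {"a": "y"}]

# Models "filter l by a != null": the predicate is None for every row.
kept_by_ne_null = filter_rows(rows, lambda row: ne(row["a"], None))

# Models "filter l by a is not null": a real boolean test on the value.
kept_by_is_not_null = filter_rows(rows, lambda row: row["a"] is not None)
```

Under these assumptions the first filter passes no rows at all, while the second keeps the two rows with a non-null value. That is why "a != null" is unlikely to be what the user meant, even though it is not an error.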


[jira] [Resolved] (PIG-1922) null is being treated as string constant in expressions

Posted by "Thejas M Nair (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thejas M Nair resolved PIG-1922.
--------------------------------

    Resolution: Invalid

I agree with Alan's comment that this is not a bug. This behavior is also documented in the section about nulls - http://pig.apache.org/docs/r0.8.0/piglatin_ref2.html#nulls

I also verified that the == and != statements work as documented, except for one case: "fil = filter by null == null;". The filter expression should evaluate to not-true (null), but it evaluates to true. I will open another JIRA to track that issue.



[jira] [Commented] (PIG-1922) null is being treated as string constant in expressions

Posted by "Olga Natkovich (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13009338#comment-13009338 ] 

Olga Natkovich commented on PIG-1922:
-------------------------------------

We are following SQL semantics for NULL handling wherever possible. SQL is not very consistent on this point; however, it is considered to be a standard for relational processing and many people are familiar with its semantics so doing this is the safest choice.


[jira] [Commented] (PIG-1922) null is being treated as string constant in expressions

Posted by "Santhosh Srinivasan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13009723#comment-13009723 ] 

Santhosh Srinivasan commented on PIG-1922:
------------------------------------------

bq. For all intents and purposes, unknown is the same as null here. Filters pass through only records that are true. The result of a boolean comparison with a null does not pass through the filter. So it is fine to say x == null results in null.

bq. So I'm not sure there's a bug here at all. x == null is probably not what the user meant, but as long as the filter passed no records through we did the right thing.

The documentation needs to be updated to state this behaviour, i.e., set user expectations. So is this bug invalid wrt code changes?


[jira] [Commented] (PIG-1922) null is being treated as string constant in expressions

Posted by "Santhosh Srinivasan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13009450#comment-13009450 ] 

Santhosh Srinivasan commented on PIG-1922:
------------------------------------------

bq. Yes, a filter expression such as (col1 == col2) should not result in an error if either of them is null. But that does not mean that we need to support (col1 == null) in the syntax, as it is unlikely that the user's intention is to get UNKNOWN as the result.

SQL allows the use of = and != with NULL. The resulting value is UNKNOWN. Granted, we don't have UNKNOWN today. It's something that we should think about for a few reasons:

1. SQL allows it (to use your argument :)
2. Bailing out with an error will surprise (existing) users.


[jira] [Commented] (PIG-1922) null is being treated as string constant in expressions

Posted by "Santhosh Srinivasan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13009384#comment-13009384 ] 

Santhosh Srinivasan commented on PIG-1922:
------------------------------------------

Some cursory research on SQL and NULL reveals the following:

Reference: http://en.wikipedia.org/wiki/Null_(SQL)

1. For GROUP BY, NULLs are considered equal (SQL 2003)
2. For joins, NULLs are not equal
3. Use of an (in)equality operator with NULL results in UNKNOWN (which is not the same as NULL)
4. Only the use of IS NULL and IS NOT NULL is defined as boolean

Based on the aforementioned points, use of = and != should not error out but should result in UNKNOWN. This looks like a slightly broader discussion.

Thoughts?
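
Points 1 and 2 above can be sketched in a few lines of Python. This is only a model of the SQL behaviour described in the list, not Pig or SQL code, and the function names (group_by, inner_join) and the sample rows are hypothetical.

```python
from collections import defaultdict
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]


def group_by(rows: List[Row], key: str) -> Dict[Any, List[Row]]:
    """Grouping: None is used as an ordinary key, so all null keys share one group."""
    groups: Dict[Any, List[Row]] = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    return dict(groups)


def inner_join(left: List[Row], right: List[Row], key: str) -> List[Tuple[Row, Row]]:
    """Join: a null key never matches anything, including another null key."""
    return [
        (lrow, rrow)
        for lrow in left
        for rrow in right
        if lrow[key] is not None and rrow[key] is not None and lrow[key] == rrow[key]
    ]


left = [{"k": 1, "v": "a"}, {"k": None, "v": "b"}, {"k": None, "v": "c"}]
right = [{"k": 1, "w": "x"}, {"k": None, "w": "y"}]

groups = group_by(left, "k")           # the two null-key rows land in the same group
joined = inner_join(left, right, "k")  # only the k == 1 rows pair up
```

In this model, grouping puts both null-key rows into a single group, while the join produces only the pair of rows with k == 1, because the null keys on either side never match.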
