Posted to dev@pig.apache.org by "Olga Natkovich (JIRA)" <ji...@apache.org> on 2008/09/04 23:59:44 UTC

[jira] Commented: (PIG-361) JOIN and cogroup should handle NULLs correctly

    [ https://issues.apache.org/jira/browse/PIG-361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12628477#action_12628477 ] 

Olga Natkovich commented on PIG-361:
------------------------------------

After further discussion, here is what I think is the right thing to do:

(1) Cogroup distinguishes between NULL keys from different relations by creating separate records

A = load ...
B = load ...
C = cogroup A by $0, B by $0;
...

Assuming that both A and B contain null values in the key column, C would look as follows:

{
....
NULL,  {.....}, {}
NULL, {}, {...}
....
}

The first record corresponds to all records of A with a NULL key, and the second to all records of B with a NULL key.
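A minimal Python sketch of the proposed COGROUP semantics, assuming each relation is a list of tuples grouped on field 0 and Python's None stands in for NULL. The function name cogroup and the output layout (group key followed by one bag per input) are illustrative only, not Pig's implementation.

```python
from typing import Any, Dict, List, Tuple

Row = Tuple[Any, ...]

def cogroup(*relations: List[Row]) -> List[Tuple[Any, ...]]:
    """Group every relation on field 0; NULL keys get one group per relation."""
    groups: Dict[Tuple[Any, int], List[List[Row]]] = {}
    order: List[Tuple[Any, int]] = []  # first-seen order of groups
    for rel_id, rel in enumerate(relations):
        for row in rel:
            key = row[0]
            # Non-null keys from all relations share one group (tag -1).
            # A NULL key is tagged with its relation id, so NULLs coming
            # from different relations never end up in the same group.
            gkey = (key, rel_id if key is None else -1)
            if gkey not in groups:
                groups[gkey] = [[] for _ in relations]
                order.append(gkey)
            groups[gkey][rel_id].append(row)
    # Each output record: (key, bag from relation 0, bag from relation 1, ...)
    return [(gkey[0], *groups[gkey]) for gkey in order]
```

With A = [(1, 'a'), (None, 'b')] and B = [(1, 'x'), (None, 'y')], this yields one group for key 1 holding rows from both inputs, plus two separate NULL groups, (None, [A's row], []) and (None, [], [B's row]), matching the two NULL records in the listing above. Multiple NULL-keyed rows from the same relation still collapse into a single group.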

(2) This is consistent with the SQL semantics that NULLs are not equal to each other. It lets JOIN work as is, and it lets outer joins expressed as COGROUP + FOREACH with a bincond work as they did in earlier versions.
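To see why JOIN and the COGROUP-based outer join keep working, here is a small Python model, assuming the same per-relation NULL grouping as in (1). The helper names (_cogroup2, inner_join, left_outer_join) and the b_width parameter are hypothetical; the bincond is modeled only by its effect (replacing an empty bag with one all-NULL row), not by the exact Pig Latin expression.

```python
from typing import Any, Dict, List, Tuple

Row = Tuple[Any, ...]

def _cogroup2(a: List[Row], b: List[Row]) -> List[Tuple[List[Row], List[Row]]]:
    """Two-way COGROUP on field 0; NULL keys stay separate per relation."""
    groups: Dict[Tuple[Any, int], Tuple[List[Row], List[Row]]] = {}
    for rel_id, rel in enumerate((a, b)):
        for row in rel:
            tag = rel_id if row[0] is None else -1
            groups.setdefault((row[0], tag), ([], []))[rel_id].append(row)
    return list(groups.values())  # dicts keep insertion order

def inner_join(a: List[Row], b: List[Row]) -> List[Row]:
    # Models FLATTEN of both bags: an empty bag produces no output, and a
    # NULL-keyed group is always one-sided, so NULL keys never join (as in SQL).
    return [ra + rb for bag_a, bag_b in _cogroup2(a, b) for ra in bag_a for rb in bag_b]

def left_outer_join(a: List[Row], b: List[Row], b_width: int) -> List[Row]:
    # Models the bincond: an empty B bag is replaced by a single all-NULL row,
    # so every A row survives, including rows whose key is NULL.
    out: List[Row] = []
    for bag_a, bag_b in _cogroup2(a, b):
        filler = bag_b if bag_b else [(None,) * b_width]
        out.extend(ra + rb for ra in bag_a for rb in filler)
    return out
```

For A = [(1, 'a'), (None, 'b')] and B = [(1, 'x'), (None, 'y')], the inner join returns only (1, 'a', 1, 'x'), while the left outer join also keeps (None, 'b', None, None). If NULLs from A and B were coalesced into one group instead, the inner join would wrongly emit (None, 'b', None, 'y').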

(3) The required work is to add the relation id to the comparison function. The join optimization already does this, so we will try to piggyback this issue onto the join optimization work.
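A sketch of what "adding the relation id to the comparison function" can mean in a sort-then-group shuffle, written in Python. It assumes each shuffled record is a (key, relation_id, row) triple; that layout and the names compare_keys and group_records are hypothetical and not taken from Pig's code. Non-null keys compare by value alone, so matching keys from different inputs still meet; two NULL keys fall back to comparing relation ids.

```python
from functools import cmp_to_key
from itertools import groupby
from typing import Any, List, Tuple

# Hypothetical shuffled record: (key, relation_id, original row)
Record = Tuple[Any, int, Tuple[Any, ...]]

def compare_keys(x: Record, y: Record) -> int:
    """Compare by key; when both keys are NULL, compare by relation id."""
    kx, ky = x[0], y[0]
    if kx is None and ky is None:
        # The relation id breaks the tie, so NULLs from different inputs differ.
        return (x[1] > y[1]) - (x[1] < y[1])
    if kx is None:
        return 1   # sort NULL keys after all non-null keys
    if ky is None:
        return -1
    return (kx > ky) - (kx < ky)  # relation id ignored for non-null keys

def group_records(records: List[Record]) -> List[List[Record]]:
    """Sort with the comparator, then group runs that compare equal."""
    key = cmp_to_key(compare_keys)
    ordered = sorted(records, key=key)  # stable: input order kept within a group
    return [list(g) for _, g in groupby(ordered, key=key)]
```

Because the same comparator drives both sorting and grouping, NULL-keyed records from relation 0 and relation 1 land in adjacent but distinct groups, while a NULL-keyed record never collides with a real key.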

> JOIN and cogroup should handle NULLs correctly
> ----------------------------------------------
>
>                 Key: PIG-361
>                 URL: https://issues.apache.org/jira/browse/PIG-361
>             Project: Pig
>          Issue Type: Sub-task
>    Affects Versions: types_branch
>            Reporter: Pradeep Kamath
>            Assignee: Shravan Matthur Narayanamurthy
>             Fix For: types_branch
>
>
> JOIN should follow SQL semantics, i.e., if the join key is null, or part of the join key is null, in the first table, it should not join with similar keys in the second table.
> Cogroup should coalesce all NULL key rows into one group.
