Posted to dev@pig.apache.org by "David Ciemiewicz (JIRA)" <ji...@apache.org> on 2010/04/30 20:58:54 UTC

[jira] Created: (PIG-1400) add option for null field JOIN semantics

add option for null field JOIN semantics
----------------------------------------

                 Key: PIG-1400
                 URL: https://issues.apache.org/jira/browse/PIG-1400
             Project: Pig
          Issue Type: Bug
    Affects Versions: 0.6.0
            Reporter: David Ciemiewicz


Currently JOIN follows SQL semantics for null values in the join key fields: rows with null keys are never matched.

However, GROUP and COGROUP DO match on null values in the key fields.

This violated the principle of least astonishment for me: I expected JOIN to match null-valued key fields as well.
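To make the difference concrete, here is a minimal Python model of the two behaviors as described in this report. It is not Pig's implementation: `None` stands in for a Pig null, and the helpers `sql_join` and `cogroup` are hypothetical names chosen for illustration.

```python
from typing import Any, Dict, List, Tuple

Row = Tuple[Any, ...]

def sql_join(left: List[Row], right: List[Row], key: int = 0) -> List[Row]:
    """Inner join with SQL null semantics: a None key never matches anything."""
    index: Dict[Any, List[Row]] = {}
    for r in right:
        if r[key] is not None:  # null keys are dropped from the lookup table
            index.setdefault(r[key], []).append(r)
    return [l + r
            for l in left if l[key] is not None
            for r in index.get(l[key], [])]

def cogroup(left: List[Row], right: List[Row], key: int = 0) -> Dict[Any, Tuple[List[Row], List[Row]]]:
    """COGROUP-style bucketing as the report describes it: None is an ordinary key."""
    groups: Dict[Any, Tuple[List[Row], List[Row]]] = {}
    for side, rows in ((0, left), (1, right)):
        for r in rows:
            groups.setdefault(r[key], ([], []))[side].append(r)
    return groups

A = [("x", 1), (None, 2)]
B = [("x", "p"), (None, "q")]

print(sql_join(A, B))  # [('x', 1, 'x', 'p')]
print(cogroup(A, B))   # {'x': ([('x', 1)], [('x', 'p')]), None: ([(None, 2)], [(None, 'q')])}
```

The null-keyed rows disappear from the join but end up together in one group, which is the mismatch this issue is about.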

As a workaround, I must now go through ALL of my code and convert null chararray values to empty strings so that JOIN matches them:

{code}
A = foreach A generate
    ((a is not null) ? a : '') as a,
    ((b is not null) ? b : '') as b,
    ...
{code}

This is not really a satisfactory workaround.
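A further drawback: once nulls are rewritten to '', they become indistinguishable from keys that were genuinely empty strings, so those rows start matching each other too. The small Python sketch below illustrates this; `coalesce_empty` is a hypothetical helper mirroring the `((a is not null) ? a : '')` expression, and `inner_join` is a plain equality join.

```python
from typing import Any, List, Optional, Tuple

Row = Tuple[Optional[str], Any]

def coalesce_empty(rows: List[Row]) -> List[Row]:
    """Mimic ((a is not null) ? a : '') on the key field."""
    return [(k if k is not None else "", v) for k, v in rows]

def inner_join(left: List[Row], right: List[Row]) -> List[Tuple[Any, ...]]:
    """Plain equality join on field 0 (no None keys remain after coalescing)."""
    return [l + r for l in left for r in right if l[0] == r[0]]

A = coalesce_empty([(None, 1), ("", 2)])
B = coalesce_empty([(None, "q"), ("", "r")])

# All four rows now share the key '', so former nulls also pair with real
# empty strings: 4 result rows instead of the 2 a null-matching join would give.
print(len(inner_join(A, B)))  # 4
```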


My preference is for JOIN to support an option (similar to FULL, LEFT, RIGHT, OUTER) that directs it to match null keys, just as COGROUP does.

Something like:

{code}
AB = JOIN A by ( key, subkey ) FULL OUTER MATCHNULLS, B by ( key, subkey );
{code}

I don't know whether it should be called JOIN_NULLS, MATCHNULLS, NULLS, NULLSEMANTICS, or something else.

I just think it would be much cleaner for end users to be able to get these semantics directly.

We might also consider making the SQL null semantics explicit by adding an option such as SQLNULLS or NONULLMATCH.
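The intended semantics of such an option can be sketched in Python. This is a model only, not Pig syntax or Pig's implementation: `full_outer_join` and its `match_nulls` flag are hypothetical, with `match_nulls=True` standing for the proposed MATCHNULLS behavior and `match_nulls=False` for today's SQL behavior (SQLNULLS / NONULLMATCH). Keys are composite, like `( key, subkey )` above, and compared field by field.

```python
from typing import Any, List, Optional, Sequence, Set, Tuple

Row = Tuple[Any, ...]
Pair = Tuple[Optional[Row], Optional[Row]]

def full_outer_join(left: List[Row], right: List[Row], key_idx: Sequence[int],
                    match_nulls: bool = False) -> List[Pair]:
    """FULL OUTER join on the fields listed in key_idx.

    match_nulls=False: SQL semantics, a key containing any None matches nothing.
    match_nulls=True:  None compares equal to None, as the proposal asks.
    """
    def key(row: Row) -> Tuple[Any, ...]:
        return tuple(row[i] for i in key_idx)

    def matches(a: Row, b: Row) -> bool:
        ka, kb = key(a), key(b)
        if not match_nulls and (None in ka or None in kb):
            return False
        return ka == kb  # with match_nulls, None == None is True

    out: List[Pair] = []
    matched_right: Set[int] = set()
    for l in left:
        hit = False
        for j, r in enumerate(right):
            if matches(l, r):
                out.append((l, r))
                matched_right.add(j)
                hit = True
        if not hit:
            out.append((l, None))  # unmatched left row: None in place of a right row
    for j, r in enumerate(right):
        if j not in matched_right:
            out.append((None, r))  # unmatched right row
    return out

A = [("k1", None, "a")]
B = [("k1", None, "b")]
print(full_outer_join(A, B, (0, 1)))
# [(('k1', None, 'a'), None), (None, ('k1', None, 'b'))]
print(full_outer_join(A, B, (0, 1), match_nulls=True))
# [(('k1', None, 'a'), ('k1', None, 'b'))]
```

The nested loop keeps the sketch short; a real implementation would hash or sort on the key instead.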

-- 
This message is automatically generated by JIRA.