You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@pig.apache.org by "David Ciemiewicz (JIRA)" <ji...@apache.org> on 2010/04/30 20:58:54 UTC
[jira] Created: (PIG-1400) add option for null field JOIN semantics
add option for null field JOIN semantics
----------------------------------------
Key: PIG-1400
URL: https://issues.apache.org/jira/browse/PIG-1400
Project: Pig
Issue Type: Bug
Affects Versions: 0.6.0
Reporter: David Ciemiewicz
Currently JOIN supports SQL semantics for joining null values in fields - they aren't matched.
However, GROUP ... and COGROUP ... semantics DO match on null values in fields.
This violated the principle of least astonishment for me - I expected JOIN on null value fields to work.
As a work around, I must now go through ALL of my code to convert chararray null values to empty strings to get the JOIN to work appropriately.
{code}
A = foreach A generate
((a is not null) ? a : '') as a,
((b is not null) ? b : '') as b,
...
{code}
This does not really a satisfactory work around.
My preference is that JOIN support an option (ala FULL, LEFT, RIGHT, OUTER) that directs JOIN to support null match join semantics just like COGROUP does.
Something like:
{code}
AB = JOIN A by ( key, subkey ) FULL OUTER MATCHNULLS, B by ( key, subkey );
{code}
Don't know if it should be called JOIN_NULLS, MATCHNULLS, NULLS, NULLSEMANTICS, what have you.
I just think it would be much cleaner for the end user to be able get these semantics.
We might also consider being explicit about the SQL null semantics by adding the option SQLNULLS or NONULLMATCH.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.