You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@hive.apache.org by "Namit Jain (JIRA)" <ji...@apache.org> on 2009/08/08 07:22:14 UTC

[jira] Commented: (HIVE-742) joins dont handle null correctly

    [ https://issues.apache.org/jira/browse/HIVE-742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12740838#action_12740838 ] 

Namit Jain commented on HIVE-742:
---------------------------------

Pasted from http://issues.apache.org/jira/browse/HIVE-734 as Ning corrrectly found out.

The issue is not null vs. null comparison - the join happens at the reducer where the keys are not even taken into consideration.
NULL hashes to the same value, which explains this behavior.

The correct fix would be to filter the rows with NULL keys at the mapper itself.

However, we need to figure out the following:

1. Expected semantics when join key consists of a null and a non-null column. If that is also same (4, NULL) does not join with (4, NULL), then we can safely
    filter all rows even if one key is NULL.
2. Expected behavior of outer joins.



> joins dont handle null correctly 
> ---------------------------------
>
>                 Key: HIVE-742
>                 URL: https://issues.apache.org/jira/browse/HIVE-742
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>            Reporter: Namit Jain
>
> Ning Zhang added a comment - 07/Aug/09 10:46 AM
> BTW, I tried joins and it doesn't handle NULL semantics correctly. Here's the data and results:
> Table:
> Key Value
> ------- -------
> NULL 325
> 18 NULL
> Query:
> select * from input4_cb a join input4_cb b on a.key = b.value;
> Result:
> NULL 325 18 NULL
> The correct result should be empty set.
> I guess the NULL vs. NULL comparison is incorrectly evaluated.
> [ Show ยป ]
> Ning Zhang added a comment - 07/Aug/09 10:46 AM BTW, I tried joins and it doesn't handle NULL semantics correctly. Here's the data and results: Table: Key Value ------- ------- NULL 325 18 NULL Query: select * from input4_cb a join input4_cb b on a.key = b.value; Result: NULL 325 18 NULL The correct result should be empty set. I guess the NULL vs. NULL comparison is incorrectly evaluated.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.