You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@hive.apache.org by "Namit Jain (JIRA)" <ji...@apache.org> on 2009/11/05 02:33:32 UTC

[jira] Commented: (HIVE-870) semi joins

    [ https://issues.apache.org/jira/browse/HIVE-870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12773736#action_12773736 ] 

Namit Jain commented on HIVE-870:
---------------------------------

talked with Ning offline.

2 things need to be tested:

1. Why can't we have a group by in case of map join
2. There seems to be a problem in the case: A left semi join B right outer join C 


> semi joins
> ----------
>
>                 Key: HIVE-870
>                 URL: https://issues.apache.org/jira/browse/HIVE-870
>             Project: Hadoop Hive
>          Issue Type: New Feature
>            Reporter: Ning Zhang
>            Assignee: Ning Zhang
>         Attachments: Hive-870.patch
>
>
> Semi-join is an efficient way to unnest an IN/EXISTS subquery. For example,
> select * 
> from A
> where A.id IN 
>    (select id
>     from B
>     where B.date> '2009-10-01');
> returns from A whose ID is in the set of IDs found in B, whose date is greater than a certain date. This query can be unnested using a INNER join or LEFT OUTER JOIN, but we need to deduplicate the IDs returned by the subquery on table B. The semantics of LEFT SEMI JOIN is that as long as there is ANY row in the right-hand table that matches the join key, the left-hand table row will be emitted as a result w/o necessarily looking further in the right-hand table for further matches. This is exactly the semantics of the IN subquery. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.