You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@hive.apache.org by Lukas Eder <lu...@datageekery.com> on 2014/05/13 11:30:52 UTC

LEFT SEMI JOIN

Hello,

We were approached to add support for Hive SQL in jOOQ [1], which might be useful for a greater community in general. I've gone through the Hive SQL syntax and I've encountered this interesting clause: The LEFT SEMI JOIN clause [2]

Example:
SELECT a.key, a.val
FROM a LEFT SEMI JOIN b on (a.key = b.key)

My question is: Why "LEFT"? I find this misleading. Compare this with a LEFT OUTER JOIN, which will always return all tuples from relation A at least once, regardless if there are any tuples in relation B matched by the JOIN predicate in "ON". So in other words, a LEFT SEMI JOIN is completely useless as it *should* always return ALL tuples from relation A, compared to a more useful "SEMI JOIN" or "INNER SEMI JOIN".

What do you think?
Lukas

[1]: http://www.jooq.org
[2]: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Joins

--
Lukas Eder - Head of R&D | lukas.eder@datageekery.com | +41 44 586 82 56
Data Geekery GmbH | Binzstrasse 23 | CH-8045 Zürich | Switzerland

http://www.datageekery.com | Get back in control of your SQLT




RE: LEFT SEMI JOIN

Posted by java8964 <ja...@hotmail.com>.
>From Hive manual, there is only "left semi join", no "semi join", nor "inner semi join".
>From the Database world, it is just a traditional name for this kind of join: "LEFT semi join", as a reminder to the reader that the resultset comes out from the LEFT table ONLY.
Yong

> From: lukas.eder@datageekery.com
> To: user@hive.apache.org
> Subject: LEFT SEMI JOIN
> Date: Tue, 13 May 2014 09:30:52 +0000
> 
> Hello,
> 
> We were approached to add support for Hive SQL in jOOQ [1], which might be useful for a greater community in general. I've gone through the Hive SQL syntax and I've encountered this interesting clause: The LEFT SEMI JOIN clause [2]
> 
> Example:
> SELECT a.key, a.val
> FROM a LEFT SEMI JOIN b on (a.key = b.key)
> 
> My question is: Why "LEFT"? I find this misleading. Compare this with a LEFT OUTER JOIN, which will always return all tuples from relation A at least once, regardless if there are any tuples in relation B matched by the JOIN predicate in "ON". So in other words, a LEFT SEMI JOIN is completely useless as it *should* always return ALL tuples from relation A, compared to a more useful "SEMI JOIN" or "INNER SEMI JOIN".
> 
> What do you think?
> Lukas
> 
> [1]: http://www.jooq.org
> [2]: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Joins
> 
> --
> Lukas Eder - Head of R&D | lukas.eder@datageekery.com | +41 44 586 82 56
> Data Geekery GmbH | Binzstrasse 23 | CH-8045 Zürich | Switzerland
> 
> http://www.datageekery.com | Get back in control of your SQLT
> 
> 
>