You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@hive.apache.org by Lukas Eder <lu...@datageekery.com> on 2014/05/13 11:30:52 UTC
LEFT SEMI JOIN
Hello,
We were approached to add support for Hive SQL in jOOQ [1], which might be useful for a greater community in general. I've gone through the Hive SQL syntax and I've encountered this interesting clause: The LEFT SEMI JOIN clause [2]
Example:
SELECT a.key, a.val
FROM a LEFT SEMI JOIN b on (a.key = b.key)
My question is: Why "LEFT"? I find this misleading. Compare this with a LEFT OUTER JOIN, which will always return all tuples from relation A at least once, regardless if there are any tuples in relation B matched by the JOIN predicate in "ON". So in other words, a LEFT SEMI JOIN is completely useless as it *should* always return ALL tuples from relation A, compared to a more useful "SEMI JOIN" or "INNER SEMI JOIN".
What do you think?
Lukas
[1]: http://www.jooq.org
[2]: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Joins
--
Lukas Eder - Head of R&D | lukas.eder@datageekery.com | +41 44 586 82 56
Data Geekery GmbH | Binzstrasse 23 | CH-8045 Zürich | Switzerland
http://www.datageekery.com | Get back in control of your SQLT
RE: LEFT SEMI JOIN
Posted by java8964 <ja...@hotmail.com>.
>From Hive manual, there is only "left semi join", no "semi join", nor "inner semi join".
>From the Database world, it is just a traditional name for this kind of join: "LEFT semi join", as a reminder to the reader that the resultset comes out from the LEFT table ONLY.
Yong
> From: lukas.eder@datageekery.com
> To: user@hive.apache.org
> Subject: LEFT SEMI JOIN
> Date: Tue, 13 May 2014 09:30:52 +0000
>
> Hello,
>
> We were approached to add support for Hive SQL in jOOQ [1], which might be useful for a greater community in general. I've gone through the Hive SQL syntax and I've encountered this interesting clause: The LEFT SEMI JOIN clause [2]
>
> Example:
> SELECT a.key, a.val
> FROM a LEFT SEMI JOIN b on (a.key = b.key)
>
> My question is: Why "LEFT"? I find this misleading. Compare this with a LEFT OUTER JOIN, which will always return all tuples from relation A at least once, regardless if there are any tuples in relation B matched by the JOIN predicate in "ON". So in other words, a LEFT SEMI JOIN is completely useless as it *should* always return ALL tuples from relation A, compared to a more useful "SEMI JOIN" or "INNER SEMI JOIN".
>
> What do you think?
> Lukas
>
> [1]: http://www.jooq.org
> [2]: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Joins
>
> --
> Lukas Eder - Head of R&D | lukas.eder@datageekery.com | +41 44 586 82 56
> Data Geekery GmbH | Binzstrasse 23 | CH-8045 Zürich | Switzerland
>
> http://www.datageekery.com | Get back in control of your SQLT
>
>
>