You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@hive.apache.org by kumar avijit <av...@oracle.com> on 2012/08/04 00:16:58 UTC

Parsing nested joins in Hive

Hi Hive users,

According to the grammar for Joins at 
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Joins, 
Hive should be able to parse joins where the table reference on the 
right-hand side is itself a join. For instance, one should be able to 
specify (a join (b join c)) as:
a LEFT SEMI JOIN b LEFT SEMI JOIN c ON b.x = c.x ON a.y = b.y

However, I have been unsuccessful in issuing such queries to Hive. I am 
using Hive-0.9.0:
hive> select * from orderdetails left semi join orders left semi join 
sites on (sites.siteid = orders.siteid) on (orderdetails.orderid = 
orders.orderid);
FAILED: Parse Error: line 1:104 mismatched input 'on' expecting EOF near ')'

Explicit parenthesization of the join leads to parse error too -- 
parenthesis supposedly signals start of a subquery.
hive> select * from orderdetails left semi join ( orders left semi join 
sites on (sites.siteid = orders.siteid) ) on 
(orderdetails.orderid=orders.orderid);
FAILED: Parse Error: line 1:44 cannot recognize input near 'orders' 
'left' 'semi' in subquery source


1. It seems to me that Hive parser currently only supports joins 
specified in a left-recursive manner, i.e, rhs limited to a table name. 
Is this correct? The manual merely says that all Joins are assumed 
left-associative, but I could not find a mention of whether 
parenthesization is allowed for nesting.
2. Are there plans to support arbitrarily nested joins in future, if not 
currently allowed?

Regards,
Avijit



Re: Parsing nested joins in Hive

Posted by "Tucker, Matt" <Ma...@disney.com>.
Hi,

Try the following statement instead:

Select *
from orderdetails
    left semi join (
        select orderid
        from orders
            left semi join sites on
                orders.siteid = sites.siteid
    ) orders on
        orderdetails.orderid = orders.orderid;

Matt

On Aug 3, 2012, at 6:17 PM, "kumar avijit" <av...@oracle.com> wrote:

> select * from orderdetails left semi join ( orders left semi join 
> sites on (sites.siteid = orders.siteid) ) on 
> (orderdetails.orderid=orders.orderid);