You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@hive.apache.org by Oliver Keyes <ok...@wikimedia.org> on 2014/02/07 01:26:10 UTC

Joins between databases

Hey all,

So, I'm new to hive (I come to it from MySQL/MariaDB) and I've spent the
last couple of days banging my head against the problem of trying to
retrieve data from a join of two tables in different databases. I
understand that the db.table.column syntax is not supported in hive, and
that instead it's recommended to do db.table AS alias, and then
alias.column, but knitting this together is not working for some reason;
table aliases and joins seem to not like each other much. At the moment,
I've spent a lot of time noodling and eventually settled on:

SELECT db1.col1,
db1.col2,
db1.col3 FROM database1.table1 AS db1 INNER JOIN database2.table2 AS db2 ON
db1.col1 = db2.col1 WHERE db1.year = 2014 AND db1.month = 1 AND db1.day =
20;

This gets rejected pretty quickly ("missing EOF at 'AS' near table1", and
so I've tried experimenting with, say, retrieving the entire dataset in a
subquery and then selecting from that, but each time I run into the same
sort of problem. Can anybody help point out where I'm going wrong?

Thanks!
-Oliver

Re: Joins between databases

Posted by Oliver Keyes <ok...@wikimedia.org>.
Noted for future queries, and thanks for the help! Works like a charm :).

Best,

-- 
Oliver Keyes
Product Analyst
Wikimedia Foundation

On 6 February 2014 17:09, Stephen Sprague <sp...@gmail.com> wrote:

> ahh. we got ourselves a bona-fide head banger.  Welcome to the club!! :)
>
> select
>     a.col1 as foo,
>     a.col2 as bar,
>     a.col3 as baz
>
> from
>     database1.table1 a
>     INNER JOIN database2.table2 b on a.col1 = b.col1
>
> where
>     a.year = 2014
>     and a.month = 1
>     and a.day = 20
>
>
> so lose the 'AS' for the table alias but keep it for the column alias.
> see what happens.
>
> Also, good to tell us what version of Hive you're running 'cuz there's a
> few out there.
>
> Cheers,
> Stephen.
>
>
> On Thu, Feb 6, 2014 at 4:26 PM, Oliver Keyes <ok...@wikimedia.org> wrote:
>
>> Hey all,
>>
>> So, I'm new to hive (I come to it from MySQL/MariaDB) and I've spent the
>> last couple of days banging my head against the problem of trying to
>> retrieve data from a join of two tables in different databases. I
>> understand that the db.table.column syntax is not supported in hive, and
>> that instead it's recommended to do db.table AS alias, and then
>> alias.column, but knitting this together is not working for some reason;
>> table aliases and joins seem to not like each other much. At the moment,
>> I've spent a lot of time noodling and eventually settled on:
>>
>> SELECT db1.col1,
>> db1.col2,
>> db1.col3 FROM database1.table1 AS db1 INNER JOIN database2.table2 AS db2
>> ON db1.col1 = db2.col1 WHERE db1.year = 2014 AND db1.month = 1 AND db1.day
>> = 20;
>>
>> This gets rejected pretty quickly ("missing EOF at 'AS' near table1", and
>> so I've tried experimenting with, say, retrieving the entire dataset in a
>> subquery and then selecting from that, but each time I run into the same
>> sort of problem. Can anybody help point out where I'm going wrong?
>>
>> Thanks!
>> -Oliver
>>
>
>

Re: Joins between databases

Posted by Stephen Sprague <sp...@gmail.com>.
ahh. we got ourselves a bona-fide head banger.  Welcome to the club!! :)

select
    a.col1 as foo,
    a.col2 as bar,
    a.col3 as baz

from
    database1.table1 a
    INNER JOIN database2.table2 b on a.col1 = b.col1

where
    a.year = 2014
    and a.month = 1
    and a.day = 20


so lose the 'AS' for the table alias but keep it for the column alias.  see
what happens.

Also, good to tell us what version of Hive you're running 'cuz there's a
few out there.

Cheers,
Stephen.


On Thu, Feb 6, 2014 at 4:26 PM, Oliver Keyes <ok...@wikimedia.org> wrote:

> Hey all,
>
> So, I'm new to hive (I come to it from MySQL/MariaDB) and I've spent the
> last couple of days banging my head against the problem of trying to
> retrieve data from a join of two tables in different databases. I
> understand that the db.table.column syntax is not supported in hive, and
> that instead it's recommended to do db.table AS alias, and then
> alias.column, but knitting this together is not working for some reason;
> table aliases and joins seem to not like each other much. At the moment,
> I've spent a lot of time noodling and eventually settled on:
>
> SELECT db1.col1,
> db1.col2,
> db1.col3 FROM database1.table1 AS db1 INNER JOIN database2.table2 AS db2
> ON db1.col1 = db2.col1 WHERE db1.year = 2014 AND db1.month = 1 AND db1.day
> = 20;
>
> This gets rejected pretty quickly ("missing EOF at 'AS' near table1", and
> so I've tried experimenting with, say, retrieving the entire dataset in a
> subquery and then selecting from that, but each time I run into the same
> sort of problem. Can anybody help point out where I'm going wrong?
>
> Thanks!
> -Oliver
>