Posted to user@hive.apache.org by Neil Xu <ne...@gmail.com> on 2010/08/25 11:40:37 UTC

How is Union All optimized in Hive

I tried a query like the one below: same table, same columns, and different
conditions. Only one MR job was generated. Is this optimized by Hive itself,
and is 'table_1' scanned only once? Can anyone give some details? Thanks!

select a, b, c from table_1 where ...
union all
select a, b, c from table_1 where ...
union all
select a, b, c from table_1 where ...

Chocobo

RE: How is Union All optimized in Hive

Posted by Namit Jain <nj...@facebook.com>.
Yes, the table should be scanned only once.
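
One way to check this is to run EXPLAIN on the query and read the stage plan. Below is a minimal sketch, not the exact query from this thread: the predicates on column a are hypothetical stand-ins for the elided WHERE clauses, and the union is wrapped in a subquery (alias u, also an assumption) because older Hive releases only accept UNION ALL inside a subquery.

```sql
-- Hypothetical predicates (a = 1, a = 2, a = 3) replace the elided
-- WHERE clauses; the subquery alias u is an assumption as well.
EXPLAIN
SELECT u.a, u.b, u.c
FROM (
  SELECT a, b, c FROM table_1 WHERE a = 1
  UNION ALL
  SELECT a, b, c FROM table_1 WHERE a = 2
  UNION ALL
  SELECT a, b, c FROM table_1 WHERE a = 3
) u;
```

In the plan, each branch of the union typically shows up as its own TableScan operator, which is why three appear. As far as I understand, all three sit in the same map stage and share the same input path, so each row read from table_1 is handed to every branch rather than the files being read three times. The exact plan layout may differ between Hive versions.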


From: Neil Xu [mailto:neil.xuxf@gmail.com]
Sent: Thursday, August 26, 2010 11:00 AM
To: hive-user@hadoop.apache.org
Subject: Re: How is Union All optimized in Hive

Hi, Namit,

Thanks for your reply. Now I see that Hive optimizes these kinds of jobs, but when I use 'explain' to look at the syntax tree of the HQL, I find three table scans in the tree. Is table_1 really scanned only once? I am not very familiar with the syntax tree.


Re: How is Union All optimized in Hive

Posted by Neil Xu <ne...@gmail.com>.
Hi, Namit,

Thanks for your reply. Now I see that Hive optimizes these kinds of jobs,
but when I use 'explain' to look at the syntax tree of the HQL, I find three
table scans in the tree. Is table_1 really scanned only once? I am not very
familiar with the syntax tree.


2010/8/26 Namit Jain <nj...@facebook.com>

> Yes, it is optimized by Hive. There will be only one MR job, even if the
> selected columns were different.
>
>
> -namit

RE: How is Union All optimized in Hive

Posted by Namit Jain <nj...@facebook.com>.
Yes, it is optimized by Hive. There will be only one MR job, even if the selected columns were different.


-namit

________________________________________
From: Neil Xu [neil.xuxf@gmail.com]
Sent: Wednesday, August 25, 2010 2:40 AM
To: hive-user@hadoop.apache.org
Subject: How is Union All optimized in Hive

I tried a query like below, same table, same column and different conditions, only one MR job generated,  is it optimized by Hive itself? and is the' table_1' only scanned once? who can give some details, thanks!

select a, b, c from table_1 where ...
union all
select a, b, c from table_1 where ...
union all
select a, b, c from table_1 where ...

Chocobo