Posted to user@hive.apache.org by Neil Xu <ne...@gmail.com> on 2010/08/25 11:40:37 UTC
How is Union All optimized in Hive
I tried a query like the one below (same table, same columns, different
conditions) and only one MR job was generated. Is this optimized by Hive
itself? And is 'table_1' scanned only once? Can anyone give some details? Thanks!
select a, b, c from table_1 where ...
union all
select a, b, c from table_1 where ...
union all
select a, b, c from table_1 where ...
Chocobo
RE: How is Union All optimized in Hive
Posted by Namit Jain <nj...@facebook.com>.
Yes, the table should be scanned only once.
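The following is a minimal Python sketch of why the three table scans Neil sees in the EXPLAIN output need not mean three reads of table_1. It assumes, based on our understanding of Hive's map-side plan, that each UNION ALL branch gets its own alias with one TableScan operator, and that the plan maps each input path to the list of aliases that consume it. The alias names, the warehouse path, and the helper `group_aliases_by_path` are hypothetical. This models the idea only and is not Hive's code.

```python
from collections import defaultdict

# Hypothetical map-side plan: one alias (one TableScan operator in EXPLAIN)
# per UNION ALL branch, each naming the input path it reads.
# Alias and path strings are illustrative, not real Hive identifiers.
alias_to_path = {
    "subq1:table_1": "/warehouse/table_1",
    "subq2:table_1": "/warehouse/table_1",
    "subq3:table_1": "/warehouse/table_1",
}


def group_aliases_by_path(alias_to_path):
    """Invert alias -> path into path -> [aliases], keeping alias order."""
    path_to_aliases = defaultdict(list)
    for alias, path in alias_to_path.items():
        path_to_aliases[path].append(alias)
    return dict(path_to_aliases)


path_to_aliases = group_aliases_by_path(alias_to_path)
num_table_scans = len(alias_to_path)       # operators shown in the plan
num_inputs_to_read = len(path_to_aliases)  # distinct paths the job opens
print(num_table_scans, num_inputs_to_read)  # 3 1
```

All three aliases resolve to the same path. Under this assumption, the job opens that input once and hands each row to all three branches. Some Hive versions print a similar path-to-alias mapping in EXPLAIN EXTENDED output, but the exact format depends on the version.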
Re: How is Union All optimized in Hive
Posted by Neil Xu <ne...@gmail.com>.
Hi, Namit,
Thanks for your reply. Now I see that Hive optimizes these kinds of jobs, but
when I use EXPLAIN to look at the syntax tree of the HQL query, I find three
table scans in the tree. Is table_1 really scanned only once? I am not very
familiar with the syntax tree.
RE: How is Union All optimized in Hive
Posted by Namit Jain <nj...@facebook.com>.
Yes, it is optimized by Hive. There will be only one MR job, even if the selected columns were different.
-namit
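Namit says one MR job suffices even when the branches select different columns. To illustrate this, here is a conceptual Python model of a single map-only pass. Each input row is read once and offered to every branch, and each branch applies its own filter and projection. The sample rows, the conditions (standing in for the elided "where ..." clauses), and column `d` are hypothetical. For comparison, the hypothetical `union_all_three_scans` reads the input once per branch.

```python
# Hypothetical sample data for table_1.
ROWS = [
    {"a": 1, "b": "x", "c": 10, "d": 7},
    {"a": 2, "b": "y", "c": 20, "d": 8},
    {"a": 3, "b": "z", "c": 30, "d": 9},
]

# Each branch is (WHERE predicate, projected columns). The conditions are
# hypothetical stand-ins for the elided "where ..." clauses; the third branch
# projects d instead of c (different columns, same number of columns).
BRANCHES = [
    (lambda r: r["a"] == 1, ("a", "b", "c")),
    (lambda r: r["a"] >= 2, ("a", "b", "c")),
    (lambda r: r["c"] > 15, ("a", "b", "d")),
]


def union_all_single_pass(rows, branches):
    """Read every input row once and forward it to each branch."""
    output, rows_read = [], 0
    for row in rows:  # the only pass over the input
        rows_read += 1
        for predicate, columns in branches:
            if predicate(row):
                output.append(tuple(row[col] for col in columns))
    return output, rows_read


def union_all_three_scans(rows, branches):
    """Naive plan for comparison: a separate pass over the input per branch."""
    output, rows_read = [], 0
    for predicate, columns in branches:
        for row in rows:
            rows_read += 1
            if predicate(row):
                output.append(tuple(row[col] for col in columns))
    return output, rows_read


single, single_reads = union_all_single_pass(ROWS, BRANCHES)
naive, naive_reads = union_all_three_scans(ROWS, BRANCHES)
print(single_reads, naive_reads)        # 3 9
print(sorted(single) == sorted(naive))  # True
```

Both versions produce the same rows. A row that matches two conditions (here the second and third rows) yields one output row per matching branch, as UNION ALL requires. Only the output order differs, and UNION ALL does not guarantee an order anyway. The single pass reads 3 rows instead of 9.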