
Posted to user@hive.apache.org by bharath v <bh...@gmail.com> on 2010/02/03 10:25:29 UTC

intermediate data written to the disk?

Hi,

I have a question about how Hive handles queries that join more than
two tables.

Suppose we have three tables A, B, and C, and the plan is "((AB)C)".
We can join A and B in one map-reduce job and then join the resulting table
with C. Is the result of "AB" written to disk before it is joined with C, or
is it streamed directly into the join with C? (I don't know how that would
work; it's just a guess.)


Any help is appreciated.

Thanks

Re: intermediate data written to the disk?

Posted by Zheng Shao <zs...@gmail.com>.

If all of the joins use the same join key, you can use "unique join" to
make sure they are done in a single map-reduce job.
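
To make the single-job case concrete, here is a toy Python sketch. It is
not Hive code. It assumes a simplified rule: consecutive joins in a
left-deep plan that share a join key are merged into one map-reduce job, a
change of key starts a new job, and every job except the last writes its
output to disk for the next one to read. The function names and the key
names "id" and "name" are hypothetical.

```python
# Toy model only (not Hive code): count map-reduce jobs for a left-deep
# join plan such as ((A B) C), given the key used at each join step.
# Assumption: consecutive joins on the same key are merged into one job.

def count_join_jobs(join_keys):
    """join_keys holds one key name per join step, in plan order."""
    jobs = 0
    previous = None
    for key in join_keys:
        if key != previous:
            # A different key means the data must be repartitioned,
            # so this model starts a new job.
            jobs += 1
            previous = key
    return jobs


def intermediate_writes(join_keys):
    """Every job except the last hands its result to the next via disk."""
    return max(count_join_jobs(join_keys) - 1, 0)


same_key = ["id", "id"]    # A-B on id, then (AB)-C on id
diff_key = ["id", "name"]  # A-B on id, then (AB)-C on name

print(count_join_jobs(same_key), intermediate_writes(same_key))  # 1 0
print(count_join_jobs(diff_key), intermediate_writes(diff_key))  # 2 1
```

Under this model the "AB" result only goes to disk when the second join
uses a different key from the first.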


Zheng




-- 
Yours,
Zheng