You are viewing a plain text version of this content. The canonical link for it is here.

Posted to user@hive.apache.org by Koert Kuipers <ko...@tresata.com> on 2012/01/12 23:53:00 UTC

why 1 reducer on simple join?

I am running a basic join of 2 tables and it will only run with 1 reducer.
why is that? i tried to set the number of reducers and it didn't work. hive
just ignored it.

create table z as select x.* from table1 x join table2 y where (
x.col1 = y.col1 and
x.col2 = y.col2 and
x.col3 = y.col3 and
x.col4 = y.col4 and
x.col5 = y.col5
);

both tables are backed by multiple files / blocks / chunks

Re: why 1 reducer on simple join?

Posted by Koert Kuipers <ko...@tresata.com>.

that query without the create table turns into a map-join and runs fast
without any reducers.
if i turn map-join off then it goes back to map-reduce with 1 reducer and
ignores mapred.reduce.tasks again.
i am using hive 0.7

On Thu, Jan 12, 2012 at 6:28 PM, Wojciech Langiewicz
<wl...@gmail.com>wrote:

> I ment this query (without create table....):
>
> select x.* from table1 x join table2 y where (
> x.col1 = y.col1 and
> x.col2 = y.col2 and
> x.col3 = y.col3 and
> x.col4 = y.col4 and
> x.col5 = y.col5
> );
>
> this document might be useful: https://cwiki.apache.org/Hive/**
> joinoptimization.html<https://cwiki.apache.org/Hive/joinoptimization.html>
>
> Especially try this setting:
> set hive.auto.convert.join = true; (or false)
>
> Which version of Hive are you using?
>
>
> On 13.01.2012 00:24, Koert Kuipers wrote:
>
>> hive>  set mapred.reduce.tasks = 3;
>> hive>  select count(*) from table1 group by column1 limit 10;
>> query runs with 38 mappers and 3 reducers
>>
>> hive>  select count(*) from table2 group by column1 limit 10;
>> query runs with 6 mappers and 3 reducers
>>
>> On Thu, Jan 12, 2012 at 6:09 PM, Wojciech Langiewicz
>> <wl...@gmail.com>wrote:
>>
>>  What do you mean by "Select runs fine" - is it using number of reducers
>>> that you set?
>>> It might help if you could show actual query.
>>>
>>>
>>> On 13.01.2012 00:03, Koert Kuipers wrote:
>>>
>>>  I tried set mapred.reduce.tasks = xyz; hive ignored it.
>>>> Selects run fine. The query uses 44 mappers.
>>>>
>>>> On Thu, Jan 12, 2012 at 6:00 PM, Wojciech Langiewicz
>>>> <wl...@gmail.com>wrote:
>>>>
>>>>  Hello,
>>>>
>>>>> Have you tried running only select, without creating table? What are
>>>>> results?
>>>>> How did you tried to set number of reducers? Have you used this:
>>>>> set mapred.reduce.tasks = xyz;
>>>>> How many mappers does this query use?
>>>>>
>>>>>
>>>>> On 12.01.2012 23:53, Koert Kuipers wrote:
>>>>>
>>>>>  I am running a basic join of 2 tables and it will only run with 1
>>>>>
>>>>>> reducer.
>>>>>> why is that? i tried to set the number of reducers and it didn't work.
>>>>>> hive
>>>>>> just ignored it.
>>>>>>
>>>>>> create table z as select x.* from table1 x join table2 y where (
>>>>>> x.col1 = y.col1 and
>>>>>> x.col2 = y.col2 and
>>>>>> x.col3 = y.col3 and
>>>>>> x.col4 = y.col4 and
>>>>>> x.col5 = y.col5
>>>>>> );
>>>>>>
>>>>>> both tables are backed by multiple files / blocks / chunks
>>>>>>
>>>>>>
>>>>>>  --
>>>>>>
>>>>> Wojciech Langiewicz
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>

Re: why 1 reducer on simple join?

Posted by Wojciech Langiewicz <wl...@gmail.com>.

I ment this query (without create table....):
select x.* from table1 x join table2 y where (
x.col1 = y.col1 and
x.col2 = y.col2 and
x.col3 = y.col3 and
x.col4 = y.col4 and
x.col5 = y.col5
);

this document might be useful: 
https://cwiki.apache.org/Hive/joinoptimization.html

Especially try this setting:
set hive.auto.convert.join = true; (or false)

Which version of Hive are you using?

On 13.01.2012 00:24, Koert Kuipers wrote:
> hive>  set mapred.reduce.tasks = 3;
> hive>  select count(*) from table1 group by column1 limit 10;
> query runs with 38 mappers and 3 reducers
>
> hive>  select count(*) from table2 group by column1 limit 10;
> query runs with 6 mappers and 3 reducers
>
> On Thu, Jan 12, 2012 at 6:09 PM, Wojciech Langiewicz
> <wl...@gmail.com>wrote:
>
>> What do you mean by "Select runs fine" - is it using number of reducers
>> that you set?
>> It might help if you could show actual query.
>>
>>
>> On 13.01.2012 00:03, Koert Kuipers wrote:
>>
>>> I tried set mapred.reduce.tasks = xyz; hive ignored it.
>>> Selects run fine. The query uses 44 mappers.
>>>
>>> On Thu, Jan 12, 2012 at 6:00 PM, Wojciech Langiewicz
>>> <wl...@gmail.com>wrote:
>>>
>>>   Hello,
>>>> Have you tried running only select, without creating table? What are
>>>> results?
>>>> How did you tried to set number of reducers? Have you used this:
>>>> set mapred.reduce.tasks = xyz;
>>>> How many mappers does this query use?
>>>>
>>>>
>>>> On 12.01.2012 23:53, Koert Kuipers wrote:
>>>>
>>>>   I am running a basic join of 2 tables and it will only run with 1
>>>>> reducer.
>>>>> why is that? i tried to set the number of reducers and it didn't work.
>>>>> hive
>>>>> just ignored it.
>>>>>
>>>>> create table z as select x.* from table1 x join table2 y where (
>>>>> x.col1 = y.col1 and
>>>>> x.col2 = y.col2 and
>>>>> x.col3 = y.col3 and
>>>>> x.col4 = y.col4 and
>>>>> x.col5 = y.col5
>>>>> );
>>>>>
>>>>> both tables are backed by multiple files / blocks / chunks
>>>>>
>>>>>
>>>>>   --
>>>> Wojciech Langiewicz
>>>>
>>>>
>>>
>>
>

Re: why 1 reducer on simple join?

Posted by Koert Kuipers <ko...@tresata.com>.

hive> set mapred.reduce.tasks = 3;
hive> select count(*) from table1 group by column1 limit 10;
query runs with 38 mappers and 3 reducers

hive> select count(*) from table2 group by column1 limit 10;
query runs with 6 mappers and 3 reducers

On Thu, Jan 12, 2012 at 6:09 PM, Wojciech Langiewicz
<wl...@gmail.com>wrote:

> What do you mean by "Select runs fine" - is it using number of reducers
> that you set?
> It might help if you could show actual query.
>
>
> On 13.01.2012 00:03, Koert Kuipers wrote:
>
>> I tried set mapred.reduce.tasks = xyz; hive ignored it.
>> Selects run fine. The query uses 44 mappers.
>>
>> On Thu, Jan 12, 2012 at 6:00 PM, Wojciech Langiewicz
>> <wl...@gmail.com>wrote:
>>
>>  Hello,
>>> Have you tried running only select, without creating table? What are
>>> results?
>>> How did you tried to set number of reducers? Have you used this:
>>> set mapred.reduce.tasks = xyz;
>>> How many mappers does this query use?
>>>
>>>
>>> On 12.01.2012 23:53, Koert Kuipers wrote:
>>>
>>>  I am running a basic join of 2 tables and it will only run with 1
>>>> reducer.
>>>> why is that? i tried to set the number of reducers and it didn't work.
>>>> hive
>>>> just ignored it.
>>>>
>>>> create table z as select x.* from table1 x join table2 y where (
>>>> x.col1 = y.col1 and
>>>> x.col2 = y.col2 and
>>>> x.col3 = y.col3 and
>>>> x.col4 = y.col4 and
>>>> x.col5 = y.col5
>>>> );
>>>>
>>>> both tables are backed by multiple files / blocks / chunks
>>>>
>>>>
>>>>  --
>>> Wojciech Langiewicz
>>>
>>>
>>
>

Re: why 1 reducer on simple join?

Posted by Wojciech Langiewicz <wl...@gmail.com>.

What do you mean by "Select runs fine" - is it using number of reducers 
that you set?
It might help if you could show actual query.

On 13.01.2012 00:03, Koert Kuipers wrote:
> I tried set mapred.reduce.tasks = xyz; hive ignored it.
> Selects run fine. The query uses 44 mappers.
>
> On Thu, Jan 12, 2012 at 6:00 PM, Wojciech Langiewicz
> <wl...@gmail.com>wrote:
>
>> Hello,
>> Have you tried running only select, without creating table? What are
>> results?
>> How did you tried to set number of reducers? Have you used this:
>> set mapred.reduce.tasks = xyz;
>> How many mappers does this query use?
>>
>>
>> On 12.01.2012 23:53, Koert Kuipers wrote:
>>
>>> I am running a basic join of 2 tables and it will only run with 1 reducer.
>>> why is that? i tried to set the number of reducers and it didn't work.
>>> hive
>>> just ignored it.
>>>
>>> create table z as select x.* from table1 x join table2 y where (
>>> x.col1 = y.col1 and
>>> x.col2 = y.col2 and
>>> x.col3 = y.col3 and
>>> x.col4 = y.col4 and
>>> x.col5 = y.col5
>>> );
>>>
>>> both tables are backed by multiple files / blocks / chunks
>>>
>>>
>> --
>> Wojciech Langiewicz
>>
>

Re: why 1 reducer on simple join?

Posted by Koert Kuipers <ko...@tresata.com>.

I tried set mapred.reduce.tasks = xyz; hive ignored it.
Selects run fine. The query uses 44 mappers.

On Thu, Jan 12, 2012 at 6:00 PM, Wojciech Langiewicz
<wl...@gmail.com>wrote:

> Hello,
> Have you tried running only select, without creating table? What are
> results?
> How did you tried to set number of reducers? Have you used this:
> set mapred.reduce.tasks = xyz;
> How many mappers does this query use?
>
>
> On 12.01.2012 23:53, Koert Kuipers wrote:
>
>> I am running a basic join of 2 tables and it will only run with 1 reducer.
>> why is that? i tried to set the number of reducers and it didn't work.
>> hive
>> just ignored it.
>>
>> create table z as select x.* from table1 x join table2 y where (
>> x.col1 = y.col1 and
>> x.col2 = y.col2 and
>> x.col3 = y.col3 and
>> x.col4 = y.col4 and
>> x.col5 = y.col5
>> );
>>
>> both tables are backed by multiple files / blocks / chunks
>>
>>
> --
> Wojciech Langiewicz
>

Re: why 1 reducer on simple join?

Posted by Koert Kuipers <ko...@tresata.com>.

good point... i should have used ON... with ON it runs fine as a map-join,
and if i set hive.auto.convert.join=false then it runs with my specified
number of reducers.

with right number of reducers

On Thu, Jan 12, 2012 at 6:12 PM, Edward Capriolo <ed...@gmail.com>wrote:

> You should do joins using the ON clause.
> https://cwiki.apache.org/Hive/languagemanual-joins.html
> be careful if you do the joins wrong hive does a Cartesian product
> followed by a really long reduce phase rather then the optimal join process.
>
> On Thu, Jan 12, 2012 at 6:04 PM, Aaron McCurry <am...@gmail.com> wrote:
>
>> I see that your query is kinda generic and probably not the original
>> query.  I have seen this behavior with a simple typo like:
>>
>> Notice col3.
>>
>> create table z as select x.* from table1 x join table2 y where (
>> x.col1 = y.col1 and
>> x.col2 = y.col2 and
>> y.col3 = y.col3 and
>> x.col4 = y.col4 and
>> x.col5 = y.col5
>> );
>>
>> Just a thought.
>>
>> Aaron
>>
>> On Thu, Jan 12, 2012 at 6:00 PM, Wojciech Langiewicz <
>> wlangiewicz@gmail.com> wrote:
>>
>>> Hello,
>>> Have you tried running only select, without creating table? What are
>>> results?
>>> How did you tried to set number of reducers? Have you used this:
>>> set mapred.reduce.tasks = xyz;
>>> How many mappers does this query use?
>>>
>>>
>>> On 12.01.2012 23:53, Koert Kuipers wrote:
>>>
>>>> I am running a basic join of 2 tables and it will only run with 1
>>>> reducer.
>>>> why is that? i tried to set the number of reducers and it didn't work.
>>>> hive
>>>> just ignored it.
>>>>
>>>> create table z as select x.* from table1 x join table2 y where (
>>>> x.col1 = y.col1 and
>>>> x.col2 = y.col2 and
>>>> x.col3 = y.col3 and
>>>> x.col4 = y.col4 and
>>>> x.col5 = y.col5
>>>> );
>>>>
>>>> both tables are backed by multiple files / blocks / chunks
>>>>
>>>>
>>> --
>>> Wojciech Langiewicz
>>>
>>
>>
>

Re: why 1 reducer on simple join?

Posted by Edward Capriolo <ed...@gmail.com>.

You should do joins using the ON clause.
https://cwiki.apache.org/Hive/languagemanual-joins.html
be careful if you do the joins wrong hive does a Cartesian product followed
by a really long reduce phase rather then the optimal join process.

On Thu, Jan 12, 2012 at 6:04 PM, Aaron McCurry <am...@gmail.com> wrote:

> I see that your query is kinda generic and probably not the original
> query.  I have seen this behavior with a simple typo like:
>
> Notice col3.
>
> create table z as select x.* from table1 x join table2 y where (
> x.col1 = y.col1 and
> x.col2 = y.col2 and
> y.col3 = y.col3 and
> x.col4 = y.col4 and
> x.col5 = y.col5
> );
>
> Just a thought.
>
> Aaron
>
> On Thu, Jan 12, 2012 at 6:00 PM, Wojciech Langiewicz <
> wlangiewicz@gmail.com> wrote:
>
>> Hello,
>> Have you tried running only select, without creating table? What are
>> results?
>> How did you tried to set number of reducers? Have you used this:
>> set mapred.reduce.tasks = xyz;
>> How many mappers does this query use?
>>
>>
>> On 12.01.2012 23:53, Koert Kuipers wrote:
>>
>>> I am running a basic join of 2 tables and it will only run with 1
>>> reducer.
>>> why is that? i tried to set the number of reducers and it didn't work.
>>> hive
>>> just ignored it.
>>>
>>> create table z as select x.* from table1 x join table2 y where (
>>> x.col1 = y.col1 and
>>> x.col2 = y.col2 and
>>> x.col3 = y.col3 and
>>> x.col4 = y.col4 and
>>> x.col5 = y.col5
>>> );
>>>
>>> both tables are backed by multiple files / blocks / chunks
>>>
>>>
>> --
>> Wojciech Langiewicz
>>
>
>

Re: why 1 reducer on simple join?

Posted by Koert Kuipers <ko...@tresata.com>.

thanks for that tip. i checked and unfortunately no typo like that.

would be kind of weird if an identity like that caused a single reducer!

On Thu, Jan 12, 2012 at 6:04 PM, Aaron McCurry <am...@gmail.com> wrote:

> I see that your query is kinda generic and probably not the original
> query.  I have seen this behavior with a simple typo like:
>
> Notice col3.
>
> create table z as select x.* from table1 x join table2 y where (
> x.col1 = y.col1 and
> x.col2 = y.col2 and
> y.col3 = y.col3 and
> x.col4 = y.col4 and
> x.col5 = y.col5
> );
>
> Just a thought.
>
> Aaron
>
> On Thu, Jan 12, 2012 at 6:00 PM, Wojciech Langiewicz <
> wlangiewicz@gmail.com> wrote:
>
>> Hello,
>> Have you tried running only select, without creating table? What are
>> results?
>> How did you tried to set number of reducers? Have you used this:
>> set mapred.reduce.tasks = xyz;
>> How many mappers does this query use?
>>
>>
>> On 12.01.2012 23:53, Koert Kuipers wrote:
>>
>>> I am running a basic join of 2 tables and it will only run with 1
>>> reducer.
>>> why is that? i tried to set the number of reducers and it didn't work.
>>> hive
>>> just ignored it.
>>>
>>> create table z as select x.* from table1 x join table2 y where (
>>> x.col1 = y.col1 and
>>> x.col2 = y.col2 and
>>> x.col3 = y.col3 and
>>> x.col4 = y.col4 and
>>> x.col5 = y.col5
>>> );
>>>
>>> both tables are backed by multiple files / blocks / chunks
>>>
>>>
>> --
>> Wojciech Langiewicz
>>
>
>

Re: why 1 reducer on simple join?

Posted by Aaron McCurry <am...@gmail.com>.

I see that your query is kinda generic and probably not the original query.
 I have seen this behavior with a simple typo like:

Notice col3.

create table z as select x.* from table1 x join table2 y where (
x.col1 = y.col1 and
x.col2 = y.col2 and
y.col3 = y.col3 and
x.col4 = y.col4 and
x.col5 = y.col5
);

Just a thought.

Aaron

On Thu, Jan 12, 2012 at 6:00 PM, Wojciech Langiewicz
<wl...@gmail.com>wrote:

> Hello,
> Have you tried running only select, without creating table? What are
> results?
> How did you tried to set number of reducers? Have you used this:
> set mapred.reduce.tasks = xyz;
> How many mappers does this query use?
>
>
> On 12.01.2012 23:53, Koert Kuipers wrote:
>
>> I am running a basic join of 2 tables and it will only run with 1 reducer.
>> why is that? i tried to set the number of reducers and it didn't work.
>> hive
>> just ignored it.
>>
>> create table z as select x.* from table1 x join table2 y where (
>> x.col1 = y.col1 and
>> x.col2 = y.col2 and
>> x.col3 = y.col3 and
>> x.col4 = y.col4 and
>> x.col5 = y.col5
>> );
>>
>> both tables are backed by multiple files / blocks / chunks
>>
>>
> --
> Wojciech Langiewicz
>

Re: why 1 reducer on simple join?

Posted by Wojciech Langiewicz <wl...@gmail.com>.

Hello,
Have you tried running only select, without creating table? What are 
results?
How did you tried to set number of reducers? Have you used this:
set mapred.reduce.tasks = xyz;
How many mappers does this query use?

On 12.01.2012 23:53, Koert Kuipers wrote:
> I am running a basic join of 2 tables and it will only run with 1 reducer.
> why is that? i tried to set the number of reducers and it didn't work. hive
> just ignored it.
>
> create table z as select x.* from table1 x join table2 y where (
> x.col1 = y.col1 and
> x.col2 = y.col2 and
> x.col3 = y.col3 and
> x.col4 = y.col4 and
> x.col5 = y.col5
> );
>
> both tables are backed by multiple files / blocks / chunks
>

--
Wojciech Langiewicz