Posted to user@hive.apache.org by Ravi Jagannathan <Ra...@nominum.com> on 2009/08/25 22:08:25 UTC

How to decrease the number of Mappers (not reducers) ?


Hive is spawning too many mappers. The table has approximately 50K rows and is 5,654,500 bytes in size.
The query is: select count(1) from TABLE group by COLUMN
There are only 2 nodes.
On the web UI I can see that 1001 maps are spawned, each of which takes 1 sec to run. Only 2 mappers run at a time, so the map phase needs roughly 1000 x 1 sec, or about 15 minutes, to run, which is unacceptable.
After that, the reduce > copy phase takes another 10 minutes. The reduce > reduce phase finishes very quickly. How can I reduce the number of maps?

Things I tried:
I changed hadoop-site.xml and restarted Hive and the Hadoop server, but the map parameter I changed, mapred.map.tasks, does not show up in job.xml, as if Hive suppressed the change. The Python Hive client does not allow a set command. I also tried set from the CLI, but that has no effect either.
Versions: Hadoop 0.19.1, Hive 0.3

Re: How to decrease the number of Mappers (not reducers) ?

Posted by Zheng Shao <zs...@gmail.com>.
I guess you have a lot of small files in the table.
Can you merge those small files into bigger files?


Zheng
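
To illustrate why merging would help, here is a rough sketch in Python. It is a simplified model, not Hadoop's or Hive's actual split logic: it assumes that an input split never spans two files, so every file costs at least one map task, and that the split size is the 64 MB default block size. As far as I know, mapred.map.tasks is only a hint to the input format and cannot push the map count below one per file, which would explain why changing it had no effect. The even spread of the table's bytes over 1001 files is a hypothetical layout chosen to match the reported map count, and the function names are made up for this example.

```python
import math

DEFAULT_SPLIT = 64 * 1024 * 1024  # assumed split size: 64 MB


def estimate_map_tasks(file_sizes, split_size=DEFAULT_SPLIT):
    """Simplified model: each file yields at least one map task."""
    return sum(max(1, math.ceil(size / split_size)) for size in file_sizes)


def plan_merges(file_sizes, target_size=DEFAULT_SPLIT):
    """Greedily group consecutive files into batches of at most target_size bytes."""
    batches, current, current_bytes = [], [], 0
    for size in file_sizes:
        # Start a new batch once adding this file would exceed the target.
        if current and current_bytes + size > target_size:
            batches.append(current)
            current, current_bytes = [], 0
        current.append(size)
        current_bytes += size
    if current:
        batches.append(current)
    return batches


# Hypothetical layout: 5,654,500 bytes spread evenly over 1001 small files.
total_bytes, n_files = 5_654_500, 1001
sizes = [total_bytes // n_files] * n_files
sizes[-1] += total_bytes - sum(sizes)  # put the remainder in the last file

before = estimate_map_tasks(sizes)
merged = [sum(batch) for batch in plan_merges(sizes)]
after = estimate_map_tasks(merged)
print(before, after)  # prints: 1001 1
```

Under these assumptions the whole table fits in a single merged file of about 5.4 MB, so one map task would replace 1001 one-second tasks whose cost is mostly scheduling overhead.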