Posted to user@hive.apache.org by Zheng Shao <zs...@gmail.com> on 2009/08/04 05:58:01 UTC

Re: why insert overwrite table tmp partition(dt=1) select bar, foo from pokes NEEDS 2 MR JOBS?

Hi Min,

We recently added a capability to Hive to merge small output files.
The second MR job you are seeing is that merge step.

You can do the following to disable that feature:
set hive.merge.mapfiles=false;


Or you can adjust the following parameter to control when the
additional merge job runs:
set hive.merge.size.per.task=256000000;

By default it's 256MB, which means that if the average output of a
mapper is smaller than 256MB, an additional job will run.
You can lower that number to something like 64MB if you want the merge
job to kick in less often.
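
For example, a session along these lines would lower the threshold to
roughly 64MB and then re-check the plan. This is only a sketch: the
64000000 value simply follows the same decimal convention as the
256000000 default above, and the "--" comment lines assume your Hive
CLI accepts them (drop them if it does not).

```sql
-- Run the extra merge job only when the average mapper output
-- is smaller than about 64MB (value chosen for illustration).
set hive.merge.size.per.task=64000000;

-- Re-check the plan for the query from the original question.
explain insert overwrite table tmp partition(dt=1) select bar, foo
from pokes;
```

Both "set" commands in this mail apply to the current session only, so
they need to be issued before the insert each time.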

Zheng

On Mon, Aug 3, 2009 at 8:02 PM, Min Zhou<co...@gmail.com> wrote:
> I thought one map only job is ok. try
> hive> explain insert overwrite table tmp partition(dt=1) select bar, foo
> from pokes;
>
>
> Thanks,
> Min
> --
> My research interests are distributed systems, parallel computing and
> bytecode based virtual machine.
>
> My profile:
> http://www.linkedin.com/in/coderplay
> My blog:
> http://coderplay.javaeye.com
>



-- 
Yours,
Zheng