
Posted to user@hive.apache.org by Steven Wong <sw...@netflix.com> on 2011/07/02 01:38:58 UTC

RE: how to disable mapred.reduce.tasks

Try setting it to -1, judging from this property description:

<property>
  <name>mapred.reduce.tasks</name>
  <value>-1</value>
  <description>The default number of reduce tasks per job. Typically set
  to a prime close to the number of available hosts. Ignored when
  mapred.job.tracker is "local". Hadoop sets this to 1 by default, whereas Hive uses -1 as its default value.
  When this property is set to -1, Hive automatically determines the number of reducers.
  </description>
</property>
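
In a Hive session, that would look roughly like the sketch below. It assumes the properties are set per session with SET rather than in a site config file, and the bytes-per-reducer figure is only an illustrative value, not a recommendation:

```sql
-- Let Hive estimate the reducer count instead of using a fixed number.
SET mapred.reduce.tasks=-1;

-- Target input size per reducer, in bytes (illustrative value; tune for your data).
-- A larger value should yield fewer reducers and therefore fewer output files.
SET hive.exec.reducers.bytes.per.reducer=1000000000;
```

Running SET mapred.reduce.tasks; with no value afterwards should print the current setting, which is a quick way to confirm the override took effect.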


From: Igor Tatarinov [mailto:igor@decide.com]
Sent: Wednesday, June 29, 2011 4:16 PM
To: user@hive.apache.org
Subject: how to disable mapred.reduce.tasks

I set mapred.reduce.tasks manually to have a single wave of reducers (does that make sense, by the way?)

When I save the data, I often end up with a bunch of small files because we use compression and Hive doesn't seem to merge small compressed files.

So my question is: can I somehow disable mapred.reduce.tasks and have Hive use hive.exec.reducers.bytes.per.reducer instead, to reduce the number of output files? It seems the former overrides the latter.