Posted to user@nutch.apache.org by "Håvard W. Kongsgård" <nu...@niap.org> on 2006/09/20 08:24:10 UTC

Hadoop reduce task "java.lang.OutOfMemoryError: Java heap space"

When I run a fetch on my Nutch 0.8 / Hadoop system, I always get this error
message: "java.lang.OutOfMemoryError: Java heap space".

I have tried to set the Java heap size manually with
export JAVA_OPTS="-Xmx2000m -Xms128m", but it had no effect.

OS: 2x SUSE 10.1 64-bit, AMD 3000 | 4000m and AMD X2 3800 | 4000m

Java version:
Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_07-b03)
Java HotSpot(TM) 64-Bit Server VM (build 1.5.0_07-b03, mixed mode)

My hadoop-site.xml

<property>
  <name>fs.default.name</name>
  <value>192.168.1.208:9000</value>
  <description>
    The name of the default file system. Either the literal string
    "local" or a host:port for NDFS.
  </description>
</property>

<property>
  <name>mapred.job.tracker</name>
  <value>192.168.1.208:9001</value>
  <description>
    The host and port that the MapReduce job tracker runs at. If
    "local", then jobs are run in-process as a single map and
    reduce task.
  </description>
</property>

<property>
  <name>mapred.map.tasks</name>
  <value>9</value>
  <description>
    define mapred.map tasks to be number of slave hosts
  </description>
</property>

<property>
  <name>mapred.reduce.tasks</name>
  <value>2</value>
  <description>
    define mapred.reduce tasks to be number of slave hosts
  </description>
</property>

<property>
  <name>dfs.name.dir</name>
  <value>/home/nutch/filesystem/name</value>
</property>

<property>
  <name>dfs.data.dir</name>
  <value>/home/nutch/filesystem/data</value>
</property>

<property>
  <name>mapred.system.dir</name>
  <value>/home/nutch/filesystem/mapreduce/system</value>
</property>

<property>
  <name>mapred.local.dir</name>
  <value>/home/nutch/filesystem/mapreduce/local</value>
</property>

<property>
  <name>dfs.replication</name>
  <value>2</value>
</property>

<property>
  <name>mapred.tasktracker.tasks.maximum</name>
  <value>1</value>
  <description>The maximum number of tasks that will be run
  simultaneously by a task tracker.
  </description>
</property>

<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx200m</value>
  <description>Java opts for the task tracker child processes.  Subsumes
  'mapred.child.heap.size' (If a mapred.child.heap.size value is found
  in a configuration, its maximum heap size will be used and a warning
  emitted that heap.size has been deprecated). Also, the following symbols,
  if present, will be interpolated: @taskid@ is replaced by current TaskID;
  and @port@ will be replaced by mapred.task.tracker.report.port + 1 (A second
  child will fail with a port-in-use if mapred.tasktracker.tasks.maximum is
  greater than one). Any other occurrences of '@' will go unchanged. For
  example, to enable verbose gc logging to a file named for the taskid in
  /tmp and to set the heap maximum to be a gigabyte, pass a 'value' of:

        -Xmx1024m -verbose:gc -Xloggc:/tmp/@taskid@.gc
  </description>
</property>

<property>
  <name>mapred.task.timeout</name>
  <value>6000000</value>
  <description>The number of milliseconds before a task will be
  terminated if it neither reads an input, writes an output, nor
  updates its status string.
  </description>
</property>
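
Going by the mapred.child.java.opts description above, the task child JVMs take their options from that property, and it is currently set to -Xmx200m. That would explain why exporting JAVA_OPTS in the shell changes nothing for the reduce task. A minimal sketch of the same property using the gigabyte heap and gc logging from the description's own example follows. The 1024m figure is an assumption for illustration, not a tested fix for this job:

```xml
<property>
  <name>mapred.child.java.opts</name>
  <!-- Assumption: 1024m is illustrative, copied from the description's
       example; size it to the RAM actually free on each slave. -->
  <!-- @taskid@ is interpolated per task, so each child writes its own gc log. -->
  <value>-Xmx1024m -verbose:gc -Xloggc:/tmp/@taskid@.gc</value>
</property>
```

With mapred.tasktracker.tasks.maximum left at 1, only one such child runs per task tracker at a time, so the larger heap is paid for once per slave. The gc log in /tmp should show whether the reduce task is still approaching the limit.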