Posted to user@nutch.apache.org by "Håvard W. Kongsgård" <nu...@niap.org> on 2006/09/20 08:24:10 UTC
Hadoop reduce task "java.lang.OutOfMemoryError: Java heap space"
When I run a fetch on my Nutch 0.8 Hadoop system, the reduce task always
fails with the error "java.lang.OutOfMemoryError: Java heap space".
I have tried setting the Java heap size manually with
export JAVA_OPTS="-Xmx2000m -Xms128m", but it had no effect.
OS: 2x SUSE 10.1 64-bit (AMD 3000 | 4000m and AMD X2 3800 | 4000m)
Java version:
Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_07-b03)
Java HotSpot(TM) 64-Bit Server VM (build 1.5.0_07-b03, mixed mode)
My hadoop-site.xml:
<property>
<name>fs.default.name</name>
<value>192.168.1.208:9000</value>
<description>
The name of the default file system. Either the literal string
"local" or a host:port for NDFS.
</description>
</property>
<property>
<name>mapred.job.tracker</name>
<value>192.168.1.208:9001</value>
<description>
The host and port that the MapReduce job tracker runs at. If
"local", then jobs are run in-process as a single map and
reduce task.
</description>
</property>
<property>
<name>mapred.map.tasks</name>
<value>9</value>
<description>
Define mapred.map.tasks to be the number of slave hosts.
</description>
</property>
<property>
<name>mapred.reduce.tasks</name>
<value>2</value>
<description>
Define mapred.reduce.tasks to be the number of slave hosts.
</description>
</property>
<property>
<name>dfs.name.dir</name>
<value>/home/nutch/filesystem/name</value>
</property>
<property>
<name>dfs.data.dir</name>
<value>/home/nutch/filesystem/data</value>
</property>
<property>
<name>mapred.system.dir</name>
<value>/home/nutch/filesystem/mapreduce/system</value>
</property>
<property>
<name>mapred.local.dir</name>
<value>/home/nutch/filesystem/mapreduce/local</value>
</property>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<property>
<name>mapred.tasktracker.tasks.maximum</name>
<value>1</value>
<description>The maximum number of tasks that will be run
simultaneously by a task tracker.
</description>
</property>
<property>
<name>mapred.child.java.opts</name>
<value>-Xmx200m</value>
<description>Java opts for the task tracker child processes. Subsumes
'mapred.child.heap.size' (If a mapred.child.heap.size value is found
in a configuration, its maximum heap size will be used and a warning
emitted that heap.size has been deprecated). Also, the following symbols,
if present, will be interpolated: @taskid@ is replaced by current TaskID;
and @port@ will be replaced by mapred.task.tracker.report.port + 1 (A second
child will fail with a port-in-use if mapred.tasktracker.tasks.maximum is
greater than one). Any other occurrences of '@' will go unchanged. For
example, to enable verbose gc logging to a file named for the taskid in
/tmp and to set the heap maximum to be a gigabyte, pass a 'value' of:
-Xmx1024m -verbose:gc -Xloggc:/tmp/@taskid@.gc
</description>
</property>
<property>
<name>mapred.task.timeout</name>
<value>6000000</value>
<description>The number of milliseconds before a task will be
terminated if it neither reads an input, writes an output, nor
updates its status string.
</description>
</property>
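
Going by the description of mapred.child.java.opts above, the heap for the task child JVMs appears to come from that property rather than from JAVA_OPTS in the shell, so the -Xmx200m value may be the limit the reduce task is running into. A sketch of the same property with a larger heap, copied from the example in that description; the 1024m figure is only the description's own example, not a value tested on this cluster:

```xml
<property>
  <name>mapred.child.java.opts</name>
  <!-- Assumption: 1024m comes from the example in the property description,
       not from measurements on this cluster. @taskid@ is interpolated per
       task, so each child writes its own GC log under /tmp. -->
  <value>-Xmx1024m -verbose:gc -Xloggc:/tmp/@taskid@.gc</value>
</property>
```

The GC log named after the task id should show whether the reduce task's heap is really filling up before the error is thrown.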