Posted to mapreduce-user@hadoop.apache.org by Markus Jelsma <ma...@openindex.io> on 2011/12/22 15:12:09 UTC

Tasks timeout followed by OOM

Hi.

I'm testing Apache Nutch on Hadoop 0.22.0, having migrated from 0.20.203. Many
more tasks fail for unknown reasons (they time out) than on the previous
cluster, which was much less high-end. I assume some of these timeouts are due
to misconfiguration.

Anyway, just now I had two tasks fail due to a timeout. When they were
respawned, the new attempts died immediately with a heap space error:

2011-12-22 13:51:47,939 WARN org.apache.hadoop.conf.Configuration: session.id is deprecated. Instead, use dfs.metrics.session-id
2011-12-22 13:51:47,939 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Cannot initialize JVM Metrics with processName=MAP, sessionId= - already initialized
2011-12-22 13:51:48,108 INFO org.apache.hadoop.mapred.Task: Using ResourceCalculatorPlugin : org.apache.hadoop.mapreduce.util.LinuxResourceCalculatorPlugin@6997f7f4
2011-12-22 13:51:48,360 WARN org.apache.hadoop.mapreduce.util.ProcfsBasedProcessTree: The process 4687 may have finished in the interim.
2011-12-22 13:51:48,360 WARN org.apache.hadoop.mapreduce.util.ProcfsBasedProcessTree: The process 4688 may have finished in the interim.
2011-12-22 13:51:48,395 INFO org.apache.hadoop.mapred.MapTask: numReduceTasks: 18
2011-12-22 13:51:48,784 FATAL org.apache.hadoop.mapred.Child: Error running child : java.lang.OutOfMemoryError: Java heap space
	at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.<init>(MapTask.java:802)
	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:381)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:322)
	at org.apache.hadoop.mapred.Child$4.run(Child.java:223)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:396)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1153)
	at org.apache.hadoop.mapred.Child.main(Child.java:217)

Other mappers also time out, but these two failures were strange. I suspect
there may be an issue with JVM reuse, as tasks don't seem to fail when started
in a fresh JVM. A memory leak in Nutch is highly unlikely: we reused JVM
instances over 128 times on our other cluster and happily ran the application
for many days straight.
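For reference: the OOM is thrown in the MapOutputBuffer constructor, which is
where the map-side sort buffer (io.sort.mb megabytes) gets allocated, so a task
can die right at startup if that buffer doesn't fit in the child heap. The
settings involved look roughly like the sketch below; the values shown are
illustrative, not our actual configuration.

```
<!-- mapred-site.xml sketch; values are illustrative, not our real settings -->
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx512m</value>   <!-- heap available to each task JVM -->
</property>
<property>
  <name>io.sort.mb</name>
  <value>256</value>        <!-- sort buffer allocated in MapOutputBuffer.<init> -->
</property>
<property>
  <name>mapred.job.reuse.jvm.num.tasks</name>
  <value>-1</value>         <!-- -1 = reuse the JVM for unlimited tasks -->
</property>
```

With numbers like these, each map attempt tries to grab 256 MB inside a 512 MB
heap, which can fail if a reused JVM hasn't released memory from a previous
attempt.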

Any thoughts or pointers here?
Thanks

-- 
Markus Jelsma - CTO - Openindex