Posted to mapreduce-user@hadoop.apache.org by Markus Jelsma <ma...@openindex.io> on 2011/12/22 15:12:09 UTC
Tasks timeout followed by OOM
Hi.
I'm testing Apache Nutch on Hadoop 0.22.0, having migrated from 0.20.203. Many
more tasks fail for unknown reasons (they time out) than on the previous
cluster, which was much less high-end. I assume some of these timeouts are
down to a bad configuration somewhere.
Anyway, just now I had two tasks fail due to a timeout. When they got
respawned, the new attempt died immediately with a heap space error:
2011-12-22 13:51:47,939 WARN org.apache.hadoop.conf.Configuration: session.id is deprecated. Instead, use dfs.metrics.session-id
2011-12-22 13:51:47,939 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Cannot initialize JVM Metrics with processName=MAP, sessionId= - already initialized
2011-12-22 13:51:48,108 INFO org.apache.hadoop.mapred.Task: Using ResourceCalculatorPlugin : org.apache.hadoop.mapreduce.util.LinuxResourceCalculatorPlugin@6997f7f4
2011-12-22 13:51:48,360 WARN org.apache.hadoop.mapreduce.util.ProcfsBasedProcessTree: The process 4687 may have finished in the interim.
2011-12-22 13:51:48,360 WARN org.apache.hadoop.mapreduce.util.ProcfsBasedProcessTree: The process 4688 may have finished in the interim.
2011-12-22 13:51:48,395 INFO org.apache.hadoop.mapred.MapTask: numReduceTasks: 18
2011-12-22 13:51:48,784 FATAL org.apache.hadoop.mapred.Child: Error running child : java.lang.OutOfMemoryError: Java heap space
	at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.<init>(MapTask.java:802)
	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:381)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:322)
	at org.apache.hadoop.mapred.Child$4.run(Child.java:223)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:396)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1153)
	at org.apache.hadoop.mapred.Child.main(Child.java:217)
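If I read the trace right, the OOM happens in MapOutputBuffer's constructor, which allocates the whole map-side sort buffer up front, before any Nutch code runs. So if the child heap is smaller than the sort buffer size, the task dies exactly like this at startup. For reference, the settings involved would look something like this (values illustrative, not my actual config; the property was io.sort.mb in the 0.20 line and I believe mapreduce.task.io.sort.mb in newer builds):

```xml
<!-- Illustrative only: the child heap (-Xmx) must comfortably exceed the
     sort buffer size, since MapOutputBuffer allocates it all at once. -->
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx512m</value>
</property>
<property>
  <!-- older alias: io.sort.mb -->
  <name>mapreduce.task.io.sort.mb</name>
  <value>100</value>
</property>
```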
Other mappers also time out, but these failures were kind of strange. I suspect
there may be an issue with JVM reuse, as tasks don't seem to fail when started
fresh. A memory leak in Nutch is highly unlikely: we reused JVM instances over
128 times on our other cluster and happily ran the application for many days
straight.
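To test the reuse theory, I suppose I could disable JVM reuse entirely and see whether the timeouts disappear; a sketch (property name as in the 0.20 line, mapred.job.reuse.jvm.num.tasks; newer builds may expect mapreduce.job.jvm.numtasks):

```xml
<property>
  <!-- 1 = a fresh JVM per task (reuse disabled); -1 = unlimited reuse -->
  <name>mapred.job.reuse.jvm.num.tasks</name>
  <value>1</value>
</property>
```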
Any thoughts or pointers here?
Thanks
--
Markus Jelsma - CTO - Openindex