Posted to mapreduce-user@hadoop.apache.org by Robert Dyer <ps...@gmail.com> on 2012/07/03 22:44:25 UTC

Hadoop map task deadlocking?

I am running Hadoop 1.0.3 on a small cluster (1 namenode, 1
jobtracker, 2 compute+data nodes).  My input file is a SequenceFile of
around 129MB consisting of Text keys and BytesWritable values.

This job creates 2 map tasks.  The first runs to completion and exits
without any error.  The second seems to be stuck in the initializing
state.  If I leave it, it never finishes, never times out, and never
reports an error.  It just runs forever and the job never completes (in
any state!).  I must manually kill the job on the cluster.  Any ideas?
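
One way to tell a true lock cycle from a task that is merely very slow is to ask the JVM directly.  The sketch below is an assumption about how such a check could look, not something Hadoop provides: the class name DeadlockCheck is hypothetical, and it relies only on the standard java.lang.management API (ThreadMXBean.findDeadlockedThreads).  It would have to run inside the stuck JVM, for example from a debugging thread, so treat it as illustrative only.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Hypothetical helper, not part of Hadoop: asks the current JVM whether any
// threads are blocked in a cycle of monitors or ownable synchronizers.
class DeadlockCheck {

    /** Returns details of deadlocked threads, or an empty array if none are found. */
    static ThreadInfo[] findDeadlocked() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        long[] ids = mx.findDeadlockedThreads(); // null when no cycle exists
        if (ids == null) {
            return new ThreadInfo[0];
        }
        // true, true: include locked monitors and locked synchronizers
        return mx.getThreadInfo(ids, true, true);
    }

    public static void main(String[] args) {
        ThreadInfo[] stuck = findDeadlocked();
        if (stuck.length == 0) {
            System.out.println("No Java-level deadlock detected");
        }
        for (ThreadInfo info : stuck) {
            System.out.print(info); // prints thread name, state and held locks
        }
    }
}
```

An empty result would suggest the task is busy or starved rather than deadlocked in the strict sense.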

Attaching full Java thread dump of the 'stuck' map task:

===============================================

2012-06-28 16:35:23
Full thread dump OpenJDK 64-Bit Server VM (22.0-b10 mixed mode):

"SpillThread" daemon prio=10 tid=0x00007f1b90738800 nid=0x5445 waiting on condition [0x00007f1b87080000]
   java.lang.Thread.State: WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for  <0x00000000f9cbdbc0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
	at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:1340)

"communication thread" daemon prio=10 tid=0x00007f1b906bd000 nid=0x543c runnable [0x00007f1b8717f000]
   java.lang.Thread.State: RUNNABLE
	at java.util.Arrays.copyOf(Arrays.java:2367)
	at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:130)
	at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:114)
	at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:415)
	at java.lang.StringBuilder.append(StringBuilder.java:132)
	at java.net.URLStreamHandler.parseURL(URLStreamHandler.java:249)
	at sun.net.www.protocol.file.Handler.parseURL(Handler.java:67)
	at java.net.URL.<init>(URL.java:612)
	at java.net.URL.<init>(URL.java:480)
	at sun.misc.URLClassPath$FileLoader.getResource(URLClassPath.java:1035)
	at sun.misc.URLClassPath$FileLoader.findResource(URLClassPath.java:1024)
	at sun.misc.URLClassPath.findResource(URLClassPath.java:172)
	at java.net.URLClassLoader$2.run(URLClassLoader.java:549)
	at java.net.URLClassLoader$2.run(URLClassLoader.java:547)
	at java.security.AccessController.doPrivileged(Native Method)
	at java.net.URLClassLoader.findResource(URLClassLoader.java:546)
	at java.lang.ClassLoader.getResource(ClassLoader.java:1134)
	at java.net.URLClassLoader.getResourceAsStream(URLClassLoader.java:227)
	at java.util.ResourceBundle$Control$1.run(ResourceBundle.java:2600)
	at java.util.ResourceBundle$Control$1.run(ResourceBundle.java:2585)
	at java.security.AccessController.doPrivileged(Native Method)
	at java.util.ResourceBundle$Control.newBundle(ResourceBundle.java:2584)
	at java.util.ResourceBundle.loadBundle(ResourceBundle.java:1436)
	at java.util.ResourceBundle.findBundle(ResourceBundle.java:1400)
	at java.util.ResourceBundle.findBundle(ResourceBundle.java:1354)
	at java.util.ResourceBundle.findBundle(ResourceBundle.java:1354)
	at java.util.ResourceBundle.getBundleImpl(ResourceBundle.java:1296)
	at java.util.ResourceBundle.getBundle(ResourceBundle.java:724)
	at org.apache.hadoop.mapred.Counters.getResourceBundle(Counters.java:385)
	at org.apache.hadoop.mapred.Counters.access$100(Counters.java:51)
	at org.apache.hadoop.mapred.Counters$Group.<init>(Counters.java:166)
	at org.apache.hadoop.mapred.Counters.getGroup(Counters.java:414)
	- locked <0x00000000f9d3abd0> (a org.apache.hadoop.mapred.Counters)
	at org.apache.hadoop.mapred.Counters.findCounter(Counters.java:445)
	- locked <0x00000000f9d3abd0> (a org.apache.hadoop.mapred.Counters)
	at org.apache.hadoop.mapred.Task$FileSystemStatisticUpdater.updateCounters(Task.java:775)
	at org.apache.hadoop.mapred.Task.updateCounters(Task.java:827)
	- locked <0x00000000f9d22e68> (a org.apache.hadoop.mapred.MapTask)
	at org.apache.hadoop.mapred.Task.access$600(Task.java:66)
	at org.apache.hadoop.mapred.Task$TaskReporter.run(Task.java:666)
	at java.lang.Thread.run(Thread.java:722)

"Timer for 'MapTask' metrics system" daemon prio=10 tid=0x00007f1b9066e800 nid=0x543a in Object.wait() [0x00007f1b875ad000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
	at java.lang.Object.wait(Native Method)
	- waiting on <0x00000000f9d02bd0> (a java.util.TaskQueue)
	at java.util.TimerThread.mainLoop(Timer.java:552)
	- locked <0x00000000f9d02bd0> (a java.util.TaskQueue)
	at java.util.TimerThread.run(Timer.java:505)

"Thread for syncLogs" daemon prio=10 tid=0x00007f1b904d3800 nid=0x5439 waiting for monitor entry [0x00007f1b878b3000]
   java.lang.Thread.State: BLOCKED (on object monitor)
	at java.util.zip.ZipCoder.getBytes(ZipCoder.java:80)
	at java.util.zip.ZipFile.getEntry(ZipFile.java:302)
	- locked <0x00000000f9d2a040> (a java.util.jar.JarFile)
	at java.util.jar.JarFile.getEntry(JarFile.java:225)
	at java.util.jar.JarFile.getJarEntry(JarFile.java:208)
	at sun.misc.URLClassPath$JarLoader.getResource(URLClassPath.java:817)
	at sun.misc.URLClassPath$JarLoader.findResource(URLClassPath.java:795)
	at sun.misc.URLClassPath.findResource(URLClassPath.java:172)
	at java.net.URLClassLoader$2.run(URLClassLoader.java:549)
	at java.net.URLClassLoader$2.run(URLClassLoader.java:547)
	at java.security.AccessController.doPrivileged(Native Method)
	at java.net.URLClassLoader.findResource(URLClassLoader.java:546)
	at java.lang.ClassLoader.getResource(ClassLoader.java:1134)
	at java.lang.ClassLoader.getResource(ClassLoader.java:1129)
	at java.lang.ClassLoader.getSystemResource(ClassLoader.java:1256)
	at java.lang.ClassLoader.getSystemResourceAsStream(ClassLoader.java:1359)
	at java.lang.Class.getResourceAsStream(Class.java:2045)
	at javax.xml.parsers.SecuritySupport$4.run(SecuritySupport.java:92)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.xml.parsers.SecuritySupport.getResourceAsStream(SecuritySupport.java:87)
	at javax.xml.parsers.FactoryFinder.findJarServiceProvider(FactoryFinder.java:253)
	at javax.xml.parsers.FactoryFinder.find(FactoryFinder.java:221)
	at javax.xml.parsers.DocumentBuilderFactory.newInstance(DocumentBuilderFactory.java:121)
	at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:1135)
	at org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:1119)
	at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:1063)
	- locked <0x00000000f9cfcf00> (a org.apache.hadoop.mapred.JobConf)
	at org.apache.hadoop.conf.Configuration.get(Configuration.java:416)
	at org.apache.hadoop.conf.Configuration.getLong(Configuration.java:521)
	at org.apache.hadoop.fs.FileSystem.getDefaultBlockSize(FileSystem.java:1282)
	at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:395)
	at org.apache.hadoop.fs.FileSystem.isDirectory(FileSystem.java:778)
	at org.apache.hadoop.fs.ChecksumFileSystem.rename(ChecksumFileSystem.java:424)
	at org.apache.hadoop.mapred.TaskLog.writeToIndexFile(TaskLog.java:339)
	- locked <0x00000000f9d7ed48> (a java.lang.Class for org.apache.hadoop.mapred.TaskLog)
	at org.apache.hadoop.mapred.TaskLog.syncLogs(TaskLog.java:385)
	- locked <0x00000000f9d7ed48> (a java.lang.Class for org.apache.hadoop.mapred.TaskLog)
	at org.apache.hadoop.mapred.Child$3.run(Child.java:141)

"Service Thread" daemon prio=10 tid=0x00007f1b9010a000 nid=0x5436 runnable [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"C2 CompilerThread1" daemon prio=10 tid=0x00007f1b90108000 nid=0x5435 waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"C2 CompilerThread0" daemon prio=10 tid=0x00007f1b90105000 nid=0x5434 waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"Signal Dispatcher" daemon prio=10 tid=0x00007f1b90103000 nid=0x5433 runnable [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"Finalizer" daemon prio=10 tid=0x00007f1b900ac000 nid=0x5432 in Object.wait() [0x00007f1b94fa0000]
   java.lang.Thread.State: WAITING (on object monitor)
	at java.lang.Object.wait(Native Method)
	- waiting on <0x00000000f9cb6a08> (a java.lang.ref.ReferenceQueue$Lock)
	at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:135)
	- locked <0x00000000f9cb6a08> (a java.lang.ref.ReferenceQueue$Lock)
	at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:151)
	at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:177)

"Reference Handler" daemon prio=10 tid=0x00007f1b900a9800 nid=0x5431 in Object.wait() [0x00007f1b950a1000]
   java.lang.Thread.State: WAITING (on object monitor)
	at java.lang.Object.wait(Native Method)
	- waiting on <0x00000000f9cb63c0> (a java.lang.ref.Reference$Lock)
	at java.lang.Object.wait(Object.java:503)
	at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:133)
	- locked <0x00000000f9cb63c0> (a java.lang.ref.Reference$Lock)

"main" prio=10 tid=0x00007f1b9000a000 nid=0x542b waiting on condition [0x00007f1b9911a000]
   java.lang.Thread.State: RUNNABLE
	at com.google.protobuf.ByteString.copyFrom(ByteString.java:90)
	at com.google.protobuf.CodedInputStream.readBytes(CodedInputStream.java:289)
	at sizzle.types.Shared$Person$Builder.mergeFrom(Shared.java:460)
	at sizzle.types.Shared$Person$Builder.mergeFrom(Shared.java:302)
	at com.google.protobuf.CodedInputStream.readMessage(CodedInputStream.java:275)
	at sizzle.types.Code$Revision$Builder.mergeFrom(Code.java:1398)
	at sizzle.types.Code$Revision$Builder.mergeFrom(Code.java:1084)
	at com.google.protobuf.CodedInputStream.readMessage(CodedInputStream.java:275)
	at sizzle.types.Code$CodeRepository$Builder.mergeFrom(Code.java:436)
	at sizzle.types.Code$CodeRepository$Builder.mergeFrom(Code.java:251)
	at com.google.protobuf.CodedInputStream.readMessage(CodedInputStream.java:275)
	at sizzle.types.Toplevel$Project$Builder.mergeFrom(Toplevel.java:1763)
	at sizzle.types.Toplevel$Project$Builder.mergeFrom(Toplevel.java:1263)
	at com.google.protobuf.AbstractMessage$Builder.mergeFrom(AbstractMessage.java:300)
	at sizzle.types.Toplevel$Project.parseFrom(Toplevel.java:1240)
	at sizzle.count_libs$count_libsSizzleMapper.map(count_libs.java:35)
	at sizzle.count_libs$count_libsSizzleMapper.map(count_libs.java:25)
	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
	at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
	at org.apache.hadoop.mapred.Child.main(Child.java:249)

"VM Thread" prio=10 tid=0x00007f1b900a1000 nid=0x5430 runnable

"GC task thread#0 (ParallelGC)" prio=10 tid=0x00007f1b90015000 nid=0x542c runnable

"GC task thread#1 (ParallelGC)" prio=10 tid=0x00007f1b90017000 nid=0x542d runnable

"GC task thread#2 (ParallelGC)" prio=10 tid=0x00007f1b90018800 nid=0x542e runnable

"GC task thread#3 (ParallelGC)" prio=10 tid=0x00007f1b9001a800 nid=0x542f runnable

"VM Periodic Task Thread" prio=10 tid=0x00007f1b90115000 nid=0x5437 waiting on condition

JNI global references: 186

Heap
 PSYoungGen      total 33792K, used 32512K [0x00000000fbd60000, 0x00000000fdfa0000, 0x0000000100000000)
  eden space 32512K, 100% used [0x00000000fbd60000,0x00000000fdd20000,0x00000000fdd20000)
  from space 1280K, 0% used [0x00000000fde60000,0x00000000fde60000,0x00000000fdfa0000)
  to   space 1280K, 0% used [0x00000000fdd20000,0x00000000fdd20000,0x00000000fde60000)
 PSOldGen        total 136576K, used 116504K [0x00000000f3800000, 0x00000000fbd60000, 0x00000000fbd60000)
  object space 136576K, 85% used [0x00000000f3800000,0x00000000fa9c6190,0x00000000fbd60000)
 PSPermGen       total 21248K, used 14613K [0x00000000e9200000, 0x00000000ea6c0000, 0x00000000f3800000)
  object space 21248K, 68% used [0x00000000e9200000,0x00000000ea0457e0,0x00000000ea6c0000)
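
For reference, the occupancy figures in the Heap section above can be recomputed from the used and total sizes.  This is only a small arithmetic sketch; the class name HeapOccupancy and the method percentUsed are hypothetical, and the numbers are copied from the dump.  Eden is completely full and the old generation is about 85% full.  A nearly full heap can make a JVM spend most of its time in garbage collection, which is one possible, unconfirmed, explanation for a task that looks hung without being deadlocked.

```java
// Hypothetical helper: recomputes heap occupancy from the dump's figures.
class HeapOccupancy {

    /** Percentage of a heap region in use, given used and total sizes in KB. */
    static double percentUsed(long usedKb, long totalKb) {
        if (totalKb <= 0) {
            throw new IllegalArgumentException("totalKb must be positive");
        }
        return 100.0 * usedKb / totalKb;
    }

    public static void main(String[] args) {
        // Used/total pairs taken from the Heap section of the thread dump.
        System.out.printf("eden       %.1f%%%n", percentUsed(32512, 32512));
        System.out.printf("PSYoungGen %.1f%%%n", percentUsed(32512, 33792));
        System.out.printf("PSOldGen   %.1f%%%n", percentUsed(116504, 136576));
        System.out.printf("PSPermGen  %.1f%%%n", percentUsed(14613, 21248));
    }
}
```

The JVM appears to truncate rather than round these values, which would explain why it prints 85% and 68% for the old and permanent generations.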