Posted to mapreduce-user@hadoop.apache.org by Robert Dyer <ps...@gmail.com> on 2012/07/03 22:44:25 UTC
Hadoop map task deadlocking?
I am running Hadoop 1.0.3 on a small cluster (1 namenode, 1
jobtracker, 2 compute+data nodes). My input file is a SequenceFile of
around 129MB consisting of Text keys and BytesWritable values.
This job creates 2 map tasks. The first runs to completion and exits
without any error. The second seems to be stuck in the initializing
state. If I leave it, it never finishes, never times out, and never
reports an error. It just runs forever and the job never completes (in
any state!). I must manually kill the job on the cluster. Any ideas?
Attaching full Java thread dump of the 'stuck' map task:
===============================================
2012-06-28 16:35:23
Full thread dump OpenJDK 64-Bit Server VM (22.0-b10 mixed mode):
"SpillThread" daemon prio=10 tid=0x00007f1b90738800 nid=0x5445 waiting
on condition [0x00007f1b87080000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x00000000f9cbdbc0> (a
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:1340)
"communication thread" daemon prio=10 tid=0x00007f1b906bd000
nid=0x543c runnable [0x00007f1b8717f000]
java.lang.Thread.State: RUNNABLE
at java.util.Arrays.copyOf(Arrays.java:2367)
at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:130)
at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:114)
at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:415)
at java.lang.StringBuilder.append(StringBuilder.java:132)
at java.net.URLStreamHandler.parseURL(URLStreamHandler.java:249)
at sun.net.www.protocol.file.Handler.parseURL(Handler.java:67)
at java.net.URL.<init>(URL.java:612)
at java.net.URL.<init>(URL.java:480)
at sun.misc.URLClassPath$FileLoader.getResource(URLClassPath.java:1035)
at sun.misc.URLClassPath$FileLoader.findResource(URLClassPath.java:1024)
at sun.misc.URLClassPath.findResource(URLClassPath.java:172)
at java.net.URLClassLoader$2.run(URLClassLoader.java:549)
at java.net.URLClassLoader$2.run(URLClassLoader.java:547)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findResource(URLClassLoader.java:546)
at java.lang.ClassLoader.getResource(ClassLoader.java:1134)
at java.net.URLClassLoader.getResourceAsStream(URLClassLoader.java:227)
at java.util.ResourceBundle$Control$1.run(ResourceBundle.java:2600)
at java.util.ResourceBundle$Control$1.run(ResourceBundle.java:2585)
at java.security.AccessController.doPrivileged(Native Method)
at java.util.ResourceBundle$Control.newBundle(ResourceBundle.java:2584)
at java.util.ResourceBundle.loadBundle(ResourceBundle.java:1436)
at java.util.ResourceBundle.findBundle(ResourceBundle.java:1400)
at java.util.ResourceBundle.findBundle(ResourceBundle.java:1354)
at java.util.ResourceBundle.findBundle(ResourceBundle.java:1354)
at java.util.ResourceBundle.getBundleImpl(ResourceBundle.java:1296)
at java.util.ResourceBundle.getBundle(ResourceBundle.java:724)
at org.apache.hadoop.mapred.Counters.getResourceBundle(Counters.java:385)
at org.apache.hadoop.mapred.Counters.access$100(Counters.java:51)
at org.apache.hadoop.mapred.Counters$Group.<init>(Counters.java:166)
at org.apache.hadoop.mapred.Counters.getGroup(Counters.java:414)
- locked <0x00000000f9d3abd0> (a org.apache.hadoop.mapred.Counters)
at org.apache.hadoop.mapred.Counters.findCounter(Counters.java:445)
- locked <0x00000000f9d3abd0> (a org.apache.hadoop.mapred.Counters)
at org.apache.hadoop.mapred.Task$FileSystemStatisticUpdater.updateCounters(Task.java:775)
at org.apache.hadoop.mapred.Task.updateCounters(Task.java:827)
- locked <0x00000000f9d22e68> (a org.apache.hadoop.mapred.MapTask)
at org.apache.hadoop.mapred.Task.access$600(Task.java:66)
at org.apache.hadoop.mapred.Task$TaskReporter.run(Task.java:666)
at java.lang.Thread.run(Thread.java:722)
"Timer for 'MapTask' metrics system" daemon prio=10
tid=0x00007f1b9066e800 nid=0x543a in Object.wait()
[0x00007f1b875ad000]
java.lang.Thread.State: TIMED_WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x00000000f9d02bd0> (a java.util.TaskQueue)
at java.util.TimerThread.mainLoop(Timer.java:552)
- locked <0x00000000f9d02bd0> (a java.util.TaskQueue)
at java.util.TimerThread.run(Timer.java:505)
"Thread for syncLogs" daemon prio=10 tid=0x00007f1b904d3800 nid=0x5439
waiting for monitor entry [0x00007f1b878b3000]
java.lang.Thread.State: BLOCKED (on object monitor)
at java.util.zip.ZipCoder.getBytes(ZipCoder.java:80)
at java.util.zip.ZipFile.getEntry(ZipFile.java:302)
- locked <0x00000000f9d2a040> (a java.util.jar.JarFile)
at java.util.jar.JarFile.getEntry(JarFile.java:225)
at java.util.jar.JarFile.getJarEntry(JarFile.java:208)
at sun.misc.URLClassPath$JarLoader.getResource(URLClassPath.java:817)
at sun.misc.URLClassPath$JarLoader.findResource(URLClassPath.java:795)
at sun.misc.URLClassPath.findResource(URLClassPath.java:172)
at java.net.URLClassLoader$2.run(URLClassLoader.java:549)
at java.net.URLClassLoader$2.run(URLClassLoader.java:547)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findResource(URLClassLoader.java:546)
at java.lang.ClassLoader.getResource(ClassLoader.java:1134)
at java.lang.ClassLoader.getResource(ClassLoader.java:1129)
at java.lang.ClassLoader.getSystemResource(ClassLoader.java:1256)
at java.lang.ClassLoader.getSystemResourceAsStream(ClassLoader.java:1359)
at java.lang.Class.getResourceAsStream(Class.java:2045)
at javax.xml.parsers.SecuritySupport$4.run(SecuritySupport.java:92)
at java.security.AccessController.doPrivileged(Native Method)
at javax.xml.parsers.SecuritySupport.getResourceAsStream(SecuritySupport.java:87)
at javax.xml.parsers.FactoryFinder.findJarServiceProvider(FactoryFinder.java:253)
at javax.xml.parsers.FactoryFinder.find(FactoryFinder.java:221)
at javax.xml.parsers.DocumentBuilderFactory.newInstance(DocumentBuilderFactory.java:121)
at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:1135)
at org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:1119)
at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:1063)
- locked <0x00000000f9cfcf00> (a org.apache.hadoop.mapred.JobConf)
at org.apache.hadoop.conf.Configuration.get(Configuration.java:416)
at org.apache.hadoop.conf.Configuration.getLong(Configuration.java:521)
at org.apache.hadoop.fs.FileSystem.getDefaultBlockSize(FileSystem.java:1282)
at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:395)
at org.apache.hadoop.fs.FileSystem.isDirectory(FileSystem.java:778)
at org.apache.hadoop.fs.ChecksumFileSystem.rename(ChecksumFileSystem.java:424)
at org.apache.hadoop.mapred.TaskLog.writeToIndexFile(TaskLog.java:339)
- locked <0x00000000f9d7ed48> (a java.lang.Class for
org.apache.hadoop.mapred.TaskLog)
at org.apache.hadoop.mapred.TaskLog.syncLogs(TaskLog.java:385)
- locked <0x00000000f9d7ed48> (a java.lang.Class for
org.apache.hadoop.mapred.TaskLog)
at org.apache.hadoop.mapred.Child$3.run(Child.java:141)
"Service Thread" daemon prio=10 tid=0x00007f1b9010a000 nid=0x5436
runnable [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"C2 CompilerThread1" daemon prio=10 tid=0x00007f1b90108000 nid=0x5435
waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"C2 CompilerThread0" daemon prio=10 tid=0x00007f1b90105000 nid=0x5434
waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"Signal Dispatcher" daemon prio=10 tid=0x00007f1b90103000 nid=0x5433
runnable [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"Finalizer" daemon prio=10 tid=0x00007f1b900ac000 nid=0x5432 in
Object.wait() [0x00007f1b94fa0000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x00000000f9cb6a08> (a java.lang.ref.ReferenceQueue$Lock)
at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:135)
- locked <0x00000000f9cb6a08> (a java.lang.ref.ReferenceQueue$Lock)
at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:151)
at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:177)
"Reference Handler" daemon prio=10 tid=0x00007f1b900a9800 nid=0x5431
in Object.wait() [0x00007f1b950a1000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x00000000f9cb63c0> (a java.lang.ref.Reference$Lock)
at java.lang.Object.wait(Object.java:503)
at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:133)
- locked <0x00000000f9cb63c0> (a java.lang.ref.Reference$Lock)
"main" prio=10 tid=0x00007f1b9000a000 nid=0x542b waiting on condition
[0x00007f1b9911a000]
java.lang.Thread.State: RUNNABLE
at com.google.protobuf.ByteString.copyFrom(ByteString.java:90)
at com.google.protobuf.CodedInputStream.readBytes(CodedInputStream.java:289)
at sizzle.types.Shared$Person$Builder.mergeFrom(Shared.java:460)
at sizzle.types.Shared$Person$Builder.mergeFrom(Shared.java:302)
at com.google.protobuf.CodedInputStream.readMessage(CodedInputStream.java:275)
at sizzle.types.Code$Revision$Builder.mergeFrom(Code.java:1398)
at sizzle.types.Code$Revision$Builder.mergeFrom(Code.java:1084)
at com.google.protobuf.CodedInputStream.readMessage(CodedInputStream.java:275)
at sizzle.types.Code$CodeRepository$Builder.mergeFrom(Code.java:436)
at sizzle.types.Code$CodeRepository$Builder.mergeFrom(Code.java:251)
at com.google.protobuf.CodedInputStream.readMessage(CodedInputStream.java:275)
at sizzle.types.Toplevel$Project$Builder.mergeFrom(Toplevel.java:1763)
at sizzle.types.Toplevel$Project$Builder.mergeFrom(Toplevel.java:1263)
at com.google.protobuf.AbstractMessage$Builder.mergeFrom(AbstractMessage.java:300)
at sizzle.types.Toplevel$Project.parseFrom(Toplevel.java:1240)
at sizzle.count_libs$count_libsSizzleMapper.map(count_libs.java:35)
at sizzle.count_libs$count_libsSizzleMapper.map(count_libs.java:25)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
"VM Thread" prio=10 tid=0x00007f1b900a1000 nid=0x5430 runnable
"GC task thread#0 (ParallelGC)" prio=10 tid=0x00007f1b90015000
nid=0x542c runnable
"GC task thread#1 (ParallelGC)" prio=10 tid=0x00007f1b90017000
nid=0x542d runnable
"GC task thread#2 (ParallelGC)" prio=10 tid=0x00007f1b90018800
nid=0x542e runnable
"GC task thread#3 (ParallelGC)" prio=10 tid=0x00007f1b9001a800
nid=0x542f runnable
"VM Periodic Task Thread" prio=10 tid=0x00007f1b90115000 nid=0x5437
waiting on condition
JNI global references: 186
Heap
PSYoungGen total 33792K, used 32512K [0x00000000fbd60000,
0x00000000fdfa0000, 0x0000000100000000)
eden space 32512K, 100% used
[0x00000000fbd60000,0x00000000fdd20000,0x00000000fdd20000)
from space 1280K, 0% used
[0x00000000fde60000,0x00000000fde60000,0x00000000fdfa0000)
to space 1280K, 0% used
[0x00000000fdd20000,0x00000000fdd20000,0x00000000fde60000)
PSOldGen total 136576K, used 116504K [0x00000000f3800000,
0x00000000fbd60000, 0x00000000fbd60000)
object space 136576K, 85% used
[0x00000000f3800000,0x00000000fa9c6190,0x00000000fbd60000)
PSPermGen total 21248K, used 14613K [0x00000000e9200000,
0x00000000ea6c0000, 0x00000000f3800000)
object space 21248K, 68% used
[0x00000000e9200000,0x00000000ea0457e0,0x00000000ea6c0000)
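
To tell a true monitor deadlock apart from a task that is merely very slow, one option is to ask the JVM directly. The following is only a minimal sketch, assuming the check can be run inside the task JVM (for example from a debugging thread); the class DeadlockCheck and its method names are hypothetical and are not part of Hadoop or of my job. It relies only on the standard java.lang.management API.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Hypothetical helper for illustration; not part of Hadoop or of the job above.
class DeadlockCheck {

    /** Returns how many threads the JVM currently reports as deadlocked (0 if none). */
    static int deadlockedThreadCount() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        // findDeadlockedThreads() returns null when no lock cycle is found.
        long[] ids = threads.findDeadlockedThreads();
        return ids == null ? 0 : ids.length;
    }

    /** Describes each deadlocked thread, the lock it waits for, and that lock's owner. */
    static String describeDeadlocks() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        long[] ids = threads.findDeadlockedThreads();
        if (ids == null) {
            return "no deadlock detected";
        }
        StringBuilder sb = new StringBuilder();
        for (ThreadInfo info : threads.getThreadInfo(ids)) {
            if (info == null) {
                continue; // the thread ended between the two calls
            }
            sb.append(info.getThreadName())
              .append(" waits for ").append(info.getLockName())
              .append(" held by ").append(info.getLockOwnerName())
              .append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(describeDeadlocks());
    }
}
```

If this reports no deadlock while the task still makes no progress, the threads are probably busy or starved rather than blocked on each other, and the heap summary above (eden 100% used, old generation 85% used) may be the more useful lead.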