Posted to mapreduce-user@hadoop.apache.org by Vladimir Egorov <vl...@oracle.com> on 2012/04/20 20:50:27 UTC

after 2 weeks TaskTracker gets hung with 100% CPU consumption

Hi,

After around 2 weeks a TaskTracker (TT) in our MR cluster hangs with 
100% CPU consumption. Most of the time no new tasks are sent to the 
node. We start getting more job failures in the cluster when this 
happens. Once we restart the TT, the node is fine for around another two 
weeks.

We also noticed that after a restart some other TT in the cluster starts 
showing the same behavior. This continues until all the TTs have been 
restarted. Another solution is to restart the whole MR cluster.

A thread dump is posted below. It looks like the TT is busy with some log 
cleanup. We also noticed that when we restart, the TT sometimes fails to 
start because the tobedeleted directory cannot be deleted. We have to 
delete it manually, and then the TT starts normally.
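
The manual cleanup step could be sketched roughly as follows. This is only an illustration, not something taken from Hadoop: the class name ToBeDeletedCleaner is hypothetical, the directory to remove is passed on the command line (its real location depends on the local directories configured for the TT), and it assumes the TT is stopped and Java 7 or later is available for java.nio.file.

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;

/** Hypothetical helper: removes a leftover directory tree while the TT is stopped. */
final class ToBeDeletedCleaner {

    /** Deletes root and everything below it; returns the number of entries removed. */
    static long deleteTree(Path root) throws IOException {
        if (!Files.exists(root)) {
            return 0;
        }
        final long[] removed = {0};
        Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) throws IOException {
                Files.delete(file);          // regular files and symbolic links
                removed[0]++;
                return FileVisitResult.CONTINUE;
            }

            @Override
            public FileVisitResult postVisitDirectory(Path dir, IOException exc) throws IOException {
                if (exc != null) {
                    throw exc;               // listing the directory failed
                }
                Files.delete(dir);           // directory is empty at this point
                removed[0]++;
                return FileVisitResult.CONTINUE;
            }
        });
        return removed[0];
    }

    public static void main(String[] args) throws IOException {
        Path root = Paths.get(args[0]);      // assumed: path to the leftover directory
        System.out.println("removed " + deleteTree(root) + " entries under " + root);
    }
}
```

The walk is depth-first, so each directory is removed only after its contents are gone, and a failure stops the run with an exception instead of being silently skipped.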

Has anyone seen this, and is there a resolution or workaround?

Thank you,
Vladimir

Full thread dump Java HotSpot(TM) 64-Bit Server VM (19.0-b09 mixed mode):

"Thread-97182" daemon prio=10 tid=0x00002aaab8a7f000 nid=0x1c7d runnable 
[0x0000000040508000 
<https://bug.oraclecorp.com/pls/bug/webbug_edit.edit_info_top?rptno=0000000040508000>] 

    java.lang.Thread.State: RUNNABLE
     at java.lang.StringCoding$StringEncoder.encode(StringCoding.java:232)
     at java.lang.StringCoding.encode(StringCoding.java:272)
     at java.lang.String.getBytes(String.java:946)
     at java.io.UnixFileSystem.list(Native Method)
     at java.io.File.list(File.java:973)
     at java.io.File.listFiles(File.java:1051)
     at org.apache.hadoop.fs.FileUtil.fullyDeleteContents(FileUtil.java:96)
     at org.apache.hadoop.fs.FileUtil.fullyDelete(FileUtil.java:84)
     at 
org.apache.hadoop.fs.FileUtil.fullyDeleteContents(FileUtil.java:115)
     at org.apache.hadoop.fs.FileUtil.fullyDelete(FileUtil.java:84)
     at 
org.apache.hadoop.fs.RawLocalFileSystem.delete(RawLocalFileSystem.java:293)
     at 
org.apache.hadoop.fs.ChecksumFileSystem.delete(ChecksumFileSystem.java:466)
     at 
org.apache.hadoop.mapreduce.util.MRAsyncDiskService$DeleteTask.run(MRAsyncDiskService.java:199)
     at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
     at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
     at java.lang.Thread.run(Thread.java:662)

"Thread-97171" daemon prio=10 tid=0x00002aaab8a81000 nid=0x1bde waiting 
for monitor entry [0x000000004030 
<https://bug.oraclecorp.com/pls/bug/webbug_edit.edit_info_top?rptno=000000004030>a000] 

    java.lang.Thread.State: BLOCKED (on object monitor)
     at 
org.apache.hadoop.mapred.TaskTracker.getTaskTrackerReportAddress(TaskTracker.java:1351)
     - waiting to lock<0x00000000 
<https://bug.oraclecorp.com/pls/bug/webbug_edit.edit_info_top?rptno=00000000>c185f690>  
(a org.apache.hadoop.mapred.TaskTracker)
     at org.apache.hadoop.mapred.TaskRunner.getVMArgs(TaskRunner.java:477)
     at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:210)

"Thread-6" daemon prio=10 tid=0x00002aaab443e800 nid=0x2a98 runnable 
[0x0000000043047000 
<https://bug.oraclecorp.com/pls/bug/webbug_edit.edit_info_top?rptno=0000000043047000>] 

    java.lang.Thread.State: RUNNABLE
     at java.lang.String.substring(String.java:1939)
     at java.lang.String.substring(String.java:1904)
     at java.io.File.getName(File.java:401)
     at 
java.io.UnixFileSystem.getBooleanAttributes(UnixFileSystem.java:229)
     at java.io.File.exists(File.java:733)
     at 
org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:420)
     at org.apache.hadoop.fs.FileSystem.isDirectory(FileSystem.java:964)
     at 
org.apache.hadoop.fs.ChecksumFileSystem.rename(ChecksumFileSystem.java:430)
     at 
org.apache.hadoop.mapreduce.util.MRAsyncDiskService.moveAndDeleteRelativePath(MRAsyncDiskService.java:244)
     at 
org.apache.hadoop.mapreduce.util.MRAsyncDiskService.moveAndDeleteAbsolutePath(MRAsyncDiskService.java:361)
     at 
org.apache.hadoop.mapred.UserLogCleaner.deleteLogPath(UserLogCleaner.java:200)
     at 
org.apache.hadoop.mapred.UserLogCleaner.processCompletedJobs(UserLogCleaner.java:103)
     - locked<0x00000000 
<https://bug.oraclecorp.com/pls/bug/webbug_edit.edit_info_top?rptno=00000000>c18b0200>  
(a java.util.Collections$SynchronizedMap)
     at org.apache.hadoop.mapred.UserLogCleaner.run(UserLogCleaner.java:83)

"Directory/File cleanup thread" daemon prio=10 tid=0x00002aaab443c800 
nid=0x2a97 waiting on condition [0x0000000042 
<https://bug.oraclecorp.com/pls/bug/webbug_edit.edit_info_top?rptno=0000000042>f46000] 

    java.lang.Thread.State: WAITING (parking)
     at sun.misc.Unsafe.park(Native Method)
     - parking to wait for<0x00000000 
<https://bug.oraclecorp.com/pls/bug/webbug_edit.edit_info_top?rptno=00000000>c18c9b98>  
(a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
     at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
     at 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987)
     at 
java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:399)
     at 
org.apache.hadoop.mapred.CleanupQueue$PathCleanupThread.run(CleanupQueue.java:130)

"taskCleanup" daemon prio=10 tid=0x00002aaab443c000 nid=0x2a96 waiting 
for monitor entry [0x0000000042 
<https://bug.oraclecorp.com/pls/bug/webbug_edit.edit_info_top?rptno=0000000042>e45000] 

    java.lang.Thread.State: BLOCKED (on object monitor)
     at 
org.apache.hadoop.mapred.TaskTracker.purgeJob(TaskTracker.java:1892)
     - waiting to lock<0x00000000 
<https://bug.oraclecorp.com/pls/bug/webbug_edit.edit_info_top?rptno=00000000>c18afb88>  
(a java.util.TreeMap)
     - locked<0x00000000 
<https://bug.oraclecorp.com/pls/bug/webbug_edit.edit_info_top?rptno=00000000>c185f690>  
(a org.apache.hadoop.mapred.TaskTracker)
     at org.apache.hadoop.mapred.TaskTracker$1.run(TaskTracker.java:398)
     at java.lang.Thread.run(Thread.java:662)

"TaskLauncher for REDUCE tasks" daemon prio=10 tid=0x00002aaab4438800 
<https://bug.oraclecorp.com/pls/bug/webbug_edit.edit_info_top?rptno=4438800> 
nid=0x2a95 in Object.wait() [0x0000000042 
<https://bug.oraclecorp.com/pls/bug/webbug_edit.edit_info_top?rptno=0000000042>c43000] 

    java.lang.Thread.State: WAITING (on object monitor)
     at java.lang.Object.wait(Native Method)
     - waiting on<0x00000000 
<https://bug.oraclecorp.com/pls/bug/webbug_edit.edit_info_top?rptno=00000000>c185f660>  
(a java.util.LinkedList)
     at java.lang.Object.wait(Object.java:485)
     at 
org.apache.hadoop.mapred.TaskTracker$TaskLauncher.run(TaskTracker.java:2157)
     - locked<0x00000000 
<https://bug.oraclecorp.com/pls/bug/webbug_edit.edit_info_top?rptno=00000000>c185f660>  
(a java.util.LinkedList)

"TaskLauncher for MAP tasks" daemon prio=10 tid=0x00002aaab4431800 
<https://bug.oraclecorp.com/pls/bug/webbug_edit.edit_info_top?rptno=4431800> 
nid=0x2a94 waiting on condition [0x0000000042 
<https://bug.oraclecorp.com/pls/bug/webbug_edit.edit_info_top?rptno=0000000042>d43000] 

    java.lang.Thread.State: RUNNABLE
     at java.util.HashMap.newKeyIterator(HashMap.java:840)
     at java.util.HashMap$KeySet.iterator(HashMap.java:874)
     at java.util.HashSet.iterator(HashSet.java:153)
     at 
java.util.AbstractCollection.containsAll(AbstractCollection.java:276)
     at java.util.AbstractSet.equals(AbstractSet.java:78)
     at java.util.Collections$SynchronizedSet.equals(Collections.java:1655)
     - locked<0x00000000 
<https://bug.oraclecorp.com/pls/bug/webbug_edit.edit_info_top?rptno=00000000>ed5f12c0>  
(a java.util.Collections$SynchronizedSet)
     at javax.security.auth.Subject.equals(Subject.java:773)
     at 
org.apache.hadoop.security.UserGroupInformation.equals(UserGroupInformation.java:698)
     at 
org.apache.hadoop.fs.FileSystem$Cache$Key.isEqual(FileSystem.java:1878)
     at 
org.apache.hadoop.fs.FileSystem$Cache$Key.equals(FileSystem.java:1888)
     at java.util.HashMap.put(HashMap.java:376)
     at 
org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:1781)
     - locked<0x00000000 
<https://bug.oraclecorp.com/pls/bug/webbug_edit.edit_info_top?rptno=00000000>c18537b8>  
(a org.apache.hadoop.fs.FileSystem$Cache)
     at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1750)
     at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:234)
     at org.apache.hadoop.fs.Path.getFileSystem(Path.java:189)
     at org.apache.hadoop.mapred.TaskTracker$3.run(TaskTracker.java:1006)
     at org.apache.hadoop.mapred.TaskTracker$3.run(TaskTracker.java:1004)
     at java.security.AccessController.doPrivileged(Native Method)
     at javax.security.auth.Subject.doAs(Subject.java:396)
     at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:742)
     at org.apache.hadoop.mapred.TaskTracker.getFS(TaskTracker.java:1003)
     at 
org.apache.hadoop.mapred.TaskTracker.localizeJobConfFile(TaskTracker.java:1098)
     at 
org.apache.hadoop.mapred.TaskTracker.localizeJobFiles(TaskTracker.java:1048)
     at 
org.apache.hadoop.mapred.TaskTracker.localizeJob(TaskTracker.java:977)
     - locked<0x00000000 
<https://bug.oraclecorp.com/pls/bug/webbug_edit.edit_info_top?rptno=00000000>ed5e8d08>  
(a org.apache.hadoop.mapred.TaskTracker$RunningJob)
     at 
org.apache.hadoop.mapred.TaskTracker.startNewTask(TaskTracker.java:2247)
     at 
org.apache.hadoop.mapred.TaskTracker$TaskLauncher.run(TaskTracker.java:2212)

"Map-events fetcher for all reduce tasks on 
tracker_adc00bzu.us.oracle.com:localhost.localdomain/127.0.0.1:43784" 
daemon prio=10 tid=0x00002aaab4411800 
<https://bug.oraclecorp.com/pls/bug/webbug_edit.edit_info_top?rptno=4411800> 
nid=0x2a8a waiting for monitor entry [0x0000000042 
<https://bug.oraclecorp.com/pls/bug/webbug_edit.edit_info_top?rptno=0000000042>b42000] 

    java.lang.Thread.State: BLOCKED (on object monitor)
     at 
org.apache.hadoop.mapred.TaskTracker$MapEventsFetcherThread.reducesInShuffle(TaskTracker.java:799)
     - waiting to lock<0x00000000 
<https://bug.oraclecorp.com/pls/bug/webbug_edit.edit_info_top?rptno=00000000>ed5e8d08>  
(a org.apache.hadoop.mapred.TaskTracker$RunningJob)
     at 
org.apache.hadoop.mapred.TaskTracker$MapEventsFetcherThread.run(TaskTracker.java:834)
     - locked<0x00000000 
<https://bug.oraclecorp.com/pls/bug/webbug_edit.edit_info_top?rptno=00000000>c18afb88>  
(a java.util.TreeMap)

"Thread-14" prio=10 tid=0x00002aaab440d000 nid=0x2a88 waiting on 
condition [0x0000000042940000 
<https://bug.oraclecorp.com/pls/bug/webbug_edit.edit_info_top?rptno=0000000042940000>] 

    java.lang.Thread.State: TIMED_WAITING (sleeping)
     at java.lang.Thread.sleep(Native Method)
     at 
org.apache.hadoop.mapreduce.filecache.TrackerDistributedCacheManager$CleanupThread.run(TrackerDistributedCacheManager.java:892)

"IPC Server handler 3 on 43784" daemon prio=10 tid=0x00002aaab440b000 
nid=0x2a87 waiting on condition [0x000000004283 
<https://bug.oraclecorp.com/pls/bug/webbug_edit.edit_info_top?rptno=000000004283>f000] 

    java.lang.Thread.State: WAITING (parking)
     at sun.misc.Unsafe.park(Native Method)
     - parking to wait for<0x00000000 
<https://bug.oraclecorp.com/pls/bug/webbug_edit.edit_info_top?rptno=00000000>c1874508 
<https://bug.oraclecorp.com/pls/bug/webbug_edit.edit_info_top?rptno=1874508>>  
(a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
     at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
     at 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987)
     at 
java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:399)
     at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1326)

"IPC Server handler 2 on 43784" daemon prio=10 tid=0x00002aaab4409000 
<https://bug.oraclecorp.com/pls/bug/webbug_edit.edit_info_top?rptno=4409000> 
nid=0x2a86 waiting on condition [0x000000004273 
<https://bug.oraclecorp.com/pls/bug/webbug_edit.edit_info_top?rptno=000000004273>e000] 

    java.lang.Thread.State: WAITING (parking)
     at sun.misc.Unsafe.park(Native Method)
     - parking to wait for<0x00000000 
<https://bug.oraclecorp.com/pls/bug/webbug_edit.edit_info_top?rptno=00000000>c1874508 
<https://bug.oraclecorp.com/pls/bug/webbug_edit.edit_info_top?rptno=1874508>>  
(a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
     at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
     at 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987)
     at 
java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:399)
     at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1326)

"IPC Server handler 1 on 43784" daemon prio=10 tid=0x00002aaab43eb800 
nid=0x2a85 waiting on condition [0x000000004263 
<https://bug.oraclecorp.com/pls/bug/webbug_edit.edit_info_top?rptno=000000004263>d000] 

    java.lang.Thread.State: WAITING (parking)
     at sun.misc.Unsafe.park(Native Method)
     - parking to wait for<0x00000000 
<https://bug.oraclecorp.com/pls/bug/webbug_edit.edit_info_top?rptno=00000000>c1874508 
<https://bug.oraclecorp.com/pls/bug/webbug_edit.edit_info_top?rptno=1874508>>  
(a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
     at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
     at 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987)
     at 
java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:399)
     at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1326)

"IPC Server handler 0 on 43784" daemon prio=10 tid=0x00002aaab43ea800 
nid=0x2a84 waiting on condition [0x000000004253 
<https://bug.oraclecorp.com/pls/bug/webbug_edit.edit_info_top?rptno=000000004253>c000] 

    java.lang.Thread.State: WAITING (parking)
     at sun.misc.Unsafe.park(Native Method)
     - parking to wait for<0x00000000 
<https://bug.oraclecorp.com/pls/bug/webbug_edit.edit_info_top?rptno=00000000>c1874508 
<https://bug.oraclecorp.com/pls/bug/webbug_edit.edit_info_top?rptno=1874508>>  
(a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
     at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
     at 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987)
     at 
java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:399)
     at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1326)

"IPC Server listener on 43784" daemon prio=10 tid=0x00002aaab437c000 
nid=0x2a83 runnable [0x000000004243 
<https://bug.oraclecorp.com/pls/bug/webbug_edit.edit_info_top?rptno=000000004243>b000] 

    java.lang.Thread.State: RUNNABLE
     at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
     at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:210)
     at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
     at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
     - locked<0x00000000 
<https://bug.oraclecorp.com/pls/bug/webbug_edit.edit_info_top?rptno=00000000>c18758d8>  
(a sun.nio.ch.Util$2)
     - locked<0x00000000 
<https://bug.oraclecorp.com/pls/bug/webbug_edit.edit_info_top?rptno=00000000>c18758c8>  
(a java.util.Collections$UnmodifiableSet)
     - locked<0x00000000 
<https://bug.oraclecorp.com/pls/bug/webbug_edit.edit_info_top?rptno=00000000>c18754a8>  
(a sun.nio.ch.EPollSelectorImpl)
     at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
     at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:84)
     at org.apache.hadoop.ipc.Server$Listener.run(Server.java:426)

"IPC Server Responder" daemon prio=10 tid=0x00002aaab42b6000 nid=0x2a82 
runnable [0x000000004233 
<https://bug.oraclecorp.com/pls/bug/webbug_edit.edit_info_top?rptno=000000004233>a000] 

    java.lang.Thread.State: RUNNABLE
     at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
     at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:210)
     at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
     at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
     - locked<0x00000000 
<https://bug.oraclecorp.com/pls/bug/webbug_edit.edit_info_top?rptno=00000000>c1876418 
<https://bug.oraclecorp.com/pls/bug/webbug_edit.edit_info_top?rptno=1876418>>  
(a sun.nio.ch.Util$2)
     - locked<0x00000000 
<https://bug.oraclecorp.com/pls/bug/webbug_edit.edit_info_top?rptno=00000000>c1876408 
<https://bug.oraclecorp.com/pls/bug/webbug_edit.edit_info_top?rptno=1876408>>  
(a java.util.Collections$UnmodifiableSet)
     - locked<0x00000000 
<https://bug.oraclecorp.com/pls/bug/webbug_edit.edit_info_top?rptno=00000000>c18761f0>  
(a sun.nio.ch.EPollSelectorImpl)
     at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
     at org.apache.hadoop.ipc.Server$Responder.run(Server.java:593)

"pool-3-thread-1" prio=10 tid=0x00002aaab4289000 
<https://bug.oraclecorp.com/pls/bug/webbug_edit.edit_info_top?rptno=4289000> 
nid=0x2a81 runnable [0x0000000042239000 
<https://bug.oraclecorp.com/pls/bug/webbug_edit.edit_info_top?rptno=0000000042239000>] 

    java.lang.Thread.State: RUNNABLE
     at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
     at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:210)
     at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
     at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
     - locked<0x00000000 
<https://bug.oraclecorp.com/pls/bug/webbug_edit.edit_info_top?rptno=00000000>c1875078 
<https://bug.oraclecorp.com/pls/bug/webbug_edit.edit_info_top?rptno=1875078>>  
(a sun.nio.ch.Util$2)
     - locked<0x00000000 
<https://bug.oraclecorp.com/pls/bug/webbug_edit.edit_info_top?rptno=00000000>c1875068 
<https://bug.oraclecorp.com/pls/bug/webbug_edit.edit_info_top?rptno=1875068>>  
(a java.util.Collections$UnmodifiableSet)
     - locked<0x00000000 
<https://bug.oraclecorp.com/pls/bug/webbug_edit.edit_info_top?rptno=00000000>c1874e40>  
(a sun.nio.ch.EPollSelectorImpl)
     at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
     at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:84)
     at org.apache.hadoop.ipc.Server$Listener$Reader.run(Server.java:321)
     - locked<0x00000000 
<https://bug.oraclecorp.com/pls/bug/webbug_edit.edit_info_top?rptno=00000000>c1875ae8>  
(a org.apache.hadoop.ipc.Server$Listener$Reader)
     at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
     at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
     at java.lang.Thread.run(Thread.java:662)

"pool-2-thread-1" prio=10 tid=0x00002aaab4382000 
<https://bug.oraclecorp.com/pls/bug/webbug_edit.edit_info_top?rptno=4382000> 
nid=0x2a80 runnable [0x0000000042138000 
<https://bug.oraclecorp.com/pls/bug/webbug_edit.edit_info_top?rptno=0000000042138000>] 

    java.lang.Thread.State: RUNNABLE
     at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:39)
     at java.nio.ByteBuffer.allocate(ByteBuffer.java:312)
     at java.nio.charset.CharsetEncoder.encode(CharsetEncoder.java:760)
     at org.apache.hadoop.io.Text.encode(Text.java:396)
     at org.apache.hadoop.io.Text.set(Text.java:186)
     at org.apache.hadoop.io.Text.<init>(Text.java:89)
     at 
org.apache.hadoop.mapred.TIETaskTrackerInst$NodeInfoCollector.run(TIETaskTrackerInst.java:88)
     at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
     at 
java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
     at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
     at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
     at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180)
     at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204)
     at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
     at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
     at java.lang.Thread.run(Thread.java:662)

"pool-1-thread-1" prio=10 tid=0x00002aaab4281800 
<https://bug.oraclecorp.com/pls/bug/webbug_edit.edit_info_top?rptno=4281800> 
nid=0x2a7f waiting on condition [0x0000000040 
<https://bug.oraclecorp.com/pls/bug/webbug_edit.edit_info_top?rptno=0000000040>fd7000] 

    java.lang.Thread.State: TIMED_WAITING (parking)
     at sun.misc.Unsafe.park(Native Method)
     - parking to wait for<0x00000000 
<https://bug.oraclecorp.com/pls/bug/webbug_edit.edit_info_top?rptno=00000000>c1867008 
<https://bug.oraclecorp.com/pls/bug/webbug_edit.edit_info_top?rptno=1867008>>  
(a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
     at 
java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:198)
     at 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2025)
     at java.util.concurrent.DelayQueue.take(DelayQueue.java:164)
     at 
java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:609)
     at 
java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:602)
     at 
java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:947)
     at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
     at java.lang.Thread.run(Thread.java:662)

"Timer-0" daemon prio=10 tid=0x00002aaab42a0000 nid=0x2a7d in 
Object.wait() [0x0000000040 
<https://bug.oraclecorp.com/pls/bug/webbug_edit.edit_info_top?rptno=0000000040>b6c000] 

    java.lang.Thread.State: TIMED_WAITING (on object monitor)
     at java.lang.Object.wait(Native Method)
     - waiting on<0x00000000 
<https://bug.oraclecorp.com/pls/bug/webbug_edit.edit_info_top?rptno=00000000>c187e998>  
(a java.util.TaskQueue)
     at java.util.TimerThread.mainLoop(Timer.java:509)
     - locked<0x00000000 
<https://bug.oraclecorp.com/pls/bug/webbug_edit.edit_info_top?rptno=00000000>c187e998>  
(a java.util.TaskQueue)
     at java.util.TimerThread.run(Timer.java:462)

"738807903@qtp0-0 - Acceptor0 
SelectChannelConnector@0.0.0.0:50060"prio=10 tid=0x00002aaab4293800 
<https://bug.oraclecorp.com/pls/bug/webbug_edit.edit_info_top?rptno=4293800> 
nid=0x2a7c runnable [0x0000000042037000 
<https://bug.oraclecorp.com/pls/bug/webbug_edit.edit_info_top?rptno=0000000042037000>] 

    java.lang.Thread.State: RUNNABLE
     at java.util.HashMap.newKeyIterator(HashMap.java:840)
     at java.util.HashMap$KeySet.iterator(HashMap.java:874)
     at java.util.HashSet.iterator(HashSet.java:153)
     at 
sun.nio.ch.SelectorImpl.processDeregisterQueue(SelectorImpl.java:127)
     - locked<0x00000000 
<https://bug.oraclecorp.com/pls/bug/webbug_edit.edit_info_top?rptno=00000000>c18ae5c8>  
(a java.util.HashSet)
     at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:69)
     at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
     - locked<0x00000000 
<https://bug.oraclecorp.com/pls/bug/webbug_edit.edit_info_top?rptno=00000000>c18aea00>  
(a sun.nio.ch.Util$2)
     - locked<0x00000000 
<https://bug.oraclecorp.com/pls/bug/webbug_edit.edit_info_top?rptno=00000000>c18ae9f0>  
(a java.util.Collections$UnmodifiableSet)
     - locked<0x00000000 
<https://bug.oraclecorp.com/pls/bug/webbug_edit.edit_info_top?rptno=00000000>c18ae570>  
(a sun.nio.ch.EPollSelectorImpl)
     at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
     at 
org.mortbay.io.nio.SelectorManager$SelectSet.doSelect(SelectorManager.java:429)
     at 
org.mortbay.io.nio.SelectorManager.doSelect(SelectorManager.java:185)
     at 
org.mortbay.jetty.nio.SelectChannelConnector.accept(SelectChannelConnector.java:124)
     at 
org.mortbay.jetty.AbstractConnector$Acceptor.run(AbstractConnector.java:707)
     at 
org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:522)

"Low Memory Detector" daemon prio=10 tid=0x000000005 
<https://bug.oraclecorp.com/pls/bug/webbug_edit.edit_info_top?rptno=000000005>d0d8800 
nid=0x2a79 runnable [0x0000000000000000 
<https://bug.oraclecorp.com/pls/bug/webbug_edit.edit_info_top?rptno=0000000000000000>] 

    java.lang.Thread.State: RUNNABLE

"CompilerThread1" daemon prio=10 tid=0x000000005 
<https://bug.oraclecorp.com/pls/bug/webbug_edit.edit_info_top?rptno=000000005>d0d6800 
nid=0x2a78 waiting on condition [0x0000000000000000 
<https://bug.oraclecorp.com/pls/bug/webbug_edit.edit_info_top?rptno=0000000000000000>] 

    java.lang.Thread.State: RUNNABLE

"CompilerThread0" daemon prio=10 tid=0x000000005 
<https://bug.oraclecorp.com/pls/bug/webbug_edit.edit_info_top?rptno=000000005>d0d0800 
nid=0x2a77 waiting on condition [0x0000000000000000 
<https://bug.oraclecorp.com/pls/bug/webbug_edit.edit_info_top?rptno=0000000000000000>] 

    java.lang.Thread.State: RUNNABLE

"Signal Dispatcher" daemon prio=10 tid=0x000000005 
<https://bug.oraclecorp.com/pls/bug/webbug_edit.edit_info_top?rptno=000000005>d0ce800 
nid=0x2a76 runnable [0x0000000000000000 
<https://bug.oraclecorp.com/pls/bug/webbug_edit.edit_info_top?rptno=0000000000000000>] 

    java.lang.Thread.State: RUNNABLE

"Finalizer" daemon prio=10 tid=0x000000005 
<https://bug.oraclecorp.com/pls/bug/webbug_edit.edit_info_top?rptno=000000005>d0aa800 
nid=0x2a75 in Object.wait() [0x0000000041 
<https://bug.oraclecorp.com/pls/bug/webbug_edit.edit_info_top?rptno=0000000041>a5b000] 

    java.lang.Thread.State: WAITING (on object monitor)
     at java.lang.Object.wait(Native Method)
     - waiting on<0x00000000 
<https://bug.oraclecorp.com/pls/bug/webbug_edit.edit_info_top?rptno=00000000>c18ca210>  
(a java.lang.ref.ReferenceQueue$Lock)
     at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:118)
     - locked<0x00000000 
<https://bug.oraclecorp.com/pls/bug/webbug_edit.edit_info_top?rptno=00000000>c18ca210>  
(a java.lang.ref.ReferenceQueue$Lock)
     at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:134)
     at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:159)

"Reference Handler" daemon prio=10 tid=0x000000005 
<https://bug.oraclecorp.com/pls/bug/webbug_edit.edit_info_top?rptno=000000005>d0a8800 
nid=0x2a74 in Object.wait() [0x000000004195 
<https://bug.oraclecorp.com/pls/bug/webbug_edit.edit_info_top?rptno=000000004195>a000] 

    java.lang.Thread.State: WAITING (on object monitor)
     at java.lang.Object.wait(Native Method)
     - waiting on<0x00000000 
<https://bug.oraclecorp.com/pls/bug/webbug_edit.edit_info_top?rptno=00000000>c18000b0>  
(a java.lang.ref.Reference$Lock)
     at java.lang.Object.wait(Object.java:485)
     at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:116)
     - locked<0x00000000 
<https://bug.oraclecorp.com/pls/bug/webbug_edit.edit_info_top?rptno=00000000>c18000b0>  
(a java.lang.ref.Reference$Lock)

"main" prio=10 tid=0x000000005 
<https://bug.oraclecorp.com/pls/bug/webbug_edit.edit_info_top?rptno=000000005>d04a800 
nid=0x2a70 waiting for monitor entry [0x0000000040666000 
<https://bug.oraclecorp.com/pls/bug/webbug_edit.edit_info_top?rptno=0000000040666000>] 

    java.lang.Thread.State: BLOCKED (on object monitor)
     at 
org.apache.hadoop.mapred.TaskTracker.transmitHeartBeat(TaskTracker.java:1533)
     - waiting to lock<0x00000000 
<https://bug.oraclecorp.com/pls/bug/webbug_edit.edit_info_top?rptno=00000000>c185f690>  
(a org.apache.hadoop.mapred.TaskTracker)
     at 
org.apache.hadoop.mapred.TaskTracker.offerService(TaskTracker.java:1432)
     at org.apache.hadoop.mapred.TaskTracker.run(TaskTracker.java:2329)
     at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:3461)

"VM Thread" prio=10 tid=0x000000005 
<https://bug.oraclecorp.com/pls/bug/webbug_edit.edit_info_top?rptno=000000005>d0a4000 
nid=0x2a73 runnable

"GC task thread#0 (ParallelGC)" prio=10 tid=0x000000005 
<https://bug.oraclecorp.com/pls/bug/webbug_edit.edit_info_top?rptno=000000005>d05d800 
nid=0x2a71 runnable

"GC task thread#1 (ParallelGC)" prio=10 tid=0x000000005 
<https://bug.oraclecorp.com/pls/bug/webbug_edit.edit_info_top?rptno=000000005>d05f800 
nid=0x2a72 runnable

"VM Periodic Task Thread" prio=10 tid=0x000000005 
<https://bug.oraclecorp.com/pls/bug/webbug_edit.edit_info_top?rptno=000000005>d0e3000 
nid=0x2a7a waiting on condition

JNI global references: 1519
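
Reading the monitor lines in the dump, the blocked threads appear to form a chain rather than a classic deadlock. "main", which sends the heartbeat, waits for the TaskTracker monitor held by "taskCleanup". That thread waits for a TreeMap held by the map-events fetcher, which in turn waits for the RunningJob held by "TaskLauncher for MAP tasks". The launcher itself is RUNNABLE inside FileSystem$Cache.getInternal. A minimal sketch of how such chains could be pulled out of a HotSpot dump is shown below. The class name LockChainSketch and its methods are hypothetical, and the code assumes that each thread header and each monitor line sits on a single line, as in an unwrapped dump.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: follows "waiting to lock" -> "locked" edges in a thread dump. */
final class LockChainSketch {
    private static final Pattern HEADER = Pattern.compile("^\"([^\"]*)\"");
    private static final Pattern WAITING_TO_LOCK = Pattern.compile("- waiting to lock\\s*<(0x[0-9a-fA-F]+)>");
    private static final Pattern WAITING_ON = Pattern.compile("- waiting on\\s*<(0x[0-9a-fA-F]+)>");
    private static final Pattern LOCKED = Pattern.compile("- locked\\s*<(0x[0-9a-fA-F]+)>");

    /** Returns one "a -> b -> c" line per blocked thread, in dump order. */
    static List<String> chains(String dump) {
        Map<String, String> holderOf = new HashMap<>();       // monitor address -> owning thread
        Map<String, String> waitsFor = new LinkedHashMap<>(); // blocked thread -> monitor address
        Set<String> released = new HashSet<>();               // monitors given up via Object.wait()
        String current = null;
        for (String line : dump.split("\\r?\\n")) {
            Matcher h = HEADER.matcher(line);
            if (h.find()) {
                current = h.group(1);
                released.clear();
                continue;
            }
            if (current == null) {
                continue;
            }
            Matcher w = WAITING_TO_LOCK.matcher(line);
            if (w.find()) {
                waitsFor.put(current, w.group(1));
                continue;
            }
            Matcher on = WAITING_ON.matcher(line);
            if (on.find()) {
                released.add(on.group(1));
                continue;
            }
            // A thread in Object.wait() still lists the monitor as locked; skip those.
            Matcher l = LOCKED.matcher(line);
            if (l.find() && !released.contains(l.group(1))) {
                holderOf.put(l.group(1), current);
            }
        }
        List<String> result = new ArrayList<>();
        for (String start : waitsFor.keySet()) {
            StringBuilder chain = new StringBuilder(start);
            Set<String> seen = new HashSet<>();
            seen.add(start);
            String thread = start;
            while (waitsFor.containsKey(thread)) {
                String owner = holderOf.get(waitsFor.get(thread));
                if (owner == null) {
                    chain.append(" -> ?");          // owner not visible in the dump
                    break;
                }
                chain.append(" -> ").append(owner);
                if (!seen.add(owner)) {
                    chain.append(" (cycle)");       // a real deadlock
                    break;
                }
                thread = owner;
            }
            result.add(chain.toString());
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        String dump = new String(Files.readAllBytes(Paths.get(args[0]))); // assumed: path to a saved dump
        for (String chain : chains(dump)) {
            System.out.println(chain);
        }
    }
}
```

Each chain ends at the thread that actually holds up the others, which is the one worth inspecting first.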



Re: after 2 weeks TaskTracker gets hung with 100% CPU consumption

Posted by Vladimir Egorov <vl...@oracle.com>.
0.21.0

Sorry for missing this.

Vladimir

On 4/22/2012 1:07 AM, Arun C Murthy wrote:
> What version of Hadoop are you running?
>
> On Apr 21, 2012, at 12:20 AM, Vladimir Egorov wrote:
>
>> Hi,
>>
>> After around 2 weeks a TestTracker (TT) in our MR cluster gets hung 
>> with 100% CPU consumption. Most of the times no new tasks are sent to 
>> the node. We start getting more job failure in the cluster when this 
>> happens. Once we restart the TT the node is fine for around another 
>> two weeks.
>>
>> We also noticed that after restart some other TT in the cluster 
>> starts having the same behavior. This continues till all the TTs have 
>> been restarted. Another solution is to restart the MR cluster.
>>
>> A thread dump is posted below. It looks like TT is busy with some log 
>> cleanup. We also noticed that when we restart, sometimes TT fails to 
>> start because tobedeleted directory cannot be deleted. We have to 
>> delete it manually, and then TT starts normally.
>>
>> Has anyone seen this and is there a resolution or workaround.
>>
>> Thank you,
>> Vladimir
>>
>> Full thread dump Java HotSpot(TM) 64-Bit Server VM (19.0-b09 mixed 
>> mode):
>>
>> "Thread-97182" daemon prio=10 tid=0x00002aaab8a7f000 nid=0x1c7d 
>> runnable [0x0000000040508000 
>> <https://bug.oraclecorp.com/pls/bug/webbug_edit.edit_info_top?rptno=0000000040508000>] 
>>
>>    java.lang.Thread.State: RUNNABLE
>>     at 
>> java.lang.StringCoding$StringEncoder.encode(StringCoding.java:232)
>>     at java.lang.StringCoding.encode(StringCoding.java:272)
>>     at java.lang.String.getBytes(String.java:946)
>>     at java.io.UnixFileSystem.list(Native Method)
>>     at java.io.File.list(File.java:973)
>>     at java.io.File.listFiles(File.java:1051)
>>     at 
>> org.apache.hadoop.fs.FileUtil.fullyDeleteContents(FileUtil.java:96)
>>     at org.apache.hadoop.fs.FileUtil.fullyDelete(FileUtil.java:84)
>>     at 
>> org.apache.hadoop.fs.FileUtil.fullyDeleteContents(FileUtil.java:115)
>>     at org.apache.hadoop.fs.FileUtil.fullyDelete(FileUtil.java:84)
>>     at 
>> org.apache.hadoop.fs.RawLocalFileSystem.delete(RawLocalFileSystem.java:293)
>>     at 
>> org.apache.hadoop.fs.ChecksumFileSystem.delete(ChecksumFileSystem.java:466)
>>     at 
>> org.apache.hadoop.mapreduce.util.MRAsyncDiskService$DeleteTask.run(MRAsyncDiskService.java:199)
>>     at 
>> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>>     at 
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>>     at java.lang.Thread.run(Thread.java:662)
>>
>> "Thread-97171" daemon prio=10 tid=0x00002aaab8a81000 nid=0x1bde waiting for monitor entry [0x000000004030a000]
>>    java.lang.Thread.State: BLOCKED (on object monitor)
>>     at org.apache.hadoop.mapred.TaskTracker.getTaskTrackerReportAddress(TaskTracker.java:1351)
>>     - waiting to lock<0x00000000c185f690>  (a org.apache.hadoop.mapred.TaskTracker)
>>     at org.apache.hadoop.mapred.TaskRunner.getVMArgs(TaskRunner.java:477)
>>     at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:210)
>>
>> "Thread-6" daemon prio=10 tid=0x00002aaab443e800 nid=0x2a98 runnable [0x0000000043047000]
>>    java.lang.Thread.State: RUNNABLE
>>     at java.lang.String.substring(String.java:1939)
>>     at java.lang.String.substring(String.java:1904)
>>     at java.io.File.getName(File.java:401)
>>     at java.io.UnixFileSystem.getBooleanAttributes(UnixFileSystem.java:229)
>>     at java.io.File.exists(File.java:733)
>>     at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:420)
>>     at org.apache.hadoop.fs.FileSystem.isDirectory(FileSystem.java:964)
>>     at org.apache.hadoop.fs.ChecksumFileSystem.rename(ChecksumFileSystem.java:430)
>>     at org.apache.hadoop.mapreduce.util.MRAsyncDiskService.moveAndDeleteRelativePath(MRAsyncDiskService.java:244)
>>     at org.apache.hadoop.mapreduce.util.MRAsyncDiskService.moveAndDeleteAbsolutePath(MRAsyncDiskService.java:361)
>>     at org.apache.hadoop.mapred.UserLogCleaner.deleteLogPath(UserLogCleaner.java:200)
>>     at org.apache.hadoop.mapred.UserLogCleaner.processCompletedJobs(UserLogCleaner.java:103)
>>     - locked<0x00000000c18b0200>  (a java.util.Collections$SynchronizedMap)
>>     at org.apache.hadoop.mapred.UserLogCleaner.run(UserLogCleaner.java:83)
>>
>> "Directory/File cleanup thread" daemon prio=10 tid=0x00002aaab443c800 nid=0x2a97 waiting on condition [0x0000000042f46000]
>>    java.lang.Thread.State: WAITING (parking)
>>     at sun.misc.Unsafe.park(Native Method)
>>     - parking to wait for<0x00000000c18c9b98>  (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>>     at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
>>     at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987)
>>     at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:399)
>>     at org.apache.hadoop.mapred.CleanupQueue$PathCleanupThread.run(CleanupQueue.java:130)
>>
>> "taskCleanup" daemon prio=10 tid=0x00002aaab443c000 nid=0x2a96 waiting for monitor entry [0x0000000042e45000]
>>    java.lang.Thread.State: BLOCKED (on object monitor)
>>     at org.apache.hadoop.mapred.TaskTracker.purgeJob(TaskTracker.java:1892)
>>     - waiting to lock<0x00000000c18afb88>  (a java.util.TreeMap)
>>     - locked<0x00000000c185f690>  (a org.apache.hadoop.mapred.TaskTracker)
>>     at org.apache.hadoop.mapred.TaskTracker$1.run(TaskTracker.java:398)
>>     at java.lang.Thread.run(Thread.java:662)
>>
>> "TaskLauncher for REDUCE tasks" daemon prio=10 tid=0x00002aaab4438800 nid=0x2a95 in Object.wait() [0x0000000042c43000]
>>    java.lang.Thread.State: WAITING (on object monitor)
>>     at java.lang.Object.wait(Native Method)
>>     - waiting on<0x00000000c185f660>  (a java.util.LinkedList)
>>     at java.lang.Object.wait(Object.java:485)
>>     at org.apache.hadoop.mapred.TaskTracker$TaskLauncher.run(TaskTracker.java:2157)
>>     - locked<0x00000000c185f660>  (a java.util.LinkedList)
>>
>> "TaskLauncher for MAP tasks" daemon prio=10 tid=0x00002aaab4431800 nid=0x2a94 waiting on condition [0x0000000042d43000]
>>    java.lang.Thread.State: RUNNABLE
>>     at java.util.HashMap.newKeyIterator(HashMap.java:840)
>>     at java.util.HashMap$KeySet.iterator(HashMap.java:874)
>>     at java.util.HashSet.iterator(HashSet.java:153)
>>     at java.util.AbstractCollection.containsAll(AbstractCollection.java:276)
>>     at java.util.AbstractSet.equals(AbstractSet.java:78)
>>     at java.util.Collections$SynchronizedSet.equals(Collections.java:1655)
>>     - locked<0x00000000ed5f12c0>  (a java.util.Collections$SynchronizedSet)
>>     at javax.security.auth.Subject.equals(Subject.java:773)
>>     at org.apache.hadoop.security.UserGroupInformation.equals(UserGroupInformation.java:698)
>>     at org.apache.hadoop.fs.FileSystem$Cache$Key.isEqual(FileSystem.java:1878)
>>     at org.apache.hadoop.fs.FileSystem$Cache$Key.equals(FileSystem.java:1888)
>>     at java.util.HashMap.put(HashMap.java:376)
>>     at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:1781)
>>     - locked<0x00000000c18537b8>  (a org.apache.hadoop.fs.FileSystem$Cache)
>>     at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1750)
>>     at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:234)
>>     at org.apache.hadoop.fs.Path.getFileSystem(Path.java:189)
>>     at org.apache.hadoop.mapred.TaskTracker$3.run(TaskTracker.java:1006)
>>     at org.apache.hadoop.mapred.TaskTracker$3.run(TaskTracker.java:1004)
>>     at java.security.AccessController.doPrivileged(Native Method)
>>     at javax.security.auth.Subject.doAs(Subject.java:396)
>>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:742)
>>     at org.apache.hadoop.mapred.TaskTracker.getFS(TaskTracker.java:1003)
>>     at org.apache.hadoop.mapred.TaskTracker.localizeJobConfFile(TaskTracker.java:1098)
>>     at org.apache.hadoop.mapred.TaskTracker.localizeJobFiles(TaskTracker.java:1048)
>>     at org.apache.hadoop.mapred.TaskTracker.localizeJob(TaskTracker.java:977)
>>     - locked<0x00000000ed5e8d08>  (a org.apache.hadoop.mapred.TaskTracker$RunningJob)
>>     at org.apache.hadoop.mapred.TaskTracker.startNewTask(TaskTracker.java:2247)
>>     at org.apache.hadoop.mapred.TaskTracker$TaskLauncher.run(TaskTracker.java:2212)
>>
>> "Map-events fetcher for all reduce tasks on tracker_adc00bzu.us.oracle.com:localhost.localdomain/127.0.0.1:43784" daemon prio=10 tid=0x00002aaab4411800 nid=0x2a8a waiting for monitor entry [0x0000000042b42000]
>>    java.lang.Thread.State: BLOCKED (on object monitor)
>>     at org.apache.hadoop.mapred.TaskTracker$MapEventsFetcherThread.reducesInShuffle(TaskTracker.java:799)
>>     - waiting to lock<0x00000000ed5e8d08>  (a org.apache.hadoop.mapred.TaskTracker$RunningJob)
>>     at org.apache.hadoop.mapred.TaskTracker$MapEventsFetcherThread.run(TaskTracker.java:834)
>>     - locked<0x00000000c18afb88>  (a java.util.TreeMap)
>>
>> "Thread-14" prio=10 tid=0x00002aaab440d000 nid=0x2a88 waiting on condition [0x0000000042940000]
>>    java.lang.Thread.State: TIMED_WAITING (sleeping)
>>     at java.lang.Thread.sleep(Native Method)
>>     at org.apache.hadoop.mapreduce.filecache.TrackerDistributedCacheManager$CleanupThread.run(TrackerDistributedCacheManager.java:892)
>>
>> "IPC Server handler 3 on 43784" daemon prio=10 tid=0x00002aaab440b000 nid=0x2a87 waiting on condition [0x000000004283f000]
>>    java.lang.Thread.State: WAITING (parking)
>>     at sun.misc.Unsafe.park(Native Method)
>>     - parking to wait for<0x00000000c1874508>  (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>>     at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
>>     at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987)
>>     at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:399)
>>     at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1326)
>>
>> "IPC Server handler 2 on 43784" daemon prio=10 tid=0x00002aaab4409000 nid=0x2a86 waiting on condition [0x000000004273e000]
>>    java.lang.Thread.State: WAITING (parking)
>>     at sun.misc.Unsafe.park(Native Method)
>>     - parking to wait for<0x00000000c1874508>  (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>>     at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
>>     at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987)
>>     at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:399)
>>     at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1326)
>>
>> "IPC Server handler 1 on 43784" daemon prio=10 tid=0x00002aaab43eb800 nid=0x2a85 waiting on condition [0x000000004263d000]
>>    java.lang.Thread.State: WAITING (parking)
>>     at sun.misc.Unsafe.park(Native Method)
>>     - parking to wait for<0x00000000c1874508>  (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>>     at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
>>     at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987)
>>     at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:399)
>>     at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1326)
>>
>> "IPC Server handler 0 on 43784" daemon prio=10 tid=0x00002aaab43ea800 nid=0x2a84 waiting on condition [0x000000004253c000]
>>    java.lang.Thread.State: WAITING (parking)
>>     at sun.misc.Unsafe.park(Native Method)
>>     - parking to wait for<0x00000000c1874508>  (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>>     at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
>>     at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987)
>>     at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:399)
>>     at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1326)
>>
>> "IPC Server listener on 43784" daemon prio=10 tid=0x00002aaab437c000 nid=0x2a83 runnable [0x000000004243b000]
>>    java.lang.Thread.State: RUNNABLE
>>     at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
>>     at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:210)
>>     at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
>>     at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
>>     - locked<0x00000000c18758d8>  (a sun.nio.ch.Util$2)
>>     - locked<0x00000000c18758c8>  (a java.util.Collections$UnmodifiableSet)
>>     - locked<0x00000000c18754a8>  (a sun.nio.ch.EPollSelectorImpl)
>>     at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
>>     at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:84)
>>     at org.apache.hadoop.ipc.Server$Listener.run(Server.java:426)
>>
>> "IPC Server Responder" daemon prio=10 tid=0x00002aaab42b6000 nid=0x2a82 runnable [0x000000004233a000]
>>    java.lang.Thread.State: RUNNABLE
>>     at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
>>     at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:210)
>>     at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
>>     at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
>>     - locked<0x00000000c1876418>  (a sun.nio.ch.Util$2)
>>     - locked<0x00000000c1876408>  (a java.util.Collections$UnmodifiableSet)
>>     - locked<0x00000000c18761f0>  (a sun.nio.ch.EPollSelectorImpl)
>>     at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
>>     at org.apache.hadoop.ipc.Server$Responder.run(Server.java:593)
>>
>> "pool-3-thread-1" prio=10 tid=0x00002aaab4289000 nid=0x2a81 runnable [0x0000000042239000]
>>    java.lang.Thread.State: RUNNABLE
>>     at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
>>     at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:210)
>>     at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
>>     at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
>>     - locked<0x00000000c1875078>  (a sun.nio.ch.Util$2)
>>     - locked<0x00000000c1875068>  (a java.util.Collections$UnmodifiableSet)
>>     - locked<0x00000000c1874e40>  (a sun.nio.ch.EPollSelectorImpl)
>>     at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
>>     at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:84)
>>     at org.apache.hadoop.ipc.Server$Listener$Reader.run(Server.java:321)
>>     - locked<0x00000000c1875ae8>  (a org.apache.hadoop.ipc.Server$Listener$Reader)
>>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>>     at java.lang.Thread.run(Thread.java:662)
>>
>> "pool-2-thread-1" prio=10 tid=0x00002aaab4382000 nid=0x2a80 runnable [0x0000000042138000]
>>    java.lang.Thread.State: RUNNABLE
>>     at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:39)
>>     at java.nio.ByteBuffer.allocate(ByteBuffer.java:312)
>>     at java.nio.charset.CharsetEncoder.encode(CharsetEncoder.java:760)
>>     at org.apache.hadoop.io.Text.encode(Text.java:396)
>>     at org.apache.hadoop.io.Text.set(Text.java:186)
>>     at org.apache.hadoop.io.Text.<init>(Text.java:89)
>>     at org.apache.hadoop.mapred.TIETaskTrackerInst$NodeInfoCollector.run(TIETaskTrackerInst.java:88)
>>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
>>     at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
>>     at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
>>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
>>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180)
>>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204)
>>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>>     at java.lang.Thread.run(Thread.java:662)
>>
>> "pool-1-thread-1" prio=10 tid=0x00002aaab4281800 nid=0x2a7f waiting on condition [0x0000000040fd7000]
>>    java.lang.Thread.State: TIMED_WAITING (parking)
>>     at sun.misc.Unsafe.park(Native Method)
>>     - parking to wait for<0x00000000c1867008>  (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>>     at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:198)
>>     at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2025)
>>     at java.util.concurrent.DelayQueue.take(DelayQueue.java:164)
>>     at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:609)
>>     at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:602)
>>     at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:947)
>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
>>     at java.lang.Thread.run(Thread.java:662)
>>
>> "Timer-0" daemon prio=10 tid=0x00002aaab42a0000 nid=0x2a7d in Object.wait() [0x0000000040b6c000]
>>    java.lang.Thread.State: TIMED_WAITING (on object monitor)
>>     at java.lang.Object.wait(Native Method)
>>     - waiting on<0x00000000c187e998>  (a java.util.TaskQueue)
>>     at java.util.TimerThread.mainLoop(Timer.java:509)
>>     - locked<0x00000000c187e998>  (a java.util.TaskQueue)
>>     at java.util.TimerThread.run(Timer.java:462)
>>
>> "738807903@qtp0-0 - Acceptor0 SelectChannelConnector@0.0.0.0:50060" prio=10 tid=0x00002aaab4293800 nid=0x2a7c runnable [0x0000000042037000]
>>    java.lang.Thread.State: RUNNABLE
>>     at java.util.HashMap.newKeyIterator(HashMap.java:840)
>>     at java.util.HashMap$KeySet.iterator(HashMap.java:874)
>>     at java.util.HashSet.iterator(HashSet.java:153)
>>     at sun.nio.ch.SelectorImpl.processDeregisterQueue(SelectorImpl.java:127)
>>     - locked<0x00000000c18ae5c8>  (a java.util.HashSet)
>>     at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:69)
>>     at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
>>     - locked<0x00000000c18aea00>  (a sun.nio.ch.Util$2)
>>     - locked<0x00000000c18ae9f0>  (a java.util.Collections$UnmodifiableSet)
>>     - locked<0x00000000c18ae570>  (a sun.nio.ch.EPollSelectorImpl)
>>     at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
>>     at org.mortbay.io.nio.SelectorManager$SelectSet.doSelect(SelectorManager.java:429)
>>     at org.mortbay.io.nio.SelectorManager.doSelect(SelectorManager.java:185)
>>     at org.mortbay.jetty.nio.SelectChannelConnector.accept(SelectChannelConnector.java:124)
>>     at org.mortbay.jetty.AbstractConnector$Acceptor.run(AbstractConnector.java:707)
>>     at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:522)
>>
>> "Low Memory Detector" daemon prio=10 tid=0x000000005d0d8800 nid=0x2a79 runnable [0x0000000000000000]
>>    java.lang.Thread.State: RUNNABLE
>>
>> "CompilerThread1" daemon prio=10 tid=0x000000005d0d6800 nid=0x2a78 waiting on condition [0x0000000000000000]
>>    java.lang.Thread.State: RUNNABLE
>>
>> "CompilerThread0" daemon prio=10 tid=0x000000005d0d0800 nid=0x2a77 waiting on condition [0x0000000000000000]
>>    java.lang.Thread.State: RUNNABLE
>>
>> "Signal Dispatcher" daemon prio=10 tid=0x000000005d0ce800 nid=0x2a76 runnable [0x0000000000000000]
>>    java.lang.Thread.State: RUNNABLE
>>
>> "Finalizer" daemon prio=10 tid=0x000000005d0aa800 nid=0x2a75 in Object.wait() [0x0000000041a5b000]
>>    java.lang.Thread.State: WAITING (on object monitor)
>>     at java.lang.Object.wait(Native Method)
>>     - waiting on<0x00000000c18ca210>  (a java.lang.ref.ReferenceQueue$Lock)
>>     at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:118)
>>     - locked<0x00000000c18ca210>  (a java.lang.ref.ReferenceQueue$Lock)
>>     at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:134)
>>     at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:159)
>>
>> "Reference Handler" daemon prio=10 tid=0x000000005d0a8800 nid=0x2a74 in Object.wait() [0x000000004195a000]
>>    java.lang.Thread.State: WAITING (on object monitor)
>>     at java.lang.Object.wait(Native Method)
>>     - waiting on<0x00000000c18000b0>  (a java.lang.ref.Reference$Lock)
>>     at java.lang.Object.wait(Object.java:485)
>>     at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:116)
>>     - locked<0x00000000c18000b0>  (a java.lang.ref.Reference$Lock)
>>
>> "main" prio=10 tid=0x000000005d04a800 nid=0x2a70 waiting for monitor entry [0x0000000040666000]
>>    java.lang.Thread.State: BLOCKED (on object monitor)
>>     at org.apache.hadoop.mapred.TaskTracker.transmitHeartBeat(TaskTracker.java:1533)
>>     - waiting to lock<0x00000000c185f690>  (a org.apache.hadoop.mapred.TaskTracker)
>>     at org.apache.hadoop.mapred.TaskTracker.offerService(TaskTracker.java:1432)
>>     at org.apache.hadoop.mapred.TaskTracker.run(TaskTracker.java:2329)
>>     at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:3461)
>>
>> "VM Thread" prio=10 tid=0x000000005d0a4000 nid=0x2a73 runnable
>>
>> "GC task thread#0 (ParallelGC)" prio=10 tid=0x000000005d05d800 nid=0x2a71 runnable
>>
>> "GC task thread#1 (ParallelGC)" prio=10 tid=0x000000005d05f800 nid=0x2a72 runnable
>>
>> "VM Periodic Task Thread" prio=10 tid=0x000000005d0e3000 nid=0x2a7a waiting on condition
>>
>> JNI global references: 1519
>>
>>
>
> --
> Arun C. Murthy
> Hortonworks Inc.
> http://hortonworks.com/
>
>

Re: after 2 weeks TaskTracker gets hung with 100% CPU consumption

Posted by Arun C Murthy <ac...@hortonworks.com>.
What version of Hadoop are you running?

On Apr 21, 2012, at 12:20 AM, Vladimir Egorov wrote:

> Hi, 
> 
> After around 2 weeks a TaskTracker (TT) in our MR cluster hangs with 100% CPU consumption. Most of the time no new tasks are sent to the node. We start getting more job failures in the cluster when this happens. Once we restart the TT, the node is fine for around another two weeks.
> 
> We also noticed that after restart some other TT in the cluster starts having the same behavior. This continues till all the TTs have been restarted. Another solution is to restart the MR cluster. 
> 
> A thread dump is posted below. It looks like the TT is busy with some log cleanup. We also noticed that when we restart, the TT sometimes fails to start because the tobedeleted directory cannot be deleted. We have to delete it manually, and then the TT starts normally.
> 
> Has anyone seen this, and is there a resolution or workaround?
> 
> Thank you, 
> Vladimir 
> 
> Full thread dump Java HotSpot(TM) 64-Bit Server VM (19.0-b09 mixed mode): 
> 
> "Thread-97182" daemon prio=10 tid=0x00002aaab8a7f000 nid=0x1c7d runnable [0x0000000040508000] 
>    java.lang.Thread.State: RUNNABLE 
>     at java.lang.StringCoding$StringEncoder.encode(StringCoding.java:232) 
>     at java.lang.StringCoding.encode(StringCoding.java:272) 
>     at java.lang.String.getBytes(String.java:946) 
>     at java.io.UnixFileSystem.list(Native Method) 
>     at java.io.File.list(File.java:973) 
>     at java.io.File.listFiles(File.java:1051) 
>     at org.apache.hadoop.fs.FileUtil.fullyDeleteContents(FileUtil.java:96) 
>     at org.apache.hadoop.fs.FileUtil.fullyDelete(FileUtil.java:84) 
>     at org.apache.hadoop.fs.FileUtil.fullyDeleteContents(FileUtil.java:115) 
>     at org.apache.hadoop.fs.FileUtil.fullyDelete(FileUtil.java:84) 
>     at org.apache.hadoop.fs.RawLocalFileSystem.delete(RawLocalFileSystem.java:293)
>     at org.apache.hadoop.fs.ChecksumFileSystem.delete(ChecksumFileSystem.java:466)
>     at org.apache.hadoop.mapreduce.util.MRAsyncDiskService$DeleteTask.run(MRAsyncDiskService.java:199)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>     at java.lang.Thread.run(Thread.java:662) 
> 
> "Thread-97171" daemon prio=10 tid=0x00002aaab8a81000 nid=0x1bde waiting for monitor entry [0x000000004030a000] 
>    java.lang.Thread.State: BLOCKED (on object monitor) 
>     at org.apache.hadoop.mapred.TaskTracker.getTaskTrackerReportAddress(TaskTracker.java:1351)
>     - waiting to lock<0x00000000c185f690>  (a org.apache.hadoop.mapred.TaskTracker) 
>     at org.apache.hadoop.mapred.TaskRunner.getVMArgs(TaskRunner.java:477) 
>     at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:210) 
> 
> "Thread-6" daemon prio=10 tid=0x00002aaab443e800 nid=0x2a98 runnable [0x0000000043047000] 
>    java.lang.Thread.State: RUNNABLE 
>     at java.lang.String.substring(String.java:1939) 
>     at java.lang.String.substring(String.java:1904) 
>     at java.io.File.getName(File.java:401) 
>     at java.io.UnixFileSystem.getBooleanAttributes(UnixFileSystem.java:229) 
>     at java.io.File.exists(File.java:733) 
>     at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:420)
>     at org.apache.hadoop.fs.FileSystem.isDirectory(FileSystem.java:964) 
>     at org.apache.hadoop.fs.ChecksumFileSystem.rename(ChecksumFileSystem.java:430)
>     at org.apache.hadoop.mapreduce.util.MRAsyncDiskService.moveAndDeleteRelativePath(MRAsyncDiskService.java:244)
>     at org.apache.hadoop.mapreduce.util.MRAsyncDiskService.moveAndDeleteAbsolutePath(MRAsyncDiskService.java:361)
>     at org.apache.hadoop.mapred.UserLogCleaner.deleteLogPath(UserLogCleaner.java:200)
>     at org.apache.hadoop.mapred.UserLogCleaner.processCompletedJobs(UserLogCleaner.java:103)
>     - locked<0x00000000c18b0200>  (a java.util.Collections$SynchronizedMap) 
>     at org.apache.hadoop.mapred.UserLogCleaner.run(UserLogCleaner.java:83) 
> 
> "Directory/File cleanup thread" daemon prio=10 tid=0x00002aaab443c800 nid=0x2a97 waiting on condition [0x0000000042f46000] 
>    java.lang.Thread.State: WAITING (parking) 
>     at sun.misc.Unsafe.park(Native Method) 
>     - parking to wait for<0x00000000c18c9b98>  (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) 
>     at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158) 
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987)
>     at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:399)
>     at org.apache.hadoop.mapred.CleanupQueue$PathCleanupThread.run(CleanupQueue.java:130)
> 
> "taskCleanup" daemon prio=10 tid=0x00002aaab443c000 nid=0x2a96 waiting for monitor entry [0x0000000042e45000] 
>    java.lang.Thread.State: BLOCKED (on object monitor) 
>     at org.apache.hadoop.mapred.TaskTracker.purgeJob(TaskTracker.java:1892) 
>     - waiting to lock<0x00000000c18afb88>  (a java.util.TreeMap) 
>     - locked<0x00000000c185f690>  (a org.apache.hadoop.mapred.TaskTracker) 
>     at org.apache.hadoop.mapred.TaskTracker$1.run(TaskTracker.java:398) 
>     at java.lang.Thread.run(Thread.java:662) 
> 
> "TaskLauncher for REDUCE tasks" daemon prio=10 tid=0x00002aaab4438800 nid=0x2a95 in Object.wait() [0x0000000042c43000] 
>    java.lang.Thread.State: WAITING (on object monitor) 
>     at java.lang.Object.wait(Native Method) 
>     - waiting on<0x00000000c185f660>  (a java.util.LinkedList) 
>     at java.lang.Object.wait(Object.java:485) 
>     at org.apache.hadoop.mapred.TaskTracker$TaskLauncher.run(TaskTracker.java:2157)
>     - locked<0x00000000c185f660>  (a java.util.LinkedList) 
> 
> "TaskLauncher for MAP tasks" daemon prio=10 tid=0x00002aaab4431800 nid=0x2a94 waiting on condition [0x0000000042d43000] 
>    java.lang.Thread.State: RUNNABLE 
>     at java.util.HashMap.newKeyIterator(HashMap.java:840) 
>     at java.util.HashMap$KeySet.iterator(HashMap.java:874) 
>     at java.util.HashSet.iterator(HashSet.java:153) 
>     at java.util.AbstractCollection.containsAll(AbstractCollection.java:276) 
>     at java.util.AbstractSet.equals(AbstractSet.java:78) 
>     at java.util.Collections$SynchronizedSet.equals(Collections.java:1655) 
>     - locked<0x00000000ed5f12c0>  (a java.util.Collections$SynchronizedSet) 
>     at javax.security.auth.Subject.equals(Subject.java:773) 
>     at org.apache.hadoop.security.UserGroupInformation.equals(UserGroupInformation.java:698)
>     at org.apache.hadoop.fs.FileSystem$Cache$Key.isEqual(FileSystem.java:1878) 
>     at org.apache.hadoop.fs.FileSystem$Cache$Key.equals(FileSystem.java:1888) 
>     at java.util.HashMap.put(HashMap.java:376) 
>     at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:1781) 
>     - locked<0x00000000c18537b8>  (a org.apache.hadoop.fs.FileSystem$Cache) 
>     at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1750) 
>     at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:234) 
>     at org.apache.hadoop.fs.Path.getFileSystem(Path.java:189) 
>     at org.apache.hadoop.mapred.TaskTracker$3.run(TaskTracker.java:1006) 
>     at org.apache.hadoop.mapred.TaskTracker$3.run(TaskTracker.java:1004) 
>     at java.security.AccessController.doPrivileged(Native Method) 
>     at javax.security.auth.Subject.doAs(Subject.java:396) 
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:742)
>     at org.apache.hadoop.mapred.TaskTracker.getFS(TaskTracker.java:1003) 
>     at org.apache.hadoop.mapred.TaskTracker.localizeJobConfFile(TaskTracker.java:1098)
>     at org.apache.hadoop.mapred.TaskTracker.localizeJobFiles(TaskTracker.java:1048)
>     at org.apache.hadoop.mapred.TaskTracker.localizeJob(TaskTracker.java:977) 
>     - locked<0x00000000ed5e8d08>  (a org.apache.hadoop.mapred.TaskTracker$RunningJob) 
>     at org.apache.hadoop.mapred.TaskTracker.startNewTask(TaskTracker.java:2247) 
>     at org.apache.hadoop.mapred.TaskTracker$TaskLauncher.run(TaskTracker.java:2212)
> 
> "Map-events fetcher for all reduce tasks on tracker_adc00bzu.us.oracle.com:localhost.localdomain/127.0.0.1:43784" daemon prio=10 tid=0x00002aaab4411800 nid=0x2a8a waiting for monitor entry [0x0000000042b42000] 
>    java.lang.Thread.State: BLOCKED (on object monitor) 
>     at org.apache.hadoop.mapred.TaskTracker$MapEventsFetcherThread.reducesInShuffle(TaskTracker.java:799)
>     - waiting to lock<0x00000000ed5e8d08>  (a org.apache.hadoop.mapred.TaskTracker$RunningJob) 
>     at org.apache.hadoop.mapred.TaskTracker$MapEventsFetcherThread.run(TaskTracker.java:834)
>     - locked<0x00000000c18afb88>  (a java.util.TreeMap) 
> 
> "Thread-14" prio=10 tid=0x00002aaab440d000 nid=0x2a88 waiting on condition [0x0000000042940000] 
>    java.lang.Thread.State: TIMED_WAITING (sleeping) 
>     at java.lang.Thread.sleep(Native Method) 
>     at org.apache.hadoop.mapreduce.filecache.TrackerDistributedCacheManager$CleanupThread.run(TrackerDistributedCacheManager.java:892)
> 
> "IPC Server handler 3 on 43784" daemon prio=10 tid=0x00002aaab440b000 nid=0x2a87 waiting on condition [0x000000004283f000] 
>    java.lang.Thread.State: WAITING (parking) 
>     at sun.misc.Unsafe.park(Native Method) 
>     - parking to wait for<0x00000000c1874508>  (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) 
>     at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158) 
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987)
>     at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:399)
>     at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1326) 
> 
> "IPC Server handler 2 on 43784" daemon prio=10 tid=0x00002aaab4409000 nid=0x2a86 waiting on condition [0x000000004273e000] 
>    java.lang.Thread.State: WAITING (parking) 
>     at sun.misc.Unsafe.park(Native Method) 
>     - parking to wait for<0x00000000c1874508>  (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) 
>     at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158) 
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987)
>     at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:399)
>     at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1326) 
> 
> "IPC Server handler 1 on 43784" daemon prio=10 tid=0x00002aaab43eb800 nid=0x2a85 waiting on condition [0x000000004263d000] 
>    java.lang.Thread.State: WAITING (parking) 
>     at sun.misc.Unsafe.park(Native Method) 
>     - parking to wait for<0x00000000c1874508>  (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) 
>     at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158) 
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987)
>     at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:399)
>     at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1326) 
> 
> "IPC Server handler 0 on 43784" daemon prio=10 tid=0x00002aaab43ea800 nid=0x2a84 waiting on condition [0x000000004253c000] 
>    java.lang.Thread.State: WAITING (parking) 
>     at sun.misc.Unsafe.park(Native Method) 
>     - parking to wait for<0x00000000c1874508>  (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) 
>     at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158) 
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987)
>     at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:399)
>     at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1326) 
> 
> "IPC Server listener on 43784" daemon prio=10 tid=0x00002aaab437c000 nid=0x2a83 runnable [0x000000004243b000] 
>    java.lang.Thread.State: RUNNABLE 
>     at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method) 
>     at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:210) 
>     at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65) 
>     at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69) 
>     - locked<0x00000000c18758d8>  (a sun.nio.ch.Util$2) 
>     - locked<0x00000000c18758c8>  (a java.util.Collections$UnmodifiableSet) 
>     - locked<0x00000000c18754a8>  (a sun.nio.ch.EPollSelectorImpl) 
>     at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80) 
>     at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:84) 
>     at org.apache.hadoop.ipc.Server$Listener.run(Server.java:426) 
> 
> "IPC Server Responder" daemon prio=10 tid=0x00002aaab42b6000 nid=0x2a82 runnable [0x000000004233a000] 
>    java.lang.Thread.State: RUNNABLE 
>     at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method) 
>     at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:210) 
>     at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65) 
>     at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69) 
>     - locked<0x00000000c1876418>  (a sun.nio.ch.Util$2) 
>     - locked<0x00000000c1876408>  (a java.util.Collections$UnmodifiableSet) 
>     - locked<0x00000000c18761f0>  (a sun.nio.ch.EPollSelectorImpl) 
>     at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80) 
>     at org.apache.hadoop.ipc.Server$Responder.run(Server.java:593) 
> 
> "pool-3-thread-1" prio=10 tid=0x00002aaab4289000 nid=0x2a81 runnable [0x0000000042239000] 
>    java.lang.Thread.State: RUNNABLE 
>     at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method) 
>     at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:210) 
>     at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65) 
>     at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69) 
>     - locked<0x00000000c1875078>  (a sun.nio.ch.Util$2) 
>     - locked<0x00000000c1875068>  (a java.util.Collections$UnmodifiableSet) 
>     - locked<0x00000000c1874e40>  (a sun.nio.ch.EPollSelectorImpl) 
>     at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80) 
>     at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:84) 
>     at org.apache.hadoop.ipc.Server$Listener$Reader.run(Server.java:321) 
>     - locked<0x00000000c1875ae8>  (a org.apache.hadoop.ipc.Server$Listener$Reader) 
>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>     at java.lang.Thread.run(Thread.java:662) 
> 
> "pool-2-thread-1" prio=10 tid=0x00002aaab4382000 nid=0x2a80 runnable [0x0000000042138000] 
>    java.lang.Thread.State: RUNNABLE 
>     at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:39) 
>     at java.nio.ByteBuffer.allocate(ByteBuffer.java:312) 
>     at java.nio.charset.CharsetEncoder.encode(CharsetEncoder.java:760) 
>     at org.apache.hadoop.io.Text.encode(Text.java:396) 
>     at org.apache.hadoop.io.Text.set(Text.java:186) 
>     at org.apache.hadoop.io.Text.<init>(Text.java:89) 
>     at org.apache.hadoop.mapred.TIETaskTrackerInst$NodeInfoCollector.run(TIETaskTrackerInst.java:88)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) 
>     at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
>     at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150) 
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180)
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>     at java.lang.Thread.run(Thread.java:662) 
> 
> "pool-1-thread-1" prio=10 tid=0x00002aaab4281800 nid=0x2a7f waiting on condition [0x0000000040fd7000] 
>    java.lang.Thread.State: TIMED_WAITING (parking) 
>     at sun.misc.Unsafe.park(Native Method) 
>     - parking to wait for<0x00000000c1867008>  (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) 
>     at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:198) 
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2025)
>     at java.util.concurrent.DelayQueue.take(DelayQueue.java:164) 
>     at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:609)
>     at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:602)
>     at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:947)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
>     at java.lang.Thread.run(Thread.java:662) 
> 
> "Timer-0" daemon prio=10 tid=0x00002aaab42a0000 nid=0x2a7d in Object.wait() [0x0000000040b6c000] 
>    java.lang.Thread.State: TIMED_WAITING (on object monitor) 
>     at java.lang.Object.wait(Native Method) 
>     - waiting on<0x00000000c187e998>  (a java.util.TaskQueue) 
>     at java.util.TimerThread.mainLoop(Timer.java:509) 
>     - locked<0x00000000c187e998>  (a java.util.TaskQueue) 
>     at java.util.TimerThread.run(Timer.java:462) 
> 
> "738807903@qtp0-0 - Acceptor0 SelectChannelConnector@0.0.0.0:50060" prio=10 tid=0x00002aaab4293800 nid=0x2a7c runnable [0x0000000042037000] 
>    java.lang.Thread.State: RUNNABLE 
>     at java.util.HashMap.newKeyIterator(HashMap.java:840) 
>     at java.util.HashMap$KeySet.iterator(HashMap.java:874) 
>     at java.util.HashSet.iterator(HashSet.java:153) 
>     at sun.nio.ch.SelectorImpl.processDeregisterQueue(SelectorImpl.java:127) 
>     - locked<0x00000000c18ae5c8>  (a java.util.HashSet) 
>     at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:69) 
>     at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69) 
>     - locked<0x00000000c18aea00>  (a sun.nio.ch.Util$2) 
>     - locked<0x00000000c18ae9f0>  (a java.util.Collections$UnmodifiableSet) 
>     - locked<0x00000000c18ae570>  (a sun.nio.ch.EPollSelectorImpl) 
>     at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80) 
>     at org.mortbay.io.nio.SelectorManager$SelectSet.doSelect(SelectorManager.java:429)
>     at org.mortbay.io.nio.SelectorManager.doSelect(SelectorManager.java:185) 
>     at org.mortbay.jetty.nio.SelectChannelConnector.accept(SelectChannelConnector.java:124)
>     at org.mortbay.jetty.AbstractConnector$Acceptor.run(AbstractConnector.java:707)
>     at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:522)
> 
> "Low Memory Detector" daemon prio=10 tid=0x000000005d0d8800 nid=0x2a79 runnable [0x0000000000000000] 
>    java.lang.Thread.State: RUNNABLE 
> 
> "CompilerThread1" daemon prio=10 tid=0x000000005d0d6800 nid=0x2a78 waiting on condition [0x0000000000000000] 
>    java.lang.Thread.State: RUNNABLE 
> 
> "CompilerThread0" daemon prio=10 tid=0x000000005d0d0800 nid=0x2a77 waiting on condition [0x0000000000000000] 
>    java.lang.Thread.State: RUNNABLE 
> 
> "Signal Dispatcher" daemon prio=10 tid=0x000000005d0ce800 nid=0x2a76 runnable [0x0000000000000000] 
>    java.lang.Thread.State: RUNNABLE 
> 
> "Finalizer" daemon prio=10 tid=0x000000005d0aa800 nid=0x2a75 in Object.wait() [0x0000000041a5b000] 
>    java.lang.Thread.State: WAITING (on object monitor) 
>     at java.lang.Object.wait(Native Method) 
>     - waiting on<0x00000000c18ca210>  (a java.lang.ref.ReferenceQueue$Lock) 
>     at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:118) 
>     - locked<0x00000000c18ca210>  (a java.lang.ref.ReferenceQueue$Lock) 
>     at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:134) 
>     at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:159) 
> 
> "Reference Handler" daemon prio=10 tid=0x000000005d0a8800 nid=0x2a74 in Object.wait() [0x000000004195a000] 
>    java.lang.Thread.State: WAITING (on object monitor) 
>     at java.lang.Object.wait(Native Method) 
>     - waiting on<0x00000000c18000b0>  (a java.lang.ref.Reference$Lock) 
>     at java.lang.Object.wait(Object.java:485) 
>     at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:116) 
>     - locked<0x00000000c18000b0>  (a java.lang.ref.Reference$Lock) 
> 
> "main" prio=10 tid=0x000000005d04a800 nid=0x2a70 waiting for monitor entry [0x0000000040666000] 
>    java.lang.Thread.State: BLOCKED (on object monitor) 
>     at org.apache.hadoop.mapred.TaskTracker.transmitHeartBeat(TaskTracker.java:1533)
>     - waiting to lock<0x00000000c185f690>  (a org.apache.hadoop.mapred.TaskTracker) 
>     at org.apache.hadoop.mapred.TaskTracker.offerService(TaskTracker.java:1432) 
>     at org.apache.hadoop.mapred.TaskTracker.run(TaskTracker.java:2329) 
>     at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:3461) 
> 
> "VM Thread" prio=10 tid=0x000000005d0a4000 nid=0x2a73 runnable 
> 
> "GC task thread#0 (ParallelGC)" prio=10 tid=0x000000005d05d800 nid=0x2a71 runnable 
> 
> "GC task thread#1 (ParallelGC)" prio=10 tid=0x000000005d05f800 nid=0x2a72 runnable 
> 
> "VM Periodic Task Thread" prio=10 tid=0x000000005d0e3000 nid=0x2a7a waiting on condition 
> 
> JNI global references: 1519 

--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/