Posted to mapreduce-dev@hadoop.apache.org by "Ashwin Shankar (JIRA)" <ji...@apache.org> on 2014/09/24 21:27:33 UTC
[jira] [Created] (MAPREDUCE-6107) Job history server becomes
unresponsive due to stuck thread in epollWait
Ashwin Shankar created MAPREDUCE-6107:
-----------------------------------------
Summary: Job history server becomes unresponsive due to stuck thread in epollWait
Key: MAPREDUCE-6107
URL: https://issues.apache.org/jira/browse/MAPREDUCE-6107
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: jobhistoryserver
Affects Versions: 2.4.0
Reporter: Ashwin Shankar
About once a week, the job history server (JHS) becomes unresponsive on one of our 2000-node Hadoop clusters. Looking at the thread dump, I see that multiple threads are blocked on locks held by a couple of threads, which in turn are stuck indefinitely in epollWait while talking to HDFS to fetch a history file.
When the number of blocked threads reaches the thread pool size, the JHS stops responding to new client requests.
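The failure mode described above can be reproduced in miniature. The following is only a sketch, not JHS code: the class name, `cacheLock`, and the latch standing in for an HDFS read that never returns are all hypothetical. It shows how one handler stalled while holding a lock can tie up a fixed-size pool so that a new request is never picked up.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class PoolExhaustionSketch {

    /**
     * Returns true if a new request is served within waitMillis while every
     * handler is tied up behind one stalled lock holder.
     */
    static boolean probeServed(int poolSize, long waitMillis) {
        ExecutorService handlers = Executors.newFixedThreadPool(poolSize);
        Object cacheLock = new Object();                     // hypothetical shared lock
        CountDownLatch stalledRead = new CountDownLatch(1);  // stands in for a read that never returns
        CountDownLatch lockHeld = new CountDownLatch(1);
        try {
            // First handler takes the lock, then blocks on "I/O" with no timeout.
            handlers.submit(() -> {
                synchronized (cacheLock) {
                    lockHeld.countDown();
                    try {
                        stalledRead.await();
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                }
            });
            lockHeld.await();

            // The remaining handlers queue up behind the same lock.
            for (int i = 1; i < poolSize; i++) {
                handlers.submit(() -> {
                    synchronized (cacheLock) {
                        // a real handler would read the history file here
                    }
                });
            }

            // A new client request now finds no free handler.
            Future<String> probe = handlers.submit(() -> "served");
            try {
                probe.get(waitMillis, TimeUnit.MILLISECONDS);
                return true;
            } catch (TimeoutException e) {
                return false;
            }
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            stalledRead.countDown();  // end the simulated stall so the JVM can exit
            handlers.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("new request served: " + probeServed(4, 200));
    }
}
```

With a pool of 4 and a 200 ms wait, the probe request times out, because all four handlers are either holding the lock or waiting for it. In this sketch the stall is released in the `finally` block; in the situation reported here nothing releases it.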
Thread dump attached.
Has anyone seen this before?
Here is the thread stuck in epollWait:
{code}
"IPC Server handler 4 on 10020" daemon prio=10 tid=0x00007f7eb10f5000 nid=0x144d runnable [0x00007f7e9108d000]
java.lang.Thread.State: RUNNABLE
at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:79)
at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:87)
- locked <0x00000006c89d3240> (a sun.nio.ch.Util$2)
- locked <0x00000006c89d3228> (a java.util.Collections$UnmodifiableSet)
- locked <0x00000006bb32f8b8> (a sun.nio.ch.EPollSelectorImpl)
{code}
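For anyone trying to confirm the same pattern, a dump can be summarized by counting how many BLOCKED threads wait on each lock owner. The sketch below does this in-process with the standard `java.lang.management` API. It is an illustration only: the class and thread names are hypothetical, and against a running JHS the equivalent information would come from a jstack dump like the one attached.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.CountDownLatch;

class LockOwnerSummary {

    /** Counts BLOCKED threads in this JVM, grouped by the name of the thread that owns the contended lock. */
    static Map<String, Integer> blockedPerOwner() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        Map<String, Integer> counts = new TreeMap<>();
        for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
            if (info != null
                    && info.getThreadState() == Thread.State.BLOCKED
                    && info.getLockOwnerName() != null) {
                counts.merge(info.getLockOwnerName(), 1, Integer::sum);
            }
        }
        return counts;
    }

    /** Builds one stalled lock holder and one waiter, then returns the summary. */
    static Map<String, Integer> demo() {
        Object lock = new Object();
        CountDownLatch held = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        Thread holder = new Thread(() -> {
            synchronized (lock) {
                held.countDown();
                try {
                    release.await();  // stands in for the stalled read
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        }, "stalled-holder");
        Thread waiter = new Thread(() -> {
            synchronized (lock) {
                // nothing to do; the point is the wait to enter the monitor
            }
        }, "blocked-waiter");
        try {
            holder.start();
            held.await();
            waiter.start();
            // Wait until the waiter is actually parked on the monitor.
            while (waiter.getState() != Thread.State.BLOCKED) {
                Thread.sleep(5);
            }
            return blockedPerOwner();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            release.countDown();  // let both demo threads finish
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

An owner with a high count that is itself RUNNABLE in epollWait, like the handler shown above, would be the thread to look at first.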
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)