Posted to common-issues@hadoop.apache.org by "Konstantin Shvachko (Commented) (JIRA)" <ji...@apache.org> on 2012/02/25 03:11:53 UTC
[jira] [Commented] (HADOOP-7191) BackUpNameNode is using 100% CPU and not accepting any requests.
[ https://issues.apache.org/jira/browse/HADOOP-7191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13216246#comment-13216246 ]
Konstantin Shvachko commented on HADOOP-7191:
---------------------------------------------
I've seen this in the DataNode this time, similar to what was described a long time ago in HADOOP-3132.
Thousands of DataXceiver threads are deadlocked in epollWait, like the one below.
{code}
"DataXceiver for client /000.000.000.000:50532 [Sending block blk_902175426054289774_62873730]" daemon prio=10 tid=0x00007f18ccb93000 nid=0x7e2b runnable [0x00007f17effff000]
java.lang.Thread.State: RUNNABLE
at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:210)
at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
- locked <0x0000000788e43cd0> (a sun.nio.ch.Util$2)
- locked <0x0000000788e43cc0> (a java.util.Collections$UnmodifiableSet)
- locked <0x0000000788e43a78> (a sun.nio.ch.EPollSelectorImpl)
at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
at org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:332)
at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:157)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:159)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:132)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
- locked <0x0000000789025fe0> (a java.io.BufferedInputStream)
at java.io.DataInputStream.readShort(DataInputStream.java:295)
at org.apache.hadoop.hdfs.protocol.DataTransferProtocol$Receiver.readOp(DataTransferProtocol.java:314)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:141)
at java.lang.Thread.run(Thread.java:662)
{code}
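To see how many threads are stuck like this without reading a full jstack dump, a small in-process check can inspect each thread's innermost frame. The helper below is a hypothetical sketch, not part of Hadoop. It matches the frame `sun.nio.ch.EPollArrayWrapper.epollWait` from the Java 6 dump above; newer JDKs use different internal class names, so the strings would need adjusting there. A single reader idling in epollWait is normal; the signal here is the count.

```java
import java.util.Map;

/** Hypothetical diagnostic helper: counts live threads whose top frame is epollWait. */
public class EpollWaitCounter {

    /** Returns true if the innermost frame of the stack is EPollArrayWrapper.epollWait. */
    static boolean isInEpollWait(StackTraceElement[] stack) {
        if (stack == null || stack.length == 0) {
            return false;
        }
        StackTraceElement top = stack[0];
        // Class/method names as they appear in the Java 6 thread dump above.
        return "sun.nio.ch.EPollArrayWrapper".equals(top.getClassName())
                && "epollWait".equals(top.getMethodName());
    }

    /** Counts threads in this JVM whose current top frame is epollWait. */
    static int countEpollWaiters() {
        int count = 0;
        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            if (isInEpollWait(e.getValue())) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println("threads in epollWait: " + countEpollWaiters());
    }
}
```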
Ramkrishna, is your patch intended to address this Java bug [6403933|http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6403933] by providing a workaround?
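For reference, a common workaround for 6403933, and the same general approach Netty's NIO event loop takes, is to abandon the spinning selector: open a fresh Selector, re-register every still-valid channel with the same interest set and attachment, and close the old one. Whether HADOOP-7191.patch does this is not stated here. The class and method names below are hypothetical; this is a sketch under those assumptions, not the patch itself.

```java
import java.io.IOException;
import java.nio.channels.ClosedChannelException;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;

/**
 * Hypothetical helper: moves every valid registration from a (possibly spinning)
 * selector onto a freshly opened one, then closes the old selector.
 */
public final class SelectorRebuilder {

    private SelectorRebuilder() {
    }

    static Selector rebuild(Selector old) throws IOException {
        Selector fresh = Selector.open();
        for (SelectionKey key : old.keys()) {
            if (!key.isValid()) {
                continue; // already cancelled or channel closed: nothing to carry over
            }
            int ops = key.interestOps();
            Object attachment = key.attachment();
            // Cancelling only queues the key for deregistration from the old selector;
            // a channel may be registered with several selectors at once.
            key.cancel();
            try {
                key.channel().register(fresh, ops, attachment);
            } catch (ClosedChannelException ignored) {
                // Channel was closed concurrently; drop it.
            }
        }
        // Closing a selector deregisters its channels but does not close them.
        old.close();
        return fresh;
    }
}
```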
Is there a way to test the condition (in any form)?
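The condition can at least be detected: on affected JDKs, select(timeout) keeps returning 0 well before the timeout expires. A single early return is not proof, since Selector.wakeup() or a thread interrupt also ends select() early with nothing selected, so the hypothetical detector below only fires after a run of consecutive premature wakeups. The threshold is an arbitrary assumption that would need tuning.

```java
/**
 * Hypothetical detector for the epoll spin described in JDK bug 6403933:
 * select(timeout) keeps returning 0 long before the timeout expires.
 */
public final class SelectSpinDetector {
    private final int threshold;      // consecutive premature wakeups tolerated
    private int prematureWakeups = 0; // length of the current run

    public SelectSpinDetector(int threshold) {
        if (threshold <= 0) {
            throw new IllegalArgumentException("threshold must be positive");
        }
        this.threshold = threshold;
    }

    /**
     * Records one select() call. Returns true when the selector looks like it is
     * spinning and should be rebuilt; the run counter is reset in that case.
     *
     * @param selected     return value of select(timeout)
     * @param elapsedNanos how long the call actually blocked
     * @param timeoutNanos timeout that was requested
     */
    public boolean onSelect(int selected, long elapsedNanos, long timeoutNanos) {
        boolean premature = selected == 0 && elapsedNanos < timeoutNanos;
        if (!premature) {
            prematureWakeups = 0; // real events or a genuine timeout: healthy
            return false;
        }
        if (++prematureWakeups >= threshold) {
            prematureWakeups = 0;
            return true;
        }
        return false;
    }
}
```

A select loop would time each call with System.nanoTime(), pass the result to onSelect, and rebuild the selector when it returns true. Because the detector is pure logic, it can be unit-tested without reproducing the JDK bug.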
Also, we should probably rename this JIRA, since the problem is not limited to the BackupNode.
> BackUpNameNode is using 100% CPU and not accepting any requests.
> -----------------------------------------------------------------
>
> Key: HADOOP-7191
> URL: https://issues.apache.org/jira/browse/HADOOP-7191
> Project: Hadoop Common
> Issue Type: Bug
> Components: ipc
> Affects Versions: 0.20-append, 0.23.0
> Reporter: Uma Maheswara Rao G
> Attachments: HADOOP-7191.patch
>
>
> In our environment, after a 3-day run, the Backup NameNode is using 100% CPU and not accepting any calls.
> *Thread dump*
> "IPC Server Responder" daemon prio=10 tid=0x00007f86c41c6800 nid=0x3b2a runnable [0x00007f86ce579000]
> java.lang.Thread.State: RUNNABLE
> at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
> at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:215)
> at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
> at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
> - locked <0x00007f86d67e2a20> (a sun.nio.ch.Util$1)
> - locked <0x00007f86d67e2a08> (a java.util.Collections$UnmodifiableSet)
> - locked <0x00007f86d67e26a8> (a sun.nio.ch.EPollSelectorImpl)
> at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
> at org.apache.hadoop.ipc.Server$Responder.run(Server.java:501)
> It looks like we are running into an issue similar to this Jetty one: http://jira.codehaus.org/browse/JETTY-937
--
This message is automatically generated by JIRA.