Posted to common-issues@hadoop.apache.org by "Konstantin Shvachko (Commented) (JIRA)" <ji...@apache.org> on 2012/02/25 03:11:53 UTC

[jira] [Commented] (HADOOP-7191) BackUpNameNode is using 100% CPU and not accepting any requests.

    [ https://issues.apache.org/jira/browse/HADOOP-7191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13216246#comment-13216246 ] 

Konstantin Shvachko commented on HADOOP-7191:
---------------------------------------------

I've seen this in DataNode this time, similar to what was described a long time ago in HADOOP-3132.
Thousands of DataXceiver threads are deadlocked in epollWait, as in the dump below.
{code}
"DataXceiver for client /000.000.000.000:50532 [Sending block blk_902175426054289774_62873730]" daemon prio=10 tid=0x00007f18ccb93000 nid=0x7e2b runnable [0x00007f17effff000]
   java.lang.Thread.State: RUNNABLE
	at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
	at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:210)
	at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
	at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
	- locked <0x0000000788e43cd0> (a sun.nio.ch.Util$2)
	- locked <0x0000000788e43cc0> (a java.util.Collections$UnmodifiableSet)
	- locked <0x0000000788e43a78> (a sun.nio.ch.EPollSelectorImpl)
	at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
	at org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:332)
	at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:157)
	at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:159)
	at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:132)
	at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
	at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
	- locked <0x0000000789025fe0> (a java.io.BufferedInputStream)
	at java.io.DataInputStream.readShort(DataInputStream.java:295)
	at org.apache.hadoop.hdfs.protocol.DataTransferProtocol$Receiver.readOp(DataTransferProtocol.java:314)
	at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:141)
	at java.lang.Thread.run(Thread.java:662)
{code}
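
To get a rough count of affected threads from a saved jstack dump, something like the following could be used. This is only a sketch: the class name EpollWaitCounter is hypothetical and not part of Hadoop, and it assumes the usual jstack layout in which each thread starts with a quoted name line followed by "at ..." frames.

```java
import java.util.List;

// Hypothetical helper for triaging thread dumps; not part of Hadoop.
final class EpollWaitCounter {
    private EpollWaitCounter() {}

    /**
     * Counts threads whose top stack frame is EPollArrayWrapper.epollWait.
     * Assumes each thread section begins with a line starting with a double quote.
     */
    static int count(List<String> dumpLines) {
        int count = 0;
        boolean expectTopFrame = false;
        for (String raw : dumpLines) {
            String line = raw.trim();
            if (line.startsWith("\"")) {
                // New thread header: the next "at ..." line is its top frame.
                expectTopFrame = true;
                continue;
            }
            if (expectTopFrame && line.startsWith("at ")) {
                if (line.contains("EPollArrayWrapper.epollWait")) {
                    count++;
                }
                expectTopFrame = false;
            }
        }
        return count;
    }
}
```

A high count alone does not prove the bug, since idle threads legitimately block in epollWait; it is only meaningful together with the CPU usage.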

Ramkrishna, is your patch intended to address Java bug [6403933|http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6403933] by providing a workaround?
Is there a way to test the condition (in any form)?
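
For reference, the kind of workaround other projects have used for a spinning selector is to count select() calls that return zero keys well before the timeout, and to replace the selector once that happens too often in a row. The sketch below illustrates that idea only; it is not the attached patch. SpinningSelectorGuard, its threshold, and the "less than half the timeout" rule are all assumptions made for illustration.

```java
import java.io.IOException;
import java.nio.channels.SelectableChannel;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.util.concurrent.TimeUnit;

// Hypothetical helper; not part of Hadoop or of HADOOP-7191.patch.
final class SpinningSelectorGuard {
    private final int threshold;      // consecutive premature returns tolerated
    private int prematureReturns = 0;

    SpinningSelectorGuard(int threshold) {
        this.threshold = threshold;
    }

    /**
     * Records the outcome of one select(timeoutMillis) call.
     * Returns true when the selector looks like it is spinning.
     * Note: legitimate wakeup() calls and interrupts also return early,
     * so callers that use them should skip recording in those cases.
     */
    boolean record(int readyKeys, long elapsedNanos, long timeoutMillis) {
        // Assumed heuristic: "premature" means no keys and under half the timeout.
        boolean premature = readyKeys == 0
            && timeoutMillis > 0
            && elapsedNanos < TimeUnit.MILLISECONDS.toNanos(timeoutMillis) / 2;
        if (!premature) {
            prematureReturns = 0;
            return false;
        }
        prematureReturns++;
        if (prematureReturns >= threshold) {
            prematureReturns = 0;
            return true;
        }
        return false;
    }

    /**
     * Moves all valid registrations to a fresh selector and closes the old one.
     * Must be called from the thread that owns the selector.
     */
    static Selector rebuild(Selector old) throws IOException {
        Selector fresh = Selector.open();
        for (SelectionKey key : old.keys()) {
            if (!key.isValid()) {
                continue;
            }
            SelectableChannel channel = key.channel();
            int ops = key.interestOps();
            Object attachment = key.attachment();
            key.cancel();
            channel.register(fresh, ops, attachment);
        }
        old.close();
        return fresh;
    }
}
```

In a select loop this would be driven by measuring System.nanoTime() around selector.select(timeout) and calling rebuild() when record() returns true. Because record() is plain bookkeeping, the detection logic can be unit-tested without reproducing the JDK bug itself, which may partly answer the testing question.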

Also, we should probably rename this JIRA, as the problem applies more generally.
                
>  BackUpNameNode is using 100% CPU and not accepting any requests.
> -----------------------------------------------------------------
>
>                 Key: HADOOP-7191
>                 URL: https://issues.apache.org/jira/browse/HADOOP-7191
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: ipc
>    Affects Versions: 0.20-append, 0.23.0
>            Reporter: Uma Maheswara Rao G
>         Attachments: HADOOP-7191.patch
>
>
> In our environment, after a 3-day run, the Backup NameNode is using 100% CPU and not accepting any calls.
> *Thread dump*
> "IPC Server Responder" daemon prio=10 tid=0x00007f86c41c6800 nid=0x3b2a runnable [0x00007f86ce579000]
> java.lang.Thread.State: RUNNABLE
> at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
> at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:215)
> at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
> at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
> locked <0x00007f86d67e2a20> (a sun.nio.ch.Util$1) 
> locked <0x00007f86d67e2a08> (a java.util.Collections$UnmodifiableSet) 
> locked <0x00007f86d67e26a8> (a sun.nio.ch.EPollSelectorImpl)
> at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
> at org.apache.hadoop.ipc.Server$Responder.run(Server.java:501) 
> It looks like we are running into an issue similar to this Jetty one: http://jira.codehaus.org/browse/JETTY-937
