You are viewing a plain text version of this content. The canonical link for it is here.
Posted to common-issues@hadoop.apache.org by "Uma Maheswara Rao G (JIRA)" <ji...@apache.org> on 2011/03/14 14:13:29 UTC

[jira] Created: (HADOOP-7186) BackUpNameNode is using 100% CPU and not accepting any requests.

 BackUpNameNode is using 100% CPU and not accepting any requests.
-----------------------------------------------------------------

                 Key: HADOOP-7186
                 URL: https://issues.apache.org/jira/browse/HADOOP-7186
             Project: Hadoop Common
          Issue Type: Bug
          Components: io, ipc
            Reporter: Uma Maheswara Rao G


In our environment, after 3 days long run Backup NameNode is using 100% CPU and not accepting any calls. 
*Thread dump*
"IPC Server Responder" daemon prio=10 tid=0x00007f86c41c6800 nid=0x3b2a runnable [0x00007f86ce579000]
java.lang.Thread.State: RUNNABLE
at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:215)
at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)

locked <0x00007f86d67e2a20> (a sun.nio.ch.Util$1) 
locked <0x00007f86d67e2a08> (a java.util.Collections$UnmodifiableSet) 
locked <0x00007f86d67e26a8> (a sun.nio.ch.EPollSelectorImpl)
at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
at org.apache.hadoop.ipc.Server$Responder.run(Server.java:501) 
Looks like we are running into similar issue like this Jetty one. http://jira.codehaus.org/browse/JETTY-937



--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HADOOP-7191) BackUpNameNode is using 100% CPU and not accepting any requests.

Posted by "ramkrishna.s.vasudevan (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-7191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ramkrishna.s.vasudevan updated HADOOP-7191:
-------------------------------------------

    Attachment: HADOOP-7191.patch

>  BackUpNameNode is using 100% CPU and not accepting any requests.
> -----------------------------------------------------------------
>
>                 Key: HADOOP-7191
>                 URL: https://issues.apache.org/jira/browse/HADOOP-7191
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: ipc
>            Reporter: Uma Maheswara Rao G
>         Attachments: HADOOP-7191.patch
>
>
> In our environment, after 3 days long run Backup NameNode is using 100% CPU and not accepting any calls. 
> *Thread dump*
> "IPC Server Responder" daemon prio=10 tid=0x00007f86c41c6800 nid=0x3b2a runnable [0x00007f86ce579000]
> java.lang.Thread.State: RUNNABLE
> at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
> at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:215)
> at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
> at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
> locked <0x00007f86d67e2a20> (a sun.nio.ch.Util$1) 
> locked <0x00007f86d67e2a08> (a java.util.Collections$UnmodifiableSet) 
> locked <0x00007f86d67e26a8> (a sun.nio.ch.EPollSelectorImpl)
> at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
> at org.apache.hadoop.ipc.Server$Responder.run(Server.java:501) 
> Looks like we are running into similar issue like this Jetty one. http://jira.codehaus.org/browse/JETTY-937

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HADOOP-7191) BackUpNameNode is using 100% CPU and not accepting any requests.

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-7191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13042804#comment-13042804 ] 

Hadoop QA commented on HADOOP-7191:
-----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12481236/HADOOP-7191.patch
  against trunk revision 1129989.

    +1 @author.  The patch does not contain any @author tags.

    -1 tests included.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no new tests are needed for this patch.
                        Also please list what manual steps were performed to verify this patch.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    -1 findbugs.  The patch appears to introduce 2 new Findbugs (version 1.3.9) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    +1 core tests.  The patch passed core unit tests.

    +1 system test framework.  The patch passed system test framework compile.

Test results: https://builds.apache.org/hudson/job/PreCommit-HADOOP-Build/560//testReport/
Findbugs warnings: https://builds.apache.org/hudson/job/PreCommit-HADOOP-Build/560//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/hudson/job/PreCommit-HADOOP-Build/560//console

This message is automatically generated.

>  BackUpNameNode is using 100% CPU and not accepting any requests.
> -----------------------------------------------------------------
>
>                 Key: HADOOP-7191
>                 URL: https://issues.apache.org/jira/browse/HADOOP-7191
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: ipc
>    Affects Versions: 0.20-append, 0.23.0
>            Reporter: Uma Maheswara Rao G
>         Attachments: HADOOP-7191.patch
>
>
> In our environment, after 3 days long run Backup NameNode is using 100% CPU and not accepting any calls. 
> *Thread dump*
> "IPC Server Responder" daemon prio=10 tid=0x00007f86c41c6800 nid=0x3b2a runnable [0x00007f86ce579000]
> java.lang.Thread.State: RUNNABLE
> at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
> at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:215)
> at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
> at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
> locked <0x00007f86d67e2a20> (a sun.nio.ch.Util$1) 
> locked <0x00007f86d67e2a08> (a java.util.Collections$UnmodifiableSet) 
> locked <0x00007f86d67e26a8> (a sun.nio.ch.EPollSelectorImpl)
> at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
> at org.apache.hadoop.ipc.Server$Responder.run(Server.java:501) 
> Looks like we are running into similar issue like this Jetty one. http://jira.codehaus.org/browse/JETTY-937

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] Moved: (HADOOP-7191) BackUpNameNode is using 100% CPU and not accepting any requests.

Posted by "Uma Maheswara Rao G (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-7191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uma Maheswara Rao G moved HDFS-1756 to HADOOP-7191:
---------------------------------------------------

    Component/s:     (was: name-node)
                 ipc
            Key: HADOOP-7191  (was: HDFS-1756)
        Project: Hadoop Common  (was: Hadoop HDFS)

>  BackUpNameNode is using 100% CPU and not accepting any requests.
> -----------------------------------------------------------------
>
>                 Key: HADOOP-7191
>                 URL: https://issues.apache.org/jira/browse/HADOOP-7191
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: ipc
>            Reporter: Uma Maheswara Rao G
>
> In our environment, after 3 days long run Backup NameNode is using 100% CPU and not accepting any calls. 
> *Thread dump*
> "IPC Server Responder" daemon prio=10 tid=0x00007f86c41c6800 nid=0x3b2a runnable [0x00007f86ce579000]
> java.lang.Thread.State: RUNNABLE
> at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
> at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:215)
> at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
> at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
> locked <0x00007f86d67e2a20> (a sun.nio.ch.Util$1) 
> locked <0x00007f86d67e2a08> (a java.util.Collections$UnmodifiableSet) 
> locked <0x00007f86d67e26a8> (a sun.nio.ch.EPollSelectorImpl)
> at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
> at org.apache.hadoop.ipc.Server$Responder.run(Server.java:501) 
> Looks like we are running into similar issue like this Jetty one. http://jira.codehaus.org/browse/JETTY-937

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HADOOP-7191) BackUpNameNode is using 100% CPU and not accepting any requests.

Posted by "ramkrishna.s.vasudevan (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-7191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ramkrishna.s.vasudevan updated HADOOP-7191:
-------------------------------------------

    Affects Version/s: 0.23.0
                       0.20-append
               Status: Patch Available  (was: Open)

As per the patch in DIRMINA-678

>  BackUpNameNode is using 100% CPU and not accepting any requests.
> -----------------------------------------------------------------
>
>                 Key: HADOOP-7191
>                 URL: https://issues.apache.org/jira/browse/HADOOP-7191
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: ipc
>    Affects Versions: 0.20-append, 0.23.0
>            Reporter: Uma Maheswara Rao G
>         Attachments: HADOOP-7191.patch
>
>
> In our environment, after 3 days long run Backup NameNode is using 100% CPU and not accepting any calls. 
> *Thread dump*
> "IPC Server Responder" daemon prio=10 tid=0x00007f86c41c6800 nid=0x3b2a runnable [0x00007f86ce579000]
> java.lang.Thread.State: RUNNABLE
> at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
> at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:215)
> at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
> at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
> locked <0x00007f86d67e2a20> (a sun.nio.ch.Util$1) 
> locked <0x00007f86d67e2a08> (a java.util.Collections$UnmodifiableSet) 
> locked <0x00007f86d67e26a8> (a sun.nio.ch.EPollSelectorImpl)
> at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
> at org.apache.hadoop.ipc.Server$Responder.run(Server.java:501) 
> Looks like we are running into similar issue like this Jetty one. http://jira.codehaus.org/browse/JETTY-937

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HADOOP-7191) BackUpNameNode is using 100% CPU and not accepting any requests.

Posted by "Konstantin Shvachko (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-7191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13216246#comment-13216246 ] 

Konstantin Shvachko commented on HADOOP-7191:
---------------------------------------------

I've seen this in DataNode this time, similar to as described long time ago in HADOOP-3132.
Thousands of DataXceiver threads are deadlocked in epollWait. Like below.
{code}
"DataXceiver for client /000.000.000.000:50532 [Sending block blk_902175426054289774_62873730]" daemon prio=10 tid=0x00007f18ccb93000 nid=0x7e2b runnable [0x00007f17effff000]
   java.lang.Thread.State: RUNNABLE
	at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
	at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:210)
	at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
	at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
	- locked <0x0000000788e43cd0> (a sun.nio.ch.Util$2)
	- locked <0x0000000788e43cc0> (a java.util.Collections$UnmodifiableSet)
	- locked <0x0000000788e43a78> (a sun.nio.ch.EPollSelectorImpl)
	at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
	at org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:332)
	at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:157)
	at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:159)
	at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:132)
	at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
	at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
	- locked <0x0000000789025fe0> (a java.io.BufferedInputStream)
	at java.io.DataInputStream.readShort(DataInputStream.java:295)
	at org.apache.hadoop.hdfs.protocol.DataTransferProtocol$Receiver.readOp(DataTransferProtocol.java:314)
	at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:141)
	at java.lang.Thread.run(Thread.java:662)
{code}

Ramkrishna, is your patch intended to address this java bug [6403933|http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6403933] and providing a workaround?
Is there a way to test the condition (in any form)?

Also we should probably rename this jira, as it has more general application.
                
>  BackUpNameNode is using 100% CPU and not accepting any requests.
> -----------------------------------------------------------------
>
>                 Key: HADOOP-7191
>                 URL: https://issues.apache.org/jira/browse/HADOOP-7191
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: ipc
>    Affects Versions: 0.20-append, 0.23.0
>            Reporter: Uma Maheswara Rao G
>         Attachments: HADOOP-7191.patch
>
>
> In our environment, after 3 days long run Backup NameNode is using 100% CPU and not accepting any calls. 
> *Thread dump*
> "IPC Server Responder" daemon prio=10 tid=0x00007f86c41c6800 nid=0x3b2a runnable [0x00007f86ce579000]
> java.lang.Thread.State: RUNNABLE
> at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
> at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:215)
> at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
> at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
> locked <0x00007f86d67e2a20> (a sun.nio.ch.Util$1) 
> locked <0x00007f86d67e2a08> (a java.util.Collections$UnmodifiableSet) 
> locked <0x00007f86d67e26a8> (a sun.nio.ch.EPollSelectorImpl)
> at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
> at org.apache.hadoop.ipc.Server$Responder.run(Server.java:501) 
> Looks like we are running into similar issue like this Jetty one. http://jira.codehaus.org/browse/JETTY-937

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira