Posted to common-issues@hadoop.apache.org by "Vinayakumar B (JIRA)" <ji...@apache.org> on 2016/05/31 07:27:12 UTC

[jira] [Moved] (HADOOP-13219) NameNode Rpc Reader Thread crash, and cluster hang.

     [ https://issues.apache.org/jira/browse/HADOOP-13219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinayakumar B moved HDFS-10472 to HADOOP-13219:
-----------------------------------------------

    Affects Version/s:     (was: 2.6.4)
                           (was: 2.6.2)
                           (was: 2.7.2)
                           (was: 2.8.0)
                           (was: 2.6.0)
                           (was: 2.5.0)
                       2.8.0
                       2.5.0
                       2.6.0
                       2.7.2
                       2.6.2
                       2.6.4
          Component/s:     (was: hdfs)
                           (was: namenode)
                       rpc-server
                  Key: HADOOP-13219  (was: HDFS-10472)
              Project: Hadoop Common  (was: Hadoop HDFS)

> NameNode Rpc Reader Thread crash, and cluster hang.
> ---------------------------------------------------
>
>                 Key: HADOOP-13219
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13219
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: rpc-server
>    Affects Versions: 2.6.4, 2.6.2, 2.7.2, 2.6.0, 2.5.0, 2.8.0
>            Reporter: ChenFolin
>              Labels: patch
>         Attachments: HDFS-10472.patch
>
>
> My cluster hung yesterday.
> The cause was that the RPC server Reader threads crashed, so all RPC requests timed out, including DataNode heartbeats.
> As we can see, the method doRunLoop catches only InterruptedException and IOException:
>         while (running) {
>           SelectionKey key = null;
>           try {
>             // consume as many connections as currently queued to avoid
>             // unbridled acceptance of connections that starves the select
>             int size = pendingConnections.size();
>             for (int i=size; i>0; i--) {
>               Connection conn = pendingConnections.take();
>               conn.channel.register(readSelector, SelectionKey.OP_READ, conn);
>             }
>             readSelector.select();
>             Iterator<SelectionKey> iter = readSelector.selectedKeys().iterator();
>             while (iter.hasNext()) {
>               key = iter.next();
>               iter.remove();
>               if (key.isValid()) {
>                 if (key.isReadable()) {
>                   doRead(key);
>                 }
>               }
>               key = null;
>             }
>           } catch (InterruptedException e) {
>             if (running) {                      // unexpected -- log it
>               LOG.info(Thread.currentThread().getName() + " unexpectedly interrupted", e);
>             }
>           } catch (IOException ex) {
>             LOG.error("Error in Reader", ex);
>           } 
>         }
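
The failure mode described above can be reproduced in isolation. The following is a minimal sketch, not Hadoop code and not necessarily what the attached patch does: the class ReaderLoopSketch, the Work interface, and the guarded flag are all hypothetical names introduced for illustration. It assumes the crash came from an unchecked exception or Error, which the quoted catch clauses would not intercept, and shows how such a failure ends the thread unless the loop also handles it.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for the Reader loop; this is not Hadoop code.
final class ReaderLoopSketch {

    // Simulated unit of work, standing in for doRead(key).
    interface Work {
        void run() throws IOException;
    }

    // Runs all items on a dedicated thread and returns how many were
    // attempted before that thread exited.
    static int runReader(List<Work> items, boolean guarded) {
        AtomicInteger attempted = new AtomicInteger();
        Thread reader = new Thread(() -> {
            for (Work item : items) {
                attempted.incrementAndGet();
                try {
                    item.run();
                } catch (IOException ex) {
                    // Mirrors the quoted loop: log and keep going.
                    System.err.println("Error in Reader: " + ex);
                } catch (RuntimeException | Error t) {
                    if (!guarded) {
                        throw t; // escapes the loop and ends the thread
                    }
                    System.err.println("Unexpected error in Reader, continuing: " + t);
                }
            }
        }, "reader-sketch");
        // Report the thread's death in one line instead of a stack trace.
        reader.setUncaughtExceptionHandler((thread, error) ->
                System.err.println(thread.getName() + " died: " + error));
        reader.start();
        try {
            reader.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return attempted.get();
    }

    // Three work items; the second fails with an unchecked exception.
    static List<Work> sampleItems() {
        List<Work> items = new ArrayList<>();
        items.add(() -> { });
        items.add(() -> { throw new IllegalStateException("simulated unchecked failure"); });
        items.add(() -> { });
        return items;
    }

    public static void main(String[] args) {
        System.out.println("unguarded attempted: " + runReader(sampleItems(), false));
        System.out.println("guarded attempted:   " + runReader(sampleItems(), true));
    }
}
```

With guarded set to false the thread stops at the failing item and the third item is never attempted. With guarded set to true the loop logs the error and carries on. Whether a real server should swallow every Error this way is a separate design question that the sketch does not settle.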


