Posted to common-dev@hadoop.apache.org by "Sanjay Radia (JIRA)" <ji...@apache.org> on 2008/03/27 20:03:26 UTC

[jira] Created: (HADOOP-3109) RPC should accept connections even when the RPC queue is full (i.e., undo part of HADOOP-2910)

RPC should accept connections even when the RPC queue is full (i.e., undo part of HADOOP-2910)
-----------------------------------------------------------------------------------------------

                 Key: HADOOP-3109
                 URL: https://issues.apache.org/jira/browse/HADOOP-3109
             Project: Hadoop Core
          Issue Type: Bug
            Reporter: Sanjay Radia
            Assignee: Hairong Kuang
             Fix For: 0.17.0


HADOOP-2910 changed HDFS to stop accepting new connections when the RPC queue is full. It should continue to accept connections and let the OS deal with limiting connections.

HADOOP-2910's decision not to read from open sockets while the queue is full is exactly right: requests back up in the client sockets and the clients simply wait (especially with HADOOP-2188, which removes client timeouts). However, we should continue to accept connections.

The OS refuses new connections only after a large number of connections are open (a configurable limit). With the HADOOP-2910 patch, a full RPC queue imposes a new, lower limit on the number of open connections.
The problem is that during a surge of requests we stop accepting connections and clients see connection failures (a change from the old behavior).
If we instead keep accepting connections, the surge will likely be over shortly and the clients will get served. Of course, if the surge lasts a long time the OS will eventually stop accepting connections and clients will fail; there is not much one can do about that except raise the OS limit.

I propose that we continue accepting connections, but not read from them when the RPC queue is full (i.e., undo part of the HADOOP-2910 work, back to the old behavior).
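
For illustration only, here is a minimal sketch of the proposed server behavior: keep accepting connections unconditionally, and only read (and queue) requests while the call queue has room. This is not the actual org.apache.hadoop.ipc.Server code; the class name, queue capacity, and omitted request parsing are placeholders.

  // Illustrative sketch only -- not the actual org.apache.hadoop.ipc.Server code.
  // It shows the proposed behavior: always accept new connections, but stop
  // reading request data while the RPC call queue is full, so requests back up
  // in the clients' socket buffers instead of being refused.
  import java.io.IOException;
  import java.net.InetSocketAddress;
  import java.nio.channels.SelectionKey;
  import java.nio.channels.Selector;
  import java.nio.channels.ServerSocketChannel;
  import java.nio.channels.SocketChannel;
  import java.util.concurrent.ArrayBlockingQueue;
  import java.util.concurrent.BlockingQueue;

  public class AcceptAlwaysServer {
    private static final int CALL_QUEUE_CAPACITY = 1000;     // placeholder value
    private final BlockingQueue<Object> callQueue =
        new ArrayBlockingQueue<Object>(CALL_QUEUE_CAPACITY);

    public void serve(int port) throws IOException {
      Selector selector = Selector.open();
      ServerSocketChannel acceptor = ServerSocketChannel.open();
      acceptor.configureBlocking(false);
      acceptor.socket().bind(new InetSocketAddress(port));
      acceptor.register(selector, SelectionKey.OP_ACCEPT);

      while (true) {
        selector.select(1000);
        for (SelectionKey key : selector.selectedKeys()) {
          if (key.isAcceptable()) {
            // Accept unconditionally; the OS backlog and file-descriptor limit,
            // not the application, decide when connection attempts start to fail.
            SocketChannel client = acceptor.accept();
            if (client != null) {
              client.configureBlocking(false);
              client.register(selector, SelectionKey.OP_READ);
            }
          } else if (key.isReadable() && callQueue.remainingCapacity() > 0) {
            // Only read (and enqueue a call) while the queue has room; otherwise
            // leave the data in the socket buffer so the client simply waits.
            // A real server would also drop OP_READ interest here to avoid a
            // busy select loop; that detail is omitted for brevity.
            readAndEnqueueCall((SocketChannel) key.channel());
          }
        }
        selector.selectedKeys().clear();
      }
    }

    private void readAndEnqueueCall(SocketChannel channel) {
      // Parsing the RPC request and offering it to callQueue is omitted here.
    }
  }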


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-3109) RPC should accept connections even when the RPC queue is full (i.e., undo part of HADOOP-2910)

Posted by "Robert Chansler (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-3109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Chansler updated HADOOP-3109:
------------------------------------

    Component/s: dfs



[jira] Updated: (HADOOP-3109) RPC should accept connections even when the RPC queue is full (i.e., undo part of HADOOP-2910)

Posted by "Hairong Kuang (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-3109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hairong Kuang updated HADOOP-3109:
----------------------------------

    Fix Version/s:     (was: 0.17.0)
                   0.18.0

I am marking this jira to be resolved in 0.18. I will revert the HADOOP-2910 patch.



[jira] Commented: (HADOOP-3109) RPC should accept connections even when the RPC queue is full (i.e., undo part of HADOOP-2910)

Posted by "Raghu Angadi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-3109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12582785#action_12582785 ] 

Raghu Angadi commented on HADOOP-3109:
--------------------------------------

> It should continue to accept connections and let the OS deal with limiting connections.
How can the OS limit connections properly if the application keeps accepting them?

There could be some global limit in the OS, but isn't that very harsh on everything else on the machine? Which parameter is this?

> The problem is that during a surge of requests we stop accepting connections and
> clients see connection failures (a change from the old behavior).
The timeout is removed in HADOOP-2188... if it is good for HADOOP-2188, it is good here too. Ideally we should just have HADOOP-2188 :).





[jira] Commented: (HADOOP-3109) RPC should accept connections even when the RPC queue is full (i.e., undo part of HADOOP-2910)

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-3109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12583921#action_12583921 ] 

Doug Cutting commented on HADOOP-3109:
--------------------------------------

> Many Linux versions will truncate the backlog to 128

The default max is 128, but it's easy to increase this:

  sudo sysctl -w net.core.somaxconn=2048

That doesn't seem too onerous.

We do need to limit the number of accepted connections to substantially less than the file handle limit.  Increasing the listen queue length is a cheap way to get headroom beyond this.
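
For reference, a hedged sketch of how the server could request a larger accept backlog when binding its listener; the port and backlog value below are illustrative, and the kernel still silently caps the effective value at net.core.somaxconn, which is why the sysctl above matters:

  import java.io.IOException;
  import java.net.InetSocketAddress;
  import java.nio.channels.ServerSocketChannel;

  public class BacklogBindExample {
    public static void main(String[] args) throws IOException {
      ServerSocketChannel listener = ServerSocketChannel.open();
      int backlog = 2048;   // illustrative; silently capped at net.core.somaxconn
      listener.socket().bind(new InetSocketAddress(8020), backlog);
      System.out.println("Listening with requested backlog " + backlog);
    }
  }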




[jira] Commented: (HADOOP-3109) RPC should accept connections even when the RPC queue is full (i.e., undo part of HADOOP-2910)

Posted by "Hairong Kuang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-3109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12583907#action_12583907 ] 

Hairong Kuang commented on HADOOP-3109:
---------------------------------------

That explains why more than 7 connect requests got served when I set the backlog length to 1. It looks like Linux does not strictly observe the backlog parameter.
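
A small stand-alone probe along these lines (assumed, not from this issue) makes the behavior easy to reproduce: bind with a requested backlog of 1, never call accept(), and count how many connects still succeed before one fails or times out. The observed count is typically larger than the requested backlog because the kernel adjusts the effective queue length.

  import java.io.IOException;
  import java.net.InetSocketAddress;
  import java.net.ServerSocket;
  import java.net.Socket;

  public class BacklogProbe {
    public static void main(String[] args) throws IOException {
      ServerSocket listener = new ServerSocket(0, 1);   // requested backlog = 1
      int succeeded = 0;
      for (int i = 0; i < 32; i++) {
        Socket s = new Socket();
        try {
          // Short timeout: once the accept queue is exhausted the SYN is
          // typically dropped, so the connect times out rather than erroring.
          s.connect(new InetSocketAddress("127.0.0.1", listener.getLocalPort()), 1000);
          succeeded++;
        } catch (IOException e) {
          break;
        }
      }
      System.out.println("Connects that succeeded without any accept(): " + succeeded);
    }
  }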



[jira] Commented: (HADOOP-3109) RPC should accept connections even when the RPC queue is full (i.e., undo part of HADOOP-2910)

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-3109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12582781#action_12582781 ] 

Doug Cutting commented on HADOOP-3109:
--------------------------------------

Wouldn't it be easier to increase the socket backlog size and remove the connect timeout?



[jira] Updated: (HADOOP-3109) RPC should accept connections even when the RPC queue is full (i.e., undo part of HADOOP-2910)

Posted by "Robert Chansler (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-3109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Chansler updated HADOOP-3109:
------------------------------------

    Priority: Blocker  (was: Major)

0.17? Yes!



[jira] Commented: (HADOOP-3109) RPC should accept connections even when the RPC queue is full (i.e., undo part of HADOOP-2910)

Posted by "Hairong Kuang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-3109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12583261#action_12583261 ] 

Hairong Kuang commented on HADOOP-3109:
---------------------------------------

> Wouldn't it be easier to increase the socket backlog size and remove the connect timeout?
Increasing the socket backlog size might be a good solution. What should be a good backlog size? The connect timeout was already removed in HADOOP-2910.



[jira] Commented: (HADOOP-3109) RPC should accept connections even when the RPC queue is full (i.e., undo part of HADOOP-2910)

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-3109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12583768#action_12583768 ] 

Doug Cutting commented on HADOOP-3109:
--------------------------------------

> What should be a good backlog size?

Perhaps this should be proportional to the call queue length? Currently we queue 100 calls per handler with 10 handlers, or 1000 calls by default, while the backlog is currently 128. So setting the backlog to the call queue length would make it 1000 by default. Folks with large clusters increase the number of handlers to 50 or so, so they'd get a backlog of 5000. Does that sound like enough, or should we use a multiple of this?
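
Expressed as code, the sizing rule being discussed would look roughly like the sketch below; the constant mirrors the "100 queued calls per handler" figure from the comment and is not an actual configuration key.

  public class BacklogSizing {
    // Mirrors the default of 100 queued calls per handler mentioned above.
    static final int CALLS_PER_HANDLER = 100;

    static int backlogFor(int handlerCount) {
      return handlerCount * CALLS_PER_HANDLER;   // backlog == call queue length
    }

    public static void main(String[] args) {
      System.out.println(backlogFor(10));   // default 10 handlers -> 1000
      System.out.println(backlogFor(50));   // large clusters, ~50 -> 5000
    }
  }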




[jira] Resolved: (HADOOP-3109) RPC should accept connections even when the RPC queue is full (i.e., undo part of HADOOP-2910)

Posted by "Robert Chansler (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-3109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Chansler resolved HADOOP-3109.
-------------------------------------

    Resolution: Fixed

This was incorporated into the ultimate resolution of HADOOP-2910.
There is no independent patch or change.



[jira] Commented: (HADOOP-3109) RPC should accept connections even when the RPC queue is full (i.e., undo part of HADOOP-2910)

Posted by "Sameer Paranjpye (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-3109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12583901#action_12583901 ] 

Sameer Paranjpye commented on HADOOP-3109:
------------------------------------------

Managing the backlog in a portable way is not easy. Many Linux versions will truncate the backlog to 128 (silently) if it is set higher, for example.
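
As a hedged, Linux-only sketch (not from this issue), one way to detect the silent cap is to read /proc/sys/net/core/somaxconn and compare it with the backlog the server intends to request; that this check is platform-specific is part of why managing the backlog portably is awkward.

  import java.io.IOException;
  import java.nio.file.Files;
  import java.nio.file.Paths;

  public class SomaxconnCheck {
    public static void main(String[] args) throws IOException {
      int requested = 2048;   // illustrative requested backlog
      int somaxconn = Integer.parseInt(
          Files.readAllLines(Paths.get("/proc/sys/net/core/somaxconn")).get(0).trim());
      if (somaxconn < requested) {
        System.out.println("Requested backlog " + requested
            + " will be silently capped at " + somaxconn);
      }
    }
  }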






[jira] Commented: (HADOOP-3109) RPC should accept connections even when the RPC queue is full (i.e., undo part of HADOOP-2910)

Posted by "Hairong Kuang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-3109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12583913#action_12583913 ] 

Hairong Kuang commented on HADOOP-3109:
---------------------------------------

So what I plan to do is to have a thread that only accepts new connections and a thread that reads from the accepted connections. Each thread has its own selector. The accepting thread notifies the reading thread of newly accepted connections through a pipe.
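
A rough, simplified sketch of that two-thread design follows. It is not the eventual patch; for brevity the hand-off uses a concurrent queue plus Selector.wakeup() instead of the pipe mentioned above, and the actual reading and queueing of RPC calls is omitted.

  import java.io.IOException;
  import java.net.InetSocketAddress;
  import java.nio.channels.SelectionKey;
  import java.nio.channels.Selector;
  import java.nio.channels.ServerSocketChannel;
  import java.nio.channels.SocketChannel;
  import java.util.Queue;
  import java.util.concurrent.ConcurrentLinkedQueue;

  public class AcceptAndReadThreads {
    private final Queue<SocketChannel> pending = new ConcurrentLinkedQueue<SocketChannel>();
    private final Selector readSelector;

    public AcceptAndReadThreads() throws IOException {
      readSelector = Selector.open();       // the reader thread's own selector
    }

    // Runs forever in its own thread: accepts connections and hands them off.
    void acceptLoop(int port) throws IOException {
      ServerSocketChannel acceptor = ServerSocketChannel.open();
      acceptor.socket().bind(new InetSocketAddress(port));
      while (true) {
        SocketChannel client = acceptor.accept();   // blocking accept, never paused
        client.configureBlocking(false);
        pending.add(client);
        readSelector.wakeup();                      // let the reader pick it up
      }
    }

    // Runs forever in its own thread: registers handed-off channels and reads.
    void readLoop() throws IOException {
      while (true) {
        readSelector.select();
        SocketChannel c;
        while ((c = pending.poll()) != null) {
          c.register(readSelector, SelectionKey.OP_READ);
        }
        for (SelectionKey key : readSelector.selectedKeys()) {
          if (key.isValid() && key.isReadable()) {
            // Reading and queueing the RPC call is omitted; a real reader must
            // consume the data (or drop read interest) when the queue is full.
          }
        }
        readSelector.selectedKeys().clear();
      }
    }

    public static void main(String[] args) throws IOException {
      final AcceptAndReadThreads server = new AcceptAndReadThreads();
      new Thread(new Runnable() {
        public void run() {
          try {
            server.readLoop();
          } catch (IOException e) {
            e.printStackTrace();
          }
        }
      }).start();
      server.acceptLoop(8020);   // illustrative port
    }
  }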
