
Posted to common-issues@hadoop.apache.org by "Owen O'Malley (JIRA)" <ji...@apache.org> on 2011/03/04 18:19:37 UTC

[jira] Created: (HADOOP-7163) "java.net.SocketTimeoutException: 60000 millis timeout" happens a lot

"java.net.SocketTimeoutException: 60000 millis timeout" happens a lot
---------------------------------------------------------------------

                 Key: HADOOP-7163
                 URL: https://issues.apache.org/jira/browse/HADOOP-7163
             Project: Hadoop Common
          Issue Type: Bug
          Components: ipc
            Reporter: Owen O'Malley
            Assignee: Devaraj Das
             Fix For: 0.20.100


We have no retries for the case where tasks create the secure SASL connection. TCP connection setup is retried,
but once the TCP connection has been established, communication at the RPC layer (including the SASL handshake)
happens without retries. So, for example, a client's "read" can time out.
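
One way to picture the missing behaviour is a retry loop that wraps the whole setup (TCP connect plus handshake) rather than the connect alone. The following is only a minimal sketch under stated assumptions, not the change that was committed for this issue: the class HandshakeRetry, the interface ConnectionSetup, and the parameters maxAttempts and backoffMillis are hypothetical names that do not come from the Hadoop code base.

```java
import java.io.IOException;
import java.net.SocketTimeoutException;

/** Hypothetical sketch: retry the entire connection setup, handshake included. */
class HandshakeRetry {

    /** Stand-in for "open the TCP connection, then run the SASL handshake". */
    interface ConnectionSetup<T> {
        T connectAndHandshake() throws IOException;
    }

    /**
     * Runs the setup up to maxAttempts times. Only read timeouts are retried;
     * any other IOException is propagated to the caller immediately.
     */
    static <T> T setupWithRetries(ConnectionSetup<T> setup, int maxAttempts, long backoffMillis)
            throws IOException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        SocketTimeoutException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return setup.connectAndHandshake();
            } catch (SocketTimeoutException e) {
                last = e; // remember the timeout so it can be rethrown if all attempts fail
                if (attempt < maxAttempts) {
                    try {
                        Thread.sleep(backoffMillis); // fixed pause before the next attempt
                    } catch (InterruptedException ie) {
                        Thread.currentThread().interrupt();
                        throw new IOException("Interrupted while waiting to retry", ie);
                    }
                }
            }
        }
        throw last; // every attempt timed out
    }
}
```

Retrying only on SocketTimeoutException keeps other failures, such as authentication errors, visible to the caller. Whether that policy and a fixed backoff are appropriate for the IPC client is an assumption of this sketch.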

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HADOOP-7163) "java.net.SocketTimeoutException: 60000 millis timeout" happens a lot

Posted by "Steve Loughran (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-7163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13066902#comment-13066902 ] 

Steve Loughran commented on HADOOP-7163:
----------------------------------------

Have you got a stack trace?


[jira] [Updated] (HADOOP-7163) "java.net.SocketTimeoutException: 60000 millis timeout" happens a lot

Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-7163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Owen O'Malley updated HADOOP-7163:
----------------------------------

    Fix Version/s:     (was: 0.20.204.0)
                   0.20.203.0


[jira] [Commented] (HADOOP-7163) "java.net.SocketTimeoutException: 60000 millis timeout" happens a lot

Posted by "Eli Collins (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-7163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13091480#comment-13091480 ] 

Eli Collins commented on HADOOP-7163:
-------------------------------------

What change fixed this?


[jira] [Updated] (HADOOP-7163) "java.net.SocketTimeoutException: 60000 millis timeout" happens a lot

Posted by "Steve Loughran (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-7163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Steve Loughran updated HADOOP-7163:
-----------------------------------

    Fix Version/s:     (was: 0.20.203.0)
                   0.20.204.0


[jira] [Resolved] (HADOOP-7163) "java.net.SocketTimeoutException: 60000 millis timeout" happens a lot

Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-7163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Owen O'Malley resolved HADOOP-7163.
-----------------------------------

    Resolution: Fixed


[jira] [Commented] (HADOOP-7163) "java.net.SocketTimeoutException: 60000 millis timeout" happens a lot

Posted by "Dave Thompson (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-7163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13078453#comment-13078453 ] 

Dave Thompson commented on HADOOP-7163:
---------------------------------------

It looks like the client code cleans up after connection failures and is ready for a retry. Would RetryProxy be suitable for the particular scenario you're thinking of?
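
For readers unfamiliar with the retry-proxy idea, it can be illustrated with plain JDK dynamic proxies. This is a sketch under stated assumptions and deliberately does not use or reproduce Hadoop's RetryProxy API: the class TimeoutRetryingProxy, its wrap method, and the maxAttempts parameter are hypothetical.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.net.SocketTimeoutException;

/** Hypothetical sketch of a retrying proxy built on java.lang.reflect.Proxy. */
class TimeoutRetryingProxy {

    /**
     * Returns a proxy for iface that forwards every call to target and
     * re-invokes the call when it fails with a SocketTimeoutException,
     * up to maxAttempts invocations in total.
     */
    @SuppressWarnings("unchecked")
    static <T> T wrap(Class<T> iface, T target, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        InvocationHandler handler = (proxy, method, args) -> {
            Throwable last = null;
            for (int attempt = 1; attempt <= maxAttempts; attempt++) {
                try {
                    return method.invoke(target, args);
                } catch (InvocationTargetException e) {
                    // Reflection wraps the real failure; unwrap it first.
                    Throwable cause = e.getCause();
                    if (!(cause instanceof SocketTimeoutException)) {
                        throw cause; // not a timeout: do not retry
                    }
                    last = cause;
                }
            }
            throw last; // every attempt timed out
        };
        return (T) Proxy.newProxyInstance(
                iface.getClassLoader(), new Class<?>[] {iface}, handler);
    }
}
```

A proxy like this retries each method call independently, so it only helps if the wrapped client re-establishes its connection (including the handshake) when a call is re-issued after a timeout.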
