You are viewing a plain text version of this content. The canonical link for it is here.

Posted to common-issues@hadoop.apache.org by "Hairong Kuang (JIRA)" <ji...@apache.org> on 2010/07/29 22:15:16 UTC

[jira] Created: (HADOOP-6889) Make RPC to have an option to timeout

Make RPC to have an option to timeout
-------------------------------------

Key: HADOOP-6889
URL: https://issues.apache.org/jira/browse/HADOOP-6889
Project: Hadoop Common
Issue Type: New Feature
Components: ipc
Affects Versions: 0.22.0
Reporter: Hairong Kuang
Assignee: Hairong Kuang
Fix For: 0.22.0, 0.20-append

Currently Hadoop RPC does not timeout when the RPC server is alive. What it currently does is that a RPC client sends a ping to the server whenever a socket timeout happens. If the server is still alive, it continues to wait instead of throwing a SocketTimeoutException. This is to avoid a client to retry when a server is busy and thus making the server even busier. This works great if the RPC server is NameNode.

But Hadoop RPC is also used for some of client to DataNode communications, for example, for getting a replica's length. When a client comes across a problematic DataNode, it gets stuck and can not switch to a different DataNode. In this case, it would be better that the client receives a timeout exception.

I plan to add a new configuration ipc.client.max.pings that specifies the max number of pings that a client could try. If a response can not be received after the specified max number of pings, a SocketTimeoutException is thrown. If this configuration property is not set, a client maintains the current semantics, waiting forever.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HADOOP-6889) Make RPC to have an option to timeout

Posted by "Kan Zhang (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-6889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12896754#action_12896754 ] 

Kan Zhang commented on HADOOP-6889:
-----------------------------------

If we can confirm Java always uses the same object for the same integer, your code is correct. Otherwise, I'm fine with using rpcTimeout as hashcode. I've seen recommendations on computing hashcode iteratively as follows.
{code}
hash = PRIME * hash + component_value
{code}
Seems we are doing ^ instead of +. Does it matter?

> Make RPC to have an option to timeout
> -------------------------------------
>
>                 Key: HADOOP-6889
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6889
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: ipc
>    Affects Versions: 0.22.0
>            Reporter: Hairong Kuang
>            Assignee: Hairong Kuang
>             Fix For: 0.22.0, 0.20-append
>
>         Attachments: ipcTimeout.patch, ipcTimeout1.patch, ipcTimeout2.patch
>
>
> Currently Hadoop RPC does not timeout when the RPC server is alive. What it currently does is that a RPC client sends a ping to the server whenever a socket timeout happens. If the server is still alive, it continues to wait instead of throwing a SocketTimeoutException. This is to avoid a client to retry when a server is busy and thus making the server even busier. This works great if the RPC server is NameNode.
> But Hadoop RPC is also used for some of client to DataNode communications, for example, for getting a replica's length. When a client comes across a problematic DataNode, it gets stuck and can not switch to a different DataNode. In this case, it would be better that the client receives a timeout exception.
> I plan to add a new configuration ipc.client.max.pings that specifies the max number of pings that a client could try. If a response can not be received after the specified max number of pings, a SocketTimeoutException is thrown. If this configuration property is not set, a client maintains the current semantics, waiting forever.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HADOOP-6889) Make RPC to have an option to timeout

Posted by "Kan Zhang (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-6889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12896738#action_12896738 ] 

Kan Zhang commented on HADOOP-6889:
-----------------------------------

> I think you can simply use rpcTimeout itself as the hashcode.
If that's what you intended. I haven't considered its randomness.

> Make RPC to have an option to timeout
> -------------------------------------
>
>                 Key: HADOOP-6889
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6889
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: ipc
>    Affects Versions: 0.22.0
>            Reporter: Hairong Kuang
>            Assignee: Hairong Kuang
>             Fix For: 0.22.0, 0.20-append
>
>         Attachments: ipcTimeout.patch, ipcTimeout1.patch, ipcTimeout2.patch
>
>
> Currently Hadoop RPC does not timeout when the RPC server is alive. What it currently does is that a RPC client sends a ping to the server whenever a socket timeout happens. If the server is still alive, it continues to wait instead of throwing a SocketTimeoutException. This is to avoid a client to retry when a server is busy and thus making the server even busier. This works great if the RPC server is NameNode.
> But Hadoop RPC is also used for some of client to DataNode communications, for example, for getting a replica's length. When a client comes across a problematic DataNode, it gets stuck and can not switch to a different DataNode. In this case, it would be better that the client receives a timeout exception.
> I plan to add a new configuration ipc.client.max.pings that specifies the max number of pings that a client could try. If a response can not be received after the specified max number of pings, a SocketTimeoutException is thrown. If this configuration property is not set, a client maintains the current semantics, waiting forever.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (HADOOP-6889) Make RPC to have an option to timeout

Posted by "Hairong Kuang (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-6889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hairong Kuang updated HADOOP-6889:
----------------------------------

    Status: Open  (was: Patch Available)

> Make RPC to have an option to timeout
> -------------------------------------
>
>                 Key: HADOOP-6889
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6889
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: ipc
>    Affects Versions: 0.22.0
>            Reporter: Hairong Kuang
>            Assignee: Hairong Kuang
>             Fix For: 0.22.0, 0.20-append
>
>         Attachments: ipcTimeout.patch, ipcTimeout1.patch, ipcTimeout2.patch
>
>
> Currently Hadoop RPC does not timeout when the RPC server is alive. What it currently does is that a RPC client sends a ping to the server whenever a socket timeout happens. If the server is still alive, it continues to wait instead of throwing a SocketTimeoutException. This is to avoid a client to retry when a server is busy and thus making the server even busier. This works great if the RPC server is NameNode.
> But Hadoop RPC is also used for some of client to DataNode communications, for example, for getting a replica's length. When a client comes across a problematic DataNode, it gets stuck and can not switch to a different DataNode. In this case, it would be better that the client receives a timeout exception.
> I plan to add a new configuration ipc.client.max.pings that specifies the max number of pings that a client could try. If a response can not be received after the specified max number of pings, a SocketTimeoutException is thrown. If this configuration property is not set, a client maintains the current semantics, waiting forever.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HADOOP-6889) Make RPC to have an option to timeout

Posted by "Kan Zhang (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-6889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12896740#action_12896740 ] 

Kan Zhang commented on HADOOP-6889:
-----------------------------------

PS. Maybe Java optimizes on reusing the same object for the same integer. I don't know if this is something we can depend on.

> Make RPC to have an option to timeout
> -------------------------------------
>
>                 Key: HADOOP-6889
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6889
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: ipc
>    Affects Versions: 0.22.0
>            Reporter: Hairong Kuang
>            Assignee: Hairong Kuang
>             Fix For: 0.22.0, 0.20-append
>
>         Attachments: ipcTimeout.patch, ipcTimeout1.patch, ipcTimeout2.patch
>
>
> Currently Hadoop RPC does not timeout when the RPC server is alive. What it currently does is that a RPC client sends a ping to the server whenever a socket timeout happens. If the server is still alive, it continues to wait instead of throwing a SocketTimeoutException. This is to avoid a client to retry when a server is busy and thus making the server even busier. This works great if the RPC server is NameNode.
> But Hadoop RPC is also used for some of client to DataNode communications, for example, for getting a replica's length. When a client comes across a problematic DataNode, it gets stuck and can not switch to a different DataNode. In this case, it would be better that the client receives a timeout exception.
> I plan to add a new configuration ipc.client.max.pings that specifies the max number of pings that a client could try. If a response can not be received after the specified max number of pings, a SocketTimeoutException is thrown. If this configuration property is not set, a client maintains the current semantics, waiting forever.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (HADOOP-6889) Make RPC to have an option to timeout

Posted by "Hairong Kuang (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-6889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hairong Kuang updated HADOOP-6889:
----------------------------------

    Status: Open  (was: Patch Available)

> Make RPC to have an option to timeout
> -------------------------------------
>
>                 Key: HADOOP-6889
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6889
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: ipc
>    Affects Versions: 0.22.0
>            Reporter: Hairong Kuang
>            Assignee: Hairong Kuang
>             Fix For: 0.22.0, 0.20-append
>
>         Attachments: ipcTimeout.patch, ipcTimeout1.patch, ipcTimeout2.patch
>
>
> Currently Hadoop RPC does not timeout when the RPC server is alive. What it currently does is that a RPC client sends a ping to the server whenever a socket timeout happens. If the server is still alive, it continues to wait instead of throwing a SocketTimeoutException. This is to avoid a client to retry when a server is busy and thus making the server even busier. This works great if the RPC server is NameNode.
> But Hadoop RPC is also used for some of client to DataNode communications, for example, for getting a replica's length. When a client comes across a problematic DataNode, it gets stuck and can not switch to a different DataNode. In this case, it would be better that the client receives a timeout exception.
> I plan to add a new configuration ipc.client.max.pings that specifies the max number of pings that a client could try. If a response can not be received after the specified max number of pings, a SocketTimeoutException is thrown. If this configuration property is not set, a client maintains the current semantics, waiting forever.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HADOOP-6889) Make RPC to have an option to timeout

Posted by "Hairong Kuang (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-6889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12895724#action_12895724 ] 

Hairong Kuang commented on HADOOP-6889:
---------------------------------------

Hudson is down. I ran ant clean test on my linux box twice. They all passed:

BUILD SUCCESSFUL
Total time: 11 minutes 8 seconds

BUILD SUCCESSFUL
Total time: 9 minutes 55 seconds

> Make RPC to have an option to timeout
> -------------------------------------
>
>                 Key: HADOOP-6889
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6889
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: ipc
>    Affects Versions: 0.22.0
>            Reporter: Hairong Kuang
>            Assignee: Hairong Kuang
>             Fix For: 0.22.0, 0.20-append
>
>         Attachments: ipcTimeout.patch, ipcTimeout1.patch, ipcTimeout2.patch
>
>
> Currently Hadoop RPC does not timeout when the RPC server is alive. What it currently does is that a RPC client sends a ping to the server whenever a socket timeout happens. If the server is still alive, it continues to wait instead of throwing a SocketTimeoutException. This is to avoid a client to retry when a server is busy and thus making the server even busier. This works great if the RPC server is NameNode.
> But Hadoop RPC is also used for some of client to DataNode communications, for example, for getting a replica's length. When a client comes across a problematic DataNode, it gets stuck and can not switch to a different DataNode. In this case, it would be better that the client receives a timeout exception.
> I plan to add a new configuration ipc.client.max.pings that specifies the max number of pings that a client could try. If a response can not be received after the specified max number of pings, a SocketTimeoutException is thrown. If this configuration property is not set, a client maintains the current semantics, waiting forever.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (HADOOP-6889) Make RPC to have an option to timeout

Posted by "Hairong Kuang (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-6889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hairong Kuang updated HADOOP-6889:
----------------------------------

          Status: Patch Available  (was: Open)
    Hadoop Flags: [Reviewed]

> Make RPC to have an option to timeout
> -------------------------------------
>
>                 Key: HADOOP-6889
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6889
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: ipc
>    Affects Versions: 0.22.0
>            Reporter: Hairong Kuang
>            Assignee: Hairong Kuang
>             Fix For: 0.22.0, 0.20-append
>
>         Attachments: ipcTimeout.patch, ipcTimeout1.patch
>
>
> Currently Hadoop RPC does not timeout when the RPC server is alive. What it currently does is that a RPC client sends a ping to the server whenever a socket timeout happens. If the server is still alive, it continues to wait instead of throwing a SocketTimeoutException. This is to avoid a client to retry when a server is busy and thus making the server even busier. This works great if the RPC server is NameNode.
> But Hadoop RPC is also used for some of client to DataNode communications, for example, for getting a replica's length. When a client comes across a problematic DataNode, it gets stuck and can not switch to a different DataNode. In this case, it would be better that the client receives a timeout exception.
> I plan to add a new configuration ipc.client.max.pings that specifies the max number of pings that a client could try. If a response can not be received after the specified max number of pings, a SocketTimeoutException is thrown. If this configuration property is not set, a client maintains the current semantics, waiting forever.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (HADOOP-6889) Make RPC to have an option to timeout

Posted by "Hairong Kuang (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-6889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hairong Kuang updated HADOOP-6889:
----------------------------------

    Status: Patch Available  (was: Open)

> Make RPC to have an option to timeout
> -------------------------------------
>
>                 Key: HADOOP-6889
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6889
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: ipc
>    Affects Versions: 0.22.0
>            Reporter: Hairong Kuang
>            Assignee: Hairong Kuang
>             Fix For: 0.22.0, 0.20-append
>
>         Attachments: ipcTimeout.patch, ipcTimeout1.patch, ipcTimeout2.patch
>
>
> Currently Hadoop RPC does not timeout when the RPC server is alive. What it currently does is that a RPC client sends a ping to the server whenever a socket timeout happens. If the server is still alive, it continues to wait instead of throwing a SocketTimeoutException. This is to avoid a client to retry when a server is busy and thus making the server even busier. This works great if the RPC server is NameNode.
> But Hadoop RPC is also used for some of client to DataNode communications, for example, for getting a replica's length. When a client comes across a problematic DataNode, it gets stuck and can not switch to a different DataNode. In this case, it would be better that the client receives a timeout exception.
> I plan to add a new configuration ipc.client.max.pings that specifies the max number of pings that a client could try. If a response can not be received after the specified max number of pings, a SocketTimeoutException is thrown. If this configuration property is not set, a client maintains the current semantics, waiting forever.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (HADOOP-6889) Make RPC to have an option to timeout

Posted by "Hairong Kuang (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-6889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hairong Kuang updated HADOOP-6889:
----------------------------------

        Status: Resolved  (was: Patch Available)
    Resolution: Fixed

I've just committed this!

> Make RPC to have an option to timeout
> -------------------------------------
>
>                 Key: HADOOP-6889
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6889
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: ipc
>    Affects Versions: 0.22.0
>            Reporter: Hairong Kuang
>            Assignee: Hairong Kuang
>             Fix For: 0.22.0, 0.20-append
>
>         Attachments: ipcTimeout.patch, ipcTimeout1.patch, ipcTimeout2.patch
>
>
> Currently Hadoop RPC does not timeout when the RPC server is alive. What it currently does is that a RPC client sends a ping to the server whenever a socket timeout happens. If the server is still alive, it continues to wait instead of throwing a SocketTimeoutException. This is to avoid a client to retry when a server is busy and thus making the server even busier. This works great if the RPC server is NameNode.
> But Hadoop RPC is also used for some of client to DataNode communications, for example, for getting a replica's length. When a client comes across a problematic DataNode, it gets stuck and can not switch to a different DataNode. In this case, it would be better that the client receives a timeout exception.
> I plan to add a new configuration ipc.client.max.pings that specifies the max number of pings that a client could try. If a response can not be received after the specified max number of pings, a SocketTimeoutException is thrown. If this configuration property is not set, a client maintains the current semantics, waiting forever.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HADOOP-6889) Make RPC to have an option to timeout

Posted by "Todd Lipcon (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-6889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12893863#action_12893863 ] 

Todd Lipcon commented on HADOOP-6889:
-------------------------------------

Hi Hairong. I agree that it should be per-proxy, and also that this is useful. We have a customer who has occasionally run into this with problematic "stuck" datanodes blocking pipeline recovery indefinitely (at least until ops notices the hung machine and powers it down, causing TCP connection to drop).

> Make RPC to have an option to timeout
> -------------------------------------
>
>                 Key: HADOOP-6889
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6889
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: ipc
>    Affects Versions: 0.22.0
>            Reporter: Hairong Kuang
>            Assignee: Hairong Kuang
>             Fix For: 0.22.0, 0.20-append
>
>
> Currently Hadoop RPC does not timeout when the RPC server is alive. What it currently does is that a RPC client sends a ping to the server whenever a socket timeout happens. If the server is still alive, it continues to wait instead of throwing a SocketTimeoutException. This is to avoid a client to retry when a server is busy and thus making the server even busier. This works great if the RPC server is NameNode.
> But Hadoop RPC is also used for some of client to DataNode communications, for example, for getting a replica's length. When a client comes across a problematic DataNode, it gets stuck and can not switch to a different DataNode. In this case, it would be better that the client receives a timeout exception.
> I plan to add a new configuration ipc.client.max.pings that specifies the max number of pings that a client could try. If a response can not be received after the specified max number of pings, a SocketTimeoutException is thrown. If this configuration property is not set, a client maintains the current semantics, waiting forever.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HADOOP-6889) Make RPC to have an option to timeout

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-6889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12895403#action_12895403 ] 

Hadoop QA commented on HADOOP-6889:
-----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12451180/ipcTimeout1.patch
  against trunk revision 981714.

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 3 new or modified tests.

    -1 javadoc.  The javadoc tool appears to have generated 1 warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    -1 core tests.  The patch failed core unit tests.

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/662/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/662/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/662/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/662/console

This message is automatically generated.

> Make RPC to have an option to timeout
> -------------------------------------
>
>                 Key: HADOOP-6889
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6889
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: ipc
>    Affects Versions: 0.22.0
>            Reporter: Hairong Kuang
>            Assignee: Hairong Kuang
>             Fix For: 0.22.0, 0.20-append
>
>         Attachments: ipcTimeout.patch, ipcTimeout1.patch
>
>
> Currently Hadoop RPC does not timeout when the RPC server is alive. What it currently does is that a RPC client sends a ping to the server whenever a socket timeout happens. If the server is still alive, it continues to wait instead of throwing a SocketTimeoutException. This is to avoid a client to retry when a server is busy and thus making the server even busier. This works great if the RPC server is NameNode.
> But Hadoop RPC is also used for some of client to DataNode communications, for example, for getting a replica's length. When a client comes across a problematic DataNode, it gets stuck and can not switch to a different DataNode. In this case, it would be better that the client receives a timeout exception.
> I plan to add a new configuration ipc.client.max.pings that specifies the max number of pings that a client could try. If a response can not be received after the specified max number of pings, a SocketTimeoutException is thrown. If this configuration property is not set, a client maintains the current semantics, waiting forever.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HADOOP-6889) Make RPC to have an option to timeout

Posted by "Kan Zhang (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-6889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12895827#action_12895827 ] 

Kan Zhang commented on HADOOP-6889:
-----------------------------------

I see.

> Make RPC to have an option to timeout
> -------------------------------------
>
>                 Key: HADOOP-6889
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6889
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: ipc
>    Affects Versions: 0.22.0
>            Reporter: Hairong Kuang
>            Assignee: Hairong Kuang
>             Fix For: 0.22.0, 0.20-append
>
>         Attachments: ipcTimeout.patch, ipcTimeout1.patch, ipcTimeout2.patch
>
>
> Currently Hadoop RPC does not timeout when the RPC server is alive. What it currently does is that a RPC client sends a ping to the server whenever a socket timeout happens. If the server is still alive, it continues to wait instead of throwing a SocketTimeoutException. This is to avoid a client to retry when a server is busy and thus making the server even busier. This works great if the RPC server is NameNode.
> But Hadoop RPC is also used for some of client to DataNode communications, for example, for getting a replica's length. When a client comes across a problematic DataNode, it gets stuck and can not switch to a different DataNode. In this case, it would be better that the client receives a timeout exception.
> I plan to add a new configuration ipc.client.max.pings that specifies the max number of pings that a client could try. If a response can not be received after the specified max number of pings, a SocketTimeoutException is thrown. If this configuration property is not set, a client maintains the current semantics, waiting forever.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (HADOOP-6889) Make RPC to have an option to timeout

Posted by "Hairong Kuang (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-6889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hairong Kuang updated HADOOP-6889:
----------------------------------

    Attachment: ipcTimeout2.patch

This fixed the failed tests.

> Make RPC to have an option to timeout
> -------------------------------------
>
>                 Key: HADOOP-6889
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6889
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: ipc
>    Affects Versions: 0.22.0
>            Reporter: Hairong Kuang
>            Assignee: Hairong Kuang
>             Fix For: 0.22.0, 0.20-append
>
>         Attachments: ipcTimeout.patch, ipcTimeout1.patch, ipcTimeout2.patch
>
>
> Currently Hadoop RPC does not timeout when the RPC server is alive. What it currently does is that a RPC client sends a ping to the server whenever a socket timeout happens. If the server is still alive, it continues to wait instead of throwing a SocketTimeoutException. This is to avoid a client to retry when a server is busy and thus making the server even busier. This works great if the RPC server is NameNode.
> But Hadoop RPC is also used for some of client to DataNode communications, for example, for getting a replica's length. When a client comes across a problematic DataNode, it gets stuck and can not switch to a different DataNode. In this case, it would be better that the client receives a timeout exception.
> I plan to add a new configuration ipc.client.max.pings that specifies the max number of pings that a client could try. If a response can not be received after the specified max number of pings, a SocketTimeoutException is thrown. If this configuration property is not set, a client maintains the current semantics, waiting forever.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HADOOP-6889) Make RPC to have an option to timeout

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-6889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12895368#action_12895368 ] 

Doug Cutting commented on HADOOP-6889:
--------------------------------------

+1 this patch looks reasonable to me, pending Hudson's scrutiny.

> Make RPC to have an option to timeout
> -------------------------------------
>
>                 Key: HADOOP-6889
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6889
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: ipc
>    Affects Versions: 0.22.0
>            Reporter: Hairong Kuang
>            Assignee: Hairong Kuang
>             Fix For: 0.22.0, 0.20-append
>
>         Attachments: ipcTimeout.patch, ipcTimeout1.patch
>
>
> Currently Hadoop RPC does not timeout when the RPC server is alive. What it currently does is that a RPC client sends a ping to the server whenever a socket timeout happens. If the server is still alive, it continues to wait instead of throwing a SocketTimeoutException. This is to avoid a client to retry when a server is busy and thus making the server even busier. This works great if the RPC server is NameNode.
> But Hadoop RPC is also used for some of client to DataNode communications, for example, for getting a replica's length. When a client comes across a problematic DataNode, it gets stuck and can not switch to a different DataNode. In this case, it would be better that the client receives a timeout exception.
> I plan to add a new configuration ipc.client.max.pings that specifies the max number of pings that a client could try. If a response can not be received after the specified max number of pings, a SocketTimeoutException is thrown. If this configuration property is not set, a client maintains the current semantics, waiting forever.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HADOOP-6889) Make RPC to have an option to timeout

Posted by "Kan Zhang (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-6889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12895814#action_12895814 ] 

Kan Zhang commented on HADOOP-6889:
-----------------------------------

I know I'm a little late to the party. But I'm just wondering if it's better to pass rcpTimeout using the existing conf param rather than adding a new parameter to getProxy()?

> Make RPC to have an option to timeout
> -------------------------------------
>
>                 Key: HADOOP-6889
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6889
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: ipc
>    Affects Versions: 0.22.0
>            Reporter: Hairong Kuang
>            Assignee: Hairong Kuang
>             Fix For: 0.22.0, 0.20-append
>
>         Attachments: ipcTimeout.patch, ipcTimeout1.patch, ipcTimeout2.patch
>
>
> Currently Hadoop RPC does not timeout when the RPC server is alive. What it currently does is that a RPC client sends a ping to the server whenever a socket timeout happens. If the server is still alive, it continues to wait instead of throwing a SocketTimeoutException. This is to avoid a client to retry when a server is busy and thus making the server even busier. This works great if the RPC server is NameNode.
> But Hadoop RPC is also used for some of client to DataNode communications, for example, for getting a replica's length. When a client comes across a problematic DataNode, it gets stuck and can not switch to a different DataNode. In this case, it would be better that the client receives a timeout exception.
> I plan to add a new configuration ipc.client.max.pings that specifies the max number of pings that a client could try. If a response can not be received after the specified max number of pings, a SocketTimeoutException is thrown. If this configuration property is not set, a client maintains the current semantics, waiting forever.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HADOOP-6889) Make RPC to have an option to timeout

Posted by "Hairong Kuang (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-6889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12896951#action_12896951 ] 

Hairong Kuang commented on HADOOP-6889:
---------------------------------------

It looks that Java returns the same value for every call of System.indentifyHashcode(x) as long as int x remains the same. But I feel more comfortable using x as hashcode directly. 

Either ^ or + should be fine. But I think we should consistently use either ^ or +.

> Make RPC to have an option to timeout
> -------------------------------------
>
>                 Key: HADOOP-6889
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6889
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: ipc
>    Affects Versions: 0.22.0
>            Reporter: Hairong Kuang
>            Assignee: Hairong Kuang
>             Fix For: 0.22.0, 0.20-append
>
>         Attachments: ipcTimeout.patch, ipcTimeout1.patch, ipcTimeout2.patch
>
>
> Currently Hadoop RPC does not timeout when the RPC server is alive. What it currently does is that a RPC client sends a ping to the server whenever a socket timeout happens. If the server is still alive, it continues to wait instead of throwing a SocketTimeoutException. This is to avoid a client to retry when a server is busy and thus making the server even busier. This works great if the RPC server is NameNode.
> But Hadoop RPC is also used for some of client to DataNode communications, for example, for getting a replica's length. When a client comes across a problematic DataNode, it gets stuck and can not switch to a different DataNode. In this case, it would be better that the client receives a timeout exception.
> I plan to add a new configuration ipc.client.max.pings that specifies the max number of pings that a client could try. If a response can not be received after the specified max number of pings, a SocketTimeoutException is thrown. If this configuration property is not set, a client maintains the current semantics, waiting forever.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (HADOOP-6889) Make RPC to have an option to timeout

Posted by "Hairong Kuang (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-6889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hairong Kuang updated HADOOP-6889:
----------------------------------

    Status: Patch Available  (was: Open)

Resubmit since the previous one seemed to stuck.

> Make RPC to have an option to timeout
> -------------------------------------
>
>                 Key: HADOOP-6889
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6889
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: ipc
>    Affects Versions: 0.22.0
>            Reporter: Hairong Kuang
>            Assignee: Hairong Kuang
>             Fix For: 0.22.0, 0.20-append
>
>         Attachments: ipcTimeout.patch, ipcTimeout1.patch, ipcTimeout2.patch
>
>
> Currently Hadoop RPC does not timeout when the RPC server is alive. What it currently does is that a RPC client sends a ping to the server whenever a socket timeout happens. If the server is still alive, it continues to wait instead of throwing a SocketTimeoutException. This is to avoid a client to retry when a server is busy and thus making the server even busier. This works great if the RPC server is NameNode.
> But Hadoop RPC is also used for some of client to DataNode communications, for example, for getting a replica's length. When a client comes across a problematic DataNode, it gets stuck and can not switch to a different DataNode. In this case, it would be better that the client receives a timeout exception.
> I plan to add a new configuration ipc.client.max.pings that specifies the max number of pings that a client could try. If a response can not be received after the specified max number of pings, a SocketTimeoutException is thrown. If this configuration property is not set, a client maintains the current semantics, waiting forever.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HADOOP-6889) Make RPC to have an option to timeout

Posted by "Hairong Kuang (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-6889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12893833#action_12893833 ] 

Hairong Kuang commented on HADOOP-6889:
---------------------------------------

On a second thought, it would be nice if RPC proxies could configure differently. For example, one may want to have timeout on RPCs to DataNodes but no timeout on RPCs to NameNode. So I propose to add an additional parameter, maxRpcWaitingTime, to RPC#getProxy, meaning for every RPC call to this proxy, SocketTimeout is thrown if a client has not received a response in maxRpcWaitingTime. If maxRpcWaitingTime is zero, a RPC call will not timeout.

In this way, a client could configure maxRpcWaitingTime differently for its communicatations to NameNode and DataNodes. 



> Make RPC to have an option to timeout
> -------------------------------------
>
>                 Key: HADOOP-6889
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6889
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: ipc
>    Affects Versions: 0.22.0
>            Reporter: Hairong Kuang
>            Assignee: Hairong Kuang
>             Fix For: 0.22.0, 0.20-append
>
>
> Currently Hadoop RPC does not timeout when the RPC server is alive. What it currently does is that a RPC client sends a ping to the server whenever a socket timeout happens. If the server is still alive, it continues to wait instead of throwing a SocketTimeoutException. This is to avoid a client to retry when a server is busy and thus making the server even busier. This works great if the RPC server is NameNode.
> But Hadoop RPC is also used for some of client to DataNode communications, for example, for getting a replica's length. When a client comes across a problematic DataNode, it gets stuck and can not switch to a different DataNode. In this case, it would be better that the client receives a timeout exception.
> I plan to add a new configuration ipc.client.max.pings that specifies the max number of pings that a client could try. If a response can not be received after the specified max number of pings, a SocketTimeoutException is thrown. If this configuration property is not set, a client maintains the current semantics, waiting forever.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HADOOP-6889) Make RPC to have an option to timeout

Posted by "Hairong Kuang (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-6889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12895823#action_12895823 ] 

Hairong Kuang commented on HADOOP-6889:
---------------------------------------

Kan, thanks for taking a look at the patch!

I thought about introducing a configuration parameter. But clients or DataNodes want to have timeout for RPCs to DataNodes but no timeout for RPCs to NameNodes. Adding a rpcTimeout parameter makes this easy.

> Make RPC to have an option to timeout
> -------------------------------------
>
>                 Key: HADOOP-6889
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6889
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: ipc
>    Affects Versions: 0.22.0
>            Reporter: Hairong Kuang
>            Assignee: Hairong Kuang
>             Fix For: 0.22.0, 0.20-append
>
>         Attachments: ipcTimeout.patch, ipcTimeout1.patch, ipcTimeout2.patch
>
>
> Currently Hadoop RPC does not timeout when the RPC server is alive. What it currently does is that a RPC client sends a ping to the server whenever a socket timeout happens. If the server is still alive, it continues to wait instead of throwing a SocketTimeoutException. This is to avoid a client to retry when a server is busy and thus making the server even busier. This works great if the RPC server is NameNode.
> But Hadoop RPC is also used for some of client to DataNode communications, for example, for getting a replica's length. When a client comes across a problematic DataNode, it gets stuck and can not switch to a different DataNode. In this case, it would be better that the client receives a timeout exception.
> I plan to add a new configuration ipc.client.max.pings that specifies the max number of pings that a client could try. If a response can not be received after the specified max number of pings, a SocketTimeoutException is thrown. If this configuration property is not set, a client maintains the current semantics, waiting forever.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HADOOP-6889) Make RPC to have an option to timeout

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-6889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12893887#action_12893887 ] 

Doug Cutting commented on HADOOP-6889:
--------------------------------------

As I recall, RPC calls used to have a timeout that, combined with the retry proxy, would spiral load on servers.  But I think timeouts when retries go to a different server should not have the same problem.

> Make RPC to have an option to timeout
> -------------------------------------
>
>                 Key: HADOOP-6889
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6889
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: ipc
>    Affects Versions: 0.22.0
>            Reporter: Hairong Kuang
>            Assignee: Hairong Kuang
>             Fix For: 0.22.0, 0.20-append
>
>
> Currently Hadoop RPC does not timeout when the RPC server is alive. What it currently does is that a RPC client sends a ping to the server whenever a socket timeout happens. If the server is still alive, it continues to wait instead of throwing a SocketTimeoutException. This is to avoid a client to retry when a server is busy and thus making the server even busier. This works great if the RPC server is NameNode.
> But Hadoop RPC is also used for some of client to DataNode communications, for example, for getting a replica's length. When a client comes across a problematic DataNode, it gets stuck and can not switch to a different DataNode. In this case, it would be better that the client receives a timeout exception.
> I plan to add a new configuration ipc.client.max.pings that specifies the max number of pings that a client could try. If a response can not be received after the specified max number of pings, a SocketTimeoutException is thrown. If this configuration property is not set, a client maintains the current semantics, waiting forever.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HADOOP-6889) Make RPC to have an option to timeout

Posted by "Hairong Kuang (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-6889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12896750#action_12896750 ] 

Hairong Kuang commented on HADOOP-6889:
---------------------------------------

Kan, thanks for your insight. It seems that you are right, identifyHashcode is based on references.

Do you mind if we simply use rpcTime as the hashcode?

> Make RPC to have an option to timeout
> -------------------------------------
>
>                 Key: HADOOP-6889
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6889
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: ipc
>    Affects Versions: 0.22.0
>            Reporter: Hairong Kuang
>            Assignee: Hairong Kuang
>             Fix For: 0.22.0, 0.20-append
>
>         Attachments: ipcTimeout.patch, ipcTimeout1.patch, ipcTimeout2.patch
>
>
> Currently Hadoop RPC does not timeout when the RPC server is alive. What it currently does is that a RPC client sends a ping to the server whenever a socket timeout happens. If the server is still alive, it continues to wait instead of throwing a SocketTimeoutException. This is to avoid a client to retry when a server is busy and thus making the server even busier. This works great if the RPC server is NameNode.
> But Hadoop RPC is also used for some of client to DataNode communications, for example, for getting a replica's length. When a client comes across a problematic DataNode, it gets stuck and can not switch to a different DataNode. In this case, it would be better that the client receives a timeout exception.
> I plan to add a new configuration ipc.client.max.pings that specifies the max number of pings that a client could try. If a response can not be received after the specified max number of pings, a SocketTimeoutException is thrown. If this configuration property is not set, a client maintains the current semantics, waiting forever.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (HADOOP-6889) Make RPC to have an option to timeout

Posted by "Hairong Kuang (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-6889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hairong Kuang updated HADOOP-6889:
----------------------------------

    Attachment: ipcTimeout.patch

This patch passes a rpcTimout parameter to RPC#getProxy method. A a non-positive rpcTimeout means that RPC does not timeout as the default behavior. If rpcTimeout is positive, a RPC client throws SocketTimeoutException if the client has not received a response in rpcTimeout period.

> Make RPC to have an option to timeout
> -------------------------------------
>
>                 Key: HADOOP-6889
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6889
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: ipc
>    Affects Versions: 0.22.0
>            Reporter: Hairong Kuang
>            Assignee: Hairong Kuang
>             Fix For: 0.22.0, 0.20-append
>
>         Attachments: ipcTimeout.patch
>
>
> Currently Hadoop RPC does not timeout when the RPC server is alive. What it currently does is that a RPC client sends a ping to the server whenever a socket timeout happens. If the server is still alive, it continues to wait instead of throwing a SocketTimeoutException. This is to avoid a client to retry when a server is busy and thus making the server even busier. This works great if the RPC server is NameNode.
> But Hadoop RPC is also used for some of client to DataNode communications, for example, for getting a replica's length. When a client comes across a problematic DataNode, it gets stuck and can not switch to a different DataNode. In this case, it would be better that the client receives a timeout exception.
> I plan to add a new configuration ipc.client.max.pings that specifies the max number of pings that a client could try. If a response can not be received after the specified max number of pings, a SocketTimeoutException is thrown. If this configuration property is not set, a client maintains the current semantics, waiting forever.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HADOOP-6889) Make RPC to have an option to timeout

Posted by "Kan Zhang (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-6889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12896735#action_12896735 ] 

Kan Zhang commented on HADOOP-6889:
-----------------------------------

{code}
-    @Override
+    @Override  // simply use the default Object#hashcode() ?
     public int hashCode() {
-      return (address.hashCode() + PRIME * System.identityHashCode(protocol)) ^ 
-             (ticket == null ? 0 : ticket.hashCode());
+      return (address.hashCode() + PRIME * (
+                PRIME * (
+                  PRIME * System.identityHashCode(protocol) ^
+                  System.identityHashCode(ticket)
+                ) ^ System.identityHashCode(rpcTimeout)
+              ));
{code}

I wonder if System.identityHashCode(rpcTimeout) is correct in the above code. I think Java does an autoboxing to get an Integer object and what identityHashCode() returns is the address of this object (it doesn't call the object's hashcode() method). So even if two ConnectionId objects have the same rpcTimeout you will get two different hashcodes. I think you can simply use rpcTimeout itself as the hashcode.

> Make RPC to have an option to timeout
> -------------------------------------
>
>                 Key: HADOOP-6889
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6889
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: ipc
>    Affects Versions: 0.22.0
>            Reporter: Hairong Kuang
>            Assignee: Hairong Kuang
>             Fix For: 0.22.0, 0.20-append
>
>         Attachments: ipcTimeout.patch, ipcTimeout1.patch, ipcTimeout2.patch
>
>
> Currently Hadoop RPC does not timeout when the RPC server is alive. What it currently does is that a RPC client sends a ping to the server whenever a socket timeout happens. If the server is still alive, it continues to wait instead of throwing a SocketTimeoutException. This is to avoid a client to retry when a server is busy and thus making the server even busier. This works great if the RPC server is NameNode.
> But Hadoop RPC is also used for some of client to DataNode communications, for example, for getting a replica's length. When a client comes across a problematic DataNode, it gets stuck and can not switch to a different DataNode. In this case, it would be better that the client receives a timeout exception.
> I plan to add a new configuration ipc.client.max.pings that specifies the max number of pings that a client could try. If a response can not be received after the specified max number of pings, a SocketTimeoutException is thrown. If this configuration property is not set, a client maintains the current semantics, waiting forever.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (HADOOP-6889) Make RPC to have an option to timeout

Posted by "Hairong Kuang (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-6889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hairong Kuang updated HADOOP-6889:
----------------------------------

    Attachment: ipcTimeout1.patch

ipcTimeout1.patch additionally makes waitForProxy has rpcTimeout as a parameter.

> Make RPC to have an option to timeout
> -------------------------------------
>
>                 Key: HADOOP-6889
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6889
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: ipc
>    Affects Versions: 0.22.0
>            Reporter: Hairong Kuang
>            Assignee: Hairong Kuang
>             Fix For: 0.22.0, 0.20-append
>
>         Attachments: ipcTimeout.patch, ipcTimeout1.patch
>
>
> Currently Hadoop RPC does not timeout when the RPC server is alive. What it currently does is that a RPC client sends a ping to the server whenever a socket timeout happens. If the server is still alive, it continues to wait instead of throwing a SocketTimeoutException. This is to avoid a client to retry when a server is busy and thus making the server even busier. This works great if the RPC server is NameNode.
> But Hadoop RPC is also used for some of client to DataNode communications, for example, for getting a replica's length. When a client comes across a problematic DataNode, it gets stuck and can not switch to a different DataNode. In this case, it would be better that the client receives a timeout exception.
> I plan to add a new configuration ipc.client.max.pings that specifies the max number of pings that a client could try. If a response can not be received after the specified max number of pings, a SocketTimeoutException is thrown. If this configuration property is not set, a client maintains the current semantics, waiting forever.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (HADOOP-6889) Make RPC to have an option to timeout

Posted by "Hairong Kuang (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-6889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hairong Kuang updated HADOOP-6889:
----------------------------------

    Attachment: ipcTimeout2.patch

> Make RPC to have an option to timeout
> -------------------------------------
>
>                 Key: HADOOP-6889
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6889
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: ipc
>    Affects Versions: 0.22.0
>            Reporter: Hairong Kuang
>            Assignee: Hairong Kuang
>             Fix For: 0.22.0, 0.20-append
>
>         Attachments: ipcTimeout.patch, ipcTimeout1.patch, ipcTimeout2.patch
>
>
> Currently Hadoop RPC does not timeout when the RPC server is alive. What it currently does is that a RPC client sends a ping to the server whenever a socket timeout happens. If the server is still alive, it continues to wait instead of throwing a SocketTimeoutException. This is to avoid a client to retry when a server is busy and thus making the server even busier. This works great if the RPC server is NameNode.
> But Hadoop RPC is also used for some of client to DataNode communications, for example, for getting a replica's length. When a client comes across a problematic DataNode, it gets stuck and can not switch to a different DataNode. In this case, it would be better that the client receives a timeout exception.
> I plan to add a new configuration ipc.client.max.pings that specifies the max number of pings that a client could try. If a response can not be received after the specified max number of pings, a SocketTimeoutException is thrown. If this configuration property is not set, a client maintains the current semantics, waiting forever.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (HADOOP-6889) Make RPC to have an option to timeout

Posted by "Hairong Kuang (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-6889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hairong Kuang updated HADOOP-6889:
----------------------------------

    Attachment:     (was: ipcTimeout2.patch)

> Make RPC to have an option to timeout
> -------------------------------------
>
>                 Key: HADOOP-6889
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6889
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: ipc
>    Affects Versions: 0.22.0
>            Reporter: Hairong Kuang
>            Assignee: Hairong Kuang
>             Fix For: 0.22.0, 0.20-append
>
>         Attachments: ipcTimeout.patch, ipcTimeout1.patch
>
>
> Currently Hadoop RPC does not timeout when the RPC server is alive. What it currently does is that a RPC client sends a ping to the server whenever a socket timeout happens. If the server is still alive, it continues to wait instead of throwing a SocketTimeoutException. This is to avoid a client to retry when a server is busy and thus making the server even busier. This works great if the RPC server is NameNode.
> But Hadoop RPC is also used for some of client to DataNode communications, for example, for getting a replica's length. When a client comes across a problematic DataNode, it gets stuck and can not switch to a different DataNode. In this case, it would be better that the client receives a timeout exception.
> I plan to add a new configuration ipc.client.max.pings that specifies the max number of pings that a client could try. If a response can not be received after the specified max number of pings, a SocketTimeoutException is thrown. If this configuration property is not set, a client maintains the current semantics, waiting forever.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HADOOP-6889) Make RPC to have an option to timeout

Posted by "Kan Zhang (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-6889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12899171#action_12899171 ] 

Kan Zhang commented on HADOOP-6889:
-----------------------------------

This patch's hashcode() implementation also changed the UGI semantics. It used to be that two UGI's having the same subject object are equal. Now they are not equal. I hope to fix both issues in HADOOP-6907. 

> Make RPC to have an option to timeout
> -------------------------------------
>
>                 Key: HADOOP-6889
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6889
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: ipc
>    Affects Versions: 0.22.0
>            Reporter: Hairong Kuang
>            Assignee: Hairong Kuang
>             Fix For: 0.22.0, 0.20-append
>
>         Attachments: ipcTimeout.patch, ipcTimeout1.patch, ipcTimeout2.patch
>
>
> Currently Hadoop RPC does not timeout when the RPC server is alive. What it currently does is that a RPC client sends a ping to the server whenever a socket timeout happens. If the server is still alive, it continues to wait instead of throwing a SocketTimeoutException. This is to avoid a client to retry when a server is busy and thus making the server even busier. This works great if the RPC server is NameNode.
> But Hadoop RPC is also used for some of client to DataNode communications, for example, for getting a replica's length. When a client comes across a problematic DataNode, it gets stuck and can not switch to a different DataNode. In this case, it would be better that the client receives a timeout exception.
> I plan to add a new configuration ipc.client.max.pings that specifies the max number of pings that a client could try. If a response can not be received after the specified max number of pings, a SocketTimeoutException is thrown. If this configuration property is not set, a client maintains the current semantics, waiting forever.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.