Posted to issues@hbase.apache.org by "Andrew Purtell (Created) (JIRA)" <ji...@apache.org> on 2012/03/08 21:53:57 UTC

[jira] [Created] (HBASE-5543) Add a keepalive option for IPC connections

Add a keepalive option for IPC connections
------------------------------------------

                 Key: HBASE-5543
                 URL: https://issues.apache.org/jira/browse/HBASE-5543
             Project: HBase
          Issue Type: Improvement
          Components: client, coprocessors, ipc
            Reporter: Andrew Purtell


On the user list someone wrote in with a connection failure due to a long running coprocessor:
{quote}
On Wed, Mar 7, 2012 at 10:59 PM, raghavendhra rahul wrote:
2012-03-08 12:03:09,475 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server Responder, call execCoprocessor([B@50cb21, getProjection(), rpc version=1, client version=0, methodsFingerPrint=0), rpc version=1, client version=29, methodsFingerPrint=54742778 from 10.184.17.26:46472: output error
2012-03-08 12:03:09,476 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server handler 7 on 60020 caught: java.nio.channels.ClosedChannelException
{quote}

In response, I suggested we might consider giving our RPC a keepalive option for calls that may run for a long time (like execCoprocessor).

LarsH +1ed the idea:
{quote}
+1 on "keepalive". It's a shame (especially for long running server code) to do all the work, just to find out at the end that the client has given up.

Or maybe there should be a way to cancel an operation if the client decides it does not want to wait any longer (PostgreSQL does that, for example). Here that would mean the server would need to check periodically, and coprocessors would need to be written to support that - so maybe that's a non-starter.
{quote}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5543) Add a keepalive option for IPC connections

Posted by "Todd Lipcon (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13225573#comment-13225573 ] 

Todd Lipcon commented on HBASE-5543:
------------------------------------

We have "ipc.client.ping", right? What we want is sort of the opposite direction of that?
                

[jira] [Commented] (HBASE-5543) Add a keepalive option for IPC connections

Posted by "stack (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13225818#comment-13225818 ] 

stack commented on HBASE-5543:
------------------------------

bq. Instead of adding to the rpc to make it keep alive longer, maybe make it async, returning some sort of uuid token that the client can poll (or get notified) for progress instead?

I like this idea.
                

[jira] [Commented] (HBASE-5543) Add a keepalive option for IPC connections

Posted by "Jonathan Hsieh (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13225803#comment-13225803 ] 

Jonathan Hsieh commented on HBASE-5543:
---------------------------------------

Instead of adding to the rpc to make it keep alive longer, maybe make it async, returning some sort of uuid token that the client can poll (or get notified) for progress instead?
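
To make the async idea concrete, here is a minimal Java sketch, assuming a server-side registry that hands back a uuid token and lets the client poll for the result. AsyncCallRegistry, submit() and poll() are hypothetical names for illustration only, not existing HBase classes or methods, and the sketch assumes the work returns a non-null result.

```java
import java.util.Optional;
import java.util.UUID;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical server-side registry of in-flight calls; not an HBase class. */
class AsyncCallRegistry<T> {
    private final ExecutorService pool = Executors.newCachedThreadPool();
    private final ConcurrentHashMap<UUID, Future<T>> calls = new ConcurrentHashMap<>();

    /** Starts the work in the background and returns a token immediately. */
    UUID submit(Callable<T> work) {
        UUID token = UUID.randomUUID();
        calls.put(token, pool.submit(work));
        return token;
    }

    /** Returns the result if the call has finished, or empty if it is still running. */
    Optional<T> poll(UUID token) {
        Future<T> future = calls.get(token);
        if (future == null) {
            throw new IllegalArgumentException("unknown token: " + token);
        }
        if (!future.isDone()) {
            return Optional.empty();
        }
        calls.remove(token); // a result is handed out once in this sketch
        try {
            return Optional.of(future.get()); // assumes a non-null result
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("call failed", e);
        }
    }

    void shutdown() {
        pool.shutdown();
    }
}
```

With something like this, the original rpc returns as soon as submit() does, so no single rpc has to outlive the rpc timeout; the cost is that the server must keep results around until the client collects them.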
                

[jira] [Commented] (HBASE-5543) Add a keepalive option for IPC connections

Posted by "Himanshu Vashishtha (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13227333#comment-13227333 ] 

Himanshu Vashishtha commented on HBASE-5543:
--------------------------------------------

What is the scope of the uuid token in the coprocessor context? The current approach is to subdivide the calls by region, submit a Callable object for each of these regions, obtain a Future object for each of these calls, and block until all of them have returned some result.
So would it be a single uuid from the client-side server proxy object, a list of uuids from all the involved regions, or something more elegant that I am missing? Please suggest. Thanks.
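
For readers who have not looked at the client code, the fan-out described above can be sketched roughly as follows in plain Java. This is an illustration of the pattern only, not the actual HBase client code: RegionFanOut and callAll() are hypothetical names, and regions are represented as plain strings.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

/** Hypothetical stand-in for the client-side fan-out; not HBase code. */
class RegionFanOut {
    /** Submits one Callable per region, then blocks until every region has answered. */
    static <R> Map<String, R> callAll(List<String> regions, Function<String, R> perRegionCall) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, regions.size()));
        try {
            Map<String, Future<R>> futures = new LinkedHashMap<>();
            for (String region : regions) {
                Callable<R> task = () -> perRegionCall.apply(region);
                futures.put(region, pool.submit(task));
            }
            Map<String, R> results = new LinkedHashMap<>();
            for (Map.Entry<String, Future<R>> entry : futures.entrySet()) {
                results.put(entry.getKey(), entry.getValue().get()); // blocks on each region
            }
            return results;
        } catch (Exception e) {
            throw new IllegalStateException("a per-region call failed", e);
        } finally {
            pool.shutdown();
        }
    }
}
```

In this shape each region already has its own Future, which is why the question of one token per client call versus one token per region comes up.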
                

[jira] [Commented] (HBASE-5543) Add a keepalive option for IPC connections

Posted by "Andrew Purtell (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13225575#comment-13225575 ] 

Andrew Purtell commented on HBASE-5543:
---------------------------------------

bq. We have "ipc.client.ping", right? What we want is sort of the opposite direction of that?

That's what I had in mind.
                

[jira] [Commented] (HBASE-5543) Add a keepalive option for IPC connections

Posted by "Jonathan Hsieh (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13229551#comment-13229551 ] 

Jonathan Hsieh commented on HBASE-5543:
---------------------------------------

A straw man:

As a mechanism for dealing with long-running rpcs, we could add something similar to Hadoop's Progressable interface, as I understand it (the uuid could be a reference to this in a map or something).  The coprocessor context would hold a reference to a Progressable that the coprocessor would have to call periodically to demonstrate progress.  If it isn't called for a while, the call is assumed to be hung.

This could possibly be wired into the hbase rpc mechanism also -- for HBase ServerCallables on the server side, we might add a ref to a Progressable -- if a call is long running (like a bulk call), calls to the progress() method might reset the rpc timeout counter.
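
A rough Java sketch of the straw man, under stated assumptions: the Progressable interface below is a local stand-in with the same single progress() method as Hadoop's org.apache.hadoop.util.Progressable, and ProgressTracker, isPresumedHung() and the injected clock are hypothetical names invented for this illustration, not part of HBase or Hadoop.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

/** Local stand-in mirroring the single-method shape of Hadoop's Progressable. */
interface Progressable {
    void progress();
}

/** Hypothetical tracker: every progress() call resets the timeout counter. */
class ProgressTracker implements Progressable {
    private final LongSupplier clockMillis; // injected so the timeout logic is testable
    private final long timeoutMillis;
    private final AtomicLong lastProgressMillis;

    ProgressTracker(LongSupplier clockMillis, long timeoutMillis) {
        this.clockMillis = clockMillis;
        this.timeoutMillis = timeoutMillis;
        this.lastProgressMillis = new AtomicLong(clockMillis.getAsLong());
    }

    /** Called periodically by the long-running code to show it is still making headway. */
    @Override
    public void progress() {
        lastProgressMillis.set(clockMillis.getAsLong());
    }

    /** True if progress() has not been called for longer than the timeout. */
    boolean isPresumedHung() {
        return clockMillis.getAsLong() - lastProgressMillis.get() > timeoutMillis;
    }
}
```

A coprocessor would call progress() inside its main loop, and whatever enforces the rpc timeout would consult isPresumedHung() instead of the time elapsed since the call started.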
                

[jira] [Commented] (HBASE-5543) Add a keepalive option for IPC connections

Posted by "stack (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13225566#comment-13225566 ] 

stack commented on HBASE-5543:
------------------------------

Yeah, it looks like it's inevitable that we'll ask the server to do legitimate stuff that takes longer than the rpctimeout while the server is still making headway: e.g. the reproducing test case, though a little artificial, for "HBASE-4890  fix possible NPE in HConnectionManager" asked the regionserver to open 3k regions.

If it's a task like the above, there should be a facility for telling the client we're still alive, or we should just refuse the request because it will take too long. (The latter we need to do too -- from Benoit.  If the server is going to take so long servicing a request that the client will be gone by the time it's done its work, then refuse the request... don't do the increment or update that the updating client will not be around to see.)
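
The "refuse the request" option could look something like the following Java sketch. It assumes the server can estimate how long a request will take and knows the client's timeout, neither of which is established in this thread; DeadlineAdmission, admit() and RequestRefusedException are hypothetical names, not HBase API.

```java
/** Hypothetical admission check; names are illustrative, not HBase API. */
final class DeadlineAdmission {
    private DeadlineAdmission() {}

    /** Thrown instead of starting work the client will not be around to see. */
    static final class RequestRefusedException extends RuntimeException {
        RequestRefusedException(String message) {
            super(message);
        }
    }

    /**
     * Refuses the request up front if the estimated service time exceeds what is
     * left of the client's timeout after the time already spent queued.
     */
    static void admit(long estimatedMillis, long clientTimeoutMillis, long queuedMillis) {
        long remainingMillis = clientTimeoutMillis - queuedMillis;
        if (estimatedMillis > remainingMillis) {
            throw new RequestRefusedException(
                "estimated " + estimatedMillis + " ms exceeds remaining " + remainingMillis + " ms");
        }
    }
}
```

The point of failing before any work starts is that an increment or update is never applied on behalf of a client that has already given up.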
                