You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@hbase.apache.org by "Todd Lipcon (JIRA)" <ji...@apache.org> on 2012/05/10 00:05:49 UTC
[jira] [Commented] (HBASE-5973) Add ability for potentially long-running IPC calls to abort if client disconnects

    [ https://issues.apache.org/jira/browse/HBASE-5973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13271863#comment-13271863 ] 

Todd Lipcon commented on HBASE-5973:
------------------------------------

Attached patch implements the suggested idea, and hooks it up for scanner.next().

I spent 2.5 hours trying to write a test case for it, but we have so many layers of byzantine caching going on above the IPC sockets that I couldn't figure out how to make a client IPC connection actually hard-disconnect. So I tested it from the shell. here's the manual test plan:

1) create a table with 100 or so rows
2) issue following from shell:

{code}
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 0.95-SNAPSHOT, r5c65cc4a19fbc00876a365b10e98142238dc9a97, Wed May  9 13:06:25 PDT 2012

hbase(main):001:0> import org.apache.hadoop.hbase.filter.TestFilter
=> Java::OrgApacheHadoopHbaseFilter::TestFilter
hbase(main):002:0> scan 't1', { FILTER => TestFilter::SlowScanFilter.new(), CACHE => 50 }
ROW                                            COLUMN+CELL                                                                                                                             
{code}
(shell will hang here)

On the server side, you should see:
{code}

12/05/09 15:03:29 INFO filter.TestFilter: Handler thread Thread[IPC Server handler 0 on 58364,5,main] sleeping in filter...
12/05/09 15:03:30 INFO filter.TestFilter: Handler thread Thread[IPC Server handler 0 on 58364,5,main] sleeping in filter...
12/05/09 15:03:31 INFO filter.TestFilter: Handler thread Thread[IPC Server handler 0 on 58364,5,main] sleeping in filter...
12/05/09 15:03:32 INFO filter.TestFilter: Handler thread Thread[IPC Server handler 0 on 58364,5,main] sleeping in filter...
{code}

Now ^C the shell. You should see on the server:

{code}
12/05/09 15:03:33 ERROR regionserver.RegionServer: 
org.apache.hadoop.hbase.ipc.CallerDisconnectedException: Aborting call scan(null, scannerId: 4581116627867187291
numberOfRows: 50
closeScanner: false
), rpc version=1, client version=1, methodsFingerPrint=-944626147 from 127.0.0.1:55648 after 5009 ms, since caller disconnected
        at org.apache.hadoop.hbase.ipc.HBaseServer$Call.throwExceptionIfCallerDisconnected(HBaseServer.java:417)
        at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:3433)
        at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:3391)
        at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:3415)
        at org.apache.hadoop.hbase.regionserver.RegionServer.scan(RegionServer.java:828)
        at sun.reflect.GeneratedMethodAccessor26.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:358)
        at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1387)
12/05/09 15:03:33 WARN ipc.HBaseServer: IPC Server Responder, call scan(null, scannerId: 4581116627867187291
numberOfRows: 50
closeScanner: false
), rpc version=1, client version=1, methodsFingerPrint=-944626147 from 127.0.0.1:55648: output error
12/05/09 15:03:33 WARN ipc.HBaseServer: IPC Server handler 0 on 58364 caught a ClosedChannelException, this means that the server was processing a request but the client went away. The error message was: null
{code}

We could probably improve the messaging slightly, but this is at least an improvement in that the thread doesn't continue to get hung up indefinitely.
                
> Add ability for potentially long-running IPC calls to abort if client disconnects
> ---------------------------------------------------------------------------------
>
>                 Key: HBASE-5973
>                 URL: https://issues.apache.org/jira/browse/HBASE-5973
>             Project: HBase
>          Issue Type: Improvement
>          Components: ipc
>    Affects Versions: 0.90.7, 0.92.1, 0.94.0, 0.96.0
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>         Attachments: hbase-5973.txt
>
>
> We recently had a cluster issue where a user was submitting scanners with a very restrictive filter, and then calling next() with a high scanner caching value. The clients would generally time out the next() call and disconnect, but the IPC kept running looking to fill the requested number of rows. Since this was in the context of MR, the tasks making the calls would retry, and the retries wuld be more likely to time out due to contention with the previous still-running scanner next() call. Eventually, the system spiraled out of control.
> We should add a hook to the IPC system so that RPC calls can check if the client has already disconnected. In such a case, the next() call could abort processing, given any further work is wasted. I imagine coprocessor endpoints, etc, could make good use of this as well.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira