Posted to issues@hbase.apache.org by "Andrew Purtell (JIRA)" <ji...@apache.org> on 2018/04/18 21:00:00 UTC

[jira] [Comment Edited] (HBASE-20445) Defer work when a row lock is busy

    [ https://issues.apache.org/jira/browse/HBASE-20445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16443188#comment-16443188 ] 

Andrew Purtell edited comment on HBASE-20445 at 4/18/18 8:59 PM:
-----------------------------------------------------------------

Yes, this is a design hypothetical. Definitely a valid concern, [~mdrob].

Given the phased approach in the issue description, this would be an argument for stopping at phase 2. Batch updates would be retried via requeue instead of looping in the handler (though heap utilization would be similar). Scans with allowPartialResults would return early, so they would not require more heap. Scans without allowPartialResults would block the handler, so we would not incur the cost of caching multiple partial results while building toward the complete set.



> Defer work when a row lock is busy
> ----------------------------------
>
>                 Key: HBASE-20445
>                 URL: https://issues.apache.org/jira/browse/HBASE-20445
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Andrew Purtell
>            Priority: Major
>
> Instead of blocking on row locks, defer the call and make the call runner available so it can service other activity. Have runners pick up deferred calls in the background after servicing the other request. 
> Spin briefly on tryLock() wherever we now use lock() to acquire a row lock. Introduce two new configuration parameters: one for the amount of time to wait between lock acquisition attempts, and another for the total number of times to wait before deferring the work. If the lock cannot be acquired, put the call back into the call queue. Call queues should therefore be priority queues sorted by deadline. Currently they are implemented with LinkedBlockingQueue (which isn't one), or with AdaptiveLifoCoDelCallQueue (which is) if the CoDel scheduler is enabled. Perhaps we could simply require AdaptiveLifoCoDelCallQueue. Runners pick up work from the head of the queues whenever they are not empty, so deferred calls will either be serviced again or dropped if their deadline has passed.
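A minimal Java sketch of spin-then-defer, assuming each call carries an absolute deadline and a reference to the row lock it needs. `DeferredCall`, `DeferringRunner`, `waitMillis` and `maxAttempts` are hypothetical names, not HBase classes or configuration keys, because the two configuration parameters above are not named. A timed `Lock.tryLock(long, TimeUnit)` stands in for "wait between attempts", and a `PriorityBlockingQueue` ordered by deadline stands in for the call queue.

```java
import java.util.concurrent.PriorityBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Lock;

// Hypothetical call wrapper (not an HBase class).
final class DeferredCall implements Comparable<DeferredCall> {
  final String name;
  final long deadlineNanos; // absolute deadline on the System.nanoTime() clock
  final Lock rowLock;       // the row lock this call must hold to run

  DeferredCall(String name, long deadlineNanos, Lock rowLock) {
    this.name = name;
    this.deadlineNanos = deadlineNanos;
    this.rowLock = rowLock;
  }

  // Earliest deadline first; comparing the difference keeps nanoTime comparisons overflow-safe.
  @Override
  public int compareTo(DeferredCall other) {
    return Long.signum(deadlineNanos - other.deadlineNanos);
  }
}

final class DeferringRunner {
  enum Outcome { EXECUTED, DEFERRED, DROPPED, IDLE }

  private final PriorityBlockingQueue<DeferredCall> queue = new PriorityBlockingQueue<>();
  private final long waitMillis; // hypothetical setting: wait per lock attempt
  private final int maxAttempts; // hypothetical setting: attempts before deferring

  DeferringRunner(long waitMillis, int maxAttempts) {
    this.waitMillis = waitMillis;
    this.maxAttempts = maxAttempts;
  }

  void submit(DeferredCall call) {
    queue.add(call);
  }

  int pending() {
    return queue.size();
  }

  // Takes the call with the earliest deadline and makes one bounded attempt to run it.
  Outcome runOne() throws InterruptedException {
    DeferredCall call = queue.poll();
    if (call == null) {
      return Outcome.IDLE;
    }
    if (System.nanoTime() - call.deadlineNanos > 0) {
      return Outcome.DROPPED; // deadline passed: discard instead of running
    }
    for (int attempt = 0; attempt < maxAttempts; attempt++) {
      if (call.rowLock.tryLock(waitMillis, TimeUnit.MILLISECONDS)) {
        try {
          // ... the real operation would run here while the row lock is held ...
          return Outcome.EXECUTED;
        } finally {
          call.rowLock.unlock();
        }
      }
    }
    queue.add(call); // lock still busy: requeue so this runner can serve other work
    return Outcome.DEFERRED;
  }
}
```

The runner never blocks longer than `waitMillis * maxAttempts` on a busy row. Because the queue is ordered by deadline, a requeued call comes back ahead of later-deadline work and is discarded once its deadline has passed.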
> Implementing continuations for simple operations should be straightforward. 
> Batch mutations try to acquire as many row locks as they can, apply the partial batch to the rows that were successfully locked, then loop back to attempt the remaining work. This is a partial implementation of what we need, so we can build on it. Rather than looping, save the partial batch completion state and put a pointer to it, along with the call, back into the RPC queue.
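The save-and-requeue step for batches might look like the following Java sketch. `BatchProgress`, `BatchApplier` and `applyLockedRows` are hypothetical names. A non-reentrant `java.util.concurrent.Semaphore` per row stands in for a row lock, and appending the row key to a list stands in for writing its mutation. One pass takes every row lock it can, applies the locked subset, records which rows are done, and reports whether work remains, so the caller can requeue the call together with its `BatchProgress` instead of looping in the handler.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Semaphore;

// Hypothetical saved completion state of a partially applied batch.
final class BatchProgress {
  final List<String> rows;
  final boolean[] done;

  BatchProgress(List<String> rows) {
    this.rows = rows;
    this.done = new boolean[rows.size()];
  }

  boolean isComplete() {
    for (boolean d : done) {
      if (!d) {
        return false;
      }
    }
    return true;
  }
}

final class BatchApplier {
  // One pass: lock every pending row we can, apply those, release, and report completion.
  static boolean applyLockedRows(BatchProgress progress, Map<String, Semaphore> rowLocks,
      List<String> applied) {
    List<Integer> locked = new ArrayList<>();
    for (int i = 0; i < progress.rows.size(); i++) {
      if (!progress.done[i] && rowLocks.get(progress.rows.get(i)).tryAcquire()) {
        locked.add(i);
      }
    }
    try {
      for (int i : locked) {
        applied.add(progress.rows.get(i)); // stand-in for applying this row's mutation
        progress.done[i] = true;
      }
    } finally {
      for (int i : locked) {
        rowLocks.get(progress.rows.get(i)).release();
      }
    }
    return progress.isComplete(); // false: requeue the call with this progress object
  }
}
```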
> For scans where allowPartialResults has been set to true, we can simply complete the call at the point where we fail to acquire a row lock; the client will handle the rest. For scans where allowPartialResults is false, we have to save the scanner state and partial results, and put a pointer to this state, along with the call, back into the queue.
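Both scan cases can be sketched in the same hypothetical style. `ScanState`, `ScanOutcome`, `DeferrableScan` and `scanRows` are illustrative names, and `Semaphore`s again stand in for row locks. With allowPartialResults the scan stops at the busy row and returns what it has. Without it, the position and partial results stay in `ScanState`, so a later pass over the same state, after the call has been requeued, resumes from the row that was busy.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Semaphore;

// Hypothetical saved scanner state: the next row to read plus results gathered so far.
final class ScanState {
  int nextRow;
  final List<String> results = new ArrayList<>();
}

enum ScanOutcome { COMPLETE, RETURN_PARTIAL, DEFER }

final class DeferrableScan {
  static ScanOutcome scanRows(List<String> rows, Map<String, Semaphore> rowLocks,
      boolean allowPartialResults, ScanState state) {
    while (state.nextRow < rows.size()) {
      String row = rows.get(state.nextRow);
      Semaphore lock = rowLocks.get(row);
      if (!lock.tryAcquire()) {
        // Partial results allowed: finish the call now and let the client continue.
        // Otherwise: keep the state (position and partial results) and requeue the call.
        return allowPartialResults ? ScanOutcome.RETURN_PARTIAL : ScanOutcome.DEFER;
      }
      try {
        state.results.add(row); // stand-in for reading the row's cells
        state.nextRow++;
      } finally {
        lock.release();
      }
    }
    return ScanOutcome.COMPLETE;
  }
}
```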
> We could approach this in phases:
> Phase 0 - Sort out the call queuing details. Do we require AdaptiveLifoCoDelCallQueue? Certainly we can make use of it. Can we also have RWQueueRpcExecutor create its queues as PriorityBlockingQueue instead of LinkedBlockingQueue? There is presumably a reason it does not do so already.
> Phase 1 - Implement deferral of simple ops only. (Batch mutations and scans will still block on row locks.)
> Phase 2 - Implement deferral of batch mutations. (Scans will still block on row locks.)
> Phase 3 - Implement deferral of scans where allowPartialResults is false.
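To illustrate the Phase 0 question, this Java sketch puts two fresh calls into a queue and then requeues a deferred call with the earliest deadline. `QueuedCall` and `QueueOrderDemo` are hypothetical types and the deadline values are arbitrary. A `LinkedBlockingQueue` hands the deferred call out last, in arrival order. A `PriorityBlockingQueue` built with a deadline comparator hands it out first.

```java
import java.util.Comparator;
import java.util.concurrent.BlockingQueue;

// Hypothetical queued call with an illustrative deadline value.
final class QueuedCall {
  final String id;
  final long deadline;

  QueuedCall(String id, long deadline) {
    this.id = id;
    this.deadline = deadline;
  }
}

final class QueueOrderDemo {
  static final Comparator<QueuedCall> BY_DEADLINE = Comparator.comparingLong(c -> c.deadline);

  // Enqueues two fresh calls, then requeues a deferred call with the earliest deadline,
  // and returns the order in which a runner polling the head would receive them.
  static String drainOrder(BlockingQueue<QueuedCall> queue) {
    queue.add(new QueuedCall("fresh-1", 200));
    queue.add(new QueuedCall("fresh-2", 300));
    queue.add(new QueuedCall("deferred", 100)); // put back after failing to get its row lock
    StringBuilder order = new StringBuilder();
    QueuedCall call;
    while ((call = queue.poll()) != null) {
      if (order.length() > 0) {
        order.append(',');
      }
      order.append(call.id);
    }
    return order.toString();
  }
}
```

One difference that may matter when choosing between the two: `LinkedBlockingQueue` can be constructed with a capacity bound, whereas `PriorityBlockingQueue` is unbounded (its capacity argument is only an initial size). Any limit on queue length would therefore have to be enforced separately.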



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)