Posted to issues@hbase.apache.org by "Elliott Clark (JIRA)" <ji...@apache.org> on 2016/06/22 06:23:57 UTC

[jira] [Comment Edited] (HBASE-15146) Don't block on Reader threads queueing to a scheduler queue

    [ https://issues.apache.org/jira/browse/HBASE-15146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15343790#comment-15343790 ] 

Elliott Clark edited comment on HBASE-15146 at 6/22/16 6:23 AM:
----------------------------------------------------------------

bq.In general, gradually reducing performance is rather preferable in heavy load.
We've found the exact opposite many, many times. Pushing back on the client is a well-known and well-understood load-shedding mechanism. It allows the server to take what it can handle and no more.

By contrast, every time the server promises to do work that it can't handle, things get worse. GC gets worse, queue call times get worse, and it becomes a cycle that continues until a regionserver is inoperable. Removing threads that can call select leads to multiple seconds where no TCP ACKs are sent. On loaded servers we saw all reader threads completely stop doing any network selects at all.
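
As a rough illustration of the pushback described above (a minimal sketch, not the code from the attached patches): a bounded java.util.concurrent queue can be filled with the non-blocking offer(), which returns false immediately when the queue is full, instead of put(), which parks the calling thread until space frees up. The class name BoundedDispatcher, its methods, and the String stand-in for a call are hypothetical and are not HBase APIs.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/** Hypothetical sketch of non-blocking dispatch; not the HBase RpcScheduler API. */
final class BoundedDispatcher {
    private final BlockingQueue<String> callQueue;
    private int rejected = 0;

    BoundedDispatcher(int capacity) {
        this.callQueue = new ArrayBlockingQueue<>(capacity);
    }

    /**
     * Tries to enqueue a call without blocking. offer() returns false right away
     * when the queue is full; put() would block the caller until a slot opens.
     */
    boolean dispatch(String call) {
        boolean accepted = callQueue.offer(call);
        if (!accepted) {
            // Assumption: a real server would answer the client with some kind of
            // "too busy, retry later" response here rather than just counting.
            rejected++;
        }
        return accepted;
    }

    int queued() {
        return callQueue.size();
    }

    int rejected() {
        return rejected;
    }

    public static void main(String[] args) {
        BoundedDispatcher dispatcher = new BoundedDispatcher(2);
        for (int i = 0; i < 5; i++) {
            System.out.println("call-" + i + " accepted=" + dispatcher.dispatch("call-" + i));
        }
        System.out.println("queued=" + dispatcher.queued() + " rejected=" + dispatcher.rejected());
    }
}
```

With a capacity of 2 and nothing draining the queue, the first two calls are accepted and the remaining three are rejected immediately, so the caller never stalls.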

bq.Selector.select immediately causes a context switch when an event occurs, 

Yes it does, and you want to get the reader threads back to calling select as fast as possible. That's the most basic tenet of an event loop. What was happening was that the threads would stop for multiple seconds because the queues were full. That meant the event loop was stopped.
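
To make the event-loop point concrete, here is a small self-contained sketch (an assumption-laden illustration, not the HBase reader implementation). It uses a java.nio Pipe in place of a client socket, treats each byte as one fake request, and hands requests to a bounded queue with offer(). The names ReaderLoopSketch and runReader are hypothetical. If the hand-off used put() instead, the loop would hang on the first request that did not fit, and the thread would never get back to select().

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.channels.ReadableByteChannel;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.util.Iterator;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/** Hypothetical reader loop; a Pipe stands in for a client connection. */
final class ReaderLoopSketch {

    /** Reads `messages` one-byte requests and returns how many were rejected. */
    static int runReader(int queueCapacity, int messages) throws IOException {
        BlockingQueue<String> callQueue = new ArrayBlockingQueue<>(queueCapacity);
        int rejected = 0;
        int handled = 0;
        Pipe pipe = Pipe.open();
        try (Selector selector = Selector.open();
             Pipe.SinkChannel sink = pipe.sink();
             Pipe.SourceChannel source = pipe.source()) {
            source.configureBlocking(false);
            source.register(selector, SelectionKey.OP_READ);

            // Simulate a burst of client requests (keep `messages` small).
            for (int i = 0; i < messages; i++) {
                sink.write(ByteBuffer.wrap(new byte[] {(byte) i}));
            }

            ByteBuffer buf = ByteBuffer.allocate(16);
            while (handled < messages) {
                selector.select(); // the only place this thread is meant to wait
                Iterator<SelectionKey> keys = selector.selectedKeys().iterator();
                while (keys.hasNext()) {
                    SelectionKey key = keys.next();
                    keys.remove();
                    buf.clear();
                    ((ReadableByteChannel) key.channel()).read(buf);
                    buf.flip();
                    while (buf.hasRemaining()) {
                        // Non-blocking hand-off: a full queue means reject, not wait.
                        if (!callQueue.offer("req-" + buf.get())) {
                            rejected++;
                        }
                        handled++;
                    }
                }
            }
        }
        return rejected;
    }

    public static void main(String[] args) throws IOException {
        System.out.println("rejected=" + runReader(2, 5));
    }
}
```

Nothing drains the queue in this sketch, so with a capacity of 2 and 5 requests the loop still finishes, having rejected 3 of them.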

bq.and this patch might make worse performance in such subtle heavy congestion.
The opposite has held true under load.


was (Author: eclark):
bq.In general, gradually reducing performance is rather preferable in heavy load.
We've found the exact opposite many, many times. Pushing back on the client is a well-known and well-understood load-shedding mechanism. It allows the server to take what it can handle and no more.

By contrast, every time the server promises to do work that it can't handle, things get worse. GC gets worse, queue call times get worse, and it becomes a cycle that continues until a regionserver is inoperable. Removing threads that can call select leads to multiple seconds where no TCP ACKs are sent. On loaded servers we saw all reader threads completely stop doing any network selects at all.

bq.Selector.select immediately causes a context switch when an event occurs, and this patch might make worse performance in such subtle heavy congestion.

Yes it does, and you want to get the reader threads back to calling select as fast as possible. That's the most basic tenet of an event loop. What was happening was that the threads would stop for multiple seconds because the queues were full. That meant the event loop was stopped.

> Don't block on Reader threads queueing to a scheduler queue
> -----------------------------------------------------------
>
>                 Key: HBASE-15146
>                 URL: https://issues.apache.org/jira/browse/HBASE-15146
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 1.2.0
>            Reporter: Elliott Clark
>            Assignee: Elliott Clark
>            Priority: Blocker
>             Fix For: 2.0.0, 1.2.0, 1.3.0
>
>         Attachments: HBASE-15146-v7.patch, HBASE-15146-v8.patch, HBASE-15146-v8.patch, HBASE-15146.0.patch, HBASE-15146.1.patch, HBASE-15146.2.patch, HBASE-15146.3.patch, HBASE-15146.4.patch, HBASE-15146.5.patch, HBASE-15146.6.patch
>
>
> Blocking on the epoll thread is awful. The new rpc scheduler can have lots of different queues. Those queues have different capacity limits. Currently the dispatch method can block trying to add to the blocking queue in any of the schedulers.
> This causes readers to block, TCP ACKs to be delayed, and everything to slow down.


