Posted to dev@hbase.apache.org by "Karthik Ranganathan (JIRA)" <ji...@apache.org> on 2010/02/25 02:32:27 UTC

[jira] Commented: (HBASE-2023) Client sync block can cause 1 thread of a multi-threaded client to block all others

    [ https://issues.apache.org/jira/browse/HBASE-2023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12838137#action_12838137 ] 

Karthik Ranganathan commented on HBASE-2023:
--------------------------------------------

Kannan and I took a look at this issue and came up with yet another possibility in addition to the 3 JD mentioned:

Move the synchronized block inside the try/catch retry loop so that it wraps only the getClosestRowBefore() call. Each thread then gives up the lock before sleeping to retry, which lets other threads make their calls when one particular region is offline. In addition, if useCache is true, we can look at the cache and return the region right away without ever entering the synchronized section. So the new workflow in locateRegionInMeta() will look as follows:

1. If useCache is true and the region is in the cache, return the region. If not, we have to make a remote call.
2. for the number of retries
3.   wait for lock
4.   check cache again (someone could have filled the cache while we were waiting). Return if found.
5.   make the remote call
6.   release lock
7.   return on success, otherwise usual error handling/sleep, goto 2
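
The steps above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual HBase client code: the class name RegionLocatorSketch, the String-based cache, the metaLookup function (a stand-in for the getClosestRowBefore() remote call that returns null when the region is offline) and the numRetries/pauseMillis parameters are all hypothetical. Whether step 4 should be skipped when useCache is false is also an assumption here.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical sketch of the proposed workflow; not the real HBase client API. */
class RegionLocatorSketch {
    private final Object userRegionLock = new Object();
    // Assumed cache shape: row -> region location. The real cache is more involved.
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    // Stand-in for the remote getClosestRowBefore() call; returns null if the region is offline.
    private final Function<String, String> metaLookup;
    private final int numRetries;
    private final long pauseMillis;

    RegionLocatorSketch(Function<String, String> metaLookup, int numRetries, long pauseMillis) {
        this.metaLookup = metaLookup;
        this.numRetries = numRetries;
        this.pauseMillis = pauseMillis;
    }

    String locateRegionInMeta(String row, boolean useCache) {
        // Step 1: lock-free fast path, so cached lookups never wait on the lock.
        if (useCache) {
            String cached = cache.get(row);
            if (cached != null) {
                return cached;
            }
        }
        for (int tries = 0; tries < numRetries; tries++) {   // step 2
            synchronized (userRegionLock) {                  // step 3: wait for lock
                if (useCache) {                              // step 4: re-check the cache
                    String cached = cache.get(row);
                    if (cached != null) {
                        return cached;
                    }
                }
                String location = metaLookup.apply(row);     // step 5: remote call
                if (location != null) {
                    cache.put(row, location);
                    return location;                         // step 7: success
                }
            }                                                // step 6: lock released here
            // Step 7 (failure): sleep WITHOUT holding the lock, then go back to step 2.
            if (tries < numRetries - 1) {
                try {
                    Thread.sleep(pauseMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("Interrupted while locating region", e);
                }
            }
        }
        throw new IllegalStateException(
            "Region for row '" + row + "' not found after " + numRetries + " tries");
    }
}
```

The point of the sketch is that the sleep happens outside the synchronized block, so a thread retrying against an offline region holds the lock only for the duration of one remote call, and threads whose regions are already cached never touch the lock at all.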

I can work on the fix if this sounds good to you guys.


> Client sync block can cause 1 thread of a multi-threaded client to block all others
> -----------------------------------------------------------------------------------
>
>                 Key: HBASE-2023
>                 URL: https://issues.apache.org/jira/browse/HBASE-2023
>             Project: Hadoop HBase
>          Issue Type: Bug
>    Affects Versions: 0.20.2
>            Reporter: ryan rawson
>
> Take a highly multithreaded client, processing a few thousand requests a second.  If a table goes offline, one thread will get stuck in "locateRegionInMeta" which is located inside the following sync block:
>         synchronized(userRegionLock){
>           return locateRegionInMeta(META_TABLE_NAME, tableName, row, useCache);
>         }
> So when other threads need to find a region (EVEN IF IT'S CACHED!!!) they will encounter this sync block and wait.
> This can become an issue on a busy thrift server (where I first noticed the problem): one offline region can prevent access to all other regions!
> Potential solution: narrow this lock, or perhaps just get rid of it completely.
