Posted to issues@hbase.apache.org by "Uma Maheswara Rao G (Commented) (JIRA)" <ji...@apache.org> on 2012/03/26 14:18:28 UTC

[jira] [Commented] (HBASE-5635) If getTaskList() returns null splitlogWorker is down. It wont serve any requests.

    [ https://issues.apache.org/jira/browse/HBASE-5635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13238328#comment-13238328 ] 

Uma Maheswara Rao G commented on HBASE-5635:
--------------------------------------------

Yes, I think continuing without the SplitLogWorker may not be good behaviour,
because that particular RegionServer may have spare capacity to take up new regions. With the current behaviour it will no longer compete for any new splitlog work.

I feel we could retry a number of times and then shut down the RegionServer.
The other option is to retry forever on any ZK exception and exit only on an InterruptedException.
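
To make the two options concrete, here is a minimal Java sketch. It is not the HBase code: the class TaskListRetry, its method names, and the onGiveUp hook are hypothetical, and a plain Callable that throws a checked exception stands in for the ZooKeeper call and its KeeperException.

```java
import java.util.List;
import java.util.concurrent.Callable;

/** Hypothetical helper for illustration only; not part of HBase. */
final class TaskListRetry {

  /**
   * Option 1: retry a bounded number of times, then run onGiveUp
   * (for example, a hook that shuts down the region server) and return null.
   */
  static List<String> retryBounded(Callable<List<String>> fetch, int maxAttempts,
      long sleepMillis, Runnable onGiveUp) {
    for (int i = 0; i < maxAttempts; i++) {
      try {
        return fetch.call();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return null;
      } catch (Exception e) {
        // Stands in for KeeperException: wait, then try again.
        if (!sleepQuietly(sleepMillis)) {
          return null;
        }
      }
    }
    onGiveUp.run();
    return null;
  }

  /** Option 2: retry forever; return null only when the thread is interrupted. */
  static List<String> retryForever(Callable<List<String>> fetch, long sleepMillis) {
    while (true) {
      try {
        return fetch.call();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return null;
      } catch (Exception e) {
        // Stands in for KeeperException: wait, then try again.
        if (!sleepQuietly(sleepMillis)) {
          return null;
        }
      }
    }
  }

  /** Sleeps; returns false, with the interrupt flag restored, if interrupted. */
  private static boolean sleepQuietly(long millis) {
    try {
      Thread.sleep(millis);
      return true;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    }
  }
}
```

With the second option, a temporary ZK outage only delays the worker; it starts fetching tasks again as soon as the call succeeds.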

Also, I see that this issue may be a bit dangerous, because if ZK is unavailable for some time, all RegionServers may hit this problem and no one will take up the splitlog work.

listChildrenAndWatchForNewChildren returns null only if the node does not exist. If the node exists but has no children, it returns an empty list. So zookeeper.znode.splitlog will always be set.
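
A small Java sketch of how a caller might tell these cases apart, assuming the return contract described above. The class TaskListOutcome and its enum are hypothetical names used only for illustration.

```java
import java.util.List;

/** Hypothetical classifier for illustration only; not part of HBase. */
final class TaskListOutcome {

  enum Kind { ZNODE_MISSING, NO_TASKS, HAS_TASKS }

  /**
   * Interprets the list returned for the splitlog znode:
   * null means the znode does not exist, an empty list means the znode
   * exists but has no children, anything else is a list of tasks.
   */
  static Kind classify(List<String> children) {
    if (children == null) {
      return Kind.ZNODE_MISSING;
    }
    return children.isEmpty() ? Kind.NO_TASKS : Kind.HAS_TASKS;
  }
}
```

Only the ZNODE_MISSING case corresponds to a null result from a successful call; a null that comes from exhausted retries is a different situation and needs its own handling.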

We still have to handle other KeeperExceptions, such as ZK unavailability.
                
> If getTaskList() returns null splitlogWorker is down. It wont serve any requests. 
> ----------------------------------------------------------------------------------
>
>                 Key: HBASE-5635
>                 URL: https://issues.apache.org/jira/browse/HBASE-5635
>             Project: HBase
>          Issue Type: Bug
>          Components: wal
>    Affects Versions: 0.92.1
>            Reporter: Kristam Subba Swathi
>
> During the hlog split operation, if all the ZooKeeper servers are down, the paths are returned as null and the SplitLogWorker thread exits.
> This regionserver will then be unable to acquire any other tasks, since the SplitLogWorker thread has exited.
> Please find the attached code for more details.
> ------------------------------------------
> private List<String> getTaskList() {
>     for (int i = 0; i < zkretries; i++) {
>       try {
>         return (ZKUtil.listChildrenAndWatchForNewChildren(this.watcher,
>             this.watcher.splitLogZNode));
>       } catch (KeeperException e) {
>         LOG.warn("Could not get children of znode " +
>             this.watcher.splitLogZNode, e);
>         try {
>           Thread.sleep(1000);
>         } catch (InterruptedException e1) {
>           LOG.warn("Interrupted while trying to get task list ...", e1);
>           Thread.currentThread().interrupt();
>           return null;
>         }
>       }
>     }
>     // All retries failed: null is returned and the worker thread exits.
>     return null;
> }
> (from org.apache.hadoop.hbase.regionserver.SplitLogWorker)
>  
