Posted to issues@hbase.apache.org by "Jean-Daniel Cryans (JIRA)" <ji...@apache.org> on 2011/06/16 21:24:47 UTC

[jira] [Commented] (HBASE-3994) SplitTransaction has a window where clients can get RegionOfflineException

    [ https://issues.apache.org/jira/browse/HBASE-3994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050665#comment-13050665 ] 

Jean-Daniel Cryans commented on HBASE-3994:
-------------------------------------------

I'm still digging through my logs, but it appears that the region server took 40 seconds to open a single file from one of the daughters, and that's why the clients eventually ran out of retries. It seemed at first that the clients didn't retry at all, but now I think we should just have a better error message.

> SplitTransaction has a window where clients can get RegionOfflineException
> --------------------------------------------------------------------------
>
>                 Key: HBASE-3994
>                 URL: https://issues.apache.org/jira/browse/HBASE-3994
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.3
>            Reporter: Jean-Daniel Cryans
>            Priority: Critical
>             Fix For: 0.90.4
>
>
> I just witnessed a job with failed tasks caused by RegionOfflineException. This should normally happen only because the table is disabled, but it can also happen because the split parent is offline. Probably 99.999% of the time users don't hit it because SplitTransaction is able to offline the parent and add the first daughter quickly enough, but in my case the cluster was so slow that I was able to see it.
> Maybe we should check in HCM (HConnectionManager) not only if the region is offline but also if it's split, in which case we should retry?
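
The check proposed in the description could look roughly like the sketch below. This is not HBase code: RegionState, Decision, decide and locate are hypothetical stand-ins made up for illustration, and the real client's lookup and retry settings are not modeled. It only shows the idea that "offline and split" means a parent in mid-split (worth retrying), while "offline only" is treated as a genuinely offline region such as one in a disabled table.

```java
import java.util.function.Supplier;

/** Illustrative sketch only; these are not HBase's real client classes. */
public class SplitAwareLookup {

    /** Hypothetical stand-in for the two region flags the description mentions. */
    public static final class RegionState {
        final boolean offline;
        final boolean split;

        public RegionState(boolean offline, boolean split) {
            this.offline = offline;
            this.split = split;
        }
    }

    public enum Decision { USE_REGION, RETRY, FAIL_OFFLINE }

    /**
     * Offline and split: assume a parent in mid-split whose daughters
     * should show up soon, so ask the caller to retry.
     * Offline only: treat as really offline (for example a disabled table).
     */
    public static Decision decide(RegionState state) {
        if (!state.offline) {
            return Decision.USE_REGION;
        }
        return state.split ? Decision.RETRY : Decision.FAIL_OFFLINE;
    }

    /**
     * Re-reads the region state up to maxAttempts times, pausing between
     * attempts. A final result of RETRY means the retries were exhausted
     * while the parent was still marked split, which is the case that
     * deserves a clearer error message than a bare "region offline".
     */
    public static Decision locate(Supplier<RegionState> lookup,
                                  int maxAttempts, long pauseMillis) {
        Decision decision = Decision.RETRY;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            decision = decide(lookup.get());
            if (decision != Decision.RETRY) {
                return decision;
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(pauseMillis);
                } catch (InterruptedException e) {
                    // Preserve the interrupt and stop retrying.
                    Thread.currentThread().interrupt();
                    return decision;
                }
            }
        }
        return decision;
    }
}
```

With a slow daughter open like the 40-second one in the comment, the number of attempts and the pause would have to cover that window, otherwise the caller still ends up with an exhausted-retries result.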

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira