Posted to issues@hbase.apache.org by "Lars Hofhansl (JIRA)" <ji...@apache.org> on 2011/09/25 00:33:26 UTC

[jira] [Commented] (HBASE-4335) Splits can create temporary holes in .META. that confuse clients and regionservers

    [ https://issues.apache.org/jira/browse/HBASE-4335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13114083#comment-13114083 ] 

Lars Hofhansl commented on HBASE-4335:
--------------------------------------

To restate the problem:
If the first daughter is added to .META. first, any key lookup for a key >= splitKey would incorrectly return the first daughter region.
This seems like a legitimate, if rare, problem.
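
To make the window concrete, here is a minimal sketch of the lookup. It is not the real catalog code: MetaHoleSketch, Region and Meta are hypothetical names, keys are plain strings, and the catalog is modeled as a sorted map with one visible entry per start key, so a daughter with the parent's start key simply hides the parent.

```java
import java.util.Map;
import java.util.TreeMap;

class MetaHoleSketch {
    /** Hypothetical region descriptor covering the key range [startKey, endKey). */
    static final class Region {
        final String name;
        final String startKey;
        final String endKey;
        final boolean offline;

        Region(String name, String startKey, String endKey, boolean offline) {
            this.name = name;
            this.startKey = startKey;
            this.endKey = endKey;
            this.offline = offline;
        }

        boolean contains(String key) {
            return key.compareTo(startKey) >= 0 && key.compareTo(endKey) < 0;
        }
    }

    /** Simplified stand-in for .META.: at most one visible entry per start key. */
    static final class Meta {
        private final TreeMap<String, Region> byStartKey = new TreeMap<>();

        void add(Region r) {
            // A newer entry with the same start key hides the older one.
            byStartKey.put(r.startKey, r);
        }

        /** Returns the entry with the greatest start key <= key, or null. */
        Region locate(String key) {
            Map.Entry<String, Region> e = byStartKey.floorEntry(key);
            return e == null ? null : e.getValue();
        }
    }

    /** Parent [a, z) is offline and only the first daughter [a, m) is visible. */
    static Meta firstDaughterFirst() {
        Meta meta = new Meta();
        meta.add(new Region("parent", "a", "z", true));
        meta.add(new Region("daughterA", "a", "m", false));
        return meta;
    }

    /** Parent [a, z) is offline and only the second daughter [m, z) is visible. */
    static Meta secondDaughterFirst() {
        Meta meta = new Meta();
        meta.add(new Region("parent", "a", "z", true));
        meta.add(new Region("daughterB", "m", "z", false));
        return meta;
    }
}
```

With firstDaughterFirst(), locate("q") returns daughterA even though daughterA does not contain "q". With secondDaughterFirst(), locate("q") returns daughterB, and locate("c") returns the offline parent, which would trigger a retry.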

Checking the end key would work for all point operations (put, get, delete, icv, cap, etc.), and most already do that (except GET, as you state in HBASE-4334). I don't think scans do that either, and I am not sure how it would work for scans. Hmm... It seems like it could work: scans are serial and start the next region with the last value from the previous region, so if the startKey were checked we would catch this and could retry.
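
A rough sketch of the kind of range check I mean, assuming rows compare as unsigned lexicographic byte arrays and that an empty end key means "end of table". RowRangeCheckSketch and containsRow are hypothetical names, not existing HBase methods, and Arrays.compareUnsigned needs Java 9 or later.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class RowRangeCheckSketch {
    /**
     * True if row falls in [startKey, endKey).
     * Assumption: an empty endKey marks the last region of the table.
     */
    static boolean containsRow(byte[] startKey, byte[] endKey, byte[] row) {
        boolean atOrAfterStart = Arrays.compareUnsigned(startKey, row) <= 0;
        boolean beforeEnd = endKey.length == 0
                || Arrays.compareUnsigned(row, endKey) < 0;
        return atOrAfterStart && beforeEnd;
    }

    /** Small helper for readable examples. */
    static byte[] b(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }
}
```

A caller that gets false back for the located region would treat the lookup as stale and retry. A scan resuming at the split key "m" against the first daughter [a, m) fails the same check, because "m" is not below that region's end key.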

I think it is better to avoid holes, though; overlap between active and offline-split regions seems fine.

So what about splitting up the splitting process? The daughters are added to .META. in postOpenDeployTasks. That does not contain any long running operations.
What if we removed that from the DaughterOpener threads and called it synchronously in the right order? (We would also need to add the special stopped || stopping case.)
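
Roughly, the shape would be something like the sketch below. This is not the SplitTransaction code: OrderedSplitSketch and split are hypothetical names, the two Runnables stand in for the long-running daughter opens, and I am assuming "the right order" means the second daughter first, as the issue description suggests. The stopped || stopping case is not modeled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class OrderedSplitSketch {
    /**
     * Opens both daughters concurrently, then records the catalog updates
     * on the calling thread in a fixed order. Returns the publish order.
     */
    static List<String> split(Runnable openDaughterA, Runnable openDaughterB) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            // The long-running part stays concurrent.
            Future<?> a = pool.submit(openDaughterA);
            Future<?> b = pool.submit(openDaughterB);
            a.get();
            b.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while opening daughters", e);
        } catch (ExecutionException e) {
            // Nothing has been published yet, so no hole can appear.
            throw new IllegalStateException("daughter open failed", e.getCause());
        } finally {
            pool.shutdown();
        }

        // The short catalog updates run synchronously: second daughter first,
        // so a lookup sees either the offline parent or a correct daughter.
        List<String> published = new ArrayList<>();
        published.add("daughterB");
        published.add("daughterA");
        return published;
    }
}
```

The point is only that the catalog update is no longer racing between two threads. Whichever daughter finishes opening first, the publish order is the same.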


> Splits can create temporary holes in .META. that confuse clients and regionservers
> ----------------------------------------------------------------------------------
>
>                 Key: HBASE-4335
>                 URL: https://issues.apache.org/jira/browse/HBASE-4335
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 0.90.4
>            Reporter: Joe Pallas
>            Priority: Critical
>             Fix For: 0.92.0
>
>
> When a SplitTransaction is performed, three updates are done to .META.:
> 1. The parent region is marked as splitting (and hence offline)
> 2. The first daughter region is added (same start key as parent)
> 3. The second daughter region is added (split key is start key)
> (later, the original parent region is deleted, but that's not important to this discussion)
> Steps 2 and 3 are actually done concurrently by SplitTransaction.DaughterOpener threads.  While the master is notified when a split is complete, the only visibility that clients have is whether the daughter regions have appeared in .META.
> If the second daughter is added to .META. first, then .META. will contain the (offline) parent region followed by the second daughter region.  If the client looks up a key that is greater than (or equal to) the split, the client will find the second daughter region and use it.  If the key is less than the split key, the client will find the parent region and see that it is offline, triggering a retry.
> If the first daughter is added to .META. before the second daughter, there is a window during which .META. has a hole: the first daughter effectively hides the parent region (same start key), but there is no entry for the second daughter.  A region lookup will find the first daughter for all keys in the parent's range, but the first daughter does not include keys at or beyond the split key.
> See HBASE-4333 and HBASE-4334 for details on how this causes problems and suggestions for mitigating this in the client and regionserver.
