Posted to issues@hbase.apache.org by "Bryan Beaudreault (Jira)" <ji...@apache.org> on 2023/02/17 04:31:00 UTC
[jira] [Commented] (HBASE-27650) Merging empty regions corrupts meta cache

    [ https://issues.apache.org/jira/browse/HBASE-27650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17690136#comment-17690136 ] 

Bryan Beaudreault commented on HBASE-27650:
-------------------------------------------

One option here: after scanning meta, when we are about to cache the new locations, we can call MetaTableAccessor.getMergeRegions to find any merge regions in the meta result. If any exist, we proactively clear them from the cache.

The problem with this is that the CatalogJanitor will eventually clear out these merge qualifiers. If no requests come in for any of the merged regions before that happens, we'll be left in the same situation as before.
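
To make the option and its limitation concrete, here is a minimal sketch in plain Java. It is a toy model, not HBase code: the class EvictMergeParentsSketch, the method cacheFromMetaRow, and the mergeParentStartKeys parameter are all hypothetical, and the parameter only stands in for whatever MetaTableAccessor.getMergeRegions would report for the meta row (its real signature is not reproduced here).

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.TreeMap;

// Toy model only: a sorted map from region start key to a region name.
class EvictMergeParentsSketch {
  final TreeMap<String, String> cache = new TreeMap<>();

  // Hypothetical hook that runs when a location read from meta is about to be cached.
  // mergeParentStartKeys stands in for the merge regions found in the meta result.
  void cacheFromMetaRow(String startKey, String regionName, List<String> mergeParentStartKeys) {
    for (String parentStart : mergeParentStartKeys) {
      cache.remove(parentStart); // proactively drop the cached merge parents
    }
    cache.put(startKey, regionName);
  }

  public static void main(String[] args) {
    // Merge qualifiers still present in meta: the stale entry for B is evicted.
    EvictMergeParentsSketch withQualifiers = new EvictMergeParentsSketch();
    withQualifiers.cache.put("B", "old-B");
    withQualifiers.cacheFromMetaRow("A", "merged", Arrays.asList("A", "B", "C"));
    System.out.println(withQualifiers.cache.floorEntry("C1"));

    // Merge qualifiers already cleaned up: nothing to evict, old-B stays cached.
    EvictMergeParentsSketch cleanedUp = new EvictMergeParentsSketch();
    cleanedUp.cache.put("B", "old-B");
    cleanedUp.cacheFromMetaRow("A", "merged", Collections.<String>emptyList());
    System.out.println(cleanedUp.cache.floorEntry("C1"));
  }
}
```

In the second case the floor entry for C1 is still the old region B, which is the situation described in the issue.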

> Merging empty regions corrupts meta cache
> -----------------------------------------
>
>                 Key: HBASE-27650
>                 URL: https://issues.apache.org/jira/browse/HBASE-27650
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Bryan Beaudreault
>            Priority: Major
>
> Let's say you have three regions with start keys A, B, C and all are cached in the meta cache. Region B is empty and not getting any requests, and all 3 regions are merged together. The new merged region has start key A.
> A user submits a request for row C1, which would previously have gone to region C. That region no longer exists, but the MetaCache still returns region C, so the request goes out to the server, which throws NotServingRegionException. Region C is then removed from the cache, and meta is scanned. The meta scan returns the newly merged region A, which is cached into the MetaCache.
> So now we have a MetaCache where A has been updated with the newly merged RegionInfo, B still exists with the old/deleted RegionInfo, and C has been removed.
> A user submits a request for row C1 again. This _should_ go to region A, but we do cache.floorEntry(C1), which returns the old but still cached region B. We have checks in MetaCache which validate the RegionInfo.getEndKey() against the requested row, and that validation fails because C1 is beyond the end key of the old region. The cached region B result is ignored and the cache returns null. Meta is scanned and returns the new region A, which is cached again.
> Requests to rows C1+ will still succeed... but they will always require a meta scan because the meta cache will always return that old region B which is invalid and doesn't contain the C1+ rows.
> Currently, the only way this will ever resolve is if a request is sent to region B. That request fails with a NotServingRegionException, which finally clears region B from the cache. At that point, requests for C1+ will properly get resolved to region A in the cache.
> I've created a reproducible test case here: [https://gist.github.com/bbeaudreault/c82ff9f8ad0b9424eb987483ede35c12]
> This problem affects both AsyncTable and branch-2's Table.
>  
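
The lookup sequence in the quoted description can be replayed with a small stand-alone model. This is only a sketch under stated assumptions: StaleFloorEntryDemo, its Region class, scanMeta, locate, and onNotServingRegion are hypothetical stand-ins built on a TreeMap, not the actual MetaCache or RegionInfo classes, and the "meta scan" simply returns the merged region.

```java
import java.util.Map;
import java.util.TreeMap;

// Toy model of a client-side location cache keyed by region start key.
class StaleFloorEntryDemo {
  // Hypothetical stand-in for RegionInfo: covers [startKey, endKey); empty endKey = end of table.
  static final class Region {
    final String name, startKey, endKey;
    Region(String name, String startKey, String endKey) {
      this.name = name; this.startKey = startKey; this.endKey = endKey;
    }
    boolean contains(String row) {
      return row.compareTo(startKey) >= 0 && (endKey.isEmpty() || row.compareTo(endKey) < 0);
    }
  }

  final TreeMap<String, Region> cache = new TreeMap<>();
  int metaScans = 0;

  // Pretend meta scan: after the merge, meta only knows the merged region.
  Region scanMeta() { metaScans++; return new Region("merged", "A", ""); }

  Region locate(String row) {
    Map.Entry<String, Region> e = cache.floorEntry(row);
    if (e != null && e.getValue().contains(row)) return e.getValue(); // cache hit
    Region fresh = scanMeta(); // floor entry missing or failed the end-key check
    cache.put(fresh.startKey, fresh);
    return fresh;
  }

  // Models the client reacting to a NotServingRegionException for a cached region.
  void onNotServingRegion(Region stale) { cache.remove(stale.startKey, stale); }

  // State after the merge and the first failed request for C1.
  static StaleFloorEntryDemo afterMergeAndFirstFailure() {
    StaleFloorEntryDemo d = new StaleFloorEntryDemo();
    d.cache.put("A", new Region("old-A", "A", "B"));
    d.cache.put("B", new Region("old-B", "B", "C"));
    d.cache.put("C", new Region("old-C", "C", ""));
    d.onNotServingRegion(d.locate("C1")); // cache still hands out old-C; server rejects it
    d.locate("C1"); // meta scan; merged region is cached under start key A
    return d;
  }

  public static void main(String[] args) {
    StaleFloorEntryDemo d = afterMergeAndFirstFailure();
    d.locate("C1"); // floor entry is old-B, fails the end-key check: another meta scan
    d.locate("C1"); // and again
    System.out.println("meta scans while old-B is cached: " + d.metaScans);
    d.onNotServingRegion(d.locate("B1")); // a request for B finally evicts old-B
    System.out.println(d.locate("C1").name + ", meta scans: " + d.metaScans);
  }
}
```

In this model every lookup for C1 costs a meta scan for as long as the old region B stays cached, and the scans stop only after a request for a row in B evicts it.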



--
This message was sent by Atlassian Jira
(v8.20.10#820010)