Posted to issues@hbase.apache.org by "Duo Zhang (Jira)" <ji...@apache.org> on 2022/08/09 08:03:00 UTC

[jira] [Resolved] (HBASE-25229) Instantiate BucketCache before RS creates a their ephemeral node when rolling-upgrade

     [ https://issues.apache.org/jira/browse/HBASE-25229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Duo Zhang resolved HBASE-25229.
-------------------------------
      Assignee:     (was: Jeongdae Kim)
    Resolution: Won't Fix

All 1.x release lines are EOL.

Feel free to reopen if this also affects 2.x and master.

> Instantiate BucketCache before RS creates a their ephemeral node when rolling-upgrade
> -------------------------------------------------------------------------------------
>
>                 Key: HBASE-25229
>                 URL: https://issues.apache.org/jira/browse/HBASE-25229
>             Project: HBase
>          Issue Type: Improvement
>    Affects Versions: 1.5.0, 1.6.0, 1.7.0, 1.4.13
>            Reporter: Jeongdae Kim
>            Priority: Minor
>
> We observed that many clients could not get region location information for tens of seconds during a rolling upgrade from 1.2.x to 1.4.x, and that all requests to regions moved by the graceful restart failed.
>  
> The reasons are:
> # Since HBASE-17931, system tables are assigned to the RS with the highest version.
> # Since HBASE-12034, bucket cache initialization has moved from RS instantiation to the RS initialization process that runs after reporting to the master. Moreover, the ephemeral node for the RS is created before the bucket cache is created.
> # When using an off-heap BucketCache, allocating its memory takes a long time (18 seconds for 31GB in our case): [https://github.com/apache/hbase/blob/branch-1.4/hbase-common/src/main/java/org/apache/hadoop/hbase/util/ByteBufferArray.java#L52-L72]
> # Once the ephemeral node is created, the master tries to move system regions to the RS with the highest version. This happens at the first RS restart of the whole rolling-restart process. Because of 3), that RS is not yet ready to serve system regions, so moving system regions keeps failing until 3) has finished.
>  
> I think this can happen only in branch-1, because in HBase 2.x the ephemeral node is created after the block caches are created. There is no need to create block caches after ephemeral node creation at all.
>  
> I verified this issue could be resolved by just changing their creation order.
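
The ordering argument in the description above can be illustrated with a toy model. This is only a sketch under stated assumptions: the step names and the unavailableWindowMillis helper are hypothetical and are not HBase APIs, and the only input taken from the report is the 18-second allocation time. It computes how long a server would be visible to the master (ephemeral node present) while still unable to serve regions, for the branch-1 order and for the reordered startup.

```java
import java.util.Arrays;
import java.util.List;

public class StartupOrderSketch {
    /** Hypothetical startup steps for illustration; these are not HBase method names. */
    public enum Step { CREATE_EPHEMERAL_NODE, ALLOCATE_BUCKET_CACHE, READY_TO_SERVE }

    /** Order described for branch-1: the node is created before the cache is allocated. */
    public static final List<Step> BRANCH_1_ORDER = Arrays.asList(
        Step.CREATE_EPHEMERAL_NODE, Step.ALLOCATE_BUCKET_CACHE, Step.READY_TO_SERVE);

    /** Proposed order: the cache is allocated before the node is created. */
    public static final List<Step> REORDERED = Arrays.asList(
        Step.ALLOCATE_BUCKET_CACHE, Step.CREATE_EPHEMERAL_NODE, Step.READY_TO_SERVE);

    /**
     * Returns the simulated time, in milliseconds, between the ephemeral node
     * becoming visible and the server being ready to serve regions.
     * Only the cache allocation is modeled as taking time.
     */
    public static long unavailableWindowMillis(List<Step> order, long allocationMillis) {
        long clock = 0;
        long visibleAt = -1;
        for (Step step : order) {
            switch (step) {
                case CREATE_EPHEMERAL_NODE:
                    visibleAt = clock;
                    break;
                case ALLOCATE_BUCKET_CACHE:
                    clock += allocationMillis;
                    break;
                case READY_TO_SERVE:
                    if (visibleAt < 0) {
                        throw new IllegalArgumentException("node must be created before ready");
                    }
                    return clock - visibleAt;
            }
        }
        throw new IllegalArgumentException("order must contain READY_TO_SERVE");
    }

    public static void main(String[] args) {
        long allocationMillis = 18_000L; // 18 seconds for 31GB, as reported above
        System.out.println("branch-1 order: "
            + unavailableWindowMillis(BRANCH_1_ORDER, allocationMillis) + " ms");
        System.out.println("reordered:      "
            + unavailableWindowMillis(REORDERED, allocationMillis) + " ms");
    }
}
```

In this model the branch-1 order leaves a window equal to the full allocation time, while the reordered startup leaves none, which matches the reporter's observation that swapping the creation order resolves the failures.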



--
This message was sent by Atlassian Jira
(v8.20.10#820010)