Posted to issues@hbase.apache.org by "Junhong Xu (Jira)" <ji...@apache.org> on 2020/06/12 10:12:00 UTC

[jira] [Updated] (HBASE-24548) improvement for HBase RS Stop

     [ https://issues.apache.org/jira/browse/HBASE-24548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Junhong Xu updated HBASE-24548:
-------------------------------
    Summary: improvement for HBase RS Stop   (was: improvement for HBase SCP)

> improvement for HBase RS Stop 
> ------------------------------
>
>                 Key: HBASE-24548
>                 URL: https://issues.apache.org/jira/browse/HBASE-24548
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Junhong Xu
>            Assignee: Junhong Xu
>            Priority: Major
>
> In our internal HBase, which is based on the community branch-2.1, we found that after a RegionServer is stopped, the master only declares it dead about 30 s later, once the RegionServer's ephemeral node in ZooKeeper is finally deleted. During this window, the regions on that server are unavailable and their recovery makes no progress. The client-side log is as follows:
> {code:java}
> [2020-06-12 15:51:41.888 ActorThreadPool-consumer-processor-talos-set-alias-55-1 ERROR c.x.xmpush.hbase.utils.HBaseHelper] [get data hbase failed, tableName = mipush:app_alias_new]
> com.xiaomi.infra.hbase.client.HException: org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after attempts=10, exceptions:
> Fri Jun 12 15:50:44 CST 2020, org.apache.hadoop.hbase.client.RpcRetryingCaller@2dc1865, org.apache.hadoop.hbase.regionserver.RegionServerStoppedException: org.apache.hadoop.hbase.regionserver.RegionServerStoppedException: Server c3-hadoop-srv-st639.bj,13700,1591932264018 stopping
>         at org.apache.hadoop.hbase.regionserver.RSRpcServices.checkOpen(RSRpcServices.java:1551)
>         at org.apache.hadoop.hbase.regionserver.RSRpcServices.get(RSRpcServices.java:2565)
>         at org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:41998)
>         at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:413)
>         at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:134)
>         at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:338)
>         at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:318)
> Fri Jun 12 15:50:44 CST 2020, org.apache.hadoop.hbase.client.RpcRetryingCaller@2dc1865, org.apache.hadoop.hbase.regionserver.RegionServerStoppedException: org.apache.hadoop.hbase.regionserver.RegionServerStoppedException: Server c3-hadoop-srv-st639.bj,13700,1591932264018 stopping
>         at org.apache.hadoop.hbase.regionserver.RSRpcServices.checkOpen(RSRpcServices.java:1551)
>         at org.apache.hadoop.hbase.regionserver.RSRpcServices.get(RSRpcServices.java:2565)
>         at org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:41998)
>         at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:413)
>         at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:134)
>         at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:338)
>         at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:318)
> {code}
> The logs on the master:
> {code:java}
> 2020-06-12,15:51:12,003 INFO [RegionServerTracker-0] org.apache.hadoop.hbase.master.RegionServerTracker: RegionServer ephemeral node deleted, processing expiration [c3-hadoop-srv-st639.bj,13700,1591932264018]
> 2020-06-12,15:51:12,003 INFO [RegionServerTracker-0] org.apache.hadoop.hbase.master.ServerManager: Processing expiration of c3-hadoop-srv-st639.bj,13700,1591932264018 on c3-hadoop-miui-zk05.bj,13600,1591927126881
> 2020-06-12,15:51:12,109 INFO [RegionServerTracker-0] org.apache.hadoop.hbase.master.assignment.AssignmentManager: Added c3-hadoop-srv-st639.bj,13700,1591932264018 to dead servers which carryingMeta=false, submitted ServerCrashProcedure pid=97428
> 2020-06-12,15:51:12,109 INFO [org.apache.hadoop.hbase.rsgroup.RSGroupInfoManagerImpl$ServerEventsListenerThread-c3-hadoop-miui-zk05.bj,13600,1591927126881] org.apache.hadoop.hbase.rsgroup.RSGroupInfoManagerImpl$ServerEventsListenerThread: Updating default servers.
> 2020-06-12,15:51:12,111 INFO [PEWorker-11] org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure: Start pid=97428, state=RUNNABLE:SERVER_CRASH_START, locked=true; ServerCrashProcedure server=c3-hadoop-srv-st639.bj,13700,1591932264018, splitWal=true, meta=false
> {code}
> After an offline discussion with [~zghao], we think this process could be accelerated by having the RegionServer either notify the master or delete its own ephemeral node before it stops.
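
The timing difference between the current behaviour and the proposed one can be illustrated with a toy model. This is a minimal sketch under stated assumptions: it uses a simplified in-memory stand-in for ZooKeeper in which an ephemeral node disappears either when its owner deletes it or when the owner's session times out after the last heartbeat. The names EphemeralLivenessModel, heartbeat, deleteOwnNode, nodeExists and detectionDelayMs are hypothetical and are not HBase or ZooKeeper APIs. The 30 s session timeout is also an assumption, picked only to match the roughly 30 s gap described in the report; real values depend on the cluster configuration. Only the "delete the ephemeral node" alternative is modelled.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical toy model of ZooKeeper-style ephemeral-node liveness tracking.
 * NOT the HBase or ZooKeeper API; it only illustrates detection timing.
 */
final class EphemeralLivenessModel {

    // Hypothetical registry: server name -> time (ms) of its last session heartbeat.
    private final Map<String, Long> lastHeartbeat = new HashMap<>();
    private final long sessionTimeoutMs;

    EphemeralLivenessModel(long sessionTimeoutMs) {
        this.sessionTimeoutMs = sessionTimeoutMs;
    }

    /** The RegionServer creates or refreshes its ephemeral node at time nowMs. */
    void heartbeat(String server, long nowMs) {
        lastHeartbeat.put(server, nowMs);
    }

    /** Proposed path: the RegionServer deletes its own node before it stops. */
    void deleteOwnNode(String server) {
        lastHeartbeat.remove(server);
    }

    /** Master's view: the node exists unless it was deleted or its session expired. */
    boolean nodeExists(String server, long nowMs) {
        Long last = lastHeartbeat.get(server);
        return last != null && nowMs - last < sessionTimeoutMs;
    }

    /**
     * Time from the RegionServer stopping (right after its last heartbeat) until
     * the master, polling every stepMs, first sees the node gone.
     */
    static long detectionDelayMs(boolean deleteOnStop, long sessionTimeoutMs, long stepMs) {
        if (stepMs <= 0) {
            throw new IllegalArgumentException("stepMs must be positive");
        }
        EphemeralLivenessModel zk = new EphemeralLivenessModel(sessionTimeoutMs);
        String rs = "rs-1";
        long stopMs = 0;
        zk.heartbeat(rs, stopMs);      // last heartbeat, then the server stops
        if (deleteOnStop) {
            zk.deleteOwnNode(rs);      // proposed: clean up before exiting
        }
        long t = stopMs;
        while (zk.nodeExists(rs, t)) { // master waits for the node to disappear
            t += stepMs;
        }
        return t - stopMs;
    }

    public static void main(String[] args) {
        long timeoutMs = 30_000; // assumed session timeout, not a documented default
        System.out.println("passive expiry: " + detectionDelayMs(false, timeoutMs, 1_000) + " ms");
        System.out.println("delete on stop: " + detectionDelayMs(true, timeoutMs, 1_000) + " ms");
    }
}
```

Running `main` prints a 30000 ms delay for the passive path and 0 ms when the node is deleted on stop. In this model, notifying the master directly would have the same effect as deleting the node: the master no longer has to wait for session expiry before it can start handling the stopped server.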



--
This message was sent by Atlassian Jira
(v8.3.4#803005)