You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@hbase.apache.org by "Michael Stack (Jira)" <ji...@apache.org> on 2019/12/20 06:02:00 UTC

[jira] [Comment Edited] (HBASE-23597) Give high priority for meta assign procedure and ServerCrashProcedure which carry meta.

    [ https://issues.apache.org/jira/browse/HBASE-23597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17000636#comment-17000636 ] 

Michael Stack edited comment on HBASE-23597 at 12/20/19 6:01 AM:
-----------------------------------------------------------------

[~binlijin] Sorry. Yeah, polluting your issue with prioritizing meta request concerns. Let me undo my mess. Sorry about that.

My concern is edits on meta being rejected because CallQueue is full. This would be with HBASE-22280 in place. I was sharing my studies so far to see if it would here but misconstrued what this was about.


was (Author: stack):
[~binlijin] Sorry. Yeah, polluting your issue with prioritizing meta request concerns. Let me undo my mess. Sorry about that.

> Give high priority for meta assign procedure and ServerCrashProcedure which carry meta.
> ---------------------------------------------------------------------------------------
>
>                 Key: HBASE-23597
>                 URL: https://issues.apache.org/jira/browse/HBASE-23597
>             Project: HBase
>          Issue Type: Improvement
>    Affects Versions: 2.2.2
>            Reporter: Lijin Bin
>            Priority: Major
>
> {code}
> 2019-12-18 16:14:41,698 INFO  [KeepAlivePEWorker-18] assignment.TransitRegionStateProcedure: Starting pid=23568, ppid=23567, state=RUNNABLE:REGION_STATE_TRANSITION_OPEN, locked=true; TransitRegionStateProcedure table=hbase:meta, region=1588230740, ASSIGN; openRegion rit=OPEN, location=100.107.165.61,60020,1576553057082; loc=100.107.165.61,60020,1576553057082
> 2019-12-18 16:14:41,698 INFO  [KeepAlivePEWorker-18] zookeeper.MetaTableLocator: Setting hbase:meta (replicaId=0) location in ZooKeeper as 100.107.165.61,60020,1576553057082
> 2019-12-18 16:14:43,515 INFO  [PEWorker-2] procedure2.ProcedureExecutor: Finished subprocedure pid=23569, resume processing parent pid=23568, ppid=23567, state=RUNNABLE:REGION_STATE_TRANSITION_CONFIRM_OPENED, locked=true; TransitRegionStateProcedure table=hbase:meta, region=1588230740, ASSIGN
> 2019-12-18 16:14:43,518 INFO  [PEWorker-2] procedure2.ProcedureExecutor: Finished pid=23569, ppid=23568, state=SUCCESS; org.apache.hadoop.hbase.master.assignment.OpenRegionProcedure in 1.5970sec
> 2019-12-18 16:14:43,522 INFO  [PEWorker-9] procedure2.ProcedureExecutor: Finished subprocedure pid=23568, resume processing parent pid=23567, state=RUNNABLE:SERVER_CRASH_GET_REGIONS, locked=true; ServerCrashProcedure server=100.107.165.22,60020,1576553019781, splitWal=true, meta=true
> 2019-12-18 16:14:43,522 INFO  [PEWorker-9] procedure2.ProcedureExecutor: Finished pid=23568, ppid=23567, state=SUCCESS; TransitRegionStateProcedure table=hbase:meta, region=1588230740, ASSIGN in 6.4630sec
> 2019-12-18 16:15:07,212 DEBUG [RegionServerTracker-0] procedure2.ProcedureExecutor: Stored pid=69619, state=RUNNABLE:SERVER_CRASH_START; ServerCrashProcedure server=100.107.165.61,60020,1576553057082, splitWal=true, meta=true
> 2019-12-18 16:15:07,212 DEBUG [RegionServerTracker-0] procedure.MasterProcedureScheduler: Add ServerQueue(100.107.165.61,60020,1576553057082, xlock=false sharedLock=0 size=1) to run queue because: the exclusive lock is not held by anyone when adding pid=69619, state=RUNNABLE:SERVER_CRASH_START; ServerCrashProcedure server=100.107.165.61,60020,1576553057082, splitWal=true, meta=true
> 2019-12-18 16:15:07,212 INFO  [RegionServerTracker-0] assignment.AssignmentManager: Scheduled SCP pid=69619 for 100.107.165.61,60020,1576553057082 (carryingMeta=true) 100.107.165.61,60020,1576553057082/CRASHED/regionCount=13026/lock=java.util.concurrent.locks.ReentrantReadWriteLock@68f2ee72[Write locks = 1, Read locks = 0], oldState=ONLINE.
> 2019-12-18 16:15:21,629 DEBUG [RpcServer.default.FPBQ.Fifo.handler=959,queue=191,port=60000] master.DeadServer: Removed 100.107.165.61,60020,1576553057082, processing=true, numProcessing=0
> 2019-12-18 16:16:14,779 DEBUG [qtp1688526221-1038] client.ConnectionImplementation: locateRegionInMeta parentTable='hbase:meta', attempt=0 of 31 failed; retrying after sleep of 31
> org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after attempts=31, exceptions:
> Wed Dec 18 16:16:14 CST 2019, null, java.net.SocketTimeoutException: callTimeout=60000, callDuration=81362: org.apache.hadoop.hbase.NotServingRegionException: hbase:meta,,1 is not online on 100.107.165.61,60020,1576656916048
>         at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionByEncodedName(HRegionServer.java:3349)
>         at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:3326)
>         at org.apache.hadoop.hbase.regionserver.RSRpcServices.getRegion(RSRpcServices.java:1439)
>         at org.apache.hadoop.hbase.regionserver.RSRpcServices.newRegionScanner(RSRpcServices.java:2967)
>         at org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:3300)
>         at org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:42190)
>         at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:413)
>         at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:133)
>         at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:338)
>         at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:318)
>  row 'hbase:rsgroup,,99999999999999' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=100.107.165.61,60020,1576553057082, seqNum=-1
> 2019-12-18 16:33:04,715 ERROR [PEWorker-15] master.HMaster: Master server abort: loaded coprocessors are: [org.apache.hadoop.hbase.rsgroup.RSGroupAdminEndpoint]
> 2019-12-18 16:33:04,715 ERROR [PEWorker-15] master.HMaster: ***** ABORTING master 100.107.176.225,60000,1576656778460: FAILED persisting region=38d18fd824890c80cff972cbf2e4c174 state=OPENING *****
> java.net.SocketTimeoutException: callTimeout=1200000, callDuration=1286005: org.apache.hadoop.hbase.NotServingRegionException: hbase:meta,,1 is not online on 100.107.165.61,60020,1576656916048
>         at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionByEncodedName(HRegionServer.java:3349)
>         at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:3326)
>         at org.apache.hadoop.hbase.regionserver.RSRpcServices.getRegion(RSRpcServices.java:1439)
>         at org.apache.hadoop.hbase.regionserver.RSRpcServices.mutate(RSRpcServices.java:2772)
>         at org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:42188)
>         at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:413)
>         at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:133)
>         at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:338)
>         at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:318)
>  row 'table1w,user22322,1575810347321.38d18fd824890c80cff972cbf2e4c174.' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=100.107.165.61,60020,1576553057082, seqNum=-1
>         at org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithRetries(RpcRetryingCallerImpl.java:159)
>         at org.apache.hadoop.hbase.client.HTable.put(HTable.java:540)
>         at org.apache.hadoop.hbase.master.assignment.RegionStateStore.updateRegionLocation(RegionStateStore.java:209)
>         at org.apache.hadoop.hbase.master.assignment.RegionStateStore.updateUserRegionLocation(RegionStateStore.java:203)
>         at org.apache.hadoop.hbase.master.assignment.RegionStateStore.updateRegionLocation(RegionStateStore.java:141)
>         at org.apache.hadoop.hbase.master.assignment.AssignmentManager.transitStateAndUpdate(AssignmentManager.java:1632)
>         at org.apache.hadoop.hbase.master.assignment.AssignmentManager.regionOpening(AssignmentManager.java:1647)
>         at org.apache.hadoop.hbase.master.assignment.TransitRegionStateProcedure.openRegion(TransitRegionStateProcedure.java:207)
>         at org.apache.hadoop.hbase.master.assignment.TransitRegionStateProcedure.executeFromState(TransitRegionStateProcedure.java:339)
>         at org.apache.hadoop.hbase.master.assignment.TransitRegionStateProcedure.executeFromState(TransitRegionStateProcedure.java:102)
>         at org.apache.hadoop.hbase.procedure2.StateMachineProcedure.execute(StateMachineProcedure.java:194)
>         at org.apache.hadoop.hbase.master.assignment.TransitRegionStateProcedure.execute(TransitRegionStateProcedure.java:319)
>         at org.apache.hadoop.hbase.master.assignment.TransitRegionStateProcedure.execute(TransitRegionStateProcedure.java:102)
>         at org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:962)
>         at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1648)
>         at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1395)
>         at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$1100(ProcedureExecutor.java:78)
>         at org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1965)
> Caused by: org.apache.hadoop.hbase.NotServingRegionException: org.apache.hadoop.hbase.NotServingRegionException: hbase:meta,,1 is not online on 100.107.165.61,60020,1576656916048
>         at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionByEncodedName(HRegionServer.java:3349)
>         at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:3326)
>         at org.apache.hadoop.hbase.regionserver.RSRpcServices.getRegion(RSRpcServices.java:1439)
>         at org.apache.hadoop.hbase.regionserver.RSRpcServices.mutate(RSRpcServices.java:2772)
>         at org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:42188)
>         at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:413)
>         at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:133)
>         at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:338)
>         at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:318)
>         at sun.reflect.GeneratedConstructorAccessor14.newInstance(Unknown Source)
>         at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>         at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>         at org.apache.hadoop.hbase.ipc.RemoteWithExtrasException.instantiateException(RemoteWithExtrasException.java:99)
>         at org.apache.hadoop.hbase.ipc.RemoteWithExtrasException.unwrapRemoteException(RemoteWithExtrasException.java:89)
>         at org.apache.hadoop.hbase.protobuf.ProtobufUtil.makeIOExceptionOfException(ProtobufUtil.java:282)
>         at org.apache.hadoop.hbase.protobuf.ProtobufUtil.handleRemoteException(ProtobufUtil.java:269)
>         at org.apache.hadoop.hbase.client.RegionServerCallable.call(RegionServerCallable.java:129)
>         at org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithRetries(RpcRetryingCallerImpl.java:107)
>         ... 17 more
> Caused by: org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(org.apache.hadoop.hbase.NotServingRegionException): org.apache.hadoop.hbase.NotServingRegionException: hbase:meta,,1 is not online on 100.107.165.61,60020,1576656916048
>         at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionByEncodedName(HRegionServer.java:3349)
>         at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:3326)
>         at org.apache.hadoop.hbase.regionserver.RSRpcServices.getRegion(RSRpcServices.java:1439)
>         at org.apache.hadoop.hbase.regionserver.RSRpcServices.mutate(RSRpcServices.java:2772)
>         at org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:42188)
>         at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:413)
>         at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:133)
>         at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:338)
>         at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:318)
>         at org.apache.hadoop.hbase.ipc.AbstractRpcClient.onCallFinished(AbstractRpcClient.java:389)
>         at org.apache.hadoop.hbase.ipc.AbstractRpcClient.access$100(AbstractRpcClient.java:97)
>         at org.apache.hadoop.hbase.ipc.AbstractRpcClient$3.run(AbstractRpcClient.java:423)
>         at org.apache.hadoop.hbase.ipc.AbstractRpcClient$3.run(AbstractRpcClient.java:419)
>         at org.apache.hadoop.hbase.ipc.Call.callComplete(Call.java:103)
>         at org.apache.hadoop.hbase.ipc.Call.setException(Call.java:118)
>         at org.apache.hadoop.hbase.ipc.NettyRpcDuplexHandler.readResponse(NettyRpcDuplexHandler.java:162)
>         at org.apache.hadoop.hbase.ipc.NettyRpcDuplexHandler.channelRead(NettyRpcDuplexHandler.java:192)
>         at org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:359)
> 2019-12-18 16:33:04,728 DEBUG [PEWorker-15] procedure.MasterProcedureScheduler: Remove ServerQueue(100.107.165.61,60020,1576553057082, xlock=false sharedLock=0 size=0) from run queue because: queue is empty after polling out pid=69619, state=RUNNABLE:SERVER_CRASH_START; ServerCrashProcedure server=100.107.165.61,60020,1576553057082, splitWal=true, meta=true
> 2019-12-18 16:33:04,728 DEBUG [PEWorker-15] procedure.MasterProcedureScheduler: Remove ServerQueue(100.107.165.61,60020,1576553057082, xlock=true (69619) sharedLock=0 size=0) from run queue because: pid=69619, state=RUNNABLE:SERVER_CRASH_START; ServerCrashProcedure server=100.107.165.61,60020,1576553057082, splitWal=true, meta=true held exclusive lock
> 2019-12-18 16:33:06,353 INFO  [PEWorker-13] assignment.TransitRegionStateProcedure: Starting pid=69626, ppid=69619, state=RUNNABLE:REGION_STATE_TRANSITION_OPEN, locked=true; TransitRegionStateProcedure table=hbase:meta, region=1588230740, ASSIGN; openRegion rit=OPEN, location=100.107.165.22,60020,1576656877749; loc=100.107.165.22,60020,1576656877749
> 2019-12-18 16:33:06,353 INFO  [PEWorker-13] zookeeper.MetaTableLocator: Setting hbase:meta (replicaId=0) location in ZooKeeper as 100.107.165.22,60020,1576656877749
> {code}
> Looks at this logs, we can see the ServerCrashProcedure server=100.107.165.61,60020,1576553057082(carry meta) queued for quit a while blocked by other procedure.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)