Posted to dev@hbase.apache.org by "stack (JIRA)" <ji...@apache.org> on 2018/03/21 22:59:00 UTC
[jira] [Resolved] (HBASE-18408) AM consumes CPU and fills up the logs really fast when there is no RS to assign
[ https://issues.apache.org/jira/browse/HBASE-18408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
stack resolved HBASE-18408.
---------------------------
Resolution: Later
Fix Version/s: (was: 2.0.0)
I've not seen this. What is reported here is admittedly from a very old master. The high-level takeaway, I think, is that if things go wrong in AM, it can spew loads of logs. That is probably still the case; we just go wrong less often now.
Let me resolve this as Later. Let's open a new issue with an updated complaint if we run into this again.
Thanks [~elserj]
> AM consumes CPU and fills up the logs really fast when there is no RS to assign
> -------------------------------------------------------------------------------
>
> Key: HBASE-18408
> URL: https://issues.apache.org/jira/browse/HBASE-18408
> Project: HBase
> Issue Type: Bug
> Reporter: Enis Soztutar
> Priority: Critical
>
> I was testing something else when I discovered that when there is no RS to assign a region to (but the master is alive), AM/LB creates GBs of logs.
> Logs like this:
> {code}
> 2017-07-18 16:40:00,712 WARN [AssignmentThread] balancer.BaseLoadBalancer: Wanted to do round robin assignment but no servers to assign to
> 2017-07-18 16:40:00,712 WARN [AssignmentThread] assignment.AssignmentManager: unable to round-robin assignment
> org.apache.hadoop.hbase.HBaseIOException: unable to compute plans for regions=1
> at org.apache.hadoop.hbase.master.assignment.AssignmentManager.acceptPlan(AssignmentManager.java:1725)
> at org.apache.hadoop.hbase.master.assignment.AssignmentManager.processAssignQueue(AssignmentManager.java:1711)
> at org.apache.hadoop.hbase.master.assignment.AssignmentManager.access$300(AssignmentManager.java:108)
> at org.apache.hadoop.hbase.master.assignment.AssignmentManager$2.run(AssignmentManager.java:1587)
> 2017-07-18 16:40:00,865 WARN [AssignmentThread] balancer.BaseLoadBalancer: Wanted to do round robin assignment but no servers to assign to
> 2017-07-18 16:40:00,866 WARN [AssignmentThread] assignment.AssignmentManager: unable to round-robin assignment
> org.apache.hadoop.hbase.HBaseIOException: unable to compute plans for regions=1
> at org.apache.hadoop.hbase.master.assignment.AssignmentManager.acceptPlan(AssignmentManager.java:1725)
> at org.apache.hadoop.hbase.master.assignment.AssignmentManager.processAssignQueue(AssignmentManager.java:1711)
> at org.apache.hadoop.hbase.master.assignment.AssignmentManager.access$300(AssignmentManager.java:108)
> at org.apache.hadoop.hbase.master.assignment.AssignmentManager$2.run(AssignmentManager.java:1587)
> 2017-07-18 16:40:01,019 WARN [AssignmentThread] balancer.BaseLoadBalancer: Wanted to do round robin assignment but no servers to assign to
> 2017-07-18 16:40:01,019 WARN [AssignmentThread] assignment.AssignmentManager: unable to round-robin assignment
> org.apache.hadoop.hbase.HBaseIOException: unable to compute plans for regions=1
> at org.apache.hadoop.hbase.master.assignment.AssignmentManager.acceptPlan(AssignmentManager.java:1725)
> at org.apache.hadoop.hbase.master.assignment.AssignmentManager.processAssignQueue(AssignmentManager.java:1711)
> at org.apache.hadoop.hbase.master.assignment.AssignmentManager.access$300(AssignmentManager.java:108)
> at org.apache.hadoop.hbase.master.assignment.AssignmentManager$2.run(AssignmentManager.java:1587)
> 2017-07-18 16:40:01,173 WARN [AssignmentThread] balancer.BaseLoadBalancer: Wanted to do round robin assignment but no servers to assign to
> 2017-07-18 16:40:01,173 WARN [AssignmentThread] assignment.AssignmentManager: unable to round-robin assignment
> org.apache.hadoop.hbase.HBaseIOException: unable to compute plans for regions=1
> at org.apache.hadoop.hbase.master.assignment.AssignmentManager.acceptPlan(AssignmentManager.java:1725)
> at org.apache.hadoop.hbase.master.assignment.AssignmentManager.processAssignQueue(AssignmentManager.java:1711)
> at org.apache.hadoop.hbase.master.assignment.AssignmentManager.access$300(AssignmentManager.java:108)
> at org.apache.hadoop.hbase.master.assignment.AssignmentManager$2.run(AssignmentManager.java:1587)
> {code}
> Reproduction is easy:
> - Start pseudo-distributed cluster
> - Create a table
> - Kill the region server
> I have also noticed that in another case we are just spinning CPU, consuming 100-200% (but this is in a very old code base from master), in this cycle:
> {code}
> "ProcedureExecutor-0" #106 daemon prio=5 os_prio=0 tid=0x00007fab54851800 nid=0xcf1 runnable [0x00007fab4e7b0000]
> java.lang.Thread.State: RUNNABLE
> at java.lang.Object.hashCode(Native Method)
> at java.util.concurrent.ConcurrentHashMap.replaceNode(ConcurrentHashMap.java:1106)
> at java.util.concurrent.ConcurrentHashMap.remove(ConcurrentHashMap.java:1097)
> at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.close(HRegion.java:6158)
> - locked <0x00000000c4cb62e8> (a org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl)
> at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:6829)
> at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:6790)
> at org.apache.hadoop.hbase.regionserver.RSRpcServices.get(RSRpcServices.java:2125)
> at org.apache.hadoop.hbase.client.HTable$1.call(HTable.java:425)
> at org.apache.hadoop.hbase.client.HTable$1.call(HTable.java:416)
> at org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithRetries(RpcRetryingCallerImpl.java:102)
> at org.apache.hadoop.hbase.client.HTable.get(HTable.java:433)
> at org.apache.hadoop.hbase.client.HTable.get(HTable.java:399)
> at org.apache.hadoop.hbase.MetaTableAccessor.getTableState(MetaTableAccessor.java:1084)
> at org.apache.hadoop.hbase.master.TableStateManager.readMetaState(TableStateManager.java:188)
> at org.apache.hadoop.hbase.master.TableStateManager.getTableState(TableStateManager.java:172)
> at org.apache.hadoop.hbase.master.TableStateManager.isTableState(TableStateManager.java:131)
> at org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.processDeadRegion(ServerCrashProcedure.java:666)
> at org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.calcRegionsToAssign(ServerCrashProcedure.java:460)
> at org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.executeFromState(ServerCrashProcedure.java:254)
> at org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.executeFromState(ServerCrashProcedure.java:72)
> at org.apache.hadoop.hbase.procedure2.StateMachineProcedure.execute(StateMachineProcedure.java:133)
> at org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:523)
> at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1061)
> at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execLoop(ProcedureExecutor.java:855)
> at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execLoop(ProcedureExecutor.java:808)
> at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$400(ProcedureExecutor.java:75)
> at org.apache.hadoop.hbase.procedure2.ProcedureExecutor$2.run(ProcedureExecutor.java:495)
> {code}
> I think this happens when meta is not hosted in master.
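The log excerpt in the description shows the assignment thread retrying roughly every 150 ms and writing a warning plus a full stack trace on each pass. One common way to bound both the CPU and the log volume of such a loop is exponential backoff combined with throttled logging. What follows is only a minimal sketch under that assumption: `AssignRetryThrottle` and its methods are hypothetical names invented for illustration, and none of this is HBase's AssignmentManager code or the fix applied to this issue.

```java
/**
 * Hypothetical helper (not part of HBase): tracks consecutive failures of a
 * retry loop, suggests an exponentially growing delay, and limits how often
 * the caller should log a warning.
 */
final class AssignRetryThrottle {
    private final long baseDelayMs;   // delay after the first failure
    private final long maxDelayMs;    // upper bound on the delay
    private final long logIntervalMs; // minimum gap between two warnings

    private int failures = 0;
    private boolean loggedOnce = false;
    private long lastLogMs = 0;

    AssignRetryThrottle(long baseDelayMs, long maxDelayMs, long logIntervalMs) {
        this.baseDelayMs = baseDelayMs;
        this.maxDelayMs = maxDelayMs;
        this.logIntervalMs = logIntervalMs;
    }

    /** Records one failed attempt and returns how long to wait before retrying. */
    long onFailure() {
        // Cap the shift so the multiplication cannot overflow for small base delays.
        int shift = Math.min(failures, 20);
        failures++;
        return Math.min(maxDelayMs, baseDelayMs << shift);
    }

    /** Returns true if the caller should emit a warning at time nowMs. */
    boolean shouldLog(long nowMs) {
        if (!loggedOnce || nowMs - lastLogMs >= logIntervalMs) {
            loggedOnce = true;
            lastLogMs = nowMs;
            return true;
        }
        return false;
    }

    /** Resets the state once an attempt succeeds. */
    void onSuccess() {
        failures = 0;
        loggedOnce = false;
    }
}
```

A caller would invoke `onFailure()` whenever no server is available, sleep for the returned number of milliseconds, and write its warning only when `shouldLog(System.currentTimeMillis())` returns true. With a base delay of 150 ms and a 60 s cap, the delays would run 150, 300, 600, ... up to 60000 ms instead of a fixed 150 ms cycle.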