You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@hbase.apache.org by "Sergey Shelukhin (JIRA)" <ji...@apache.org> on 2018/12/20 20:11:00 UTC

[jira] [Created] (HBASE-21625) a runnable procedure v2 does not run

Sergey Shelukhin created HBASE-21625:
----------------------------------------

             Summary: a runnable procedure v2 does not run
                 Key: HBASE-21625
                 URL: https://issues.apache.org/jira/browse/HBASE-21625
             Project: HBase
          Issue Type: Bug
            Reporter: Sergey Shelukhin


Haven't looked at the code much yet, but it seems rather fundamental. The procedure comes from meta replica assignment (HBASE-21624).
There are no other runnable procedures on master - a lot of succeeded procedures since then, and one unrelated RIT procedure waiting with timeout and being updated peroodically.

The procedure itself is 
{noformat}
157156 	157155 	RUNNABLE 	hadoop 	org.apache.hadoop.hbase.master.assignment.OpenRegionProcedure 	Wed Dec 19 17:20:27 PST 2018 	Wed Dec 19 17:20:28 PST 2018 		[ { region => { regionId => '1', tableName => { ... }, startKey => '', endKey => '', offline => 'false', split => 'false', replicaId => '1' }, targetServer => { hostName => 'server1', port => '17020', startCode => '1545266805778' } }, {} ]
{noformat}
This is in PST so it's been like that for ~19 hours.

The only line involving this PID in the log is {noformat}
2018-12-19 17:20:27,974 INFO  [PEWorker-4] procedure2.ProcedureExecutor: Initialized subprocedures=[{pid=157156, ppid=157155, state=RUNNABLE, hasLock=false; org.apache.hadoop.hbase.master.assignment.OpenRegionProcedure}]
{noformat}
There are no other useful logs for either this PID, parent PID, or region in question since.

All the PEWorker-s are waiting for work:
{noformat}
Thread 158 (PEWorker-16):
  State: TIMED_WAITING
  Blocked count: 1340
  Waited count: 5064
  Stack:
    sun.misc.Unsafe.park(Native Method)
    java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
    java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
    org.apache.hadoop.hbase.procedure2.AbstractProcedureScheduler.poll(AbstractProcedureScheduler.java:171)
    org.apache.hadoop.hbase.procedure2.AbstractProcedureScheduler.poll(AbstractProcedureScheduler.java:153)
    org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1949)
{noformat}

The main assignment procedure for this region is blocked on it:
{noformat}
157155 		WAITING 	hadoop 	TransitRegionStateProcedure table=hbase:meta, region=534574363, ASSIGN 	Wed Dec 19 17:20:27 PST 2018 	Wed Dec 19 17:20:27 PST 2018 		[ { state => [ '1', '2', '3' ] }, { regionId => '1', tableName => { ... }, startKey => '', endKey => '', offline => 'false', split => 'false', replicaId => '1' }, { initialState => 'REGION_STATE_TRANSITION_GET_ASSIGN_CANDIDATE', lastState => 'REGION_STATE_TRANSITION_CONFIRM_OPENED', assignCandidate => { hostName => 'server1', port => '17020', startCode => '1545266805778' }, forceNewPlan => 'false' } ]


2018-12-19 17:20:27,673 INFO  [PEWorker-9] procedure.MasterProcedureScheduler: Took xlock for pid=157155, state=RUNNABLE:REGION_STATE_TRANSITION_GET_ASSIGN_CANDIDATE, hasLock=false; TransitRegionStateProcedure table=hbase:meta, region=..., ASSIGN
2018-12-19 17:20:27,809 INFO  [PEWorker-9] assignment.TransitRegionStateProcedure: Starting pid=157155, state=RUNNABLE:REGION_STATE_TRANSITION_GET_ASSIGN_CANDIDATE, hasLock=true; TransitRegionStateProcedure table=hbase:meta, region=..., ASSIGN; rit=OFFLINE, location=server1,17020,1545266805778; forceNewPlan=false, retain=false
{noformat}




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)