Posted to issues@hbase.apache.org by "chenwandong (JIRA)" <ji...@apache.org> on 2019/06/26 00:50:00 UTC

[jira] [Updated] (HBASE-22626) Master assigns the region successfully but fails to update the region state, leaving the state of the region as OPENING in zookeeper; if the master is restarted, those OPENING regions are never assigned.

     [ https://issues.apache.org/jira/browse/HBASE-22626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

chenwandong updated HBASE-22626:
--------------------------------
    Description: 
Problem Description
 (1) One region server restarts, so all regions that were assigned to it have to be moved.

(2) The master detects these regions and assigns them to other region servers. The assignment succeeds, but the subsequent region state update fails:

2019-06-22 16:44:20,065 INFO [PEWorker-8] procedure2.ProcedureExecutor: Finished pid=5038, ppid=4488, state=SUCCESS; AssignProcedure table=test061910, region=379c730490b4848f3db065fb25b87452 in 10.5680sec
 ... ...
 2019-06-22 16:44:38,725 INFO [PEWorker-4] procedure.MasterProcedureScheduler: pid=5038, ppid=4488, state=SUCCESS; AssignProcedure table=test061910, region=379c730490b4848f3db065fb25b87452 checking lock on 379c730490b4848f3db065fb25b87452
 2019-06-22 16:44:38,725 ERROR [PEWorker-4] procedure2.ProcedureExecutor: CODE-BUG: Uncaught runtime exception for pid=5038, ppid=4488, state=SUCCESS; AssignProcedure table=test061910, region=379c730490b4848f3db065fb25b87452
 java.lang.UnsupportedOperationException: Unhandled state REGION_TRANSITION_FINISH; there is no rollback for assignment unless we cancel the operation by dropping/disabling the table
 at org.apache.hadoop.hbase.master.assignment.RegionTransitionProcedure.rollback(RegionTransitionProcedure.java:412)
 at org.apache.hadoop.hbase.master.assignment.RegionTransitionProcedure.rollback(RegionTransitionProcedure.java:95)
 at org.apache.hadoop.hbase.procedure2.Procedure.doRollback(Procedure.java:864)
 at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1373)
 at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1329)
 at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1198)
 at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$800(ProcedureExecutor.java:75)
 at org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1761)
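
The trace shows the executor calling rollback on an AssignProcedure that had already reached REGION_TRANSITION_FINISH, and rollback refusing that state. The toy model below illustrates such a guard. It is only a sketch: the class RollbackSketch, the list of states, and the choice of REGION_TRANSITION_QUEUE as the only state that can be rolled back are assumptions made for illustration and are not copied from the HBase source. Only the exception message is taken from the log above.

```java
class RollbackSketch {
    // State names modeled on the one in the log; the full list is an assumption.
    enum TransitionState {
        REGION_TRANSITION_QUEUE,
        REGION_TRANSITION_DISPATCH,
        REGION_TRANSITION_FINISH
    }

    // Assumption: only a transition that has not been dispatched yet can be undone.
    static boolean isRollbackSupported(TransitionState state) {
        return state == TransitionState.REGION_TRANSITION_QUEUE;
    }

    // Returns normally when rollback is allowed, otherwise throws with the logged message.
    static void rollback(TransitionState state) {
        if (isRollbackSupported(state)) {
            return; // nothing to undo in this toy model
        }
        throw new UnsupportedOperationException("Unhandled state " + state
            + "; there is no rollback for assignment unless we cancel the operation"
            + " by dropping/disabling the table");
    }

    public static void main(String[] args) {
        rollback(TransitionState.REGION_TRANSITION_QUEUE); // accepted silently
        try {
            rollback(TransitionState.REGION_TRANSITION_FINISH);
        } catch (UnsupportedOperationException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In this model a procedure that is already in its final state can never be rolled back, so any rollback attempt at that point surfaces as an uncaught runtime exception in the worker thread.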

(3) The master is then restarted and loads the old region state on startup. The region left in the OPENING state is never reassigned, and the following log is printed.

2019-06-22 16:46:26,210 INFO [master/hdp0:16000] assignment.RegionStateStore: Load hbase:meta entry region=379c730490b4848f3db065fb25b87452, regionState=OPENING, lastHost=hdp2,16020,1561190163476, regionLocation=hdp0,16020,1561190163887, openSeqNum=69448
 ... ...
 2019-06-22 16:51:28,514 WARN [ProcExecTimeout] assignment.AssignmentManager: STUCK Region-In-Transition rit=OPENING, location=hdp0,16020,1561190163887, table=test061910, region=379c730490b4848f3db065fb25b87452
 ... ....
 2019-06-22 16:49:28,483 WARN [ProcExecTimeout] assignment.AssignmentManager: STUCK Region-In-Transition rit=OPENING, location=hdp0,16020,1561190163887, table=test061910, region=379c730490b4848f3db065fb25b87452
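
To list the affected regions from a master log, the STUCK warnings can be matched with a regular expression. This is a minimal sketch that assumes the line format is exactly the one shown above. The class StuckRitScan and its method are hypothetical helpers, not part of HBase, and the second sample line is a shortened copy of a log line from this report.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class StuckRitScan {
    // Matches the "STUCK Region-In-Transition" warning format shown above.
    // The location itself contains commas, so it is matched lazily up to ", table=".
    private static final Pattern STUCK = Pattern.compile(
        "STUCK Region-In-Transition rit=(\\w+), location=(.+?), table=([^,]+), region=(\\w+)");

    // Returns "region state location" for every line that carries the warning.
    static List<String> findStuck(List<String> logLines) {
        List<String> hits = new ArrayList<>();
        for (String line : logLines) {
            Matcher m = STUCK.matcher(line);
            if (m.find()) {
                hits.add(m.group(4) + " " + m.group(1) + " " + m.group(2));
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
            "2019-06-22 16:51:28,514 WARN [ProcExecTimeout] assignment.AssignmentManager: "
                + "STUCK Region-In-Transition rit=OPENING, location=hdp0,16020,1561190163887, "
                + "table=test061910, region=379c730490b4848f3db065fb25b87452",
            "2019-06-22 16:44:20,065 INFO [PEWorker-8] procedure2.ProcedureExecutor: Finished pid=5038");
        findStuck(sample).forEach(System.out::println);
    }
}
```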

(4) The persisted state of the region (rows from hbase:meta)
 test061910,00000000000000000031457280,1560943902323.379c730490b4848f3db065fb25b87452. column=info:regioninfo, timestamp=1561192609882, value={ENCODED => 379c730490b4848f3db065fb25b87452, NAME => 'test061910,00000000000000000031457280,1560943902323.379c730490b4848f3db065fb25b87452.', STARTKEY => '00000000000000000031457280', ENDKEY => '00000000000000000041943040'}
 test061910,00000000000000000031457280,1560943902323.379c730490b4848f3db065fb25b87452. column=info:seqnumDuringOpen, timestamp=1561190288613, value=\x00\x00\x00\x00\x00\x01\x0FH
 test061910,00000000000000000031457280,1560943902323.379c730490b4848f3db065fb25b87452. column=info:server, timestamp=1561190288613, value=hdp2:16020
 test061910,00000000000000000031457280,1560943902323.379c730490b4848f3db065fb25b87452. column=info:serverstartcode, timestamp=1561190288613, value=1561190163476
 test061910,00000000000000000031457280,1560943902323.379c730490b4848f3db065fb25b87452. column=info:sn, timestamp=1561192609882, value=hdp0,16020,1561190163887
 test061910,00000000000000000031457280,1560943902323.379c730490b4848f3db065fb25b87452. column=info:state, timestamp=1561192609882, value=OPENING
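
The info:seqnumDuringOpen value above is printed as escaped bytes. Assuming it is an 8-byte big-endian long in which non-printable bytes appear as \xNN and printable bytes appear as plain ASCII (an assumption about the output format), it can be decoded as sketched below. The class and method names are illustrative only. The decoded value matches openSeqNum=69448 in the RegionStateStore log line.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;

class SeqNumDecode {
    // Parses escaped bytes: "\xNN" is one byte, any other character is its ASCII value.
    static byte[] parseEscaped(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int i = 0;
        while (i < s.length()) {
            if (s.startsWith("\\x", i) && i + 4 <= s.length()) {
                out.write(Integer.parseInt(s.substring(i + 2, i + 4), 16));
                i += 4;
            } else {
                out.write(s.charAt(i));
                i++;
            }
        }
        return out.toByteArray();
    }

    // Interprets exactly 8 bytes as a big-endian long (ByteBuffer's default byte order).
    static long toLong(byte[] b) {
        if (b.length != Long.BYTES) {
            throw new IllegalArgumentException("expected 8 bytes, got " + b.length);
        }
        return ByteBuffer.wrap(b).getLong();
    }

    public static void main(String[] args) {
        // Cell value copied from the info:seqnumDuringOpen row above.
        String cell = "\\x00\\x00\\x00\\x00\\x00\\x01\\x0FH";
        System.out.println(toLong(parseEscaped(cell))); // prints 69448
    }
}
```

The trailing H is the byte 0x48, so the eight bytes are 00 00 00 00 00 01 0F 48, that is 0x010F48 = 69448.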

 

Steps to Reproduce
 (1) Change the state of a healthy region from OPEN to OPENING.
 (2) Restart the master and the region servers, then check the master log and the HBase web UI.

  was:
Problem Description
(1) One region server restarts, so all regions that were assigned to it have to be moved.

(2) The master detects these regions and assigns them to other region servers. The assignment succeeds, but the subsequent region state update fails:

2019-06-22 16:44:20,065 INFO [PEWorker-8] procedure2.ProcedureExecutor: Finished pid=5038, ppid=4488, state=SUCCESS; AssignProcedure table=test061910, region=379c730490b4848f3db065fb25b87452 in 10.5680sec
... ...
2019-06-22 16:44:38,725 INFO [PEWorker-4] procedure.MasterProcedureScheduler: pid=5038, ppid=4488, state=SUCCESS; AssignProcedure table=test061910, region=379c730490b4848f3db065fb25b87452 checking lock on 379c730490b4848f3db065fb25b87452
2019-06-22 16:44:38,725 ERROR [PEWorker-4] procedure2.ProcedureExecutor: CODE-BUG: Uncaught runtime exception for pid=5038, ppid=4488, state=SUCCESS; AssignProcedure table=test061910, region=379c730490b4848f3db065fb25b87452
java.lang.UnsupportedOperationException: Unhandled state REGION_TRANSITION_FINISH; there is no rollback for assignment unless we cancel the operation by dropping/disabling the table
 at org.apache.hadoop.hbase.master.assignment.RegionTransitionProcedure.rollback(RegionTransitionProcedure.java:412)
 at org.apache.hadoop.hbase.master.assignment.RegionTransitionProcedure.rollback(RegionTransitionProcedure.java:95)
 at org.apache.hadoop.hbase.procedure2.Procedure.doRollback(Procedure.java:864)
 at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1373)
 at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1329)
 at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1198)
 at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$800(ProcedureExecutor.java:75)
 at org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1761)

(3) The master is then restarted and loads the old region state on startup. The region left in the OPENING state is never reassigned, and the following log is printed.

2019-06-22 16:46:26,210 INFO [master/hdp0:16000] assignment.RegionStateStore: Load hbase:meta entry region=379c730490b4848f3db065fb25b87452, regionState=OPENING, lastHost=hdp2,16020,1561190163476, regionLocation=hdp0,16020,1561190163887, openSeqNum=69448
... ...
2019-06-22 16:51:28,514 WARN [ProcExecTimeout] assignment.AssignmentManager: STUCK Region-In-Transition rit=OPENING, location=hdp0,16020,1561190163887, table=test061910, region=379c730490b4848f3db065fb25b87452
... ....
2019-06-22 16:49:28,483 WARN [ProcExecTimeout] assignment.AssignmentManager: STUCK Region-In-Transition rit=OPENING, location=hdp0,16020,1561190163887, table=test061910, region=379c730490b4848f3db065fb25b87452

(4) The persisted state of the region (rows from hbase:meta)
 test061910,00000000000000000031457280,1560943902323.379c730490b4848f3db065fb25b87452. column=info:regioninfo, timestamp=1561192609882, value={ENCODED => 379c730490b4848f3db065fb25b87452, NAME => 'test061910,00000000000000000031457280,1560943902323.379c730490b4848f3db065fb25b87452.', STARTKEY => '00000000000000000031457280', ENDKEY => '00000000000000000041943040'}
 test061910,00000000000000000031457280,1560943902323.379c730490b4848f3db065fb25b87452. column=info:seqnumDuringOpen, timestamp=1561190288613, value=\x00\x00\x00\x00\x00\x01\x0FH
 test061910,00000000000000000031457280,1560943902323.379c730490b4848f3db065fb25b87452. column=info:server, timestamp=1561190288613, value=hdp2:16020
 test061910,00000000000000000031457280,1560943902323.379c730490b4848f3db065fb25b87452. column=info:serverstartcode, timestamp=1561190288613, value=1561190163476
 test061910,00000000000000000031457280,1560943902323.379c730490b4848f3db065fb25b87452. column=info:sn, timestamp=1561192609882, value=hdp0,16020,1561190163887
 test061910,00000000000000000031457280,1560943902323.379c730490b4848f3db065fb25b87452. column=info:state, timestamp=1561192609882, value=OPENING

 

Steps to Reproduce
(1) Change the state of a healthy region from OPEN to OPENING.
(2) Restart the master, then check the master log and the HBase web UI.


> Master assigns the region successfully but fails to update the region state, leaving the state of the region as OPENING in zookeeper; if the master is restarted, those OPENING regions are never assigned.
> ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-22626
>                 URL: https://issues.apache.org/jira/browse/HBASE-22626
>             Project: HBase
>          Issue Type: Bug
>          Components: Region Assignment
>    Affects Versions: 2.0.0
>            Reporter: chenwandong
>            Priority: Critical


