Posted to dev@hbase.apache.org by "stack (JIRA)" <ji...@apache.org> on 2018/07/25 18:23:00 UTC

[jira] [Reopened] (HBASE-20893) Data loss if splitting region while ServerCrashProcedure executing

     [ https://issues.apache.org/jira/browse/HBASE-20893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack reopened HBASE-20893:
---------------------------

Reopening to look at these logs, which I see when running this patch on a cluster. It's great that it detected recovered.edits, but it looks like the patch causes us to hit a CODE-BUG. We seem to be OK afterwards, but at a minimum it will freak out an operator:

{code}

2018-07-25 06:46:56,692 ERROR [PEWorker-3] assignment.SplitTableRegionProcedure: Error trying to split region 2cb977a87bc6bdf90ef7fc71320d7b50 in the table IntegrationTestBigLinkedList (in state=SPLIT_TABLE_REGIONS_CHECK_CLOSED_REGIONS)
java.io.IOException: Recovered.edits are found in Region: {ENCODED => 2cb977a87bc6bdf90ef7fc71320d7b50, NAME => 'IntegrationTestBigLinkedList,z\xAA;\xC7M\x1Bf8\x85\xB5\x07\xD5\x9B#\xCD\xCC,1531911202047.2cb977a87bc6bdf90ef7fc71320d7b50.', STARTKEY => 'z\xAA;\xC7M\x1Bf8\x85\xB5\x07\xD5\x9B#\xCD\xCC', ENDKEY => '{\x8D\xF2?'}, abort split to prevent data loss
  at org.apache.hadoop.hbase.master.assignment.SplitTableRegionProcedure.checkClosedRegion(SplitTableRegionProcedure.java:151)
  at org.apache.hadoop.hbase.master.assignment.SplitTableRegionProcedure.executeFromState(SplitTableRegionProcedure.java:259)
  at org.apache.hadoop.hbase.master.assignment.SplitTableRegionProcedure.executeFromState(SplitTableRegionProcedure.java:92)
  at org.apache.hadoop.hbase.procedure2.StateMachineProcedure.execute(StateMachineProcedure.java:184)
  at org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:850)
  at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1472)
  at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1240)
  at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$800(ProcedureExecutor.java:75)
  at org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1760)
2018-07-25 06:46:56,934 INFO  [PEWorker-3] procedure.MasterProcedureScheduler: pid=4106, ppid=4105, state=SUCCESS; UnassignProcedure table=IntegrationTestBigLinkedList, region=2cb977a87bc6bdf90ef7fc71320d7b50, server=ve0540.halxg.cloudera.com,16020,1532501580658 checking lock on 2cb977a87bc6bdf90ef7fc71320d7b50
2018-07-25 06:46:56,934 ERROR [PEWorker-3] procedure2.ProcedureExecutor: CODE-BUG: Uncaught runtime exception for pid=4106, ppid=4105, state=SUCCESS; UnassignProcedure table=IntegrationTestBigLinkedList, region=2cb977a87bc6bdf90ef7fc71320d7b50, server=ve0540.halxg.cloudera.com,16020,1532501580658
java.lang.UnsupportedOperationException: Unhandled state REGION_TRANSITION_FINISH; there is no rollback for assignment unless we cancel the operation by dropping/disabling the table
  at org.apache.hadoop.hbase.master.assignment.RegionTransitionProcedure.rollback(RegionTransitionProcedure.java:412)
  at org.apache.hadoop.hbase.master.assignment.RegionTransitionProcedure.rollback(RegionTransitionProcedure.java:95)
  at org.apache.hadoop.hbase.procedure2.Procedure.doRollback(Procedure.java:864)
  at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1372)
  at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1328)
  at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1197)
  at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$800(ProcedureExecutor.java:75)
  at org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1760)
2018-07-25 06:46:57,088 ERROR [PEWorker-3] procedure2.ProcedureExecutor: CODE-BUG: Uncaught runtime exception for pid=4106, ppid=4105, state=SUCCESS; UnassignProcedure table=IntegrationTestBigLinkedList, region=2cb977a87bc6bdf90ef7fc71320d7b50, server=ve0540.halxg.cloudera.com,16020,1532501580658
java.lang.UnsupportedOperationException: Unhandled state REGION_TRANSITION_FINISH; there is no rollback for assignment unless we cancel the operation by dropping/disabling the table
  at org.apache.hadoop.hbase.master.assignment.RegionTransitionProcedure.rollback(RegionTransitionProcedure.java:412)
  at org.apache.hadoop.hbase.master.assignment.RegionTransitionProcedure.rollback(RegionTransitionProcedure.java:95)
  at org.apache.hadoop.hbase.procedure2.Procedure.doRollback(Procedure.java:864)
  at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1372)
  at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1328)
  at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1197)
  at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$800(ProcedureExecutor.java:75)
  at org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1760)
2018-07-25 06:46:57,196 INFO  [PEWorker-9] procedure.MasterProcedureScheduler: pid=4107, state=RUNNABLE:REGION_TRANSITION_QUEUE; AssignProcedure table=IntegrationTestBigLinkedList, region=2cb977a87bc6bdf90ef7fc71320d7b50, target=ve0540.halxg.cloudera.com,16020,1532501580658 checking lock on 2cb977a87bc6bdf90ef7fc71320d7b50
2018-07-25 06:46:57,760 INFO  [PEWorker-3] procedure2.ProcedureExecutor: Rolled back pid=4105, state=ROLLEDBACK, exception=java.io.IOException via master-split-regions:java.io.IOException: Recovered.edits are found in Region: {ENCODED => 2cb977a87bc6bdf90ef7fc71320d7b50, NAME => 'IntegrationTestBigLinkedList,z\xAA;\xC7M\x1Bf8\x85\xB5\x07\xD5\x9B#\xCD\xCC,1531911202047.2cb977a87bc6bdf90ef7fc71320d7b50.', STARTKEY => 'z\xAA;\xC7M\x1Bf8\x85\xB5\x07\xD5\x9B#\xCD\xCC', ENDKEY => '{\x8D\xF2?'}, abort split to prevent data loss; SplitTableRegionProcedure table=IntegrationTestBigLinkedList, parent=2cb977a87bc6bdf90ef7fc71320d7b50, daughterA=8b6804c043fe3707493f052e18aca74f, daughterB=f64f248effb5b9ef66210778d9a87fd3 exec-time=1.8490sec
{code}
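For readers following the first log entry, here is a minimal sketch of the kind of guard that fires there: before splitting, abort if the closed parent region still has un-replayed recovered edits. This is not HBase's implementation. The class name `RecoveredEditsGuard`, the methods `hasRecoveredEdits` and `checkNoRecoveredEdits`, and the assumption that edits live in a local `<regionDir>/recovered.edits` directory are all hypothetical. As a simplification, the sketch treats any entry in that directory as pending edits, and a real check might need to filter out marker files.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

/** Hypothetical pre-split guard (illustration only, not HBase code). */
final class RecoveredEditsGuard {
    // Assumed name of the per-region subdirectory holding edits still to be replayed.
    static final String RECOVERED_EDITS_DIR = "recovered.edits";

    private RecoveredEditsGuard() {}

    /** Returns true if regionDir/recovered.edits exists and contains at least one entry. */
    static boolean hasRecoveredEdits(Path regionDir) throws IOException {
        Path editsDir = regionDir.resolve(RECOVERED_EDITS_DIR);
        if (!Files.isDirectory(editsDir)) {
            return false; // no directory means nothing left to replay
        }
        // Files.list opens a directory stream, so close it with try-with-resources.
        try (Stream<Path> entries = Files.list(editsDir)) {
            return entries.findAny().isPresent();
        }
    }

    /** Throws before any split work starts if the parent region still has pending edits. */
    static void checkNoRecoveredEdits(String encodedRegionName, Path regionDir) throws IOException {
        if (hasRecoveredEdits(regionDir)) {
            throw new IOException("Recovered.edits are found in Region: " + encodedRegionName
                + ", abort split to prevent data loss");
        }
    }
}
```

Failing this check early, before daughter regions are created, keeps the parent's edits in place so they can be replayed when the parent is reopened. As the log shows, though, that failure still has to be rolled back cleanly through the child procedures the split already ran.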



> Data loss if splitting region while ServerCrashProcedure executing
> ------------------------------------------------------------------
>
>                 Key: HBASE-20893
>                 URL: https://issues.apache.org/jira/browse/HBASE-20893
>             Project: HBase
>          Issue Type: Sub-task
>    Affects Versions: 3.0.0, 2.1.0, 2.0.1
>            Reporter: Allan Yang
>            Assignee: Allan Yang
>            Priority: Major
>             Fix For: 3.0.0, 2.0.2, 2.2.0, 2.1.1
>
>         Attachments: HBASE-20893.branch-2.0.001.patch, HBASE-20893.branch-2.0.002.patch, HBASE-20893.branch-2.0.003.patch, HBASE-20893.branch-2.0.004.patch, HBASE-20893.branch-2.0.005.patch
>
>
> Similar case as HBASE-20878.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)