Posted to issues@hbase.apache.org by "Y. SREENIVASULU REDDY (Jira)" <ji...@apache.org> on 2020/03/31 08:59:00 UTC

[jira] [Updated] (HBASE-24089) Rolling Upgrade: Regions are in RIT during enabling the table after restore_snapshot

     [ https://issues.apache.org/jira/browse/HBASE-24089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Y. SREENIVASULU REDDY updated HBASE-24089:
------------------------------------------
    Description: 
During a rolling upgrade, we performed a set of operations that left regions stuck in RIT (regions in transition).
Prerequisites:
Configure the following properties in HBase 1.3.1:
{noformat}
 <property>
    <name>hbase.assignment.usezk</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.assignment.usezk.migrating</name>
    <value>true</value>
  </property>
{noformat}
Configure the following properties in HBase 2.2.x (these table-state migration properties belong to HBase 2.x):
{noformat}
 <property>
    <name>hbase.mirror.table.state.to.zookeeper</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.migrate.table.state.from.zookeeper</name>
    <value>true</value>
  </property>
{noformat}

Steps to reproduce the problem:
{noformat}
1. Start the HBase cluster with version 1.3.1 (1 master and 1 regionserver).
2. Start one regionserver running version 2.2.x.
3. Create a table with one region (ensure the region is hosted on the old-version RS).
4. Write some data into the table.
5. Flush the table.
6. Create a snapshot of the table.
7. Move the table region from the old-version RS to the new-version RS.
8. Disable the table.
9. Restore the snapshot on the table.
10. Enable the table.
{noformat}
After the enable table operation was triggered, the HBase 1.3.1 master assigned the region to the HBase 1.3.1 regionserver.
The RS failed to open the region.
{noformat}
2020-03-18 21:10:45,103 WARN  [RS_OPEN_REGION-vm1:16040-17] zookeeper.ZKAssign: regionserver:16040-0x200431c58cf0012, quorum=vm1:2181,vm2:2181,vm3:2181, baseZNode=/hbase Attempt to transition the unassigned node for 505f0e1d96a2a06eb111bd8b923a5a87 from M_ZK_REGION_OFFLINE to RS_ZK_REGION_OPENING failed, the server that tried to transition was vm1,16040,1584536385246 not the expected vm2,16040,1584536781189
2020-03-18 21:10:45,104 WARN  [RS_OPEN_REGION-vm1:16040-17] coordination.ZkOpenRegionCoordination: Failed transition from OFFLINE to OPENING for region=505f0e1d96a2a06eb111bd8b923a5a87
2020-03-18 21:10:45,104 WARN  [RS_OPEN_REGION-vm1:16040-17] handler.OpenRegionHandler: Region was hijacked? Opening cancelled for encodedName=505f0e1d96a2a06eb111bd8b923a5a87
2020-03-18 21:10:45,104 INFO  [RS_OPEN_REGION-vm1:16040-17] coordination.ZkOpenRegionCoordination: Opening of region {ENCODED => 505f0e1d96a2a06eb111bd8b923a5a87, NAME => 'usertable01,user1089,1584535968752.505f0e1d96a2a06eb111bd8b923a5a87.', STARTKEY => 'user1089', ENDKEY => 'user1134'} failed, transitioning from OFFLINE to FAILED_OPEN in ZK, expecting version 0
2020-03-18 21:10:45,104 DEBUG [RS_OPEN_REGION-vm1:16040-17] zookeeper.ZKAssign: regionserver:16040-0x200431c58cf0012, quorum=vm1:2181,vm2:2181,vm3:2181, baseZNode=/hbase Transitioning 505f0e1d96a2a06eb111bd8b923a5a87 from M_ZK_REGION_OFFLINE to RS_ZK_REGION_FAILED_OPEN
{noformat}
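To spot this failure across larger log files, the first warning above can be matched with a regular expression. This is a minimal sketch: the pattern is derived only from the wording of the log excerpt above and may not match other HBase versions, and the names FAILED_TRANSITION and parse_failed_transition are hypothetical helpers, not part of HBase.

```python
import re

# Pattern built from the ZKAssign warning in the excerpt above (assumed wording).
FAILED_TRANSITION = re.compile(
    r"unassigned node for (?P<region>[0-9a-f]{32}) "
    r"from (?P<from_state>\w+) to (?P<to_state>\w+) failed, "
    r"the server that tried to transition was (?P<actual>\S+) "
    r"not the expected (?P<expected>\S+)"
)


def parse_failed_transition(line):
    """Return the fields of a failed znode transition, or None if the line does not match."""
    match = FAILED_TRANSITION.search(line)
    return match.groupdict() if match else None


# The first log line from the excerpt, split across string literals for readability.
line = (
    "2020-03-18 21:10:45,103 WARN  [RS_OPEN_REGION-vm1:16040-17] zookeeper.ZKAssign: "
    "regionserver:16040-0x200431c58cf0012, quorum=vm1:2181,vm2:2181,vm3:2181, baseZNode=/hbase "
    "Attempt to transition the unassigned node for 505f0e1d96a2a06eb111bd8b923a5a87 "
    "from M_ZK_REGION_OFFLINE to RS_ZK_REGION_OPENING failed, the server that tried to "
    "transition was vm1,16040,1584536385246 not the expected vm2,16040,1584536781189"
)

info = parse_failed_transition(line)
# Shows which server attempted the transition and which server the znode named.
print(info["region"], info["actual"], "!=", info["expected"])
```

Here the znode still names vm2 (the new-version RS that last hosted the region), while vm1 (the old-version RS) is the server trying to open it.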

Looking deeper into the problem, we found that the HMaster failed to delete the znode during the table disable operation.
{noformat}
2020-03-18 21:10:02,219 DEBUG [RpcServer.FifoWFPBQ.default.handler=42,queue=2,port=16000] master.AssignmentManager: Table being disabled so deleting ZK node and removing from regions in transition, skipping assignment of region usertable01,user1089,1584535968752.505f0e1d96a2a06eb111bd8b923a5a87.
2020-03-18 21:10:02,220 WARN  [RpcServer.FifoWFPBQ.default.handler=42,queue=2,port=16000] zookeeper.ZKAssign: master:16000-0x100431c3faf0012, quorum=vm1:2181,vm2:2181,vm3:2181, baseZNode=/hbase Attempting to delete unassigned node 505f0e1d96a2a06eb111bd8b923a5a87 in RS_ZK_REGION_CLOSED state but node is in M_ZK_REGION_CLOSING state
2020-03-18 21:10:02,221 WARN  [RpcServer.FifoWFPBQ.default.handler=42,queue=2,port=16000] zookeeper.ZKAssign: master:16000-0x100431c3faf0012, quorum=vm1:2181,vm2:2181,vm3:2181, baseZNode=/hbase Attempting to delete unassigned node 505f0e1d96a2a06eb111bd8b923a5a87 in M_ZK_REGION_OFFLINE state but node is in M_ZK_REGION_CLOSING state
2020-03-18 21:10:02,221 INFO  [RpcServer.FifoWFPBQ.default.handler=42,queue=2,port=16000] master.AssignmentManager: Failed to delete the closed node for 505f0e1d96a2a06eb111bd8b923a5a87. The node type may not match
{noformat}
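The two delete warnings above follow a fixed shape ("in X state but node is in Y state"), so the expected and actual znode states can be pulled out mechanically. The sketch below assumes that wording, uses shortened copies of the log messages as sample input, and introduces the hypothetical names DELETE_MISMATCH and delete_mismatches for illustration only.

```python
import re

# Pattern built from the two "Attempting to delete unassigned node" warnings above.
DELETE_MISMATCH = re.compile(
    r"Attempting to delete unassigned node (?P<region>[0-9a-f]{32}) "
    r"in (?P<expected>\w+) state but node is in (?P<actual>\w+) state"
)


def delete_mismatches(lines):
    """Yield (region, expected_state, actual_state) for each failed znode delete."""
    for line in lines:
        match = DELETE_MISMATCH.search(line)
        if match:
            yield match.group("region"), match.group("expected"), match.group("actual")


# Message parts of the master log lines above (timestamps and thread names trimmed).
sample_lines = [
    "Attempting to delete unassigned node 505f0e1d96a2a06eb111bd8b923a5a87 "
    "in RS_ZK_REGION_CLOSED state but node is in M_ZK_REGION_CLOSING state",
    "Attempting to delete unassigned node 505f0e1d96a2a06eb111bd8b923a5a87 "
    "in M_ZK_REGION_OFFLINE state but node is in M_ZK_REGION_CLOSING state",
    "Failed to delete the closed node for 505f0e1d96a2a06eb111bd8b923a5a87. "
    "The node type may not match",
]

mismatches = list(delete_mismatches(sample_lines))
for region, expected, actual in mismatches:
    print(f"{region}: master expected {expected}, znode was {actual}")
```

Both delete attempts find the znode still in M_ZK_REGION_CLOSING, which is the state the master wrote when it started the close.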

The region was closed successfully, and the new-version RS reported the region transition to the master through an RPC call. However, the master expects the RS to update the znode state, and it deletes the znode based on that state. As the log above shows, the znode was still in M_ZK_REGION_CLOSING, so the delete failed.
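The mismatch can be illustrated with a toy model. This is not HBase code: the function names and the dictionary standing in for ZooKeeper are hypothetical, and the only facts taken from the logs are the znode state names and the two states the master tried when deleting.

```python
REGION = "505f0e1d96a2a06eb111bd8b923a5a87"


def close_region(znode, rs_updates_znode):
    """Simulate a region close; only a ZK-based RS rewrites the znode state."""
    if rs_updates_znode:
        znode["state"] = "RS_ZK_REGION_CLOSED"
    # An RS that reports the close over RPC leaves the znode as the master wrote it.


def master_delete_closed_node(znodes, region):
    """Delete the znode only if it is in one of the states the master tries."""
    for expected in ("RS_ZK_REGION_CLOSED", "M_ZK_REGION_OFFLINE"):
        if znodes.get(region, {}).get("state") == expected:
            del znodes[region]
            return True
    return False


def disable_table(rs_updates_znode):
    """Run the close-then-delete sequence and return (deleted, remaining znodes)."""
    znodes = {REGION: {"state": "M_ZK_REGION_CLOSING"}}  # master marks the region as closing
    close_region(znodes[REGION], rs_updates_znode)
    deleted = master_delete_closed_node(znodes, REGION)
    return deleted, znodes


# Old-version RS (updates the znode): the delete succeeds and no znode is left.
print(disable_table(rs_updates_znode=True))
# New-version RS (RPC only): the delete fails and the stale znode remains.
print(disable_table(rs_updates_znode=False))
```

In the second case the znode is left behind in M_ZK_REGION_CLOSING, matching the "Failed to delete the closed node" message in the master log.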

  was:
During Rolling upgrade, we performed some set of operations, which leads to regions were stuck in RIT.
pre-requisites:
configure the below properties in HBase 1.3.1 version
{noformat}
 <property>
    <name>hbase.assignment.usezk</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.assignment.usezk.migrating</name>
    <value>true</value>
  </property>
{noformat}
configure the below properties in HBase 1.3.1 version
{noformat}
 <property>
    <name>hbase.mirror.table.state.to.zookeeper</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.migrate.table.state.from.zookeeper</name>
    <value>true</value>
  </property>
{noformat}

Steps to reproduce the problem.
1. start the hbase cluster with version 1.3.1 (1 master and 1 regionserver)
2. start the regionserver 2.2.x version [ 1 regionserver]
3. create the table with one region (ensure the table region with old version RS)
4. write some data into the table
5. flush the table.
6. create snapshot for the table
7. move the table region from old version to new version RS
8. disable the table.
9. restore snapshot on the table.
10 enable table.

After triggered the enable table operation, HBase 1.3.1 master assigned the region to HBase 1.3.1 Regionserver.
RS failed to open the region.
{noformat}
2020-03-18 21:10:45,103 WARN  [RS_OPEN_REGION-BLR1000030601:16040-17] zookeeper.ZKAssign: regionserver:16040-0x200431c58cf0012, quorum=10.18.21.118:2181,10.18.21.109:2181,10.18.21.236:2181, baseZNode=/hbase Attempt to transition the unassigned node for 505f0e1d96a2a06eb111bd8b923a5a87 from M_ZK_REGION_OFFLINE to RS_ZK_REGION_OPENING failed, the server that tried to transition was blr1000030601,16040,1584536385246 not the expected blr1000030600,16040,1584536781189
2020-03-18 21:10:45,104 WARN  [RS_OPEN_REGION-BLR1000030601:16040-17] coordination.ZkOpenRegionCoordination: Failed transition from OFFLINE to OPENING for region=505f0e1d96a2a06eb111bd8b923a5a87
2020-03-18 21:10:45,104 WARN  [RS_OPEN_REGION-BLR1000030601:16040-17] handler.OpenRegionHandler: Region was hijacked? Opening cancelled for encodedName=505f0e1d96a2a06eb111bd8b923a5a87
2020-03-18 21:10:45,104 INFO  [RS_OPEN_REGION-BLR1000030601:16040-17] coordination.ZkOpenRegionCoordination: Opening of region {ENCODED => 505f0e1d96a2a06eb111bd8b923a5a87, NAME => 'usertable01,user1089,1584535968752.505f0e1d96a2a06eb111bd8b923a5a87.', STARTKEY => 'user1089', ENDKEY => 'user1134'} failed, transitioning from OFFLINE to FAILED_OPEN in ZK, expecting version 0
2020-03-18 21:10:45,104 DEBUG [RS_OPEN_REGION-BLR1000030601:16040-17] zookeeper.ZKAssign: regionserver:16040-0x200431c58cf0012, quorum=10.18.21.118:2181,10.18.21.109:2181,10.18.21.236:2181, baseZNode=/hbase Transitioning 505f0e1d96a2a06eb111bd8b923a5a87 from M_ZK_REGION_OFFLINE to RS_ZK_REGION_FAILED_OPEN
{noformat}

Looked little deeper into the problem, found that HMaster failed to delete the Znode, during the table disable operation.
{noformat}
2020-03-18 21:10:02,219 DEBUG [RpcServer.FifoWFPBQ.default.handler=42,queue=2,port=16000] master.AssignmentManager: Table being disabled so deleting ZK node and removing from regions in transition, skipping assignment of region usertable01,user1089,1584535968752.505f0e1d96a2a06eb111bd8b923a5a87.
2020-03-18 21:10:02,220 WARN  [RpcServer.FifoWFPBQ.default.handler=42,queue=2,port=16000] zookeeper.ZKAssign: master:16000-0x100431c3faf0012, quorum=10.18.21.118:2181,10.18.21.109:2181,10.18.21.236:2181, baseZNode=/hbase Attempting to delete unassigned node 505f0e1d96a2a06eb111bd8b923a5a87 in RS_ZK_REGION_CLOSED state but node is in M_ZK_REGION_CLOSING state
2020-03-18 21:10:02,221 WARN  [RpcServer.FifoWFPBQ.default.handler=42,queue=2,port=16000] zookeeper.ZKAssign: master:16000-0x100431c3faf0012, quorum=10.18.21.118:2181,10.18.21.109:2181,10.18.21.236:2181, baseZNode=/hbase Attempting to delete unassigned node 505f0e1d96a2a06eb111bd8b923a5a87 in M_ZK_REGION_OFFLINE state but node is in M_ZK_REGION_CLOSING state
2020-03-18 21:10:02,221 INFO  [RpcServer.FifoWFPBQ.default.handler=42,queue=2,port=16000] master.AssignmentManager: Failed to delete the closed node for 505f0e1d96a2a06eb111bd8b923a5a87. The node type may not match
{noformat}

Region was closed successfully, and Latest RS sent RPC call back to the master about the region transition information, But master is expecting the Znode states modified by the RS, Based  on those states HM will delete the ZNode.


> Rolling Upgrade: Regions are in RIT during enabling the table after restore_snapshot
> ------------------------------------------------------------------------------------
>
>                 Key: HBASE-24089
>                 URL: https://issues.apache.org/jira/browse/HBASE-24089
>             Project: HBase
>          Issue Type: Bug
>          Components: amv2
>    Affects Versions: 1.3.4, 1.3.6
>            Reporter: Y. SREENIVASULU REDDY
>            Priority: Major


