Posted to issues@hbase.apache.org by "Sandeep Pal (Jira)" <ji...@apache.org> on 2021/02/18 18:19:00 UTC

[jira] [Updated] (HBASE-25583) Handle the NoNode exception in remove log replication and avoid crash

     [ https://issues.apache.org/jira/browse/HBASE-25583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sandeep Pal updated HBASE-25583:
--------------------------------
    Description: 
The region server should not crash if there is a NoNode exception while removing the log.
We should inspect the exception and, if it is NoNode, not abort. The node may already have been deleted as part of peer teardown.
{code:java}
@Override
public void removeLog(String queueId, String filename) {
  try {
    String znode = ZKUtil.joinZNode(this.myQueuesZnode, queueId);
    znode = ZKUtil.joinZNode(znode, filename);
    ZKUtil.deleteNode(this.zookeeper, znode);
  } catch (KeeperException e) {
    this.abortable.abort("Failed to remove wal from queue (queueId=" + queueId + ", filename="
        + filename + ")", e);
  }
}
{code}
This was the exception observed on region servers:
{code:java}
2021-02-16 20:11:58,567 FATAL [95922885,xyz_peer] regionserver.HRegionServer - ABORTING region server regionserver-111,60020,1613495922885: Failed to remove wal from queue (queueId=xyz_peer, filename=regionserver-111%2C60020%2C1613495922885.1613505863058)
org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /hbase/replication/rs/regionserver-111,60020,1613495922885/xyz_peer/regionserver-111%2C60020%2C1613495922885.1613505863058
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:114)
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:54)
        at org.apache.zookeeper.ZooKeeper.delete(ZooKeeper.java:890)
        at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.delete(RecoverableZooKeeper.java:238)
        at org.apache.hadoop.hbase.zookeeper.ZKUtil.deleteNode(ZKUtil.java:1341)
        at org.apache.hadoop.hbase.zookeeper.ZKUtil.deleteNode(ZKUtil.java:1330)
        at org.apache.hadoop.hbase.replication.ReplicationQueuesZKImpl.removeLog(ReplicationQueuesZKImpl.java:142)
        at org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager.cleanOldLogs(ReplicationSourceManager.java:232)
        at org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager.cleanOldLogs(ReplicationSourceManager.java:222)
        at org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager.logPositionAndCleanOldLogs(ReplicationSourceManager.java:198)
        at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource$ReplicationSourceShipperThread.updateLogPosition(ReplicationSource.java:831)
        at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource$ReplicationSourceShipperThread.shipEdits(ReplicationSource.java:746)
        at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource$ReplicationSourceShipperThread.run(ReplicationSource.java:650)
{code}
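
One possible shape for the handling proposed above, as a minimal self-contained sketch rather than the actual patch. The classes KeeperException, NoNodeException, FakeZk and RecordingAbortable below are simplified stand-ins written for this illustration (they are not the ZooKeeper or HBase classes), and logging a warning instead of aborting is an assumption about the desired behavior:

```java
import java.util.HashSet;
import java.util.Set;

class RemoveLogSketch {

  /** Hypothetical, simplified stand-in for org.apache.zookeeper.KeeperException. */
  static class KeeperException extends Exception {
    KeeperException(String msg) { super(msg); }
  }

  /** Hypothetical stand-in for KeeperException.NoNodeException. */
  static class NoNodeException extends KeeperException {
    NoNodeException(String path) { super("KeeperErrorCode = NoNode for " + path); }
  }

  /** Hypothetical stand-in for the region server's Abortable; it only records the request. */
  static class RecordingAbortable {
    boolean aborted = false;
    void abort(String why, Throwable cause) { aborted = true; }
  }

  /** In-memory stand-in for the ZooKeeper delete call. */
  static class FakeZk {
    final Set<String> znodes = new HashSet<>();
    void deleteNode(String path) throws KeeperException {
      if (!znodes.remove(path)) {
        throw new NoNodeException(path);
      }
    }
  }

  private final FakeZk zookeeper;
  private final RecordingAbortable abortable;
  private final String myQueuesZnode;

  RemoveLogSketch(FakeZk zookeeper, RecordingAbortable abortable, String myQueuesZnode) {
    this.zookeeper = zookeeper;
    this.abortable = abortable;
    this.myQueuesZnode = myQueuesZnode;
  }

  static String joinZNode(String prefix, String suffix) {
    return prefix + "/" + suffix;
  }

  public void removeLog(String queueId, String filename) {
    String znode = joinZNode(joinZNode(myQueuesZnode, queueId), filename);
    try {
      zookeeper.deleteNode(znode);
    } catch (NoNodeException e) {
      // The znode is already gone (for example, removed during peer teardown),
      // so the log is effectively removed: warn and carry on instead of aborting.
      System.out.println("WARN: " + znode + " already deleted, skipping: " + e.getMessage());
    } catch (KeeperException e) {
      // Any other ZooKeeper failure is still treated as fatal, as before.
      abortable.abort("Failed to remove wal from queue (queueId=" + queueId
          + ", filename=" + filename + ")", e);
    }
  }

  public static void main(String[] args) {
    FakeZk zk = new FakeZk();
    RecordingAbortable abortable = new RecordingAbortable();
    RemoveLogSketch queues = new RemoveLogSketch(zk, abortable, "/hbase/replication/rs/rs-1");
    zk.znodes.add("/hbase/replication/rs/rs-1/xyz_peer/wal.1");
    queues.removeLog("xyz_peer", "wal.1"); // znode exists: deleted normally
    queues.removeLog("xyz_peer", "wal.1"); // znode already gone: takes the NoNode branch
    System.out.println("aborted=" + abortable.aborted);
  }
}
```

In ReplicationQueuesZKImpl the same idea would presumably be a separate catch for KeeperException.NoNodeException placed ahead of the general KeeperException catch, so that other ZooKeeper failures still abort the region server.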

  was:
The region server should not crash if there is a NoNode exception while removing the log.
We should inspect the exception and, if it is NoNode, not abort. The node may already have been deleted as part of peer teardown.

  @Override
  public void removeLog(String queueId, String filename) {
    try {
      String znode = ZKUtil.joinZNode(this.myQueuesZnode, queueId);
      znode = ZKUtil.joinZNode(znode, filename);
      ZKUtil.deleteNode(this.zookeeper, znode);
    } catch (KeeperException e) {

      this.abortable.abort("Failed to remove wal from queue (queueId=" + queueId + ", filename="
          + filename + ")", e);
    }
  }


> Handle the NoNode exception in remove log replication and avoid crash
> ---------------------------------------------------------------------
>
>                 Key: HBASE-25583
>                 URL: https://issues.apache.org/jira/browse/HBASE-25583
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Sandeep Pal
>            Assignee: Sandeep Pal
>            Priority: Major


