Posted to issues@hbase.apache.org by "Andrew Kyle Purtell (Jira)" <ji...@apache.org> on 2021/02/26 22:06:00 UTC

[jira] [Commented] (HBASE-25583) Handle the NoNode exception in remove log replication and avoid RS crash

    [ https://issues.apache.org/jira/browse/HBASE-25583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17291934#comment-17291934 ] 

Andrew Kyle Purtell commented on HBASE-25583:
---------------------------------------------

Spoke with [~sandeep.pal] privately. He'd prefer to keep this a branch-1 only issue because it is a critical fix in that branch and not in the others. No problem.

> Handle the NoNode exception in remove log replication and avoid RS crash
> ------------------------------------------------------------------------
>
>                 Key: HBASE-25583
>                 URL: https://issues.apache.org/jira/browse/HBASE-25583
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Sandeep Pal
>            Assignee: Sandeep Pal
>            Priority: Major
>             Fix For: 1.7.0
>
>
> The region server should not crash if there is a NoNode exception while removing the log.
>  We should look into the exception and, if it is NoNode, we shouldn't crash. The node may have been deleted as part of peer teardown.
> {code:java}
> @Override
> public void removeLog(String queueId, String filename) {
>   try {
>     String znode = ZKUtil.joinZNode(this.myQueuesZnode, queueId);
>     znode = ZKUtil.joinZNode(znode, filename);
>     ZKUtil.deleteNode(this.zookeeper, znode);
>   } catch (KeeperException e) {
>     this.abortable.abort("Failed to remove wal from queue (queueId=" + queueId + ", filename="
>         + filename + ")", e);
>   }
> }
> {code}
> This was the exception observed on region servers:
> {code:java}
> 2021-02-16 20:11:58,567 FATAL [95922885,xyz_peer] regionserver.HRegionServer - ABORTING region server regionserver-111,60020,1613495922885: Failed to remove wal from queue (queueId=xyz_peer, filename=regionserver-111%2C60020%2C1613495922885.1613505863058)
> org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /hbase/replication/rs/regionserver-111,60020,1613495922885/xyz_peer/regionserver-111%2C60020%2C1613495922885.1613505863058
>         at org.apache.zookeeper.KeeperException.create(KeeperException.java:114)
>         at org.apache.zookeeper.KeeperException.create(KeeperException.java:54)
>         at org.apache.zookeeper.ZooKeeper.delete(ZooKeeper.java:890)
>         at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.delete(RecoverableZooKeeper.java:238)
>         at org.apache.hadoop.hbase.zookeeper.ZKUtil.deleteNode(ZKUtil.java:1341)
>         at org.apache.hadoop.hbase.zookeeper.ZKUtil.deleteNode(ZKUtil.java:1330)
>         at org.apache.hadoop.hbase.replication.ReplicationQueuesZKImpl.removeLog(ReplicationQueuesZKImpl.java:142)
>         at org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager.cleanOldLogs(ReplicationSourceManager.java:232)
>         at org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager.cleanOldLogs(ReplicationSourceManager.java:222)
>         at org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager.logPositionAndCleanOldLogs(ReplicationSourceManager.java:198)
>         at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource$ReplicationSourceShipperThread.updateLogPosition(ReplicationSource.java:831)
>         at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource$ReplicationSourceShipperThread.shipEdits(ReplicationSource.java:746)
>         at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource$ReplicationSourceShipperThread.run(ReplicationSource.java:650)
> {code}
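
For illustration only, here is a minimal, self-contained sketch of the handling the description asks for: treat NoNode as "already removed" and abort only on other ZooKeeper failures. It is not the committed patch. Every class in it (KeeperException, NoNodeException, FakeZooKeeper, RecordingAbortable, ReplicationQueues) is a hypothetical stand-in written for this example so that it runs without HBase or ZooKeeper on the classpath; only the shape of removeLog mirrors the snippet quoted above.

```java
import java.util.HashSet;
import java.util.Set;

public class RemoveLogSketch {

    // Stand-in for org.apache.zookeeper.KeeperException (a checked exception).
    static class KeeperException extends Exception {
        KeeperException(String message) { super(message); }
    }

    // Stand-in for KeeperException.NoNodeException.
    static class NoNodeException extends KeeperException {
        NoNodeException(String path) { super("KeeperErrorCode = NoNode for " + path); }
    }

    // Stand-in for HBase's Abortable; it only records that abort was requested.
    static class RecordingAbortable {
        boolean aborted = false;
        void abort(String why, Throwable cause) { aborted = true; }
    }

    // Hypothetical in-memory replacement for the znode tree.
    static class FakeZooKeeper {
        final Set<String> znodes = new HashSet<>();
        boolean failNextDelete = false;

        void deleteNode(String path) throws KeeperException {
            if (failNextDelete) {
                failNextDelete = false;
                throw new KeeperException("simulated non-NoNode failure for " + path);
            }
            if (!znodes.remove(path)) {
                throw new NoNodeException(path);
            }
        }
    }

    // Assumed shape of the queue-tracking class; only removeLog matters here.
    static class ReplicationQueues {
        final FakeZooKeeper zookeeper;
        final RecordingAbortable abortable;
        final String myQueuesZnode;

        ReplicationQueues(FakeZooKeeper zookeeper, RecordingAbortable abortable, String myQueuesZnode) {
            this.zookeeper = zookeeper;
            this.abortable = abortable;
            this.myQueuesZnode = myQueuesZnode;
        }

        void removeLog(String queueId, String filename) {
            String znode = myQueuesZnode + "/" + queueId + "/" + filename;
            try {
                zookeeper.deleteNode(znode);
            } catch (NoNodeException e) {
                // The znode is already gone, for example after peer teardown.
                // The goal (no such znode) is met, so warn and continue.
                System.out.println("WARN znode already removed: " + znode);
            } catch (KeeperException e) {
                // Any other ZooKeeper failure keeps the original abort behavior.
                abortable.abort("Failed to remove wal from queue (queueId=" + queueId
                    + ", filename=" + filename + ")", e);
            }
        }
    }

    public static void main(String[] args) {
        FakeZooKeeper zk = new FakeZooKeeper();
        RecordingAbortable abortable = new RecordingAbortable();
        ReplicationQueues queues = new ReplicationQueues(zk, abortable, "/hbase/replication/rs/rs-1");

        // The znode was never created, so the delete hits the NoNode path.
        queues.removeLog("xyz_peer", "wal.1");
        System.out.println("aborted after NoNode: " + abortable.aborted);
    }
}
```

The NoNodeException catch has to come before the KeeperException catch because it is the more specific type. Whether a missing znode should be logged at WARN or at a lower level is a judgment call this sketch does not settle.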


