Posted to issues@hbase.apache.org by "Anoop Sam John (Jira)" <ji...@apache.org> on 2021/05/23 12:07:00 UTC

[jira] [Commented] (HBASE-25903) ReadOnlyZKClient APIs - CompletableFuture.get() calls can cause threads to hang forever when ZK client create throws Non IOException

    [ https://issues.apache.org/jira/browse/HBASE-25903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17350019#comment-17350019 ] 

Anoop Sam John commented on HBASE-25903:
----------------------------------------

Issue observed
*+A DNS resolution problem for the peer ZK causes replication source initialization to hang forever+*
In a replication-enabled cluster, there is an occasional issue with the DNS system. When an RS starts, it sometimes fails to resolve the peer ZK hostname. The failure is transient, not permanent.
When this happens, ReplicationSource initialization gets stuck forever, WALs accumulate, and replication lag grows without bound. The only way to recover is to restart the RS manually.
We are on 2.1.6.
HBaseInterClusterReplicationEndpoint creates an AsyncClusterConnection, which in turn fetches the peer cluster ID.
{code}
ConnectionRegistry registry = ConnectionRegistryFactory.getRegistry(conf);
String clusterId = FutureUtils.get(registry.getClusterId());
{code}
With ZK client versions that do not have the fix for ZOOKEEPER-2184, ZooKeeper instance creation throws an IllegalArgumentException in this case.

In ReadOnlyZKClient#run
{code}
ZooKeeper zk;
try {
  zk = getZk();
} catch (IOException e) {
  task.connectFailed(e);
  continue;
}
task.exec(zk);
{code}
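In case of an IOException, we retry a fixed number of times and finally give up. An IllegalArgumentException is not an IOException, so it is not caught here and that retry path is never taken; as the issue title says, the caller's CompletableFuture.get() can then hang forever.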
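
The gap can be illustrated with a minimal, self-contained sketch. This is not the HBase code and not necessarily the committed fix: `ZkFactory`, `Task`, and `runOnce` are hypothetical stand-ins for `getZk()`, the queued task, and one iteration of the loop above. The only point is that catching `Exception` instead of only `IOException` lets the task's future complete exceptionally, so a blocking `get()` returns with an error instead of waiting forever.

```java
import java.io.IOException;
import java.util.concurrent.CompletableFuture;

final class ConnectFailureSketch {

  /** Hypothetical stand-in for getZk(): may fail with a checked or unchecked exception. */
  interface ZkFactory {
    Object create() throws IOException;
  }

  /** Hypothetical stand-in for a queued task that owns the caller's future. */
  static final class Task {
    final CompletableFuture<String> future = new CompletableFuture<>();

    /** Fails the caller's future so that a blocking get() returns with an error. */
    void connectFailed(Exception e) {
      future.completeExceptionally(e);
    }

    /** Placeholder for the real work; completes the future with a dummy value. */
    void exec(Object zk) {
      future.complete("cluster-id");
    }
  }

  /**
   * One pass of the worker loop. Catching Exception (an assumption of this
   * sketch) covers IllegalArgumentException as well as IOException.
   */
  static void runOnce(ZkFactory factory, Task task) {
    Object zk;
    try {
      zk = factory.create();
    } catch (Exception e) {
      task.connectFailed(e);
      return;
    }
    task.exec(zk);
  }
}
```

With a factory that throws IllegalArgumentException, `runOnce` leaves the task's future completed exceptionally. If the catch clause covered only IOException, the exception would escape `runOnce` and the future would stay incomplete.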


> ReadOnlyZKClient APIs - CompletableFuture.get() calls can cause threads to hang forever when ZK client create throws Non IOException
> -----------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-25903
>                 URL: https://issues.apache.org/jira/browse/HBASE-25903
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Anoop Sam John
>            Assignee: Anoop Sam John
>            Priority: Major
>
> This applies to ZK client versions that do not have the fix for ZOOKEEPER-2184.
> We are now on ZooKeeper 3.5.7 on the active 2.x branches. Still, it is better to handle this case in our code.


