Posted to issues@hbase.apache.org by "zhengsicheng (Jira)" <ji...@apache.org> on 2022/08/16 12:25:00 UTC

[jira] [Comment Edited] (HBASE-27249) Remove invalid peer RegionServer crash

    [ https://issues.apache.org/jira/browse/HBASE-27249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17572199#comment-17572199 ] 

zhengsicheng edited comment on HBASE-27249 at 8/16/22 12:24 PM:
----------------------------------------------------------------

[~zhangduo] When a peer cluster is added but the peer cluster's ZooKeeper is shut down or invalid, removing the peer causes the source cluster RS to abort.


was (Author: zhengsicheng):
[~zhangduo] When a peer cluster is added but the peer cluster's ZooKeeper is shut down or invalid, the source cluster RS aborts.

> Remove invalid peer RegionServer crash
> --------------------------------------
>
>                 Key: HBASE-27249
>                 URL: https://issues.apache.org/jira/browse/HBASE-27249
>             Project: HBase
>          Issue Type: Bug
>            Reporter: zhengsicheng
>            Assignee: zhengsicheng
>            Priority: Major
>
> add_peer 'test', CLUSTER_KEY => "zookeeper-01:2181:/hbase_01"
> remove_peer 'test'
> After noticing that the peer was added with a wrong cluster key, I removed the peer, but the RegionServer crashed.
> The log information is as follows:
> 2022-07-18 13:26:11,016 ERROR [ReadOnlyZKClient-zookeeper-01:2181@0x44281bff-SendThread(zookeeper-01:2181)] client.StaticHostProvider: Unable to resolve address: zookeeper-01/<unresolved>:2181
> java.net.UnknownHostException: zookeeper-01
>         at java.base/java.net.InetAddress$CachedAddresses.get(InetAddress.java:800)
>         at java.base/java.net.InetAddress.getAllByName0(InetAddress.java:1507)
>         at java.base/java.net.InetAddress.getAllByName(InetAddress.java:1366)
>         at java.base/java.net.InetAddress.getAllByName(InetAddress.java:1300)
>         at org.apache.zookeeper.client.StaticHostProvider$1.getAllByName(StaticHostProvider.java:92)
>         at org.apache.zookeeper.client.StaticHostProvider.resolve(StaticHostProvider.java:147)
>         at org.apache.zookeeper.client.StaticHostProvider.next(StaticHostProvider.java:375)
>         at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1137)
> 2022-07-18 13:26:11,016 WARN  [ReadOnlyZKClient-zookeeper-01:2181@0x44281bff-SendThread(zookeeper-01:2181)] zookeeper.ClientCnxn: Session 0x0 for server zookeeper-01/<unresolved>:2181, unexpected error, closing socket connection and attempting reconnect
> java.lang.IllegalArgumentException: Unable to canonicalize address zookeeper-01/<unresolved>:2181 because it's not resolvable
>         at org.apache.zookeeper.SaslServerPrincipal.getServerPrincipal(SaslServerPrincipal.java:71)
>         at org.apache.zookeeper.SaslServerPrincipal.getServerPrincipal(SaslServerPrincipal.java:39)
>         at org.apache.zookeeper.ClientCnxn$SendThread.startConnect(ClientCnxn.java:1087)
>         at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1139)
> 2022-07-18 13:26:11,116 WARN  [ReadOnlyZKClient-zookeeper-01:2181@0x44281bff] zookeeper.ReadOnlyZKClient: 0x44281bff to zookeeper-01:2181 failed for get of /hbase_01/hbaseid, code = CONNECTIONLOSS, retries = 48
> 2022-07-18 13:26:11,119 WARN  [regionserver/ip1:16020.logRoller] regionserver.ReplicationSource: peerId=test, WAL group ip1%2C16020%2C1658118295598.ip1%2C16020%2C1658118295598.regiongroup-2 queue size: 11 exceeds value of replication.source.log.queue.warn 2
> 2022-07-18 13:26:12,055 INFO  [MemStoreFlusher.1] regionserver.HRegion: Flushing 31bbfb9b76b6795e5d44fabd113174c0 1/2 column families, dataSize=245.67 MB heapSize=257.48 MB; f1={dataSize=245.67 MB, heapSize=257.48 MB, offHeapSize=0 B}
> 2022-07-18 13:26:12,116 ERROR [ReadOnlyZKClient-zookeeper-01:2181@0x44281bff-SendThread(zookeeper-01:2181)] client.StaticHostProvider: Unable to resolve address: zookeeper-01/<unresolved>:2181
> java.net.UnknownHostException: zookeeper-01
>         at java.base/java.net.InetAddress$CachedAddresses.get(InetAddress.java:800)
>         at java.base/java.net.InetAddress.getAllByName0(InetAddress.java:1507)
>         at java.base/java.net.InetAddress.getAllByName(InetAddress.java:1366)
>         at java.base/java.net.InetAddress.getAllByName(InetAddress.java:1300)
>         at org.apache.zookeeper.client.StaticHostProvider$1.getAllByName(StaticHostProvider.java:92)
>         at org.apache.zookeeper.client.StaticHostProvider.resolve(StaticHostProvider.java:147)
>         at org.apache.zookeeper.client.StaticHostProvider.next(StaticHostProvider.java:375)
>         at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1137)
> 2022-07-18 13:26:30,270 INFO  [RS_REFRESH_PEER-regionserver/ip1:16020-1] regionserver.RefreshPeerCallable: Received a peer change event, peerId=test, type=REMOVE_PEER
> 2022-07-18 13:26:30,270 INFO  [RS_REFRESH_PEER-regionserver/ip1:16020-1] regionserver.ReplicationSourceManager: Number of deleted recovered sources for test: 0
> 2022-07-18 13:26:30,270 INFO  [RS_REFRESH_PEER-regionserver/ip1:16020-1] regionserver.ReplicationSource: peerId=test, Closing source test because: Replication stream was removed by a user
> 2022-07-18 13:26:30,271 WARN  [RS_REFRESH_PEER-regionserver/ip1:16020-0.replicationSource,test] client.ConnectionImplementation: Retrieve cluster id failed
> java.lang.InterruptedException
>         at java.base/java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:385)
>         at java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:2063)
>         at org.apache.hadoop.hbase.client.ConnectionImplementation.retrieveClusterId(ConnectionImplementation.java:583)
>         at org.apache.hadoop.hbase.client.ConnectionImplementation.<init>(ConnectionImplementation.java:316)
>         at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>         at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:64)
>         at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>         at java.base/java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:500)
>         at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:481)
>         at org.apache.hadoop.hbase.client.ConnectionFactory.lambda$createConnection$0(ConnectionFactory.java:230)
>         at java.base/java.security.AccessController.doPrivileged(AccessController.java:691)
>         at java.base/javax.security.auth.Subject.doAs(Subject.java:425)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1830)
>         at org.apache.hadoop.hbase.security.User$SecureHadoopUser.runAs(User.java:347)
>         at org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:228)
>         at org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:128)
>         at org.apache.hadoop.hbase.replication.regionserver.HBaseInterClusterReplicationEndpoint.createConnection(HBaseInterClusterReplicationEndpoint.java:140)
>         at org.apache.hadoop.hbase.replication.regionserver.HBaseInterClusterReplicationEndpoint.init(HBaseInterClusterReplicationEndpoint.java:172)
>         at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.initAndStartReplicationEndpoint(ReplicationSource.java:340)
>         at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.initialize(ReplicationSource.java:557)
>         at java.base/java.lang.Thread.run(Thread.java:832)
> 2022-07-18 13:26:30,271 INFO  [RS_REFRESH_PEER-regionserver/ip1:16020-0.replicationSource,test] zookeeper.RecoverableZooKeeper: Process identifier=connection to cluster: test connecting to ZooKeeper ensemble=zookeeper-01:2181
> 2022-07-18 13:26:30,271 INFO  [RS_REFRESH_PEER-regionserver/ip1:16020-0.replicationSource,test] zookeeper.ZooKeeper: Initiating client connection, connectString=zookeeper-01:2181 sessionTimeout=90000 watcher=org.apache.hadoop.hbase.zookeeper.PendingWatcher@70ad0136
> 2022-07-18 13:26:30,271 INFO  [RS_REFRESH_PEER-regionserver/ip1:16020-0.replicationSource,test] zookeeper.ClientCnxnSocket: jute.maxbuffer value is 67108864 Bytes
> 2022-07-18 13:26:30,272 INFO  [RS_REFRESH_PEER-regionserver/ip1:16020-0.replicationSource,test] zookeeper.ClientCnxn: zookeeper.request.timeout value is 0. feature enabled=
> 2022-07-18 13:26:30,272 ERROR [RS_REFRESH_PEER-regionserver/ip1:16020-0.replicationSource,test-SendThread()] client.StaticHostProvider: Unable to resolve address: zookeeper-01/<unresolved>:2181
> java.net.UnknownHostException: zookeeper-01
>         at java.base/java.net.InetAddress$CachedAddresses.get(InetAddress.java:800)
>         at java.base/java.net.InetAddress.getAllByName0(InetAddress.java:1507)
>         at java.base/java.net.InetAddress.getAllByName(InetAddress.java:1366)
>         at java.base/java.net.InetAddress.getAllByName(InetAddress.java:1300)
>         at org.apache.zookeeper.client.StaticHostProvider$1.getAllByName(StaticHostProvider.java:92)
>         at org.apache.zookeeper.client.StaticHostProvider.resolve(StaticHostProvider.java:147)
>         at org.apache.zookeeper.client.StaticHostProvider.next(StaticHostProvider.java:375)
>         at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1137)
> 2022-07-18 13:26:30,272 ERROR [RS_REFRESH_PEER-regionserver/ip1:16020-0.replicationSource,test] regionserver.ReplicationSource: Unexpected exception in RS_REFRESH_PEER-regionserver/ip1:16020-0.replicationSource,test currentPath=null
> java.lang.IllegalStateException: Source should be active.
>         at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.initialize(ReplicationSource.java:581)
>         at java.base/java.lang.Thread.run(Thread.java:832)
> 2022-07-18 13:26:30,272 WARN  [RS_REFRESH_PEER-regionserver/ip1:16020-0.replicationSource,test-SendThread(zookeeper-01:2181)] zookeeper.ClientCnxn: Session 0x0 for server zookeeper-01/<unresolved>:2181, unexpected error, closing socket connection and attempting reconnect
> java.lang.IllegalArgumentException: Unable to canonicalize address zookeeper-01/<unresolved>:2181 because it's not resolvable
>         at org.apache.zookeeper.SaslServerPrincipal.getServerPrincipal(SaslServerPrincipal.java:71)
>         at org.apache.zookeeper.SaslServerPrincipal.getServerPrincipal(SaslServerPrincipal.java:39)
>         at org.apache.zookeeper.ClientCnxn$SendThread.startConnect(ClientCnxn.java:1087)
>         at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1139)
> 2022-07-18 13:26:30,274 ERROR [RS_REFRESH_PEER-regionserver/ip1:16020-0.replicationSource,test] regionserver.HRegionServer: ***** ABORTING region server ip1,16020,1658118295598: Unexpected exception in RS_REFRESH_PEER-regionserver/ip1:16020-0.replicationSource,test *****
> java.lang.IllegalStateException: Source should be active.
>         at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.initialize(ReplicationSource.java:581)
>         at java.base/java.lang.Thread.run(Thread.java:832)
> 2022-07-18 13:26:30,275 ERROR [RS_REFRESH_PEER-regionserver/ip1:16020-0.replicationSource,test] regionserver.HRegionServer: RegionServer abort: loaded coprocessors are: [org.apache.hadoop.hbase.security.access.AccessController, org.apache.hadoop.hbase.replication.regionserver.ReplicationObserver]
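
The sequence in the log above (REMOVE_PEER arrives while the source is still blocked connecting to an unresolvable peer ZooKeeper, startup is interrupted, the "Source should be active." check fails, and the uncaught exception aborts the server) can be modeled with a small toy program. This is only a sketch under stated assumptions: it is not HBase code, the class and method names (PeerRemovalRaceSketch, removePeerDuringStartup, tolerateRemoval) are hypothetical, and the "tolerated" branch is one possible handling for illustration, not necessarily the fix adopted for this issue.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicBoolean;

public class PeerRemovalRaceSketch {

  /**
   * Runs one toy "source" whose startup blocks on an unreachable peer, removes
   * the peer while startup is still blocked, and reports whether the toy
   * "server" was aborted as a result. All names here are hypothetical.
   */
  static boolean removePeerDuringStartup(boolean tolerateRemoval) throws InterruptedException {
    AtomicBoolean sourceActive = new AtomicBoolean(true);
    AtomicBoolean serverAborted = new AtomicBoolean(false);
    // Never counted down: models a peer ZooKeeper that cannot be resolved.
    CountDownLatch peerReachable = new CountDownLatch(1);

    Thread init = new Thread(() -> {
      try {
        peerReachable.await(); // stands in for connecting to the peer cluster
      } catch (InterruptedException e) {
        // Removal interrupts startup (compare the InterruptedException in the log).
      }
      if (!sourceActive.get()) {
        if (tolerateRemoval) {
          return; // hypothetical handling: a removed source simply stops initializing
        }
        throw new IllegalStateException("Source should be active.");
      }
    }, "toy-replicationSource,test");

    // Models the logged behavior: an unexpected exception on this thread is fatal.
    init.setUncaughtExceptionHandler((t, e) -> serverAborted.set(true));

    init.start();
    sourceActive.set(false); // remove_peer: mark the source inactive...
    init.interrupt();        // ...and interrupt its startup thread
    init.join();
    return serverAborted.get();
  }

  public static void main(String[] args) throws InterruptedException {
    System.out.println("abort when removal is unexpected: " + removePeerDuringStartup(false));
    System.out.println("abort when removal is tolerated:  " + removePeerDuringStartup(true));
  }
}
```

In this model the first call reports an abort and the second does not; the only difference is whether the startup thread treats "source no longer active" as an expected outcome of peer removal or as an unexpected error.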


