Posted to hdfs-dev@hadoop.apache.org by "Viraj Jasani (Jira)" <ji...@apache.org> on 2023/03/09 19:06:00 UTC

[jira] [Created] (HDFS-16947) RBF NamenodeHeartbeatService to report error for not being able to register namenode in state store

Viraj Jasani created HDFS-16947:
-----------------------------------

             Summary: RBF NamenodeHeartbeatService to report error for not being able to register namenode in state store
                 Key: HDFS-16947
                 URL: https://issues.apache.org/jira/browse/HDFS-16947
             Project: Hadoop HDFS
          Issue Type: Improvement
            Reporter: Viraj Jasani
            Assignee: Viraj Jasani


The Namenode heartbeat service should log an error with the full stack trace when it cannot register a namenode in the State Store. As of today, it logs only an INFO message.

For the ZooKeeper-based implementation, the failure might mean either a) the Curator manager is not initialized, or b) the write to the znode failed after exhausting retries. In either case, an INFO log without the exception might not be good enough, and we might have to look for the actual error elsewhere.

 

Example of the current log output:
{code:java}
2023-02-20 23:10:33,714 DEBUG [NamenodeHeartbeatService {ns} nn0-0] router.NamenodeHeartbeatService - Received service state: ACTIVE from HA namenode: {ns}-nn0:nn-0-{ns}.{cluster}:9000
2023-02-20 23:10:33,731 INFO  [NamenodeHeartbeatService {ns} nn0-0] impl.MembershipStoreImpl - Inserting new NN registration: nn-0.namenode.{cluster}:8888->{ns}:nn0:nn-0-{ns}.{cluster}:9000-ACTIVE
2023-02-20 23:10:33,731 INFO  [NamenodeHeartbeatService {ns} nn0-0] router.NamenodeHeartbeatService - Cannot register namenode in the State Store
 {code}
If we logged the full stack trace at ERROR level instead:
{code:java}
2023-02-21 00:20:24,691 ERROR [NamenodeHeartbeatService {ns} nn0-0] router.NamenodeHeartbeatService - Cannot register namenode in the State Store
org.apache.hadoop.hdfs.server.federation.store.StateStoreUnavailableException: State Store driver StateStoreZooKeeperImpl in nn-0.namenode.{cluster} is not ready.
        at org.apache.hadoop.hdfs.server.federation.store.driver.StateStoreDriver.verifyDriverReady(StateStoreDriver.java:158)
        at org.apache.hadoop.hdfs.server.federation.store.driver.impl.StateStoreZooKeeperImpl.putAll(StateStoreZooKeeperImpl.java:235)
        at org.apache.hadoop.hdfs.server.federation.store.driver.impl.StateStoreBaseImpl.put(StateStoreBaseImpl.java:74)
        at org.apache.hadoop.hdfs.server.federation.store.impl.MembershipStoreImpl.namenodeHeartbeat(MembershipStoreImpl.java:179)
        at org.apache.hadoop.hdfs.server.federation.resolver.MembershipNamenodeResolver.registerNamenode(MembershipNamenodeResolver.java:381)
        at org.apache.hadoop.hdfs.server.federation.router.NamenodeHeartbeatService.updateState(NamenodeHeartbeatService.java:317)
        at org.apache.hadoop.hdfs.server.federation.router.NamenodeHeartbeatService.lambda$periodicInvoke$0(NamenodeHeartbeatService.java:244)
...
... {code}
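The proposed change could be sketched roughly as below. This is only an illustration, not the actual NamenodeHeartbeatService code: the class `HeartbeatLoggingSketch` and the `Registrar` interface are hypothetical stand-ins for the service and for the resolver call that writes to the State Store, and `java.util.logging` is used (with SEVERE standing in for ERROR) so that the snippet runs without extra dependencies. The point it shows is that passing the caught exception to the logger is what produces the stack trace.

```java
import java.io.IOException;
import java.util.logging.Level;
import java.util.logging.Logger;

/** Sketch only; names here are hypothetical, not the real Hadoop classes. */
class HeartbeatLoggingSketch {

  private static final Logger LOG =
      Logger.getLogger(HeartbeatLoggingSketch.class.getName());

  /** Hypothetical stand-in for the call that registers a namenode in the State Store. */
  interface Registrar {
    boolean register(String namenodeId) throws IOException;
  }

  /** Returns true on success; on failure logs at SEVERE and returns false. */
  static boolean updateState(Registrar registrar, String namenodeId) {
    try {
      if (!registrar.register(namenodeId)) {
        // No exception to attach here, but the failure is still reported as an error.
        LOG.log(Level.SEVERE, "Cannot register namenode {0} in the State Store", namenodeId);
        return false;
      }
      return true;
    } catch (IOException e) {
      // Passing the Throwable lets the log handler print the full stack trace.
      LOG.log(Level.SEVERE,
          "Cannot register namenode " + namenodeId + " in the State Store", e);
      return false;
    }
  }

  public static void main(String[] args) {
    // Simulate a State Store driver that is not ready (assumed failure mode).
    Registrar unavailable = id -> {
      throw new IOException("State Store driver is not ready");
    };
    updateState(unavailable, "nn0");
  }
}
```

With a logger that accepts a Throwable argument, the same pattern applies in other logging frameworks: the exception is passed alongside the message rather than dropped.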



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
