Posted to hdfs-dev@hadoop.apache.org by "Viraj Jasani (Jira)" <ji...@apache.org> on 2023/02/01 06:06:00 UTC

[jira] [Created] (HDFS-16902) Add Namenode status to BPServiceActor metrics and improve logging in offerservice

Viraj Jasani created HDFS-16902:
-----------------------------------

             Summary: Add Namenode status to BPServiceActor metrics and improve logging in offerservice
                 Key: HDFS-16902
                 URL: https://issues.apache.org/jira/browse/HDFS-16902
             Project: Hadoop HDFS
          Issue Type: Task
            Reporter: Viraj Jasani
            Assignee: Viraj Jasani


Recently came across a k8s environment where some datanode pods randomly fail to stay connected to all namenode pods (e.g. the time since the last heartbeat sometimes stays above 2 hr). When a new namenode becomes active, any datanode that is not heartbeating to it cannot send it any further block reports. This sometimes leads to missing replicas, which are resolved only by restarting the datanode pod.

While the issue seems environment-specific, BPServiceActor's offerService could use some logging improvements. It would also be good to expose the namenode status through BPServiceActorInfo, to identify any lag on the datanode side in recognizing the updated Active namenode status from heartbeats.
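
For illustration only, here is a minimal sketch of the kind of check that per-actor info makes possible on the monitoring side. It assumes each actor's info has already been parsed into a map, that the keys are named "NamenodeAddress" and "LastHeartbeat", and that "LastHeartbeat" holds the number of seconds since the last heartbeat. The class name, method name, and 60-second threshold are hypothetical, and the namenode status field proposed here is left out because its name is not specified in this issue.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class StaleActorCheck {
    // Assumed key names for the per-actor info map (not confirmed by this issue).
    static final String ADDRESS_KEY = "NamenodeAddress";
    static final String LAST_HEARTBEAT_KEY = "LastHeartbeat";

    /**
     * Returns the addresses of namenodes whose last heartbeat is older than
     * maxAgeSeconds. A missing or unparsable heartbeat value counts as stale.
     */
    static List<String> staleNamenodes(List<Map<String, String>> actors, long maxAgeSeconds) {
        List<String> stale = new ArrayList<>();
        for (Map<String, String> actor : actors) {
            long ageSeconds;
            try {
                ageSeconds = Long.parseLong(actor.getOrDefault(LAST_HEARTBEAT_KEY, ""));
            } catch (NumberFormatException e) {
                // No usable heartbeat age: report the actor rather than hide it.
                ageSeconds = Long.MAX_VALUE;
            }
            if (ageSeconds > maxAgeSeconds) {
                stale.add(actor.getOrDefault(ADDRESS_KEY, "unknown"));
            }
        }
        return stale;
    }

    public static void main(String[] args) {
        // Hypothetical sample data: one healthy actor, one silent for over 2 hr.
        List<Map<String, String>> actors = List.of(
            Map.of(ADDRESS_KEY, "nn-0:8020", LAST_HEARTBEAT_KEY, "2"),
            Map.of(ADDRESS_KEY, "nn-1:8020", LAST_HEARTBEAT_KEY, "7500"));
        System.out.println(staleNamenodes(actors, 60)); // prints [nn-1:8020]
    }
}
```

With the namenode status available next to each actor, a check like this could additionally flag the more serious case where the stale actor is the one pointing at the currently Active namenode.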



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-dev-unsubscribe@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-help@hadoop.apache.org