Posted to hdfs-dev@hadoop.apache.org by "Viraj Jasani (Jira)" <ji...@apache.org> on 2023/02/16 21:41:00 UTC

[jira] [Resolved] (HDFS-16918) Optionally shut down datanode if it does not stay connected to active namenode

     [ https://issues.apache.org/jira/browse/HDFS-16918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Viraj Jasani resolved HDFS-16918.
---------------------------------
    Resolution: Won't Fix

> Optionally shut down datanode if it does not stay connected to active namenode
> ------------------------------------------------------------------------------
>
>                 Key: HDFS-16918
>                 URL: https://issues.apache.org/jira/browse/HDFS-16918
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>            Reporter: Viraj Jasani
>            Assignee: Viraj Jasani
>            Priority: Major
>              Labels: pull-request-available
>
> When HDFS is deployed behind an Envoy proxy setup, network connection issues or packet loss can be observed, depending on the socket timeout configured at Envoy. Together, the Envoy proxies form a transparent communication mesh in which each application sends and receives packets to and from localhost and is unaware of the network topology.
> The primary purpose of Envoy is to make the network transparent to applications so that network issues can be identified reliably. However, such a proxy-based setup can sometimes result in socket connection issues between datanode and namenode.
> Many deployment frameworks provide auto-start functionality for when any of the Hadoop daemons is stopped. If a given datanode does not stay connected to the active namenode in the cluster, i.e. it does not receive a heartbeat response in time from the active namenode (even though the active namenode has not terminated), it is not of much use. We should provide configurable behavior such that if a given datanode cannot receive a heartbeat response from the active namenode within a configurable time duration, it terminates itself to avoid impacting the availability SLA. This is specifically helpful when the underlying deployment or observability framework (e.g. K8s) can start the datanode automatically upon its shutdown (unless it is being restarted as part of a rolling upgrade) and help the newly brought-up datanode (in the case of K8s, a new pod with dynamically changing nodes) establish new socket connections to the active and standby namenodes. This should be opt-in behavior, not the default.
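
The proposed behavior could be sketched roughly as below. This is only an illustration of the idea in the description, not code from Hadoop or from the pull request: the class name ActiveNameNodeWatchdog, its methods, and the millisecond threshold are all hypothetical. The sketch assumes a non-positive threshold means the feature is disabled, which keeps it opt-in.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

/**
 * Hypothetical sketch (not a Hadoop class): tracks the last heartbeat
 * response seen from the active namenode and triggers a shutdown action
 * once the silence exceeds a configurable duration.
 */
public final class ActiveNameNodeWatchdog {
    // Assumption: a value <= 0 disables the check, so the behavior is opt-in.
    private final long maxSilenceMillis;
    // Injectable clock so the logic can be exercised without real waiting.
    private final LongSupplier clockMillis;
    // What "terminate itself" means is left to the caller.
    private final Runnable shutdownAction;
    private final AtomicLong lastResponseMillis;
    private final AtomicBoolean fired = new AtomicBoolean(false);

    public ActiveNameNodeWatchdog(long maxSilenceMillis,
                                  LongSupplier clockMillis,
                                  Runnable shutdownAction) {
        this.maxSilenceMillis = maxSilenceMillis;
        this.clockMillis = clockMillis;
        this.shutdownAction = shutdownAction;
        // Start the timer at construction time.
        this.lastResponseMillis = new AtomicLong(clockMillis.getAsLong());
    }

    /** Call whenever a heartbeat response from the active namenode arrives. */
    public void onActiveHeartbeatResponse() {
        lastResponseMillis.set(clockMillis.getAsLong());
    }

    /**
     * Call periodically. Runs the shutdown action at most once, and only if
     * the time since the last response is strictly greater than the limit.
     *
     * @return true if this call triggered the shutdown action
     */
    public boolean check() {
        if (maxSilenceMillis <= 0) {
            return false; // feature not enabled
        }
        long silence = clockMillis.getAsLong() - lastResponseMillis.get();
        if (silence > maxSilenceMillis && fired.compareAndSet(false, true)) {
            shutdownAction.run();
            return true;
        }
        return false;
    }
}
```

In a real daemon, check() would presumably be driven by a scheduled task and the shutdown action would be an orderly termination, leaving the restart to the deployment framework; both of those are assumptions outside this sketch.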



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
