Posted to hdfs-dev@hadoop.apache.org by "Boris Bondarenko (Jira)" <ji...@apache.org> on 2021/10/28 17:06:00 UTC

[jira] [Created] (HDFS-16289) Hadoop HA checkpointer issue

Boris Bondarenko created HDFS-16289:
---------------------------------------

             Summary: Hadoop HA checkpointer issue 
                 Key: HDFS-16289
                 URL: https://issues.apache.org/jira/browse/HDFS-16289
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: dfs
    Affects Versions: 3.2.2
            Reporter: Boris Bondarenko


In an HA setup, the active namenode consistently rejects the fsimage sync from one of the two standby namenodes. This may be an edge case; in our environment it primarily affects the standby cluster. What we experienced was a memory problem on the standby namenodes when a standby node was unable to complete a sync cycle for a long time.

It is my understanding that the loop is exited only when the doCheckpoint call succeeds; otherwise the call throws an exception and the loop continues.
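
The loop behavior described above can be illustrated with a minimal, self-contained sketch. This is not the Hadoop source: the class CheckpointLoopSketch, the Checkpoint interface, the runLoop method and the maxAttempts bound are all hypothetical names introduced only for illustration (the bound exists so the example terminates). Only the name doCheckpoint comes from the description above.

```java
import java.io.IOException;

public class CheckpointLoopSketch {

    /** Hypothetical stand-in for the checkpoint-and-upload step. */
    public interface Checkpoint {
        void doCheckpoint() throws IOException;
    }

    /**
     * Runs the retry loop for at most maxAttempts iterations and returns
     * the number of attempts made. maxAttempts is an assumption added so
     * that this sketch terminates.
     */
    public static int runLoop(Checkpoint checkpoint, int maxAttempts) {
        int attempts = 0;
        while (attempts < maxAttempts) {
            attempts++;
            try {
                checkpoint.doCheckpoint();
                break; // reached only when doCheckpoint() returns normally
            } catch (IOException e) {
                // The failure is swallowed here and the loop goes around again.
            }
        }
        return attempts;
    }

    public static void main(String[] args) {
        // A standby whose upload is always rejected never exits the loop early.
        int rejected = runLoop(() -> { throw new IOException("fsimage rejected"); }, 5);
        // A standby whose upload is accepted exits after the first attempt.
        int accepted = runLoop(() -> { }, 5);
        System.out.println("rejected=" + rejected + " accepted=" + accepted);
    }
}
```

In this sketch, a checkpoint that is always rejected uses every allowed attempt, while an accepted one leaves the loop after a single attempt. Without a bound, the rejected case would keep iterating for as long as the rejection persists.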

I can provide more details on my findings with code references if necessary.


