Posted to hdfs-dev@hadoop.apache.org by "star (JIRA)" <ji...@apache.org> on 2019/03/19 03:02:00 UTC

[jira] [Created] (HDFS-14378) Simplify the design of multiple NN and both logic of edit log roll and checkpoint

star created HDFS-14378:
---------------------------

             Summary: Simplify the design of multiple NN and both logic of edit log roll and checkpoint
                 Key: HDFS-14378
                 URL: https://issues.apache.org/jira/browse/HDFS-14378
             Project: Hadoop HDFS
          Issue Type: Improvement
          Components: namenode
            Reporter: star
            Assignee: star


      HDFS-6440 introduced a mechanism to support more than two NNs. It implements a first-writer-wins policy to avoid duplicate fsimage downloads. The variable 'isPrimaryCheckPointer' holds the first-writer state: the SNN for which it is set will provide the fsimage to the ANN next time. As a result, there are three roles in an NN cluster: the ANN, one primary SNN, and one or more normal SNNs.
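
      To make the first-writer-wins idea concrete, here is a minimal sketch in Java. It is a simplified model, not the actual Hadoop code: the classes ActiveNameNodeModel and StandbyModel and the method tryUpload are hypothetical, and only the flag name isPrimaryCheckPointer is taken from the description above. The sketch assumes the ANN accepts an uploaded image only if it is newer than the one it already holds.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical, simplified model of the ANN side of an fsimage upload. */
class ActiveNameNodeModel {
    // Transaction id of the newest image accepted so far (-1 means none yet).
    private final AtomicLong latestImageTxId = new AtomicLong(-1L);

    /**
     * Returns true only for the first caller that offers an image newer than
     * the one already held; later callers with the same txid are rejected.
     */
    boolean tryUpload(long imageTxId) {
        while (true) {
            long current = latestImageTxId.get();
            if (imageTxId <= current) {
                return false; // someone else already delivered this image
            }
            if (latestImageTxId.compareAndSet(current, imageTxId)) {
                return true;  // this caller is the first writer
            }
        }
    }
}

/** Hypothetical, simplified model of a standby NameNode's checkpointer. */
class StandbyModel {
    // Flag name taken from the issue text; the logic around it is an assumption.
    private boolean isPrimaryCheckPointer = false;

    /** Offers a checkpoint to the ANN and remembers whether it won the race. */
    void checkpointAndUpload(ActiveNameNodeModel ann, long checkpointTxId) {
        isPrimaryCheckPointer = ann.tryUpload(checkpointTxId);
    }

    boolean isPrimaryCheckPointer() {
        return isPrimaryCheckPointer;
    }
}
```

      In this model, if two standbys offer a checkpoint for the same transaction id, only the first one ends up with isPrimaryCheckPointer set, which corresponds to the "primary SNN" role described above.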

      Since HDFS-12248, there may briefly be more than two primary SNNs after an exception occurs. That change handles the scenario in which an SNN does not upload its fsimage because of an IOException (IOE) or an InterruptedException. Although this does not cause any further functional issues, it is inconsistent.

      Furthermore, the edit log may be rolled more frequently than necessary when there are multiple standby NameNodes; see HDFS-14349. (I'm not sure about this. I will verify it with unit tests, or perhaps someone could confirm it.)

      Given the above, I'm wondering whether we could simplify the design with the following changes:
 * There are only two roles: ANN and SNN.
 * The ANN will roll its edit log once every DFS_HA_LOGROLL_PERIOD_KEY period.
 * The ANN will select an SNN from which to download the checkpoint.

An SNN will only tail the edit log and create checkpoints, and will provide a servlet for fsimage downloading as it does now. An SNN will not try to roll the edit log or send checkpoint requests to the ANN.
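
The proposed division of work can be sketched as follows. This is only an illustration under stated assumptions, not a design for the actual patch: the classes StandbyInfo and ActiveDriver and their methods are hypothetical, and the selection rule (pick the SNN with the newest checkpoint that is newer than the ANN's own image) is an assumption, since the proposal does not say how the ANN chooses.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

/** Hypothetical snapshot of what the ANN knows about one standby. */
final class StandbyInfo {
    final String name;
    final long lastCheckpointTxId;

    StandbyInfo(String name, long lastCheckpointTxId) {
        this.name = name;
        this.lastCheckpointTxId = lastCheckpointTxId;
    }
}

/** Hypothetical model of an ANN that drives both log rolls and image downloads. */
class ActiveDriver {
    private final long rollPeriodMs; // would come from DFS_HA_LOGROLL_PERIOD_KEY
    private final long localImageTxId;
    private long lastRollMs;

    ActiveDriver(long rollPeriodMs, long startMs, long localImageTxId) {
        this.rollPeriodMs = rollPeriodMs;
        this.lastRollMs = startMs;
        this.localImageTxId = localImageTxId;
    }

    /** Rolls the edit log only if a full period has elapsed since the last roll. */
    boolean maybeRollEditLog(long nowMs) {
        if (nowMs - lastRollMs < rollPeriodMs) {
            return false;
        }
        lastRollMs = nowMs; // a real implementation would roll the log here
        return true;
    }

    /**
     * Assumed policy: choose the standby with the newest checkpoint, and only
     * if that checkpoint is newer than the image the ANN already has.
     */
    Optional<StandbyInfo> selectCheckpointSource(List<StandbyInfo> standbys) {
        return standbys.stream()
                .filter(s -> s.lastCheckpointTxId > localImageTxId)
                .max(Comparator.comparingLong((StandbyInfo s) -> s.lastCheckpointTxId));
    }
}
```

Because only the ANN decides when to roll and from which SNN to download, no SNN needs a primary flag in this model, and the number of rolls does not depend on how many SNNs exist.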

In short, the ANN will take a more active role. Suggestions are welcome.

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-dev-unsubscribe@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-help@hadoop.apache.org