Posted to hdfs-dev@hadoop.apache.org by "Todd Lipcon (JIRA)" <ji...@apache.org> on 2012/09/07 05:28:07 UTC

[jira] [Created] (HDFS-3901) QJM: send 'heartbeat' messages to JNs even when they are out-of-sync

Todd Lipcon created HDFS-3901:
---------------------------------

             Summary: QJM: send 'heartbeat' messages to JNs even when they are out-of-sync
                 Key: HDFS-3901
                 URL: https://issues.apache.org/jira/browse/HDFS-3901
             Project: Hadoop HDFS
          Issue Type: Sub-task
    Affects Versions: QuorumJournalManager (HDFS-3077)
            Reporter: Todd Lipcon
            Assignee: Todd Lipcon


Currently, if one of the JNs has fallen out of sync with the writer (e.g. because it went down), it is marked as out of sync until the next log roll, and the writer stops sending it any RPCs. As a result, that JN's metrics no longer reflect up-to-date information on how far behind it is.

This patch will introduce a heartbeat() RPC that has no effect except to update the JN's view of the latest committed txid. When the writer is talking to an out-of-sync logger, it will send these heartbeat messages once a second.
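
As a rough illustration of the behavior described above (not the actual patch), the sketch below models a JN that only refreshes its view of the committed txid on heartbeat, and a writer-side channel that sends at most one heartbeat per second while the JN is out of sync. All class, method, and field names here are hypothetical and are not taken from the HDFS-3077 branch; the real RPC signature and scheduling may differ.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: none of these names come from the HDFS-3901 patch.
class HeartbeatSketch {

    /** Stand-in for a JournalNode; tracks what it has written and what it is told is committed. */
    static class JournalNodeStub {
        private final AtomicLong committedTxId = new AtomicLong(0);
        private final long lastWrittenTxId; // fixed here, since an out-of-sync JN receives no edits

        JournalNodeStub(long lastWrittenTxId) {
            this.lastWrittenTxId = lastWrittenTxId;
        }

        /** Heartbeat: no effect other than refreshing the committed txid view. */
        void heartbeat(long writerCommittedTxId) {
            // Never move backwards, in case heartbeats arrive out of order.
            committedTxId.accumulateAndGet(writerCommittedTxId, Math::max);
        }

        long getCommittedTxId() {
            return committedTxId.get();
        }

        /** Lag metric: how many transactions this JN is behind the writer. */
        long getLagTxns() {
            return Math.max(0, committedTxId.get() - lastWrittenTxId);
        }
    }

    /** Stand-in for the writer's per-JN channel. */
    static class LoggerChannelStub {
        static final long HEARTBEAT_INTERVAL_MS = 1000; // "once a second"

        private final JournalNodeStub jn;
        private boolean outOfSync = false;
        private long lastHeartbeatMs = -1; // -1 means no heartbeat sent yet

        LoggerChannelStub(JournalNodeStub jn) {
            this.jn = jn;
        }

        void markOutOfSync() {
            outOfSync = true;
        }

        /**
         * Called by the writer with its latest committed txid and the current
         * time (non-negative milliseconds). Returns true if a heartbeat was sent.
         */
        boolean maybeHeartbeat(long committedTxId, long nowMs) {
            if (!outOfSync) {
                return false; // in-sync JNs learn the committed txid from normal edit RPCs
            }
            if (lastHeartbeatMs >= 0 && nowMs - lastHeartbeatMs < HEARTBEAT_INTERVAL_MS) {
                return false; // rate limit: at most one heartbeat per interval
            }
            jn.heartbeat(committedTxId);
            lastHeartbeatMs = nowMs;
            return true;
        }
    }
}
```

In this sketch the time is passed in explicitly rather than read from a clock, which keeps the rate limiting easy to exercise in a unit test.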

In a future patch we can extend the heartbeat functionality so that NNs periodically check their connections to JNs if no edits arrive, such that a fenced NN won't accidentally continue to serve reads indefinitely.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Resolved] (HDFS-3901) QJM: send 'heartbeat' messages to JNs even when they are out-of-sync

Posted by "Todd Lipcon (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HDFS-3901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon resolved HDFS-3901.
-------------------------------

       Resolution: Fixed
    Fix Version/s: QuorumJournalManager (HDFS-3077)
     Hadoop Flags: Reviewed

Committed to branch, thanks for the reviews, Eli and ATM.
                
> QJM: send 'heartbeat' messages to JNs even when they are out-of-sync
> --------------------------------------------------------------------
>
>                 Key: HDFS-3901
>                 URL: https://issues.apache.org/jira/browse/HDFS-3901
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>    Affects Versions: QuorumJournalManager (HDFS-3077)
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>             Fix For: QuorumJournalManager (HDFS-3077)
>
>         Attachments: hdfs-3901.txt, hdfs-3901.txt
>
