
Posted to hdfs-dev@hadoop.apache.org by "Todd Lipcon (JIRA)" <ji...@apache.org> on 2012/09/01 09:40:08 UTC

[jira] [Created] (HDFS-3885) QJM: optimize log sync when JN is lagging behind

Todd Lipcon created HDFS-3885:
---------------------------------

             Summary: QJM: optimize log sync when JN is lagging behind
                 Key: HDFS-3885
                 URL: https://issues.apache.org/jira/browse/HDFS-3885
             Project: Hadoop HDFS
          Issue Type: Sub-task
    Affects Versions: QuorumJournalManager (HDFS-3077)
            Reporter: Todd Lipcon


This is a potential optimization that we can add to the JournalNode: when one of the nodes is lagging behind the others (e.g., because its local disk is slower or there was a network blip), it receives edits after they've already been committed to a majority. It can tell this because the committed txid included in the request info is higher than the highest txid in the actual batch to be written. In this case, we know that this batch has already been fsynced to a quorum of nodes, so durability no longer depends on the laggy node and we can skip the fsync() there, helping it catch back up.
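
The check described above can be sketched roughly as follows. This is only an illustration of the comparison, not the attached patch: the class name, method name, and parameters (committedTxId, firstTxId, numTxns) are hypothetical, and the strict "higher than" comparison simply follows the wording of the description.

```java
/**
 * Hypothetical sketch of the "skip fsync when lagging" decision.
 * All names here are illustrative and are not taken from the patch.
 */
public class LaggingSyncSketch {

    /**
     * @param committedTxId highest txid the writer reports as committed to a quorum
     *                      (assumed to arrive with the request info)
     * @param firstTxId     first txid in the batch about to be written
     * @param numTxns       number of transactions in the batch
     * @return true if this node should still fsync the batch
     */
    public static boolean shouldFsync(long committedTxId, long firstTxId, int numTxns) {
        long lastTxIdInBatch = firstTxId + numTxns - 1;
        // If the committed txid is already past the end of this batch, a quorum
        // of other nodes has made it durable, so this node is only catching up.
        boolean lagging = committedTxId > lastTxIdInBatch;
        return !lagging;
    }

    public static void main(String[] args) {
        // Batch covers txids 101..110, writer reports 150 as committed: lagging node.
        System.out.println(shouldFsync(150L, 101L, 10));
        // Batch covers txids 101..110, writer reports 100 as committed: normal path.
        System.out.println(shouldFsync(100L, 101L, 10));
    }
}
```

In the first call the node is behind and could skip the fsync; in the second the batch is not yet known to be committed, so the fsync is still required.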

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Resolved] (HDFS-3885) QJM: optimize log sync when JN is lagging behind

Posted by "Todd Lipcon (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HDFS-3885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon resolved HDFS-3885.
-------------------------------

       Resolution: Fixed
    Fix Version/s: QuorumJournalManager (HDFS-3077)
     Hadoop Flags: Reviewed

Committed to branch, thanks for the review.
                
> QJM: optimize log sync when JN is lagging behind
> ------------------------------------------------
>
>                 Key: HDFS-3885
>                 URL: https://issues.apache.org/jira/browse/HDFS-3885
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>    Affects Versions: QuorumJournalManager (HDFS-3077)
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>             Fix For: QuorumJournalManager (HDFS-3077)
>
>         Attachments: hdfs-3885.txt
>
>
> This is a potential optimization that we can add to the JournalNode: when one of the nodes is lagging behind the others (e.g., because its local disk is slower or there was a network blip), it receives edits after they've already been committed to a majority. It can tell this because the committed txid included in the request info is higher than the highest txid in the actual batch to be written. In this case, we know that this batch has already been fsynced to a quorum of nodes, so durability no longer depends on the laggy node and we can skip the fsync() there, helping it catch back up.
