You are viewing a plain text version of this content. The canonical link for it is here.

Posted to mapreduce-issues@hadoop.apache.org by "Robert Joseph Evans (JIRA)" <ji...@apache.org> on 2012/11/29 21:16:58 UTC

[jira] [Created] (MAPREDUCE-4832) MR AM can get in a split brain situation

Robert Joseph Evans created MAPREDUCE-4832:
----------------------------------------------

             Summary: MR AM can get in a split brain situation
                 Key: MAPREDUCE-4832
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4832
             Project: Hadoop Map/Reduce
          Issue Type: Bug
          Components: applicationmaster
    Affects Versions: 0.23.5, 2.0.2-alpha
            Reporter: Robert Joseph Evans


It is possible for a networking issue to happen where the RM thinks an AM has gone down and launches a replacement, but the previous AM is still up and running.  If the previous AM does not need any more resources from the RM it could try to commit either tasks or jobs.  This could cause lots of problems where the second AM finishes and tries to commit too.  This could result in data corruption.  

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (MAPREDUCE-4832) MR AM can get in a split brain situation

Posted by "Robert Joseph Evans (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/MAPREDUCE-4832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Joseph Evans updated MAPREDUCE-4832:
-------------------------------------------

    Priority: Critical  (was: Major)
    
> MR AM can get in a split brain situation
> ----------------------------------------
>
>                 Key: MAPREDUCE-4832
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4832
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: applicationmaster
>    Affects Versions: 2.0.2-alpha, 0.23.5
>            Reporter: Robert Joseph Evans
>            Priority: Critical
>
> It is possible for a networking issue to happen where the RM thinks an AM has gone down and launches a replacement, but the previous AM is still up and running.  If the previous AM does not need any more resources from the RM it could try to commit either tasks or jobs.  This could cause lots of problems where the second AM finishes and tries to commit too.  This could result in data corruption.  

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (MAPREDUCE-4832) MR AM can get in a split brain situation

Posted by "Jason Lowe (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/MAPREDUCE-4832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13507341#comment-13507341 ] 

Jason Lowe commented on MAPREDUCE-4832:
---------------------------------------

MAPREDUCE-2702 is focused on FileOutputCommitter, but doesn't address the issue when that is not the committer being used.
                
> MR AM can get in a split brain situation
> ----------------------------------------
>
>                 Key: MAPREDUCE-4832
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4832
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: applicationmaster
>    Affects Versions: 2.0.2-alpha, 0.23.5
>            Reporter: Robert Joseph Evans
>            Priority: Critical
>
> It is possible for a networking issue to happen where the RM thinks an AM has gone down and launches a replacement, but the previous AM is still up and running.  If the previous AM does not need any more resources from the RM it could try to commit either tasks or jobs.  This could cause lots of problems where the second AM finishes and tries to commit too.  This could result in data corruption.  

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (MAPREDUCE-4832) MR AM can get in a split brain situation

Posted by "Sharad Agarwal (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/MAPREDUCE-4832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13507281#comment-13507281 ] 

Sharad Agarwal commented on MAPREDUCE-4832:
-------------------------------------------

MAPREDUCE-2702 introduced a job level commit. Task commit from competing AMs should not overstep each other.

Job level commit should ensure only the first one succeeds. Are you observing this in the cluster ?
                
> MR AM can get in a split brain situation
> ----------------------------------------
>
>                 Key: MAPREDUCE-4832
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4832
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: applicationmaster
>    Affects Versions: 2.0.2-alpha, 0.23.5
>            Reporter: Robert Joseph Evans
>            Priority: Critical
>
> It is possible for a networking issue to happen where the RM thinks an AM has gone down and launches a replacement, but the previous AM is still up and running.  If the previous AM does not need any more resources from the RM it could try to commit either tasks or jobs.  This could cause lots of problems where the second AM finishes and tries to commit too.  This could result in data corruption.  

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (MAPREDUCE-4832) MR AM can get in a split brain situation

Posted by "Jason Lowe (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/MAPREDUCE-4832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13507684#comment-13507684 ] 

Jason Lowe commented on MAPREDUCE-4832:
---------------------------------------

One way to fix this is to leverage the RM as a central authority in detecting a split brain situation.  The RM will already tell AMs to die, via AMResponse.getReboot(), if they are not a valid app attempt.  We could use a concept of a "commit window" indicating a relatively short amount of time.  This window should be large enough to encompass how long the AM will take to respond to a heartbeat and turn around and start committing.  This should be at least as long as the AM heartbeat interval but not a lot longer (e.g.: 10 seconds).  Here's a walkthrough of how it would work for a task (job commit is similar):

# Task requests to commit, AM tells it to hold off for now
# AM writes a task-starting-commit event to job history file and ensures it is flushed to disk before progressing to next step
# Next time task asks to commit, AM checks the last time it received a valid heartbeat response from the RM
#* If last heartbeat was within the commit window then the AM responds that the task can start committing
#* If last heartbeat was outside the commit window then the AM responds that the task must hold off for now while it waits to receive a valid heartbeat from the RM to verify it is still a valid app attempt

And here's how subsequent attempts would handle recovery:

# AM waits for the duration of the commit window before reading the previous attempt's history file to allow any potential lingering task-starting-commit messages to be written to the job history file
# If the history file shows a task started committing but did not complete then we treat this as if the task commit failed, i.e.: we fail the task
# If the history file shows the job started committing but did not complete then we treat this as if the job commit failed, i.e.: we fail the job

The initial wait during recovery could be reduced if the RM told the attempt when the last time it heard a valid heartbeat from a prior attempt.  Then the subsequent attempt could subtract this amount of time from the commit window (which in many cases would eliminate the need to wait at all).

There's some performance concerns around sync'ing task-starting-commmit events, although we could play tradeoff games where we delay sync'ing a bit in hopes other tasks will want to commit as well so we can batch them together and amortize the sync cost.  However I do think that we need to ensure a task-is-committing-so-do-not-repeat-it marker needs to be persisted somewhere before progressing otherwise we could double-commit.  Only way we might be able to avoid that is if the committer interface allowed the AM to determine that it commits in such a way where double-commits are not a concern.  The task-starting-commit events could be skipped in that case, but that's a committer API change.
                
> MR AM can get in a split brain situation
> ----------------------------------------
>
>                 Key: MAPREDUCE-4832
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4832
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: applicationmaster
>    Affects Versions: 2.0.2-alpha, 0.23.5
>            Reporter: Robert Joseph Evans
>            Priority: Critical
>
> It is possible for a networking issue to happen where the RM thinks an AM has gone down and launches a replacement, but the previous AM is still up and running.  If the previous AM does not need any more resources from the RM it could try to commit either tasks or jobs.  This could cause lots of problems where the second AM finishes and tries to commit too.  This could result in data corruption.  

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira