Posted to mapreduce-issues@hadoop.apache.org by "Ravi Gummadi (Commented) (JIRA)" <ji...@apache.org> on 2011/11/21 12:48:51 UTC

[jira] [Commented] (MAPREDUCE-3440) Add tests for testing other NM components with disk failures

    [ https://issues.apache.org/jira/browse/MAPREDUCE-3440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13154126#comment-13154126 ] 

Ravi Gummadi commented on MAPREDUCE-3440:
-----------------------------------------

MAPREDUCE-3121 has tests for the basic functionality of DFIP:
* The node's LocalDirsHandlerService detects directory failures.
* The node is marked unhealthy when a major percentage of dirs go bad.
* The RM stops scheduling on the node when a major percentage of dirs go bad.

But more tests can be added to cover other components when disk failures happen. Here is the list mentioned by Vinod on MAPREDUCE-3121:
* Integration test: Run a MapReduce job (so that Shuffle is also verified), take some disks offline, run one more job, and verify that both apps pass.
* LogAggregation test: Verify that logs written on bad disks are ignored for aggregation (augment TestLogAggregationService) TODO:
* ContainerLaunch: Verify that
** new containers don't use bad directories (by testing the LOCAL_DIRS env in a custom map job).
** if a major percentage of disks turn bad,
*** the container should exit with a proper exit code (should be easy with a custom application).
*** localization for a resource fails.
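
For the LOCAL_DIRS check in the ContainerLaunch item, a rough sketch of what the custom map job could assert is below. This is only an illustration, not code from any patch: the class name LocalDirsCheck, the method dirsOnBadDisks, and the /disk1, /disk2 paths are all hypothetical, and it assumes that LOCAL_DIRS holds a comma-separated list of directories.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper (not part of Hadoop): finds LOCAL_DIRS entries that
// live under a disk the test has deliberately made bad.
public class LocalDirsCheck {

  // Assumption: localDirsEnv is a comma-separated list of directories.
  public static List<String> dirsOnBadDisks(String localDirsEnv,
                                            Set<String> badDiskRoots) {
    List<String> offending = new ArrayList<>();
    if (localDirsEnv == null) {
      return offending;
    }
    for (String entry : localDirsEnv.split(",")) {
      String dir = entry.trim();
      if (dir.isEmpty()) {
        continue;
      }
      for (String root : badDiskRoots) {
        // Drop a trailing slash so "/disk2/" and "/disk2" behave the same.
        String r = root.endsWith("/")
            ? root.substring(0, root.length() - 1) : root;
        // Match the root itself or anything below it, but not "/disk20".
        if (dir.equals(r) || dir.startsWith(r + "/")) {
          offending.add(dir);
          break;
        }
      }
    }
    return offending;
  }

  public static void main(String[] args) {
    // Inside a real custom map task the value would come from
    // System.getenv("LOCAL_DIRS"); fixed sample values are used here.
    String sampleEnv = "/disk1/nm-local/app_1,/disk2/nm-local/app_1";
    Set<String> badDisks = new HashSet<>(Arrays.asList("/disk2"));
    List<String> offending = dirsOnBadDisks(sampleEnv, badDisks);
    if (!offending.isEmpty()) {
      System.out.println("Container was given bad dirs: " + offending);
    }
  }
}
```

In the actual test, the map task would fail (or report through a counter) if the returned list is non-empty, and the test driver would assert that the job succeeded.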
                
> Add tests for testing other NM components with disk failures
> ------------------------------------------------------------
>
>                 Key: MAPREDUCE-3440
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3440
>             Project: Hadoop Map/Reduce
>          Issue Type: Test
>    Affects Versions: 0.23.0
>            Reporter: Ravi Gummadi
>
> Add more tests to test other components when disks fail.
