Posted to common-dev@hadoop.apache.org by "Devaraj Das (JIRA)" <ji...@apache.org> on 2008/11/12 15:53:44 UTC
[jira] Updated: (HADOOP-2483) Large-scale reliability tests
[ https://issues.apache.org/jira/browse/HADOOP-2483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Devaraj Das updated HADOOP-2483:
--------------------------------
Component/s: mapred (was: test)
Fix Version/s: 0.20.0
Assignee: Devaraj Das
> Large-scale reliability tests
> -----------------------------
>
> Key: HADOOP-2483
> URL: https://issues.apache.org/jira/browse/HADOOP-2483
> Project: Hadoop Core
> Issue Type: Test
> Components: mapred
> Reporter: Arun C Murthy
> Assignee: Devaraj Das
> Fix For: 0.20.0
>
>
> The fact that we do not have any large-scale reliability tests bothers me. I'll be the first to admit that it isn't the easiest of tasks, but I'd like to start a discussion around this... especially given that the code-base is growing to the point where the interactions caused by small changes are very hard to predict.
> One of the simple scripts I run for every patch I work on does something very basic: it runs sort500 (or larger), randomly picks n tasktrackers from ${HADOOP_CONF_DIR}/conf/slaves, and kills them; a similar script kills and then restarts the tasktrackers (see the sketch after this quoted description).
> This helps check a fair number of reliability stories: lost tasktrackers, task failures, etc. Clearly this isn't enough to cover everything, but it's a start.
> Let's discuss - what do we do for HDFS? We need more for Map-Reduce!
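For illustration only (this sketch is not part of the original issue), a minimal version of the kill/restart script described above might look like the following. It assumes passwordless ssh to the slave hosts, that the slaves file is readable at $HADOOP_CONF_DIR/slaves, and that $HADOOP_HOME/bin/hadoop-daemon.sh can stop and start a tasktracker on each host; the script name, defaults, and paths are hypothetical.

#!/usr/bin/env python
# kill_tasktrackers.py -- hypothetical sketch of the fault-injection script
# described in HADOOP-2483: pick n tasktrackers at random from the slaves
# file and stop them (optionally restarting them) while a big sort job runs.
# Assumes passwordless ssh to the slaves and a working hadoop-daemon.sh.
import os
import random
import subprocess
import sys

HADOOP_CONF_DIR = os.environ.get("HADOOP_CONF_DIR", "/usr/local/hadoop/conf")
HADOOP_HOME = os.environ.get("HADOOP_HOME", "/usr/local/hadoop")

def read_slaves(conf_dir):
    """Return the list of slave hostnames from the slaves file."""
    with open(os.path.join(conf_dir, "slaves")) as f:
        return [line.strip() for line in f
                if line.strip() and not line.startswith("#")]

def tasktracker(host, action):
    """Stop or start the tasktracker on a remote host via ssh."""
    daemon = os.path.join(HADOOP_HOME, "bin", "hadoop-daemon.sh")
    subprocess.check_call(["ssh", host, daemon, action, "tasktracker"])

def main(n, restart):
    # Choose n random victims and stop (and optionally restart) each one.
    victims = random.sample(read_slaves(HADOOP_CONF_DIR), n)
    for host in victims:
        print("stopping tasktracker on %s" % host)
        tasktracker(host, "stop")
        if restart:
            print("restarting tasktracker on %s" % host)
            tasktracker(host, "start")

if __name__ == "__main__":
    # usage: kill_tasktrackers.py <n> [--restart]
    n = int(sys.argv[1]) if len(sys.argv) > 1 else 1
    main(n, restart="--restart" in sys.argv)

Running something like this while a sort500-scale job is in flight, and comparing job completion and re-executed tasks against an unperturbed run, is the kind of check the quoted description has in mind.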
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.