Posted to mapreduce-dev@hadoop.apache.org by "Sudharsan Sampath (JIRA)" <ji...@apache.org> on 2011/07/01 08:10:28 UTC
[jira] [Created] (MAPREDUCE-2635) Jobs hang indefinitely on failure.
Jobs hang indefinitely on failure.
----------------------------------
Key: MAPREDUCE-2635
URL: https://issues.apache.org/jira/browse/MAPREDUCE-2635
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: jobtracker, task-controller, tasktracker
Affects Versions: 0.20.2, 0.20.1
Environment: Suse Linux cluster with 2 nodes. One running a jobtracker, namenode, datanode, tasktracker. Other running tasktracker, datanode.
Reporter: Sudharsan Sampath
Priority: Blocker
Running the following example hangs the child job indefinitely.
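// HaltCluster.java
import java.io.IOException;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.lib.MultipleInputs;
import org.apache.hadoop.mapred.lib.NullOutputFormat;

public class HaltCluster
{
    public static void main(String[] args) throws IOException
    {
        JobConf jobConf = new JobConf();
        prepareConf(jobConf);
        // With an argument, this run is the parent job: it gets a single map
        // attempt, and the argument is passed on to the mapper.
        if (args != null && args.length > 0)
        {
            jobConf.set("callonceagain", args[0]);
            jobConf.setMaxMapAttempts(1);
            jobConf.setJobName("ParentJob");
        }
        // Blocks until the job completes.
        JobClient.runJob(jobConf);
    }

    // Settings shared by both jobs; the name stays "ChildJob" unless main() overrides it.
    public static void prepareConf(JobConf jobConf)
    {
        jobConf.setJarByClass(HaltCluster.class);
        jobConf.set("mapred.job.tracker", "<<jobtracker>>");
        jobConf.set("fs.default.name", "<<hdfs>>");
        // MyInputFormat is not included in this report.
        MultipleInputs.addInputPath(jobConf, new Path("/ignore" + System.currentTimeMillis()), MyInputFormat.class);
        jobConf.setJobName("ChildJob");
        jobConf.setMapperClass(MyMapper.class);
        jobConf.setOutputFormat(NullOutputFormat.class);
        jobConf.setNumReduceTasks(0);
    }
}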
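// MyMapper.java
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class MyMapper implements Mapper<IntWritable, Text, NullWritable, NullWritable>
{
    JobConf myConf = null;

    @Override
    public void map(IntWritable arg0, Text arg1, OutputCollector<NullWritable, NullWritable> arg2, Reporter arg3) throws IOException
    {
        // In the parent job, keep the task reporting as alive and launch the
        // child job from inside this map task. The empty argument array means
        // the child does not launch a further job.
        if (myConf != null && "true".equals(myConf.get("callonceagain")))
        {
            startBackGroundReporting(arg3);
            HaltCluster.main(new String[] {});
        }
        // Every map call ends by throwing, so each task attempt fails.
        throw new RuntimeException("Throwing exception");
    }

    // Starts a daemon thread that updates the task status in a tight loop.
    private void startBackGroundReporting(final Reporter arg3)
    {
        Thread t = new Thread()
        {
            @Override
            public void run()
            {
                while (true)
                {
                    arg3.setStatus("Reporting to be alive at " + System.currentTimeMillis());
                }
            }
        };
        t.setDaemon(true);
        t.start();
    }

    @Override
    public void configure(JobConf arg0)
    {
        myConf = arg0;
    }

    @Override
    public void close() throws IOException
    {
        // Nothing to clean up.
    }
}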
Run it using the following command:
java -cp <<classpath>> HaltCluster true
However, if only one job is triggered, as in
java -cp <<classpath>> HaltCluster
it fails after the maximum number of attempts and quits as expected.
Also, when the jobs hang, running the child job once again brings them out of the deadlock, and all three jobs complete.
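For anyone adapting the reproducer: its keep-alive thread calls setStatus in a tight loop with no pause and no way to stop it. A throttled, interruptible variant of the same pattern might look like the sketch below. This is only an illustration and is not claimed to affect the hang described here. StatusSink is a hypothetical stand-in for Hadoop's Reporter so that the snippet runs without a cluster, and the class name and the 10 ms interval are arbitrary assumptions.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class KeepAliveSketch
{
    /** Hypothetical stand-in for Reporter.setStatus; not part of Hadoop. */
    public interface StatusSink
    {
        void setStatus(String status);
    }

    /** Starts a daemon thread that reports every intervalMillis until it is interrupted. */
    public static Thread startBackgroundReporting(final StatusSink sink, final long intervalMillis)
    {
        Thread t = new Thread()
        {
            @Override
            public void run()
            {
                try
                {
                    while (!Thread.currentThread().isInterrupted())
                    {
                        sink.setStatus("Reporting to be alive at " + System.currentTimeMillis());
                        Thread.sleep(intervalMillis);
                    }
                }
                catch (InterruptedException e)
                {
                    // Interrupted: stop reporting and let the thread exit.
                }
            }
        };
        t.setDaemon(true);
        t.start();
        return t;
    }

    public static void main(String[] args) throws InterruptedException
    {
        final AtomicInteger calls = new AtomicInteger();
        Thread t = startBackgroundReporting(new StatusSink()
        {
            @Override
            public void setStatus(String status)
            {
                calls.incrementAndGet();
            }
        }, 10L);
        Thread.sleep(100L);
        t.interrupt();
        t.join();
        System.out.println("status updates sent: " + calls.get());
    }
}
```

In a mapper, the returned thread could be interrupted in a finally block around the long-running call, so that reporting stops as soon as that call returns or throws.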
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (MAPREDUCE-2635) Jobs hang indefinitely on failure.
Posted by "Harsh J (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAPREDUCE-2635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Harsh J resolved MAPREDUCE-2635.
--------------------------------
Resolution: Cannot Reproduce
I'm still unable to reproduce this. And Oozie does similar things, but never have we run into such a situation. At best, this was probably a local issue. If we run into this again someday and have logs, we can reopen it.
Thanks all!