Posted to mapreduce-dev@hadoop.apache.org by "Harsh J (JIRA)" <ji...@apache.org> on 2012/07/07 15:03:34 UTC
[jira] [Resolved] (MAPREDUCE-2635) Jobs hang indefinitely on failure.
[ https://issues.apache.org/jira/browse/MAPREDUCE-2635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Harsh J resolved MAPREDUCE-2635.
--------------------------------
Resolution: Cannot Reproduce
I'm still unable to reproduce this. Oozie does similar things, and we have never run into such a situation there. Most likely this was a local issue. If we run into it again someday and have logs, we can reopen it.
Thanks all!
> Jobs hang indefinitely on failure.
> ----------------------------------
>
> Key: MAPREDUCE-2635
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2635
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: jobtracker, task-controller, tasktracker
> Affects Versions: 0.20.1, 0.20.2
> Environment: SUSE Linux cluster with 2 nodes. One runs the jobtracker, namenode, a datanode and a tasktracker. The other runs a tasktracker and a datanode.
> Reporter: Sudharsan Sampath
> Priority: Blocker
>
> Running the following example hangs the child job indefinitely.
> public class HaltCluster
> {
>     public static void main(String[] args) throws IOException
>     {
>         JobConf jobConf = new JobConf();
>         prepareConf(jobConf);
>         if (args != null && args.length > 0)
>         {
>             jobConf.set("callonceagain", args[0]);
>             jobConf.setMaxMapAttempts(1);
>             jobConf.setJobName("ParentJob");
>         }
>         JobClient.runJob(jobConf);
>     }
>     public static void prepareConf(JobConf jobConf)
>     {
>         jobConf.setJarByClass(HaltCluster.class);
>         jobConf.set("mapred.job.tracker", "<<jobtracker>>");
>         jobConf.set("fs.default.name", "<<hdfs>>");
>         MultipleInputs.addInputPath(jobConf, new Path("/ignore" + System.currentTimeMillis()), MyInputFormat.class);
>         jobConf.setJobName("ChildJob");
>         jobConf.setMapperClass(MyMapper.class);
>         jobConf.setOutputFormat(NullOutputFormat.class);
>         jobConf.setNumReduceTasks(0);
>     }
> }
> public class MyMapper implements Mapper<IntWritable, Text, NullWritable, NullWritable>
> {
>     JobConf myConf = null;
>     @Override
>     public void map(IntWritable arg0, Text arg1, OutputCollector<NullWritable, NullWritable> arg2, Reporter arg3) throws IOException
>     {
>         if (myConf != null && "true".equals(myConf.get("callonceagain")))
>         {
>             startBackGroundReporting(arg3);
>             HaltCluster.main(new String[] {});
>         }
>         throw new RuntimeException("Throwing exception");
>     }
>     private void startBackGroundReporting(final Reporter arg3)
>     {
>         Thread t = new Thread()
>         {
>             @Override
>             public void run()
>             {
>                 while (true)
>                 {
>                     arg3.setStatus("Reporting to be alive at " + System.currentTimeMillis());
>                 }
>             }
>         };
>         t.setDaemon(true);
>         t.start();
>     }
>     @Override
>     public void configure(JobConf arg0)
>     {
>         myConf = arg0;
>     }
>     @Override
>     public void close() throws IOException
>     {
>         // TODO Auto-generated method stub
>     }
> }
> Run it using the following command:
> java -cp <<classpath>> HaltCluster true
> But if only one job is triggered, as in
> java -cp <<classpath>> HaltCluster
> it fails after the maximum number of attempts and quits as expected.
> Also, when the jobs hang, running the child job once more breaks the deadlock, and all three jobs complete.
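
When chasing a hang like the one described above, it can help to wait for the job with a deadline instead of blocking in JobClient.runJob, which returns only once the job finishes. The sketch below is not from the report and makes several assumptions: JobHandle is a hypothetical stand-in for the handle returned at submission (in the old mapred API, JobClient.submitJob returns a RunningJob that offers isComplete(), isSuccessful() and killJob()), and the timeout and poll interval are illustrative values chosen by the caller.

```java
import java.util.concurrent.TimeUnit;

/** Sketch only: poll a submitted job and give up once a deadline passes. */
final class BoundedJobWait {

    /** Hypothetical stand-in for a job handle; not a Hadoop type. */
    interface JobHandle {
        boolean isComplete();
        boolean isSuccessful();
        void kill();
    }

    enum Outcome { SUCCEEDED, FAILED, TIMED_OUT, INTERRUPTED }

    /**
     * Polls the job every pollMillis until it completes or timeoutMillis
     * elapses. On timeout the job is killed so it stops holding resources.
     */
    static Outcome waitFor(JobHandle job, long timeoutMillis, long pollMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (!job.isComplete()) {
            // Overflow-safe comparison against the nanoTime deadline.
            if (System.nanoTime() - deadline >= 0) {
                job.kill();
                return Outcome.TIMED_OUT;
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                // Preserve the interrupt for the caller and stop waiting.
                Thread.currentThread().interrupt();
                return Outcome.INTERRUPTED;
            }
        }
        return job.isSuccessful() ? Outcome.SUCCEEDED : Outcome.FAILED;
    }
}
```

A caller would wrap the real job handle in a JobHandle adapter and log the returned Outcome. A TIMED_OUT result then marks the moment to collect jobtracker and tasktracker logs, rather than leaving the client blocked with no record of when the hang began.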