Posted to mapreduce-issues@hadoop.apache.org by "Harsh J (JIRA)" <ji...@apache.org> on 2012/07/07 15:03:34 UTC
[jira] [Resolved] (MAPREDUCE-2635) Jobs hang indefinitely on failure.

     [ https://issues.apache.org/jira/browse/MAPREDUCE-2635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Harsh J resolved MAPREDUCE-2635.
--------------------------------

    Resolution: Cannot Reproduce

I'm still unable to reproduce this. Oozie does similar things, and we have never run into such a situation. At best, this was probably a local issue. If we run into this again someday and have logs, we can reopen it.

Thanks all!
                
> Jobs hang indefinitely on failure.
> ----------------------------------
>
>                 Key: MAPREDUCE-2635
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2635
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: jobtracker, task-controller, tasktracker
>    Affects Versions: 0.20.1, 0.20.2
>         Environment: Suse Linux cluster with 2 nodes. One running a jobtracker, namenode, datanode, tasktracker. Other running tasktracker, datanode.
>            Reporter: Sudharsan Sampath
>            Priority: Blocker
>
> Running the following example hangs the child job indefinitely:
> public class HaltCluster
> {
>   public static void main(String[] args) throws IOException
>   {
>     JobConf jobConf = new JobConf();
>     prepareConf(jobConf);
>     if (args != null && args.length > 0)
>     {
>       jobConf.set("callonceagain", args[0]);
>       jobConf.setMaxMapAttempts(1);
>       jobConf.setJobName("ParentJob");
>     }
>     JobClient.runJob(jobConf);
>   }
>   public static void prepareConf(JobConf jobConf)
>   {
>     jobConf.setJarByClass(HaltCluster.class);
>     jobConf.set("mapred.job.tracker", "<<jobtracker>>");
>     jobConf.set("fs.default.name", "<<hdfs>>");
>     MultipleInputs.addInputPath(jobConf, new Path("/ignore" + System.currentTimeMillis()), MyInputFormat.class);
>     jobConf.setJobName("ChildJob");
>     jobConf.setMapperClass(MyMapper.class);
>     jobConf.setOutputFormat(NullOutputFormat.class);
>     jobConf.setNumReduceTasks(0);
>   }
> }
> public class MyMapper implements Mapper<IntWritable, Text, NullWritable, NullWritable>
> {
>   JobConf myConf = null;
>   @Override
>   public void map(IntWritable arg0, Text arg1, OutputCollector<NullWritable, NullWritable> arg2, Reporter arg3) throws IOException
>   {
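>     if (myConf != null && "true".equals(myConf.get("callonceagain")))
>     {
>       // Parent job only: report status in the background so this task is not timed out,
>       // then submit the child job from inside the map task and wait for it.
>       startBackGroundReporting(arg3);
>       HaltCluster.main(new String[] {});
>     }
>     // Fail the map attempt unconditionally.
>     throw new RuntimeException("Throwing exception");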
>   }
>   private void startBackGroundReporting(final Reporter arg3)
>   {
>     Thread t = new Thread()
>     {
>       @Override
>       public void run()
>       {
>         while (true)
>         {
>           arg3.setStatus("Reporting to be alive at " + System.currentTimeMillis());
>         }
>       }
>     };
>     t.setDaemon(true);
>     t.start();
>   }
>   @Override
>   public void configure(JobConf arg0)
>   {
>     myConf = arg0;
>   }
>   @Override
>   public void close() throws IOException
>   {
>     // Nothing to clean up.
>   }
> }
> Run it using the following command:
> java -cp <<classpath>> HaltCluster true
> But if only one job is triggered, as in java -cp <<classpath>> HaltCluster,
> it fails after the maximum number of attempts and quits as expected.
> Also, when the jobs hang, running the child job once again makes it come out of the deadlock and completes all three jobs.
