Posted to common-user@hadoop.apache.org by Nitika Gupta <ng...@rocketfuel.com> on 2011/12/02 01:57:34 UTC
Re: Issue : Hadoop mapreduce job to process S3 logs gets hung at INFO mapred.JobClient: map 0% reduce 0%

Regarding the tasktracker user logs, there is nothing interesting
there. That is exactly the problem: the tasktracker never picked up
the task that was assigned to it.
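
The check described above (looking for an attempt ID in a tasktracker log) can be sketched roughly as follows. This is only an illustration: the class name AttemptLogScan and its method are hypothetical and not part of Hadoop, and it assumes the tasktracker log is a plain text file readable from local disk.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not a Hadoop class): finds log lines that mention
// a given task attempt ID, e.g. the one reported by the jobtracker.
class AttemptLogScan {

    // Returns every line that contains the attempt ID as a substring.
    static List<String> linesMentioning(List<String> logLines, String attemptId) {
        List<String> hits = new ArrayList<>();
        for (String line : logLines) {
            if (line.contains(attemptId)) {
                hits.add(line);
            }
        }
        return hits;
    }

    // Usage: java AttemptLogScan <tasktracker-log-file> <attempt-id>
    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: AttemptLogScan <log-file> <attempt-id>");
            System.exit(2);
        }
        List<String> lines =
            Files.readAllLines(Paths.get(args[0]), StandardCharsets.UTF_8);
        List<String> hits = linesMentioning(lines, args[1]);
        if (hits.isEmpty()) {
            // Matches the symptom in this thread: the attempt never shows up.
            System.out.println("No mention of " + args[1] + " in " + args[0]);
        } else {
            for (String hit : hits) {
                System.out.println(hit);
            }
        }
    }
}
```

An empty result for an attempt that the jobtracker reports as assigned corresponds to the situation described here.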

Any idea why the mapper is not picking up the task?

Thanks

Nitika

On Mon, Nov 28, 2011 at 9:53 PM, Prashant Sharma
<pr...@gmail.com> wrote:
> Can you check your userlogs/xyz_attempt_xyz.log and also jobtracker and
> datanode logs.
>
> -P
>
> On Tue, Nov 29, 2011 at 4:17 AM, Nitika Gupta <ng...@rocketfuelinc.com> wrote:
>
>> Hi All,
>>
>> I am trying to run a mapreduce job to process the Amazon S3 logs.
>> However, the code hangs at INFO mapred.JobClient: map 0% reduce 0% and
>> does not even attempt to launch the tasks. The sample code for the job
>> setup is given below:
>>
>> public int run(CommandLine cl) throws Exception
>> {
>>     Configuration conf = getConf();
>>     String inputPath = "";
>>     String outputPath = "";
>>     try
>>     {
>>         Job job = new Job(conf, "Dummy");
>>         job.setNumReduceTasks(0); // map-only job
>>         job.setMapperClass(Mapper.class);
>>         inputPath = cl.getOptionValue("input"); // input is an s3n path
>>         outputPath = cl.getOptionValue("output");
>>         FileInputFormat.setInputPaths(job, inputPath);
>>         FileOutputFormat.setOutputPath(job, new Path(outputPath));
>>         _log.info("Input path set as " + inputPath);
>>         _log.info("Output path set as " + outputPath);
>>         job.waitForCompletion(true); // blocks until the job finishes
>>         return 0;
>>     }
>>     catch (Exception ex)
>>     {
>>         _log.error(ex);
>>         return 1;
>>     }
>> }
>>
>> The above code works on the staging machine. However, it fails on the
>> production machine, which is the same as the staging machine but with
>> more capacity.
>>
>> Job Run:
>> 11/11/22 16:13:38 INFO Driver: Input path being processed is
>> s3n://abc/yyyy/mm/dd/*
>> 11/11/22 16:13:38 INFO Driver: Output path being processed is
>> s3n://xyz/yyyy/mm/dd/00/
>> 11/11/22 16:13:51 INFO mapred.FileInputFormat: Total input paths to
>> process : 399
>> 11/11/22 16:13:53 INFO mapred.JobClient: Running job:
>> job_201111151645_14535
>> 11/11/22 16:13:54 INFO mapred.JobClient:  map 0% reduce 0%
>>
>> --- At this point, it hangs. The job submission goes fine, and I can
>> see messages in the jobtracker logs
>> showing that the task assignment happened. By that I mean the log says
>> " Adding task (MAP) 'attempt_201111262339_1974_r_000040_1' to tip
>> task_201111262339_1974_r_000040, for tracker
>> 'tracker_xx.xx.xx:localhost/127.0.0.1:47937' "
>> But if I go to the logs of the tasktracker to which the task was
>> assigned, I do not see any mention of this attempt, which suggests the
>> tasktracker did not pick up this task(?).
>> We are using the fair scheduler, in case that has something to do with it.
>>
>> I tried to check whether the issue is with the connection to S3. So,
>> I ran a distcp from S3 to HDFS and it went fine, which suggests
>> there are no connectivity issues.
>>
>> Does anyone know what could be the possible reason for the error?
>>
>> Thanks in advance!
>>
>> Nitika
>>