Posted to mapreduce-dev@hadoop.apache.org by Nitika Gupta <ng...@rocketfuel.com> on 2011/11/22 22:20:10 UTC

Issue : Hadoop mapreduce job to process S3 logs gets hung at INFO mapred.JobClient: map 0% reduce 0%

Hi All,

I am trying to run a mapreduce job to process the Amazon S3 logs.
However, the code hangs at INFO mapred.JobClient: map 0% reduce 0% and
does not even attempt to launch the tasks. The sample code for the job
setup is given below:

public int run(CommandLine cl) throws Exception
{
    Configuration conf = getConf();
    String inputPath = "";
    String outputPath = "";
    try
    {
        Job job = new Job(conf, "Dummy");
        job.setNumReduceTasks(0);
        job.setMapperClass(Mapper.class);
        inputPath = cl.getOptionValue("input"); // input is an s3n path
        outputPath = cl.getOptionValue("output");
        FileInputFormat.setInputPaths(job, inputPath);
        FileOutputFormat.setOutputPath(job, new Path(outputPath));
        _log.info("Input path set as " + inputPath);
        _log.info("Output path set as " + outputPath);
        job.waitForCompletion(true);
        return 0;
    }
    catch (Exception ex)
    {
        _log.error(ex);
        return 1;
    }
}

The above code works on the staging machine. However, it fails on the
production machine, which is the same as the staging machine but with
more capacity.

Job Run:
11/11/22 16:13:38 INFO Driver: Input path being processed is s3n://abc/yyyy/mm/dd/*
11/11/22 16:13:38 INFO Driver: Output path being processed is s3n://xyz/yyyy/mm/dd/00/
11/11/22 16:13:51 INFO mapred.FileInputFormat: Total input paths to process : 399
11/11/22 16:13:53 INFO mapred.JobClient: Running job: job_201111151645_14535
11/11/22 16:13:54 INFO mapred.JobClient:  map 0% reduce 0%

--- It hangs at this point.

Does anyone know what could be the possible reason for the error?

Thanks in advance!

Nitika

Re: Issue : Hadoop mapreduce job to process S3 logs gets hung at INFO mapred.JobClient: map 0% reduce 0%

Posted by Nitika Gupta <ng...@rocketfuel.com>.

Hi Deepak,
I checked the S3 bucket path and it is fine. Also, the output directory
does not exist.

Are there any other reasons why the mapred job might be hanging?

Thanks

Nitika


On Tue, Nov 22, 2011 at 9:42 PM, Deepak Sharma <de...@gmail.com> wrote:
> Hi Nitika,
> There are a few things to check if your mapred job is failing.
> Here are some:
>
> 1. Make sure the URL of the S3 bucket includes a terminating slash.
> 2. Make sure the output directory does not already exist.
>
> Thanks
> Deepak

Re: Issue : Hadoop mapreduce job to process S3 logs gets hung at INFO mapred.JobClient: map 0% reduce 0%

Posted by Deepak Sharma <de...@gmail.com>.

Hi Nitika,
There are a few things to check if your mapred job is failing.
Here are some:

1. Make sure the URL of the S3 bucket includes a terminating slash.
2. Make sure the output directory does not already exist.
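
The two checks above could be run before submitting the job. What follows is only a minimal sketch in plain Java: the class S3PathPreflight and its methods are hypothetical names invented for illustration and are not part of Hadoop. The existence test is passed in by the caller so the sketch runs without a cluster; in a real driver it would presumably be backed by something like Hadoop's FileSystem.exists(Path). Treating an input path that ends in a glob (*) as acceptable is also an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical pre-flight checks for the two suggestions above; not a Hadoop class. */
final class S3PathPreflight {

    private S3PathPreflight() {}

    /** Appends a terminating slash unless the path already ends with one or with a glob. */
    static String withTrailingSlash(String path) {
        if (path.endsWith("/") || path.endsWith("*")) {
            return path;
        }
        return path + "/";
    }

    /**
     * Returns a description of every problem found; an empty list means both checks passed.
     * outputExists is supplied by the caller (assumption: on a cluster it would query the
     * file system that holds the output path).
     */
    static List<String> check(String inputPath, String outputPath,
                              Predicate<String> outputExists) {
        List<String> problems = new ArrayList<>();
        // Check 1: both paths should end with a slash (or a glob, for the input).
        if (!inputPath.equals(withTrailingSlash(inputPath))) {
            problems.add("input path has no terminating slash: " + inputPath);
        }
        if (!outputPath.equals(withTrailingSlash(outputPath))) {
            problems.add("output path has no terminating slash: " + outputPath);
        }
        // Check 2: the output directory must not exist before the job starts.
        if (outputExists.test(outputPath)) {
            problems.add("output directory already exists: " + outputPath);
        }
        return problems;
    }
}
```

A driver could call S3PathPreflight.check(inputPath, outputPath, existsTest) before creating the job and log each returned problem instead of submitting. Note that these checks only rule out the two causes listed above; they do not by themselves explain a job that stays at map 0% reduce 0%.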

Thanks
Deepak



-- 
Deepak Sharma
http://www.linkedin.com/in/rikindia