Posted to common-user@hadoop.apache.org by murat migdisoglu <mu...@gmail.com> on 2012/06/04 17:42:29 UTC

Re: What happens when I do not output anything from my mapper - Solution

Ok,
For anyone else facing this problem, here is how I solved it.
First of all, there is a JIRA issue for it on Hadoop:
https://issues.apache.org/jira/browse/HADOOP-4927

and
http://hadoop.apache.org/mapreduce/docs/current/mapred_tutorial.html#Lazy+Output+Creation
explains how to solve that.

So Hadoop does indeed create empty part-00x files regardless of what you do
in the mapper class.

To avoid that, call the following static method of LazyOutputFormat:
LazyOutputFormat.setOutputFormatClass(job, SequenceFileOutputFormat.class);

Be aware that, in my experience, this method should be called after you set
the output format class:
 job.setOutputFormatClass(SequenceFileOutputFormat.class);
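
To illustrate why this helps, here is a minimal sketch of the idea behind
lazy output creation, using only the Java standard library. It is not
Hadoop's implementation: LazyWriter and LazyWriterDemo are hypothetical names
made up for this example, and the part-m-0000x paths are just temporary files.
The point is only that the file is opened on the first write, so a task that
writes nothing leaves no file behind.

```java
import java.io.BufferedWriter;
import java.io.Closeable;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper (not a Hadoop class): opens its file only on the first write. */
final class LazyWriter implements Closeable {
    private final Path path;
    private BufferedWriter out; // stays null until something is written

    LazyWriter(Path path) {
        this.path = path;
    }

    void write(String key, String value) throws IOException {
        if (out == null) {
            // The file is created here, not in the constructor.
            out = Files.newBufferedWriter(path, StandardCharsets.UTF_8);
        }
        out.write(key + "\t" + value);
        out.newLine();
    }

    @Override
    public void close() throws IOException {
        // Nothing to close if no record was ever written.
        if (out != null) {
            out.close();
        }
    }
}

class LazyWriterDemo {
    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("lazy-output");
        Path unused = dir.resolve("part-m-00000");
        Path used = dir.resolve("part-m-00001");

        try (LazyWriter w = new LazyWriter(unused)) {
            // This "task" writes nothing, so no file should appear.
        }
        try (LazyWriter w = new LazyWriter(used)) {
            w.write("10.0.0.1", "{}");
        }

        System.out.println(unused.getFileName() + " exists: " + Files.exists(unused));
        System.out.println(used.getFileName() + " exists: " + Files.exists(used));
    }
}
```

In this sketch only part-m-00001 ends up on disk. An eager writer that opened
its file in the constructor would leave both files behind, which is the
behaviour I was seeing with the plain output format.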


On Mon, Jun 4, 2012 at 2:48 PM, murat migdisoglu <murat.migdisoglu@gmail.com> wrote:

> Hi,
> Thanks for your answer. After reading your emails, I emptied my mapper
> method completely to see whether I could disable the mapper's output
> altogether, but that did not work.
> Here is my mapper method:
>
>     @Override
>     public void map(ByteBuffer key, SortedMap<ByteBuffer, IColumn> columns, Context context)
>             throws IOException, InterruptedException
>     {
>
>     }
>
> When I execute hadoop fs -ls, I still see many small output files, such as
> the following:
>
> -rw-r--r--   3 mmigdiso supergroup         87 2012-06-04 12:44 /user/mmigdiso/output/part-m-00034
> -rw-r--r--   3 mmigdiso supergroup         87 2012-06-04 12:45 /user/mmigdiso/output/part-m-00037
> -rw-r--r--   3 mmigdiso supergroup         87 2012-06-04 12:45 /user/mmigdiso/output/part-m-00039
> -rw-r--r--   3 mmigdiso supergroup         87 2012-06-04 12:45 /user/mmigdiso/output/part-m-00040
> -rw-r--r--   3 mmigdiso supergroup         87 2012-06-04 12:45 /user/mmigdiso/output/part-m-00042
>
> Do you know whether I have to put something special into the context to
> specify an "empty" output?
>
> Regards
> Murat
>
>
>
>
> On Mon, Jun 4, 2012 at 2:38 PM, Devaraj k <de...@huawei.com> wrote:
>
>> Hi Murat,
>>
>> As Praveenesh explained, you can control the map outputs as you want.
>>
>> The map() function is called once for each input record, i.e. map() is
>> invoked multiple times with different inputs within the same mapper. You
>> can add logging to the map function to check what is happening in it.
>>
>>
>> Thanks
>> Devaraj
>>
>> ________________________________________
>> From: praveenesh kumar [praveenesh@gmail.com]
>> Sent: Monday, June 04, 2012 5:57 PM
>> To: common-user@hadoop.apache.org
>> Subject: Re: What happens when I do not output anything from my mapper
>>
>> You can control your map outputs based on any condition you want. I have
>> done that - it worked for me.
>> The problem may be in your code.
>> Could you please share your map code, or cross-check whether your
>> conditions are correct?
>>
>> Regards,
>> Praveenesh
>>
>> On Mon, Jun 4, 2012 at 5:52 PM, murat migdisoglu <murat.migdisoglu@gmail.com> wrote:
>>
>> > Hi,
>> > I have a small application where only a mapper class is defined (no
>> > reducer, no combiner).
>> > Within the mapper class, I have an if condition that decides whether
>> > to put something in the context or not.
>> > If the condition does not match, I want the mapper not to write any
>> > output to HDFS.
>> > But apparently this does not work as I expected. When I run my job, I
>> > get one file per mapper in HDFS, each 87 bytes in size.
>> >
>> > The if block that I'm using in the map method is as follows:
>> > if (ip == null || ip.equals(cip)) {
>> >     Text value = new Text(mwrapper.toJson());
>> >     word.set(ip);
>> >     context.write(word, value);
>> > } else {
>> >     log.info("ip not match [" + ip + "]");
>> > }
>> > }
>> > }//end of mapper method
>> >
>> > How can I achieve that? Does a mapper always need to produce output?
>> >
>> >
>>
>


-- 
"Find a job you enjoy, and you'll never work a day in your life."
Confucius