Posted to common-user@hadoop.apache.org by murat migdisoglu <mu...@gmail.com> on 2012/06/04 14:22:27 UTC

What happens when I do not output anything from my mapper

Hi,
I have a small application with only a mapper class defined (no reducer,
no combiner).
Within the mapper class, an if condition decides whether I write anything
to the context.
If the condition does not match, I want the mapper to produce no output in
HDFS.
But apparently this does not work as I expected. Once I run my job, there is
one file per mapper in HDFS, each 87 bytes in size.

The if block that I'm using in the map method is as follows:
        if (ip == null || ip.equals(cip)) {
            Text value = new Text(mwrapper.toJson());
            word.set(ip);
            context.write(word, value);
        } else {
            log.info("ip not match [" + ip + "]");
        }
    }
}//end of mapper method

How can I achieve that? Does a mapper always need to produce output?

-- 
"Find a job you enjoy, and you'll never work a day in your life."
Confucius

Re: What happens when I do not output anything from my mapper

Posted by murat migdisoglu <mu...@gmail.com>.
Hi Devaraj,
Indeed, the -ls output in my previous email came from
SequenceFileOutputFormat, which writes its file signature into every file
it creates. Hence each file was 87 bytes. Hadoop was creating "empty" files
(in fact, files containing only the signature) until I started using
LazyOutputFormat.
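
For reference, the Hadoop call involved is
LazyOutputFormat.setOutputClass(job, SequenceFileOutputFormat.class), which
wraps the real output format so that a part file is only created when the
first record is written. The sketch below is not Hadoop code. It is a
minimal, stdlib-only Java model of the difference, and the names
LazyOutputSketch, SIGNATURE, runEager and runLazy are made up for
illustration.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class LazyOutputSketch {
    // Hypothetical placeholder for the signature a format writes when a file is opened.
    static final String SIGNATURE = "SIG";

    // Eager behaviour: the part file is opened (and its signature written)
    // when the task starts, whether or not any record is ever emitted.
    static Map<String, String> runEager(String partName, List<String> records) {
        Map<String, String> files = new TreeMap<>();
        StringBuilder out = new StringBuilder(SIGNATURE);
        for (String record : records) {
            out.append(record);
        }
        files.put(partName, out.toString());
        return files;
    }

    // Lazy behaviour: the part file is opened only when the first record is emitted,
    // so a task that emits nothing leaves no file behind.
    static Map<String, String> runLazy(String partName, List<String> records) {
        Map<String, String> files = new TreeMap<>();
        StringBuilder out = null;
        for (String record : records) {
            if (out == null) {
                out = new StringBuilder(SIGNATURE);
            }
            out.append(record);
        }
        if (out != null) {
            files.put(partName, out.toString());
        }
        return files;
    }

    public static void main(String[] args) {
        List<String> none = Collections.emptyList();
        System.out.println(runEager("part-m-00034", none)); // {part-m-00034=SIG}
        System.out.println(runLazy("part-m-00034", none));  // {}
        System.out.println(runLazy("part-m-00034", Arrays.asList("a", "b"))); // {part-m-00034=SIGab}
    }
}
```

In the eager case a mapper that writes nothing still leaves a
signature-only file, which matches the small part-m-* files seen in this
thread. In the lazy case no file appears at all.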

Regards
Murat


On Tue, Jun 5, 2012 at 7:22 AM, Devaraj k <de...@huawei.com> wrote:

> The output files should be 0 KB in size if you use
> FileOutputFormat/TextOutputFormat.
>
> I think your output format's writer is writing some metadata to those
> files. Can you check what data is present in those files?
>
> Can you tell me which output format you are using?
>
> Thanks
> Devaraj

RE: What happens when I do not output anything from my mapper

Posted by Devaraj k <de...@huawei.com>.
The output files should be 0 KB in size if you use FileOutputFormat/TextOutputFormat.

I think your output format's writer is writing some metadata to those files. Can you check what data is present in those files?

Can you tell me which output format you are using?
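
One way to do that check, as a rough sketch: copy a part file to the local disk (for example with hadoop fs -get) and look at its first bytes. A SequenceFile starts with the ASCII bytes 'S', 'E', 'Q' followed by a version byte, whereas an empty text output file has no bytes at all. The class and method names below are made up for illustration, and the version value in the demo is only an example.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

class SequenceFileMagicSketch {
    // True if the data starts with the 'S','E','Q' magic bytes plus a version byte.
    static boolean looksLikeSequenceFile(byte[] head) {
        return head.length >= 4 && head[0] == 'S' && head[1] == 'E' && head[2] == 'Q';
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            // No file given: run on in-memory examples instead.
            byte[] demo = {'S', 'E', 'Q', 6, 0}; // version byte value is illustrative
            System.out.println(looksLikeSequenceFile(demo));        // true
            System.out.println(looksLikeSequenceFile(new byte[0])); // false
            return;
        }
        // Reads the whole file, which is fine for the tiny files discussed here.
        byte[] content = Files.readAllBytes(Paths.get(args[0]));
        System.out.println(args[0] + ": " + content.length + " bytes, SequenceFile signature: "
                + looksLikeSequenceFile(content));
    }
}
```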

Thanks
Devaraj

________________________________________
From: murat migdisoglu [murat.migdisoglu@gmail.com]
Sent: Monday, June 04, 2012 6:18 PM
To: common-user@hadoop.apache.org
Subject: Re: What happens when I do not output anything from my mapper

Hi,
Thanks for your answer. After reading your emails, I decided to empty my
map method completely to see whether I could suppress the mapper's output
altogether, but it did not seem to work.
So, here is my map method:

    @Override
    public void map(ByteBuffer key, SortedMap<ByteBuffer, IColumn> columns, Context context)
            throws IOException, InterruptedException
    {

    }

When I execute hadoop fs -ls, I still see many small output files, as
follows:

-rw-r--r--   3 mmigdiso supergroup         87 2012-06-04 12:44 /user/mmigdiso/output/part-m-00034
-rw-r--r--   3 mmigdiso supergroup         87 2012-06-04 12:45 /user/mmigdiso/output/part-m-00037
-rw-r--r--   3 mmigdiso supergroup         87 2012-06-04 12:45 /user/mmigdiso/output/part-m-00039
-rw-r--r--   3 mmigdiso supergroup         87 2012-06-04 12:45 /user/mmigdiso/output/part-m-00040
-rw-r--r--   3 mmigdiso supergroup         87 2012-06-04 12:45 /user/mmigdiso/output/part-m-00042

Do you know whether I have to put something special in the context to
specify the "empty" output?

Regards
Murat




Re: What happens when I do not output anything from my mapper

Posted by murat migdisoglu <mu...@gmail.com>.
Hi,
Thanks for your answer. After reading your emails, I decided to empty my
map method completely to see whether I could suppress the mapper's output
altogether, but it did not seem to work.
So, here is my map method:

    @Override
    public void map(ByteBuffer key, SortedMap<ByteBuffer, IColumn> columns, Context context)
            throws IOException, InterruptedException
    {

    }

When I execute hadoop fs -ls, I still see many small output files, as
follows:

-rw-r--r--   3 mmigdiso supergroup         87 2012-06-04 12:44 /user/mmigdiso/output/part-m-00034
-rw-r--r--   3 mmigdiso supergroup         87 2012-06-04 12:45 /user/mmigdiso/output/part-m-00037
-rw-r--r--   3 mmigdiso supergroup         87 2012-06-04 12:45 /user/mmigdiso/output/part-m-00039
-rw-r--r--   3 mmigdiso supergroup         87 2012-06-04 12:45 /user/mmigdiso/output/part-m-00040
-rw-r--r--   3 mmigdiso supergroup         87 2012-06-04 12:45 /user/mmigdiso/output/part-m-00042

Do you know whether I have to put something special in the context to
specify the "empty" output?

Regards
Murat



On Mon, Jun 4, 2012 at 2:38 PM, Devaraj k <de...@huawei.com> wrote:

> Hi Murat,
>
> As Praveenesh explained, you can control the map outputs as you want.
>
> The map() function is called once for each input record, i.e. it is
> invoked multiple times with different inputs within the same mapper. You
> can add logging to the map function to check what is happening in it.
>
>
> Thanks
> Devaraj

RE: What happens when I do not output anything from my mapper

Posted by Devaraj k <de...@huawei.com>.
Hi Murat,

As Praveenesh explained, you can control the map outputs as you want. 

The map() function is called once for each input record, i.e. it is invoked multiple times with different inputs within the same mapper. You can add logging to the map function to check what is happening in it.
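
To make the "called once per record" point concrete, here is a small stdlib-only Java model. It is not the Hadoop Mapper class: MapLoopSketch, its run loop, and the filter condition are hypothetical stand-ins. Each call to map() may or may not emit something, and counting the calls plays the role of the logging suggested above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class MapLoopSketch {
    final List<String> emitted = new ArrayList<>();
    int mapCalls = 0;

    // Stand-in for map(): invoked once per record; emitting is optional.
    void map(String ip, String wanted) {
        mapCalls++;
        if (wanted.equals(ip)) {
            emitted.add(ip); // plays the role of context.write(...)
        }
    }

    // Stand-in for the framework loop that feeds every record of a split to map().
    void run(List<String> records, String wanted) {
        for (String record : records) {
            map(record, wanted);
        }
    }

    public static void main(String[] args) {
        MapLoopSketch task = new MapLoopSketch();
        task.run(Arrays.asList("10.0.0.1", "10.0.0.2", "10.0.0.1"), "10.0.0.1");
        // prints "3 calls, 2 records emitted"
        System.out.println(task.mapCalls + " calls, " + task.emitted.size() + " records emitted");
    }
}
```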
   

Thanks
Devaraj

________________________________________
From: praveenesh kumar [praveenesh@gmail.com]
Sent: Monday, June 04, 2012 5:57 PM
To: common-user@hadoop.apache.org
Subject: Re: What happens when I do not output anything from my mapper

You can control your map outputs based on any condition you want. I have
done that - it worked for me.
It could be a problem in your code that it's not working for you.
Can you please share your map code, or cross-check whether your conditions
are correct?

Regards,
Praveenesh


Re: What happens when I do not output anything from my mapper

Posted by praveenesh kumar <pr...@gmail.com>.
You can control your map outputs based on any condition you want. I have
done that - it worked for me.
It could be a problem in your code that it's not working for you.
Can you please share your map code, or cross-check whether your conditions
are correct?

Regards,
Praveenesh
