Posted to mapreduce-user@hadoop.apache.org by Yaron Gonen <ya...@gmail.com> on 2013/09/17 12:09:37 UTC

MAP_INPUT_RECORDS counter in the reducer

Hi,
Is there a way for the reducer to get the total number of input records to
the map phase?
For example, I want the reducer to normalize a sum by dividing it by the
number of records. I tried getting the value of that counter by using the
line:

context.getCounter(Task.Counter.MAP_INPUT_RECORDS).getValue();

in the reducer code, but I got 0.

Thanks!
Yaron
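
A toy model of why that call can return 0, as far as I understand Hadoop's
counters: each task attempt increments its own private copy, and the framework
sums the copies at the job level. A reduce task never reads map input, so its
own MAP_INPUT_RECORDS stays at 0. The class and method names below are
hypothetical stand-ins written in plain Java, not Hadoop APIs.

```java
import java.util.HashMap;
import java.util.Map;

class CounterScopeSketch {
    static final String MAP_INPUT_RECORDS = "MAP_INPUT_RECORDS";

    /** Hypothetical stand-in for the counters one task attempt sees through its context. */
    static final class TaskCounters {
        private final Map<String, Long> values = new HashMap<>();

        void increment(String name, long delta) {
            values.merge(name, delta, Long::sum);
        }

        long get(String name) {
            return values.getOrDefault(name, 0L);
        }
    }

    /** Job-level view: the sum over every task's private counters. */
    static long jobTotal(String name, TaskCounters... tasks) {
        long total = 0;
        for (TaskCounters t : tasks) {
            total += t.get(name);
        }
        return total;
    }

    public static void main(String[] args) {
        TaskCounters map1 = new TaskCounters();
        TaskCounters map2 = new TaskCounters();
        TaskCounters reduce1 = new TaskCounters();

        // Only map tasks read input records, so only they touch this counter.
        map1.increment(MAP_INPUT_RECORDS, 3);
        map2.increment(MAP_INPUT_RECORDS, 1);

        System.out.println("reducer's own view: " + reduce1.get(MAP_INPUT_RECORDS));
        System.out.println("job-level total:    "
                + jobTotal(MAP_INPUT_RECORDS, map1, map2, reduce1));
    }
}
```

In this model the reducer's own view is 0 while the job-level total is 4, which
matches the symptom described above. Getting the real total into a reducer
therefore needs some path other than the reducer's own task counters.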

Re: MAP_INPUT_RECORDS counter in the reducer

Posted by Yaron Gonen <ya...@gmail.com>.
Hi again,
I've run into this link:
http://mail-archives.apache.org/mod_mbox/hadoop-mapreduce-user/201112.mbox/%3CCAFE9998.2FEF6%25evans@yahoo-inc.com%3E
Looks like a nice idea. Has anyone tried something similar?

Thanks
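
I can't vouch for exactly what the linked message proposes, but one pattern
that targets the same problem is usually called "order inversion": every mapper
emits its own record count under a reserved key that sorts before all real keys,
so the reducer has the total in hand before it sees the first real key. Below is
a plain-Java simulation for a single reducer. COUNT_KEY, kv(), shuffle() and
reduce() are hypothetical names for illustration, not Hadoop APIs.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class OrderInversionSketch {
    // Hypothetical reserved key; '\u0000' sorts before any ordinary text key.
    static final String COUNT_KEY = "\u0000MAP_INPUT_RECORDS";

    /** Convenience helper for building (key, value) pairs. */
    static Map.Entry<String, Double> kv(String key, double value) {
        return new AbstractMap.SimpleEntry<String, Double>(key, value);
    }

    /** Simulates the shuffle for one reducer: values grouped by key, keys sorted. */
    static TreeMap<String, List<Double>> shuffle(List<Map.Entry<String, Double>> mapOutput) {
        TreeMap<String, List<Double>> grouped = new TreeMap<>();
        for (Map.Entry<String, Double> e : mapOutput) {
            grouped.computeIfAbsent(e.getKey(), k -> new ArrayList<>()).add(e.getValue());
        }
        return grouped;
    }

    /** Simulates the reducer: the count key arrives first, then the real keys. */
    static Map<String, Double> reduce(TreeMap<String, List<Double>> grouped) {
        Map<String, Double> normalized = new TreeMap<>();
        double totalRecords = 0;
        for (Map.Entry<String, List<Double>> e : grouped.entrySet()) {
            double sum = 0;
            for (double v : e.getValue()) {
                sum += v;
            }
            if (e.getKey().equals(COUNT_KEY)) {
                totalRecords = sum; // sum of the per-mapper record counts
            } else {
                if (totalRecords == 0) {
                    throw new IllegalStateException("record count did not arrive first");
                }
                normalized.put(e.getKey(), sum / totalRecords);
            }
        }
        return normalized;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Double>> out = new ArrayList<>();
        // Mapper 1 read 3 records and reports that under the reserved key.
        out.add(kv("a", 2.0));
        out.add(kv("b", 4.0));
        out.add(kv("a", 2.0));
        out.add(kv(COUNT_KEY, 3.0));
        // Mapper 2 read 1 record.
        out.add(kv("b", 2.0));
        out.add(kv(COUNT_KEY, 1.0));

        System.out.println(reduce(shuffle(out)));
    }
}
```

With more than one reducer, each mapper would have to send its count to every
partition (for example through a custom partitioner), since each reducer needs
the full total. That part is not shown here.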


On Wed, Sep 18, 2013 at 4:46 PM, Shahab Yunus <sh...@gmail.com>wrote:

> Yes, you are correct that the copying phase starts while the maps are running
> and the reduce function is not called until everything is done, but aren't
> the Reduce tasks also already 'initialized' at this point? As far as I
> know (and I might be wrong), they will not have the map input records
> counter then, which was my point.
>
> Regards,
> Shahab
>
>
> On Tue, Sep 17, 2013 at 11:09 PM, Rahul Bhattacharjee <
> rahul.rec.dgp@gmail.com> wrote:
>
>> Shahab,
>>
>> One question - You mentioned - "In the normal configuration, the issue
>> here is that Reducers can start before all the Maps have finished so it is
>> not possible to get the number (or make sense of it even if you are able
>> to,)"
>>
>> I think reducers would start copying the data from the completed map
>> tasks, but will not start the actual reduce process until data from all
>> the mappers is pulled in.
>>
>> So the call to the counter that Yaron made might work if invoked from the
>> reduce method.
>>
>> Thanks,
>> Rahul
>>
>>
>>
>> On Wed, Sep 18, 2013 at 7:38 AM, java8964 java8964 <ja...@hotmail.com>wrote:
>>
>>> Or you do the calculation in the reducer close() method, even though I
>>> am not sure you can get the Mapper's count in the reducer.
>>>
>>> But even if you can't, here is what you can do:
>>> 1) Save the JobConf reference in your Mapper configure() method
>>> 2) Store the MAP_INPUT_RECORDS counter in the configuration object as
>>> your own property, in the close() method of the mapper
>>> 3) Retrieve that property in the reducer close() method, then you have
>>> both numbers at that time.
>>>
>>> Yong
>>>
>>> ------------------------------
>>> Date: Tue, 17 Sep 2013 09:49:06 -0400
>>> Subject: Re: MAP_INPUT_RECORDS counter in the reducer
>>> From: shahab.yunus@gmail.com
>>> To: user@hadoop.apache.org
>>>
>>>
>>> In the normal configuration, the issue here is that Reducers can start
>>> before all the Maps have finished, so it is not possible to get the number
>>> (or make sense of it even if you are able to).
>>>
>>> Having said that, you can specifically make sure that Reducers don't
>>> start until all your maps have completed. It will of course slow down your
>>> job. I don't know whether it will work with this option or not, but you can
>>> try (unless the experts already have some advice).
>>>
>>> Regards,
>>> Shahab
>>>
>>>
>>> On Tue, Sep 17, 2013 at 6:09 AM, Yaron Gonen <ya...@gmail.com>wrote:
>>>
>>> Hi,
>>> Is there a way for the reducer to get the total number of input records
>>> to the map phase?
>>> For example, I want the reducer to normalize a sum by dividing it by the
>>> number of records. I tried getting the value of that counter by using the
>>> line:
>>>
>>> context.getCounter(Task.Counter.MAP_INPUT_RECORDS).getValue();
>>>
>>> in the reducer code, but I got 0.
>>>
>>> Thanks!
>>> Yaron
>>>
>>>
>>>
>>
>

Re: MAP_INPUT_RECORDS counter in the reducer

Posted by Shahab Yunus <sh...@gmail.com>.
Yes, you are correct that the copying phase starts while the maps are running
and the reduce function is not called until everything is done, but aren't
the Reduce tasks also already 'initialized' at this point? As far as I
know (and I might be wrong), they will not have the map input records
counter then, which was my point.

Regards,
Shahab


On Tue, Sep 17, 2013 at 11:09 PM, Rahul Bhattacharjee <
rahul.rec.dgp@gmail.com> wrote:

> Shahab,
>
> One question - You mentioned - "In the normal configuration, the issue
> here is that Reducers can start before all the Maps have finished so it is
> not possible to get the number (or make sense of it even if you are able
> to,)"
>
> I think reducers would start copying the data from the completed map
> tasks, but will not start the actual reduce process until data from all
> the mappers is pulled in.
>
> So the call to the counter that Yaron made might work if invoked from the
> reduce method.
>
> Thanks,
> Rahul
>
>
>
> On Wed, Sep 18, 2013 at 7:38 AM, java8964 java8964 <ja...@hotmail.com>wrote:
>
>> Or you do the calculation in the reducer close() method, even though I am
>> not sure you can get the Mapper's count in the reducer.
>>
>> But even if you can't, here is what you can do:
>> 1) Save the JobConf reference in your Mapper configure() method
>> 2) Store the MAP_INPUT_RECORDS counter in the configuration object as
>> your own property, in the close() method of the mapper
>> 3) Retrieve that property in the reducer close() method, then you have
>> both numbers at that time.
>>
>> Yong
>>
>> ------------------------------
>> Date: Tue, 17 Sep 2013 09:49:06 -0400
>> Subject: Re: MAP_INPUT_RECORDS counter in the reducer
>> From: shahab.yunus@gmail.com
>> To: user@hadoop.apache.org
>>
>>
>> In the normal configuration, the issue here is that Reducers can start
>> before all the Maps have finished, so it is not possible to get the number
>> (or make sense of it even if you are able to).
>>
>> Having said that, you can specifically make sure that Reducers don't
>> start until all your maps have completed. It will of course slow down your
>> job. I don't know whether it will work with this option or not, but you can
>> try (unless the experts already have some advice).
>>
>> Regards,
>> Shahab
>>
>>
>> On Tue, Sep 17, 2013 at 6:09 AM, Yaron Gonen <ya...@gmail.com>wrote:
>>
>> Hi,
>> Is there a way for the reducer to get the total number of input records
>> to the map phase?
>> For example, I want the reducer to normalize a sum by dividing it by the
>> number of records. I tried getting the value of that counter by using the
>> line:
>>
>> context.getCounter(Task.Counter.MAP_INPUT_RECORDS).getValue();
>>
>> in the reducer code, but I got 0.
>>
>> Thanks!
>> Yaron
>>
>>
>>
>

Re: MAP_INPUT_RECORDS counter in the reducer

Posted by Rahul Bhattacharjee <ra...@gmail.com>.
Shahab,

One question - You mentioned - "In the normal configuration, the issue here
is that Reducers can start before all the Maps have finished so it is not
possible to get the number (or make sense of it even if you are able to,)"

I think reducers would start copying the data from the completed map
tasks, but will not start the actual reduce process until data from all
the mappers is pulled in.

So the call to the counter that Yaron made might work if invoked from the
reduce method.

Thanks,
Rahul



On Wed, Sep 18, 2013 at 7:38 AM, java8964 java8964 <ja...@hotmail.com>wrote:

> Or you do the calculation in the reducer close() method, even though I am
> not sure you can get the Mapper's count in the reducer.
>
> But even if you can't, here is what you can do:
> 1) Save the JobConf reference in your Mapper configure() method
> 2) Store the MAP_INPUT_RECORDS counter in the configuration object as your
> own property, in the close() method of the mapper
> 3) Retrieve that property in the reducer close() method, then you have
> both numbers at that time.
>
> Yong
>
> ------------------------------
> Date: Tue, 17 Sep 2013 09:49:06 -0400
> Subject: Re: MAP_INPUT_RECORDS counter in the reducer
> From: shahab.yunus@gmail.com
> To: user@hadoop.apache.org
>
>
> In the normal configuration, the issue here is that Reducers can start
> before all the Maps have finished, so it is not possible to get the number
> (or make sense of it even if you are able to).
>
> Having said that, you can specifically make sure that Reducers don't start
> until all your maps have completed. It will of course slow down your job. I
> don't know whether it will work with this option or not, but you can try
> (unless the experts already have some advice).
>
> Regards,
> Shahab
>
>
> On Tue, Sep 17, 2013 at 6:09 AM, Yaron Gonen <ya...@gmail.com>wrote:
>
> Hi,
> Is there a way for the reducer to get the total number of input records to
> the map phase?
> For example, I want the reducer to normalize a sum by dividing it by the
> number of records. I tried getting the value of that counter by using the
> line:
>
> context.getCounter(Task.Counter.MAP_INPUT_RECORDS).getValue();
>
> in the reducer code, but I got 0.
>
> Thanks!
> Yaron
>
>
>

Re: MAP_INPUT_RECORDS counter in the reducer

Posted by Rahul Bhattacharjee <ra...@gmail.com>.
Shahab,

One question - You mentioned - "In the normal configuration, the issue here
is that Reducers can start before all the Maps have finished so it is not
possible to get the number (or make sense of it even if you are able to,)"

I think , reducers would start copying the data form the completed map
tasks , but will not start the actual reduce process until data from all
the mappers are pulled in.

So , the call to the counter Yorn has made might work.If invoked from the
reduce method.

Thanks,
Rahul



On Wed, Sep 18, 2013 at 7:38 AM, java8964 java8964 <ja...@hotmail.com>wrote:

> Or you do the calculation in the reducer close() method, even though I am
> not sure in the reducer you can get the Mapper's count.
>
> But even you can't, here is what can do:
> 1) Save the JobConf reference in your Mapper conf metehod
> 2) Store the Map_INPUT_RECORDS counter in the configuration object as your
> own properties, in the close() method of the mapper
> 3) Retrieve that property in the reducer close() method, then you have
> both numbers at that time.
>
> Yong
>
> ------------------------------
> Date: Tue, 17 Sep 2013 09:49:06 -0400
> Subject: Re: MAP_INPUT_RECORDS counter in the reducer
> From: shahab.yunus@gmail.com
> To: user@hadoop.apache.org
>
>
> In the normal configuration, the issue here is that Reducers can start
> before all the Maps have finished so it is not possible to get the number
> (or make sense of it even if you are able to,)
>
> Having said that, you can specifically make sure that Reducers don't start
> until all your maps have completed. It will of course slow down your job. I
> don't know whether with this option it will work or not, but you can try
> (until experts have some advise already.)
>
> Regards,
> Shahab
>
>
> On Tue, Sep 17, 2013 at 6:09 AM, Yaron Gonen <ya...@gmail.com>wrote:
>
> Hi,
> Is there a way for the reducer to get the total number of input records to
> the map phase?
> For example, I want the reducer to normalize a sum by dividing it in the
> number of records. I tried getting the value of that counter by using the
> line:
>
> context.getCounter(Task.Counter.MAP_INPUT_RECORDS).getValue();
>
> in the reducer code, but I got 0.
>
> Thanks!
> Yaron
>
>
>

Re: MAP_INPUT_RECORDS counter in the reducer

Posted by Rahul Bhattacharjee <ra...@gmail.com>.
Shahab,

One question - You mentioned - "In the normal configuration, the issue here
is that Reducers can start before all the Maps have finished so it is not
possible to get the number (or make sense of it even if you are able to,)"

I think , reducers would start copying the data form the completed map
tasks , but will not start the actual reduce process until data from all
the mappers are pulled in.

So , the call to the counter Yorn has made might work.If invoked from the
reduce method.

Thanks,
Rahul



On Wed, Sep 18, 2013 at 7:38 AM, java8964 java8964 <ja...@hotmail.com>wrote:

> Or you do the calculation in the reducer close() method, even though I am
> not sure in the reducer you can get the Mapper's count.
>
> But even you can't, here is what can do:
> 1) Save the JobConf reference in your Mapper conf metehod
> 2) Store the Map_INPUT_RECORDS counter in the configuration object as your
> own properties, in the close() method of the mapper
> 3) Retrieve that property in the reducer close() method, then you have
> both numbers at that time.
>
> Yong
>
> ------------------------------
> Date: Tue, 17 Sep 2013 09:49:06 -0400
> Subject: Re: MAP_INPUT_RECORDS counter in the reducer
> From: shahab.yunus@gmail.com
> To: user@hadoop.apache.org
>
>
> In the normal configuration, the issue here is that Reducers can start
> before all the Maps have finished so it is not possible to get the number
> (or make sense of it even if you are able to,)
>
> Having said that, you can specifically make sure that Reducers don't start
> until all your maps have completed. It will of course slow down your job. I
> don't know whether with this option it will work or not, but you can try
> (until experts have some advise already.)
>
> Regards,
> Shahab
>
>
> On Tue, Sep 17, 2013 at 6:09 AM, Yaron Gonen <ya...@gmail.com>wrote:
>
> Hi,
> Is there a way for the reducer to get the total number of input records to
> the map phase?
> For example, I want the reducer to normalize a sum by dividing it in the
> number of records. I tried getting the value of that counter by using the
> line:
>
> context.getCounter(Task.Counter.MAP_INPUT_RECORDS).getValue();
>
> in the reducer code, but I got 0.
>
> Thanks!
> Yaron
>
>
>

RE: MAP_INPUT_RECORDS counter in the reducer

Posted by java8964 java8964 <ja...@hotmail.com>.
Or you do the calculation in the reducer close() method, even though I am not sure whether you can get the Mapper's count in the reducer.
But even if you can't, here is what you can do:
1) Save the JobConf reference in your Mapper's configure() method
2) Store the MAP_INPUT_RECORDS counter in the configuration object as your own property, in the close() method of the mapper
3) Retrieve that property in the reducer close() method, then you have both numbers at that time.
Yong
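
The store-then-retrieve idea in steps 2 and 3 can be sketched in plain Java. This is only an illustration under stated assumptions: java.util.Properties stands in for the Hadoop job configuration, and the class name RecordCountNormalizer and the property name example.map.input.records are hypothetical, not Hadoop built-ins. The sketch does not show how the value would travel from map tasks to reduce tasks; each Hadoop task works with its own copy of the job configuration, so that hand-off is the part that would need verifying on a real cluster.

```java
import java.util.Properties;

/** Hypothetical helper; Properties stands in for the Hadoop job configuration. */
final class RecordCountNormalizer {
    // Hypothetical property name chosen for this sketch, not a Hadoop built-in.
    static final String TOTAL_RECORDS_KEY = "example.map.input.records";

    /** Step 2: store the record count as a string property. */
    static void storeRecordCount(Properties conf, long mapInputRecords) {
        conf.setProperty(TOTAL_RECORDS_KEY, Long.toString(mapInputRecords));
    }

    /** Step 3: read the count back and divide the sum by it. */
    static double normalize(Properties conf, double sum) {
        String raw = conf.getProperty(TOTAL_RECORDS_KEY);
        if (raw == null) {
            throw new IllegalStateException("record count property is not set");
        }
        long total = Long.parseLong(raw);
        if (total <= 0) {
            // A count of 0 is exactly the symptom reported in this thread.
            throw new IllegalStateException("record count must be positive, got " + total);
        }
        return sum / total;
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        storeRecordCount(conf, 4L);
        System.out.println(normalize(conf, 10.0)); // prints 2.5
    }
}
```

Failing loudly on a missing or zero count is a deliberate choice here, since silently dividing by 0 would hide the problem the original question describes.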

Date: Tue, 17 Sep 2013 09:49:06 -0400
Subject: Re: MAP_INPUT_RECORDS counter in the reducer
From: shahab.yunus@gmail.com
To: user@hadoop.apache.org

In the normal configuration, the issue here is that Reducers can start before all the Maps have finished, so it is not possible to get the number (or make sense of it even if you are able to).

Having said that, you can specifically make sure that Reducers don't start until all your maps have completed. It will of course slow down your job. I don't know whether it will work with this option or not, but you can try (until the experts offer some advice).

Regards,
Shahab

On Tue, Sep 17, 2013 at 6:09 AM, Yaron Gonen <ya...@gmail.com> wrote:

Hi,
Is there a way for the reducer to get the total number of input records to the map phase?
For example, I want the reducer to normalize a sum by dividing it by the number of records. I tried getting the value of that counter by using the line:

context.getCounter(Task.Counter.MAP_INPUT_RECORDS).getValue();

in the reducer code, but I got 0.

Thanks!
Yaron

Re: MAP_INPUT_RECORDS counter in the reducer

Posted by Shahab Yunus <sh...@gmail.com>.
In the normal configuration, the issue here is that Reducers can start
before all the Maps have finished, so it is not possible to get the number
(or make sense of it even if you are able to).

Having said that, you can specifically make sure that Reducers don't start
until all your maps have completed. It will of course slow down your job. I
don't know whether it will work with this option or not, but you can try
(until the experts offer some advice).

Regards,
Shahab
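
A rough model of the "don't start reducers until all maps have completed" option, as a plain-Java sketch. The property name below is the Hadoop 1.x slow-start setting (in Hadoop 2 the equivalent is mapreduce.job.reduce.slowstart.completedmaps); the class SlowStartSketch and its method are hypothetical and only approximate the scheduling rule, they are not Hadoop code. Whether delaying the reducers makes the map counter readable from inside a reducer is not confirmed anywhere in this thread.

```java
/** Hypothetical, simplified model of the reduce slow-start threshold. */
final class SlowStartSketch {
    // Hadoop 1.x property: fraction of maps that must finish before reduces are scheduled.
    static final String SLOWSTART_KEY = "mapred.reduce.slowstart.completed.maps";

    /**
     * Returns true once enough map tasks have completed for reduce tasks to be
     * scheduled under the given fraction. This is a simplification of the real scheduler.
     */
    static boolean reducersMayStart(int completedMaps, int totalMaps, double slowstartFraction) {
        if (totalMaps <= 0) {
            return true; // nothing to wait for
        }
        return completedMaps >= slowstartFraction * totalMaps;
    }

    public static void main(String[] args) {
        // With the fraction set to 1.0, reducers wait for every map task.
        System.out.println(SLOWSTART_KEY + "=1.0");
        System.out.println(reducersMayStart(99, 100, 1.0));  // false
        System.out.println(reducersMayStart(100, 100, 1.0)); // true
    }
}
```

On a real job the setting would be applied on the job configuration before submission, for example by passing -D mapred.reduce.slowstart.completed.maps=1.0 on the command line when the driver uses ToolRunner.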


On Tue, Sep 17, 2013 at 6:09 AM, Yaron Gonen <ya...@gmail.com> wrote:

> Hi,
> Is there a way for the reducer to get the total number of input records to
> the map phase?
> For example, I want the reducer to normalize a sum by dividing it by the
> number of records. I tried getting the value of that counter by using the
> line:
>
> context.getCounter(Task.Counter.MAP_INPUT_RECORDS).getValue();
>
> in the reducer code, but I got 0.
>
> Thanks!
> Yaron
>
