Posted to mapreduce-user@hadoop.apache.org by rabbit_cheng <ra...@126.com> on 2011/11/30 10:36:15 UTC

how to access a mapper counter in reducer

I have created a counter in the mapper to count something, and I want to get the counter's value in the reducer phase. The code segment is as follows:

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class MM extends Mapper<LongWritable, Text, Text, Text> {
    static enum TEST { pt }

    @Override
    public void map(LongWritable key, Text values, Context context) throws IOException, InterruptedException {
        // Increment the custom counter once for every input record.
        context.getCounter(TEST.pt).increment(1);
    }
}

public class KMeansReducer extends Reducer<Text, Text, Text, Text> {
    @Override
    protected void setup(Context context) throws IOException, InterruptedException {
        // Try to read the mapper's counter before processing any keys.
        long ptValue = context.getCounter(MM.TEST.pt).getValue();
    }
}
But what I get is always 0, i.e., the value of the variable ptValue is always 0.
Does anybody know how to access a mapper counter in a reducer?


Re: how to access a mapper counter in reducer

Posted by Robert Evans <ev...@yahoo-inc.com>.
Praveen,

http://wiki.apache.org/hadoop/HowToContribute

is a good place to help you get started with creating the patch.  Once you have code written I am happy to help review it.

--Bobby Evans

On 12/9/11 9:00 PM, "Praveen Sripati" <pr...@gmail.com> wrote:

Robert,

I will take a shot at it. I think it would be about writing a custom comparator and a partitioner, reading some config parameters and sending the counters as key/value pairs to the reducers. It shouldn't be that difficult.

If I am stuck, I will post in the forum. I will also know how to create a patch.

Regards,
Praveen


Re: how to access a mapper counter in reducer

Posted by Praveen Sripati <pr...@gmail.com>.
Robert,

I will take a shot at it. I think it would be about writing a custom
comparator and a partitioner, reading some config parameters and sending
the counters as key/value pairs to the reducers. It shouldn't be that
difficult.
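
A rough sketch of what such a partitioner and sort comparator could look like, assuming the mappers tag their summary records with a reserved key prefix (the "!SUMMARY!" prefix, the class names, and the key encoding below are only illustrative, not anything that exists in the framework):

import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.io.WritableComparator;
import org.apache.hadoop.mapreduce.Partitioner;

// Hypothetical convention: summary keys look like "!SUMMARY!<reducer-number>";
// ordinary keys never start with '!'.
public class SummaryAwarePartitioner extends Partitioner<Text, Text> {
    private static final String PREFIX = "!SUMMARY!";

    @Override
    public int getPartition(Text key, Text value, int numPartitions) {
        String k = key.toString();
        if (k.startsWith(PREFIX)) {
            // Send the summary record to the reducer encoded in the key itself.
            return Integer.parseInt(k.substring(PREFIX.length())) % numPartitions;
        }
        // Ordinary records fall back to plain hash partitioning.
        return (k.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }
}

// Sort comparator that forces the summary keys ahead of everything else,
// so each reducer sees its summary before any real data.
class SummaryFirstComparator extends WritableComparator {
    protected SummaryFirstComparator() {
        super(Text.class, true);
    }

    @Override
    public int compare(WritableComparable a, WritableComparable b) {
        boolean aSummary = a.toString().startsWith("!SUMMARY!");
        boolean bSummary = b.toString().startsWith("!SUMMARY!");
        if (aSummary != bSummary) {
            return aSummary ? -1 : 1;
        }
        return super.compare(a, b);
    }
}

They would be wired in with job.setPartitionerClass(SummaryAwarePartitioner.class) and job.setSortComparatorClass(SummaryFirstComparator.class). With a prefix like "!SUMMARY!", the default Text byte ordering already sorts the summary keys ahead of ordinary alphanumeric keys, so the explicit comparator mainly makes the intent explicit and guards against ordinary keys that happen to sort even lower.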

If I am stuck, I will post in the forum. I will also get to learn how to
create a patch.

Regards,
Praveen

On Thu, Dec 8, 2011 at 9:45 PM, Robert Evans <ev...@yahoo-inc.com> wrote:

>  Sorry I have not responded sooner I have had a number of fires at work
> to put out, and I haven’t been keeping up with the user mailing lists.  The
> code I did before was very specific to the task I was working on, and it
> was an ugly hack because I did not bother with the comparator, I already
> knew there was only a small predefined set of keys so I just output one set
> of metadata data for each key.
>
> I would be happy to put something like this into the map/reduce framework.
>  I have filed https://issues.apache.org/jira/browse/MAPREDUCE-3520 for
> this. I just don’t know when I will have the time to do that, especially
> with my work on the 0.23 release.  I’ll also talk to my management to see
> if they want to allow me to work on this during work, or if it will have to
> be in my spare time.  Please feel free to comment on the JIRA or vote for
> it if you feel that it is something that you want done.  Or if you feel
> comfortable helping out perhaps you could take a first crack at it.
>
> Thanks,
>
> Bobby Evans

Re: how to access a mapper counter in reducer

Posted by Robert Evans <ev...@yahoo-inc.com>.
Sorry I have not responded sooner; I have had a number of fires at work to put out, and I haven't been keeping up with the user mailing lists.  The code I did before was very specific to the task I was working on, and it was an ugly hack because I did not bother with the comparator; I already knew there was only a small predefined set of keys, so I just output one set of metadata for each key.

I would be happy to put something like this into the map/reduce framework.  I have filed https://issues.apache.org/jira/browse/MAPREDUCE-3520 for this. I just don't know when I will have the time to do that, especially with my work on the 0.23 release.  I'll also talk to my management to see if they want to allow me to work on this during work, or if it will have to be in my spare time.  Please feel free to comment on the JIRA or vote for it if you feel that it is something that you want done.  Or if you feel comfortable helping out perhaps you could take a first crack at it.

Thanks,

Bobby Evans

On 12/6/11 9:14 AM, "Mapred Learn" <ma...@gmail.com> wrote:

Hi Praveen,
Could you share here so that we can use ?

Thanks,

Sent from my iPhone



Re: how to access a mapper counter in reducer

Posted by Mapred Learn <ma...@gmail.com>.
Hi Praveen,
Could you share it here so that we can use it?

Thanks,

Sent from my iPhone

On Dec 6, 2011, at 6:29 AM, Praveen Sripati <pr...@gmail.com> wrote:

> Robert,
> 
> > I have made the above thing work.
> 
> Any plans to make it into the Hadoop framework. There had been similar queries about it in other forums also. Need any help testing/documenting or anything, please let me know.
> 
> Regards,
> Praveen

Re: how to access a mapper counter in reducer

Posted by Praveen Sripati <pr...@gmail.com>.
Robert,

> I have made the above thing work.

Are there any plans to make it into the Hadoop framework? There have been
similar queries about it in other forums as well. If you need any help with
testing, documenting, or anything else, please let me know.

Regards,
Praveen


Re: how to access a mapper counter in reducer

Posted by Robert Evans <ev...@yahoo-inc.com>.
Anurag,

The current set of counter APIs, from within a Map or Reduce process, is write only.  They are not intended to be used for reading data from other tasks; they are there for collecting statistics about the job as a whole.  If you use too many of them, the performance of the system as a whole can get very bad, because they are stored in memory on the JobTracker.  There is also the potential that a map task that has finished "successfully" can later fail if the node it is running on dies before all of the map output can be fetched by all of the reducers, which could result in a reducer reading counter data that is only partial or out of date.  You may be able to access the counters through the Job API, but I would not recommend it, and I think there may be some issues with security if you have security enabled, though I don't know for sure.

If you have an optimization that really needs summary data from each mapper in all reducers, then you should do it in a map/reduce way: output a special key/value pair for each reducer when a mapper finishes, with the statistics in it.  You can know how many reducers there are because that is set in the configuration.  You then need a special partitioner to recognize those summary key/value pairs and make sure that they each go to the proper reducer.  You also need a special comparator to make sure that these special keys are the very first ones read by the reducer, so it can have the data before processing anything else.
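
As an illustration only, the mapper side of that scheme might look roughly like the sketch below; the "!SUMMARY!" key prefix and the plain-text value encoding are invented here, and a matching partitioner and sort comparator (sketched earlier in this thread) are still needed so the summary pairs reach the right reducers first.

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class SummaryEmittingMapper extends Mapper<LongWritable, Text, Text, Text> {
    private long ptCount = 0;

    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        ptCount++;  // accumulate the statistic locally instead of in a counter
        // ... normal map output goes here ...
    }

    @Override
    protected void cleanup(Context context) throws IOException, InterruptedException {
        // When this map task finishes, send its summary to every reducer.
        // The number of reducers is known from the job configuration.
        int numReducers = context.getNumReduceTasks();
        for (int r = 0; r < numReducers; r++) {
            context.write(new Text("!SUMMARY!" + r), new Text(Long.toString(ptCount)));
        }
    }
}

Each reducer then adds up the summary values it receives, one per map task, before going on to its ordinary keys.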

I would also recommend that you don't try to store this data in HDFS.  You can very easily do a DDOS on the namenode on a large cluster, and then your ops will yell at you as they did with me before I stopped doing it.  I have made the above thing work.  It is just a lot of work to do it right.

--Bobby Evans


On 12/1/11 1:18 PM, "Markus Jelsma" <ma...@openindex.io> wrote:

Can access it via the Job API?

http://hadoop.apache.org/mapreduce/docs/current/api/org/apache/hadoop/mapreduce/Job.html#getCounters%28%29


Re: how to access a mapper counter in reducer

Posted by Markus Jelsma <ma...@openindex.io>.
Can you access it via the Job API?

http://hadoop.apache.org/mapreduce/docs/current/api/org/apache/hadoop/mapreduce/Job.html#getCounters%28%29
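
For completeness, a minimal driver-side sketch of that approach, with the job setup details assumed; note that the counter totals are only aggregated once the job has finished, so this helps a driver or a follow-up job, not a reducer that is still running:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Counters;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class CounterDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "counter demo");   // Job.getInstance(conf) in later releases
        job.setJarByClass(CounterDriver.class);
        job.setMapperClass(MM.class);
        job.setReducerClass(KMeansReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        // Counters from all map tasks are aggregated and readable only after completion.
        if (job.waitForCompletion(true)) {
            Counters counters = job.getCounters();
            long pt = counters.findCounter(MM.TEST.pt).getValue();
            System.out.println("TEST.pt = " + pt);
        }
    }
}

Inside a running reducer, by contrast, context.getCounter() only refers to that task's own counters, which is why ptValue in the original code is always 0.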

> Hi,
> I have a similar query.
> 
> Infact, I sent it yesterday and waiting for anybody's response who might
> have done it.
> 
> 
> Thanks,
> Anurag Tangri
> 
> 2011/11/30 rabbit_cheng <ra...@126.com>

Re: how to access a mapper counter in reducer

Posted by Mapred Learn <ma...@gmail.com>.
Hi,
I have a similar query.

In fact, I sent it yesterday and am waiting for a response from anybody who
might have done it.


Thanks,
Anurag Tangri

2011/11/30 rabbit_cheng <ra...@126.com>

>  I have created a counter in mapper to count something, I wanna get the
> counter's value in reducer phase, the code segment is as follow:
>
> public class MM extends Mapper<LongWritable, Text, Text, Text> {
>     static enum TEST{ pt }
>     @Override
>     public void map(LongWritable key, Text values, Context context) throws
> IOException, InterruptedException {
>         context.getCounter(TEST.pt).increment(1);
>     }
> }
> public class KMeansReducer extends Reducer<Text, Text, Text, Text> {
>     @Override
>     protected void setup(Context context) throws IOException,
> InterruptedException {
>         long ptValue=context.getCounter(MM.TEST.pt).getValue();
>     }
> }
> but what I get is always 0, i.e., the value of variable ptValue is always
> 0.
> Does anybody know how to access a mapper counter in reducer?