Posted to user@hadoop.apache.org by Prasad GS <gs...@gmail.com> on 2012/11/06 11:45:00 UTC

combiner/reducer context in java class

Hi,

I'm setting my combiner and reducer to the same Java class. Is there any
API that could tell me the context in which the class is running after
the Hadoop job is submitted to the cluster, i.e., whether the class is
running as a combiner or a reducer? I need this information to change the
OutputCollector in the class. Also, I do not want to duplicate the same
code as combiner and reducer with only the OutputCollector changed.

Thanks,
Prasad

Re: combiner/reducer context in java class

Posted by Harsh J <ha...@cloudera.com>.

Hi Bertrand,

I believe the framework does give a few combiner statistics of its own
(like in/out records and such). If your combiner class is separate,
then instantiating counters in it with apt naming should address the
need, since the class itself will be separately instantiated.

Even if we looked at the task ID, it's currently hard to tell whether
the class is running in combiner mode or not. I can only think of hacky
ways, like polling from within to see if the combiner input records
counter is changing with each call (then it's a combiner) or remains
as-is (then it's a reducer). The separate-class way is much more elegant
here, since you do want a difference in behavior, and you have
inheritance at your disposal to prevent duplication.
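
To make the separate-class idea concrete, here is a minimal sketch in plain Java. It deliberately avoids the Hadoop API: `Counters` is a hypothetical stand-in for the framework's counter facility, and `SummingBase`, `SummingCombiner` and `SummingReducer` are made-up names. The point is only that each role gets its own counter names while the aggregation code exists once.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the framework's counters (not the Hadoop API).
class Counters {
    private final Map<String, Long> values = new HashMap<>();

    void increment(String name, long amount) {
        values.merge(name, amount, Long::sum);
    }

    long get(String name) {
        return values.getOrDefault(name, 0L);
    }
}

// Shared aggregation logic; subclasses only say which role they play.
abstract class SummingBase {
    protected abstract String role();

    long reduce(String key, List<Long> values, Counters counters) {
        long sum = 0;
        for (long v : values) {
            sum += v;
        }
        // Counter names are prefixed with the role, so combiner and
        // reducer statistics are never aggregated together.
        counters.increment(role() + ".input.records", values.size());
        counters.increment(role() + ".output.records", 1);
        return sum;
    }
}

class SummingCombiner extends SummingBase {
    @Override
    protected String role() {
        return "combine";
    }
}

class SummingReducer extends SummingBase {
    @Override
    protected String role() {
        return "reduce";
    }
}
```

In a real job the two subclasses would be the ones registered as combiner and reducer respectively, so no runtime detection of the role is needed.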

On Tue, Nov 6, 2012 at 5:12 PM, Bertrand Dechoux <de...@gmail.com> wrote:
> I agree that the behaviour shouldn't be dynamically changed at runtime with
> regard to the class being used as a Combiner or a Reducer, but someone may
> want to produce counters in order to have an overview of what is happening
> (sanity check). In that case you would really like to avoid aggregating the
> same counters between the Combiner and the Reducer. How would someone do
> that? I.e., you can introduce a combine/reduce keyword in the counter names, but
> how would you detect which instantiation is used in which case? I guess
> it might be possible somehow with the task name. Is there a better way?
>
> But if you look at the JobTracker counters summary, there is a distinction
> between map and reduce values. Maybe it is enough in this case? (I have
> never used counters inside a combiner, so I don't know.)
>
> Regards
>
> Bertrand
>
>
> On Tue, Nov 6, 2012 at 12:29 PM, Harsh J <ha...@cloudera.com> wrote:
>>
>> Hi Prasad,
>>
>> My reply inline.
>>
>> On Tue, Nov 6, 2012 at 4:15 PM, Prasad GS <gs...@gmail.com> wrote:
>> > Hi,
>> >
>> > I'm setting my combiner and reducer to the same java class. Is there any
>> > API
>> > that could tell me the context in which the java class is running after
>> > the
>> > hadoop job is submitted to the cluster i.e whether the class is running
>> > as a
>> > combiner or a reducer.
>>
>> A combiner may run both at the map end and at the reduce end. Even if
>> it is possible to do it, it isn't a healthy idea to have the method's
>> logic detect whether it's running as a reducer or as a combiner.
>>
>> > I need this information to change the OutputCollector
>> > in the java class. Also I do not want to duplicate the same code as
>> > combiner
>> > and reducer with only the OutputCollector changed.
>>
>> Why do you think it would require duplication? Your logic can be built
>> in smaller, independent, reusable functions within the same class, and
>> just applied differently for an implementation of Reducer class and an
>> implementation of the Combiner class. This way, you repeat nothing.
>>
>> > Thanks,
>> > Prasad
>>
>>
>>
>> --
>> Harsh J
>
>
>
>
> --
> Bertrand Dechoux



-- 
Harsh J

Re: combiner/reducer context in java class

Posted by Bertrand Dechoux <de...@gmail.com>.

I agree that the behaviour shouldn't be dynamically changed at runtime with
regard to the class being used as a Combiner or a Reducer, but someone may
want to produce counters in order to have an overview of what is happening
(sanity check). In that case you would really like to avoid aggregating the
same counters between the Combiner and the Reducer. How would someone do
that? I.e., you can introduce a combine/reduce keyword in the counter names,
but how would you detect which instantiation is used in which case? I guess
it might be possible somehow with the task name. Is there a better way?

But if you look at the JobTracker counters summary, there is a distinction
between map and reduce values. Maybe it is enough in this case? (I have
never used counters inside a combiner, so I don't know.)

Regards

Bertrand

On Tue, Nov 6, 2012 at 12:29 PM, Harsh J <ha...@cloudera.com> wrote:

> Hi Prasad,
>
> My reply inline.
>
> On Tue, Nov 6, 2012 at 4:15 PM, Prasad GS <gs...@gmail.com> wrote:
> > Hi,
> >
> > I'm setting my combiner and reducer to the same java class. Is there any
> API
> > that could tell me the context in which the java class is running after
> the
> > hadoop job is submitted to the cluster i.e whether the class is running
> as a
> > combiner or a reducer.
>
> A combiner may run both at the map end and at the reduce end. Even if
> it is possible to do it, it isn't a healthy idea to have the method's
> logic detect whether it's running as a reducer or as a combiner.
>
> > I need this information to change the OutputCollector
> > in the java class. Also I do not want to duplicate the same code as
> combiner
> > and reducer with only the OutputCollector changed.
>
> Why do you think it would require duplication? Your logic can be built
> in smaller, independent, reusable functions within the same class, and
> just applied differently for an implementation of Reducer class and an
> implementation of the Combiner class. This way, you repeat nothing.
>
> > Thanks,
> > Prasad
>
>
>
> --
> Harsh J
>

-- 
Bertrand Dechoux

Re: combiner/reducer context in java class

Posted by Harsh J <ha...@cloudera.com>.

Hi Prasad,

My reply inline.

On Tue, Nov 6, 2012 at 4:15 PM, Prasad GS <gs...@gmail.com> wrote:
> Hi,
>
> I'm setting my combiner and reducer to the same java class. Is there any API
> that could tell me the context in which the java class is running after the
> hadoop job is submitted to the cluster i.e whether the class is running as a
> combiner or a reducer.

A combiner may run both at the map end and at the reduce end. Even if
it is possible to do it, it isn't a healthy idea to have the method's
logic detect whether it's running as a reducer or as a combiner.

> I need this information to change the OutputCollector
> in the java class. Also I do not want to duplicate the same code as combiner
> and reducer with only the OutputCollector changed.

Why do you think it would require duplication? Your logic can be built
as smaller, independent, reusable functions within one shared class, and
then applied differently by an implementation of the Reducer class and an
implementation of the Combiner class. This way, you repeat nothing.
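
As a rough illustration of that layout, the sketch below keeps the summing logic in one helper and lets two small classes differ only in what they emit. It is plain Java rather than real Hadoop code: `Emitter` is a hypothetical stand-in for OutputCollector, and `SumLogic`, `SumCombiner` and `SumReducer` are invented names, so treat it as the shape of the solution rather than a drop-in implementation.

```java
import java.util.Iterator;

// Hypothetical stand-in for Hadoop's OutputCollector (not the real interface).
interface Emitter<K, V> {
    void collect(K key, V value);
}

// The shared logic lives in one place as a small reusable function.
final class SumLogic {
    private SumLogic() {
    }

    static long sum(Iterator<Long> values) {
        long total = 0;
        while (values.hasNext()) {
            total += values.next();
        }
        return total;
    }
}

// Combiner role: emits the same key/value types it consumes, so its
// output can be fed back into another combine or reduce pass.
class SumCombiner {
    void reduce(String key, Iterator<Long> values, Emitter<String, Long> out) {
        out.collect(key, SumLogic.sum(values));
    }
}

// Reducer role: free to emit a different, final output format.
class SumReducer {
    void reduce(String key, Iterator<Long> values, Emitter<String, String> out) {
        out.collect(key, "total=" + SumLogic.sum(values));
    }
}
```

Only the few lines that touch the collector differ between the two classes; everything else is written once in `SumLogic`.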

> Thanks,
> Prasad

-- 
Harsh J
