Posted to common-user@hadoop.apache.org by Sergey Gerasimov <ge...@mlab.cs.msu.su> on 2013/12/02 14:58:47 UTC

passing configuration parameter to comparator

Hello,

 

What is the best way to pass a job configuration parameter to a class like
GroupingComparator, which is instantiated by Hadoop? I know there is a setup
method in the map class, and I could probably initialize a static variable in
setup and use it in GroupingComparator, but I am not sure that is correct (I am
not sure there is a guarantee that GroupingComparator will be instantiated after
the first call of map on this node). What is the preferred pattern for this
case? Maybe there is some unified way to access the job config from anywhere?

 

Thanks!

Sergey.


Re: passing configuration parameter to comparator

Posted by Harsh J <ha...@cloudera.com>.
The comparators are also instantiated via the ReflectionUtils code, so
the framework passes the configuration to the instantiated object if the
class implements the org.apache.hadoop.conf.Configurable interface or
extends the org.apache.hadoop.conf.Configured class (which implements
the interface for you). This lets you access the job configuration
from inside the comparator.
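
A minimal sketch of that mechanism, for illustration only. To keep it
runnable without Hadoop on the classpath, Conf and Configurable below are
simplified stand-ins for org.apache.hadoop.conf.Configuration and
org.apache.hadoop.conf.Configurable, and newInstance only mimics what the
framework's reflection-based instantiation does. PrefixGroupingComparator
and the key "example.grouping.prefix.length" are hypothetical names. In a
real job the comparator would implement the Hadoop interface (or extend
Configured) and compare your actual key type rather than String.

```java
import java.lang.reflect.Constructor;
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;

// Stand-alone sketch: Conf and Configurable are simplified stand-ins for
// Hadoop's Configuration and Configurable, NOT the real classes.
class ComparatorConfigSketch {

    /** Stand-in for Hadoop's Configuration: a plain string-to-string map. */
    static class Conf {
        private final Map<String, String> values = new HashMap<>();

        void setInt(String key, int value) {
            values.put(key, Integer.toString(value));
        }

        int getInt(String key, int defaultValue) {
            String v = values.get(key);
            return v == null ? defaultValue : Integer.parseInt(v);
        }
    }

    /** Stand-in for the Configurable interface. */
    interface Configurable {
        void setConf(Conf conf);
        Conf getConf();
    }

    /** Mimics the framework: instantiate by reflection, then hand over the configuration. */
    static <T> T newInstance(Class<T> cls, Conf conf) {
        try {
            Constructor<T> ctor = cls.getDeclaredConstructor();
            ctor.setAccessible(true);
            T instance = ctor.newInstance();
            if (instance instanceof Configurable) {
                ((Configurable) instance).setConf(conf);
            }
            return instance;
        } catch (ReflectiveOperationException e) {
            throw new RuntimeException(e);
        }
    }

    /** Hypothetical comparator: groups string keys by a configurable prefix length. */
    static class PrefixGroupingComparator implements Comparator<String>, Configurable {
        static final String PREFIX_LENGTH_KEY = "example.grouping.prefix.length"; // hypothetical key
        private Conf conf;
        private int prefixLength = Integer.MAX_VALUE; // default: compare whole keys

        @Override
        public void setConf(Conf conf) {
            // Read the parameter once, when the configuration is injected.
            this.conf = conf;
            this.prefixLength = conf.getInt(PREFIX_LENGTH_KEY, Integer.MAX_VALUE);
        }

        @Override
        public Conf getConf() {
            return conf;
        }

        @Override
        public int compare(String a, String b) {
            return prefix(a).compareTo(prefix(b));
        }

        private String prefix(String s) {
            return s.substring(0, Math.min(prefixLength, s.length()));
        }
    }

    public static void main(String[] args) {
        Conf conf = new Conf();
        conf.setInt(PrefixGroupingComparator.PREFIX_LENGTH_KEY, 3);
        Comparator<String> cmp = newInstance(PrefixGroupingComparator.class, conf);
        System.out.println(cmp.compare("abc-1", "abc-2"));     // 0: same 3-character prefix
        System.out.println(cmp.compare("abc-1", "abd-1") < 0); // true: "abc" sorts before "abd"
    }
}
```

The point of the pattern is that the parameter is read in setConf, so no
static variable set from the mapper's setup method is needed, and the
comparator does not depend on the order in which map or reduce tasks start.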

As to the order, the grouping comparator is used only on the reducer
side (see also [1]) and is therefore invoked before the first reduce()
call is run. As for the key comparator for intermediate data (and
combiners), yes, it is initialised before the first map() call.

[1] - A combiner, which runs on the map side, has so far never invoked
the GroupingComparator class, hence the statement that a mapper may
never invoke it. However,
https://issues.apache.org/jira/browse/MAPREDUCE-3310 may alter this
current behaviour (if a combiner is explicitly involved).

On Mon, Dec 2, 2013 at 7:28 PM, Sergey Gerasimov
<ge...@mlab.cs.msu.su> wrote:
> Hello,
>
>
>
> What is the best way to pass job configuration parameter to class like
> GroupingComparator which is instantiated by hadoop. I know there is setup
> method in map class and probably I can initialize some static variable in
> setup and use it in GroupingComparator, not sure that is correct (not sure
> there is guarantee that GroupingComparator will be instantiated after first
> call of map on this node) But what is preferred pattern for the case? Maybe
> there is some unified way to access job config from anywhere?
>
>
>
> Thanks!
>
> Sergey.



-- 
Harsh J
