Posted to user@hadoop.apache.org by Sigurd Spieckermann <si...@gmail.com> on 2012/09/28 15:32:33 UTC

Usefulness of ChainMapper/ChainReducer

Hi guys,

I have stumbled upon ChainMapper and ChainReducer and I am wondering why
they exist. I imagine that everything you can implement with ChainMapper
and ChainReducer can be implemented with just a Mapper and a Reducer
containing all the code of the respective chain-implementations. Or am I
missing certain aspects about why they are more than just convenience
concepts?

Thanks for clarifying this!
Sigurd

Re: Usefulness of ChainMapper/ChainReducer

Posted by John Armstrong <jr...@ccri.com>.
On Fri 28 Sep 2012 09:39:13 AM EDT, Harsh J wrote:
> Modularity!

Exactly! Write a mapper that filters on some property of your keys, then
reuse it in whatever jobs you want. Your job needs to operate on data
subset A? Chain it with the filter mapper that picks out A. Your next one
needs to operate on subset B? Chain it with the filter that picks out B!
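
A rough sketch of that reuse pattern, in plain Java rather than the real
Hadoop API so that it runs without a cluster. Everything here (MiniMapper,
keyPrefixFilter, upperCaseValue, chain, run) is a hypothetical stand-in
invented for illustration; in an actual job the filter would be an ordinary
Mapper subclass added to the job through ChainMapper.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiConsumer;

public class ChainSketch {
    /** Hypothetical stand-in for a Mapper: takes one record, emits zero or more. */
    interface MiniMapper {
        void map(String key, String value, BiConsumer<String, String> out);
    }

    /** Reusable filter: passes only records whose key starts with the prefix. */
    static MiniMapper keyPrefixFilter(String prefix) {
        return (k, v, out) -> {
            if (k.startsWith(prefix)) {
                out.accept(k, v);
            }
        };
    }

    /** Job-specific logic that knows nothing about filtering. */
    static MiniMapper upperCaseValue() {
        return (k, v, out) -> out.accept(k, v.toUpperCase());
    }

    /** Feeds every record emitted by the first mapper into the second. */
    static MiniMapper chain(MiniMapper first, MiniMapper second) {
        return (k, v, out) -> first.map(k, v, (k2, v2) -> second.map(k2, v2, out));
    }

    /** Runs a mapper over in-memory records and collects "key=value" strings. */
    static List<String> run(MiniMapper mapper, String[][] records) {
        List<String> result = new ArrayList<>();
        for (String[] r : records) {
            mapper.map(r[0], r[1], (k, v) -> result.add(k + "=" + v));
        }
        return result;
    }

    public static void main(String[] args) {
        String[][] input = { {"A:1", "x"}, {"B:1", "y"}, {"A:2", "z"} };
        // Same job logic, different subset: only the filter in front changes.
        System.out.println(run(chain(keyPrefixFilter("A"), upperCaseValue()), input));
        System.out.println(run(chain(keyPrefixFilter("B"), upperCaseValue()), input));
    }
}
```

The job-specific mapper never changes; picking subset A or B is just a
matter of which filter goes in front of it.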


Re: Usefulness of ChainMapper/ChainReducer

Posted by Harsh J <ha...@cloudera.com>.
Hi,

Modularity!

I used to have the same question myself. However, Tom White put that
thought to rest:

"It’s possible to make map and reduce functions even more composable
than we have done. A mapper commonly performs input format parsing,
projection (selecting the relevant fields), and filtering (removing
records that are not of interest). In the mappers you have seen so
far, we have implemented all of these functions in a single mapper.
However, there is a case for splitting these into distinct mappers and
chaining them into a single mapper using the ChainMapper library class
that comes with Hadoop. Combined with a ChainReducer, you can run a
chain of mappers, followed by a reducer and another chain of mappers
in a single MapReduce job." - Tom White, Hadoop: The Definitive Guide
(2nd Ed.)

Personally, though, I haven't used them much. They are nothing more than
convenience classes: the chained mappers all run inside a single map or
reduce task, so this is not "real" chaining at the framework level.
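
To make the "convenience" point concrete, here is a small plain-Java
sketch (again not the Hadoop API) that splits a mapper into the parse,
project and filter stages from the quote and compares it with a single
hand-written mapper. The record layout "year,temperature,quality", the
9999 missing-value marker and all method names are assumptions made up
for this example.

```java
import java.util.Optional;

public class StagedMapperSketch {
    // Stage 1: input format parsing (split a CSV line into fields).
    static Optional<String[]> parse(String line) {
        return Optional.of(line.split(","));
    }

    // Stage 2: projection (keep only year and temperature).
    static Optional<String[]> project(String[] fields) {
        return Optional.of(new String[] { fields[0], fields[1] });
    }

    // Stage 3: filtering (drop records whose temperature is the missing marker).
    static Optional<String[]> filter(String[] fields) {
        return "9999".equals(fields[1]) ? Optional.empty() : Optional.of(fields);
    }

    /** The three stages chained together, each feeding the next. */
    static Optional<String> chained(String line) {
        return parse(line)
                .flatMap(StagedMapperSketch::project)
                .flatMap(StagedMapperSketch::filter)
                .map(f -> f[0] + "\t" + f[1]);
    }

    /** The same logic written as one monolithic mapper. */
    static Optional<String> monolithic(String line) {
        String[] f = line.split(",");
        if ("9999".equals(f[1])) {
            return Optional.empty();
        }
        return Optional.of(f[0] + "\t" + f[1]);
    }

    public static void main(String[] args) {
        String[] lines = { "1950,22,1", "1950,9999,1", "1951,-11,1" };
        for (String line : lines) {
            // Both variants emit the same record, or nothing, for every input.
            System.out.println(chained(line).equals(monolithic(line)));
        }
    }
}
```

Both versions produce the same output for every input; what the staged
version buys you is that each stage can be reused or swapped on its own.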

On Fri, Sep 28, 2012 at 7:02 PM, Sigurd Spieckermann
<si...@gmail.com> wrote:
> Hi guys,
>
> I have stumbled upon ChainMapper and ChainReducer and I am wondering why
> they exist. I imagine that everything you can implement with ChainMapper and
> ChainReducer can be implemented with just a Mapper and a Reducer containing
> all the code of the respective chain-implementations. Or am I missing
> certain aspects about why they are more than just convenience concepts?
>
> Thanks for clarifying this!
> Sigurd



-- 
Harsh J
