Posted to hdfs-user@hadoop.apache.org by Andrew Pennebaker <ap...@42six.com> on 2013/08/27 17:35:19 UTC

Simplifying MapReduce API

There seems to be an abundance of boilerplate patterns in MapReduce:

* Write a class extending Map (1), implementing Mapper (2), with a map
method (3)
* Write a class extending Reduce (4), implementing Reducer (5), with a
reduce method (6)

Could we achieve the same behavior with a single Job interface requiring
map() and reduce() methods?
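
For concreteness, the kind of interface being proposed might look like the
hypothetical sketch below (nothing like this exists in Hadoop; the names and
types are purely illustrative):

    // Hypothetical combined contract -- not part of any Hadoop release.
    // K1/V1: input pair, K2/V2: intermediate pair, V3: final value.
    public interface SimpleJob<K1, V1, K2, V2, V3> {

        /** Emit zero or more intermediate (key, value) pairs per input record. */
        void map(K1 key, V1 value, Emitter<K2, V2> out);

        /** Fold all intermediate values for one key into a final result. */
        V3 reduce(K2 key, Iterable<V2> values);

        /** Minimal output callback, also hypothetical. */
        interface Emitter<K, V> {
            void emit(K key, V value);
        }
    }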

Re: Simplifying MapReduce API

Posted by Mohammad Tariq <do...@gmail.com>.
Just to add to the above comments: under the new API you only have to extend
the *Mapper* and *Reducer* classes.
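
For instance, a minimal word-count pair under the new API might look like the
sketch below (a sketch only; the class names are illustrative, not from any
released example):

    import java.io.IOException;
    import java.util.StringTokenizer;

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;

    // New API: Mapper is a class to extend, not an interface to implement.
    class TokenMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable offset, Text line, Context context)
                throws IOException, InterruptedException {
            StringTokenizer tokens = new StringTokenizer(line.toString());
            while (tokens.hasMoreTokens()) {
                word.set(tokens.nextToken());
                context.write(word, ONE);  // emit (token, 1)
            }
        }
    }

    // Likewise, Reducer is a class to extend.
    class TokenReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text word, Iterable<IntWritable> counts,
                              Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable c : counts) {
                sum += c.get();  // add up the 1s emitted for this token
            }
            context.write(word, new IntWritable(sum));
        }
    }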

Warm Regards,
Tariq
cloudfront.blogspot.com


On Wed, Aug 28, 2013 at 1:26 AM, Don Nelson <di...@gmail.com> wrote:

> I agree with @Shahab - it's simple enough to implement both interfaces in
> one class if that's what you want to do.  But given the distributed
> nature of Hadoop, your mappers will likely be running on different nodes
> than your reducers anyway - why ship around duplicate code?
>
>
> On Tue, Aug 27, 2013 at 9:48 AM, Shahab Yunus <sh...@gmail.com> wrote:
>
>> For starters (experts might have more complex reasons), what if your
>> respective map and reduce logic becomes complex enough to demand separate
>> classes? Why tie clients to implementing both by moving them into one Job
>> interface? In the current design you can always implement both (map and
>> reduce) interfaces if your logic is simple enough, or go the other route
>> of separate classes if that is required. I think it is more flexible this
>> way (you can always build up from a granular design, rather than the
>> other way around).
>>
>> I hope I understood your concern correctly...
>>
>> Regards,
>> Shahab
>>
>>
>> On Tue, Aug 27, 2013 at 11:35 AM, Andrew Pennebaker <
>> apennebaker@42six.com> wrote:
>>
>>> There seems to be an abundance of boilerplate patterns in MapReduce:
>>>
>>> * Write a class extending Map (1), implementing Mapper (2), with a map
>>> method (3)
>>> * Write a class extending Reduce (4), implementing Reducer (5), with a
>>> reduce method (6)
>>>
>>> Could we achieve the same behavior with a single Job interface requiring
>>> map() and reduce() methods?
>>>
>>
>>
>
>
> --
>
> "A child of five could understand this.  Fetch me a child of five."
>

Re: Simplifying MapReduce API

Posted by Don Nelson <di...@gmail.com>.
I agree with @Shahab - it's simple enough to implement both interfaces in one
class if that's what you want to do.  But given the distributed nature of
Hadoop, your mappers will likely be running on different nodes than your
reducers anyway - why ship around duplicate code?
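
To make that concrete, a driver wiring up separate mapper and reducer classes
might look roughly like this (a sketch against the Hadoop 2
org.apache.hadoop.mapreduce API, reusing the TokenMapper/TokenReducer classes
sketched earlier in the thread):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class WordCountDriver {
        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "word count");
            job.setJarByClass(WordCountDriver.class);

            // Mapper and reducer stay separate classes; the framework
            // instantiates each only in the tasks that need it.
            job.setMapperClass(TokenMapper.class);
            job.setReducerClass(TokenReducer.class);

            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);

            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));

            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }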


On Tue, Aug 27, 2013 at 9:48 AM, Shahab Yunus <sh...@gmail.com> wrote:

> For starters (experts might have more complex reasons), what if your
> respective map and reduce logic becomes complex enough to demand separate
> classes? Why tie clients to implementing both by moving them into one Job
> interface? In the current design you can always implement both (map and
> reduce) interfaces if your logic is simple enough, or go the other route
> of separate classes if that is required. I think it is more flexible this
> way (you can always build up from a granular design, rather than the
> other way around).
>
> I hope I understood your concern correctly...
>
> Regards,
> Shahab
>
>
> On Tue, Aug 27, 2013 at 11:35 AM, Andrew Pennebaker <apennebaker@42six.com
> > wrote:
>
>> There seems to be an abundance of boilerplate patterns in MapReduce:
>>
>> * Write a class extending Map (1), implementing Mapper (2), with a map
>> method (3)
>> * Write a class extending Reduce (4), implementing Reducer (5), with a
>> reduce method (6)
>>
>> Could we achieve the same behavior with a single Job interface requiring
>> map() and reduce() methods?
>>
>
>


-- 

"A child of five could understand this.  Fetch me a child of five."

Re: Simplifying MapReduce API

Posted by Shahab Yunus <sh...@gmail.com>.
For starters (experts might have more complex reasons), what if your
respective map and reduce logic becomes complex enough to demand separate
classes? Why tie clients to implementing both by moving them into one Job
interface? In the current design you can always implement both (map and
reduce) interfaces if your logic is simple enough, or go the other route of
separate classes if that is required. I think it is more flexible this way
(you can always build up from a granular design, rather than the other way
around).
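
As a sketch of that single-class option under the old org.apache.hadoop.mapred
API (the class name is illustrative; MapReduceBase supplies the no-op
configure() and close() methods the two interfaces otherwise require):

    import java.io.IOException;
    import java.util.Iterator;
    import java.util.StringTokenizer;

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.Mapper;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reducer;
    import org.apache.hadoop.mapred.Reporter;

    // One class implementing both old-API interfaces.
    public class WordCountBoth extends MapReduceBase
            implements Mapper<LongWritable, Text, Text, IntWritable>,
                       Reducer<Text, IntWritable, Text, IntWritable> {

        public void map(LongWritable offset, Text line,
                        OutputCollector<Text, IntWritable> out, Reporter reporter)
                throws IOException {
            StringTokenizer tokens = new StringTokenizer(line.toString());
            while (tokens.hasMoreTokens()) {
                // emit (token, 1) for every word in the line
                out.collect(new Text(tokens.nextToken()), new IntWritable(1));
            }
        }

        public void reduce(Text word, Iterator<IntWritable> counts,
                           OutputCollector<Text, IntWritable> out, Reporter reporter)
                throws IOException {
            int sum = 0;
            while (counts.hasNext()) {
                sum += counts.next().get();  // total occurrences of this word
            }
            out.collect(word, new IntWritable(sum));
        }
    }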

I hope I understood your concern correctly...

Regards,
Shahab


On Tue, Aug 27, 2013 at 11:35 AM, Andrew Pennebaker
<ap...@42six.com> wrote:

> There seems to be an abundance of boilerplate patterns in MapReduce:
>
> * Write a class extending Map (1), implementing Mapper (2), with a map
> method (3)
> * Write a class extending Reduce (4), implementing Reducer (5), with a
> reduce method (6)
>
> Could we achieve the same behavior with a single Job interface requiring
> map() and reduce() methods?
>
