Posted to mapreduce-dev@hadoop.apache.org by Anh Pham <ph...@gmail.com> on 2013/10/16 10:16:56 UTC

How to modify the Map-Reduce execution order?

(Please correct me if I am wrong.) As I understand it, the original chain is:
InputSplits --> Mapper --> [Sorting/Shuffling, etc.] --> Reducer --> ...

I don't want the input splits to go to the Mappers first. Instead, they
should go to a new stage (call it Pre-Mapper, a class I will write
myself).

So the new order would be: InputSplits --> Pre-Mapper --> Mapper --> ...

I'm currently reading the source code, but I still cannot find any clue
about which classes I should touch. Any suggestion is welcome. Thank you
very much :)

Re: How to modify the Map-Reduce execution order?

Posted by Arun C Murthy <ac...@hortonworks.com>.
You might be interested in Apache Tez, which provides native support for these sorts of scenarios by being a superset of MapReduce:

http://tez.incubator.apache.org/

Arun

--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/




Re: How to modify the Map-Reduce execution order?

Posted by Tsuyoshi OZAWA <oz...@gmail.com>.
Hi Anh,

How about using ChainMapper? Would that be helpful for you?
http://hadoop.apache.org/docs/current/api/org/apache/hadoop/mapred/lib/ChainMapper.html
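To illustrate the idea, here is a minimal sketch of the record flow a chained Pre-Mapper would produce. In Hadoop itself the wiring would be done with ChainMapper.addMapper calls in the job driver. The version below has no Hadoop dependency and only models the behaviour: each record emitted by the first stage is handed straight to the next stage inside the same task. The Stage interface, the class name, and the trim/lower-case logic of the Pre-Mapper are all hypothetical, chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.function.BiConsumer;

public class PreMapperChainSketch {

    /** Hypothetical stand-in for a map stage: one record in, zero or more records out. */
    interface Stage<KI, VI, KO, VO> {
        void map(KI key, VI value, BiConsumer<KO, VO> out);
    }

    /** Hypothetical Pre-Mapper: trims and lower-cases each line, dropping empty ones. */
    static final Stage<Long, String, Long, String> PRE_MAPPER = (offset, line, out) -> {
        String cleaned = line.trim().toLowerCase(Locale.ROOT);
        if (!cleaned.isEmpty()) {
            out.accept(offset, cleaned);
        }
    };

    /** Ordinary mapper: emits (word, 1) for every word of the cleaned line. */
    static final Stage<Long, String, String, Integer> MAPPER = (offset, line, out) -> {
        for (String word : line.split("\\s+")) {
            out.accept(word, 1);
        }
    };

    /** Composes two stages: every record emitted by the first is passed directly to the second. */
    static <K1, V1, K2, V2, K3, V3> Stage<K1, V1, K3, V3> chain(
            Stage<K1, V1, K2, V2> first, Stage<K2, V2, K3, V3> second) {
        return (key, value, out) -> first.map(key, value, (k2, v2) -> second.map(k2, v2, out));
    }

    /** Feeds the lines of one simulated input split through Pre-Mapper --> Mapper. */
    static List<String> runChain(List<String> splitLines) {
        Stage<Long, String, String, Integer> chained = chain(PRE_MAPPER, MAPPER);
        List<String> emitted = new ArrayList<>();
        long offset = 0;
        for (String line : splitLines) {
            chained.map(offset, line, (word, count) -> emitted.add(word + "=" + count));
            offset += line.length() + 1; // approximate byte offset of the next line
        }
        return emitted;
    }

    public static void main(String[] args) {
        System.out.println(runChain(Arrays.asList("  Hello World ", "", "foo")));
    }
}
```

Running main prints [hello=1, world=1, foo=1]: the empty line is filtered by the Pre-Mapper before the mapper ever sees it. The sort/shuffle and reduce phases are not modelled here and would follow the chained map stage unchanged.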

Thanks, Tsuyoshi



