
Posted to mapreduce-user@hadoop.apache.org by Injun Joe <ll...@yahoo.com.hk> on 2011/04/15 21:05:18 UTC

successive mappers

Hi,
I am coding a map-reduce program which involves several map-reduce steps. All of the work my program does is in the mapper, so I was thinking of having no reduce steps, only successive mappers. The logic for the mappers at iterations 0 and 1 can be written like this:

1. Take input.
2. Map 0:
   Determine if a key-value pair satisfies condition C.
    - If it satisfies the condition, output the key-value pair to a file in directory E.
    - If it does not, transform the key-value pair and output it to directory D.
3. Map 1:
   - Change the input directory to directory D.
   - Perform the same steps as Map 0.
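
The steps above can be sketched in plain Java, without any Hadoop classes, to make the round structure concrete. Everything here is an assumption for illustration: condition C is taken to be "value < 10", the transform subtracts 7, and directories E and D are modelled as in-memory lists. It is a simulation of the idea, not the poster's actual job.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Plain-Java simulation of the successive map rounds; no Hadoop involved.
final class SuccessiveMaps {
    // Hypothetical condition C: the value is already below 10.
    static boolean satisfiesC(int value) {
        return value < 10;
    }

    // Hypothetical transform applied to pairs that fail C.
    static int transform(int value) {
        return value - 7;
    }

    // One map round: pairs that satisfy C are appended to e ("directory E"),
    // the rest are transformed and returned as the contents of "directory D".
    static List<Map.Entry<String, Integer>> mapRound(
            List<Map.Entry<String, Integer>> input,
            List<Map.Entry<String, Integer>> e) {
        List<Map.Entry<String, Integer>> d = new ArrayList<>();
        for (Map.Entry<String, Integer> pair : input) {
            if (satisfiesC(pair.getValue())) {
                e.add(pair);
            } else {
                d.add(Map.entry(pair.getKey(), transform(pair.getValue())));
            }
        }
        return d;
    }

    // Driver: feed D back in as the next round's input until D is empty.
    // Returns the number of rounds that were needed.
    static int runUntilDone(List<Map.Entry<String, Integer>> input,
                            List<Map.Entry<String, Integer>> e) {
        int rounds = 0;
        List<Map.Entry<String, Integer>> current = input;
        while (!current.isEmpty()) {
            current = mapRound(current, e);
            rounds++;
        }
        return rounds;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> e = new ArrayList<>();
        int rounds = runUntilDone(
                List.of(Map.entry("a", 3), Map.entry("b", 25), Map.entry("c", 12)), e);
        System.out.println(rounds + " rounds, E = " + e);
    }
}
```

In a real job each call to mapRound would correspond to one map-only job, with the driver repointing the input path at D between rounds.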

So, the problem is that I have not been able to find a way to output key-value pairs to different directories. All I have been able to specify is a single map output directory, via TextOutputFormat.setOutputPath.
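
One possible direction, sketched under assumptions: in Hadoop's old mapred API, MultipleTextOutputFormat lets a subclass override generateFileNameForKeyValue(key, value, name) and return a file path, relative to the job output directory, for each record. The routing decision itself is just a function from a pair to a path, and the plain-Java sketch below shows only that function. The class name, the String key/value types, and condition C (here "the value is below 10") are hypothetical; in a real subclass the same logic would sit in the overridden method.

```java
// Routing logic only; no Hadoop classes are used here.
final class OutputRouting {
    // Hypothetical condition C on a pair whose value is a decimal string.
    static boolean satisfiesC(String key, String value) {
        return Integer.parseInt(value.trim()) < 10;
    }

    // Returns a path relative to the job output directory. leafName stands in
    // for the default file name the framework would pass (e.g. "part-00000").
    static String fileNameFor(String key, String value, String leafName) {
        String dir = satisfiesC(key, value) ? "E" : "D";
        return dir + "/" + leafName;
    }

    public static void main(String[] args) {
        System.out.println(fileNameFor("a", "3", "part-00000"));
        System.out.println(fileNameFor("b", "25", "part-00000"));
    }
}
```

With this layout both directories live under the one configured output path, so the next round would read from the D subdirectory of the previous job's output.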

Any help would be appreciated.

Thanks a lot
I

Re: successive mappers

Posted by Injun Joe <ll...@yahoo.com.hk>.

"that some map instances may not require further processing. So if I try to do 
everything in a single mapper instance"

should be read as 

"that some key value pairs may not require further processing. So if I try to do 
everything in a single mapper ".

________________________________
From: Injun Joe <ll...@yahoo.com.hk>
To: mapreduce-user@hadoop.apache.org
Sent: Fri, April 15, 2011 4:41:03 PM
Subject: Re: successive mappers

The problem with doing all of them in a single mapper is that some map instances 
may not require further processing. So if I try to do everything in a single 
mapper instance, I will have a lot of CPUs lying idle while others take the 
load. 

________________________________
From: Robert Evans <ev...@yahoo-inc.com>
To: "mapreduce-user@hadoop.apache.org" <ma...@hadoop.apache.org>
Sent: Fri, April 15, 2011 4:20:14 PM
Subject: Re: successive mappers

I,

Take a look at the multiple output format classes. MultipleTextOutputFormat is a 
good example:

http://hadoop.apache.org/common/docs/r0.20.0/api/org/apache/hadoop/mapred/lib/MultipleTextOutputFormat.html

You should be able to create a custom output format class that matches your 
needs.  That said, if all you are doing is map processing, why write out 
intermediate results instead of processing everything in a single mapper?  It 
should be a lot faster if you don’t need the intermediate results.

--Bobby Evans


Re: successive mappers

Posted by Injun Joe <ll...@yahoo.com.hk>.

The problem with doing all of them in a single mapper is that some map instances 
may not require further processing. So if I try to do everything in a single 
mapper instance, I will have a lot of CPUs lying idle while others take the 
load. 


Re: successive mappers

Posted by Robert Evans <ev...@yahoo-inc.com>.

I,

Take a look at the multiple output format classes. MultipleTextOutputFormat is a good example:

http://hadoop.apache.org/common/docs/r0.20.0/api/org/apache/hadoop/mapred/lib/MultipleTextOutputFormat.html

You should be able to create a custom output format class that matches your needs.  That said, if all you are doing is map processing, why write out intermediate results instead of processing everything in a single mapper?  It should be a lot faster if you don't need the intermediate results.

--Bobby Evans
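
The single-mapper alternative suggested above can be sketched as a loop that keeps transforming one pair in memory until it satisfies the condition, so nothing intermediate is written out. This is plain Java rather than a Hadoop Mapper, and the condition ("value < 10") and transform (subtract 7) are hypothetical stand-ins for condition C and the real transformation.

```java
// Per-record loop that a single map() call could run; no Hadoop involved.
final class SingleMapperLoop {
    // Hypothetical condition C.
    static boolean satisfiesC(int value) {
        return value < 10;
    }

    // Hypothetical transform for values that fail C.
    static int transform(int value) {
        return value - 7;
    }

    // Applies the transform repeatedly until C holds, then returns the
    // final value, which is what would be emitted for this record.
    static int processToCompletion(int value) {
        int current = value;
        while (!satisfiesC(current)) {
            current = transform(current);
        }
        return current;
    }

    public static void main(String[] args) {
        System.out.println(processToCompletion(25));
    }
}
```

The trade-off is that a record needing many transformations keeps its map task busy for longer, while the multi-job version pays for writing and rereading the intermediate data on every round.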
