Posted to mapreduce-user@hadoop.apache.org by Injun Joe <ll...@yahoo.com.hk> on 2011/04/15 21:05:18 UTC
successive mappers
Hi,
I am coding a map-reduce program which involves several map-reduce steps. All of the work my program does is in the mapper, so I was thinking of having no reduce steps, only successive mappers. The logic for the mappers at iterations 0 and 1 can be written like this:
1. Take input.
2. Map 0:
Determine if a key-value pair satisfies condition C.
- If it satisfies the condition, output the key-value pair to a file in directory E.
- If it does not, transform the key-value pair and output it to directory D.
3. Map 1:
- Change the input directory to directory D.
- Perform the same steps as map 0.
So, the problem is that I have not been able to find a way to output key-value pairs to different directories. All I have been able to do is set a single map output directory with TextOutputFormat.setOutputPath.
Any help would be appreciated.
Thanks a lot
I
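For illustration, the iteration described in this post can be sketched in plain Java, with no Hadoop classes involved. The two lists stand in for directories E and D, and each call to mapPass stands in for one map-only job. The condition (value at most 1) and the transform (halve the value) are hypothetical placeholders, as are all class and method names; the post does not say what condition C or the transformation actually are.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Plain-Java sketch of successive map-only passes; lists stand in for directories.
final class SuccessiveMapSketch {

    // Hypothetical condition C: a pair is finished once its value is at most 1.
    static boolean satisfiesC(Map.Entry<String, Integer> kv) {
        return kv.getValue() <= 1;
    }

    // Hypothetical transform for pairs that still need work: halve the value.
    static Map.Entry<String, Integer> transform(Map.Entry<String, Integer> kv) {
        return new SimpleEntry<String, Integer>(kv.getKey(), kv.getValue() / 2);
    }

    // One map-only pass: finished pairs go to e, transformed pairs go to d.
    static void mapPass(List<Map.Entry<String, Integer>> input,
                        List<Map.Entry<String, Integer>> e,
                        List<Map.Entry<String, Integer>> d) {
        for (Map.Entry<String, Integer> kv : input) {
            if (satisfiesC(kv)) {
                e.add(kv);
            } else {
                d.add(transform(kv));
            }
        }
    }

    // Repeats the pass, feeding D back in as the next input, until D is empty
    // or maxPasses is reached. Returns everything collected in E.
    static List<Map.Entry<String, Integer>> runUntilDone(
            List<Map.Entry<String, Integer>> input, int maxPasses) {
        List<Map.Entry<String, Integer>> e = new ArrayList<Map.Entry<String, Integer>>();
        List<Map.Entry<String, Integer>> current = input;
        for (int pass = 0; pass < maxPasses && !current.isEmpty(); pass++) {
            List<Map.Entry<String, Integer>> d = new ArrayList<Map.Entry<String, Integer>>();
            mapPass(current, e, d);
            current = d; // corresponds to "change the input directory to directory D"
        }
        return e;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> input = new ArrayList<Map.Entry<String, Integer>>();
        input.add(new SimpleEntry<String, Integer>("a", 1));
        input.add(new SimpleEntry<String, Integer>("b", 4));
        input.add(new SimpleEntry<String, Integer>("c", 3));
        System.out.println(runUntilDone(input, 10));
    }
}
```

In a real job chain, the driver would presumably launch one job per pass and stop once a pass writes nothing to D; the maxPasses bound here is only a safety limit for the sketch.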
Re: successive mappers
Posted by Injun Joe <ll...@yahoo.com.hk>.
"that some map instances may not require further processing. So if I try to do
everything in a single mapper instance"
should be read as
"that some key-value pairs may not require further processing. So if I try to do
everything in a single mapper".
________________________________
From: Injun Joe <ll...@yahoo.com.hk>
To: mapreduce-user@hadoop.apache.org
Sent: Fri, April 15, 2011 4:41:03 PM
Subject: Re: successive mappers
The problem with doing all of them in a single mapper is that some map instances
may not require further processing. So if I try to do everything in a single
mapper instance, I will have a lot of CPUs lying idle while others take the
load.
________________________________
From: Robert Evans <ev...@yahoo-inc.com>
To: "mapreduce-user@hadoop.apache.org" <ma...@hadoop.apache.org>
Sent: Fri, April 15, 2011 4:20:14 PM
Subject: Re: successive mappers
I,
Take a look at the multiple output format classes. MultipleTextOutputFormat
is a good example:
http://hadoop.apache.org/common/docs/r0.20.0/api/org/apache/hadoop/mapred/lib/MultipleTextOutputFormat.html
You should be able to create a custom output format class that matches your
needs. That said, if all you are doing is map processing, why write out
intermediate results instead of processing everything in a single mapper? It
should be a lot faster if you don’t need the intermediate results.
--Bobby Evans
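As a rough illustration of the custom output format idea: in the old org.apache.hadoop.mapred.lib API, MultipleOutputFormat (the parent of MultipleTextOutputFormat) lets a subclass override generateFileNameForKeyValue(key, value, name) to choose the output file name for each record. The sketch below imitates only that routing decision in plain Java so it runs without Hadoop. The class name, the condition C, and the E/D prefixes are hypothetical, and whether a name containing a slash lands in a subdirectory of the job output path should be checked against your Hadoop version.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java imitation of per-record output routing; no Hadoop classes are used.
final class OutputRoutingSketch {

    // Hypothetical condition C: a pair is finished once its value is at most 1.
    static boolean satisfiesC(int value) {
        return value <= 1;
    }

    // Plays the role of generateFileNameForKeyValue(key, value, name):
    // returns a relative path under E or D for the given record.
    static String fileNameFor(String key, int value, String leafName) {
        return (satisfiesC(value) ? "E" : "D") + "/" + leafName;
    }

    // Groups tab-separated records by the relative path chosen for each one.
    static Map<String, List<String>> route(String[] keys, int[] values, String leafName) {
        Map<String, List<String>> files = new TreeMap<String, List<String>>();
        for (int i = 0; i < keys.length; i++) {
            String path = fileNameFor(keys[i], values[i], leafName);
            List<String> lines = files.get(path);
            if (lines == null) {
                lines = new ArrayList<String>();
                files.put(path, lines);
            }
            lines.add(keys[i] + "\t" + values[i]);
        }
        return files;
    }

    public static void main(String[] args) {
        Map<String, List<String>> files =
                route(new String[] {"a", "b", "c"}, new int[] {1, 4, 3}, "part-00000");
        for (Map.Entry<String, List<String>> file : files.entrySet()) {
            System.out.println(file.getKey() + " -> " + file.getValue());
        }
    }
}
```

The sketch assumes condition C can be evaluated on the emitted pair itself. If it cannot, the mapper would need to tag each record (for example in the key) so the output format can tell finished records from pending ones.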