Posted to common-user@hadoop.apache.org by Matei Zaharia <ma...@cloudera.com> on 2009/02/11 02:20:04 UTC

Is there a way to tell whether you're in a map task or a reduce task?

I'd like to write a combiner that shares a lot of code with a reducer,
except that the reducer updates an external database at the end. As far as I
can tell, since both combiners and reducers must implement the Reducer
interface, there is no way to have this be the same class. Is there a
recommended way to test inside the task whether you're running as a combiner
(in a map task) or as a reducer?

If not, I think this might be an interesting thing to support in the Hadoop
1.0 API. It would enable people to write an AbstractJob class where you just
implement map, combine and reduce functions, and can thus write MapReduce
jobs in a single Java class. Having separate mapper and reducer classes
would still be supported of course.

Re: Is there a way to tell whether you're in a map task or a reduce task?

Posted by Owen O'Malley <om...@apache.org>.
On Feb 10, 2009, at 5:20 PM, Matei Zaharia wrote:

> I'd like to write a combiner that shares a lot of code with a reducer,
> except that the reducer updates an external database at the end.

The right way to do this is to either do the update in the output  
format or do something like:

class MyCombiner implements Reducer {
  ...
  public void close() throws IOException {}
}

class MyReducer extends MyCombiner {
  ...
  public void close() throws IOException { ... update database ... }
}
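Outside of Hadoop's actual interfaces, that pattern can be sketched with a minimal stand-in (the SimpleReduce interface, SumCombiner/SumReducer classes, and the databaseUpdated flag below are all illustrative, not Hadoop's API): the combiner holds the shared logic and a no-op close(), and the reducer only overrides close().

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Minimal stand-in for the old-API Reducer contract; not Hadoop's interface.
interface SimpleReduce {
    void reduce(String key, Iterator<Integer> values, List<String> output);
    void close() throws IOException;
}

// Combiner: sums values per key; close() is deliberately a no-op.
class SumCombiner implements SimpleReduce {
    public void reduce(String key, Iterator<Integer> values, List<String> output) {
        int sum = 0;
        while (values.hasNext()) sum += values.next();
        output.add(key + "\t" + sum);
    }
    public void close() throws IOException {}
}

// Reducer: inherits the summing logic, overrides only the end-of-task hook.
class SumReducer extends SumCombiner {
    boolean databaseUpdated = false;
    @Override
    public void close() throws IOException {
        databaseUpdated = true; // stand-in for the real database write
    }
}
```

Because the combiner's close() does nothing, it is safe no matter how many times the framework runs the combine step, while the database write happens exactly once, at the end of the reduce task.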

> As far as I
> can tell, since both combiners and reducers must implement the Reducer
> interface, there is no way to have this be the same class.

There are ways to do it, but they are likely to change.

> Is there a
> recommended way to test inside the task whether you're running as a  
> combiner
> (in a map task) or as a reducer?

The question is worse than you think. In particular, the question is  
*not* are you in a map or reduce task. With current versions of  
Hadoop, the combiner can be called in the context of the reduce as  
well as the map. You really want to know if you are in a Reducer or  
Combiner context.

> If not, I think this might be an interesting thing to support in the  
> Hadoop
> 1.0 API.

It probably does make sense to add a ReduceContext.isCombiner() method to  
answer the question. In practice, when someone wants to use *almost*  
the same code for the combiner and the reducer, I get suspicious of  
their design.

> It would enable people to write an AbstractJob class where you just
> implement map, combine and reduce functions, and can thus write  
> MapReduce
> jobs in a single Java class.

The old api allowed this, since both Mapper and Reducer were  
interfaces. The new api doesn't because they are both classes. It  
wouldn't be hard to make a set of adaptors in library code that would  
work. Basically, you would define a job with SimpleMapper,  
SimpleCombiner, and SimpleReducer that would call Task.map,  
Task.combine, and Task.reduce.
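A self-contained sketch of that adaptor idea (all class and method names here are invented for illustration; the library classes Owen describes don't exist yet): one abstract class exposes map/combine/reduce, and a thin adapter delegates to it.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Illustrative single-class job: users override map/combine/reduce in one place.
abstract class AbstractJob {
    abstract void map(String line, List<String> output);
    // By default, combine just forwards to reduce; override if they differ.
    void combine(String key, Iterator<Integer> values, List<String> output) {
        reduce(key, values, output);
    }
    abstract void reduce(String key, Iterator<Integer> values, List<String> output);
}

// Adapter a framework could register as the mapper; it only delegates.
class SimpleMapper {
    private final AbstractJob job;
    SimpleMapper(AbstractJob job) { this.job = job; }
    void run(String line, List<String> output) { job.map(line, output); }
}

// A word-count-style job written in a single class.
class WordCount extends AbstractJob {
    void map(String line, List<String> output) {
        for (String w : line.split("\\s+")) output.add(w + "\t1");
    }
    void reduce(String key, Iterator<Integer> values, List<String> output) {
        int sum = 0;
        while (values.hasNext()) sum += values.next();
        output.add(key + "\t" + sum);
    }
}
```

SimpleCombiner and SimpleReducer adapters would look just like SimpleMapper, delegating to job.combine and job.reduce respectively.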

-- Owen