Posted to common-user@hadoop.apache.org by Songting Chen <ke...@yahoo.com> on 2008/12/09 20:35:18 UTC

End of block/file for Map

Is there a way for the Map process to know it's the end of records?

I need to flush some additional data at the end of the Map process, but wondering where I should put that code.

Thanks,
-Songting

Re: End of block/file for Map

Posted by Owen O'Malley <om...@apache.org>.
On Dec 9, 2008, at 7:34 PM, Aaron Kimball wrote:

> That's true, but you should be aware that you no longer have an
> OutputCollector available in the close() method.

True, but in practice you can keep a handle to it from the map method
and it will work perfectly. This is required for both streaming and
pipes to work. (Both of them do their processing asynchronously, so
close() has to wait for the subprocess to finish. Because of this,
the contract with the Mapper and Reducer is very loose, and the
collect method may be called between calls to the map method.) In
the context object API (HADOOP-1230), the context object will be
passed to cleanup, to make it clear that cleanup can also write records.

-- Owen
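
A minimal sketch of the keep-a-handle pattern described above. It is not Hadoop code: the Collector interface is a stand-in assumed to have the same collect(key, value) shape as org.apache.hadoop.mapred.OutputCollector, so that the example compiles without Hadoop on the classpath. BufferingWordCountMapper and CloseFlushSketch are hypothetical names, and the Reporter argument of the real map() is left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Stand-in for org.apache.hadoop.mapred.OutputCollector (assumption: same
// collect(key, value) shape), so the sketch compiles without Hadoop.
interface Collector<K, V> {
    void collect(K key, V value);
}

// Hypothetical mapper: counts words in memory and emits the totals from close().
class BufferingWordCountMapper {
    private final Map<String, Integer> counts = new LinkedHashMap<>();
    private Collector<String, Integer> saved; // handle kept from map()

    // Mirrors the shape of the old-API map(key, value, output, reporter),
    // minus the Reporter.
    public void map(Long offset, String line, Collector<String, Integer> output) {
        saved = output; // remember the collector for use in close()
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                counts.merge(word, 1, Integer::sum);
            }
        }
    }

    // Called once after the last record; flushes through the saved handle.
    public void close() {
        if (saved == null) {
            return; // map() was never called (empty input), nothing to emit
        }
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            saved.collect(e.getKey(), e.getValue());
        }
        counts.clear();
    }
}

class CloseFlushSketch {
    // Tiny driver standing in for the framework: map() per record, then close().
    static List<String> run(List<String> lines) {
        List<String> emitted = new ArrayList<>();
        Collector<String, Integer> out = (k, v) -> emitted.add(k + "=" + v);
        BufferingWordCountMapper mapper = new BufferingWordCountMapper();
        long offset = 0;
        for (String line : lines) {
            mapper.map(offset, line, out);
            offset += line.length() + 1;
        }
        mapper.close();
        return emitted;
    }

    public static void main(String[] args) {
        // prints [a=2, b=2, c=1]
        System.out.println(run(Arrays.asList("a b a", "b c")));
    }
}
```

Note the null check in close(): if a map task receives no records, map() never runs and there is no collector to write to.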

Re: End of block/file for Map

Posted by Aaron Kimball <aa...@cloudera.com>.
That's true, but you should be aware that you no longer have an
OutputCollector available in the close() method.  So if you are planning to
have each mapper emit some sort of "end" record along to the reducer, you
can't do so there. In general, there is not a good solution to that; you
should rethink your algorithm if possible so that you don't need to do that.


(I am not sure what happens if you memoize the OutputCollector you got as a
parameter to your map() method and try to use it. Probably nothing good.)

- Aaron


On Tue, Dec 9, 2008 at 11:42 AM, Owen O'Malley <om...@apache.org> wrote:

> On Dec 9, 2008, at 11:35 AM, Songting Chen wrote:
>
>> Is there a way for the Map process to know it's the end of records?
>>
>> I need to flush some additional data at the end of the Map process, but
>> wondering where I should put that code.
>
> The close() method is called at the end of the map.
>
> -- Owen

Re: End of block/file for Map

Posted by Owen O'Malley <om...@apache.org>.
On Dec 9, 2008, at 11:35 AM, Songting Chen wrote:

> Is there a way for the Map process to know it's the end of records?
>
> I need to flush some additional data at the end of the Map process,  
> but wondering where I should put that code.

The close() method is called at the end of the map.

-- Owen
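
To illustrate the answer above, here is a small sketch of using close() as the end-of-records hook for data that does not go through the collector. It is plain Java, not Hadoop code: SummaryMapper, EndOfRecordsSketch, and the Writer used as a side output are all hypothetical, and the driver loop only imitates the order in which the framework calls map() and close().

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.Arrays;
import java.util.List;

// Hypothetical mapper-like class. In the old org.apache.hadoop.mapred API the
// equivalent hook is close(), which is called after the last record.
class SummaryMapper {
    private final Writer sideOutput; // hypothetical side channel (log, file, ...)
    private long records = 0;
    private int longest = 0;

    SummaryMapper(Writer sideOutput) {
        this.sideOutput = sideOutput;
    }

    // Per-record work: only update in-memory state.
    void map(String line) {
        records++;
        longest = Math.max(longest, line.length());
    }

    // End-of-input hook: flush the accumulated state exactly once.
    void close() throws IOException {
        sideOutput.write("records=" + records + " longest=" + longest);
        sideOutput.flush();
    }
}

class EndOfRecordsSketch {
    // Imitates the framework's call order: map() for each record, then close().
    static String run(List<String> lines) {
        StringWriter sink = new StringWriter();
        SummaryMapper mapper = new SummaryMapper(sink);
        try {
            for (String line : lines) {
                mapper.map(line);
            }
            mapper.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toString();
    }

    public static void main(String[] args) {
        // prints records=3 longest=6
        System.out.println(run(Arrays.asList("alpha", "be", "gamma!")));
    }
}
```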