Posted to common-user@hadoop.apache.org by Myles Grant <my...@mylesgrant.com> on 2008/01/16 07:07:24 UTC

Single output file per reduce key?

Hello,

I'd like my reduce tasks to each output a single file per key,
containing the value. Each file would be named with the key. It
appears that I need to (at least) create a new OutputFormat and
possibly a RecordWriter. As doing this would likely involve a lot of
trial and error on my part, I was curious if someone had implemented
this already and would like to share. I will eventually need versions
that write both text files and binary files.

Short of a full existing implementation that I can steal, how about
some hints?

Cheers,
Myles

Re: Single output file per reduce key?

Posted by Amar Kamat <am...@yahoo-inc.com>.
Myles Grant wrote:
> I would like the values for a key to exist in a single file, and only 
> the values for that key.
Reducer.reduce() is invoked exactly once per key, with all the values
associated with that key:
Reducer.reduce(key, <value1, value2, value3, ...>);
So what I suggested should help you generate one file per key. Since you
have an iterator over all the values associated with that key, you don't
have to do much, and since the input to the reducer is sorted, you can be
sure that all the values for the key are passed to that one
Reducer.reduce() call.
Amar
> Each reduced key/value would get its own file.  If I understand 
> correctly, all output of the reducers is written to a single file.
>
> -Myles
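
A minimal sketch of the one-file-per-key idea described above, written against the plain Java file API rather than Hadoop so that it runs on its own. The class name PerKeyFileSketch, the String key and value types, and the outputDir parameter are all assumptions made for illustration. In a real job the same loop would sit inside the reducer's reduce() method and would presumably write through Hadoop's FileSystem API instead of java.nio.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Iterator;

class PerKeyFileSketch {
    // Hypothetical stand-in for Reducer.reduce(key, values, ...): writes every
    // value for one key into a file named after that key, one value per line.
    static Path reduce(String key, Iterator<String> values, Path outputDir) throws IOException {
        Files.createDirectories(outputDir);
        // Assumption: the key is usable as a file name as-is (no '/' and so on).
        Path out = outputDir.resolve(key);
        try (BufferedWriter writer = Files.newBufferedWriter(out, StandardCharsets.UTF_8)) {
            while (values.hasNext()) {
                writer.write(values.next());
                writer.newLine();
            }
        }
        return out;
    }
}
```

Because the method is called once per key and opens its own file, nothing has to be remembered between calls. Keys that are not valid file names would need some escaping scheme, which this sketch leaves out.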


Re: Single output file per reduce key?

Posted by Myles Grant <my...@mylesgrant.com>.
I would like the values for a key to exist in a single file, and only  
the values for that key.  Each reduced key/value would get its own  
file.  If I understand correctly, all output of the reducers is  
written to a single file.

-Myles

On Jan 16, 2008, at 9:29 PM, Amar Kamat wrote:

> Hi,
> Why couldn't you just write this logic in your reducer class? The
> reduce method [reduceClass.reduce()] is invoked with a key and an
> iterator over the values associated with that key, so you can simply
> dump the values into a file. Since the input to the reducer is
> sorted, no bookkeeping is required. I think this is what you
> wanted, no?


Re: Single output file per reduce key?

Posted by Amar Kamat <am...@yahoo-inc.com>.
Hi,
Why couldn't you just write this logic in your reducer class? The reduce
method [reduceClass.reduce()] is invoked with a key and an iterator over
the values associated with that key, so you can simply dump the values
into a file. Since the input to the reducer is sorted, no bookkeeping is
required. I think this is what you wanted, no?
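
To illustrate why sorted input means no bookkeeping, here is a rough sketch that walks a list of (key, value) pairs already sorted by key and keeps only one file open at a time. It is plain Java, not Hadoop code: the class SortedDumpSketch, the method dumpSorted, and the String types are hypothetical names chosen for the example. In Hadoop the framework does this grouping itself before it calls reduce().

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Map;

class SortedDumpSketch {
    // Writes each key's values to outputDir/<key> and returns the number of
    // files opened. Assumes sortedPairs is sorted by key: if a key reappeared
    // later in the list, its file would be reopened and truncated.
    static int dumpSorted(List<Map.Entry<String, String>> sortedPairs, Path outputDir)
            throws IOException {
        Files.createDirectories(outputDir);
        String currentKey = null;
        BufferedWriter writer = null;
        int files = 0;
        try {
            for (Map.Entry<String, String> pair : sortedPairs) {
                if (!pair.getKey().equals(currentKey)) {
                    // Key changed: the previous key is finished for good.
                    if (writer != null) {
                        writer.close();
                    }
                    currentKey = pair.getKey();
                    writer = Files.newBufferedWriter(outputDir.resolve(currentKey));
                    files++;
                }
                writer.write(pair.getValue());
                writer.newLine();
            }
        } finally {
            if (writer != null) {
                writer.close();
            }
        }
        return files;
    }
}
```

The only state carried along is the current key and its open writer. With unsorted input you would instead need a map of open writers, or append mode, to avoid losing earlier values.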