Posted to common-user@hadoop.apache.org by Myles Grant <my...@mylesgrant.com> on 2008/01/16 07:07:24 UTC
Single output file per reduce key?
Hello,
I'd like my reduce tasks to each output a single file per key,
containing the value. Each file would be named with the key. It
appears that I need to (at least) create a new OutputFormat and
possibly a RecordWriter. As doing this would likely involve a lot of
trial and error on my part, I was curious if someone had implemented
this already and would like to share. Eventually I will need versions
that write both text files and binary files.
Short of a full existing implementation that I can steal, how about some
hints?
Cheers,
Myles
Re: Single output file per reduce key?
Posted by Amar Kamat <am...@yahoo-inc.com>.
Myles Grant wrote:
> I would like the values for a key to exist in a single file, and only
> the values for that key.
Reducer.reduce() gets invoked just once per key, along with all the
values associated with it:
Reducer.reduce(key, <value1, value2, value3, ...>);
So what I suggested should help you generate one file per key. Since you
have an iterator over all the values associated with that key, you don't
have to do much, and since the input to the reducer is sorted, you can be
sure that all the values for the key are passed to that one
Reducer.reduce() call.
Amar
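To illustrate why sorted input means one call per key, here is a minimal plain-Java simulation. It is not Hadoop's implementation: the class name SortedGrouping and the method groupSorted are hypothetical, and the sketch assumes the pairs are already sorted by key and held in memory.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;

/** Hypothetical helper, not part of Hadoop: simulates grouping of key-sorted pairs. */
class SortedGrouping {

    /**
     * Walks pairs that are already sorted by key and calls reduce once for
     * each run of equal keys, passing an iterator over that run's values.
     * Returns the number of reduce calls made.
     */
    static <K, V> int groupSorted(List<Map.Entry<K, V>> sortedPairs,
                                  BiConsumer<K, Iterator<V>> reduce) {
        int calls = 0;
        int i = 0;
        while (i < sortedPairs.size()) {
            K key = sortedPairs.get(i).getKey();
            List<V> values = new ArrayList<>();
            // Equal keys are adjacent in sorted input, so a single forward
            // scan collects every value for this key.
            while (i < sortedPairs.size() && sortedPairs.get(i).getKey().equals(key)) {
                values.add(sortedPairs.get(i).getValue());
                i++;
            }
            reduce.accept(key, values.iterator());
            calls++;
        }
        return calls;
    }
}
```

Because each key shows up in exactly one call, a reduce body that opens a file named after the key can close it before returning and never needs to reopen it.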
> Each reduced key/value would get its own file. If I understand
> correctly, all output of the reducers is written to a single file.
>
> -Myles
>
> On Jan 16, 2008, at 9:29 PM, Amar Kamat wrote:
>
>> Hi,
>> Why couldn't you just write this logic in your reducer class? The
>> reduce [reduceClass.reduce()] method is invoked with a key and an
>> iterator over the values associated with that key, so you can simply
>> dump the values into a file. Since the input to the reducer is
>> sorted, no bookkeeping is required. I think this is what you wanted,
>> no?
Re: Single output file per reduce key?
Posted by Myles Grant <my...@mylesgrant.com>.
I would like the values for a key to exist in a single file, and only
the values for that key. Each reduced key/value would get its own
file. If I understand correctly, all output of the reducers is
written to a single file.
-Myles
On Jan 16, 2008, at 9:29 PM, Amar Kamat wrote:
> Hi,
> Why couldn't you just write this logic in your reducer class? The
> reduce [reduceClass.reduce()] method is invoked with a key and an
> iterator over the values associated with that key, so you can simply
> dump the values into a file. Since the input to the reducer is
> sorted, no bookkeeping is required. I think this is what you wanted,
> no?
Re: Single output file per reduce key?
Posted by Amar Kamat <am...@yahoo-inc.com>.
Hi,
Why couldn't you just write this logic in your reducer class? The reduce
[reduceClass.reduce()] method is invoked with a key and an iterator over
the values associated with that key, so you can simply dump the values
into a file. Since the input to the reducer is sorted, no bookkeeping is
required. I think this is what you wanted, no?
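A rough sketch of the "dump the values into a file" idea, in plain Java. It makes several assumptions: the class PerKeyFileWriter is a hypothetical stand-in rather than a Hadoop class, keys and values are strings, every key is usable as a file name, and output goes to a local directory via java.nio. A real job would more likely write through Hadoop's FileSystem API.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Iterator;

/** Hypothetical stand-in for a reducer; writes one text file per key. */
class PerKeyFileWriter {
    private final Path outputDir;

    PerKeyFileWriter(Path outputDir) {
        this.outputDir = outputDir;
    }

    /**
     * Mirrors the shape of reduce(key, values): a single call streams all
     * values for the key, one per line, into a file named after the key.
     */
    Path reduce(String key, Iterator<String> values) {
        // Assumption: keys are simple names; reject ones that contain path separators.
        if (key.isEmpty() || key.contains("/") || key.contains("\\")) {
            throw new IllegalArgumentException("key is not usable as a file name: " + key);
        }
        Path out = outputDir.resolve(key);
        try {
            Files.createDirectories(outputDir);
            try (BufferedWriter writer = Files.newBufferedWriter(out, StandardCharsets.UTF_8)) {
                while (values.hasNext()) {
                    writer.write(values.next());
                    writer.write('\n');
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }
}
```

A binary variant could follow the same shape with Files.newOutputStream in place of the buffered writer. Either way, the number of files grows with the number of distinct keys, which is worth keeping in mind for large key spaces.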