Posted to common-user@hadoop.apache.org by modemide <mo...@gmail.com> on 2011/09/01 21:47:01 UTC
MultipleOutputs - Create multiple files during output
Hi all,
I was wondering if anyone is familiar with the MultipleOutputs class. I want
to create multiple output files during my reduce.
My input files will consist of:
<name1><action1><date1>
<name1><action2><date2>
<name1><action3><date3>
<name2><action1><date1>
<name2><action2><date2>
<name2><action3><date3>
My goal is to create files with the following format:
Filename:
<name>_<Date:CCYYMM>
File Contents:
<action1>
<action2>
<action3>
I.e., this will store all the actions of one person for a given month
in one file.
I just don't know how to decide the file name at run time. Can anyone help?
Thanks,
Tim
Re: MultipleOutputs - Create multiple files during output
Posted by Stan Rosenberg <sr...@proclivitysystems.com>.
Hi Tim,
You could create a custom HashPartitioner so that all key/value pairs
denoting the actions of the same user end up in the same reducer; then you
need only one output file per reducer. By the way, how large are the output
files? Make sure you don't end up creating a lot of small files, i.e.,
files much smaller than 64 MB.
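To illustrate the partitioning idea, here is a minimal sketch in plain Java
(no Hadoop dependency, so it can be run on its own). It assumes a composite
map output key of the form "<name>\t<CCYYMM>", which is a hypothetical layout
and not something stated in this thread; the class and method names are made
up for illustration. The arithmetic is the same as in Hadoop's
HashPartitioner, except that only the name part is hashed. In a real job this
logic would go inside getPartition() of a Partitioner subclass. If the key is
just the name, the default HashPartitioner already sends all records for one
user to the same reducer.

```java
public class UserPartitionSketch {

    // Extracts the user name from an assumed "<name>\t<CCYYMM>" key.
    // A key without a tab is treated as being the name itself.
    static String nameOf(String compositeKey) {
        int tab = compositeKey.indexOf('\t');
        return tab < 0 ? compositeKey : compositeKey.substring(0, tab);
    }

    // Same formula as HashPartitioner, applied to the name only, so every
    // month of one user is routed to the same reducer.
    static int partitionFor(String compositeKey, int numReduceTasks) {
        return (nameOf(compositeKey).hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        int reducers = 4;
        // Both keys belong to the same user, so both lines print the same number.
        System.out.println(partitionFor("tim\t201108", reducers));
        System.out.println(partitionFor("tim\t201109", reducers));
    }
}
```

Masking with Integer.MAX_VALUE clears the sign bit, so a negative hashCode()
cannot produce a negative partition number.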
Best,
stan
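On the original question of choosing the file name at run time: the name
itself can be derived from each record, as in the sketch below. It assumes
the date field arrives as a CCYYMMDD string, which the thread does not
specify, and the class and method names are hypothetical. As far as I know,
the newer org.apache.hadoop.mapreduce.lib.output.MultipleOutputs has a
write(key, value, baseOutputPath) method that accepts such a string in the
reducer (the framework appends its own part suffix to it), and the older
org.apache.hadoop.mapred.lib.MultipleTextOutputFormat lets you override
generateFileNameForKeyValue(); which of these is available depends on the
Hadoop version in use.

```java
public class OutputNameSketch {

    // Builds "<name>_<CCYYMM>" from a name and a date.
    // Assumption: the date is a string starting with CCYYMM, e.g. "20110901".
    static String outputName(String name, String date) {
        if (date == null || date.length() < 6) {
            throw new IllegalArgumentException("date must start with CCYYMM: " + date);
        }
        return name + "_" + date.substring(0, 6);
    }

    public static void main(String[] args) {
        // Two actions by the same user in the same month share one name.
        System.out.println(outputName("tim", "20110901")); // prints tim_201109
        System.out.println(outputName("tim", "20110915")); // prints tim_201109
    }
}
```

Keep Stan's warning in mind: one file per user per month can mean a very
large number of small files.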