Posted to common-user@hadoop.apache.org by modemide <mo...@gmail.com> on 2011/09/01 21:47:01 UTC

MultipleOutputs - Create multiple files during output

Hi all,
Is anyone familiar with the MultipleOutputs class?  I want to
create multiple output files during my reduce phase.

My input files will consist of records like these:
<name1><action1><date1>
<name1><action2><date2>
<name1><action3><date3>

<name2><action1><date1>
<name2><action2><date2>
<name2><action3><date3>


My goal is to create files with the following format:
Filename:
<name>_<Date:CCYYMM>

File Contents:
<action1>
<action2>
<action3>


That is, each file will store all the actions of one person for a
given month.

I just don't know how I will decide the file name at run time.  Can anyone help?

Thanks,
Tim
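
One way to decide the file name at run time is to build it from each
record's name and date inside the reducer. The sketch below is only an
illustration: the class OutputNames, the method fileNameFor, and the
assumption that dates arrive as ISO yyyy-MM-dd strings are all
hypothetical, since the thread does not say what the date field looks
like. It is plain Java, so it runs without a cluster.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

// Hypothetical helper: derives "<name>_<CCYYMM>" from one input record.
final class OutputNames {
    // CCYYMM is a four-digit year followed by a two-digit month.
    private static final DateTimeFormatter MONTH =
            DateTimeFormatter.ofPattern("yyyyMM");

    // Assumption: isoDate is formatted as yyyy-MM-dd (not stated in the thread).
    static String fileNameFor(String name, String isoDate) {
        LocalDate date = LocalDate.parse(isoDate);
        return name + "_" + date.format(MONTH);
    }

    public static void main(String[] args) {
        // prints "name1_201109"
        System.out.println(fileNameFor("name1", "2011-09-01"));
    }
}
```

In Hadoop releases that ship the new-API class
org.apache.hadoop.mapreduce.lib.output.MultipleOutputs, a string like
this can be passed as the baseOutputPath argument of
write(key, value, baseOutputPath); the framework then appends its own
task suffix to the name. Whether that class is available depends on
the Hadoop version in use.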

Re: MultipleOutputs - Create multiple files during output

Posted by Stan Rosenberg <sr...@proclivitysystems.com>.
Hi Tim,

You could create a custom HashPartitioner so that all key/value pairs
denoting the actions of the same user end up in the same reducer; then you
need only one output file per reducer.  By the way, how large are the
output files?  Make sure you don't end up creating a lot of small files,
i.e., files much smaller than 64 MB.

Best,

stan
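
The partitioning idea above can be sketched in plain Java. This is a
rough illustration, not the code from any actual job: the class
NamePartitioning, its methods, and the tab-separated "name<TAB>CCYYMM"
key layout are all assumptions made for the example. The arithmetic
mirrors what Hadoop's stock HashPartitioner does with the key's hash
code, but applies it to the name part only, so every record for one
user maps to the same reducer index.

```java
// Hypothetical sketch of name-based partitioning, runnable without Hadoop.
final class NamePartitioning {
    // Assumption: the map output key looks like "name\tCCYYMM".
    static String nameOf(String compositeKey) {
        int tab = compositeKey.indexOf('\t');
        return tab < 0 ? compositeKey : compositeKey.substring(0, tab);
    }

    // Same formula as HashPartitioner, but hashing only the user name.
    // Masking with Integer.MAX_VALUE keeps the result non-negative.
    static int partitionFor(String compositeKey, int numReducers) {
        return (nameOf(compositeKey).hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    public static void main(String[] args) {
        int reducers = 4;
        // Two months of the same user land in the same partition: prints "true".
        System.out.println(
                partitionFor("name1\t201109", reducers)
                        == partitionFor("name1\t201110", reducers));
    }
}
```

In a real job this logic would live in the getPartition(key, value,
numPartitions) method of a Partitioner subclass registered on the job.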
