Posted to common-user@hadoop.apache.org by rajgopalv <ra...@gmail.com> on 2010/08/16 08:55:12 UTC

MultipleOutputFormat

Hi. I'm new to Hadoop and I'm trying out the WordCount program.

Now, to try out multiple output files, I am using MultipleOutputs. This link
helped me set it up:
http://hadoop.apache.org/common/docs/r0.19.0/api/org/apache/hadoop/mapred/lib/MultipleOutputs.html

In my driver class I had:

    MultipleOutputs.addNamedOutput(conf, "even",
            org.apache.hadoop.mapred.TextOutputFormat.class, Text.class,
            IntWritable.class);

    MultipleOutputs.addNamedOutput(conf, "odd",
            org.apache.hadoop.mapred.TextOutputFormat.class, Text.class,
            IntWritable.class);

and my reduce class became this:

public static class Reduce extends MapReduceBase implements
        Reducer<Text, IntWritable, Text, IntWritable> {
    MultipleOutputs mos = null;

    public void configure(JobConf job) {
        mos = new MultipleOutputs(job);
    }

    public void reduce(Text key, Iterator<IntWritable> values,
            OutputCollector<Text, IntWritable> output, Reporter reporter)
            throws IOException {
        int sum = 0;
        while (values.hasNext()) {
            sum += values.next().get();
        }
        // Route each word's total to the named output matching its parity.
        if (sum % 2 == 0) {
            mos.getCollector("even", reporter).collect(key, new IntWritable(sum));
        } else {
            mos.getCollector("odd", reporter).collect(key, new IntWritable(sum));
        }
        //output.collect(key, new IntWritable(sum));
    }

    @Override
    public void close() throws IOException {
        // Flush and close all named outputs when the task finishes.
        mos.close();
    }
}

Things worked, but I get a lot of files (one odd and one even for every
reduce task).

Question: how can I have just 2 output files (odd and even), so that every
odd output of every reducer is written into the one odd file, and likewise
for even?

-- 
View this message in context: http://old.nabble.com/MultipleOutputFormat-tp29447204p29447204.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.


Re: MultipleOutputFormat

Posted by Patrick Angeles <pa...@cloudera.com>.
In this case, don't bother with MultipleOutputs.

Specify 2 reducers and a custom partitioner that sends 'even' records to
partition 0 and 'odd' records to partition 1.

You will have two output files named 'part-00000' and 'part-00001',
corresponding to even and odd respectively.
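
The routing rule suggested above can be sketched in plain Java, without any
Hadoop dependency. The class ParityPartitionSketch and its methods are
hypothetical names for illustration only. The method getPartition mirrors the
shape of the old-API Partitioner.getPartition(key, value, numPartitions), and
in a real job the same rule would live in a Partitioner registered with
JobConf.setPartitionerClass, together with JobConf.setNumReduceTasks(2). One
assumption to note: a partitioner sees map output, so this sketch assumes the
value it receives is already the final count (for example in a second job run
over the WordCount result), not the per-occurrence 1 emitted by the WordCount
mapper.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-alone sketch; not part of Hadoop.
final class ParityPartitionSketch {

    // Same shape as Partitioner.getPartition: pick a partition for one record.
    // Assumes 'count' is the final total for 'word'.
    static int getPartition(String word, int count, int numPartitions) {
        if (numPartitions < 2) {
            return 0; // with a single reducer everything lands in partition 0
        }
        return Math.floorMod(count, 2); // even -> 0, odd -> 1
    }

    // Simulates the shuffle: groups records by the partition they are sent to.
    static Map<Integer, List<String>> route(Map<String, Integer> counts,
                                            int numPartitions) {
        Map<Integer, List<String>> partitions = new TreeMap<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            int p = getPartition(e.getKey(), e.getValue(), numPartitions);
            partitions.computeIfAbsent(p, k -> new ArrayList<>())
                      .add(e.getKey() + "\t" + e.getValue());
        }
        return partitions;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        counts.put("hadoop", 4);
        counts.put("reduce", 3);
        counts.put("map", 2);
        // Partition 0 would become part-00000, partition 1 part-00001.
        System.out.println(route(counts, 2));
    }
}
```

Each partition is handled by exactly one reducer, which is why this approach
yields one file per parity rather than one per parity per reducer.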


Re: MultipleOutputFormat

Posted by Amareshwari Sri Ramadasu <am...@yahoo-inc.com>.
Try with the number of reducers set to 1.

-Amareshwari
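
To illustrate why the reducer count matters here, the following plain-Java
sketch lists the files that the named outputs could produce for a given number
of reducers. The class NamedOutputFilesSketch is a hypothetical name, and the
"name-r-NNNNN" file pattern is an assumption based on the MultipleOutputs
documentation linked in the original question. The list is an upper bound,
since a reducer that never writes to a named output may not create its file.
In the driver, the reducer count would be set with conf.setNumReduceTasks(1).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-alone sketch; not part of Hadoop.
final class NamedOutputFilesSketch {

    // One file per named output per reduce task (assumed naming pattern).
    static List<String> expectedFiles(String[] namedOutputs, int numReducers) {
        List<String> files = new ArrayList<>();
        for (int r = 0; r < numReducers; r++) {
            for (String name : namedOutputs) {
                files.add(String.format("%s-r-%05d", name, r));
            }
        }
        return files;
    }

    public static void main(String[] args) {
        String[] named = {"even", "odd"};
        // Several reducers: an even/odd pair per reducer.
        System.out.println(expectedFiles(named, 3));
        // A single reducer: just one even file and one odd file.
        System.out.println(expectedFiles(named, 1));
    }
}
```

The trade-off is that a single reducer processes all keys, so the reduce phase
no longer runs in parallel.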
