Posted to common-user@hadoop.apache.org by Rasit OZDAS <ra...@gmail.com> on 2009/03/09 14:41:38 UTC

MultipleOutputFormat with sorting functionality

Hi, all!

I'm using MultipleOutputs to write out 4 different files, all with the
same key and value types.
However, the records in the output files don't appear to be sorted.

Should they be sorted? Or is sorting not implemented for multiple outputs?

Here is some code:

// In the main function: declare one multi-named output called "text".
// The last two arguments are the key class and the value class, in that
// order. They are given as Text/DoubleWritable here to match the collect()
// calls below (this assumes dbl is a DoubleWritable).
MultipleOutputs.addMultiNamedOutput(conf, "text", TextOutputFormat.class,
        Text.class, DoubleWritable.class);

// In Reducer.configure()
mos = new MultipleOutputs(conf);

// In Reducer.reduce(): route each record to the part named after its key.
// "Diger" (Turkish for "other") receives everything else.
if (keystr.equalsIgnoreCase("BreachFace"))
    mos.getCollector("text", "BreachFace", reporter).collect(new Text(key), dbl);
else if (keystr.equalsIgnoreCase("Ejector"))
    mos.getCollector("text", "Ejector", reporter).collect(new Text(key), dbl);
else if (keystr.equalsIgnoreCase("FiringPin"))
    mos.getCollector("text", "FiringPin", reporter).collect(new Text(key), dbl);
else if (keystr.equalsIgnoreCase("WeightedSum"))
    mos.getCollector("text", "WeightedSum", reporter).collect(new Text(key), dbl);
else
    mos.getCollector("text", "Diger", reporter).collect(new Text(key), dbl);


-- 
M. Raşit ÖZDAŞ

Re: MultipleOutputFormat with sorting functionality

Posted by Nick Cen <ce...@gmail.com>.
The default sort compares keys as strings, which I think may not be what
you intend. You can call JobConf's setKeyFieldComparatorOptions("-n") to
compare them numerically instead.
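
A small plain-Java sketch of the difference between string and numeric
comparison (no Hadoop involved; the class and method names are hypothetical
and only for illustration). It sorts the same keys once as strings, which is
roughly how Text keys end up ordered by default, and once as numbers:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class KeyOrderingSketch {
    // Orders the keys character by character, like a plain string sort.
    static List<String> sortAsText(List<String> keys) {
        List<String> copy = new ArrayList<>(keys);
        copy.sort(Comparator.naturalOrder());
        return copy;
    }

    // Parses each key as a double and orders by numeric value.
    static List<String> sortAsNumbers(List<String> keys) {
        List<String> copy = new ArrayList<>(keys);
        copy.sort(Comparator.comparingDouble(Double::parseDouble));
        return copy;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("10.5", "9.25", "100", "2");
        System.out.println(sortAsText(keys));    // [10.5, 100, 2, 9.25]
        System.out.println(sortAsNumbers(keys)); // [2, 9.25, 10.5, 100]
    }
}
```

The numeric variant corresponds in spirit to what the "-n" option asks for;
how that option is wired into a real job is not shown here.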


-- 
http://daily.appspot.com/food/

Re: MultipleOutputFormat with sorting functionality

Posted by Rasit OZDAS <ra...@gmail.com>.
Thanks, Nick!

It seems that sorting takes place on the map output, not in reduce :)
I've put the double value in front of every map key, and the problem is
solved now.
I know it's more of a workaround than a real solution, and I don't know
whether it causes performance problems. Any ideas? I'm not familiar with
what exactly Hadoop does when I do this.
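
A rough sketch of why such a prefix can work, assuming the value is
non-negative and is written with a fixed width so that string order matches
numeric order. The message above does not say how the prefix was formatted,
so the format and all names below are hypothetical. The sort between map and
reduce is imitated with a plain list sort:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

class PrefixedKeySketch {
    // Assumption: values are non-negative and well below 10^8, so a
    // zero-padded, fixed-width rendering sorts like the numbers themselves.
    static String prefixedKey(double value, String name) {
        return String.format(Locale.ROOT, "%012.3f\t%s", value, name);
    }

    // Stand-in for the framework's sort of map output keys (string order).
    static List<String> shuffleOrder(List<String> keys) {
        List<String> sorted = new ArrayList<>(keys);
        Collections.sort(sorted);
        return sorted;
    }

    // Strips the numeric prefix again, as a reducer could before writing.
    static String originalName(String prefixedKey) {
        return prefixedKey.substring(prefixedKey.indexOf('\t') + 1);
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList(
                prefixedKey(10.5, "a"),
                prefixedKey(9.25, "b"),
                prefixedKey(100.0, "c"),
                prefixedKey(2.0, "d"));
        // Prints the names in ascending order of their numeric prefix.
        for (String key : shuffleOrder(keys)) {
            System.out.println(originalName(key));
        }
    }
}
```

The longer keys add a little data to sort and transfer; whether that matters
depends on the job.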

Rasit

-- 
M. Raşit ÖZDAŞ

Re: MultipleOutputFormat with sorting functionality

Posted by Nick Cen <ce...@gmail.com>.
I don't think the sort is related to the output format.

I have tried this class before, but in a slightly different way from your
code: I extended the MultipleTextOutputFormat class and overrode its
generateFileNameForKeyValue() method, and everything seems to work fine.
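
The routing decision that such an override makes can be sketched in plain
Java (no Hadoop on the classpath). The class name, the fileNameFor() helper
and the one-directory-per-key layout are assumptions for illustration, not
the code that was actually used; in a real subclass this logic would sit in
the body of generateFileNameForKeyValue(key, value, name):

```java
import java.util.Arrays;
import java.util.List;

class FileNameRoutingSketch {
    // Key names taken from the reducer code earlier in the thread.
    private static final List<String> KNOWN_KEYS =
            Arrays.asList("BreachFace", "Ejector", "FiringPin", "WeightedSum");

    // Maps a record key to an output file name. leafName plays the role of
    // the default name the framework would pass in (e.g. "part-00000").
    static String fileNameFor(String key, String leafName) {
        for (String known : KNOWN_KEYS) {
            if (known.equalsIgnoreCase(key)) {
                return known + "/" + leafName;
            }
        }
        // Everything else goes to the catch-all output.
        return "Diger/" + leafName;
    }

    public static void main(String[] args) {
        System.out.println(fileNameFor("ejector", "part-00000"));
        System.out.println(fileNameFor("SomethingElse", "part-00000"));
    }
}
```

Because the file name is derived from the key alone, the order of records
within each file is still whatever order the reducer receives them in.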

-- 
http://daily.appspot.com/food/