Posted to common-user@hadoop.apache.org by hadoop <ma...@gmail.com> on 2011/04/24 06:18:28 UTC

Sequence.Sorter Performance

Hi guys,

	I'm trying to sort a 2.5 GB sequence file in one mapper using its built-in sort function, but it takes so long that the map task is killed for not reporting progress.

I could increase the default timeout for progress reports from the mapper, but I'll do that only if sorting with SequenceFile.Sorter is known to be optimal. Does anyone know?

	Thanks,

	Mark

Re: Sequence.Sorter Performance

Posted by Mark question <ma...@gmail.com>.
Thanks, Owen!
Mark

On Mon, Apr 25, 2011 at 11:43 AM, Owen O'Malley <om...@apache.org> wrote:

> The SequenceFile sorter is ok. It used to be the sort used in the shuffle.
> *grin*
>
> Make sure to set io.sort.factor and io.sort.mb to appropriate values for
> your hardware. I usually set io.sort.factor to 25 * drives and io.sort.mb
> to the amount of memory you can allocate to the sorting.
>
> -- Owen
>

Re: Sequence.Sorter Performance

Posted by Owen O'Malley <om...@apache.org>.
The SequenceFile sorter is ok. It used to be the sort used in the shuffle.
*grin*

Make sure to set io.sort.factor and io.sort.mb to appropriate values for
your hardware. I usually set io.sort.factor to 25 * drives and io.sort.mb
to the amount of memory you can allocate to the sorting.

-- Owen
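
The rule of thumb above can be sketched in plain Java. This is only an illustration: the class name SortTuning, the method suggest, and the example node (4 drives, 512 MB for sorting) are hypothetical and not from the thread. Only the property names io.sort.factor and io.sort.mb and the "25 * drives" heuristic come from the message.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: turns the "25 * drives" rule of thumb into property values.
public class SortTuning {
    // Merge streams per drive, as suggested in the message above.
    static final int STREAMS_PER_DRIVE = 25;

    // drives: number of local disks used for spill files (assumed input).
    // sortMemoryMb: memory in MB that can be given to the sort (assumed input).
    static Map<String, String> suggest(int drives, int sortMemoryMb) {
        if (drives < 1 || sortMemoryMb < 1) {
            throw new IllegalArgumentException("drives and sortMemoryMb must be positive");
        }
        Map<String, String> settings = new LinkedHashMap<>();
        settings.put("io.sort.factor", Integer.toString(STREAMS_PER_DRIVE * drives));
        settings.put("io.sort.mb", Integer.toString(sortMemoryMb));
        return settings;
    }

    public static void main(String[] args) {
        // Example node: 4 drives, 512 MB available for sorting (made-up numbers).
        for (Map.Entry<String, String> e : suggest(4, 512).entrySet()) {
            System.out.println(e.getKey() + "=" + e.getValue());
        }
    }
}
```

In a real job, these pairs would presumably be copied into the Hadoop Configuration (for example with conf.set(name, value)) before the SequenceFile.Sorter is constructed, so that the sorter picks them up.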