Posted to common-user@hadoop.apache.org by hadoop <ma...@gmail.com> on 2011/04/24 06:18:28 UTC
Sequence.Sorter Performance
Hi guys,
I'm trying to sort a 2.5 GB sequence file in one mapper using its built-in sort (SequenceFile.Sorter), but it is taking so long that the map task is killed for not reporting progress.
I could increase the default task timeout, but I'll do that only if sorting with SequenceFile.Sorter is known to be optimal. Does anyone know?
Thanks,
Mark
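One alternative to raising the timeout is to keep reporting progress while the long sort runs. The sketch below shows only the general pattern in plain Java: Heartbeat and runWithHeartbeat are hypothetical names, and the tick callback stands in for whatever progress call the task has available (in a mapper, presumably reporter.progress()). It is an illustration under those assumptions, not something taken from the thread.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not part of Hadoop.
class Heartbeat {
    // Runs `work` on the calling thread while a background timer invokes
    // `tick` every `periodMs` milliseconds, then stops the timer.
    static <T> T runWithHeartbeat(Callable<T> work, Runnable tick, long periodMs)
            throws Exception {
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        timer.scheduleAtFixedRate(tick, 0, periodMs, TimeUnit.MILLISECONDS);
        try {
            return work.call();   // the long-running sort would go here
        } finally {
            timer.shutdownNow();  // always stop ticking, even if the work fails
        }
    }

    public static void main(String[] args) throws Exception {
        // Thread.sleep stands in for the sort; the lambda stands in for a progress call.
        String result = runWithHeartbeat(
                () -> { Thread.sleep(300); return "sorted"; },
                () -> System.out.println("still working"),
                100);
        System.out.println(result);
    }
}
```

Note that a heartbeat like this also hides a task that is genuinely hung, so it trades one failure mode for another.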
Re: Sequence.Sorter Performance
Posted by Mark question <ma...@gmail.com>.
Thanks, Owen!
Mark
On Mon, Apr 25, 2011 at 11:43 AM, Owen O'Malley <om...@apache.org> wrote:
> The SequenceFile sorter is ok. It used to be the sort used in the shuffle.
> *grin*
>
> Make sure to set io.sort.factor and io.sort.mb to appropriate values for
> your hardware. I'd usually use io.sort.factor as 25 * drives and io.sort.mb
> is the amount of memory you can allocate to the sorting.
>
> -- Owen
>
Re: Sequence.Sorter Performance
Posted by Owen O'Malley <om...@apache.org>.
The SequenceFile sorter is ok. It used to be the sort used in the shuffle.
*grin*
Make sure to set io.sort.factor and io.sort.mb to appropriate values for
your hardware. I'd usually use io.sort.factor as 25 * drives and io.sort.mb
is the amount of memory you can allocate to the sorting.
-- Owen
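
To make the rule of thumb above concrete, here is a small sketch in plain Java that works through the arithmetic. SortTuning and its methods are hypothetical names, not Hadoop APIs, and the model is deliberately simplified: it assumes the sorter writes one sorted run per io.sort.mb of input and then merges up to io.sort.factor runs per pass, which ignores record overhead and the details of how the real merge schedules its passes.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not part of Hadoop.
class SortTuning {
    // Rule of thumb from the thread: 25 merge streams per drive.
    static int sortFactor(int drives) {
        return 25 * drives;
    }

    // Sorted runs produced when each in-memory chunk holds sortMb megabytes.
    static long initialRuns(long inputMb, int sortMb) {
        if (sortMb < 1) throw new IllegalArgumentException("sortMb must be >= 1");
        return (inputMb + sortMb - 1) / sortMb; // ceiling division
    }

    // Merge passes needed to reduce `runs` to a single file with the given fan-in.
    static int mergePasses(long runs, int factor) {
        if (factor < 2) throw new IllegalArgumentException("factor must be >= 2");
        int passes = 0;
        while (runs > 1) {
            runs = (runs + factor - 1) / factor; // each pass merges `factor` runs into one
            passes++;
        }
        return passes;
    }

    // The two properties named in the thread, as key/value pairs.
    static Map<String, String> settings(int drives, int sortMb) {
        Map<String, String> m = new LinkedHashMap<>();
        m.put("io.sort.factor", Integer.toString(sortFactor(drives)));
        m.put("io.sort.mb", Integer.toString(sortMb));
        return m;
    }

    public static void main(String[] args) {
        int drives = 4;       // assumed hardware
        int sortMb = 200;     // assumed memory available for sorting
        long inputMb = 2560;  // the 2.5 GB file from the question
        long runs = initialRuns(inputMb, sortMb);
        System.out.println(settings(drives, sortMb));
        System.out.println("runs=" + runs
                + " passes=" + mergePasses(runs, sortFactor(drives)));
    }
}
```

Under this model, 4 drives and 200 MB of sort memory give io.sort.factor = 100, 13 initial runs, and a single merge pass for the 2.5 GB file. With io.sort.mb = 100 and io.sort.factor = 10 (which I believe were the stock defaults at the time), the same file yields 26 runs and two merge passes, so every record is rewritten to disk one extra time.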