Posted to common-user@hadoop.apache.org by Songting Chen <ke...@yahoo.com> on 2009/02/05 07:55:45 UTC
Hadoop IO performance, prefetch etc
Hi,
Most of our map jobs are IO bound. However, on any given node, the IO throughput during the map phase is only 20% of that node's real sequential IO capability (we measured the sequential IO throughput with iozone).
I think the reason is this: each map issues sequential IO requests, but many maps run concurrently on the same node, so the disk has to keep switching between them, and these IO switches are quite expensive.
Prefetch may be a good solution here, especially since each map is supposed to scan through exactly one entire block, no more and no less. Any idea how to enable it?
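To make clear what I mean by prefetch, here is a minimal sketch in Python. It is not Hadoop code and does not use any Hadoop API; the names prefetching_chunks, chunk_size and depth are made up for illustration. The idea is that a background thread reads the file ahead in large chunks, so each switch between concurrent readers is paid for by a large sequential read instead of a small one. Whether this actually helps on our nodes is an assumption I have not measured.

```python
import queue
import threading


def prefetching_chunks(path, chunk_size=8 * 1024 * 1024, depth=2):
    """Yield the contents of `path` in order, reading ahead in the background.

    chunk_size and depth are illustrative knobs, not Hadoop settings:
    up to `depth` chunks of at most `chunk_size` bytes are buffered ahead
    of the consumer.
    """
    buf = queue.Queue(maxsize=depth)  # bounded, so read-ahead memory is capped
    done = object()                   # sentinel marking end of file

    def reader():
        try:
            # buffering=0 gives a raw file object; read() returns b"" only at EOF.
            with open(path, "rb", buffering=0) as f:
                while True:
                    chunk = f.read(chunk_size)
                    if not chunk:
                        break
                    buf.put(chunk)    # blocks while the buffer is full
            buf.put(done)
        except Exception as exc:      # hand read errors to the consumer
            buf.put(exc)

    # Daemon thread: if the consumer stops early, the reader stays blocked
    # on put() and is simply discarded when the process exits.
    threading.Thread(target=reader, daemon=True).start()

    while True:
        item = buf.get()
        if item is done:
            return
        if isinstance(item, Exception):
            raise item
        yield item
```

A map-like consumer would then just loop over the chunks, for example `for chunk in prefetching_chunks("/path/to/local/block"): process(chunk)`, where the path and process() are placeholders.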
Thanks,
-Songting