You are viewing a plain text version of this content. The canonical link for it is here.
Posted to common-user@hadoop.apache.org by Songting Chen <ke...@yahoo.com> on 2009/02/05 07:55:45 UTC

Hadoop IO performance, prefetch etc

Hi,
   Most of our map jobs are IO bound. However, for the same node, the IO throughput during the map phase is only 20% of its real sequential IO capability (we tested the sequential IO throughput by iozone) 
   I think the reason is that while each map has a sequential IO request, since there are many maps concurrently running on the same node, this causes quite expensive IO switches.
   Prefetch may be a good solution here especially a map job is supposed to scan through an entire block and no more no less. Any idea how to enable it?

Thanks,
-Songting