Posted to common-user@hadoop.apache.org by Shrinivas Joshi <js...@gmail.com> on 2011/07/19 22:26:28 UTC

IO pipeline optimizations

This blog post on the YDN website
http://developer.yahoo.com/blogs/hadoop/posts/2009/08/the_anatomy_of_hadoop_io_pipel/
has a detailed discussion of the different steps involved in Hadoop IO
operations and the opportunities for optimization. Could someone please
comment on the current state of these potential optimizations? Are some of
them expected to be addressed in the "next gen MR" release?

Thanks,
-Shrinivas

Re: IO pipeline optimizations

Posted by Todd Lipcon <to...@cloudera.com>.

Hi Shrinivas,

There has been some work going on recently around optimizing checksums. See
HDFS-2080 for example. This will help both the write and read code, though
we've focused more on read.
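
As a rough illustration of why checksum cost shows up on both the write and
read paths: HDFS checksums data in fixed-size chunks rather than per file, so
every chunk written or read involves a CRC computation. The sketch below shows
that per-chunk pattern in plain Java. It is not the HDFS code and not the
HDFS-2080 patch; the class and method names are hypothetical, and the 512-byte
chunk size is assumed to match the usual io.bytes.per.checksum default.

```java
import java.util.zip.CRC32;

// Hypothetical illustration class; not part of Hadoop.
public class ChunkedChecksum {
    // Assumed chunk size, mirroring the usual io.bytes.per.checksum default.
    static final int BYTES_PER_CHECKSUM = 512;

    // Compute one CRC32 per chunk of the buffer (the last chunk may be short).
    static long[] checksums(byte[] data, int bytesPerChecksum) {
        int chunks = (data.length + bytesPerChecksum - 1) / bytesPerChecksum;
        long[] sums = new long[chunks];
        CRC32 crc = new CRC32();
        for (int i = 0; i < chunks; i++) {
            int off = i * bytesPerChecksum;
            int len = Math.min(bytesPerChecksum, data.length - off);
            crc.reset();
            crc.update(data, off, len);
            sums[i] = crc.getValue();
        }
        return sums;
    }

    // Recompute the checksums and return the index of the first chunk that
    // does not match, or -1 if every chunk verifies.
    static int firstCorruptChunk(byte[] data, long[] expected, int bytesPerChecksum) {
        long[] actual = checksums(data, bytesPerChecksum);
        if (actual.length != expected.length) {
            throw new IllegalArgumentException("checksum count does not match data length");
        }
        for (int i = 0; i < actual.length; i++) {
            if (actual[i] != expected[i]) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        byte[] data = new byte[1300];
        for (int i = 0; i < data.length; i++) {
            data[i] = (byte) i;
        }
        long[] sums = checksums(data, BYTES_PER_CHECKSUM);   // "write" side
        data[600] ^= 1;                                      // simulate a flipped bit
        int bad = firstCorruptChunk(data, sums, BYTES_PER_CHECKSUM); // "read" side
        System.out.println("chunks=" + sums.length + " firstCorruptChunk=" + bad);
    }
}
```

Because verification happens per chunk, a reader only needs to checksum the
chunks covering the bytes it actually reads, which is one reason the read path
is sensitive to how cheaply each CRC can be computed.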

There have also been a lot of improvements around random read access - for
example HDFS-941 which improves random read by more than 2x.

I'm planning on writing a blog post in the next couple of weeks about some
of this work.

-Todd

On Tue, Jul 19, 2011 at 1:26 PM, Shrinivas Joshi <js...@gmail.com> wrote:

> This blog post on the YDN website
> http://developer.yahoo.com/blogs/hadoop/posts/2009/08/the_anatomy_of_hadoop_io_pipel/
> has a detailed discussion of the different steps involved in Hadoop IO
> operations and the opportunities for optimization. Could someone please
> comment on the current state of these potential optimizations? Are some of
> them expected to be addressed in the "next gen MR" release?
>
> Thanks,
> -Shrinivas
>

-- 
Todd Lipcon
Software Engineer, Cloudera