Posted to common-user@hadoop.apache.org by Mori Bellamy <mb...@apple.com> on 2008/07/22 21:22:12 UTC

question on HDFS

hey all,
let us say that i have 3 boxes, A B and C. initially, map tasks are  
running on all 3. after most of the mapping is done, C is 32% done  
with reduce (so still copying stuff to its local disk) and A is stuck  
on a particularly long map-task (it got an ill-behaved record from the  
input splits). does A's intermediate map output data go directly to  
C's local disk, or is it still written to HDFS and therefore  
distributed amongst all the machines? also, will A's disk be a favored  
target for A's output bytes, or is the target volume independent of  
the corresponding mapper?

Thanks! The answer to this question should clear a lot of things up  
for me.

Re: question on HDFS

Posted by Arun C Murthy <ac...@yahoo-inc.com>.
Mori,

On Jul 22, 2008, at 12:22 PM, Mori Bellamy wrote:

> hey all,
> let us say that i have 3 boxes, A B and C. initially, map tasks are  
> running on all 3. after most of the mapping is done, C is 32% done  
> with reduce (so still copying stuff to its local disk) and A is  
> stuck on a particularly long map-task (it got an ill-behaved record  
> from the input splits). does A's intermediate map output data go  
> directly to C's local disk, or is it still written to HDFS and  
> therefore distributed amongst all the machines? also, will A's disk  
> be a favored target for A's output bytes, or is the target volume  
> independent of the corresponding mapper?
>

Intermediate outputs (i.e. map outputs) are written to the local disk
of the node that runs the map task, not to HDFS. Each reduce then
fetches the map outputs it needs via HTTP.

hth,
Arun

> Thanks! The answer to this question should clear a lot of things up  
> for me.
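
For anyone who finds the data flow in the reply above easier to follow as code, here is a toy sketch of it in Python. It is not Hadoop code and does not use any Hadoop API. The function names (run_map, serve_local_dir, fetch_partition, demo), the file naming scheme and the byte-sum partitioner are all made up for illustration. The only things it tries to model are the two points from the reply: the map side writes its output to its own local disk, and the reduce side pulls its share over HTTP.

```python
import http.server
import os
import tempfile
import threading
import urllib.request
from functools import partial


class QuietHandler(http.server.SimpleHTTPRequestHandler):
    """Static file handler that does not log each request to stderr."""

    def log_message(self, format, *args):
        pass


def run_map(local_dir, map_id, records, num_reduces):
    """Write one output file per reduce into the mapper's own local dir."""
    partitions = {r: [] for r in range(num_reduces)}
    for key, value in records:
        # Hypothetical partitioner: sum of the key's bytes modulo reduce count.
        partitions[sum(key.encode()) % num_reduces].append(f"{key}\t{value}")
    for r, lines in partitions.items():
        path = os.path.join(local_dir, f"{map_id}_part{r}.out")
        with open(path, "w") as f:
            f.write("\n".join(lines))


def serve_local_dir(local_dir):
    """Expose the mapper node's local dir over HTTP on an ephemeral port."""
    handler = partial(QuietHandler, directory=local_dir)
    server = http.server.ThreadingHTTPServer(("127.0.0.1", 0), handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def fetch_partition(port, map_id, reduce_id):
    """Reduce side: pull one partition from the mapper's node via HTTP."""
    url = f"http://127.0.0.1:{port}/{map_id}_part{reduce_id}.out"
    # Empty ProxyHandler so a configured HTTP proxy is never used for loopback.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(url) as resp:
        return resp.read().decode().splitlines()


def demo():
    """Run one 'map' on node A, then let two 'reduces' fetch their parts."""
    with tempfile.TemporaryDirectory() as node_a_disk:
        records = [("apple", 1), ("lime", 2), ("apple", 3)]
        run_map(node_a_disk, "mapA", records, num_reduces=2)
        server = serve_local_dir(node_a_disk)
        try:
            port = server.server_address[1]
            return [fetch_partition(port, "mapA", r) for r in range(2)]
        finally:
            server.shutdown()
            server.server_close()


if __name__ == "__main__":
    for reduce_id, lines in enumerate(demo()):
        print(reduce_id, lines)
```

In this toy model nothing is ever written to a shared filesystem: the map output only exists in node A's temporary directory, and each reduce gets its partition by asking A for it.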