Posted to mapreduce-user@hadoop.apache.org by John Lilley <jo...@redpoint.net> on 2013/05/17 00:08:45 UTC

Question about writing HDFS files

I seem to recall reading that when a MapReduce task writes a file, the blocks of the file are always written to local disk, and replicated to other nodes.  If this is true, is this also true for non-MR applications writing to HDFS from Hadoop worker nodes?  What about clients outside of the cluster doing a file load?
Thanks
John


Re: Question about writing HDFS files

Posted by "J. Rottinghuis" <jr...@gmail.com>.
Yes.

Joep


On Fri, May 17, 2013 at 6:38 AM, John Lilley <jo...@redpoint.net> wrote:

> Right, sorry for the ambiguity, I was talking about HDFS writes only.
>
> So my application doesn't need to do anything to signal that it is writing
> from inside vs. outside of the Hadoop cluster; HDFS figures that out from
> the IP address or hostname?
>
>
> -----Original Message-----
> From: Harsh J [mailto:harsh@cloudera.com]
> Sent: Thursday, May 16, 2013 11:12 PM
> To: <us...@hadoop.apache.org>
> Subject: Re: Question about writing HDFS files
>
> Thanks for the clarification, Rahul. In that case, the reading is
> correct (and an HDFS client behaves the same in and out of MR; it's
> not really related to MR at all).
>
> A "client outside" would write to a random set of DataNodes, across at
> least two racks for 3 replicas if rack awareness is turned on.
>
> On Fri, May 17, 2013 at 8:17 AM, Rahul Bhattacharjee <
> rahul.rec.dgp@gmail.com> wrote:
> > Hi Harsh,
> >
> > I think what John meant by writing to local disk is writing first to
> > the same DataNode that initiated the write call.
> >
> > John can further clarify.
> >
> >
> > On Fri, May 17, 2013 at 4:23 AM, Harsh J <ha...@cloudera.com> wrote:
> >>
> >> That is not true. HDFS writes are not staged to a local disk first
> >> before being written onto the DataNodes. The old architecture docs
> >> seem to suggest that the writes get staged to a local disk, but that's
> >> not true anymore; see https://issues.apache.org/jira/browse/HDFS-1454.
> >>
> >> Also worth noting that an HDFS client behaves the same way in almost
> >> all contexts, whether it's invoked from an MR framework or directly
> >> from the shell.
> >>
> >> On Fri, May 17, 2013 at 3:38 AM, John Lilley
> >> <jo...@redpoint.net>
> >> wrote:
> >> > I seem to recall reading that when a MapReduce task writes a file,
> >> > the blocks of the file are always written to local disk, and
> >> > replicated to other nodes.  If this is true, is this also true for
> >> > non-MR applications writing to HDFS from Hadoop worker nodes?  What
> >> > about clients outside of the cluster doing a file load?
> >> >
> >> > Thanks
> >> >
> >> > John
> >> >
> >> >
> >>
> >>
> >>
> >> --
> >> Harsh J
> >
> >
>
>
>
> --
> Harsh J
>

RE: Question about writing HDFS files

Posted by John Lilley <jo...@redpoint.net>.
Right, sorry for the ambiguity, I was talking about HDFS writes only.

So my application doesn't need to do anything to signal that it is writing from inside vs. outside of the Hadoop cluster; HDFS figures that out from the IP address or hostname?


-----Original Message-----
From: Harsh J [mailto:harsh@cloudera.com] 
Sent: Thursday, May 16, 2013 11:12 PM
To: <us...@hadoop.apache.org>
Subject: Re: Question about writing HDFS files

Thanks for the clarification, Rahul. In that case, the reading is correct (and an HDFS client behaves the same in and out of MR; it's not really related to MR at all).

A "client outside" would write to a random set of DataNodes, across at least two racks for 3 replicas if rack awareness is turned on.

On Fri, May 17, 2013 at 8:17 AM, Rahul Bhattacharjee <ra...@gmail.com> wrote:
> Hi Harsh,
>
> I think what John meant by writing to local disk is writing first to
> the same DataNode that initiated the write call.
>
> John can further clarify.
>
>
> On Fri, May 17, 2013 at 4:23 AM, Harsh J <ha...@cloudera.com> wrote:
>>
>> That is not true. HDFS writes are not staged to a local disk first
>> before being written onto the DataNodes. The old architecture docs
>> seem to suggest that the writes get staged to a local disk, but that's
>> not true anymore; see https://issues.apache.org/jira/browse/HDFS-1454.
>>
>> Also worth noting that an HDFS client behaves the same way in almost
>> all contexts, whether it's invoked from an MR framework or directly
>> from the shell.
>>
>> On Fri, May 17, 2013 at 3:38 AM, John Lilley 
>> <jo...@redpoint.net>
>> wrote:
>> > I seem to recall reading that when a MapReduce task writes a file, 
>> > the blocks of the file are always written to local disk, and 
>> > replicated to other nodes.  If this is true, is this also true for 
>> > non-MR applications writing to HDFS from Hadoop worker nodes?  What 
>> > about clients outside of the cluster doing a file load?
>> >
>> > Thanks
>> >
>> > John
>> >
>> >
>>
>>
>>
>> --
>> Harsh J
>
>



--
Harsh J

Re: Question about writing HDFS files

Posted by Harsh J <ha...@cloudera.com>.
Thanks for the clarification, Rahul. In that case, the reading is
correct (and an HDFS client behaves the same in and out of MR;
it's not really related to MR at all).

A "client outside" would write to a random set of DataNodes, across at
least two racks for 3 replicas if rack awareness is turned on.
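
The placement behavior described above can be sketched in Java. This is a simplified toy model, not HDFS source code: the names `ReplicaPlacementSketch`, `chooseTargets`, and `pick`, as well as the host and rack labels, are hypothetical. It assumes the commonly documented default policy (first replica on the writer's node when that node is a DataNode, otherwise a random node; second replica on a different rack; third on the same rack as the second). The real NameNode also weighs load, free space, and failures, which this sketch ignores.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.function.Predicate;

final class ReplicaPlacementSketch {

    /** A DataNode identified by a host name and a rack label (both made up). */
    static final class Node {
        final String host;
        final String rack;
        Node(String host, String rack) { this.host = host; this.rack = rack; }
        @Override public String toString() { return host + "@" + rack; }
    }

    /** Picks three replica targets for one block written from clientHost. */
    static List<Node> chooseTargets(String clientHost, List<Node> cluster, Random rng) {
        List<Node> targets = new ArrayList<>();

        // Replica 1: the writer's own node if it is a DataNode, otherwise any node.
        Node local = null;
        for (Node n : cluster) {
            if (n.host.equals(clientHost)) { local = n; break; }
        }
        final Node first = (local != null) ? local : pick(cluster, rng, targets, n -> true);
        targets.add(first);

        // Replica 2: a node on a different rack than replica 1.
        final Node second = pick(cluster, rng, targets, n -> !n.rack.equals(first.rack));
        targets.add(second);

        // Replica 3: a different node on the same rack as replica 2.
        targets.add(pick(cluster, rng, targets, n -> n.rack.equals(second.rack)));
        return targets;
    }

    /** Random unused node matching the preference; falls back to any unused node. */
    static Node pick(List<Node> cluster, Random rng, List<Node> taken, Predicate<Node> preferred) {
        List<Node> unused = new ArrayList<>();
        List<Node> matching = new ArrayList<>();
        for (Node n : cluster) {
            if (taken.contains(n)) continue;
            unused.add(n);
            if (preferred.test(n)) matching.add(n);
        }
        List<Node> pool = matching.isEmpty() ? unused : matching;
        if (pool.isEmpty()) throw new IllegalStateException("not enough DataNodes");
        return pool.get(rng.nextInt(pool.size()));
    }

    public static void main(String[] args) {
        List<Node> cluster = new ArrayList<>();
        cluster.add(new Node("dn1", "/rack1"));
        cluster.add(new Node("dn2", "/rack1"));
        cluster.add(new Node("dn3", "/rack2"));
        cluster.add(new Node("dn4", "/rack2"));
        Random rng = new Random();
        System.out.println("writer on dn1:  " + chooseTargets("dn1", cluster, rng));
        System.out.println("outside client: " + chooseTargets("edge-host", cluster, rng));
    }
}
```

In this model the only thing the writer supplies is its own host name, and the decision about "inside" versus "outside" is made by matching that name against the known DataNodes.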

On Fri, May 17, 2013 at 8:17 AM, Rahul Bhattacharjee
<ra...@gmail.com> wrote:
> Hi Harsh,
>
> I think what John meant by writing to local disk is writing first to
> the same DataNode that initiated the write call.
>
> John can further clarify.
>
>
> On Fri, May 17, 2013 at 4:23 AM, Harsh J <ha...@cloudera.com> wrote:
>>
>> That is not true. HDFS writes are not staged to a local disk first
>> before being written onto the DataNodes. The old architecture docs
>> seem to suggest that the writes get staged to a local disk, but that's
>> not true anymore; see https://issues.apache.org/jira/browse/HDFS-1454.
>>
>> Also worth noting that an HDFS client behaves the same way in almost
>> all contexts, whether it's invoked from an MR framework or directly
>> from the shell.
>>
>> On Fri, May 17, 2013 at 3:38 AM, John Lilley <jo...@redpoint.net>
>> wrote:
>> > I seem to recall reading that when a MapReduce task writes a file,
>> > the blocks of the file are always written to local disk, and
>> > replicated to other nodes.  If this is true, is this also true for
>> > non-MR applications writing to HDFS from Hadoop worker nodes?  What
>> > about clients outside of the cluster doing a file load?
>> >
>> > Thanks
>> >
>> > John
>> >
>> >
>>
>>
>>
>> --
>> Harsh J
>
>



-- 
Harsh J

Re: Question about writing HDFS files

Posted by Rahul Bhattacharjee <ra...@gmail.com>.
Hi Harsh,

I think what John meant by writing to local disk is writing first to the
same DataNode that initiated the write call.

John can further clarify.


On Fri, May 17, 2013 at 4:23 AM, Harsh J <ha...@cloudera.com> wrote:

> That is not true. HDFS writes are not staged to a local disk first
> before being written onto the DataNodes. The old architecture docs
> seem to suggest that the writes get staged to a local disk, but that's
> not true anymore; see https://issues.apache.org/jira/browse/HDFS-1454.
>
> Also worth noting that an HDFS client behaves the same way in almost
> all contexts, whether it's invoked from an MR framework or directly
> from the shell.
>
> On Fri, May 17, 2013 at 3:38 AM, John Lilley <jo...@redpoint.net>
> wrote:
> > I seem to recall reading that when a MapReduce task writes a file,
> > the blocks of the file are always written to local disk, and
> > replicated to other nodes.  If this is true, is this also true for
> > non-MR applications writing to HDFS from Hadoop worker nodes?  What
> > about clients outside of the cluster doing a file load?
> >
> > Thanks
> >
> > John
> >
> >
>
>
>
> --
> Harsh J
>

Re: Question about writing HDFS files

Posted by Harsh J <ha...@cloudera.com>.
That is not true. HDFS writes are not staged to a local disk first
before being written onto the DataNodes. The old architecture docs
seem to suggest that the writes get staged to a local disk, but that's
not true anymore; see https://issues.apache.org/jira/browse/HDFS-1454.
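
One way to picture the difference between staging and streaming is the toy Java model below. It is only a sketch under stated assumptions, not HDFS code: `WritePipelineSketch`, its `DataNode` class, and the tiny packet size are hypothetical, and the real client is considerably more involved (acknowledgements, checksums, failure recovery). The idea it illustrates is that the client buffers a small packet in memory and pushes each full packet through the chain of DataNodes right away, instead of first accumulating a whole block on its local disk.

```java
import java.io.ByteArrayOutputStream;
import java.util.List;

final class WritePipelineSketch {

    /** Stand-in for a DataNode: just remembers the bytes it has received. */
    static final class DataNode {
        final String host;
        final ByteArrayOutputStream stored = new ByteArrayOutputStream();
        DataNode(String host) { this.host = host; }
    }

    private final List<DataNode> pipeline; // replica targets, in forwarding order
    private final byte[] packet;           // small in-memory buffer, never a local file
    private int filled = 0;
    int packetsSent = 0;

    WritePipelineSketch(List<DataNode> pipeline, int packetSize) {
        this.pipeline = pipeline;
        this.packet = new byte[packetSize];
    }

    /** Buffers bytes and ships a packet as soon as the buffer is full. */
    void write(byte[] data) {
        for (byte b : data) {
            packet[filled++] = b;
            if (filled == packet.length) {
                sendPacket();
            }
        }
    }

    /** Sends whatever is left in the buffer as a final, shorter packet. */
    void close() {
        sendPacket();
    }

    /** Relays the current packet node by node down the pipeline. */
    private void sendPacket() {
        if (filled == 0) {
            return;
        }
        for (DataNode dn : pipeline) {
            dn.stored.write(packet, 0, filled);
        }
        packetsSent++;
        filled = 0;
    }
}
```

With a packet size of 4 and three DataNodes, writing 10 bytes delivers 8 bytes to every node before `close()` is called and the remaining 2 afterwards, so data reaches the replicas while the file is still being written.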

Also worth noting that an HDFS client behaves the same way in almost
all contexts, whether it's invoked from an MR framework or directly
from the shell.

On Fri, May 17, 2013 at 3:38 AM, John Lilley <jo...@redpoint.net> wrote:
> I seem to recall reading that when a MapReduce task writes a file, the
> blocks of the file are always written to local disk, and replicated to other
> nodes.  If this is true, is this also true for non-MR applications writing
> to HDFS from Hadoop worker nodes?  What about clients outside of the cluster
> doing a file load?
>
> Thanks
>
> John
>
>



--
Harsh J
