Posted to common-user@hadoop.apache.org by Stas Oskin <st...@gmail.com> on 2009/04/10 17:33:52 UTC

Does the HDFS client read the data from NameNode, or from DataNode directly?

Hi.

I wanted to verify a point about HDFS client operations:

When asking for a file, is all the communication done through the NameNode?
Or, after being pointed to the correct DataNode, does the HDFS client work
directly against it?

Also, the NameNode provides a URL named "streamFile" which allows any HTTP
client to get the stored files. Any idea how its operation compares in
terms of speed to client HDFS access?

Regards.

Re: Does the HDFS client read the data from NameNode, or from DataNode directly?

Posted by Stas Oskin <st...@gmail.com>.
Hi.


> What happens here is that the NameNode redirects you to a "smartly" chosen
> DataNode (one that has some of the file's first 5 blocks, I think), and
> that DataNode proxies the file for you.  Specifically, the full file is
> assembled from multiple nodes on that DataNode.  If you were using a
> DFSClient, it would assemble the file from blocks at the client and talk
> to many DataNodes.
>

I see, thanks for the explanation.

Re: Does the HDFS client read the data from NameNode, or from DataNode directly?

Posted by Philip Zeyliger <ph...@cloudera.com>.
> Also, the NameNode provides a URL named "streamFile" which allows any HTTP
> client to get the stored files. Any idea how its operation compares in
> terms of speed to client HDFS access?


What happens here is that the NameNode redirects you to a "smartly" chosen
DataNode (one that has some of the file's first 5 blocks, I think), and
that DataNode proxies the file for you.  Specifically, the full file is
assembled from multiple nodes on that DataNode.  If you were using a
DFSClient, it would assemble the file from blocks at the client and talk
to many DataNodes.

-- Philip
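
The difference between the two access paths described above can be pictured
with a small toy model. This is only a sketch under stated assumptions: the
block layout, the node names (dn1, dn2, dn3) and both functions are
hypothetical, and none of it is the Hadoop API. It counts which nodes the
client contacts and how many bytes cross the network in each case.

```python
# Toy comparison of the two access paths; every name here is hypothetical.
BLOCKS = {  # block id -> (DataNode holding it, block contents)
    "blk_1": ("dn1", b"aaaa"),
    "blk_2": ("dn2", b"bbbb"),
    "blk_3": ("dn3", b"cccc"),
}
FILE_BLOCKS = ["blk_1", "blk_2", "blk_3"]


def read_direct(file_blocks):
    """DFSClient-style: the client fetches each block from the node holding it."""
    contacted, data = set(), b""
    for block_id in file_blocks:
        node, payload = BLOCKS[block_id]
        contacted.add(node)
        data += payload
    # Each byte crosses the network once, from a DataNode to the client.
    return data, contacted, len(data)


def read_via_proxy(file_blocks, proxy="dn1"):
    """streamFile-style: one DataNode assembles the file and relays it."""
    data, network_bytes = b"", 0
    for block_id in file_blocks:
        node, payload = BLOCKS[block_id]
        if node != proxy:
            network_bytes += len(payload)  # remote block pulled to the proxy
        data += payload
    network_bytes += len(data)  # assembled file relayed to the HTTP client
    return data, {proxy}, network_bytes


direct_data, direct_nodes, direct_bytes = read_direct(FILE_BLOCKS)
proxy_data, proxy_nodes, proxy_bytes = read_via_proxy(FILE_BLOCKS)
print(sorted(direct_nodes), direct_bytes)
print(sorted(proxy_nodes), proxy_bytes)
```

In this model the proxied read returns the same bytes but moves some of them
twice, once to the proxy and once to the client. That is only a byte count;
it says nothing definitive about real-world throughput, which depends on the
cluster and the network.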

Re: Does the HDFS client read the data from NameNode, or from DataNode directly?

Posted by Stas Oskin <st...@gmail.com>.
Thanks, this is what I thought.
Regards.

2009/4/10 Alex Loddengaard <al...@cloudera.com>

> Data is streamed directly from the DataNodes themselves.  The NameNode is
> queried only for block locations and other metadata.
>
> Alex

Re: Does the HDFS client read the data from NameNode, or from DataNode directly?

Posted by Alex Loddengaard <al...@cloudera.com>.
Data is streamed directly from the DataNodes themselves.  The NameNode is
queried only for block locations and other metadata.

Alex
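
The read path described above (metadata from the NameNode, file contents from
the DataNodes) can be sketched as a toy model. Everything below is an
assumption made for illustration: the classes, method names and block ids are
hypothetical and are not the Hadoop API, and replica selection is reduced to
"take the first one".

```python
# Toy model of the HDFS read path; not the Hadoop API.
from dataclasses import dataclass, field


@dataclass
class DataNode:
    name: str
    blocks: dict = field(default_factory=dict)  # block id -> bytes
    bytes_served: int = 0

    def read_block(self, block_id):
        data = self.blocks[block_id]
        self.bytes_served += len(data)
        return data


@dataclass
class NameNode:
    # file path -> ordered list of (block id, [names of DataNodes holding it])
    block_map: dict = field(default_factory=dict)
    bytes_served: int = 0  # never incremented: no file contents pass through

    def get_block_locations(self, path):
        return self.block_map[path]


def client_read(path, namenode, datanodes):
    """Ask the NameNode where blocks live, then read from DataNodes directly."""
    chunks = []
    for block_id, holders in namenode.get_block_locations(path):
        replica = datanodes[holders[0]]  # simplification: first replica only
        chunks.append(replica.read_block(block_id))
    return b"".join(chunks)


dn1 = DataNode("dn1", {"blk_1": b"hello "})
dn2 = DataNode("dn2", {"blk_2": b"world"})
nn = NameNode({"/demo.txt": [("blk_1", ["dn1"]), ("blk_2", ["dn2"])]})
datanodes = {"dn1": dn1, "dn2": dn2}

content = client_read("/demo.txt", nn, datanodes)
print(content)
print(nn.bytes_served, dn1.bytes_served, dn2.bytes_served)
```

The NameNode's byte counter stays at zero because it only answers the
location lookup; all file bytes are accounted for on the two DataNodes.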
