Posted to mapreduce-user@hadoop.apache.org by Demai Ni <ni...@gmail.com> on 2014/10/13 20:58:33 UTC

read from a hdfs file on the same host as client

hi, folks,

a very simple question; looking forward to a couple of pointers.

Let's say I have an HDFS file, testfile, which has only one block (256MB),
and that block has a replica on datanode host1.hdfs.com (the whole HDFS
cluster may have 100 nodes, and the other 2 replicas are on other
datanodes).

If, on host1.hdfs.com, I do a "hadoop fs -cat testfile" or use a Java client
to read the file, should I assume there won't be any significant data
movement over the network? That is, is the namenode smart enough to give me
the data on host1.hdfs.com directly?

thanks

Demai

Re: read from a hdfs file on the same host as client

Posted by Demai Ni <ni...@gmail.com>.
Shivram,

many thanks for confirming the behavior. I will also turn on short-circuit
reads as you suggested. Appreciate the help

Demai

On Mon, Oct 13, 2014 at 3:42 PM, Shivram Mani <sm...@pivotal.io> wrote:

> Demai, you are right. HDFS's default BlockPlacementPolicyDefault makes
> sure one replica of your block is available on the writer's datanode.
> The replica selection for the read operation is also aimed at minimizing
> bandwidth/latency and will serve the block from the reader's local node.
> If you want to further optimize this, you can set 'dfs.client.read.shortcircuit'
> to true. This would allow the client to bypass the datanode to read the
> file directly.
>
> On Mon, Oct 13, 2014 at 11:58 AM, Demai Ni <ni...@gmail.com> wrote:
>
>> hi, folks,
>>
>> a very simple question, looking forward a couple pointers.
>>
>> Let's say I have a hdfs file: testfile, which only have one block(256MB),
>> and the block has a replica on datanode: host1.hdfs.com (the whole hdfs
>> may have 100 nodes though, and the other 2 replica are available at other
>> datanode).
>>
>> If on host1.hdfs.com, I did a "hadoop fs -cat testfile" or a java client
>> to read the file. Should I assume there won't be any significant data
>> movement through network?  That is the namenode is smart enough to give me
>> the data on host1.hdfs.com directly?
>>
>> thanks
>>
>> Demai
>>
>
>
>
> --
> Thanks
> Shivram
>


Re: read from a hdfs file on the same host as client

Posted by Shivram Mani <sm...@pivotal.io>.
Demai, you are right. HDFS's default BlockPlacementPolicyDefault makes sure
the first replica of your block is placed on the writer's datanode (when the
writer itself runs on a datanode). Replica selection for reads is likewise
aimed at minimizing bandwidth/latency, so the block will be served from the
reader's local node whenever a local replica exists. If you want to optimize
this further, you can set 'dfs.client.read.shortcircuit' to true. This
allows the client to bypass the datanode and read the block files directly
from local disk.
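
As a rough sketch, the client-side configuration could look like the
hdfs-site.xml fragment below. Note that short-circuit reads also need
'dfs.domain.socket.path' set (on both the datanode and the client) and the
native libhadoop library available; the socket path shown here is only an
example, not a required value.

```xml
<!-- hdfs-site.xml fragment (datanode and client side) -->
<configuration>
  <property>
    <name>dfs.client.read.shortcircuit</name>
    <value>true</value>
  </property>
  <property>
    <!-- Unix domain socket over which the datanode passes block file
         descriptors to a local client. Example path only. -->
    <name>dfs.domain.socket.path</name>
    <value>/var/lib/hadoop-hdfs/dn_socket</value>
  </property>
</configuration>
```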

On Mon, Oct 13, 2014 at 11:58 AM, Demai Ni <ni...@gmail.com> wrote:

> hi, folks,
>
> a very simple question, looking forward a couple pointers.
>
> Let's say I have a hdfs file: testfile, which only have one block(256MB),
> and the block has a replica on datanode: host1.hdfs.com (the whole hdfs
> may have 100 nodes though, and the other 2 replica are available at other
> datanode).
>
> If on host1.hdfs.com, I did a "hadoop fs -cat testfile" or a java client
> to read the file. Should I assume there won't be any significant data
> movement through network?  That is the namenode is smart enough to give me
> the data on host1.hdfs.com directly?
>
> thanks
>
> Demai
>



-- 
Thanks
Shivram
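
The replica selection Shivram describes can be sketched conceptually as
follows. This is only an illustrative toy, not HDFS source code; the host
names and the choose_replica helper are made up for the example.

```python
# Toy sketch of local-replica preference: a reader picks a replica on its
# own host if one exists, so no block data crosses the network; otherwise
# it falls back to a remote replica.

def choose_replica(reader_host, replica_hosts):
    """Return the replica host to read from, preferring the local one."""
    for host in replica_hosts:
        if host == reader_host:
            return host  # local read: block is served from local disk
    return replica_hosts[0]  # remote read: block is streamed over the network

# One block with three replicas, as in the example in this thread.
replicas = ["host7.hdfs.com", "host1.hdfs.com", "host42.hdfs.com"]

print(choose_replica("host1.hdfs.com", replicas))  # reader on host1 -> local replica
print(choose_replica("host9.hdfs.com", replicas))  # reader elsewhere -> remote replica
```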
