Posted to common-user@hadoop.apache.org by elton sky <el...@gmail.com> on 2010/10/09 06:42:30 UTC

increase BytesPerChecksum decrease write performance??

Hello,

I was benchmarking write/read of HDFS.

I changed the chunksize, i.e. bytesPerChecksum or bpc, and create a 1G file
with 128MB block size. The bpc I used: 512B, 32KB, 64KB, 256KB, 512KB, 2MB,
8MB.

The result surprised me. Performance for 512B, 32KB and 64KB is quite
similar, but as the bpc grows beyond that, throughput decreases: comparing
512B to 8MB, there is a 40% to 50% difference in throughput.

Does anyone have an idea why?
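For context, the two knobs being varied in this benchmark could be set with an hdfs-site.xml fragment like the following. This is a sketch only, assuming the 0.20-era property names io.bytes.per.checksum and dfs.block.size; the names changed in later releases, so verify against your version's defaults file:

```xml
<configuration>
  <!-- bpc / bytesPerChecksum: one checksum per 64KB chunk -->
  <property>
    <name>io.bytes.per.checksum</name>
    <value>65536</value>
  </property>
  <!-- 128MB blocks, as used in the benchmark above -->
  <property>
    <name>dfs.block.size</name>
    <value>134217728</value>
  </property>
</configuration>
```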

Re: increase BytesPerChecksum decrease write performance??

Posted by elton sky <el...@gmail.com>.
Hi Hairong,

I am using 0.20.2. I set "dfs.write.packet.size" to 512B, 32KB, 64KB,
256KB, 512KB, 2MB and 8MB, keeping BytesPerChecksum at its default of 512B.
I got similar results as before.

I think the problem is the packet size, which is the size of the buffer for
each write/read on the pipeline.
Any ideas?

BTW: is "dfs.write.packet.size" in 0.20.2 the same as
"dfs.client-write-packet-size" in 0.21?
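The experiment Hairong suggests below could be configured with a fragment like this (a sketch, assuming the 0.20.x property name "dfs.write.packet.size" under discussion here; trunk/0.21 uses "dfs.client-write-packet-size"):

```xml
<!-- 8MB write packets, matching the largest bpc tested -->
<property>
  <name>dfs.write.packet.size</name>
  <value>8388608</value>
</property>
```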

On Tue, Oct 12, 2010 at 3:44 AM, Hairong Kuang <ku...@gmail.com> wrote:

>  This might be caused by the default write packet size. In HDFS, user data
> are pipelined to datanodes in packets. The default packet size is 64K. If the
> chunksize is bigger than 64K, the packet size automatically adjusts to
> include at least one chunk.
>
> Please set the packet size to be 8MB by configuring
> dfs.client-write-packet-size (in trunk) and rerun your experiments.
>
> Hairong
>
>
> On 10/8/10 9:42 PM, "elton sky" <el...@gmail.com> wrote:
>
> Hello,
>
> I was benchmarking write/read of HDFS.
>
> I changed the chunksize, i.e. bytesPerChecksum or bpc, and create a 1G file
> with 128MB block size. The bpc I used: 512B, 32KB, 64KB, 256KB, 512KB, 2MB,
> 8MB.
>
> The result surprised me. Performance for 512B, 32KB and 64KB is quite
> similar, but as the bpc grows beyond that, throughput decreases: comparing
> 512B to 8MB, there is a 40% to 50% difference in throughput.
>
> Does anyone have an idea why?
>
>

Re: increase BytesPerChecksum decrease write performance??

Posted by Hairong Kuang <ku...@gmail.com>.
This might be caused by the default write packet size. In HDFS, user data
are pipelined to datanodes in packets. The default packet size is 64K. If the
chunksize is bigger than 64K, the packet size automatically adjusts to
include at least one chunk.

Please set the packet size to be 8MB by configuring
dfs.client-write-packet-size (in trunk) and rerun your experiments.

Hairong
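The adjustment described above can be modeled with a short sketch. This is a simplification, not the actual DFSClient code: the hypothetical effective_packet_size below ignores the per-chunk checksum bytes and packet headers that the real client accounts for.

```python
def effective_packet_size(configured_packet: int, chunk_size: int) -> int:
    """Model of how the write packet size tracks the chunk size."""
    # The client ships at least one whole chunk per packet, so a chunk
    # (bytesPerChecksum) larger than the configured packet size forces
    # the packet to grow to hold one full chunk.
    chunks_per_packet = max(configured_packet // chunk_size, 1)
    return chunks_per_packet * chunk_size

# Default 64K packet with the default 512B chunk: 128 chunks per packet.
print(effective_packet_size(64 * 1024, 512))              # 65536
# An 8MB chunk exceeds the 64K default, so packets grow to 8MB.
print(effective_packet_size(64 * 1024, 8 * 1024 * 1024))  # 8388608
```

In this model, large bpc values silently inflate the packet size and change the pipelining granularity even when the configured packet size is left untouched, which is consistent with the throughput drop reported above.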

On 10/8/10 9:42 PM, "elton sky" <el...@gmail.com> wrote:

> Hello,
> 
> I was benchmarking write/read of HDFS. 
> 
> I changed the chunksize, i.e. bytesPerChecksum or bpc, and create a 1G file
> with 128MB block size. The bpc I used: 512B, 32KB, 64KB, 256KB, 512KB, 2MB,
> 8MB.
> 
> The result surprised me. Performance for 512B, 32KB and 64KB is quite
> similar, but as the bpc grows beyond that, throughput decreases: comparing
> 512B to 8MB, there is a 40% to 50% difference in throughput.
> 
> Does anyone have an idea why?
> 

