Posted to common-user@hadoop.apache.org by jlei liu <li...@gmail.com> on 2012/09/15 12:12:21 UTC

HDFS dfs.client.read.shortcircuit.skip.checksum

I am using hadoop-0.20.2-cdh3u5 and have configured
dfs.client.read.shortcircuit=true.
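
For reference, the client-side settings involved look roughly like this in
hdfs-site.xml (a minimal sketch; on 1.x-style short-circuit implementations the
datanode may also need dfs.block.local-path-access.user, and the "hbase" value
below is only a placeholder for whichever user actually does the reads):

<property>
  <name>dfs.client.read.shortcircuit</name>
  <value>true</value>
</property>
<property>
  <name>dfs.client.read.shortcircuit.skip.checksum</name>
  <!-- set to true to skip verifying the .meta checksums -->
  <value>false</value>
</property>
<!-- datanode side, 1.x-style short-circuit: users allowed to read block files directly -->
<property>
  <name>dfs.block.local-path-access.user</name>
  <value>hbase</value>
</property>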


I use 10 threads to pread a local file; the file is 700 MB and the OS has it
cached.
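
Roughly, the test looks like the sketch below (a sketch only; the path, run
duration, and thread handling are placeholders, not the real benchmark code):

import java.util.Random;
import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicLong;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.*;

public class PreadBench {
  public static void main(String[] args) throws Exception {
    final Configuration conf = new Configuration();
    conf.setBoolean("dfs.client.read.shortcircuit", true);
    // Toggle this to compare the two runs described above.
    conf.setBoolean("dfs.client.read.shortcircuit.skip.checksum", false);

    final Path file = new Path("/bench/700mb-file");      // placeholder path
    final FileSystem fs = FileSystem.get(conf);
    final long fileLen = fs.getFileStatus(file).getLen();
    final AtomicLong reads = new AtomicLong();

    ExecutorService pool = Executors.newFixedThreadPool(10);
    for (int t = 0; t < 10; t++) {
      pool.submit(new Runnable() {
        public void run() {
          try {
            FSDataInputStream in = fs.open(file);
            byte[] buf = new byte[64 * 1024];              // 64 KB per pread
            Random rnd = new Random();
            while (!Thread.currentThread().isInterrupted()) {
              long pos = (long) (rnd.nextDouble() * (fileLen - buf.length));
              in.read(pos, buf, 0, buf.length);            // positional read (pread)
              reads.incrementAndGet();
            }
          } catch (Exception e) {
            e.printStackTrace();
          }
        }
      });
    }
    Thread.sleep(60 * 1000);                               // run for one minute
    pool.shutdownNow();
    System.out.println("TPS ~= " + reads.get() / 60);
    System.exit(0);
  }
}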

When I configure dfs.client.read.shortcircuit.skip.checksum=false, the TPS
is about 2000.

When I configure dfs.client.read.shortcircuit.skip.checksum=true, the TPS
is about 17000.

Why does skipping the checksum (meta) file improve performance by about 8
times? If the reason is the extra seek needed to read the checksum file,
should we consider storing the checksums in the block file itself?


Thanks,

LiuLei

Re: HDFS dfs.client.read.shortcircuit.skip.checksum

Posted by Todd Lipcon <to...@cloudera.com>.
Hi LiuLei,

Since you're using CDH3 (a 1.x derived distribution) you are using the old
checksum implementations written in Java.

In Hadoop 2.0 (or CDH4), we have JNI-based checksumming which uses
Nehalem's hardware CRC support. This is several times faster.
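
As a rough illustration of the gap (this is not Hadoop's code path, it needs a
JDK new enough to have java.util.zip.CRC32C, and exact numbers depend on JIT
warm-up and the CPU), you can compare a plain table-driven CRC32C in Java
against the JDK's CRC32C, which the JIT compiles down to the hardware CRC
instruction where available:

import java.util.zip.CRC32C;

public class CrcCompare {
  // Reflected CRC32C (Castagnoli polynomial, reflected form 0x82F63B78),
  // computed one byte at a time in pure Java.
  static int softwareCrc32c(byte[] data) {
    int[] table = new int[256];
    for (int n = 0; n < 256; n++) {
      int c = n;
      for (int k = 0; k < 8; k++) {
        c = (c & 1) != 0 ? (c >>> 1) ^ 0x82F63B78 : c >>> 1;
      }
      table[n] = c;
    }
    int crc = 0xFFFFFFFF;
    for (byte b : data) {
      crc = (crc >>> 8) ^ table[(crc ^ b) & 0xFF];
    }
    return ~crc;
  }

  public static void main(String[] args) {
    byte[] buf = new byte[64 * 1024 * 1024];     // 64 MB is enough to time

    long t0 = System.nanoTime();
    int soft = softwareCrc32c(buf);
    long t1 = System.nanoTime();

    CRC32C hw = new CRC32C();                    // intrinsified on CPUs with a CRC instruction
    hw.update(buf, 0, buf.length);
    long t2 = System.nanoTime();

    // Both should print the same checksum value.
    System.out.printf("pure Java:  0x%08x in %d ms%n", soft, (t1 - t0) / 1_000_000);
    System.out.printf("JDK CRC32C: 0x%08x in %d ms%n", (int) hw.getValue(), (t2 - t1) / 1_000_000);
  }
}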

My guess is that this accounts for the substantial difference. You could
try re-running your test on a newer version to confirm.

-Todd

On Sat, Sep 15, 2012 at 7:13 AM, jlei liu <li...@gmail.com> wrote:

> I read 64 KB of data from the file each time.


-- 
Todd Lipcon
Software Engineer, Cloudera

Re: HDFS dfs.client.read.shortcircuit.skip.checksum

Posted by jlei liu <li...@gmail.com>.
I read 64 KB of data from the file each time.
