Posted to user@hadoop.apache.org by Wei-Chiu Chuang <we...@apache.org> on 2019/09/04 17:43:53 UTC

Re: Is shortcircuit-read (SCR) really fast?

Hi Daegyu,
let's move this discussion to the user group, so that anyone else can
comment on this. I obviously don't have the best answers to the questions,
but they are great questions.

Re: benchmarks for SCR:
I believe so. In fact, I found a benchmark of Accumulo and HBase running on
HDFS:
http://accumulosummit.com/2015/program/talks/hdfs-short-circuit-local-read-performance-benchmarking-with-apache-accumulo-and-apache-hbase/
However, HDFS SCR is a very old feature, and since it isn't new, there is
less interest in repeating the same benchmarks, so I don't expect to see
new ones.

As far as I know, I have not received reports of SCR performance
regressions from users running HBase or Impala (these two applications
typically have SCR enabled).
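
In case it helps with comparisons: client-side SCR is normally turned on with
two standard properties. Below is a minimal sketch of a client Configuration;
the domain socket path is only a common example value, not something from
your cluster.

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public class ScrClientConfig {
  public static void main(String[] args) throws IOException {
    Configuration conf = new Configuration();
    // Enable short-circuit reads on the client side. The socket path must
    // match dfs.domain.socket.path on the DataNodes; the value below is
    // only an example.
    conf.setBoolean("dfs.client.read.shortcircuit", true);
    conf.set("dfs.domain.socket.path", "/var/lib/hadoop-hdfs/dn_socket");

    FileSystem fs = FileSystem.get(conf);
    System.out.println("Connected to " + fs.getUri());
  }
}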

A colleague of mine, Nicolae (CC'ed here), is also doing a similar benchmark
with NVMe SSDs. I believe Nicolae will be interested in how you pushed HDFS
to the hardware limit. IIRC the theoretical limit of DataNode-to-client
throughput is about 500 MB/s per client.

On Fri, Aug 30, 2019 at 11:49 PM Daegyu Han <hd...@gmail.com> wrote:

> Sorry for the late reply.
>
> I used my own benchmark written against the HDFS API.
>
> The cluster environment I used is as follows:
>
> Hadoop 2.9.2
>
> Samsung NVMe SSD
>
> HDFS block size configured to 1 GB.
>
> I used only one DataNode, to prevent remote reads.
>
>
> First, I uploaded a 1 GB file to HDFS.
>
> Then, I ran my benchmark code.
>
> I added some logging to my HDFS code so I can see each method's runtime.
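>
> Roughly, the benchmark does something like the sketch below (a simplified
> illustration, not my exact code; the 8 MB read size and the file path
> argument are just placeholders):
>
> import java.io.IOException;
>
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.fs.FSDataInputStream;
> import org.apache.hadoop.fs.FileSystem;
> import org.apache.hadoop.fs.Path;
>
> public class ScrReadBench {
>   public static void main(String[] args) throws IOException {
>     FileSystem fs = FileSystem.get(new Configuration());
>     byte[] buf = new byte[8 * 1024 * 1024]; // 8 MB per read() call
>
>     // Time every read() against the 1 GB test file and log it, similar to
>     // the per-method timing described above.
>     try (FSDataInputStream in = fs.open(new Path(args[0]))) {
>       int n;
>       do {
>         long t0 = System.nanoTime();
>         n = in.read(buf, 0, buf.length);
>         long t1 = System.nanoTime();
>         if (n > 0) {
>           System.out.printf("read %d bytes in %.3f ms%n", n, (t1 - t0) / 1e6);
>         }
>       } while (n > 0);
>     }
>   }
> }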
>
>
> Anyway, has there been any performance evaluation by companies using HDFS,
> comparing SCR and legacy read?
>
>
> As far as I know, a legacy read goes through the DataNode, so it incurs many
> sendfile system calls and TCP socket setup overhead.
>
>
> Intuitively, I think SCR, where the client reads the file directly, should
> be faster than legacy read.
>
> However, the first step, requesting the file descriptors, is synchronous
> and can be an overhead when using a fast NVMe SSD.
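>
> (As a sanity check, one can also look at the client read statistics to see
> which path a run actually took; a minimal sketch, assuming the stream
> opened on HDFS can be cast to HdfsDataInputStream:)
>
> import java.io.IOException;
>
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.fs.FSDataInputStream;
> import org.apache.hadoop.fs.FileSystem;
> import org.apache.hadoop.fs.Path;
> import org.apache.hadoop.hdfs.client.HdfsDataInputStream;
>
> public class ScrCheck {
>   public static void main(String[] args) throws IOException {
>     FileSystem fs = FileSystem.get(new Configuration());
>     byte[] buf = new byte[8 * 1024 * 1024];
>     try (FSDataInputStream in = fs.open(new Path(args[0]))) {
>       while (in.read(buf, 0, buf.length) > 0) {
>         // drain the whole file so the statistics cover every read
>       }
>       HdfsDataInputStream hin = (HdfsDataInputStream) in;
>       System.out.println("total bytes read:         "
>           + hin.getReadStatistics().getTotalBytesRead());
>       System.out.println("short-circuit bytes read: "
>           + hin.getReadStatistics().getTotalShortCircuitBytesRead());
>     }
>   }
> }
>
> If the short-circuit counter stays at zero, the run has silently fallen
> back to the legacy DataNode path.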
>
>
> What do you think?
>
>
> Thank you
>
>
>
> On Fri, Aug 30, 2019 at 22:27, Wei-Chiu Chuang <we...@cloudera.com> wrote:
>
>> Interesting benchmark. Thank you, Daegyu.
>> Can you try a larger file too, like 128 MB or 1 GB? HDFS is not optimized
>> for smaller files.
>>
>> What did you use for the benchmark?
>>
>> On Thu, Aug 29, 2019 at 11:40 PM, Daegyu Han <hd...@gmail.com> wrote:
>>
>>> Hi all,
>>>
>>> Is short-circuit read faster than legacy read, which goes through the
>>> DataNode?
>>>
>>> I have evaluated SCR and legacy local read on both HDD and NVMe SSD.
>>> However, I have not seen any results showing that SCR is faster than legacy.
>>>
>>> Rather, SCR was slower than legacy when using NVMe SSDs, because of the
>>> initial operation to get the file descriptors.
>>>
>>> When I configured SCR, the getBlockReader() elapsed time was higher than
>>> with legacy local read.
>>>
>>> When I used the NVMe SSD,
>>> I also found that the DFSInputStream dataIn.read() time is really close to
>>> the hardware limit:
>>> 8 MB / 0.00289 s ≈ 2800 MB/s
>>>
>>> I checked in the logs that the execution time measured by the application
>>> was 5 ms to process 8 MB.
>>>
>>> There is a 3 ms runtime difference between blockReader.doRead() in
>>> DFSInputStream.java and dataIn.read() in BlockReader.java.
>>> Where does this 3 ms difference come from?
>>>
>>> Thank you
>>>
>>