Posted to user@spark.apache.org by Gary Malouf <ma...@gmail.com> on 2014/09/17 14:15:20 UTC

Short Circuit Local Reads

Cloudera had a blog post about this in August 2013:
http://blog.cloudera.com/blog/2013/08/how-improved-short-circuit-local-reads-bring-better-performance-and-security-to-hadoop/

Has anyone been using this in production? I'm curious as to whether it
made a significant difference from a Spark perspective.
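
For reference, the feature described in that post is enabled with HDFS
settings along these lines (a minimal sketch; the socket path below is
illustrative, not a required value):

    <!-- hdfs-site.xml, on both DataNodes and client nodes -->
    <property>
      <name>dfs.client.read.shortcircuit</name>
      <value>true</value>
    </property>
    <property>
      <!-- UNIX domain socket shared by the DataNode and the client;
           this path is an assumption -->
      <name>dfs.domain.socket.path</name>
      <value>/var/run/hadoop-hdfs/dn._PORT</value>
    </property>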

Re: Short Circuit Local Reads

Posted by Colin McCabe <cm...@alumni.cmu.edu>.
On Tue, Sep 30, 2014 at 6:28 PM, Andrew Ash <an...@andrewash.com> wrote:
> Thanks for the research Kay!
>
> It does seem addressed, and hopefully fixed, in that ticket conversation
> and in https://issues.apache.org/jira/browse/HDFS-4697.  So the best thing
> here is to wait to upgrade to a version of Hadoop that has that fix and
> then repeat the test, rather than repeating it right now.  That will be
> quite a while for me (at least early 2015), but I'd be interested in
> hearing from people who are already on CDH5+ who attempt to replicate the
> above experiment.

If you want to test the remote read path on cdh4 without readahead,
you can set both dfs.datanode.readahead.bytes and
dfs.client.cache.readahead to 0.  This might help give a fairer
comparison with short-circuit.
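
In hdfs-site.xml terms, that looks something like this (a sketch; the
datanode setting goes on the DataNodes, the client setting on the nodes
running Spark):

    <property>
      <name>dfs.datanode.readahead.bytes</name>
      <value>0</value>
    </property>
    <property>
      <name>dfs.client.cache.readahead</name>
      <value>0</value>
    </property>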

SCR also maintains a cache of recently used file descriptors, whose
size is specified by dfs.client.read.shortcircuit.streams.cache.size.
You could try increasing this number and seeing if it helps at all.  In
CDH4, it was set at a relatively low 100, and the cache expiry time
(specified by dfs.client.read.shortcircuit.streams.cache.expiry.ms)
was also set at a relatively low 5000 ms (5 seconds), so you could try
playing with those knobs.  When this cache gets a hit, we completely
avoid the overhead of passing a file descriptor, calling JNI routines,
and opening the file descriptor on the DN side.
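
For example, from the Spark shell those client-side knobs could be
bumped like this (a sketch; the values are illustrative, not
recommendations):

    // set these before creating any RDDs so the HDFS client picks them up
    sc.hadoopConfiguration.setInt(
      "dfs.client.read.shortcircuit.streams.cache.size", 1000)        // CDH4 default: 100
    sc.hadoopConfiguration.setLong(
      "dfs.client.read.shortcircuit.streams.cache.expiry.ms", 60000L) // CDH4 default: 5000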

It would also be interesting to see CPU consumption numbers.  In
general, one of the benefits of SCR is reduced CPU consumption, which
may or may not be a benefit depending on what your job is bottlenecked
on.  We also find that workloads involving a lot of seeks benefit
greatly... the original rationale for SCR was HBase.

I would also advise dropping the OS caches between experiments using
"echo 3 > /proc/sys/vm/drop_caches".  You will need to shut everything
down first, because pages that are in use or dirty are not purged.  In
general, 17 GB is not a lot of data on a modern machine, and I would
expect things like VM startup time and what's in the page cache at the
beginning to make non-trivial contributions to the numbers unless you
are careful.
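
Concretely, something like this between runs (run as root, after the
Spark and HDFS daemons have been stopped):

    sync                               # flush dirty pages so they can be reclaimed
    echo 3 > /proc/sys/vm/drop_caches  # drop page cache, dentries, and inodes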

> On Tue, Sep 30, 2014 at 2:26 PM, Kay Ousterhout <ka...@gmail.com>
> wrote:
>>
>> Hi Andrew and Gary,
>>
>> I've done some experimentation with this and had similar results.  I can't
>> explain the speedup in write performance, but I dug into the read slowdown
>> and found that enabling short-circuit reads results in Hadoop not doing
>> read-ahead in the same way.  At a high level, when SCR is off, HDFS does
>> read-ahead on input data, so much of the time spent reading input data is
>> pipelined with computation.  There were some bugs with SCR where, when SCR
>> was turned on, reading no longer got pipelined, slowing down performance.
>> In particular, I believe that non-short-circuited reads use fadvise to tell
>> the OS to read the file in the background, which is not done with short-
>> circuit reads.

It's not fadvise, but it is the "readahead" system call on Linux.
Since it is a blocking system call, we need worker threads to do this
in the background.

We don't use the "readahead" system call for short-circuit reads on
cdh5.  Part of the reason that hasn't been implemented yet is that one
of the main advantages of short-circuit is reduced CPU consumption,
and we felt spawning more threads might cut into that.  We could
implement it pretty easily if people wanted it, but the biggest users
of SCR (HBase, Impala) have not requested it, so we haven't.

best,
Colin





Re: Short Circuit Local Reads

Posted by Andrew Ash <an...@andrewash.com>.
Thanks for the research Kay!

It does seem addressed, and hopefully fixed, in that ticket conversation
and in https://issues.apache.org/jira/browse/HDFS-4697.  So the best thing
here is to wait to upgrade to a version of Hadoop that has that fix and
then repeat the test, rather than repeating it right now.  That will be
quite a while for me (at least early 2015), but I'd be interested in
hearing from people who are already on CDH5+ who attempt to replicate the
above experiment.

Cheers,
Andrew


Re: Short Circuit Local Reads

Posted by Kay Ousterhout <ka...@gmail.com>.
Hi Andrew and Gary,

I've done some experimentation with this and had similar results.  I can't
explain the speedup in write performance, but I dug into the read slowdown
and found that enabling short-circuit reads results in Hadoop not doing
read-ahead in the same way.  At a high level, when SCR is off, HDFS does
read-ahead on input data, so much of the time spent reading input data is
pipelined with computation.  There were some bugs with SCR where, when SCR
was turned on, reading no longer got pipelined, slowing down performance.
In particular, I believe that non-short-circuited reads use fadvise to tell
the OS to read the file in the background, which is not done with short-
circuit reads.  This problem is partially described in
https://issues.apache.org/jira/browse/HDFS-5634, a seemingly unrelated JIRA
that mentions this way down in some of the comments.  This was supposedly
fixed in newer versions of Hadoop, but I haven't verified it.

-Kay



Re: Short Circuit Local Reads

Posted by Andrew Ash <an...@andrewash.com>.
Hi Gary,

I gave this a shot on a CDH 4.7 test cluster and actually saw a
performance regression when running the numbers.  Have you done any
benchmarking?  Below are my numbers:



Experimental method:
1. Write 14GB of data to HDFS via [1]
2. Read data multiple times via [2]


*Experiment 1: run on virtual machines*


With short-circuit read *disabled*:
14/09/24 15:10:49 INFO spark.SparkContext: Job finished: saveAsTextFile at
<console>:13, took 344.931469949 s
14/09/24 15:11:30 INFO spark.SparkContext: Job finished: count at
<console>:13, took 18.601568871 s
14/09/24 15:11:54 INFO spark.SparkContext: Job finished: count at
<console>:13, took 16.531909024 s
14/09/24 15:12:18 INFO spark.SparkContext: Job finished: count at
<console>:13, took 17.639692651 s
14/09/24 15:12:38 INFO spark.SparkContext: Job finished: count at
<console>:13, took 16.773438345 s

With short-circuit read *enabled*:
14/09/24 14:28:38 INFO spark.SparkContext: Job finished: saveAsTextFile at
<console>:13, took 299.511103592 s
14/09/24 14:29:17 INFO spark.SparkContext: Job finished: count at
<console>:13, took 22.459146194 s
14/09/24 14:29:44 INFO spark.SparkContext: Job finished: count at
<console>:13, took 19.806642815 s
14/09/24 14:30:11 INFO spark.SparkContext: Job finished: count at
<console>:13, took 20.284644308 s
14/09/24 14:30:40 INFO spark.SparkContext: Job finished: count at
<console>:13, took 21.720455219 s


My summary here is that enabling short-circuit read caused the write to go
faster (what?) and caused a slight decrease in read performance, from
~17sec to ~20sec.

The VMs were backed by FusionIO drives, but I thought maybe there was
something funky with the VMs, so I switched to bare hardware in a second
experiment.


*Experiment 2: run on bare hardware*

With short-circuit read *disabled*:
14/09/24 15:59:11 INFO spark.SparkContext: Job finished: saveAsTextFile at
<console>:13, took 1605.965203162 s
14/09/24 15:59:39 INFO spark.SparkContext: Job finished: count at
<console>:13, took 11.984355461 s
14/09/24 16:00:00 INFO spark.SparkContext: Job finished: count at
<console>:13, took 11.134712764 s
14/09/24 16:00:11 INFO spark.SparkContext: Job finished: count at
<console>:13, took 8.694292372 s
14/09/24 16:00:24 INFO spark.SparkContext: Job finished: count at
<console>:13, took 9.83986823 s

With short-circuit read *enabled*:
14/09/24 16:23:14 INFO spark.SparkContext: Job finished: saveAsTextFile at
<console>:13, took 1113.897715871 s
14/09/24 16:25:19 INFO spark.SparkContext: Job finished: count at
<console>:13, took 14.249690605 s
14/09/24 16:25:47 INFO spark.SparkContext: Job finished: count at
<console>:13, took 12.67330165 s
14/09/24 16:26:04 INFO spark.SparkContext: Job finished: count at
<console>:13, took 10.673825924 s
14/09/24 16:26:19 INFO spark.SparkContext: Job finished: count at
<console>:13, took 9.722516379 s


This is separate hardware so the numbers are very different (it's not just
bypassing the VM overhead).

Again, the writes are much faster (1605s -> 1113s), but the reads are
comparable if not slightly slower (~10.4s -> ~11.8s).




To make sure that short-circuit reads were actually working, I looked at the
datanode logs and saw the following line.  I think this confirms that a)
the read was local (127.0.0.1 -> 127.0.0.1) from Spark, and b) short-circuit
read was successfully used ("success: true").

hadoop-datanode-mybox.local.log:2014-09-24 16:26:52,800 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src:
127.0.0.1, dest: 127.0.0.1, op: REQUEST_SHORT_CIRCUIT_FDS, blockid:
-312380305519226759, srvID: DS-96112752-10.201.12.105-50010-1411586696381,
success: true
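
A quick way to pull those entries out (the log location below is an
assumption and varies by install):

    # show recent short-circuit file-descriptor requests in the datanode log
    grep REQUEST_SHORT_CIRCUIT_FDS /var/log/hadoop-hdfs/hadoop-datanode-*.log | tail -n 5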


Has anyone actually deployed this feature and benchmarked gains?  I was
hoping to throw this switch on my clusters and get a 30% perf boost but in
practice that has not materialized.


Cheers!
Andrew



[1] sc.parallelize(1 to (14*1024*1024)).map(k =>
Seq(k, org.apache.commons.lang.RandomStringUtils.random(1024,
"abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWxyZ0123456789")).mkString("|")).saveAsTextFile("hdfs:///tmp/output")
[2] sc.textFile("hdfs:///tmp/output").count
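
For anyone replicating this, a small spark-shell sketch for timing the
repeated reads (the run count is arbitrary):

    // time several cold or warm reads of the generated data set
    // (drop the OS caches between runs for cold-cache numbers)
    val times = (1 to 5).map { _ =>
      val t0 = System.nanoTime
      sc.textFile("hdfs:///tmp/output").count()
      (System.nanoTime - t0) / 1e9  // elapsed seconds
    }
    println("runs (s): " + times.map(t => f"$t%.2f").mkString(", "))
    println(f"mean: ${times.sum / times.length}%.2f s")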


Re: Short Circuit Local Reads

Posted by Matei Zaharia <ma...@gmail.com>.
I'm pretty sure it does help, though I don't have any numbers for it. In any case, Spark will automatically benefit from this if you link it to a version of HDFS that contains this feature.

Matei
