Posted to user@spark.apache.org by Fabrizio Milo aka misto <mi...@gmail.com> on 2014/03/13 02:22:09 UTC

Local Standalone Application and shuffle spills

Hello everyone

I have a question about shuffle spills. The AMPLab introduction to Spark
internals mentions that each task's output could be saved to disk for
'redundancy'.

If I set spark.shuffle.spill to false, would this behavior be eliminated
so that Spark never spills to disk?

Thank you

-- 
LinkedIn: http://linkedin.com/in/fmilo
Twitter: @fabmilo
Github: http://github.com/Mistobaan/
-----------------------
Simplicity, consistency, and repetition - that's how you get through.
(Jack Welch)
Perfection must be reached by degrees; she requires the slow hand of
time (Voltaire)
The best way to predict the future is to invent it (Alan Kay)

Re: Local Standalone Application and shuffle spills

Posted by Aaron Davidson <il...@gmail.com>.
The AMPLab Spark internals talk you mentioned is actually referring to the
RDD persistence levels, where by default we do not persist RDDs to disk (
https://spark.apache.org/docs/0.9.0/scala-programming-guide.html#rdd-persistence
).
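To make the distinction concrete, persistence is something you opt into per RDD. A minimal Scala sketch, assuming a local SparkContext (the app name and data are illustrative):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

val sc = new SparkContext(
  new SparkConf().setMaster("local").setAppName("persistence-demo"))

val rdd = sc.parallelize(1 to 1000)

// By default nothing is persisted; choose a level explicitly.
rdd.persist(StorageLevel.MEMORY_ONLY)        // keep partitions in memory only
// rdd.persist(StorageLevel.MEMORY_AND_DISK) // spill partitions that don't fit to disk
```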

"spark.shuffle.spill" refers to a different behavior -- if the "reduce"
phase of your shuffle would otherwise cause Spark to OOM, it will instead
write data to temporary files on disk. You probably don't want to disable
this unless you'd prefer to tune Spark to make sure the reduce can stay in
memory.
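If you do go down the tuning route, both knobs are set through the normal Spark configuration. A sketch (the 0.5 value is illustrative; spark.shuffle.memoryFraction controls how much of the heap the reduce-side aggregation may use, defaulting to around 0.3 in 0.9 -- check the docs for your version):

```scala
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.shuffle.spill", "false")        // aggregation must now fit in memory or OOM
  .set("spark.shuffle.memoryFraction", "0.5") // give the aggregation more headroom
```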

Note that if your goal is to force Spark never to use disk, it is
complicated by the fact that shuffles always write data to disk in an
analogous way to the shuffle between the map and reduce phases of
MapReduce. You would have to use a ramdisk for Spark's local directory.
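A ramdisk setup along those lines might look like the following; the mount point and size are illustrative, and mounting requires root:

```shell
# Back Spark's scratch space with tmpfs so "disk" writes stay in RAM.
sudo mkdir -p /mnt/spark-ramdisk
sudo mount -t tmpfs -o size=4g tmpfs /mnt/spark-ramdisk

# Then point Spark at it, e.g. in conf/spark-env.sh or via SparkConf:
# spark.local.dir=/mnt/spark-ramdisk
```

Note that anything written there is lost on reboot, and shuffle data now competes with the JVM heap for physical memory.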


On Wed, Mar 12, 2014 at 6:22 PM, Fabrizio Milo aka misto <
mistobaan@gmail.com> wrote:

> Hello everyone
>
> I have a question about Shuffle Spills. From the introduction to
> amplab spark internals
> each task output could be saved to disk for 'redundancy'
>
> if I set spark.shuffle.spill to false would this behavior be
> eliminated and make it in a way that it will never spill to disk ?
>
> Thank you
>
> --
> LinkedIn: http://linkedin.com/in/fmilo
> Twitter: @fabmilo
> Github: http://github.com/Mistobaan/
> -----------------------
> Simplicity, consistency, and repetition - that's how you get through.
> (Jack Welch)
> Perfection must be reached by degrees; she requires the slow hand of
> time (Voltaire)
> The best way to predict the future is to invent it (Alan Kay)
>