Posted to user@spark.apache.org by Andrew Ash <an...@andrewash.com> on 2014/02/06 07:29:35 UTC

[0.9.0] MEMORY_AND_DISK_SER not falling back to disk

// version 0.9.0

Hi Spark users,

My understanding of the MEMORY_AND_DISK_SER persistence level was that if
an RDD could fit into memory then it would be left there (same as
MEMORY_ONLY), and only if it was too big for memory would it spill to disk.
 Here's how the docs describe it:

MEMORY_AND_DISK_SER: Similar to MEMORY_ONLY_SER, but spill partitions that
don't fit in memory to disk instead of recomputing them on the fly each
time they're needed.
https://spark.incubator.apache.org/docs/latest/scala-programming-guide.html
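
For concreteness, here's a minimal sketch of asking for that level through the
Scala API (the app name and data below are made up):

    import org.apache.spark.SparkContext
    import org.apache.spark.storage.StorageLevel

    // Illustrative only: cache serialized partitions, spilling the ones that
    // don't fit in memory to disk instead of recomputing them.
    val sc = new SparkContext("local[2]", "persist-example")
    val rdd = sc.parallelize(1 to 1000000).map(i => (i, i.toString))
    rdd.persist(StorageLevel.MEMORY_AND_DISK_SER)
    println(rdd.count())  // first action materializes and caches the RDD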



What I'm observing though is that really large RDDs are actually causing
OOMs.  I'm not sure if this is a regression in 0.9.0 or if it has been this
way for some time.

While I look through the source code, has anyone actually observed the
correct spill to disk behavior rather than an OOM?

Thanks!
Andrew

Re: [0.9.0] MEMORY_AND_DISK_SER not falling back to disk

Posted by Andrew Ash <an...@andrewash.com>.
My understanding of off-heap storage was that you'd still need to deserialize
that data back into on-heap JVM objects in order to actually use it with map,
filter, etc.  Would we be trading CPU time for memory efficiency if we went
down the off-heap storage route?  I'm not sure what discussions have already
happened here or what kind of implementation we're talking about.
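
To make that trade-off concrete, here's a toy sketch in plain JVM code (not
Spark internals; the record and the use of Java serialization are arbitrary):
bytes parked off-heap have to be copied back and deserialized into on-heap
objects before map/filter can touch them.

    import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}
    import java.nio.ByteBuffer

    // Toy illustration only: serialize a record and park the bytes off-heap.
    val record: Seq[Int] = Seq(1, 2, 3)
    val bos = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bos)
    out.writeObject(record)
    out.close()
    val bytes = bos.toByteArray
    val offHeap = ByteBuffer.allocateDirect(bytes.length)
    offHeap.put(bytes)
    offHeap.flip()

    // To run map/filter over it, the bytes must come back on-heap and be
    // deserialized into JVM objects -- CPU paid on every access.
    val onHeapCopy = new Array[Byte](offHeap.remaining())
    offHeap.get(onHeapCopy)
    val restored = new ObjectInputStream(new ByteArrayInputStream(onHeapCopy))
      .readObject().asInstanceOf[Seq[Int]]
    restored.map(_ * 2)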




On Mon, Feb 10, 2014 at 2:28 AM, Rafal Kwasny <ma...@entropy.be> wrote:

> Hi Everyone,
> Maybe it's a good time to reevaluate off-heap storage for RDD's with
> custom allocator?
>
> On a few occasions recently I had to lower both
> spark.storage.memoryFraction and spark.shuffle.memoryFraction
> spark.shuffle.spill helps a bit with large scale reduces
>
> Also it could be you're hitting:
> https://github.com/apache/incubator-spark/pull/180
>
> /Rafal

Re: [0.9.0] MEMORY_AND_DISK_SER not falling back to disk

Posted by Rafal Kwasny <ma...@entropy.be>.
Hi Everyone,
Maybe it's a good time to reevaluate off-heap storage for RDD's with
custom allocator?

On a few occasions recently I had to lower both
spark.storage.memoryFraction and spark.shuffle.memoryFraction
spark.shuffle.spill helps a bit with large scale reduces
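
For reference, a minimal sketch of turning those knobs down through SparkConf
(the values are only examples, not recommendations):

    import org.apache.spark.{SparkConf, SparkContext}

    // Example values only -- tune against your own heap size and workload.
    val conf = new SparkConf()
      .setMaster("local[2]")
      .setAppName("lower-memory-fractions")
      .set("spark.storage.memoryFraction", "0.4")  // smaller share for the block cache
      .set("spark.shuffle.memoryFraction", "0.2")  // smaller share for in-memory shuffle buffers
      .set("spark.shuffle.spill", "true")          // let large reduces spill to disk
    val sc = new SparkContext(conf)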

Also it could be you're hitting:
https://github.com/apache/incubator-spark/pull/180

/Rafal


Andrew Ash wrote:
> I dropped down to 0.5 but still OOM'd, so sent it all the way to 0.1
> and didn't get an OOM.  I could tune this some more to find where the
> cliff is, but this is a one-off job so now that it's completed I don't
> want to spend any more time tuning it.
>
> Is there a reason that this value couldn't be dynamically adjusted in
> response to actual heap usage?
>
> I can imagine a scenario where spending too much time in GC
> (descending into GC hell) drops the value a little to keep from OOM,
> or directly measuring how much of the heap is spent on this scratch
> space and adjusting appropriately.


Re: [0.9.0] MEMORY_AND_DISK_SER not falling back to disk

Posted by Andrew Ash <an...@andrewash.com>.
I dropped down to 0.5 but still OOM'd, so I sent it all the way down to 0.1 and
didn't get an OOM.  I could tune this some more to find where the cliff is,
but this is a one-off job, so now that it's completed I don't want to spend
any more time tuning it.

Is there a reason that this value (spark.storage.memoryFraction) couldn't be
dynamically adjusted in response to actual heap usage?

I can imagine a scheme where spending too much time in GC (descending into GC
hell) drops the value a little to stave off an OOM, or where Spark directly
measures how much of the heap is being used for this scratch space and adjusts
the fraction accordingly.
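
A purely hypothetical sketch of the kind of signal such a scheme could key off
(the threshold and step size below are invented for illustration):

    import java.lang.management.ManagementFactory

    // Hypothetical policy sketch only; the numbers are made up.
    val heap = ManagementFactory.getMemoryMXBean.getHeapMemoryUsage
    val usedFraction = heap.getUsed.toDouble / heap.getMax
    var storageFraction = 0.66
    if (usedFraction > 0.9) {
      // Heap is nearly full: shed some of the cache's share before we OOM.
      storageFraction = math.max(0.1, storageFraction - 0.05)
    }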


On Sat, Feb 8, 2014 at 3:40 PM, Matei Zaharia <ma...@gmail.com> wrote:

> This probably means that there’s not enough free memory for the “scratch”
> space used for computations, so we OOM before the Spark cache decides that
> it’s full and starts to spill stuff. Try reducing
> spark.storage.memoryFraction (default is 0.66, try 0.5).
>
> Matei

Re: [0.9.0] MEMORY_AND_DISK_SER not falling back to disk

Posted by Matei Zaharia <ma...@gmail.com>.
This probably means that there’s not enough free memory for the “scratch” space used for computations, so we OOM before the Spark cache decides that it’s full and starts to spill stuff. Try reducing spark.storage.memoryFraction (default is 0.66, try 0.5).
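
As a rough back-of-the-envelope illustration (the heap size here is made up),
that fraction carves up an executor's heap roughly like this:

    // Illustrative arithmetic only.
    val executorHeapGB  = 10.0
    val storageFraction = 0.66                        // default mentioned above
    val cacheGB   = executorHeapGB * storageFraction  // ~6.6 GB the cache may fill
    val scratchGB = executorHeapGB - cacheGB          // ~3.4 GB left as task scratch space
    println(f"cache: $cacheGB%.1f GB, scratch: $scratchGB%.1f GB")
    // Dropping the fraction to 0.5 leaves ~5.0 GB of scratch space instead.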

Matei

On Feb 5, 2014, at 10:29 PM, Andrew Ash <an...@andrewash.com> wrote:

> // version 0.9.0
> 
> Hi Spark users,
> 
> My understanding of the MEMORY_AND_DISK_SER persistence level was that if an RDD could fit into memory then it would be left there (same as MEMORY_ONLY), and only if it was too big for memory would it spill to disk.  Here's how the docs describe it:
> 
> MEMORY_AND_DISK_SER: Similar to MEMORY_ONLY_SER, but spill partitions that don't fit in memory to disk instead of recomputing them on the fly each time they're needed.
> https://spark.incubator.apache.org/docs/latest/scala-programming-guide.html
> 
> 
> 
> What I'm observing though is that really large RDDs are actually causing OOMs.  I'm not sure if this is a regression in 0.9.0 or if it has been this way for some time.
> 
> While I look through the source code, has anyone actually observed the correct spill to disk behavior rather than an OOM?
> 
> Thanks!
> Andrew