Posted to dev@spark.apache.org by Alexander Pivovarov <ap...@gmail.com> on 2016/05/12 21:16:30 UTC

Spark uses disk instead of memory to store RDD blocks

Hello Everyone

I use Spark 1.6.0 on YARN (EMR-4.3.0).

I use the MEMORY_AND_DISK_SER StorageLevel for my RDD, and I use the Kryo serializer.
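
A minimal sketch of that setup (the app name and input path below are
placeholders, not from my actual job):

  import org.apache.spark.{SparkConf, SparkContext}
  import org.apache.spark.storage.StorageLevel

  // Kryo is enabled through the serializer setting on the SparkConf.
  val conf = new SparkConf()
    .setAppName("rdd-caching-example")  // placeholder name
    .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  val sc = new SparkContext(conf)

  // Serialized blocks are kept in memory first, and are supposed to fall
  // back to disk only when they cannot fit.
  val rdd = sc.textFile("hdfs:///input/path")  // placeholder path
    .persist(StorageLevel.MEMORY_AND_DISK_SER)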

I noticed that Spark uses disk to store some RDD blocks even when the
executors have lots of memory available. See the screenshot:
http://postimg.org/image/gxpsw1fk1/

Any ideas why this might happen?

Thank you
Alex

Re: Spark uses disk instead of memory to store RDD blocks

Posted by Takeshi Yamamuro <li...@gmail.com>.
If you triggered a shuffle that consumed a large amount of execution memory,
it may have evicted the cached RDD blocks because memory for the shuffle ran
short.
Please see:
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/memory/UnifiedMemoryManager.scala#L32
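
If that is what's happening, one knob to experiment with is
spark.memory.storageFraction, the fraction of unified memory that execution
cannot evict. A sketch only (in 1.6, spark.memory.fraction defaults to 0.75
and spark.memory.storageFraction to 0.5; the 0.6 below is illustrative, not
a recommendation):

  import org.apache.spark.SparkConf

  // Enlarge the region of unified memory that is protected from eviction
  // by execution, so more cached RDD blocks survive a heavy shuffle.
  val conf = new SparkConf()
    .set("spark.memory.fraction", "0.75")        // 1.6 default
    .set("spark.memory.storageFraction", "0.6")  // default is 0.5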

// maropu


-- 
---
Takeshi Yamamuro

Re: Spark uses disk instead of memory to store RDD blocks

Posted by Alexander Pivovarov <ap...@gmail.com>.
Each executor in the screenshot has 25 GB of memory remaining. Why would
Spark store 170-500 MB on disk if an executor has 25 GB of memory available?


Re: Spark uses disk instead of memory to store RDD blocks

Posted by Takeshi Yamamuro <li...@gmail.com>.
Hi,

I'm not sure this is the correct answer, but it seems `UnifiedMemoryManager`
spills some RDD blocks to disk when execution memory runs short.
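
For context, MEMORY_AND_DISK_SER itself opts in to the disk fallback.
Roughly how it is defined in org.apache.spark.storage.StorageLevel
(paraphrased here with named arguments for readability):

  import org.apache.spark.storage.StorageLevel

  // Paraphrase of Spark's built-in constant: useDisk = true means blocks
  // that cannot be kept in memory may legitimately land on disk.
  val memoryAndDiskSer: StorageLevel = StorageLevel(
    useDisk = true, useMemory = true, useOffHeap = false,
    deserialized = false, replication = 1)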

// maropu



-- 
---
Takeshi Yamamuro