Posted to dev@spark.apache.org by Alexander Pivovarov <ap...@gmail.com> on 2016/05/12 21:16:30 UTC
Spark uses disk instead of memory to store RDD blocks
Hello Everyone
I use Spark 1.6.0 on YARN (EMR-4.3.0)
I use the MEMORY_AND_DISK_SER StorageLevel for my RDD, and I use the Kryo serializer.
I noticed that Spark uses disk to store some RDD blocks even though the executors
have lots of memory available. See the screenshot:
http://postimg.org/image/gxpsw1fk1/
Any ideas why it might happen?
Thank you
Alex
Re: Spark uses disk instead of memory to store RDD blocks
Posted by Takeshi Yamamuro <li...@gmail.com>.
If you ran a shuffle that consumed a large amount of execution memory, it may
have evicted the cached RDD blocks because memory for the shuffle ran short.
Please see:
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/memory/UnifiedMemoryManager.scala#L32
// maropu
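For a rough sense of the numbers involved, here is the arithmetic behind the unified memory pool in Spark 1.6 with its default settings (`spark.memory.fraction = 0.75`, `spark.memory.storageFraction = 0.5`, 300 MB reserved); the heap size below is illustrative, not taken from the thread:

```python
# Sketch of Spark 1.6's UnifiedMemoryManager sizing, using that release's
# defaults. The 40 GB heap is an assumed example value.
RESERVED_MB = 300            # RESERVED_SYSTEM_MEMORY_BYTES in Spark 1.6
MEMORY_FRACTION = 0.75       # spark.memory.fraction default in 1.6
STORAGE_FRACTION = 0.5       # spark.memory.storageFraction default

def unified_memory(heap_mb):
    """Return (unified_pool_mb, protected_storage_mb) for a given heap size."""
    usable = heap_mb - RESERVED_MB
    pool = usable * MEMORY_FRACTION       # shared execution + storage pool
    protected = pool * STORAGE_FRACTION   # storage below this is safe from eviction
    return pool, protected

pool, protected = unified_memory(40 * 1024)  # e.g. a 40 GB executor heap
```

Execution can borrow from the storage side of the pool and evict cached blocks down to the protected storage region, which is why cached RDD blocks can end up on disk even while the executor still reports plenty of free storage memory.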
--
---
Takeshi Yamamuro
Re: Spark uses disk instead of memory to store RDD blocks
Posted by Alexander Pivovarov <ap...@gmail.com>.
Each executor in the screenshot has 25 GB of memory remaining. Why would Spark
store 170-500 MB on disk if the executor has 25 GB of memory available?
Re: Spark uses disk instead of memory to store RDD blocks
Posted by Takeshi Yamamuro <li...@gmail.com>.
Hi,
Not sure this is the correct answer, but it seems `UnifiedMemoryManager`
spills some RDD blocks to disk when execution memory runs short.
// maropu
--
---
Takeshi Yamamuro