Posted to user@spark.apache.org by diplomatic Guru <di...@gmail.com> on 2016/10/10 22:14:18 UTC

[Spark] RDDs are not persisting in memory

Hello team,

Spark version: 1.6.0

I'm trying to persist some data in memory so that I can reuse it. However,
when I call rdd.cache() or rdd.persist(StorageLevel.MEMORY_ONLY()), the data
does not appear to be stored, as I cannot see any RDD information under the
WebUI (Storage tab).

Therefore I tried rdd.persist(StorageLevel.MEMORY_AND_DISK()), which stored
the data on disk only, as shown in the screenshot below:

[image: Inline images 2]
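
For reference, this is roughly the pattern I am following; the input data
here is just a placeholder for my real dataset:

import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.storage.StorageLevel;

public class PersistExample {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("persist-example");
        JavaSparkContext sc = new JavaSparkContext(conf);

        // Placeholder data; in my job this is the real input RDD.
        JavaRDD<Integer> rdd = sc.parallelize(Arrays.asList(1, 2, 3, 4, 5));

        // Mark the RDD for in-memory caching. persist() is lazy, so nothing
        // is stored until an action forces the RDD to be computed.
        rdd.persist(StorageLevel.MEMORY_ONLY());

        // Run an action so the partitions are materialised; only after this
        // should the RDD appear under the WebUI Storage tab.
        System.out.println("count = " + rdd.count());

        sc.stop();
    }
}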

Do you know why the memory is not being used?

Is there a cluster-level configuration that stops jobs from storing data in
memory altogether?


Please let me know.

Thanks

Guru

Re: [Spark] RDDs are not persisting in memory

Posted by diplomatic Guru <di...@gmail.com>.
Hello team, I found and resolved the issue. In case someone runs into the
same problem, here is what it was.

>> Each node was allocated 1397 MB of memory for storage:
16/10/11 13:16:58 INFO storage.MemoryStore: MemoryStore started with
capacity 1397.3 MB

>> However, a single partition of my RDD could not be fully unrolled within
that limit (the log shows 1224.3 MB computed so far when it gave up):

16/10/11 13:18:36 WARN storage.MemoryStore: Not enough space to cache
rdd_6_0 in memory! (computed 1224.3 MB so far)
16/10/11 13:18:36 INFO storage.MemoryStore: Memory use = 331.8 KB (blocks)
+ 1224.3 MB (scratch space shared across 2 tasks(s)) = 1224.6 MB. Storage
limit = 1397.3 MB.
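
For anyone hitting the same thing, my rough understanding (please correct me
if this is off) is that the reported limit comes from Spark 1.6's unified
memory manager, approximately (executor heap - ~300 MB reserved) *
spark.memory.fraction (default 0.75). The heap size in this sketch is just an
example value:

public class StoragePoolEstimate {
    public static void main(String[] args) {
        double executorHeapMb = 2048;  // hypothetical executor JVM heap size
        double reservedMb = 300;       // memory reserved by Spark 1.6 itself
        double memoryFraction = 0.75;  // spark.memory.fraction default

        // Approximate size of the pool shared by storage and execution;
        // the MemoryStore capacity in the log is of this order.
        double unifiedPoolMb = (executorHeapMb - reservedMb) * memoryFraction;
        System.out.printf("~%.0f MB for caching and execution%n", unifiedPoolMb);
    }
}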

Therefore, I repartitioned the RDD into more (smaller) partitions for better
memory utilisation, which resolved the issue.
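
Roughly what I ended up doing, as a sketch (the partition count below is
illustrative; the point is that each partition becomes small enough to fit in
the storage pool):

import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.storage.StorageLevel;

public class RepartitionBeforeCaching {
    // Split the RDD into more, smaller partitions before persisting, so that
    // each partition can be unrolled within the executor's storage memory.
    static <T> JavaRDD<T> persistInSmallerPartitions(JavaRDD<T> rdd, int numPartitions) {
        return rdd
            .repartition(numPartitions)               // e.g. from 2 partitions to 20
            .persist(StorageLevel.MEMORY_AND_DISK()); // spills only if a partition still doesn't fit
    }
}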

Kind regards,

Guru


On 11 October 2016 at 11:23, diplomatic Guru <di...@gmail.com>
wrote:

> @Song, I have called an action, but it did not cache, as you can see in
> the screenshot provided in my original email. It has cached to disk but
> not memory.
>
> @Chin Wei Low, I have 15 GB of memory allocated, which is more than the
> dataset size.
>
> Any other suggestion please?
>
>
> Kind regards,
>
> Guru
>
> On 11 October 2016 at 03:34, Chin Wei Low <lo...@gmail.com> wrote:
>
>> Hi,
>>
>> Your RDD is 5 GB; perhaps it is too large to fit into the executors'
>> storage memory. You can refer to the Executors tab in the Spark UI to
>> check the available storage memory for each executor.
>>
>> Regards,
>> Chin Wei
>>
>> On Tue, Oct 11, 2016 at 6:14 AM, diplomatic Guru <
>> diplomaticguru@gmail.com> wrote:
>>
>>> Hello team,
>>>
>>> Spark version: 1.6.0
>>>
>>> I'm trying to persist some data in memory so that I can reuse it.
>>> However, when I call rdd.cache() or rdd.persist(StorageLevel.MEMORY_ONLY()),
>>> the data does not appear to be stored, as I cannot see any RDD
>>> information under the WebUI (Storage tab).
>>>
>>> Therefore I tried rdd.persist(StorageLevel.MEMORY_AND_DISK()), which
>>> stored the data on disk only, as shown in the screenshot below:
>>>
>>> [image: Inline images 2]
>>>
>>> Do you know why the memory is not being used?
>>>
>>> Is there a cluster-level configuration that stops jobs from storing data
>>> in memory altogether?
>>>
>>>
>>> Please let me know.
>>>
>>> Thanks
>>>
>>> Guru
>>>
>>>
>>
>

Re: [Spark] RDDs are not persisting in memory

Posted by diplomatic Guru <di...@gmail.com>.
@Song, I have called an action, but it did not cache, as you can see in the
screenshot provided in my original email. It has cached to disk but not
memory.

@Chin Wei Low, I have 15 GB of memory allocated, which is more than the
dataset size.

Any other suggestion please?


Kind regards,

Guru

On 11 October 2016 at 03:34, Chin Wei Low <lo...@gmail.com> wrote:

> Hi,
>
> Your RDD is 5 GB; perhaps it is too large to fit into the executors'
> storage memory. You can refer to the Executors tab in the Spark UI to
> check the available storage memory for each executor.
>
> Regards,
> Chin Wei
>
> On Tue, Oct 11, 2016 at 6:14 AM, diplomatic Guru <diplomaticguru@gmail.com
> > wrote:
>
>> Hello team,
>>
>> Spark version: 1.6.0
>>
>> I'm trying to persist some data in memory so that I can reuse it.
>> However, when I call rdd.cache() or rdd.persist(StorageLevel.MEMORY_ONLY()),
>> the data does not appear to be stored, as I cannot see any RDD
>> information under the WebUI (Storage tab).
>>
>> Therefore I tried rdd.persist(StorageLevel.MEMORY_AND_DISK()), which
>> stored the data on disk only, as shown in the screenshot below:
>>
>> [image: Inline images 2]
>>
>> Do you know why the memory is not being used?
>>
>> Is there a cluster-level configuration that stops jobs from storing data
>> in memory altogether?
>>
>>
>> Please let me know.
>>
>> Thanks
>>
>> Guru
>>
>>
>

Re: [Spark] RDDs are not persisting in memory

Posted by Chin Wei Low <lo...@gmail.com>.
Hi,

Your RDD is 5 GB; perhaps it is too large to fit into the executors' storage
memory. You can refer to the Executors tab in the Spark UI to check the
available storage memory for each executor.
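
If it helps, here is a minimal sketch of reading the same numbers from the
driver (it assumes an existing JavaSparkContext, here called jsc, and goes
through the underlying Scala SparkContext):

import java.util.Map;

import org.apache.spark.api.java.JavaSparkContext;
import scala.Tuple2;
import scala.collection.JavaConversions;

public class ExecutorStorageCheck {
    // Print each executor's maximum and remaining memory available for
    // caching blocks, i.e. the numbers the Executors tab reports.
    static void printStorageMemory(JavaSparkContext jsc) {
        Map<String, Tuple2<Object, Object>> status =
                JavaConversions.mapAsJavaMap(jsc.sc().getExecutorMemoryStatus());
        for (Map.Entry<String, Tuple2<Object, Object>> e : status.entrySet()) {
            long maxBytes = (Long) e.getValue()._1();
            long remainingBytes = (Long) e.getValue()._2();
            System.out.println(e.getKey() + ": max=" + maxBytes
                    + " bytes, remaining=" + remainingBytes + " bytes");
        }
    }
}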

Regards,
Chin Wei

On Tue, Oct 11, 2016 at 6:14 AM, diplomatic Guru <di...@gmail.com>
wrote:

> Hello team,
>
> Spark version: 1.6.0
>
> I'm trying to persist some data in memory so that I can reuse it.
> However, when I call rdd.cache() or rdd.persist(StorageLevel.MEMORY_ONLY()),
> the data does not appear to be stored, as I cannot see any RDD
> information under the WebUI (Storage tab).
>
> Therefore I tried rdd.persist(StorageLevel.MEMORY_AND_DISK()), which
> stored the data on disk only, as shown in the screenshot below:
>
> [image: Inline images 2]
>
> Do you know why the memory is not being used?
>
> Is there a cluster-level configuration that stops jobs from storing data
> in memory altogether?
>
>
> Please let me know.
>
> Thanks
>
> Guru
>
>