Posted to user@spark.apache.org by Dibyendu Bhattacharya <di...@gmail.com> on 2015/07/09 13:17:14 UTC

Some BlockManager Doubts

Hi,

I would just like to clarify a few doubts I have about how the BlockManager
behaves. This is mostly in regard to Spark Streaming.

There are two possible cases in which blocks may get dropped or not stored in memory.

Case 1: While writing a block with the MEMORY_ONLY setting, if the node's
BlockManager does not have enough memory to unroll the block, the block won't
be stored in memory and the Receiver will throw an error while writing it.
If the StorageLevel uses disk (as in MEMORY_AND_DISK), blocks will be stored
to disk ONLY IF the BlockManager is not able to unroll them into memory. This
is fine while receiving the blocks, but the logic has an issue when old
blocks are chosen to be dropped from memory, as in Case 2.
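
For context, the storage level I am talking about is the one a receiver-based
input stream is created with; for example (a minimal sketch, where the host
and port are placeholders):

import org.apache.spark.SparkConf
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.{Seconds, StreamingContext}

object ReceiverStorageLevelSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("ReceiverStorageLevelSketch").setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(1))

    // MEMORY_ONLY: a block that cannot be unrolled is simply not stored,
    // and the receiver reports a failure for that block.
    // MEMORY_AND_DISK: the same block falls back to disk at receive time.
    val lines = ssc.socketTextStream("localhost", 9999, StorageLevel.MEMORY_AND_DISK)

    lines.count().print()
    ssc.start()
    ssc.awaitTermination()
  }
}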

Case 2: Now say that, with either the MEMORY_ONLY or the MEMORY_AND_DISK
setting, blocks were successfully stored in memory as in Case 1. What happens
when memory usage goes beyond a certain threshold? The BlockManager starts
dropping LRU blocks from memory, including blocks that were successfully
stored while receiving.
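
To make the eviction I am describing concrete, here is a toy LRU model (my
own illustration, not Spark's MemoryStore code; block names and sizes are
made up). Note that nothing in this sketch writes the evicted block to disk,
which is exactly my worry:

import scala.collection.mutable

object LruDropSketch extends App {
  val maxBytes = 100L
  var usedBytes = 0L
  val blocks = mutable.LinkedHashMap[String, Long]() // oldest entry first

  def put(blockId: String, size: Long): Unit = {
    while (usedBytes + size > maxBytes && blocks.nonEmpty) {
      val (lru, lruSize) = blocks.head
      blocks.remove(lru)
      usedBytes -= lruSize
      // If this happens for a MEMORY_AND_DISK block without a disk
      // write, a later read would fail with BlockNotFound.
      println(s"evicted $lru from memory")
    }
    blocks(blockId) = size
    usedBytes += size
  }

  (1 to 5).foreach(i => put(s"input-0-$i", 30L))
  println(s"still in memory: ${blocks.keys.mkString(", ")}")
}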

The primary issue I see is that, while dropping blocks in Case 2, Spark does
not check whether the storage level uses disk (MEMORY_AND_DISK), so even with
disk-backed storage levels, blocks are dropped from memory without being
written to disk.
Or perhaps the real issue is that blocks are NOT written to disk
simultaneously in Case 1 in the first place. I understand that would impact
throughput, but the current design may throw a BlockNotFound error if blocks
are chosen to be dropped even when the StorageLevel uses disk.
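
To illustrate the simultaneous-write alternative I mean (my own sketch with
stand-in stores, not Spark's API):

import scala.collection.mutable

// Every disk-backed block is persisted at receive time, so a later
// eviction of the memory copy can never cause BlockNotFound, at the
// price of extra disk I/O on the receive path.
object WriteThroughSketch extends App {
  val memoryStore = mutable.Map[String, Array[Byte]]()
  val diskStore = mutable.Map[String, Array[Byte]]()

  def putWriteThrough(blockId: String, data: Array[Byte]): Unit = {
    diskStore(blockId) = data    // always written to disk first
    memoryStore(blockId) = data  // the memory copy is now just a cache
  }

  putWriteThrough("input-0-1", Array[Byte](1, 2, 3))
  println(s"memory: ${memoryStore.keys}, disk: ${diskStore.keys}")
}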

Any thoughts ?

Regards,
Dibyendu

Re: Some BlockManager Doubts

Posted by Shixiong Zhu <zs...@gmail.com>.
MEMORY_AND_DISK will use the disk if there is not enough memory. If there is
not enough memory when putting a MEMORY_AND_DISK block, the BlockManager will
store it to disk. And if a MEMORY_AND_DISK block is dropped from memory,
MemoryStore will call BlockManager.dropFromMemory to store it to disk; see
MemoryStore.ensureFreeSpace for details.
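
Roughly, that eviction path behaves like the following toy model (my
simplified sketch of the logic just described, not the actual
MemoryStore/BlockManager source):

import scala.collection.mutable

object DropFromMemoryModel extends App {
  case class Level(useDisk: Boolean)

  val maxMemory = 100L
  var usedMemory = 0L
  val memoryStore = mutable.LinkedHashMap[String, (Long, Level)]() // oldest first
  val diskStore = mutable.Map[String, Long]()

  // Disk-backed levels are spilled to disk rather than lost.
  def dropFromMemory(blockId: String, size: Long, level: Level): Unit = {
    if (level.useDisk) diskStore(blockId) = size // spilled, not lost
    memoryStore.remove(blockId)
    usedMemory -= size
  }

  // Evict LRU blocks until the requested space is available.
  def ensureFreeSpace(needed: Long): Unit =
    while (usedMemory + needed > maxMemory && memoryStore.nonEmpty) {
      val (victim, (size, level)) = memoryStore.head
      dropFromMemory(victim, size, level)
    }

  def putBlock(blockId: String, size: Long, level: Level): Unit = {
    ensureFreeSpace(size)
    memoryStore(blockId) = (size, level)
    usedMemory += size
  }

  val memAndDisk = Level(useDisk = true)
  (1 to 5).foreach(i => putBlock(s"input-0-$i", 30L, memAndDisk))
  println(s"memory: ${memoryStore.keys.mkString(", ")}; disk: ${diskStore.keys.mkString(", ")}")
}

So a MEMORY_AND_DISK block evicted under memory pressure ends up in the disk
store, and a later read will not hit BlockNotFound.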

Best Regards,
Shixiong Zhu

2015-07-09 19:17 GMT+08:00 Dibyendu Bhattacharya <
dibyendu.bhattachary@gmail.com>:
