
Posted to user@spark.apache.org by José Raúl Pérez Rodríguez <jo...@gmail.com> on 2018/04/10 17:27:26 UTC

cache OS memory and spark usage of it

Hi,

When I issue a "free -m" command on a host, I see a lot of memory used
for the OS cache. However, Spark Streaming is not able to request that
memory for its own use, and the execution fails because it cannot
launch executors.

My understanding of the OS memory cache (the cache figure in the
"free -m" output) is that in practice it is free memory, because
programs can request that memory when needed, and the OS "gives" the
requested amount to the program. Is that right? If not, what is the
behavior of the OS cache? And what can Spark do to use this memory?

Thanks a lot,

Raúl



---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscribe@spark.apache.org

Re: cache OS memory and spark usage of it

Posted by yncxcw <yn...@gmail.com>.

Hi Raúl,

(1) & (2) Yes, the OS needs some memory pressure before it releases the
cache. For example, suppose your machine has 16GB of RAM in total, and
you read an 8GB file and immediately close it. Now the page cache holds
8GB of the file's data. If you then start a program that requests memory
from the OS, the OS will release page cache once your requests go beyond
the roughly 8GB that is still free.
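The reclaim behavior in (1) & (2) can be illustrated with a toy model.
This is a minimal sketch, not how the kernel is implemented: the class
ToyMemory and its methods are hypothetical, sizes are whole GB, and it
assumes all page cache is clean and can be dropped instantly. It ignores
swap, dirty-page writeback and the kernel's free-memory watermarks (in
reality reclaim starts a little before free memory reaches zero).

```python
from dataclasses import dataclass


@dataclass
class ToyMemory:
    """Toy model of physical RAM, in whole GB (hypothetical, simplified)."""

    total: int
    used_by_apps: int = 0
    page_cache: int = 0

    @property
    def free(self) -> int:
        # RAM that holds neither application data nor cached file pages.
        return self.total - self.used_by_apps - self.page_cache

    def read_file(self, size: int) -> None:
        # Reading a file fills otherwise free RAM with its pages. Once free
        # RAM runs out, older cache pages would be replaced rather than the
        # cache growing further, so only free RAM is added here.
        self.page_cache += min(size, self.free)

    def allocate(self, size: int) -> bool:
        # A request that fits in free RAM needs no reclaim at all.
        shortfall = size - self.free
        if shortfall > 0:
            # Memory pressure: drop just enough page cache to cover it.
            if shortfall > self.page_cache:
                return False  # even dropping all cache would not be enough
            self.page_cache -= shortfall
        self.used_by_apps += size
        return True


mem = ToyMemory(total=16)
mem.read_file(8)
print(mem.free, mem.page_cache)  # 8 8: the file now sits in page cache
mem.allocate(6)
print(mem.free, mem.page_cache)  # 2 8: fits in free RAM, cache untouched
mem.allocate(5)
print(mem.free, mem.page_cache)  # 0 5: 3 GB short, so 3 GB of cache dropped
```

The point of the model is that the cache is only given up on demand:
the first allocation leaves it alone, and the second one shrinks it by
exactly the amount that free RAM could not cover.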

(3) I think you can configure your JVM with a maximum heap size of 14GB
(-Xmx14g) and leave 2GB of memory for the OS. This configuration gives
you memory elasticity: the JVM requests more memory from the OS as new
objects are created, but it is bounded by 14GB, so it should not cause
swapping, provided the JVM's memory outside the heap also fits in the
remaining 2GB. For example, if your application only needs 8GB of
memory, the remaining 8GB can be used for page cache, improving your
I/O performance. If instead your application needs 14GB, the JVM will
force the OS to release almost all of the page cache. In that situation
your I/O performance may suffer, but you can hold more data (e.g.,
RDDs) in your application.
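The trade-off in (3) is plain arithmetic. A minimal sketch with a
hypothetical helper memory_split, assuming one JVM per host, that the
JVM's committed heap tracks what the application needs (in practice the
JVM may commit more and does not always return memory promptly), and
ignoring memory the OS and the JVM use outside the heap. For Spark
executors the heap cap is normally set with spark.executor.memory (or
--executor-memory on spark-submit) rather than by passing -Xmx in
spark.executor.extraJavaOptions, which Spark does not allow.

```python
def memory_split(total_gb: int, heap_max_gb: int, app_used_gb: int) -> dict:
    """Hypothetical helper: how a host's RAM divides between one JVM and
    the page cache, ignoring non-heap and OS memory usage."""
    # The heap never grows past -Xmx, so the JVM holds at most heap_max_gb.
    jvm_used_gb = min(app_used_gb, heap_max_gb)
    return {
        "xmx_flag": f"-Xmx{heap_max_gb}g",
        "jvm_used_gb": jvm_used_gb,
        # If the application needs more than the cap, it must spill, evict
        # cached data, or fail with OutOfMemoryError.
        "fits_in_heap": app_used_gb <= heap_max_gb,
        # Whatever the JVM has not claimed stays with the OS, which is free
        # to fill it with page cache.
        "page_cache_room_gb": total_gb - jvm_used_gb,
    }


for need in (8, 14):
    split = memory_split(total_gb=16, heap_max_gb=14, app_used_gb=need)
    print(need, split["xmx_flag"], split["page_cache_room_gb"])
# 8 -Xmx14g 8
# 14 -Xmx14g 2
```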


Wei



--
Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/


Re: cache OS memory and spark usage of it

Posted by Jose Raul Perez Rodriguez <jo...@gmail.com>.

It was helpful.

So the OS needs to feel some pressure from applications requesting
memory before it frees some of the memory cache?

Under exactly which circumstances does the OS free that memory to give
it to applications requesting it?

I mean, if the total memory is 16GB and 10GB are used for the OS cache,
how can the JVM obtain memory from that?

Thanks,


On 11/04/18 01:36, yncxcw wrote:
> Hi Raúl,
>
> First, most of the OS memory cache is the Page Cache
> <https://en.wikipedia.org/wiki/Page_cache>, which the OS uses to cache
> recently read and written I/O data.
>
> I think the OS memory cache should be discussed from two different
> perspectives. From the perspective of user space (e.g., a Spark
> application), it is not used, since Spark does not allocate memory from
> this part of memory.
> However, from the perspective of the OS, it is actually used, because
> the memory pages are already allocated for caching I/O pages. For each
> buffered I/O request, the OS allocates memory pages to cache the data,
> expecting these cached pages to be reused in the near future.
> Recall what happens when you use vim/emacs to open a large file. It is
> pretty slow the first time you open it, but much faster when you close
> it and reopen it immediately, because the file was placed in the page
> cache the first time you opened it.
>
> It is hard for Spark to use this part of memory, because it is managed
> by the OS and is transparent to applications. The only thing you can do
> is keep allocating memory from the OS (e.g., with malloc()); at some
> point the OS senses memory pressure and voluntarily releases page cache
> to satisfy your allocations. Also note that Spark's memory is limited
> by the maximum JVM heap size, so memory requests from your Spark
> application are actually handled by the JVM, not the OS.
>
>
> Hope this answer can help you!
>
>
> Wei

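The two perspectives in the quoted answer can be seen in /proc/meminfo,
the file "free" reads on Linux. Alongside MemFree and the cache (Buffers,
Cached), Linux 3.14 and later report MemAvailable, the kernel's estimate
of how much memory new applications can get without swapping, including
the reclaimable part of the cache; newer versions of "free" show it as
the "available" column. Not all of Cached is reclaimable (shared memory
and tmpfs pages are counted there too), which is why MemAvailable is a
better guide than "free + cache". A minimal sketch; the SAMPLE numbers
are made up for illustration and the helper names are hypothetical.

```python
# Made-up /proc/meminfo excerpt for a roughly 16 GB host (values in kB).
SAMPLE = """\
MemTotal:       16384000 kB
MemFree:         1024000 kB
MemAvailable:   10240000 kB
Buffers:          102400 kB
Cached:         10752000 kB
Shmem:            512000 kB
"""


def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo-style text into {field: value}; sizes are in kB."""
    info = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        key, rest = line.split(":", 1)
        info[key.strip()] = int(rest.split()[0])
    return info


def to_mib(kb: int) -> int:
    return kb // 1024


def summarize(info: dict) -> dict:
    return {
        # RAM that is neither used by programs nor holding cached pages.
        "free_mib": to_mib(info["MemFree"]),
        # RAM the OS is currently using for buffers and page cache.
        "cache_mib": to_mib(info["Buffers"] + info["Cached"]),
        # Kernel's estimate of what new programs can obtain without swapping,
        # counting the reclaimable part of the cache (Linux 3.14+).
        "available_mib": to_mib(info["MemAvailable"]),
    }


print(summarize(parse_meminfo(SAMPLE)))
# {'free_mib': 1000, 'cache_mib': 10600, 'available_mib': 10000}

# On a real Linux host, read the live file instead:
# with open("/proc/meminfo") as f:
#     print(summarize(parse_meminfo(f.read())))
```

With these sample numbers only about 1,000 MiB is reported as free, yet
the kernel estimates roughly 10,000 MiB could be handed to new programs
by reclaiming cache.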


