Posted to user@spark.apache.org by wxhsdp <wx...@gmail.com> on 2014/04/30 13:52:57 UTC

something about memory usage

Hi, guys

  I want to do some optimization of my Spark code. I use VisualVM to
monitor the executor while running the app.
  Here's the snapshot:
<http://apache-spark-user-list.1001560.n3.nabble.com/file/n5107/executor.png> 

From the snapshot I can get memory usage information about the executor,
but the executor contains lots of tasks. Is it possible to get the memory
usage of a single task in the JVM, with GC running in the background?

By the way, you can see that every time memory usage reaches about 90%,
the JVM does a GC.
I'm a little confused about that: I originally thought that 60% of the
memory is kept for Spark's memory cache (I did not cache any RDDs in my
application), so only 40% was left for running the app.



--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/something-about-memory-usage-tp5107.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: something about memory usage

Posted by wxhsdp <wx...@gmail.com>.
Hi Daniel, thanks for your help.

  I'm already running 1-core slaves, but I still can't work it out. The
executor runs the tasks one by one:
  task0, task1, task2...

  How can I get the memory task1 used, with so many threads running in the
background, and with GC as well?
<http://apache-spark-user-list.1001560.n3.nabble.com/file/n5137/executor_threads.png> 

  It's not accurate to use Runtime.getRuntime().freeMemory().
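For what it's worth, one way to get a rough number outside Spark is to read the used heap before and after the piece of work, requesting a GC around each reading. Below is a minimal sketch in plain Java; the class name and the array workload are made-up stand-ins for a task body, and with other executor threads and the collector running, the readings can only ever be estimates.

```java
// Rough heap probe around a single piece of work.
public class TaskMemoryProbe {

    // Used heap after a GC hint; System.gc() is only a request,
    // so even this reading is approximate.
    static long usedHeap() {
        System.gc();
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    public static void main(String[] args) {
        long before = usedHeap();

        // Hypothetical stand-in for the task body: ~4 MB of ints,
        // kept reachable so it shows up in the "after" reading.
        int[] work = new int[1_000_000];
        for (int i = 0; i < work.length; i++) {
            work[i] = i;
        }

        long after = usedHeap();
        System.out.println("approx bytes retained by the work: " + (after - before));

        // Keep the array alive past the second reading.
        System.out.println("last element: " + work[work.length - 1]);
    }
}
```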



--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/something-about-memory-usage-tp5107p5137.html

Re: something about memory usage

Posted by Daniel Darabos <da...@lynxanalytics.com>.
On Wed, Apr 30, 2014 at 1:52 PM, wxhsdp <wx...@gmail.com> wrote:

> Hi, guys
>
>   I want to do some optimization of my Spark code. I use VisualVM to
> monitor the executor while running the app.
>   Here's the snapshot:
> <
> http://apache-spark-user-list.1001560.n3.nabble.com/file/n5107/executor.png
> >
>
> From the snapshot I can get memory usage information about the executor,
> but the executor contains lots of tasks. Is it possible to get the memory
> usage of a single task in the JVM, with GC running in the background?
>

I guess you could run 1-core slaves. That way they would only work on one
task at a time.
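For the standalone deployment, one way to do that (a sketch; adjust to however you deploy) is to cap each worker's cores in conf/spark-env.sh:

```shell
# conf/spark-env.sh on each worker (standalone mode):
# with one core per worker, its executor runs one task at a time.
export SPARK_WORKER_CORES=1
```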

> By the way, you can see that every time memory usage reaches about 90%,
> the JVM does a GC.
> I'm a little confused about that: I originally thought that 60% of the
> memory is kept for Spark's memory cache (I did not cache any RDDs in my
> application), so only 40% was left for running the app.
>

The way I understand it, Spark does not have tight control over memory.
Your code running on the executor can easily use more than 40% of the
memory. Spark only limits the memory used for RDD caches and shuffles. If
the RDD caches are full, taking up 60% of the heap, and your code takes up
more than 40% (after GC), the executor will die with an OOM.
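To make that arithmetic concrete, here is a toy calculation in plain Java; the 4 GiB heap is a made-up figure, and 0.6 matches the default spark.storage.memoryFraction in this generation of Spark:

```java
// Toy breakdown of an executor heap under the default cache fraction.
public class HeapBudget {
    public static void main(String[] args) {
        long heapBytes = 4L * 1024 * 1024 * 1024;   // hypothetical 4 GiB executor heap
        double cacheFraction = 0.6;                 // spark.storage.memoryFraction default

        long cacheBudget = (long) (heapBytes * cacheFraction);
        long userBudget = heapBytes - cacheBudget;

        System.out.println("RDD cache budget:   " + cacheBudget + " bytes");
        System.out.println("left for user code: " + userBudget + " bytes");
        // If the cache is full and user code needs more than its share,
        // nothing stops the total from exceeding the heap, hence the OOM.
    }
}
```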

I suppose there is not much Spark could do about this. You cannot control
how much memory a function you call is allowed to use.