Posted to user@spark.apache.org by Elkhan Dadashov <el...@gmail.com> on 2015/07/14 18:39:50 UTC

ProcessBuilder in SparkLauncher is memory inefficient for launching new process

Hi all,

If you want to launch a Spark job from Java programmatically, you need to
use SparkLauncher.

SparkLauncher uses ProcessBuilder to create the new process, and Java seems
to handle process creation inefficiently.
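For context, SparkLauncher ultimately builds a spark-submit command line and
hands it to ProcessBuilder. A minimal sketch of that same mechanism, using a
trivial echo command in place of spark-submit (the command here is a
placeholder, not the SparkLauncher API):

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;

public class LaunchDemo {
    public static void main(String[] args) throws Exception {
        // ProcessBuilder is what SparkLauncher uses internally; here we
        // launch a trivial command instead of a spark-submit invocation.
        ProcessBuilder pb = new ProcessBuilder("echo", "hello from child");
        pb.redirectErrorStream(true); // merge stderr into stdout
        Process p = pb.start();
        try (BufferedReader r = new BufferedReader(
                new InputStreamReader(p.getInputStream()))) {
            String line;
            while ((line = r.readLine()) != null) {
                System.out.println(line);
            }
        }
        System.out.println("exit=" + p.waitFor());
    }
}
```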

"
When you execute a process, you must first fork() and then exec(). Forking
creates a child process by duplicating the current process. Then, you call
exec() to change the “process image” to a new “process image”, essentially
executing different code within the child process.
...
When we want to fork a new process, we have to copy the ENTIRE Java JVM…
What we really are doing is requesting the same amount of memory the JVM
been allocated.
"
Source: http://bryanmarty.com/2012/01/14/forking-jvm/
The same article describes different ways of launching new processes in Java.

If our main program's JVM already uses a large amount of memory (say, 6 GB),
then creating a new process through SparkLauncher requires another 6 GB of
(virtual) memory to be available, 12 GB in total, even though the child will
not actually use it.
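One thing worth probing before concluding the memory is really needed: on
Linux the extra reservation is virtual address space only, and HotSpot has an
internal, version-dependent switch for how the child is spawned, the
jdk.lang.Process.launchMechanism system property. A small probe, assuming a
Linux JDK 8 or later; the property is an implementation detail, not a public
API:

```java
public class LaunchMechanismProbe {
    public static void main(String[] args) throws Exception {
        // HotSpot on Linux can spawn children via fork, vfork, or
        // posix_spawn; vfork/posix_spawn avoid duplicating the parent's
        // address space. Select one with e.g.
        //   java -Djdk.lang.Process.launchMechanism=posix_spawn ...
        // (internal property; the default varies by JDK version).
        String mech = System.getProperty(
                "jdk.lang.Process.launchMechanism", "(JDK default)");
        System.out.println("launch mechanism: " + mech);

        // Spawning works regardless of the parent's heap size.
        Process p = new ProcessBuilder("true").start();
        System.out.println("child exit status: " + p.waitFor());
    }
}
```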

It would be very helpful if someone could share their experience handling
this memory inefficiency when creating new processes from Java.

Re: ProcessBuilder in SparkLauncher is memory inefficient for launching new process

Posted by Jong Wook Kim <jo...@nyu.edu>.
The article you've linked is specific to an embedded system. The JVM built for that architecture (which the author didn't name) might not be as stable and well-supported as HotSpot.

ProcessBuilder is a stable Java API, and despite its somewhat limited functionality it is the standard way to launch a subprocess from within a JVM.

You also have a misconception about forking and memory. Forking a process does not double physical memory consumption: any modern Unix (except perhaps that embedded one) uses a copy-on-write scheme for the forked process's virtual memory, so no additional physical memory is consumed until the child actually writes to a page, and with fork() followed by exec() the child replaces its image almost immediately.
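This is easy to check empirically: keep a sizable heap allocation live in the
parent, then spawn a child. On a copy-on-write Unix the spawn succeeds without
the parent's pages being physically duplicated. A rough sketch (the 256 MB
ballast size is arbitrary; assumes a Unix where the `true` command exists):

```java
public class CopyOnWriteCheck {
    public static void main(String[] args) throws Exception {
        // Keep ~256 MB live in the parent JVM (arbitrary ballast size).
        byte[] ballast = new byte[256 * 1024 * 1024];
        ballast[ballast.length - 1] = 1; // keep a reference alive

        // Spawning a child does not copy these pages physically:
        // fork() marks them copy-on-write, and exec() replaces the
        // child's image before it ever writes to them.
        Process p = new ProcessBuilder("true").start();
        System.out.println("child exit: " + p.waitFor()
                + " (parent still holds " + ballast.length + " bytes)");
    }
}
```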



Jong Wook.


> On Jul 15, 2015, at 01:39, Elkhan Dadashov <el...@gmail.com> wrote:
> [...]