Posted to user@spark.apache.org by Shiyuan <gs...@gmail.com> on 2018/05/16 20:45:20 UTC

Submit many spark applications

Hi Spark-users,
 I want to submit as many Spark applications as the resources permit. I am
using cluster mode on a YARN cluster. YARN can queue and launch these
applications without problems. The problem lies in spark-submit itself:
spark-submit starts a JVM, which can fail due to insufficient memory on
the machine where I run spark-submit when many spark-submit JVMs are running.
Any suggestions on how to solve this problem? Thank you!

Re: Submit many spark applications

Posted by Marcelo Vanzin <va...@cloudera.com>.
I already gave my recommendation in my very first reply to this thread...

On Fri, May 25, 2018 at 10:23 AM, raksja <sh...@gmail.com> wrote:
> OK, when to use which?
> Do you have any recommendation?
>
>
>
> --
> Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/
>
> ---------------------------------------------------------------------
> To unsubscribe e-mail: user-unsubscribe@spark.apache.org
>



-- 
Marcelo



Re: Submit many spark applications

Posted by raksja <sh...@gmail.com>.
OK, when to use which?
Do you have any recommendation?





Re: Submit many spark applications

Posted by Marcelo Vanzin <va...@cloudera.com>.
On Fri, May 25, 2018 at 10:18 AM, raksja <sh...@gmail.com> wrote:
> InProcessLauncher would just start a subprocess as you mentioned earlier.

No. As the name says, it runs things in the same process.

-- 
Marcelo



Re: Submit many spark applications

Posted by raksja <sh...@gmail.com>.
When you say that's what Spark uses, did you mean this:
https://github.com/apache/spark/blob/master/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala?

InProcessLauncher would just start a subprocess, as you mentioned earlier.
What about this one? Does it make a REST API call to YARN?

Given my case with several concurrent jobs, would you recommend the Spark
YARN Client (mentioned above) over InProcessLauncher?



Re: Submit many spark applications

Posted by Marcelo Vanzin <va...@cloudera.com>.
That's what Spark uses.

On Fri, May 25, 2018 at 10:09 AM, raksja <sh...@gmail.com> wrote:
> Thanks for the reply.
>
> Have you tried submitting a Spark job directly to YARN using YarnClient?
> https://hadoop.apache.org/docs/r2.6.0/api/org/apache/hadoop/yarn/client/api/YarnClient.html
>
> I'm not sure whether it's performant and scalable.
>
>
>



-- 
Marcelo



Re: Submit many spark applications

Posted by raksja <sh...@gmail.com>.
Thanks for the reply.

Have you tried submitting a Spark job directly to YARN using YarnClient?
https://hadoop.apache.org/docs/r2.6.0/api/org/apache/hadoop/yarn/client/api/YarnClient.html

I'm not sure whether it's performant and scalable.





Re: Submit many spark applications

Posted by Marcelo Vanzin <va...@cloudera.com>.
On Wed, May 23, 2018 at 12:04 PM, raksja <sh...@gmail.com> wrote:
> So InProcessLauncher wouldn't use native memory, so will it overload the
> memory of the parent process?

It will still use "native memory" (since the parent process will still
use memory), just less of it. But yes, it will use more memory in the
parent process.

> Is there any way that we can overcome this?

Try to launch fewer applications concurrently.
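To make the trade-off concrete, a minimal sketch of the launcher approach (assumes Spark 2.3+ on the classpath and a configured YARN cluster; the jar path and class name are placeholders):

```java
import org.apache.spark.launcher.InProcessLauncher;
import org.apache.spark.launcher.SparkAppHandle;

public class SubmitMany {
  public static void main(String[] args) throws Exception {
    for (int i = 0; i < 10; i++) {
      // No child spark-submit process is forked: submission runs inside
      // this JVM, so memory pressure concentrates in this one process.
      SparkAppHandle handle = new InProcessLauncher()
          .setMaster("yarn")
          .setDeployMode("cluster")
          .setAppResource("/path/to/my-app.jar")  // placeholder
          .setMainClass("com.example.MyApp")      // placeholder
          .startApplication();
      System.out.println("Submission " + i + " state: " + handle.getState());
    }
  }
}
```

Throttling the loop (for example with a bounded executor) keeps the launcher process's own memory use in check.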


-- 
Marcelo



Re: Submit many spark applications

Posted by raksja <sh...@gmail.com>.
Hi Marcelo, 

I'm facing the same issue when making spark-submits from an EC2 instance and
hitting the native memory limit sooner. We have #1 in place, but we are still
on Spark 2.1.0, so I couldn't try #2.

So InProcessLauncher wouldn't use native memory, so will it overload the
memory of the parent process?

Is there any way that we can overcome this?






Re: Submit many spark applications

Posted by ayan guha <gu...@gmail.com>.
How about using Livy to submit jobs?
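For anyone evaluating that route: Livy exposes a REST API, so the submitting machine only issues an HTTP request and never forks a JVM per application. A sketch against Livy's batch endpoint (the host, jar path, and class name are placeholders; 8998 is Livy's default port):

```shell
# Submit a Spark batch job through a Livy server.
curl -s -X POST \
  -H 'Content-Type: application/json' \
  -d '{"file": "hdfs:///jars/my-app.jar", "className": "com.example.MyApp"}' \
  http://livy-host:8998/batches
```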

On Thu, 17 May 2018 at 7:24 am, Marcelo Vanzin <va...@cloudera.com> wrote:

> You can either:
>
> - set spark.yarn.submit.waitAppCompletion=false, which will make
> spark-submit go away once the app starts in cluster mode.
> - use the (new in 2.3) InProcessLauncher class + some custom Java code
> to submit all the apps from the same "launcher" process.
>
> On Wed, May 16, 2018 at 1:45 PM, Shiyuan <gs...@gmail.com> wrote:
> > Hi Spark-users,
> >  I want to submit as many Spark applications as the resources permit. I am
> > using cluster mode on a YARN cluster. YARN can queue and launch these
> > applications without problems. The problem lies in spark-submit itself:
> > spark-submit starts a JVM, which can fail due to insufficient memory on
> > the machine where I run spark-submit when many spark-submit JVMs are running.
> > Any suggestions on how to solve this problem? Thank you!
>
>
>
> --
> Marcelo
>
--
Best Regards,
Ayan Guha

Re: Submit many spark applications

Posted by Marcelo Vanzin <va...@cloudera.com>.
You can either:

- set spark.yarn.submit.waitAppCompletion=false, which will make
spark-submit go away once the app starts in cluster mode.
- use the (new in 2.3) InProcessLauncher class + some custom Java code
to submit all the apps from the same "launcher" process.
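A sketch of the first option (the jar path and class name are placeholders); with waitAppCompletion=false, the spark-submit JVM exits as soon as YARN accepts the application, so only short-lived submitter JVMs overlap:

```shell
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --conf spark.yarn.submit.waitAppCompletion=false \
  --class com.example.MyApp \
  /path/to/my-app.jar
```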

On Wed, May 16, 2018 at 1:45 PM, Shiyuan <gs...@gmail.com> wrote:
> Hi Spark-users,
>  I want to submit as many Spark applications as the resources permit. I am
> using cluster mode on a YARN cluster. YARN can queue and launch these
> applications without problems. The problem lies in spark-submit itself:
> spark-submit starts a JVM, which can fail due to insufficient memory on
> the machine where I run spark-submit when many spark-submit JVMs are running.
> Any suggestions on how to solve this problem? Thank you!



-- 
Marcelo



Re: Submit many spark applications

Posted by yncxcw <yn...@gmail.com>.
Hi,

Please try reducing the default JVM heap size on the machine you use to
submit applications. For example:

    export _JAVA_OPTIONS="-Xmx512M"

The submitter, which is also a JVM, does not need to reserve a lot of memory.
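Combined with a submission, this might look like the following sketch (the jar path and class name are placeholders; note that _JAVA_OPTIONS is picked up by every HotSpot JVM started on the machine, not just spark-submit):

```shell
# Cap the heap of JVMs started from this shell, including spark-submit.
export _JAVA_OPTIONS="-Xmx512M"

# In cluster mode the capped JVM only performs the submission;
# the driver runs on the cluster with its own memory settings.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --class com.example.MyApp \
  /path/to/my-app.jar
```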


Wei 




