Posted to user@spark.apache.org by Shiyuan <gs...@gmail.com> on 2018/05/16 20:45:20 UTC
Submit many spark applications
Hi Spark-users,
I want to submit as many Spark applications as the resources permit. I am
using cluster mode on a YARN cluster. YARN can queue and launch these
applications without problems. The problem lies in spark-submit itself:
spark-submit starts a JVM, which can fail due to insufficient memory on
the machine where I run spark-submit if many spark-submit JVMs are running.
Any suggestions on how to solve this problem? Thank you!
Re: Submit many spark applications
Posted by Marcelo Vanzin <va...@cloudera.com>.
I already gave my recommendation in my very first reply to this thread...
On Fri, May 25, 2018 at 10:23 AM, raksja <sh...@gmail.com> wrote:
> ok, when to use what?
> do you have any recommendation?
--
Marcelo
---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscribe@spark.apache.org
Re: Submit many spark applications
Posted by raksja <sh...@gmail.com>.
OK, when to use what?
Do you have any recommendation?
--
Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/
Re: Submit many spark applications
Posted by Marcelo Vanzin <va...@cloudera.com>.
On Fri, May 25, 2018 at 10:18 AM, raksja <sh...@gmail.com> wrote:
> InProcessLauncher would just start a subprocess as you mentioned earlier.
No. As the name says, it runs things in the same process.
--
Marcelo
Re: Submit many spark applications
Posted by raksja <sh...@gmail.com>.
When you say that's what Spark uses, did you mean this:
https://github.com/apache/spark/blob/master/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala?
InProcessLauncher would just start a subprocess as you mentioned earlier.
And what about this one: does it make a REST API call to YARN?
Given my case, where I have several concurrent jobs, would you recommend the
Spark YARN client (mentioned above) over InProcessLauncher?
Re: Submit many spark applications
Posted by Marcelo Vanzin <va...@cloudera.com>.
That's what Spark uses.
On Fri, May 25, 2018 at 10:09 AM, raksja <sh...@gmail.com> wrote:
> thanks for the reply.
>
> Have you tried submit a spark job directly to Yarn using YarnClient.
> https://hadoop.apache.org/docs/r2.6.0/api/org/apache/hadoop/yarn/client/api/YarnClient.html
>
> Not sure whether its performant and scalable?
--
Marcelo
Re: Submit many spark applications
Posted by raksja <sh...@gmail.com>.
Thanks for the reply.
Have you tried submitting a Spark job directly to YARN using YarnClient?
https://hadoop.apache.org/docs/r2.6.0/api/org/apache/hadoop/yarn/client/api/YarnClient.html
I'm not sure whether it's performant and scalable.
Re: Submit many spark applications
Posted by Marcelo Vanzin <va...@cloudera.com>.
On Wed, May 23, 2018 at 12:04 PM, raksja <sh...@gmail.com> wrote:
> So InProcessLauncher wouldnt use the native memory, so will it overload the
> mem of parent process?
It will still use "native memory" (since the parent process still uses
memory), just less of it. But yes, it will use more memory in the parent
process.
> Is there any way that we can overcome this?
Try to launch fewer applications concurrently.
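A shell-level way to do that is a wrapper that caps how many submissions run
at once. A rough sketch (here `submit_one` is a hypothetical placeholder for
a real spark-submit call):

```shell
#!/usr/bin/env bash
# Run at most MAX submissions concurrently, waiting between batches.
# submit_one is a hypothetical stand-in; in practice it would invoke
# spark-submit (cluster mode) for the given application.
submit_one() {
  echo "submitting $1"
}

MAX=4
count=0
for app in app1 app2 app3 app4 app5; do
  submit_one "$app" &
  count=$((count + 1))
  if [ "$count" -ge "$MAX" ]; then
    wait        # block until the current batch finishes
    count=0
  fi
done
wait            # wait for any remaining submissions
```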
--
Marcelo
Re: Submit many spark applications
Posted by raksja <sh...@gmail.com>.
Hi Marcelo,
I'm facing the same issue when making spark-submits from an EC2 instance and
hitting the native memory limit sooner. We have #1 in place, but we are still
on Spark 2.1.0, so we couldn't try #2.
So InProcessLauncher wouldn't use the native memory; will it overload the
memory of the parent process?
Is there any way that we can overcome this?
Re: Submit many spark applications
Posted by ayan guha <gu...@gmail.com>.
How about using Livy to submit jobs?
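Livy exposes a REST endpoint for batch submissions, so no spark-submit JVM is
needed on the client at all. A hedged sketch of the request (the host, jar
path, and class name are invented placeholders; `/batches` is Livy's
batch-submission endpoint); the `echo` keeps it a dry run:

```shell
#!/usr/bin/env bash
# Submit a batch application through Livy's REST API instead of forking
# a local spark-submit JVM. LIVY_URL, the jar, and the class name are
# hypothetical placeholders.
LIVY_URL="http://livy-host:8998"
PAYLOAD='{"file": "hdfs:///jars/myapp.jar", "className": "com.example.MyApp"}'

# Dry run: print the request; drop the echo to actually send it.
echo curl -s -X POST "$LIVY_URL/batches" \
  -H "Content-Type: application/json" \
  -d "$PAYLOAD"
```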
On Thu, 17 May 2018 at 7:24 am, Marcelo Vanzin <va...@cloudera.com> wrote:
> You can either:
>
> - set spark.yarn.submit.waitAppCompletion=false, which will make
> spark-submit go away once the app starts in cluster mode.
> - use the (new in 2.3) InProcessLauncher class + some custom Java code
> to submit all the apps from the same "launcher" process.
>
--
Best Regards,
Ayan Guha
Re: Submit many spark applications
Posted by Marcelo Vanzin <va...@cloudera.com>.
You can either:
- set spark.yarn.submit.waitAppCompletion=false, which will make
spark-submit go away once the app starts in cluster mode.
- use the (new in 2.3) InProcessLauncher class + some custom Java code
to submit all the apps from the same "launcher" process.
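For the first option, the submission is a normal cluster-mode spark-submit
plus one extra conf. A minimal sketch (the jar path and class name are made-up
placeholders); the leading `echo` keeps it a dry run:

```shell
#!/usr/bin/env bash
# Option 1: fire-and-forget cluster-mode submission. With
# waitAppCompletion=false, spark-submit exits once YARN accepts the
# application, so its JVM does not linger on the submitting machine.
# The jar and class below are hypothetical placeholders.
CMD="spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --conf spark.yarn.submit.waitAppCompletion=false \
  --class com.example.MyApp \
  /path/to/myapp.jar"

# Dry run: print the command; drop the echo to actually submit.
echo "$CMD"
```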
On Wed, May 16, 2018 at 1:45 PM, Shiyuan <gs...@gmail.com> wrote:
> Hi Spark-users,
> I want to submit as many spark applications as the resources permit. I am
> using cluster mode on a yarn cluster. Yarn can queue and launch these
> applications without problems. The problem lies on spark-submit itself.
> Spark-submit starts a jvm which could fail due to insufficient memory on the
> machine where I run spark-submit if many spark-submit jvm are running. Any
> suggestions on how to solve this problem? Thank you!
--
Marcelo
Re: Submit many spark applications
Posted by yncxcw <yn...@gmail.com>.
hi,
please try to reduce the default heap size for the machine you use to submit
applications. For example:
export _JAVA_OPTIONS="-Xmx512M"
The submitter, which is also a JVM, does not need to reserve a lot of memory.
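If exporting the flag globally is too blunt, the assignment can be scoped to a
single command; a quick check (using `sh -c` as a stand-in for the actual
spark-submit invocation) that only the child process sees it:

```shell
#!/usr/bin/env bash
# Scope the smaller heap to one invocation instead of the whole shell.
# The sh -c call is a hypothetical stand-in for spark-submit.
child_opts="$(_JAVA_OPTIONS='-Xmx512M' sh -c 'echo "$_JAVA_OPTIONS"')"
echo "child sees: $child_opts"
echo "parent sees: ${_JAVA_OPTIONS:-unset}"
```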
Wei