Posted to user@spark.apache.org by Tobias Pfeiffer <tg...@preferred.jp> on 2014/09/18 10:19:32 UTC

spark-submit: fire-and-forget mode?

Hi,

I am wondering: Is it possible to run spark-submit in a mode where it will
start an application on a YARN cluster (i.e., driver and executors run on
the cluster) and then forget about it in the sense that the Spark
application is completely independent from the host that ran the
spark-submit command and will not be affected if that controlling machine
shuts down etc.? I was using spark-submit with YARN in cluster mode, but
spark-submit stayed in the foreground and as far as I understood, it
terminated the application on the cluster when spark-submit was Ctrl+C'ed.

Thanks
Tobias

Re: spark-submit: fire-and-forget mode?

Posted by Tobias Pfeiffer <tg...@preferred.jp>.
Hi,

thanks for everyone's replies!

> On Thu, Sep 18, 2014 at 7:37 AM, Sandy Ryza <sa...@cloudera.com> wrote:
>> YARN cluster mode should have the behavior you're looking for.  The client
>> process will stick around to report on things, but should be able to be
>> killed without affecting the application.  If this isn't the behavior you're
>> observing, and your application isn't failing for a different reason,
>> there's a bug.

Sandy, yes, you are right; I must have misinterpreted some
results/behavior when I was trying this before.

On Thu, Sep 18, 2014 at 1:19 PM, Andrew Or <an...@databricks.com> wrote:

> Thanks Tobias, I have filed a JIRA for it.
>

Great, thanks for opening the issue! I think that's a very useful thing to
have.

Tobias

Re: spark-submit: fire-and-forget mode?

Posted by Nicholas Chammas <ni...@gmail.com>.
And for the record, the issue is here:
https://issues.apache.org/jira/browse/SPARK-3591
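
[For readers finding this thread in the archive: SPARK-3591 was later
resolved by adding a configuration flag that makes the launcher exit once
the application is accepted. A sketch of what that submit looks like on
later Spark versions follows; the class name and jar are placeholders, and
you should check the docs for your version, since at the time of this
thread the flag did not yet exist and cluster mode was also spelled
"--master yarn-cluster".]

```shell
# Submit to YARN in cluster mode and have the local launcher process exit
# as soon as YARN accepts the application, instead of polling until the
# application finishes. (spark.yarn.submit.waitAppCompletion came out of
# SPARK-3591; com.example.MyApp and my-app.jar are placeholders.)
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --conf spark.yarn.submit.waitAppCompletion=false \
  --class com.example.MyApp \
  my-app.jar
```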

On Thu, Sep 18, 2014 at 1:19 PM, Andrew Or <an...@databricks.com> wrote:

> Thanks Tobias, I have filed a JIRA for it.
>
> 2014-09-18 10:09 GMT-07:00 Patrick Wendell <pw...@gmail.com>:
>
>> I agree, that's a good idea Marcelo. There isn't AFAIK any reason the
>> client needs to hang there for correct operation.
>>
>> On Thu, Sep 18, 2014 at 9:39 AM, Marcelo Vanzin <va...@cloudera.com>
>> wrote:
>> > Yes, what Sandy said.
>> >
>> > On top of that, I would suggest filing a bug for a new command line
>> > argument for spark-submit to make the launcher process exit cleanly as
>> > soon as a cluster job starts successfully. That can be helpful for
>> > code that launches Spark jobs but monitors the job through different
>> > means.
>> >
>> > On Thu, Sep 18, 2014 at 7:37 AM, Sandy Ryza <sa...@cloudera.com> wrote:
>> >> Hi Tobias,
>> >>
>> >> YARN cluster mode should have the behavior you're looking for.  The client
>> >> process will stick around to report on things, but should be able to be
>> >> killed without affecting the application.  If this isn't the behavior you're
>> >> observing, and your application isn't failing for a different reason,
>> >> there's a bug.
>> >>
>> >> -Sandy
>> >>
>> >> On Thu, Sep 18, 2014 at 10:20 AM, Nicholas Chammas
>> >> <ni...@gmail.com> wrote:
>> >>>
>> >>> Dunno about having the application be independent of whether spark-submit
>> >>> is still alive, but you can have spark-submit run in a new session in Linux
>> >>> using setsid.
>> >>>
>> >>> That way even if you terminate your SSH session, spark-submit will keep
>> >>> running independently. Of course, if you terminate the host running
>> >>> spark-submit, you will still have problems.
>> >>>
>> >>>
>> >>> On Thu, Sep 18, 2014 at 4:19 AM, Tobias Pfeiffer <tg...@preferred.jp> wrote:
>> >>>>
>> >>>> Hi,
>> >>>>
>> >>>> I am wondering: Is it possible to run spark-submit in a mode where it
>> >>>> will start an application on a YARN cluster (i.e., driver and executors run
>> >>>> on the cluster) and then forget about it in the sense that the Spark
>> >>>> application is completely independent from the host that ran the
>> >>>> spark-submit command and will not be affected if that controlling machine
>> >>>> shuts down etc.? I was using spark-submit with YARN in cluster mode, but
>> >>>> spark-submit stayed in the foreground and as far as I understood, it
>> >>>> terminated the application on the cluster when spark-submit was Ctrl+C'ed.
>> >>>>
>> >>>> Thanks
>> >>>> Tobias
>> >>>
>> >>>
>> >>
>> >
>> >
>> >
>> > --
>> > Marcelo
>> >
>> > ---------------------------------------------------------------------
>> > To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
>> > For additional commands, e-mail: user-help@spark.apache.org
>> >
>>
>

Re: spark-submit: fire-and-forget mode?

Posted by Andrew Or <an...@databricks.com>.
Thanks Tobias, I have filed a JIRA for it.

2014-09-18 10:09 GMT-07:00 Patrick Wendell <pw...@gmail.com>:

> I agree, that's a good idea Marcelo. There isn't AFAIK any reason the
> client needs to hang there for correct operation.
>
> On Thu, Sep 18, 2014 at 9:39 AM, Marcelo Vanzin <va...@cloudera.com>
> wrote:
> > Yes, what Sandy said.
> >
> > On top of that, I would suggest filing a bug for a new command line
> > argument for spark-submit to make the launcher process exit cleanly as
> > soon as a cluster job starts successfully. That can be helpful for
> > code that launches Spark jobs but monitors the job through different
> > means.
> >
> > On Thu, Sep 18, 2014 at 7:37 AM, Sandy Ryza <sa...@cloudera.com> wrote:
> >> Hi Tobias,
> >>
> >> YARN cluster mode should have the behavior you're looking for.  The client
> >> process will stick around to report on things, but should be able to be
> >> killed without affecting the application.  If this isn't the behavior you're
> >> observing, and your application isn't failing for a different reason,
> >> there's a bug.
> >>
> >> -Sandy
> >>
> >> On Thu, Sep 18, 2014 at 10:20 AM, Nicholas Chammas
> >> <ni...@gmail.com> wrote:
> >>>
> >>> Dunno about having the application be independent of whether spark-submit
> >>> is still alive, but you can have spark-submit run in a new session in Linux
> >>> using setsid.
> >>>
> >>> That way even if you terminate your SSH session, spark-submit will keep
> >>> running independently. Of course, if you terminate the host running
> >>> spark-submit, you will still have problems.
> >>>
> >>>
> >>> On Thu, Sep 18, 2014 at 4:19 AM, Tobias Pfeiffer <tg...@preferred.jp>
> wrote:
> >>>>
> >>>> Hi,
> >>>>
> >>>> I am wondering: Is it possible to run spark-submit in a mode where it
> >>>> will start an application on a YARN cluster (i.e., driver and executors run
> >>>> on the cluster) and then forget about it in the sense that the Spark
> >>>> application is completely independent from the host that ran the
> >>>> spark-submit command and will not be affected if that controlling machine
> >>>> shuts down etc.? I was using spark-submit with YARN in cluster mode, but
> >>>> spark-submit stayed in the foreground and as far as I understood, it
> >>>> terminated the application on the cluster when spark-submit was Ctrl+C'ed.
> >>>>
> >>>> Thanks
> >>>> Tobias
> >>>
> >>>
> >>
> >
> >
> >
> > --
> > Marcelo
> >
> >
>
>
>

Re: spark-submit: fire-and-forget mode?

Posted by Patrick Wendell <pw...@gmail.com>.
I agree, that's a good idea Marcelo. There isn't AFAIK any reason the
client needs to hang there for correct operation.

On Thu, Sep 18, 2014 at 9:39 AM, Marcelo Vanzin <va...@cloudera.com> wrote:
> Yes, what Sandy said.
>
> On top of that, I would suggest filing a bug for a new command line
> argument for spark-submit to make the launcher process exit cleanly as
> soon as a cluster job starts successfully. That can be helpful for
> code that launches Spark jobs but monitors the job through different
> means.
>
> On Thu, Sep 18, 2014 at 7:37 AM, Sandy Ryza <sa...@cloudera.com> wrote:
>> Hi Tobias,
>>
>> YARN cluster mode should have the behavior you're looking for.  The client
>> process will stick around to report on things, but should be able to be
>> killed without affecting the application.  If this isn't the behavior you're
>> observing, and your application isn't failing for a different reason,
>> there's a bug.
>>
>> -Sandy
>>
>> On Thu, Sep 18, 2014 at 10:20 AM, Nicholas Chammas
>> <ni...@gmail.com> wrote:
>>>
>>> Dunno about having the application be independent of whether spark-submit
>>> is still alive, but you can have spark-submit run in a new session in Linux
>>> using setsid.
>>>
>>> That way even if you terminate your SSH session, spark-submit will keep
>>> running independently. Of course, if you terminate the host running
>>> spark-submit, you will still have problems.
>>>
>>>
>>> On Thu, Sep 18, 2014 at 4:19 AM, Tobias Pfeiffer <tg...@preferred.jp> wrote:
>>>>
>>>> Hi,
>>>>
>>>> I am wondering: Is it possible to run spark-submit in a mode where it
>>>> will start an application on a YARN cluster (i.e., driver and executors run
>>>> on the cluster) and then forget about it in the sense that the Spark
>>>> application is completely independent from the host that ran the
>>>> spark-submit command and will not be affected if that controlling machine
>>>> shuts down etc.? I was using spark-submit with YARN in cluster mode, but
>>>> spark-submit stayed in the foreground and as far as I understood, it
>>>> terminated the application on the cluster when spark-submit was Ctrl+C'ed.
>>>>
>>>> Thanks
>>>> Tobias
>>>
>>>
>>
>
>
>
> --
> Marcelo
>
>



Re: spark-submit: fire-and-forget mode?

Posted by Marcelo Vanzin <va...@cloudera.com>.
Yes, what Sandy said.

On top of that, I would suggest filing a bug for a new command line
argument for spark-submit to make the launcher process exit cleanly as
soon as a cluster job starts successfully. That can be helpful for
code that launches Spark jobs but monitors the job through different
means.
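
[Archive note: monitoring "through different means" as Marcelo describes
can be done with the YARN CLI, independently of the spark-submit process.
A sketch, where the application id is a made-up placeholder; spark-submit
prints the real id when the application is launched:]

```shell
# Query YARN directly for the application's state; this works from any
# machine with a YARN client configured, whether or not the original
# spark-submit process is still alive.
yarn application -status application_1410000000000_0001

# Or list running applications to find the id first.
yarn application -list
```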

On Thu, Sep 18, 2014 at 7:37 AM, Sandy Ryza <sa...@cloudera.com> wrote:
> Hi Tobias,
>
> YARN cluster mode should have the behavior you're looking for.  The client
> process will stick around to report on things, but should be able to be
> killed without affecting the application.  If this isn't the behavior you're
> observing, and your application isn't failing for a different reason,
> there's a bug.
>
> -Sandy
>
> On Thu, Sep 18, 2014 at 10:20 AM, Nicholas Chammas
> <ni...@gmail.com> wrote:
>>
>> Dunno about having the application be independent of whether spark-submit
>> is still alive, but you can have spark-submit run in a new session in Linux
>> using setsid.
>>
>> That way even if you terminate your SSH session, spark-submit will keep
>> running independently. Of course, if you terminate the host running
>> spark-submit, you will still have problems.
>>
>>
>> On Thu, Sep 18, 2014 at 4:19 AM, Tobias Pfeiffer <tg...@preferred.jp> wrote:
>>>
>>> Hi,
>>>
>>> I am wondering: Is it possible to run spark-submit in a mode where it
>>> will start an application on a YARN cluster (i.e., driver and executors run
>>> on the cluster) and then forget about it in the sense that the Spark
>>> application is completely independent from the host that ran the
>>> spark-submit command and will not be affected if that controlling machine
>>> shuts down etc.? I was using spark-submit with YARN in cluster mode, but
>>> spark-submit stayed in the foreground and as far as I understood, it
>>> terminated the application on the cluster when spark-submit was Ctrl+C'ed.
>>>
>>> Thanks
>>> Tobias
>>
>>
>



-- 
Marcelo

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org


Re: spark-submit: fire-and-forget mode?

Posted by Sandy Ryza <sa...@cloudera.com>.
Hi Tobias,

YARN cluster mode should have the behavior you're looking for.  The client
process will stick around to report on things, but should be able to be
killed without affecting the application.  If this isn't the behavior
you're observing, and your application isn't failing for a different
reason, there's a bug.

-Sandy
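
[Archive note: for reference, the cluster mode Sandy describes is selected
at submit time roughly as below; class and jar names are placeholders, and
on Spark 1.x this was also spelled "--master yarn-cluster".]

```shell
# YARN cluster mode: the driver runs inside the cluster (in the
# ApplicationMaster), so killing the local spark-submit process should not
# take the application down with it.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --class com.example.MyApp \
  my-app.jar
```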

On Thu, Sep 18, 2014 at 10:20 AM, Nicholas Chammas <
nicholas.chammas@gmail.com> wrote:

> Dunno about having the application be independent of whether spark-submit
> is still alive, but you can have spark-submit run in a new session in
> Linux using setsid <http://unix.stackexchange.com/a/28877/70630>.
>
> That way even if you terminate your SSH session, spark-submit will keep
> running independently. Of course, if you terminate the host running
> spark-submit, you will still have problems.
>
> On Thu, Sep 18, 2014 at 4:19 AM, Tobias Pfeiffer <tg...@preferred.jp> wrote:
>
>> Hi,
>>
>> I am wondering: Is it possible to run spark-submit in a mode where it
>> will start an application on a YARN cluster (i.e., driver and executors run
>> on the cluster) and then forget about it in the sense that the Spark
>> application is completely independent from the host that ran the
>> spark-submit command and will not be affected if that controlling machine
>> shuts down etc.? I was using spark-submit with YARN in cluster mode, but
>> spark-submit stayed in the foreground and as far as I understood, it
>> terminated the application on the cluster when spark-submit was Ctrl+C'ed.
>>
>> Thanks
>> Tobias
>>
>
>

Re: spark-submit: fire-and-forget mode?

Posted by Nicholas Chammas <ni...@gmail.com>.
Dunno about having the application be independent of whether spark-submit
is still alive, but you can have spark-submit run in a new session in Linux
using setsid <http://unix.stackexchange.com/a/28877/70630>.

That way even if you terminate your SSH session, spark-submit will keep
running independently. Of course, if you terminate the host running
spark-submit, you will still have problems.
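
[Archive note: Nicholas's setsid suggestion looks roughly like the sketch
below; the jar name is a placeholder. setsid puts spark-submit in a new
session so it no longer receives the SIGHUP sent when the SSH session's
terminal closes.]

```shell
# Detach spark-submit from the current terminal/session so it survives
# SSH logout; stdout/stderr are redirected so nothing holds the terminal.
setsid spark-submit --master yarn --deploy-mode cluster my-app.jar \
  > submit.log 2>&1 < /dev/null &

# nohup is a common alternative with a similar effect:
#   nohup spark-submit ... > submit.log 2>&1 &
```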

On Thu, Sep 18, 2014 at 4:19 AM, Tobias Pfeiffer <tg...@preferred.jp> wrote:

> Hi,
>
> I am wondering: Is it possible to run spark-submit in a mode where it will
> start an application on a YARN cluster (i.e., driver and executors run on
> the cluster) and then forget about it in the sense that the Spark
> application is completely independent from the host that ran the
> spark-submit command and will not be affected if that controlling machine
> shuts down etc.? I was using spark-submit with YARN in cluster mode, but
> spark-submit stayed in the foreground and as far as I understood, it
> terminated the application on the cluster when spark-submit was Ctrl+C'ed.
>
> Thanks
> Tobias
>