Posted to dev@spark.apache.org by jpuro <jp...@mustwin.com> on 2016/09/29 22:46:51 UTC

Running Spark master/slave instances in non Daemon mode

Hi,

I recently tried deploying Spark master and slave instances to
container-based environments such as Docker, Nomad, etc. There are two
issues I've found with how the startup scripts work. The
sbin/start-master.sh and sbin/start-slave.sh scripts start a daemon by
default, but this isn't as compatible with container deployments as one
would think. The first issue is that the daemon runs in the background,
and some container solutions require apps to run in the foreground;
otherwise they consider the application to have stopped and may shut
down the task. The second issue is that the logs don't seem to get
integrated with the container solution's logging mechanism. What is the
possibility of adding flags or startup scripts to support running Spark
in the foreground? It would be great if a flag like SPARK_NO_DAEMONIZE
could be added, or a separate script provided for foreground execution.
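
For illustration, invocation could look something like the following
(the flag name here is only a proposal, not an existing Spark option):

    # Hypothetical: run the master in the foreground so the container
    # runtime owns the process and captures stdout/stderr directly.
    SPARK_NO_DAEMONIZE=true ./sbin/start-master.sh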

Regards,

Jeff



--
View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/Running-Spark-master-slave-instances-in-non-Daemon-mode-tp19172.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.


Re: Running Spark master/slave instances in non Daemon mode

Posted by Jakob Odersky <ja...@odersky.com>.
Hi Mike,
I can imagine the trouble that daemonization is causing, and I think
that having a non-forking start script is a good idea. A simple,
non-intrusive fix could be to change the "spark-daemon.sh" script to
conditionally omit the "nohup ... &".
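
As a minimal sketch (illustrative only, not an actual patch; it assumes
spark-daemon.sh's usual variables $log and $pid are in scope and that
"$@" holds the launch command), the relevant part could become:

    if [ -z "${SPARK_NO_DAEMONIZE+set}" ]; then
      # Current behavior: detach with nohup, append output to the log
      # file, and record the child's pid.
      nohup -- "$@" >> "$log" 2>&1 < /dev/null &
      echo "$!" > "$pid"
    else
      # Foreground mode: exec replaces this shell with the Spark
      # process, so a container supervisor sees it directly and
      # captures its stdout/stderr.
      exec "$@"
    fi
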
Personally, I think the semantically correct approach would be to also
rename "spark-daemon" to something else (since it won't necessarily
start a background process anymore); however, that has the potential to
break things, in which case it is probably not worth the cosmetic
rename.

best,
--Jakob


Re: Running Spark master/slave instances in non Daemon mode

Posted by Mike Ihbe <mi...@mustwin.com>.
Our particular use case is Nomad, using the "exec" driver configuration
described here: https://www.nomadproject.io/docs/drivers/exec.html. It's
not exactly a container, just a cgroup. The driver performs a simple
fork/exec of a command and binds to that process's output fds, so
daemonizing causes us some minor hardship and seems like an easy thing
to make optional. We'd be happy to make the PR as well.
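
For reference, one workaround under a driver like this (sketched here;
the log path is illustrative) is to keep a foreground process alive by
tailing the daemon's log:

    # Start the daemon, then block in the foreground on its log so the
    # exec driver sees a running task and captures the log as output.
    ./sbin/start-master.sh
    tail -f logs/spark-*-org.apache.spark.deploy.master.Master-*.out

A foreground flag would remove the need for that indirection.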

--Mike


-- 
Mike Ihbe
MustWin - Principal

mike@mustwin.com
mikejihbe@gmail.com
skype: mikeihbe
Cell: 651.283.0815

Re: Running Spark master/slave instances in non Daemon mode

Posted by Jakob Odersky <ja...@odersky.com>.
I'm curious, what kind of container solutions require foreground
processes? Most init systems work fine with "starter" processes that
launch other processes. IIRC systemd (with Type=forking) and
start-stop-daemon both support services whose main process forks a
child into the background and exits; the init system treats startup as
complete once the starter exits and tracks the child from then on. I'm
not against having a non-forking start script, I'm just wondering where
you'd run into issues.
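
For instance, a systemd unit along these lines (an illustrative sketch
with made-up paths, not a tested configuration) should be able to
supervise the existing daemonizing script as-is:

    [Service]
    # Type=forking: systemd waits for start-master.sh to fork the
    # daemon and exit, then tracks the daemon via its pid file.
    Type=forking
    ExecStart=/opt/spark/sbin/start-master.sh
    PIDFile=/tmp/spark-root-org.apache.spark.deploy.master.Master-1.pid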

Regarding the logging, would it be an option to create a custom slf4j
logger that uses the standard mechanisms exposed by the system?
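
Since most container runtimes simply capture a foreground process's
stdout/stderr, plain log4j configuration might already be enough;
something along the lines of Spark's conf/log4j.properties.template
routes everything to the console:

    # Send all Spark logging to the console so the container runtime
    # collects it, instead of to per-daemon files under logs/.
    log4j.rootCategory=INFO, console
    log4j.appender.console=org.apache.log4j.ConsoleAppender
    log4j.appender.console.target=System.err
    log4j.appender.console.layout=org.apache.log4j.PatternLayout
    log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n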

best,
--Jakob
