Posted to dev@mesos.apache.org by Megha Sharma <me...@gmail.com> on 2015/11/02 19:37:52 UTC

Re: MESOS-3545: Investigate restoring tasks/executors after machine reboot.

Hi All,
I was wondering whether you have had a chance to look at the design doc for
MESOS-3545, which covers restarting tasks/executors after a slave reboot or
disconnection from the master. Please take the time to comment.

https://issues.apache.org/jira/browse/MESOS-3545

Design doc:
https://docs.google.com/document/d/1l7goeISpYmCjM03l20lmjZ6_BMfdxBs31znEBRtzsuU/edit#heading=h.1i2fqek1ko3e

Thanks
Megha Sharma

On Fri, Oct 23, 2015 at 10:17 AM, Megha Sharma <
megha.hitesh.sharma@gmail.com> wrote:

> Hi All,
>
> I have posted the initial design draft for MESOS-3545, which covers
> restarting tasks/executors after a slave reboot or disconnection from
> the master. Please take the time to comment or provide feedback.
>
> https://issues.apache.org/jira/browse/MESOS-3545
>
> Thanks
> Megha Sharma
>
>
>

Re: MESOS-3545: Investigate restoring tasks/executors after machine reboot.

Posted by Benjamin Mahler <be...@gmail.com>.
Is there a reason this doesn't mention the executor failing during steady
state? I assume the broader goal here is to restart executors according to
a policy, without a round trip back to the scheduler, which may not succeed
in many circumstances.

Also, any reason that this is focused on tasks instead of executors? It's
not clear to me what the semantics around restarting tasks are. Currently
we only persist a stripped version of TaskInfo called Task, which makes
task re-delivery impossible. Even if we persisted the potentially large
TaskInfos, does it make sense to re-deliver them? That seems to suggest
tasks are idempotent in the executor? If we don't re-deliver, are the
executors expected to checkpoint task state themselves across their own
restarts? When the executor restarts, are all the tasks considered
restarted but still not terminal?
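
To make the idempotency question concrete, here is a minimal sketch of what
"idempotent in the executor" could look like, assuming the executor
checkpoints the IDs of tasks it has already launched. Everything here is
hypothetical and for illustration only: IdempotentExecutor, launch(), and the
JSON checkpoint file are not Mesos APIs, and a real executor would also need
to handle a crash between checkpointing and actually starting the work.

```python
# Hypothetical sketch: none of these names are Mesos APIs.
import json
import os
import tempfile


class IdempotentExecutor:
    """Toy executor that checkpoints launched task IDs to disk."""

    def __init__(self, checkpoint_path):
        self.checkpoint_path = checkpoint_path
        self.launched = self._load()

    def _load(self):
        # Recover the task IDs recorded before a restart, if any.
        if os.path.exists(self.checkpoint_path):
            with open(self.checkpoint_path) as f:
                return set(json.load(f))
        return set()

    def _save(self):
        # Write to a temp file and rename it into place, so a crash
        # mid-write cannot leave a truncated checkpoint behind.
        tmp = self.checkpoint_path + ".tmp"
        with open(tmp, "w") as f:
            json.dump(sorted(self.launched), f)
        os.replace(tmp, self.checkpoint_path)

    def launch(self, task_id):
        """Return True if the task is new, False for a duplicate delivery."""
        if task_id in self.launched:
            return False
        self.launched.add(task_id)
        self._save()
        return True


# Simulate a delivery, an executor restart, and a re-delivery.
path = os.path.join(tempfile.mkdtemp(), "tasks.json")
first = IdempotentExecutor(path)
started = first.launch("task-1")
restarted = IdempotentExecutor(path)  # stands in for the restarted executor
redelivered = restarted.launch("task-1")
```

With a scheme like this, re-delivering a task after a restart is harmless,
but it pushes checkpointing responsibility onto every executor, which is the
trade-off the questions above are pointing at.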

Have you explored whether it makes sense to have the executor be
restartable vs the notion of a "persistent task"?

On Fri, Nov 6, 2015 at 12:10 PM, Anindya Sinha <an...@gmail.com>
wrote:

> As discussed with a couple of folks yesterday, I just wanted to surface this
> thread to the top of the dev@mesos list. I would really appreciate it if we
> could get some attention on this proposal so that we can make progress on
> this JIRA.
>
> Thanks
> Anindya/Megha

Re: MESOS-3545: Investigate restoring tasks/executors after machine reboot.

Posted by Anindya Sinha <an...@gmail.com>.
As discussed with a couple of folks yesterday, I just wanted to surface this
thread to the top of the dev@mesos list. I would really appreciate it if we
could get some attention on this proposal so that we can make progress on
this JIRA.

Thanks
Anindya/Megha
