Posted to dev@mesos.apache.org by Anand Mazumdar <an...@mesosphere.io> on 2015/06/23 02:23:06 UTC

[Breaking Change 0.24, MESOS 1988] Silently ignore launchTask/acceptOffers calls when disconnected

Hi All,

We intend to introduce a breaking change [1] in the driver in 0.24: launchTasks/acceptOffers(…) calls will be silently ignored while the driver is disconnected from the master. The previous behavior was to send out “TASK_LOST” messages, since the scheduler had no other way to know that these task launches were dropped. However, with the advent of Task Reconciliation, this feature is redundant. Other calls such as killTask/requestResources already behave this way.

If your existing framework relied on this behavior, I would encourage you to use the Task Reconciliation API [2] in lieu of this feature/hack. Let me know if you have any queries/concerns.
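
For anyone wondering what relying on reconciliation might look like in practice, here is a minimal sketch in Python. It is not Mesos code: LaunchTracker, reconcile_overdue and the reconcile callable are hypothetical names, and the callable only stands in for whatever explicit reconciliation call your driver binding provides. The idea is to remember each launch and, if no status update arrives within a timeout, ask the master about that task instead of waiting for a driver-generated TASK_LOST.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class LaunchTracker:
    """Tracks launched tasks that have not yet produced any status update."""
    timeout_secs: float = 30.0
    clock: Callable[[], float] = time.monotonic
    _pending: Dict[str, float] = field(default_factory=dict)

    def launched(self, task_id: str) -> None:
        # Record when the launch was handed to the driver.
        self._pending[task_id] = self.clock()

    def status_update(self, task_id: str) -> None:
        # Any status update resolves the uncertainty about this launch.
        self._pending.pop(task_id, None)

    def overdue(self) -> List[str]:
        # Tasks with no update within the timeout are reconciliation candidates.
        now = self.clock()
        return sorted(task_id for task_id, started in self._pending.items()
                      if now - started >= self.timeout_secs)


def reconcile_overdue(tracker: LaunchTracker,
                      reconcile: Callable[[List[str]], None]) -> List[str]:
    """Ask about every overdue task via the (hypothetical) reconcile callable."""
    task_ids = tracker.overdue()
    if task_ids:
        reconcile(task_ids)
    return task_ids
```

A scheduler could call launched() next to its launch call, status_update() from its status update handler, and reconcile_overdue() on a timer. The sketch keeps asking about a task until some update arrives; a real scheduler would probably want backoff between attempts.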

Links:
[1] Tracking JIRA: https://issues.apache.org/jira/browse/MESOS-1988
[2] Task Reconciliation API: http://mesos.apache.org/documentation/latest/reconciliation/

-anand

Re: [Breaking Change 0.24, MESOS 1988] Silently ignore launchTask/acceptOffers calls when disconnected

Posted by Anand Mazumdar <an...@mesosphere.io>.
Vinod can add a bit more color to it. 

This is not directly linked to the HTTP API per se, and hence was initially marked to be fixed in version 0.22. However, it got delayed, and it was decided to fix this behavior as part of the HTTP API epic, primarily to ensure that future HTTP clients don't rely on the same erroneous promises.

-anand


> On Jun 22, 2015, at 6:22 PM, Benjamin Mahler <be...@gmail.com> wrote:
> 
> +vinod
> 
> Hm.. I can't tell from MESOS-1988: why is this required for the HTTP API? I
> see MESOS-1972 as a link to more context, but that is for validation. The
> disconnected case does not overlap with the master's validation logic; it
> is an artifact of the driver implementation (the scheduler can't tell when
> its launch calls are enqueued behind a disconnected event).


Re: [Breaking Change 0.24, MESOS 1988] Silently ignore launchTask/acceptOffers calls when disconnected

Posted by Benjamin Mahler <be...@gmail.com>.
+vinod

Hm.. I can't tell from MESOS-1988: why is this required for the HTTP API? I
see MESOS-1972 as a link to more context, but that is for validation. The
disconnected case does not overlap with the master's validation logic; it
is an artifact of the driver implementation (the scheduler can't tell when
its launch calls are enqueued behind a disconnected event).

On Mon, Jun 22, 2015 at 5:23 PM, Anand Mazumdar <an...@mesosphere.io> wrote:

> Hi All,
>
> We intend to introduce a breaking change [1] in the driver to silently
> ignore launchTasks/acceptOffers(…) calls when disconnected from the master
> in 0.24. The previous behavior was to send out “TASK_LOST” messages since
> there was no way to know that these task launches were dropped. However,
> with the advent of Task Reconciliation, this feature is redundant. Other
> calls like killTask/requestResources et al. already had this behavior.
>
> If your existing framework relied on this behavior, I would encourage you
> to use the Task Reconciliation API [2] in lieu of this feature/hack. Let me
> know if you have any queries/concerns.
>
> Links:
> [1] Tracking JIRA: https://issues.apache.org/jira/browse/MESOS-1988
> [2] Task Reconciliation API:
> http://mesos.apache.org/documentation/latest/reconciliation/
>
> -anand

Re: [Breaking Change 0.24, MESOS 1988] Silently ignore launchTask/acceptOffers calls when disconnected

Posted by Vinod Kone <vi...@gmail.com>.
Hi guys,

I asked Anand to send that email out, so let me chime in.

First off, it looks like the motivation for this change was not properly
communicated to the list. I was under the impression that we had a public
discussion about this when I originally introduced this behavior (back in
2013). But I can't seem to find such a thread, so maybe the discussions
were only internal. Sorry about that. I suggest we use this thread for
discussion.

Some background. This behavior was introduced originally as an optimization
because back then the scheduler driver didn't have a way to inform the
scheduler about disconnections. So, for example, when the driver was
disconnected from the master for prolonged periods (e.g., ZK is down), the
scheduler had no way of knowing that all its calls (including
launchTasks()) were being dropped. Since then we have added a few things to
the driver that reduce the need for this optimization: 1) the disconnected()
callback was added to inform the scheduler of disconnection, and 2)
reconciliation was added.
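
As a rough illustration of how a scheduler might lean on the disconnected() callback once the driver stops generating TASK_LOST, the Python sketch below gates launches on the last known connection state. ConnectionGate and its method names are hypothetical; only disconnected() comes from this thread, and wiring on_connected() to the driver's registration and re-registration callbacks is an assumption. The callback can itself be delayed, so this narrows the window rather than closing it, and reconciliation remains the backstop.

```python
from typing import Callable, List


class ConnectionGate:
    """Holds back launches while the scheduler believes it is disconnected."""

    def __init__(self, launch: Callable[[str], None]) -> None:
        self._launch = launch          # hypothetical stand-in for the driver's launch call
        self._connected = False        # assume disconnected until told otherwise
        self.requeue: List[str] = []   # task IDs to reschedule against future offers

    def on_connected(self) -> None:
        # Assumed to be called from the (re-)registration callbacks.
        self._connected = True

    def on_disconnected(self) -> None:
        # Assumed to be called from the disconnected() callback.
        self._connected = False

    def try_launch(self, task_id: str) -> bool:
        """Launch if connected; otherwise remember the task and report False."""
        if not self._connected:
            # Do not replay later against the same offer: offers held across
            # a disconnection may no longer be valid.
            self.requeue.append(task_id)
            return False
        self._launch(task_id)
        return True
```

Tasks that end up in requeue are meant to be matched against fresh offers after reconnecting, not resent as they were.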

Admittedly, there are still races where the driver knows that it's
disconnected whereas the scheduler itself doesn't know it yet (e.g., the
disconnected() callback is queued). But this is no different from a race
where the disconnected event is itself queued on the driver, or the
launchTask message(s) are dropped after leaving the driver (e.g., the master
goes down right when the message leaves the driver).

We wanted to remove this optimization because it's a bit hacky. For example,
this is the only call that behaves differently when disconnected (e.g.,
killTask drops the message silently when disconnected). The TASK_LOST
update's 'source' is set as 'SOURCE_MASTER', even though it is generated by
the driver. We also had to add a bunch of code in the driver to deal with
this special case status update. Finally, when we move to the HTTP API there
will be no driver that schedulers will depend on. Note that a scheduler
using an HTTP client to detect disconnections (in the HTTP API future) is very
similar to getting disconnected() from the driver. There are still races to
consider and it's best for the schedulers to be robust and consistent in
handling such cases.
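
To make the special case concrete, here is a toy Python model of the two behaviors discussed in this thread. It is only an illustration, not the real driver: ToyDriver, StatusUpdate and the synthesize_task_lost flag are hypothetical names, and the only details taken from the discussion are the TASK_LOST state and the SOURCE_MASTER label on the driver-generated update.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class StatusUpdate:
    task_id: str
    state: str
    source: str


class ToyDriver:
    """Toy model only; not the real Mesos scheduler driver."""

    def __init__(self, synthesize_task_lost: bool) -> None:
        self.connected = False
        self.synthesize_task_lost = synthesize_task_lost
        self.sent_to_master: List[str] = []
        self.updates_to_scheduler: List[StatusUpdate] = []

    def launch_task(self, task_id: str) -> None:
        if self.connected:
            self.sent_to_master.append(task_id)
        elif self.synthesize_task_lost:
            # Old behavior: the driver itself produces TASK_LOST, labelled as
            # if it had come from the master.
            self.updates_to_scheduler.append(
                StatusUpdate(task_id, "TASK_LOST", "SOURCE_MASTER"))
        # New behavior: nothing happens, as with killTask when disconnected.


old = ToyDriver(synthesize_task_lost=True)
old.launch_task("t1")   # scheduler receives a driver-generated TASK_LOST

new = ToyDriver(synthesize_task_lost=False)
new.launch_task("t1")   # dropped silently; reconciliation has to catch it
```

With the second variant the scheduler gets no signal at all for the dropped launch, which is why it needs its own handling for tasks that never report a status.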

Having said that, I guess it's OK for the schedulers to depend on this
crutch in the driver if they really want to. We can keep this hack in place
until we sunset the driver itself in favor of the HTTP API.

HTH,

On Tue, Jun 23, 2015 at 10:33 AM, Dave Lester <da...@davelester.org> wrote:

> Hi Marco and Anand,
>
> I see a difference between a brief conversation on the JIRA issue, and
> creating a separate thread to propose a breaking change -- particularly
> when it's one that affects framework writers who may not be active in
> the day-to-day changes of the core. Now that JIRA issue emails are sent
> to dev@ instead of issues@, I think it's even more important that
> separate threads are created on dev@ to discuss such changes prior to
> having them committed.
>
> It looks like there's at least one comment on the JIRA issue since this
> notice went to the list, which is discussion that should really be
> happening up front rather than after it's committed.
>
> Lastly, I think it's important that we break out of the practice of
> encouraging folks to communicate off-list for changes like this. I
> understand your comment to "talk to BenH for greater detail" was meant
> with good intent, but I think it's important that we as a community
> collectively engage on the mailing list rather than relying on
> out-of-band communication when making decisions.
>
> Thanks for understanding and hearing me out!
>
> Dave

Re: [Breaking Change 0.24, MESOS 1988] Silently ignore launchTask/acceptOffers calls when disconnected

Posted by Dave Lester <da...@davelester.org>.
Hi Marco and Anand,

I see a difference between a brief conversation on the JIRA issue, and
creating a separate thread to propose a breaking change -- particularly
when it's one that affects framework writers who may not be active in
the day-to-day changes of the core. Now that JIRA issue emails are sent
to dev@ instead of issues@, I think it's even more important that
separate threads are created on dev@ to discuss such changes prior to
having them committed.

It looks like there's at least one comment on the JIRA issue since this
notice went to the list, which is discussion that should really be
happening up front rather than after it's committed.

Lastly, I think it's important that we break out of the practice of
encouraging folks to communicate off-list for changes like this. I
understand your comment to "talk to BenH for greater detail" was meant
with good intent, but I think it's important that we as a community
collectively engage on the mailing list rather than relying on
out-of-band communication when making decisions.

Thanks for understanding and hearing me out!

Dave

On Tue, Jun 23, 2015, at 10:01 AM, Marco Massenzio wrote:
> Hey Dave,
> 
> Sorry about the confusion, but the "deprecation cycle" is happening: this
> change won't take place until 0.24 is out (as the title of this email
> states), and it will be captured in the update notes from 0.23 to 0.24.
> As you correctly pointed out, we wanted to give folks very early notice
> of the impending change.
> 
> The conversation has actually taken place on the MESOS-1988 ticket
> (https://issues.apache.org/jira/browse/MESOS-1988), which also gets
> forwarded to the issues@ mailing list; this was also proposed and
> shepherded by Vinod, so I would recommend you follow up with him if you
> want to further clarify matters.
> 
> In our limited understanding, this was an "undocumented" behavior, so we
> would expect the impact to be minor and the suggested solution to be a
> more desirable behavior.
> 
> Please also feel free to reach out to Ben H to discuss in greater depth.
> 
> Thanks for being vigilant!
> 
> *Marco Massenzio*
> *Distributed Systems Engineer*

Re: [Breaking Change 0.24, MESOS 1988] Silently ignore launchTask/acceptOffers calls when disconnected

Posted by Marco Massenzio <ma...@mesosphere.io>.
Hey Dave,

Sorry about the confusion, but the "deprecation cycle" is happening: this
change won't take place until 0.24 is out (as the title of this email
states), and it will be captured in the update notes from 0.23 to 0.24.
As you correctly pointed out, we wanted to give folks very early notice
of the impending change.

The conversation has actually taken place on the MESOS-1988 ticket
(https://issues.apache.org/jira/browse/MESOS-1988), which also gets
forwarded to the issues@ mailing list; this was also proposed and
shepherded by Vinod, so I would recommend you follow up with him if you
want to further clarify matters.

In our limited understanding, this was an "undocumented" behavior so we
would expect the impact to be minor and the suggested solution to be a more
desirable behavior.

Please also feel free to reach out to Ben H to discuss in greater depth.

Thanks for being vigilant!

*Marco Massenzio*
*Distributed Systems Engineer*

On Mon, Jun 22, 2015 at 9:38 PM, Dave Lester <da...@davelester.org> wrote:

> Hi Anand,
>
> Was there a discussion thread on this?
>
> Breaking changes should only be introduced when the community has had a
> chance to discuss its impact and any necessary deprecation cycle -- I
> didn't see a discussion on the relevant thread, but perhaps I missed
> something?
>
> Thanks,
> Dave

Re: [Breaking Change 0.24, MESOS 1988] Silently ignore launchTask/acceptOffers calls when disconnected

Posted by Dave Lester <da...@davelester.org>.
Hi Anand,

Was there a discussion thread on this?

Breaking changes should only be introduced when the community has had a
chance to discuss its impact and any necessary deprecation cycle -- I
didn't see a discussion on the relevant thread, but perhaps I missed
something?

Thanks,
Dave

On Mon, Jun 22, 2015, at 05:23 PM, Anand Mazumdar wrote:
> Hi All,
> 
> We intend to introduce a breaking change [1] in the driver to silently
> ignore launchTasks/acceptOffers(…) calls when disconnected from the
> master in 0.24. The previous behavior was to send out “TASK_LOST”
> messages since there was no way to know that these task launches were
> dropped. However, with the advent of Task Reconciliation, this feature
> is redundant. Other calls like killTask/requestResources et al. already had
> this behavior.
> 
> If your existing framework relied on this behavior, I would encourage you
> to use the Task Reconciliation API [2] in lieu of this feature/hack. Let
> me know if you have any queries/concerns.
> 
> Links:
> [1] Tracking JIRA: https://issues.apache.org/jira/browse/MESOS-1988
> [2] Task Reconciliation API:
> http://mesos.apache.org/documentation/latest/reconciliation/
> 
> -anand