Posted to user@aurora.apache.org by Mohit Jaggi <mo...@uber.com> on 2017/12/10 00:11:08 UTC

shutdown vs kill API is Mesos

Folks,
Our Mesos team is wondering why Aurora chose KILL over SHUTDOWN for killing
tasks. As Aurora has an executor per task, won't SHUTDOWN work better? It
will avoid zombie executors.

Mohit.

Re: shutdown vs kill API is Mesos

Posted by Mohit Jaggi <mo...@uber.com>.
I understand. You don't agree with the second point of the summary. What
about this:

What if I replace Driver.kill with a method Driver.destroy that calls
either KILL or SHUTDOWN, as follows:

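void destroy(taskId, executorId, agentId) {
  // Prefer SHUTDOWN when the driver is the versioned one; otherwise fall back to KILL.
  if (driver instanceof Versioned....)
    ((Versioned....) driver).shutdown(executorId, agentId)
  else
    driver.kill(taskId)
}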

Note that the signature changes to include two more parameters (executorId
and agentId).
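
A minimal Java sketch of the destroy() dispatch above, under stated
assumptions: Driver, VersionedDriver, TaskDestroyer and the two recording
drivers are hypothetical stand-ins (not Aurora's actual classes), and plain
strings stand in for the Mesos ID types.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the driver abstraction; not Aurora's real interface.
interface Driver {
  void kill(String taskId);
}

// Hypothetical stand-in for a driver built on the v1 scheduler API.
interface VersionedDriver extends Driver {
  void shutdown(String executorId, String agentId);
}

class TaskDestroyer {
  private final Driver driver;

  TaskDestroyer(Driver driver) {
    this.driver = driver;
  }

  // Uses SHUTDOWN when the driver supports it, otherwise falls back to KILL.
  void destroy(String taskId, String executorId, String agentId) {
    if (driver instanceof VersionedDriver) {
      ((VersionedDriver) driver).shutdown(executorId, agentId);
    } else {
      driver.kill(taskId);
    }
  }
}

// Test double for the older driver: records the calls it receives.
class RecordingLegacyDriver implements Driver {
  final List<String> calls = new ArrayList<>();

  @Override
  public void kill(String taskId) {
    calls.add("KILL " + taskId);
  }
}

// Test double for the versioned driver: records KILL and SHUTDOWN calls.
class RecordingVersionedDriver implements VersionedDriver {
  final List<String> calls = new ArrayList<>();

  @Override
  public void kill(String taskId) {
    calls.add("KILL " + taskId);
  }

  @Override
  public void shutdown(String executorId, String agentId) {
    calls.add("SHUTDOWN " + executorId + " on " + agentId);
  }
}
```

With the legacy double, destroy("task-1", "exec-1", "agent-1") records a KILL
for task-1; with the versioned double it records a SHUTDOWN for exec-1 on
agent-1 and never calls kill.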

Any other ideas?


On Fri, Jan 12, 2018 at 11:39 AM, David McLaughlin <dm...@apache.org>
wrote:

> I'm not sure I agree with the summary. Bill's proposal was using shutdown
> only when using the new API. I would also support this if it's possible.
>
> On Fri, Jan 12, 2018 at 11:14 AM, Mohit Jaggi <mo...@uber.com>
> wrote:
>
>> Summary so far:
>> - Bill supports making this change
>> - This change cannot be made in a backward compatible manner
>> - David (Twitter) does not want to use HTTP APIs due to performance
>> concerns. I conclude that folks from Twitter don't support this change
>>
>> Question:
>> - Are there other users that want this change?

Re: shutdown vs kill API is Mesos

Posted by Mohit Jaggi <mo...@uber.com>.
Thanks Stephan. Please read inline.

On Sat, Jan 20, 2018 at 5:03 AM, Stephan Erb <se...@apache.org> wrote:

> Q1: Does Aurora use COMMAND or DEFAULT executor?
>
>
> Aurora is currently using neither. In Mesos terms Thermos is a CUSTOM
> executor. On top, Aurora supports alternative custom executors [1] such as
> the Docker compose executor [2].
>
> Mesos seems to be betting on the new DEFAULT executor. It should be
> possible to make Thermos fit the DEFAULT executor model (as it supports
> task groups), but I have no real estimate how much refactoring this would
> require.
>
>
This was about a point Bill made earlier. I am wondering if "without an
executor" is COMMAND or DEFAULT.
```

> But do we really need the command line option?


*Aurora can run tasks without an executor.*  I'm assuming the shutdown call
is incompatible with that mode.
```



>
> Q2: I think that this is ok as Aurora's reconciliation will still work...
> Right?
>
>
> Aurora assumes a correspondence of one task per executor, so I believe
> this is correct.
>
>
Great.


> Q3: Does thermos executor need any changes to respond to SHUTDOWN or does
> it already handle that?
>
>
> I have never tried it, but I believe it should work out of the box [3].
>

Indeed, it looks like it is already handled.


> [1] https://github.com/apache/aurora/blob/master/docs/features/custom-executors.md
> [2] https://github.com/mesos/docker-compose-executor
> [3] https://github.com/apache/aurora/blob/8af269f52f162faa36cd2778979626eefcbe8181/src/main/python/apache/aurora/executor/aurora_executor.py#L301-L313
>
>
> Best regards,
> Stephan
>
>
> On Wed, 2018-01-17 at 16:45 -0800, Mohit Jaggi wrote:
>
> FYI....I had a quick chat with Vinod from the Mesos team. I have some
> questions for Aurora users inline:
>
>
> *Originally the default was the COMMAND executor. In this world the
> scheduler has no visibility into the command executor.*
> *More recently, we added a DEFAULT executor which is used by frameworks
> when they want to launch pod like task groups*
>
> *The SHUTDOWN executor call is only applicable if a scheduler uses CUSTOM
> or DEFAULT executor *and* uses v1 scheduler API.*
>
> Q1: Does Aurora use COMMAND or DEFAULT executor?
>
>
> *note that SHUTDOWN is not as robust as you might think
> :slightly_smiling_face:*
> *for one, there is no reconciliation API for the executor state. it is
> very much best effort. *
> *KILL is more robust for killing tasks, because task status updates are
> reliably delivered and there is reconciliation API*
>
> Q2: I think that this is ok as Aurora's reconciliation will still work as
> we don't have "executor state". "task state" will be a good and correct
> proxy for that. Aurora will send SHUTDOWN again and again until it succeeds
> in the same way as it does now with KILL. Right?
>
> Q3: Does thermos executor need any changes to respond to SHUTDOWN or does
> it already handle that?
>
>
>
>
> On Tue, Jan 16, 2018 at 4:48 PM, Mohit Jaggi <mo...@uber.com> wrote:
>
> So that is pretty much what I proposed...
>
> If the method signature has to change, we can keep the executorId as it
> is, unless we want to take this opportunity to clean that up. I will check
> if the SHUTDOWN works in non-executor cases also.
>
> On Tue, Jan 16, 2018 at 3:03 PM, Bill Farner <wf...@apache.org> wrote:
>
> We still need "Agent ID" for the shutdown call.
>
>
> Darn.  In that case, how about we change the method signature in Driver to
> accept agentId and ignore that param in MesosSchedulerDriver.
>
> But do we really need the command line option?
>
>
> Aurora can run tasks without an executor.  I'm assuming the shutdown call
> is incompatible with that mode.
>
> On Tue, Jan 16, 2018 at 1:57 PM, Mohit Jaggi <mo...@uber.com> wrote:
>
> We still need "Agent ID" for the shutdown call.
>
> On Tue, Jan 16, 2018 at 1:57 PM, Mohit Jaggi <mo...@uber.com> wrote:
>
> Sounds good. But do we really need the command line option? One can use an
> older Driver if KILL is preferred for some reason.
>
> On Tue, Jan 16, 2018 at 1:51 PM, Bill Farner <wf...@apache.org> wrote:
>
> This situation is much simpler if task ID == executor ID.  I can't come up
> with a good reason why this is not the case today.  Our executor IDs
> originally included static prefix, though i do not recall any rationale for
> this.  When Renan added custom executor support, this static prefix was
> made configurable.  Again, i do not believe there was any rationale for the
> utility of executor IDs.
>
> I propose the following:
> - Change relevant code in MesosTaskFactory to
> setExecutorId(task.getTaskId())
> - Add a command line parameter (default false) to toggle use of executor
> shutdown in VersionedSchedulerDriverService.killTask
>
> Does anyone see an issue with this approach?
>
> On Tue, Jan 16, 2018 at 11:15 AM, Mohit Jaggi <mo...@uber.com>
> wrote:
>
> To do this in a backward compatible manner, one way is :
>
> ```
> void destroy(taskId, executorId, agentId) {
>
> if(driver instanceOf Versioned....)
>    (Versioned...)driver.shutdown(executorId, agentId)
> else
>    driver.kill(taskId)
>
> }
> ```
>
> Any other opinions?
>
> On Tue, Jan 16, 2018 at 11:12 AM, David McLaughlin <dmclaughlin@apache.org
> > wrote:
>
> Nope, I support getting SHUTDOWN in for users of the new API.
>
> On Tue, Jan 16, 2018 at 11:06 AM, Mohit Jaggi <mo...@uber.com>
> wrote:
>
> Are you suggesting that we delay the switch to SHUTDOWN call until this
> working group can resolve the API perf issue?
>
> On Mon, Jan 15, 2018 at 3:55 PM, David McLaughlin <dm...@apache.org>
> wrote:
>
> We are working with Mesos folks to resolve it. There is a Mesos
> performance working group that folks can join if they'd like to contribute:
> http://mesos.apache.org/blog/performance-working-group-progress-report/
>
> I'm not sure what you mean by branch. Everything we used to scale test is
> on master.
>
> On Mon, Jan 15, 2018 at 10:08 AM, Meghdoot bhattacharya <
> meghdoot_b@yahoo.com> wrote:
>
> David, should twitter try against mesos 1.5 to see if things are better
> with the new api instead of libmesos. This is going to be a drift over time
> that will stop us from adopting new features.
>
> If it was sometime back it would be good to rerun the tests and open a
> ticket in Mesos if issues exist. All aurora users can then push for
> resolution.
>
> Also details on branch etc that has the api integration?
>
> Thx
>
> On Jan 12, 2018, at 11:39 AM, David McLaughlin <dm...@apache.org>
> wrote:
>
> I'm not sure I agree with the summary. Bill's proposal was using shutdown
> only when using the new API. I would also support this if it's possible.
>
> On Fri, Jan 12, 2018 at 11:14 AM, Mohit Jaggi <mo...@uber.com>
> wrote:
>
> Summary so far:
> - Bill supports making this change
> - This change cannot be made in a backward compatible manner
> - David (Twitter) does not want to use HTTP APIs due to performance
> concerns. I conclude that folks from Twitter don't support this change
>
> Question:
> - Are there other users that want this change?

Re: shutdown vs kill API is Mesos

Posted by Stephan Erb <se...@apache.org>.
> Q1: Does Aurora use COMMAND or DEFAULT executor? 

Aurora is currently using neither. In Mesos terms, Thermos is a CUSTOM executor. On top of that, Aurora supports alternative custom executors [1] such as the Docker compose executor [2].
Mesos seems to be betting on the new DEFAULT executor. It should be possible to make Thermos fit the DEFAULT executor model (as it supports task groups), but I have no real estimate of how much refactoring this would require.

> Q2: I think that this is ok as Aurora's reconciliation will still work... Right?

Aurora assumes a correspondence of one task per executor, so I believe this is correct.

> Q3: Does thermos executor need any changes to respond to SHUTDOWN or does it already handle that?

I have never tried it, but I believe it should work out of the box [3].

[1] https://github.com/apache/aurora/blob/master/docs/features/custom-executors.md
[2] https://github.com/mesos/docker-compose-executor
[3] https://github.com/apache/aurora/blob/8af269f52f162faa36cd2778979626eefcbe8181/src/main/python/apache/aurora/executor/aurora_executor.py#L301-L313



Best regards,
Stephan


On Wed, 2018-01-17 at 16:45 -0800, Mohit Jaggi wrote:
> FYI....I had a quick chat with Vinod from the Mesos team. I have some questions for Aurora users inline:
> 
> Originally the default was the COMMAND executor. In this world the scheduler has no visibility into the command executor.
> More recently, we added a DEFAULT executor which is used by frameworks when they want to launch pod like task groups
> The SHUTDOWN executor call is only applicable if a scheduler uses CUSTOM or DEFAULT executor *and* uses v1 scheduler API.
> 
> 
> Q1: Does Aurora use COMMAND or DEFAULT executor? 
> 
> 
> note that SHUTDOWN is not as robust as you might think :slightly_smiling_face:
> for one, there is no reconciliation API for the executor state. it is very much best effort. 
> KILL is more robust for killing tasks, because task status updates are reliably delivered and there is reconciliation API
> 
> Q2: I think that this is ok as Aurora's reconciliation will still work as we don't have "executor state". "task state" will be a good and correct proxy for that. Aurora will send SHUTDOWN again and again until it succeeds in the same way as it does now with KILL. Right?
> 
> Q3: Does thermos executor need any changes to respond to SHUTDOWN or does it already handle that?
> 
> 
> 
> 
> On Tue, Jan 16, 2018 at 4:48 PM, Mohit Jaggi <mo...@uber.com> wrote:
> > So that is pretty much what I proposed...
> > If the method signature has to change, we can keep the executorId as it is, unless we want to take this opportunity to clean that up. I will check if the SHUTDOWN works in non-executor cases also.
> > 
> > On Tue, Jan 16, 2018 at 3:03 PM, Bill Farner <wf...@apache.org> wrote:
> > > > We still need "Agent ID" for the shutdown call.
> > > 
> > > Darn.  In that case, how about we change the method signature in Driver to accept agentId and ignore that param in MesosSchedulerDriver.
> > > > But do we really need the command line option?
> > > 
> > > Aurora can run tasks without an executor.  I'm assuming the shutdown call is incompatible with that mode.
> > > 
> > > On Tue, Jan 16, 2018 at 1:57 PM, Mohit Jaggi <mo...@uber.com> wrote:
> > > > We still need "Agent ID" for the shutdown call.
> > > > 
> > > > On Tue, Jan 16, 2018 at 1:57 PM, Mohit Jaggi <mo...@uber.com> wrote:
> > > > > Sounds good. But do we really need the command line option? One can use an older Driver if KILL is preferred for some reason.
> > > > > 
> > > > > On Tue, Jan 16, 2018 at 1:51 PM, Bill Farner <wf...@apache.org> wrote:
> > > > > > This situation is much simpler if task ID == executor ID.  I can't come up with a good reason why this is not the case today.  Our executor IDs originally included static prefix, though i do not recall any rationale for this.  When Renan added custom executor support, this static prefix was made configurable.  Again, i do not believe there was any rationale for the utility of executor IDs.
> > > > > > I propose the following:
> > > > > > - Change relevant code in MesosTaskFactory to setExecutorId(task.getTaskId())
> > > > > > - Add a command line parameter (default false) to toggle use of executor shutdown in VersionedSchedulerDriverService.killTask
> > > > > > 
> > > > > > 
> > > > > > Does anyone see an issue with this approach?
> > > > > > 
> > > > > > On Tue, Jan 16, 2018 at 11:15 AM, Mohit Jaggi <mo...@uber.com> wrote:
> > > > > > > To do this in a backward compatible manner, one way is :
> > > > > > > ```
> > > > > > > void destroy(taskId, executorId, agentId) {
> > > > > > > 
> > > > > > > 
> > > > > > > > if(driver instanceOf Versioned....)
> > > > > > > >    (Versioned...)driver.shutdown(executorId, agentId)
> > > > > > > else
> > > > > > >    driver.kill(taskId)
> > > > > > > 
> > > > > > > 
> > > > > > > }
> > > > > > > ```
> > > > > > > 
> > > > > > > Any other opinions?
> > > > > > > 
> > > > > > > On Tue, Jan 16, 2018 at 11:12 AM, David McLaughlin <dm...@apache.org> wrote:
> > > > > > > > Nope, I support getting SHUTDOWN in for users of the new API. 
> > > > > > > > 
> > > > > > > > On Tue, Jan 16, 2018 at 11:06 AM, Mohit Jaggi <mo...@uber.com> wrote:
> > > > > > > > > Are you suggesting that we delay the switch to SHUTDOWN call until this working group can resolve the API perf issue?
> > > > > > > > > 
> > > > > > > > > On Mon, Jan 15, 2018 at 3:55 PM, David McLaughlin <dm...@apache.org> wrote:
> > > > > > > > > > > We are working with Mesos folks to resolve it. There is a Mesos performance working group that folks can join if they'd like to contribute: http://mesos.apache.org/blog/performance-working-group-progress-report/
> > > > > > > > > > 
> > > > > > > > > > I'm not sure what you mean by branch. Everything we used to scale test is on master.
> > > > > > > > > > 
> > > > > > > > > > On Mon, Jan 15, 2018 at 10:08 AM, Meghdoot bhattacharya <me...@yahoo.com> wrote:
> > > > > > > > > > > David, should twitter try against mesos 1.5 to see if things are better with the new api instead of libmesos. This is going to be a drift over time that will stop us from adopting new features.
> > > > > > > > > > > 
> > > > > > > > > > > If it was sometime back it would be good to rerun the tests and open a ticket in Mesos if issues exist. All aurora users can then push for resolution.
> > > > > > > > > > > 
> > > > > > > > > > > Also details on branch etc that has the api integration?
> > > > > > > > > > > 
> > > > > > > > > > > Thx
> > > > > > > > > > > 
> > > > > > > > > > > On Jan 12, 2018, at 11:39 AM, David McLaughlin <dm...@apache.org> wrote:
> > > > > > > > > > > 
> > > > > > > > > > > > I'm not sure I agree with the summary. Bill's proposal was using shutdown only when using the new API. I would also support this if it's possible.  
> > > > > > > > > > > > On Fri, Jan 12, 2018 at 11:14 AM, Mohit Jaggi <mo...@uber.com> wrote:
> > > > > > > > > > > > > > > Summary so far:
> > > > > > > > > > > > > > > - Bill supports making this change
> > > > > > > > > > > > > - This change cannot be made in a backward compatible manner
> > > > > > > > > > > > > - David (Twitter) does not want to use HTTP APIs due to performance concerns. I conclude that folks from Twitter don't support this change
> > > > > > > > > > > > > 
> > > > > > > > > > > > > Question:
> > > > > > > > > > > > > - Are there other users that want this change?

Re: shutdown vs kill API is Mesos

Posted by Mohit Jaggi <mo...@uber.com>.
FYI....I had a quick chat with Vinod from the Mesos team. I have some
questions for Aurora users inline:


*Originally the default was the COMMAND executor. In this world the
scheduler has no visibility into the command executor.*
*More recently, we added a DEFAULT executor, which is used by frameworks
when they want to launch pod-like task groups.*

*The SHUTDOWN executor call is only applicable if a scheduler uses a CUSTOM
or DEFAULT executor *and* uses the v1 scheduler API.*

Q1: Does Aurora use COMMAND or DEFAULT executor?


*Note that SHUTDOWN is not as robust as you might think :)*
*For one, there is no reconciliation API for the executor state. It is very
much best effort.*
*KILL is more robust for killing tasks, because task status updates are
reliably delivered and there is a reconciliation API.*

Q2: I think that this is ok as Aurora's reconciliation will still work as
we don't have "executor state". "task state" will be a good and correct
proxy for that. Aurora will send SHUTDOWN again and again until it succeeds
in the same way as it does now with KILL. Right?

Q3: Does thermos executor need any changes to respond to SHUTDOWN or does
it already handle that?
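
On Q2, the retry idea can be sketched in Java as follows. This is only an
illustration of "send again until the task state says it is done", not
Aurora's actual retry mechanism: ShutdownRetrier, its parameters and the
attempt cap are assumptions, and a real scheduler would wait between
attempts rather than loop immediately.

```java
import java.util.function.BooleanSupplier;

// Illustrative only: re-sends a best-effort request until the task is observed
// in a terminal state, using task state as the proxy for executor state.
class ShutdownRetrier {
  // Returns how many requests were sent before the task was observed terminal,
  // or -1 if it was still not terminal after maxAttempts requests.
  static int retryUntilTerminal(
      Runnable sendShutdown, BooleanSupplier taskIsTerminal, int maxAttempts) {
    for (int sent = 0; sent < maxAttempts; sent++) {
      if (taskIsTerminal.getAsBoolean()) {
        return sent;
      }
      sendShutdown.run();
    }
    return taskIsTerminal.getAsBoolean() ? maxAttempts : -1;
  }
}
```

For example, if the first two requests are lost and the third one takes
effect, the helper returns 3. If the task never reaches a terminal state it
gives up after maxAttempts requests and returns -1.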




On Tue, Jan 16, 2018 at 4:48 PM, Mohit Jaggi <mo...@uber.com> wrote:

> So that is pretty much what I proposed...
>
> If the method signature has to change, we can keep the executorId as it
> is, unless we want to take this opportunity to clean that up. I will check
> if the SHUTDOWN works in non-executor cases also.
>
> On Tue, Jan 16, 2018 at 3:03 PM, Bill Farner <wf...@apache.org> wrote:
>
>> We still need "Agent ID" for the shutdown call.
>>
>>
>> Darn.  In that case, how about we change the method signature in Driver
>> to accept agentId and ignore that param in MesosSchedulerDriver.
>>
>> But do we really need the command line option?
>>
>>
>> Aurora can run tasks without an executor.  I'm assuming the shutdown call
>> is incompatible with that mode.
>>
>> On Tue, Jan 16, 2018 at 1:57 PM, Mohit Jaggi <mo...@uber.com>
>> wrote:
>>
>>> We still need "Agent ID" for the shutdown call.
>>>
>>> On Tue, Jan 16, 2018 at 1:57 PM, Mohit Jaggi <mo...@uber.com>
>>> wrote:
>>>
>>>> Sounds good. But do we really need the command line option? One can use
>>>> an older Driver if KILL is preferred for some reason.
>>>>
>>>> On Tue, Jan 16, 2018 at 1:51 PM, Bill Farner <wf...@apache.org>
>>>> wrote:
>>>>
>>>>> This situation is much simpler if task ID == executor ID.  I can't
>>>>> come up with a good reason why this is not the case today.  Our executor
>>>>> IDs originally included static prefix, though i do not recall any rationale
>>>>> for this.  When Renan added custom executor support, this static prefix was
>>>>> made configurable.  Again, i do not believe there was any rationale for the
>>>>> utility of executor IDs.
>>>>>
>>>>> I propose the following:
>>>>> - Change relevant code in MesosTaskFactory to
>>>>> setExecutorId(task.getTaskId())
>>>>> - Add a command line parameter (default false) to toggle use of
>>>>> executor shutdown in VersionedSchedulerDriverService.killTask
>>>>>
>>>>> Does anyone see an issue with this approach?
>>>>>
>>>>> On Tue, Jan 16, 2018 at 11:15 AM, Mohit Jaggi <mo...@uber.com>
>>>>> wrote:
>>>>>
>>>>>> To do this in a backward compatible manner, one way is :
>>>>>>
>>>>>> ```
>>>>>> void destroy(taskId, executorId, agentId) {
>>>>>>
>>>>>> if(driver instanceOf Versioned....)
>>>>>>    (Versioned...)driver.shutdown(executorId, agentId)
>>>>>> else
>>>>>>    driver.kill(taskId)
>>>>>>
>>>>>> }
>>>>>> ```
>>>>>>
>>>>>> Any other opinions?
>>>>>>
>>>>>> On Tue, Jan 16, 2018 at 11:12 AM, David McLaughlin <
>>>>>> dmclaughlin@apache.org> wrote:
>>>>>>
>>>>>>> Nope, I support getting SHUTDOWN in for users of the new API.
>>>>>>>
>>>>>>> On Tue, Jan 16, 2018 at 11:06 AM, Mohit Jaggi <mo...@uber.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Are you suggesting that we delay the switch to SHUTDOWN call until
>>>>>>>> this working group can resolve the API perf issue?
>>>>>>>>
>>>>>>>> On Mon, Jan 15, 2018 at 3:55 PM, David McLaughlin <
>>>>>>>> dmclaughlin@apache.org> wrote:
>>>>>>>>
>>>>>>>>> We are working with Mesos folks to resolve it. There is a Mesos
>>>>>>>>> performance working group that folks can join if they'd like to contribute:
>>>>>>>>> http://mesos.apache.org/blog/performance-working-group-progress-report/
>>>>>>>>>
>>>>>>>>> I'm not sure what you mean by branch. Everything we used to scale
>>>>>>>>> test is on master.
>>>>>>>>>
>>>>>>>>> On Mon, Jan 15, 2018 at 10:08 AM, Meghdoot bhattacharya <
>>>>>>>>> meghdoot_b@yahoo.com> wrote:
>>>>>>>>>
>>>>>>>>>> David, should twitter try against mesos 1.5 to see if things are
>>>>>>>>>> better with the new api instead of libmesos. This is going to be a drift
>>>>>>>>>> over time that will stop us from adopting new features.
>>>>>>>>>>
>>>>>>>>>> If it was sometime back it would be good to rerun the tests and
>>>>>>>>>> open a ticket in Mesos if issues exist. All aurora users can then push for
>>>>>>>>>> resolution.
>>>>>>>>>>
>>>>>>>>>> Also details on branch etc that has the api integration?
>>>>>>>>>>
>>>>>>>>>> Thx
>>>>>>>>>>
>>>>>>>>>> On Jan 12, 2018, at 11:39 AM, David McLaughlin <
>>>>>>>>>> dmclaughlin@apache.org> wrote:
>>>>>>>>>>
>>>>>>>>>> I'm not sure I agree with the summary. Bill's proposal was using
>>>>>>>>>> shutdown only when using the new API. I would also support this if it's
>>>>>>>>>> possible.
>>>>>>>>>>
>>>>>>>>>> On Fri, Jan 12, 2018 at 11:14 AM, Mohit Jaggi <
>>>>>>>>>> mohit.jaggi@uber.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Summary so far:
>>>>>>>>>>> - Bill supports making this change
>>>>>>>>>>> - This change cannot be made in a backward compatible manner
>>>>>>>>>>> - David (Twitter) does not want to use HTTP APIs due to
>>>>>>>>>>> performance concerns. I conclude that folks from Twitter don't support this
>>>>>>>>>>> change
>>>>>>>>>>>
>>>>>>>>>>> Question:
>>>>>>>>>>> - Are there other users that want this change?

Re: shutdown vs kill API is Mesos

Posted by Mohit Jaggi <mo...@uber.com>.
So that is pretty much what I proposed...

If the method signature has to change, we can keep the executorId as it is,
unless we want to take this opportunity to clean that up. I will also check
whether SHUTDOWN works in the non-executor case.

On Tue, Jan 16, 2018 at 3:03 PM, Bill Farner <wf...@apache.org> wrote:

> We still need "Agent ID" for the shutdown call.
>
>
> Darn.  In that case, how about we change the method signature in Driver to
> accept agentId and ignore that param in MesosSchedulerDriver.
>
> But do we really need the command line option?
>
>
> Aurora can run tasks without an executor.  I'm assuming the shutdown call
> is incompatible with that mode.
>
> On Tue, Jan 16, 2018 at 1:57 PM, Mohit Jaggi <mo...@uber.com> wrote:
>
>> We still need "Agent ID" for the shutdown call.
>>
>> On Tue, Jan 16, 2018 at 1:57 PM, Mohit Jaggi <mo...@uber.com>
>> wrote:
>>
>>> Sounds good. But do we really need the command line option? One can use
>>> an older Driver if KILL is preferred for some reason.
>>>
>>> On Tue, Jan 16, 2018 at 1:51 PM, Bill Farner <wf...@apache.org> wrote:
>>>
>>>> This situation is much simpler if task ID == executor ID.  I can't come
>>>> up with a good reason why this is not the case today.  Our executor IDs
>>>> originally included static prefix, though i do not recall any rationale for
>>>> this.  When Renan added custom executor support, this static prefix was
>>>> made configurable.  Again, i do not believe there was any rationale for the
>>>> utility of executor IDs.
>>>>
>>>> I propose the following:
>>>> - Change relevant code in MesosTaskFactory to
>>>> setExecutorId(task.getTaskId())
>>>> - Add a command line parameter (default false) to toggle use of
>>>> executor shutdown in VersionedSchedulerDriverService.killTask
>>>>
>>>> Does anyone see an issue with this approach?
>>>>
>>>> On Tue, Jan 16, 2018 at 11:15 AM, Mohit Jaggi <mo...@uber.com>
>>>> wrote:
>>>>
>>>>> To do this in a backward compatible manner, one way is :
>>>>>
>>>>> ```
>>>>> void destroy(taskId, executorId, agentId) {
>>>>>
>>>>> if(driver instanceOf Versioned....)
>>>>>    (Versioned...)driver.shutdown(executorId, agentId)
>>>>> else
>>>>>    driver.kill(taskId)
>>>>>
>>>>> }
>>>>> ```
>>>>>
>>>>> Any other opinions?
>>>>>
>>>>> On Tue, Jan 16, 2018 at 11:12 AM, David McLaughlin <
>>>>> dmclaughlin@apache.org> wrote:
>>>>>
>>>>>> Nope, I support getting SHUTDOWN in for users of the new API.
>>>>>>
>>>>>> On Tue, Jan 16, 2018 at 11:06 AM, Mohit Jaggi <mo...@uber.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Are you suggesting that we delay the switch to SHUTDOWN call until
>>>>>>> this working group can resolve the API perf issue?
>>>>>>>
>>>>>>> On Mon, Jan 15, 2018 at 3:55 PM, David McLaughlin <
>>>>>>> dmclaughlin@apache.org> wrote:
>>>>>>>
>>>>>>>> We are working with Mesos folks to resolve it. There is a Mesos
>>>>>>>> performance working group that folks can join if they'd like to contribute:
>>>>>>>> http://mesos.apache.org/blog/performance-working-group-progress-report/
>>>>>>>>
>>>>>>>> I'm not sure what you mean by branch. Everything we used to scale
>>>>>>>> test is on master.
>>>>>>>>
>>>>>>>> On Mon, Jan 15, 2018 at 10:08 AM, Meghdoot bhattacharya <
>>>>>>>> meghdoot_b@yahoo.com> wrote:
>>>>>>>>
>>>>>>>>> David, should twitter try against mesos 1.5 to see if things are
>>>>>>>>> better with the new api instead of libmesos. This is going to be a drift
>>>>>>>>> over time that will stop us from adopting new features.
>>>>>>>>>
>>>>>>>>> If it was sometime back it would be good to rerun the tests and
>>>>>>>>> open a ticket in Mesos if issues exist. All aurora users can then push for
>>>>>>>>> resolution.
>>>>>>>>>
>>>>>>>>> Also details on branch etc that has the api integration?
>>>>>>>>>
>>>>>>>>> Thx
>>>>>>>>>
>>>>>>>>> On Jan 12, 2018, at 11:39 AM, David McLaughlin <
>>>>>>>>> dmclaughlin@apache.org> wrote:
>>>>>>>>>
>>>>>>>>> I'm not sure I agree with the summary. Bill's proposal was using
>>>>>>>>> shutdown only when using the new API. I would also support this if it's
>>>>>>>>> possible.
>>>>>>>>>
>>>>>>>>> On Fri, Jan 12, 2018 at 11:14 AM, Mohit Jaggi <
>>>>>>>>> mohit.jaggi@uber.com> wrote:
>>>>>>>>>
>>>>>>>>>> Summary so far:
>>>>>>>>>> - Bill supports making this change
>>>>>>>>>> - This change cannot be made in a backward compatible manner
>>>>>>>>>> - David (Twitter) does not want to use HTTP APIs due to
>>>>>>>>>> performance concerns. I conclude that folks from Twitter don't support this
>>>>>>>>>> change
>>>>>>>>>>
>>>>>>>>>> Question:
>>>>>>>>>> - Are there other users that want this change?

Re: shutdown vs kill API is Mesos

Posted by Bill Farner <wf...@apache.org>.
>
> We still need "Agent ID" for the shutdown call.


Darn.  In that case, how about we change the method signature in Driver to
accept agentId, and ignore that param in MesosSchedulerDriver?

> But do we really need the command line option?


Aurora can run tasks without an executor.  I'm assuming the shutdown call
is incompatible with that mode.
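
A rough Java sketch of that signature change, combined with the opt-in flag
from the proposal quoted below. Every name here (KillableDriver,
LegacyDriverSketch, VersionedDriverSketch, useExecutorShutdown) is
hypothetical and only stands in for the real driver classes; the sketch also
assumes task ID == executor ID and records the calls instead of talking to
Mesos.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical interface: killTask now also receives the agent ID.
interface KillableDriver {
  void killTask(String taskId, String agentId);
}

// Stand-in for the older driver: KILL needs only the task ID,
// so the agent ID is accepted and ignored.
class LegacyDriverSketch implements KillableDriver {
  final List<String> sent = new ArrayList<>();

  @Override
  public void killTask(String taskId, String agentId) {
    sent.add("KILL task=" + taskId);
  }
}

// Stand-in for the versioned driver: a flag (default false) opts in to SHUTDOWN.
class VersionedDriverSketch implements KillableDriver {
  final List<String> sent = new ArrayList<>();
  private final boolean useExecutorShutdown;

  VersionedDriverSketch() {
    this(false);
  }

  VersionedDriverSketch(boolean useExecutorShutdown) {
    this.useExecutorShutdown = useExecutorShutdown;
  }

  @Override
  public void killTask(String taskId, String agentId) {
    if (useExecutorShutdown) {
      // Assumes executor ID == task ID, as proposed in the thread.
      sent.add("SHUTDOWN executor=" + taskId + " agent=" + agentId);
    } else {
      sent.add("KILL task=" + taskId);
    }
  }
}
```

Keeping the flag off by default leaves existing behaviour unchanged, which
matters for deployments that run tasks without an executor.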

On Tue, Jan 16, 2018 at 1:57 PM, Mohit Jaggi <mo...@uber.com> wrote:

> We still need "Agent ID" for the shutdown call.
>
> On Tue, Jan 16, 2018 at 1:57 PM, Mohit Jaggi <mo...@uber.com> wrote:
>
>> Sounds good. But do we really need the command line option? One can use
>> an older Driver if KILL is preferred for some reason.
>>
>> On Tue, Jan 16, 2018 at 1:51 PM, Bill Farner <wf...@apache.org> wrote:
>>
>>> This situation is much simpler if task ID == executor ID.  I can't come
>>> up with a good reason why this is not the case today.  Our executor IDs
>>> originally included static prefix, though i do not recall any rationale for
>>> this.  When Renan added custom executor support, this static prefix was
>>> made configurable.  Again, i do not believe there was any rationale for the
>>> utility of executor IDs.
>>>
>>> I propose the following:
>>> - Change relevant code in MesosTaskFactory to
>>> setExecutorId(task.getTaskId())
>>> - Add a command line parameter (default false) to toggle use of executor
>>> shutdown in VersionedSchedulerDriverService.killTask
>>>
>>> Does anyone see an issue with this approach?
>>>
>>> On Tue, Jan 16, 2018 at 11:15 AM, Mohit Jaggi <mo...@uber.com>
>>> wrote:
>>>
>>>> To do this in a backward compatible manner, one way is :
>>>>
>>>> ```
>>>> void destroy(taskId, executorId, agentId) {
>>>>
>>>> if(driver instanceOf Versioned....)
>>>>    (Versioned...)driver.shutdown(executorId, agentId)
>>>> else
>>>>    driver.kill(taskId)
>>>>
>>>> }
>>>> ```
>>>>
>>>> Any other opinions?
>>>>
>>>> On Tue, Jan 16, 2018 at 11:12 AM, David McLaughlin <
>>>> dmclaughlin@apache.org> wrote:
>>>>
>>>>> Nope, I support getting SHUTDOWN in for users of the new API.
>>>>>
>>>>> On Tue, Jan 16, 2018 at 11:06 AM, Mohit Jaggi <mo...@uber.com>
>>>>> wrote:
>>>>>
>>>>>> Are you suggesting that we delay the switch to SHUTDOWN call until
>>>>>> this working group can resolve the API perf issue?
>>>>>>
>>>>>> On Mon, Jan 15, 2018 at 3:55 PM, David McLaughlin <
>>>>>> dmclaughlin@apache.org> wrote:
>>>>>>
>>>>>>> We are working with Mesos folks to resolve it. There is a Mesos
>>>>>>> performance working group that folks can join if they'd like to contribute:
>>>>>>> http://mesos.apache.org/blog/performance-working-group-progress-report/
>>>>>>>
>>>>>>> I'm not sure what you mean by branch. Everything we used to scale
>>>>>>> test is on master.
>>>>>>>
>>>>>>> On Mon, Jan 15, 2018 at 10:08 AM, Meghdoot bhattacharya <
>>>>>>> meghdoot_b@yahoo.com> wrote:
>>>>>>>
>>>>>>>> David, should twitter try against mesos 1.5 to see if things are
>>>>>>>> better with the new api instead of libmesos. This is going to be a drift
>>>>>>>> over time that will stop us from adopting new features.
>>>>>>>>
>>>>>>>> If it was sometime back it would be good to rerun the tests and
>>>>>>>> open a ticket in Mesos if issues exist. All aurora users can then push for
>>>>>>>> resolution.
>>>>>>>>
>>>>>>>> Also details on branch etc that has the api integration?
>>>>>>>>
>>>>>>>> Thx
>>>>>>>>
>>>>>>>> On Jan 12, 2018, at 11:39 AM, David McLaughlin <
>>>>>>>> dmclaughlin@apache.org> wrote:
>>>>>>>>
>>>>>>>> I'm not sure I agree with the summary. Bill's proposal was using
>>>>>>>> shutdown only when using the new API. I would also support this if it's
>>>>>>>> possible.
>>>>>>>>
>>>>>>>> On Fri, Jan 12, 2018 at 11:14 AM, Mohit Jaggi <mohit.jaggi@uber.com
>>>>>>>> > wrote:
>>>>>>>>
>>>>>>>>> Summary so far:
>>>>>>>>> - Bill supports making this change
>>>>>>>>> - This change cannot be made in a backward compatible manner
>>>>>>>>> - David (Twitter) does not want to use HTTP APIs due to
>>>>>>>>> performance concerns. I conclude that folks from Twitter don't support this
>>>>>>>>> change
>>>>>>>>>
>>>>>>>>> Question:
>>>>>>>>> - Are there other users that want this change?

Re: shutdown vs kill API is Mesos

Posted by Mohit Jaggi <mo...@uber.com>.
We still need "Agent ID" for the shutdown call.

On Tue, Jan 16, 2018 at 1:57 PM, Mohit Jaggi <mo...@uber.com> wrote:

> Sounds good. But do we really need the command line option? One can use an
> older Driver if KILL is preferred for some reason.
>
> On Tue, Jan 16, 2018 at 1:51 PM, Bill Farner <wf...@apache.org> wrote:
>
>> This situation is much simpler if task ID == executor ID.  I can't come
>> up with a good reason why this is not the case today.  Our executor IDs
>> originally included static prefix, though i do not recall any rationale for
>> this.  When Renan added custom executor support, this static prefix was
>> made configurable.  Again, i do not believe there was any rationale for the
>> utility of executor IDs.
>>
>> I propose the following:
>> - Change relevant code in MesosTaskFactory to
>> setExecutorId(task.getTaskId())
>> - Add a command line parameter (default false) to toggle use of executor
>> shutdown in VersionedSchedulerDriverService.killTask
>>
>> Does anyone see an issue with this approach?
>>
>> On Tue, Jan 16, 2018 at 11:15 AM, Mohit Jaggi <mo...@uber.com>
>> wrote:
>>
>>> To do this in a backward compatible manner, one way is :
>>>
>>> ```
>>> void destroy(taskId, executorId, agentId) {
>>>
>>> if(driver instanceOf Versioned....)
>>>    (Versioned...)driver.shutdown(executorId, agentId)
>>> else
>>>    driver.kill(taskId)
>>>
>>> }
>>> ```
>>>
>>> Any other opinions?
>>>
>>> On Tue, Jan 16, 2018 at 11:12 AM, David McLaughlin <
>>> dmclaughlin@apache.org> wrote:
>>>
>>>> Nope, I support getting SHUTDOWN in for users of the new API.
>>>>
>>>> On Tue, Jan 16, 2018 at 11:06 AM, Mohit Jaggi <mo...@uber.com>
>>>> wrote:
>>>>
>>>>> Are you suggesting that we delay the switch to SHUTDOWN call until
>>>>> this working group can resolve the API perf issue?
>>>>>
>>>>> On Mon, Jan 15, 2018 at 3:55 PM, David McLaughlin <
>>>>> dmclaughlin@apache.org> wrote:
>>>>>
>>>>>> We are working with Mesos folks to resolve it. There is a Mesos
>>>>>> performance working group that folks can join if they'd like to contribute:
>>>>>> http://mesos.apache.org/blog/performance-working-group-progress-report/
>>>>>>
>>>>>> I'm not sure what you mean by branch. Everything we used to scale
>>>>>> test is on master.
>>>>>>
>>>>>> On Mon, Jan 15, 2018 at 10:08 AM, Meghdoot bhattacharya <
>>>>>> meghdoot_b@yahoo.com> wrote:
>>>>>>
>>>>>>> David, should twitter try against mesos 1.5 to see if things are
>>>>>>> better with the new api instead of libmesos. This is going to be a drift
>>>>>>> over time that will stop us from adopting new features.
>>>>>>>
>>>>>>> If it was sometime back it would be good to rerun the tests and open
>>>>>>> a ticket in Mesos if issues exist. All aurora users can then push for
>>>>>>> resolution.
>>>>>>>
>>>>>>> Also details on branch etc that has the api integration?
>>>>>>>
>>>>>>> Thx
>>>>>>>
>>>>>>> On Jan 12, 2018, at 11:39 AM, David McLaughlin <
>>>>>>> dmclaughlin@apache.org> wrote:
>>>>>>>
>>>>>>> I'm not sure I agree with the summary. Bill's proposal was using
>>>>>>> shutdown only when using the new API. I would also support this if it's
>>>>>>> possible.
>>>>>>>
>>>>>>> On Fri, Jan 12, 2018 at 11:14 AM, Mohit Jaggi <mo...@uber.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Summary so far:
>>>>>>>> - Bill supports making this change
>>>>>>>> - This change cannot be made in a backward compatible manner
>>>>>>>> - David (Twitter) does not want to use HTTP APIs due to performance
>>>>>>>> concerns. I conclude that folks from Twitter don't support this change
>>>>>>>>
>>>>>>>> Question:
>>>>>>>> - Are there other users that want this change?
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

Re: shutdown vs kill API is Mesos

Posted by Mohit Jaggi <mo...@uber.com>.
Sounds good. But do we really need the command line option? One can use an
older Driver if KILL is preferred for some reason.

On Tue, Jan 16, 2018 at 1:51 PM, Bill Farner <wf...@apache.org> wrote:

> This situation is much simpler if task ID == executor ID.  I can't come up
> with a good reason why this is not the case today.  Our executor IDs
> originally included static prefix, though i do not recall any rationale for
> this.  When Renan added custom executor support, this static prefix was
> made configurable.  Again, i do not believe there was any rationale for the
> utility of executor IDs.
>
> I propose the following:
> - Change relevant code in MesosTaskFactory to
> setExecutorId(task.getTaskId())
> - Add a command line parameter (default false) to toggle use of executor
> shutdown in VersionedSchedulerDriverService.killTask
>
> Does anyone see an issue with this approach?
>
> On Tue, Jan 16, 2018 at 11:15 AM, Mohit Jaggi <mo...@uber.com>
> wrote:
>
>> To do this in a backward compatible manner, one way is :
>>
>> ```
>> void destroy(taskId, executorId, agentId) {
>>
>> if(driver instanceOf Versioned....)
>>    (Versioned...)driver.shutdown(executorId, agentId)
>> else
>>    driver.kill(taskId)
>>
>> }
>> ```
>>
>> Any other opinions?
>>
>> On Tue, Jan 16, 2018 at 11:12 AM, David McLaughlin <
>> dmclaughlin@apache.org> wrote:
>>
>>> Nope, I support getting SHUTDOWN in for users of the new API.
>>>
>>> On Tue, Jan 16, 2018 at 11:06 AM, Mohit Jaggi <mo...@uber.com>
>>> wrote:
>>>
>>>> Are you suggesting that we delay the switch to SHUTDOWN call until this
>>>> working group can resolve the API perf issue?
>>>>
>>>> On Mon, Jan 15, 2018 at 3:55 PM, David McLaughlin <
>>>> dmclaughlin@apache.org> wrote:
>>>>
>>>>> We are working with Mesos folks to resolve it. There is a Mesos
>>>>> performance working group that folks can join if they'd like to contribute:
>>>>> http://mesos.apache.org/blog/performance-working-group-progress-report/
>>>>>
>>>>> I'm not sure what you mean by branch. Everything we used to scale test
>>>>> is on master.
>>>>>
>>>>> On Mon, Jan 15, 2018 at 10:08 AM, Meghdoot bhattacharya <
>>>>> meghdoot_b@yahoo.com> wrote:
>>>>>
>>>>>> David, should twitter try against mesos 1.5 to see if things are
>>>>>> better with the new api instead of libmesos. This is going to be a drift
>>>>>> over time that will stop us from adopting new features.
>>>>>>
>>>>>> If it was sometime back it would be good to rerun the tests and open
>>>>>> a ticket in Mesos if issues exist. All aurora users can then push for
>>>>>> resolution.
>>>>>>
>>>>>> Also details on branch etc that has the api integration?
>>>>>>
>>>>>> Thx
>>>>>>
>>>>>> On Jan 12, 2018, at 11:39 AM, David McLaughlin <
>>>>>> dmclaughlin@apache.org> wrote:
>>>>>>
>>>>>> I'm not sure I agree with the summary. Bill's proposal was using
>>>>>> shutdown only when using the new API. I would also support this if it's
>>>>>> possible.
>>>>>>
>>>>>> On Fri, Jan 12, 2018 at 11:14 AM, Mohit Jaggi <mo...@uber.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Summary so far:
>>>>>>> - Bill supports making this change
>>>>>>> - This change cannot be made in a backward compatible manner
>>>>>>> - David (Twitter) does not want to use HTTP APIs due to performance
>>>>>>> concerns. I conclude that folks from Twitter don't support this change
>>>>>>>
>>>>>>> Question:
>>>>>>> - Are there other users that want this change?
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

Re: shutdown vs kill API is Mesos

Posted by Bill Farner <wf...@apache.org>.
This situation is much simpler if task ID == executor ID.  I can't come up
with a good reason why this is not the case today.  Our executor IDs
originally included a static prefix, though I do not recall any rationale for
this.  When Renan added custom executor support, this static prefix was
made configurable.  Again, I do not believe there was any rationale for the
utility of executor IDs.

I propose the following:
- Change relevant code in MesosTaskFactory to setExecutorId(task.getTaskId())
- Add a command line parameter (default false) to toggle use of executor
shutdown in VersionedSchedulerDriverService.killTask

Does anyone see an issue with this approach?
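
A rough sketch of the two steps above, under stated assumptions: the Driver
interface, RecordingDriver, executorIdFor and the useExecutorShutdown field
are all hypothetical stand-ins invented for illustration. They are not the
real MesosTaskFactory or VersionedSchedulerDriverService code, and the name
of the command line parameter is deliberately left open.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: every type here is a hypothetical stand-in, not an Aurora or Mesos class.
final class KillToggleSketch {
  /** Minimal driver surface assumed for this sketch. */
  interface Driver {
    void kill(String taskId);
    void shutdown(String executorId, String agentId);
  }

  /** Test double that records calls instead of talking to a Mesos master. */
  static final class RecordingDriver implements Driver {
    final List<String> calls = new ArrayList<>();

    @Override public void kill(String taskId) {
      calls.add("KILL " + taskId);
    }

    @Override public void shutdown(String executorId, String agentId) {
      calls.add("SHUTDOWN " + executorId + "@" + agentId);
    }
  }

  private final Driver driver;
  // Would be fed by the proposed command line parameter (default false).
  private final boolean useExecutorShutdown;

  KillToggleSketch(Driver driver, boolean useExecutorShutdown) {
    this.driver = driver;
    this.useExecutorShutdown = useExecutorShutdown;
  }

  // Step 1: the executor ID is simply the task ID, with no prefix.
  static String executorIdFor(String taskId) {
    return taskId;
  }

  // Step 2: killTask picks SHUTDOWN or KILL based on the toggle.
  void killTask(String taskId, String agentId) {
    if (useExecutorShutdown) {
      driver.shutdown(executorIdFor(taskId), agentId);
    } else {
      driver.kill(taskId);
    }
  }

  public static void main(String[] args) {
    RecordingDriver driver = new RecordingDriver();
    new KillToggleSketch(driver, false).killTask("task-1", "agent-1");
    new KillToggleSketch(driver, true).killTask("task-1", "agent-1");
    System.out.println(driver.calls);
  }
}
```

With the toggle off, behavior stays exactly as it is today, which is the
point of defaulting it to false.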

On Tue, Jan 16, 2018 at 11:15 AM, Mohit Jaggi <mo...@uber.com> wrote:

> To do this in a backward compatible manner, one way is :
>
> ```
> void destroy(taskId, executorId, agentId) {
>
> if(driver instanceOf Versioned....)
>    (Versioned...)driver.shutdown(executorId, agentId)
> else
>    driver.kill(taskId)
>
> }
> ```
>
> Any other opinions?
>
> On Tue, Jan 16, 2018 at 11:12 AM, David McLaughlin <dmclaughlin@apache.org
> > wrote:
>
>> Nope, I support getting SHUTDOWN in for users of the new API.
>>
>> On Tue, Jan 16, 2018 at 11:06 AM, Mohit Jaggi <mo...@uber.com>
>> wrote:
>>
>>> Are you suggesting that we delay the switch to SHUTDOWN call until this
>>> working group can resolve the API perf issue?
>>>
>>> On Mon, Jan 15, 2018 at 3:55 PM, David McLaughlin <
>>> dmclaughlin@apache.org> wrote:
>>>
>>>> We are working with Mesos folks to resolve it. There is a Mesos
>>>> performance working group that folks can join if they'd like to contribute:
>>>> http://mesos.apache.org/blog/performance-working-group-progress-report/
>>>>
>>>> I'm not sure what you mean by branch. Everything we used to scale test
>>>> is on master.
>>>>
>>>> On Mon, Jan 15, 2018 at 10:08 AM, Meghdoot bhattacharya <
>>>> meghdoot_b@yahoo.com> wrote:
>>>>
>>>>> David, should twitter try against mesos 1.5 to see if things are
>>>>> better with the new api instead of libmesos. This is going to be a drift
>>>>> over time that will stop us from adopting new features.
>>>>>
>>>>> If it was sometime back it would be good to rerun the tests and open a
>>>>> ticket in Mesos if issues exist. All aurora users can then push for
>>>>> resolution.
>>>>>
>>>>> Also details on branch etc that has the api integration?
>>>>>
>>>>> Thx
>>>>>
>>>>> On Jan 12, 2018, at 11:39 AM, David McLaughlin <dm...@apache.org>
>>>>> wrote:
>>>>>
>>>>> I'm not sure I agree with the summary. Bill's proposal was using
>>>>> shutdown only when using the new API. I would also support this if it's
>>>>> possible.
>>>>>
>>>>> On Fri, Jan 12, 2018 at 11:14 AM, Mohit Jaggi <mo...@uber.com>
>>>>> wrote:
>>>>>
>>>>>> Summary so far:
>>>>>> - Bill supports making this change
>>>>>> - This change cannot be made in a backward compatible manner
>>>>>> - David (Twitter) does not want to use HTTP APIs due to performance
>>>>>> concerns. I conclude that folks from Twitter don't support this change
>>>>>>
>>>>>> Question:
>>>>>> - Are there other users that want this change?
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

Re: shutdown vs kill API is Mesos

Posted by Mohit Jaggi <mo...@uber.com>.
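To do this in a backward-compatible manner, one way is:

```
// Pseudocode: parameter types are omitted and "Versioned..." is left elided.
void destroy(taskId, executorId, agentId) {
  if (driver instanceof Versioned...) {
    // New API: shut down the executor.
    ((Versioned...) driver).shutdown(executorId, agentId);
  } else {
    // Old API: only KILL is available.
    driver.kill(taskId);
  }
}
```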
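
A compilable version of the same idea, as a sketch only. Driver and
VersionedDriver here are hypothetical stand-ins that I made up to show the
instanceof dispatch; they are not Aurora's actual interfaces, and the real
types may not relate to each other this way.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: Driver and VersionedDriver are hypothetical stand-ins for the
// old and new driver abstractions; they are not Aurora's real interfaces.
final class DestroySketch {
  /** Old-style driver: only KILL is available. */
  interface Driver {
    void kill(String taskId);
  }

  /** New-style driver: additionally supports SHUTDOWN of an executor. */
  interface VersionedDriver extends Driver {
    void shutdown(String executorId, String agentId);
  }

  // Uses SHUTDOWN when the driver supports it, otherwise falls back to KILL.
  static void destroy(Driver driver, String taskId, String executorId, String agentId) {
    if (driver instanceof VersionedDriver) {
      ((VersionedDriver) driver).shutdown(executorId, agentId);
    } else {
      driver.kill(taskId);
    }
  }

  public static void main(String[] args) {
    List<String> log = new ArrayList<>();
    Driver oldDriver = taskId -> log.add("KILL " + taskId);
    VersionedDriver newDriver = new VersionedDriver() {
      @Override public void kill(String taskId) {
        log.add("KILL " + taskId);
      }

      @Override public void shutdown(String executorId, String agentId) {
        log.add("SHUTDOWN " + executorId + "@" + agentId);
      }
    };
    destroy(oldDriver, "task-1", "exec-1", "agent-1");
    destroy(newDriver, "task-1", "exec-1", "agent-1");
    System.out.println(log);
  }
}
```

Callers only see destroy(), so the choice between the two calls stays in one
place, at the cost of the wider signature.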

Any other opinions?

On Tue, Jan 16, 2018 at 11:12 AM, David McLaughlin <dm...@apache.org>
wrote:

> Nope, I support getting SHUTDOWN in for users of the new API.
>
> On Tue, Jan 16, 2018 at 11:06 AM, Mohit Jaggi <mo...@uber.com>
> wrote:
>
>> Are you suggesting that we delay the switch to SHUTDOWN call until this
>> working group can resolve the API perf issue?
>>
>> On Mon, Jan 15, 2018 at 3:55 PM, David McLaughlin <dmclaughlin@apache.org
>> > wrote:
>>
>>> We are working with Mesos folks to resolve it. There is a Mesos
>>> performance working group that folks can join if they'd like to contribute:
>>> http://mesos.apache.org/blog/performance-working-group-progress-report/
>>>
>>> I'm not sure what you mean by branch. Everything we used to scale test
>>> is on master.
>>>
>>> On Mon, Jan 15, 2018 at 10:08 AM, Meghdoot bhattacharya <
>>> meghdoot_b@yahoo.com> wrote:
>>>
>>>> David, should twitter try against mesos 1.5 to see if things are better
>>>> with the new api instead of libmesos. This is going to be a drift over time
>>>> that will stop us from adopting new features.
>>>>
>>>> If it was sometime back it would be good to rerun the tests and open a
>>>> ticket in Mesos if issues exist. All aurora users can then push for
>>>> resolution.
>>>>
>>>> Also details on branch etc that has the api integration?
>>>>
>>>> Thx
>>>>
>>>> On Jan 12, 2018, at 11:39 AM, David McLaughlin <dm...@apache.org>
>>>> wrote:
>>>>
>>>> I'm not sure I agree with the summary. Bill's proposal was using
>>>> shutdown only when using the new API. I would also support this if it's
>>>> possible.
>>>>
>>>> On Fri, Jan 12, 2018 at 11:14 AM, Mohit Jaggi <mo...@uber.com>
>>>> wrote:
>>>>
>>>>> Summary so far:
>>>>> - Bill supports making this change
>>>>> - This change cannot be made in a backward compatible manner
>>>>> - David (Twitter) does not want to use HTTP APIs due to performance
>>>>> concerns. I conclude that folks from Twitter don't support this change
>>>>>
>>>>> Question:
>>>>> - Are there other users that want this change?
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>

Re: shutdown vs kill API is Mesos

Posted by David McLaughlin <dm...@apache.org>.
Nope, I support getting SHUTDOWN in for users of the new API.

On Tue, Jan 16, 2018 at 11:06 AM, Mohit Jaggi <mo...@uber.com> wrote:

> Are you suggesting that we delay the switch to SHUTDOWN call until this
> working group can resolve the API perf issue?
>
> On Mon, Jan 15, 2018 at 3:55 PM, David McLaughlin <dm...@apache.org>
> wrote:
>
>> We are working with Mesos folks to resolve it. There is a Mesos
>> performance working group that folks can join if they'd like to contribute:
>> http://mesos.apache.org/blog/performance-working-group-progress-report/
>>
>> I'm not sure what you mean by branch. Everything we used to scale test is
>> on master.
>>
>> On Mon, Jan 15, 2018 at 10:08 AM, Meghdoot bhattacharya <
>> meghdoot_b@yahoo.com> wrote:
>>
>>> David, should twitter try against mesos 1.5 to see if things are better
>>> with the new api instead of libmesos. This is going to be a drift over time
>>> that will stop us from adopting new features.
>>>
>>> If it was sometime back it would be good to rerun the tests and open a
>>> ticket in Mesos if issues exist. All aurora users can then push for
>>> resolution.
>>>
>>> Also details on branch etc that has the api integration?
>>>
>>> Thx
>>>
>>> On Jan 12, 2018, at 11:39 AM, David McLaughlin <dm...@apache.org>
>>> wrote:
>>>
>>> I'm not sure I agree with the summary. Bill's proposal was using
>>> shutdown only when using the new API. I would also support this if it's
>>> possible.
>>>
>>> On Fri, Jan 12, 2018 at 11:14 AM, Mohit Jaggi <mo...@uber.com>
>>> wrote:
>>>
>>>> Summary so far:
>>>> - Bill supports making this change
>>>> - This change cannot be made in a backward compatible manner
>>>> - David (Twitter) does not want to use HTTP APIs due to performance
>>>> concerns. I conclude that folks from Twitter don't support this change
>>>>
>>>> Question:
>>>> - Are there other users that want this change?
>>>>
>>>>
>>>>
>>>
>>
>

Re: shutdown vs kill API is Mesos

Posted by Mohit Jaggi <mo...@uber.com>.
Are you suggesting that we delay the switch to the SHUTDOWN call until this
working group can resolve the API perf issue?

On Mon, Jan 15, 2018 at 3:55 PM, David McLaughlin <dm...@apache.org>
wrote:

> We are working with Mesos folks to resolve it. There is a Mesos
> performance working group that folks can join if they'd like to contribute:
> http://mesos.apache.org/blog/performance-working-group-progress-report/
>
> I'm not sure what you mean by branch. Everything we used to scale test is
> on master.
>
> On Mon, Jan 15, 2018 at 10:08 AM, Meghdoot bhattacharya <
> meghdoot_b@yahoo.com> wrote:
>
>> David, should twitter try against mesos 1.5 to see if things are better
>> with the new api instead of libmesos. This is going to be a drift over time
>> that will stop us from adopting new features.
>>
>> If it was sometime back it would be good to rerun the tests and open a
>> ticket in Mesos if issues exist. All aurora users can then push for
>> resolution.
>>
>> Also details on branch etc that has the api integration?
>>
>> Thx
>>
>> On Jan 12, 2018, at 11:39 AM, David McLaughlin <dm...@apache.org>
>> wrote:
>>
>> I'm not sure I agree with the summary. Bill's proposal was using shutdown
>> only when using the new API. I would also support this if it's possible.
>>
>> On Fri, Jan 12, 2018 at 11:14 AM, Mohit Jaggi <mo...@uber.com>
>> wrote:
>>
>>> Summary so far:
>>> - Bill supports making this change
>>> - This change cannot be made in a backward compatible manner
>>> - David (Twitter) does not want to use HTTP APIs due to performance
>>> concerns. I conclude that folks from Twitter don't support this change
>>>
>>> Question:
>>> - Are there other users that want this change?
>>>
>>>
>>>
>>
>

Re: shutdown vs kill API is Mesos

Posted by David McLaughlin <dm...@apache.org>.
We are working with Mesos folks to resolve it. There is a Mesos performance
working group that folks can join if they'd like to contribute:
http://mesos.apache.org/blog/performance-working-group-progress-report/

I'm not sure what you mean by branch. Everything we used to scale test is
on master.

On Mon, Jan 15, 2018 at 10:08 AM, Meghdoot bhattacharya <
meghdoot_b@yahoo.com> wrote:

> David, should twitter try against mesos 1.5 to see if things are better
> with the new api instead of libmesos. This is going to be a drift over time
> that will stop us from adopting new features.
>
> If it was sometime back it would be good to rerun the tests and open a
> ticket in Mesos if issues exist. All aurora users can then push for
> resolution.
>
> Also details on branch etc that has the api integration?
>
> Thx
>
> On Jan 12, 2018, at 11:39 AM, David McLaughlin <dm...@apache.org>
> wrote:
>
> I'm not sure I agree with the summary. Bill's proposal was using shutdown
> only when using the new API. I would also support this if it's possible.
>
> On Fri, Jan 12, 2018 at 11:14 AM, Mohit Jaggi <mo...@uber.com>
> wrote:
>
>> Summary so far:
>> - Bill supports making this change
>> - This change cannot be made in a backward compatible manner
>> - David (Twitter) does not want to use HTTP APIs due to performance
>> concerns. I conclude that folks from Twitter don't support this change
>>
>> Question:
>> - Are there other users that want this change?
>>
>>
>>
>

Re: shutdown vs kill API is Mesos

Posted by Meghdoot bhattacharya <me...@yahoo.com>.
David, should Twitter try against Mesos 1.5 to see if things are better with the new API instead of libmesos? This is going to be a drift over time that will stop us from adopting new features.

If the tests were run some time back, it would be good to rerun them and open a ticket in Mesos if issues exist. All Aurora users can then push for resolution.

Also, are there details on the branch etc. that has the API integration?

Thx

> On Jan 12, 2018, at 11:39 AM, David McLaughlin <dm...@apache.org> wrote:
> 
> I'm not sure I agree with the summary. Bill's proposal was using shutdown only when using the new API. I would also support this if it's possible.  
> 
>> On Fri, Jan 12, 2018 at 11:14 AM, Mohit Jaggi <mo...@uber.com> wrote:
>> Summary so far:
>> - Bill supports making this change
>> - This change cannot be made in a backward compatible manner
>> - David (Twitter) does not want to use HTTP APIs due to performance concerns. I conclude that folks from Twitter don't support this change
>> 
>> Question:
>> - Are there other users that want this change?
>> 
>> 
> 

Re: shutdown vs kill API is Mesos

Posted by David McLaughlin <dm...@apache.org>.
I'm not sure I agree with the summary. Bill's proposal was using shutdown
only when using the new API. I would also support this if it's possible.

On Fri, Jan 12, 2018 at 11:14 AM, Mohit Jaggi <mo...@uber.com> wrote:

> Summary so far:
> - Bill supports making this change
> - This change cannot be made in a backward compatible manner
> - David (Twitter) does not want to use HTTP APIs due to performance
> concerns. I conclude that folks from Twitter don't support this change
>
> Question:
> - Are there other users that want this change?
>
>
>

Re: shutdown vs kill API is Mesos

Posted by Mohit Jaggi <mo...@uber.com>.
Summary so far:
- Bill supports making this change
- This change cannot be made in a backward compatible manner
- David (Twitter) does not want to use HTTP APIs due to performance
concerns. I conclude that folks from Twitter don't support this change

Question:
- Are there other users that want this change?

Re: shutdown vs kill API is Mesos

Posted by Renan DelValle <re...@gmail.com>.
Sorry, I guess referring to it as the libmesos way of talking to the Mesos
master is a bit misleading.

And I stand corrected: V0 is only an adaptor to the V1 interface, and it
still uses the undocumented RPC way of talking to the master (
https://github.com/apache/mesos/blob/master/src/java/jni/org_apache_mesos_MesosSchedulerDriver.cpp)
while using V1 versioned protobufs.

V1, on the other hand, talks to Mesos via a well-defined HTTP API.
There's still a dependency on libmesos because the implementation of the
code that handles the HTTP requests is made available via JNI. The big
difference is that someone else can implement their own Java-only
version of the driver and the dependency on libmesos would be gone.

Apologies for the confusion.

On Thu, Jan 11, 2018 at 2:03 PM, Mohit Jaggi <mo...@uber.com> wrote:

> David,
> - LCD makes sense. Does that mean that Twitter is using the
>  SCHEDULER_DRIVER
> <https://github.com/apache/aurora/blob/aae2b0dc73b7534c66982ed07b1f029150e245de/src/main/java/org/apache/aurora/scheduler/mesos/SchedulerDriverModule.java#L72>
>  version?
> - I don't see Bill's proposal on this thread. Did I miss it?
>
> Renan,
> VersionedDriverFactory
> <https://github.com/apache/aurora/blob/2e1ca42887bc8ea1e8c6cddebe9d1cf29268c714/src/main/java/org/apache/aurora/scheduler/mesos/VersionedDriverFactory.java#L24>'s
> comments indicate that libmesos is still used. What am I missing?
>
> BTW, with the patch for Thermos (from Stephan I think), the need for
> switching to SHUTDOWN is reduced.
> Mohit.
>
> On Thu, Jan 11, 2018 at 2:01 PM, David McLaughlin <dm...@apache.org>
> wrote:
>
>> Sorry, the other approach outlined by Bill would in theory work too, but
>> it sounds like in practice it also needs more changes on the Mesos side.
>>
>> On Thu, Jan 11, 2018 at 1:55 PM, David McLaughlin <dmclaughlin@apache.org
>> > wrote:
>>
>>> Right. In order to keep the current abstraction in Aurora (both APIs),
>>> we obviously have to bind to the lower common denominator API methods. So
>>> the only way to integrate with shutdown will be to fix the performance
>>> issues so we can switch to the new API.
>>>
>>> The performance issue we ran into at Twitter was that with status
>>> updates that were similar to our production volume, they started to get
>>> dropped and tasks end up being LOST and unnecessarily killed. So it's a
>>> definite blocker for us to adopt in its current state. We have someone who
>>> has fixing this on the Mesos side in their backlog, but it's currently not
>>> the highest priority for us.
>>>
>>> On Thu, Jan 11, 2018 at 1:45 PM, Renan DelValle <
>>> renanidelvalle@gmail.com> wrote:
>>>
>>>> The HTTP API is what is used under the hood for V0 and V1 (instead of
>>>> libmesos), I believe that's what David was referencing when he mentioned
>>>> the HTTP performance issues. Here's a better explanation from the original
>>>> patch submitted by Zameer:
>>>> https://github.com/apache/aurora/commit/705dbc7cd7c3ff477bcf766cdafe49a68ab47dee#diff-75bd5a98db87502a2332e9110d2eafc6
>>>>
>>>> I'm not sure about the Shutdown call, as you mentioned, the versioned
>>>> driver seems to have the method but the driver interface does not. This
>>>> might get tricky from here on in since Mesos has V1 only compatible calls.
>>>>
>>>> On Thu, Jan 11, 2018 at 1:24 PM, Mohit Jaggi <mo...@uber.com>
>>>> wrote:
>>>>
>>>>> Thanks Renan. I saw that code. "Driver" interface does not have
>>>>> SHUTDOWN...so it is not "compatible". I was trying to change to
>>>>> VersionedSchedulerDriverService all over the code (that wreaks havoc
>>>>> across the tests!) but Mesos's Java wrapper doesn't seem to have that
>>>>> call either. Perhaps, that is why David referred to the HTTP API.
>>>>>
>>>>> On Thu, Jan 11, 2018 at 1:14 PM, Renan DelValle <
>>>>> renanidelvalle@gmail.com> wrote:
>>>>>
>>>>>> https://github.com/apache/aurora/blob/aae2b0dc73b7534c66982ed07b1f029150e245de/src/main/java/org/apache/aurora/scheduler/mesos/SchedulerDriverModule.java
>>>>>>
>>>>>> https://github.com/apache/aurora/blob/aae2b0dc73b7534c66982ed07b1f029150e245de/src/main/java/org/apache/aurora/scheduler/mesos/VersionedSchedulerDriverService.java#L50
>>>>>>
>>>>>> On Tue, Jan 9, 2018 at 1:21 PM, Mohit Jaggi <mo...@uber.com>
>>>>>> wrote:
>>>>>>
>>>>>>> David,
>>>>>>> Where can I find this code?
>>>>>>>
>>>>>>> Mohit.
>>>>>>>
>>>>>>> On Sat, Dec 9, 2017 at 4:27 PM, David McLaughlin <
>>>>>>> dmclaughlin@apache.org> wrote:
>>>>>>>
>>>>>>>> The new API is present in Aurora in a compatibility layer, but the
>>>>>>>> HTTP performance issues still exist so we can't make it the default.
>>>>>>>>
>>>>>>>> On Sat, Dec 9, 2017 at 4:24 PM, Bill Farner <wf...@apache.org>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Aurora pre-dates SHUTDOWN by several years, so the option was not
>>>>>>>>> present.  Additionally, the SHUTDOWN call is not available in the API used
>>>>>>>>> by Aurora.  Last i knew, Aurora could not use the "new" API because of
>>>>>>>>> performance issues in the implementation, but i do not know where that
>>>>>>>>> stands today.
>>>>>>>>>
>>>>>>>>> https://mesos.apache.org/documentation/latest/scheduler-http-api/#shutdown
>>>>>>>>>
>>>>>>>>>> NOTE: This is a new call that was not present in the old API
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Sat, Dec 9, 2017 at 4:11 PM, Mohit Jaggi <mo...@uber.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Folks,
>>>>>>>>>> Our Mesos team is wondering why Aurora chose KILL over SHUTDOWN
>>>>>>>>>> for killing tasks. As Aurora has an executor per task, won't SHUTDOWN work
>>>>>>>>>> better? It will avoid zombie executors.
>>>>>>>>>>
>>>>>>>>>> Mohit.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

Re: shutdown vs kill API is Mesos

Posted by Mohit Jaggi <mo...@uber.com>.
David,
- LCD makes sense. Does that mean that Twitter is using the SCHEDULER_DRIVER
<https://github.com/apache/aurora/blob/aae2b0dc73b7534c66982ed07b1f029150e245de/src/main/java/org/apache/aurora/scheduler/mesos/SchedulerDriverModule.java#L72>
 version?
- I don't see Bill's proposal on this thread. Did I miss it?

Renan,
VersionedDriverFactory
<https://github.com/apache/aurora/blob/2e1ca42887bc8ea1e8c6cddebe9d1cf29268c714/src/main/java/org/apache/aurora/scheduler/mesos/VersionedDriverFactory.java#L24>'s
comments indicate that libmesos is still used. What am I missing?

BTW, with the patch for Thermos (from Stephan I think), the need for
switching to SHUTDOWN is reduced.
Mohit.

On Thu, Jan 11, 2018 at 2:01 PM, David McLaughlin <dm...@apache.org>
wrote:

> Sorry, the other approach outlined by Bill would in theory work too, but
> it sounds like in practice it also needs more changes on the Mesos side.
>
> On Thu, Jan 11, 2018 at 1:55 PM, David McLaughlin <dm...@apache.org>
> wrote:
>
>> Right. In order to keep the current abstraction in Aurora (both APIs), we
>> obviously have to bind to the lower common denominator API methods. So the
>> only way to integrate with shutdown will be to fix the performance issues
>> so we can switch to the new API.
>>
>> The performance issue we ran into at Twitter was that with status updates
>> that were similar to our production volume, they started to get dropped and
>> tasks end up being LOST and unnecessarily killed. So it's a definite
>> blocker for us to adopt in its current state. We have someone who has
>> fixing this on the Mesos side in their backlog, but it's currently not the
>> highest priority for us.
>>
>> On Thu, Jan 11, 2018 at 1:45 PM, Renan DelValle <renanidelvalle@gmail.com
>> > wrote:
>>
>>> The HTTP API is what is used under the hood for V0 and V1 (instead of
>>> libmesos), I believe that's what David was referencing when he mentioned
>>> the HTTP performance issues. Here's a better explanation from the original
>>> patch submitted by Zameer:
>>> https://github.com/apache/aurora/commit/705dbc7cd7c3ff477bcf766cdafe49a68ab47dee#diff-75bd5a98db87502a2332e9110d2eafc6
>>>
>>> I'm not sure about the Shutdown call, as you mentioned, the versioned
>>> driver seems to have the method but the driver interface does not. This
>>> might get tricky from here on in since Mesos has V1 only compatible calls.
>>>
>>> On Thu, Jan 11, 2018 at 1:24 PM, Mohit Jaggi <mo...@uber.com>
>>> wrote:
>>>
>>>> Thanks Renan. I saw that code. "Driver" interface does not have
>>>> SHUTDOWN...so it is not "compatible". I was trying to change to
>>>> VersionedSchedulerDriverService all over the code (that wreaks havoc
>>>> across the tests!) but Mesos's Java wrapper doesn't seem to have that
>>>> call either. Perhaps, that is why David referred to the HTTP API.
>>>>
>>>> On Thu, Jan 11, 2018 at 1:14 PM, Renan DelValle <
>>>> renanidelvalle@gmail.com> wrote:
>>>>
>>>>> https://github.com/apache/aurora/blob/aae2b0dc73b7534c66982ed07b1f029150e245de/src/main/java/org/apache/aurora/scheduler/mesos/SchedulerDriverModule.java
>>>>>
>>>>> https://github.com/apache/aurora/blob/aae2b0dc73b7534c66982ed07b1f029150e245de/src/main/java/org/apache/aurora/scheduler/mesos/VersionedSchedulerDriverService.java#L50
>>>>>
>>>>> On Tue, Jan 9, 2018 at 1:21 PM, Mohit Jaggi <mo...@uber.com>
>>>>> wrote:
>>>>>
>>>>>> David,
>>>>>> Where can I find this code?
>>>>>>
>>>>>> Mohit.
>>>>>>
>>>>>> On Sat, Dec 9, 2017 at 4:27 PM, David McLaughlin <
>>>>>> dmclaughlin@apache.org> wrote:
>>>>>>
>>>>>>> The new API is present in Aurora in a compatibility layer, but the
>>>>>>> HTTP performance issues still exist so we can't make it the default.
>>>>>>>
>>>>>>> On Sat, Dec 9, 2017 at 4:24 PM, Bill Farner <wf...@apache.org>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Aurora pre-dates SHUTDOWN by several years, so the option was not
>>>>>>>> present.  Additionally, the SHUTDOWN call is not available in the API used
>>>>>>>> by Aurora.  Last i knew, Aurora could not use the "new" API because of
>>>>>>>> performance issues in the implementation, but i do not know where that
>>>>>>>> stands today.
>>>>>>>>
>>>>>>>> https://mesos.apache.org/documentation/latest/scheduler-http-api/#shutdown
>>>>>>>>
>>>>>>>>> NOTE: This is a new call that was not present in the old API
>>>>>>>>
>>>>>>>>
>>>>>>>> On Sat, Dec 9, 2017 at 4:11 PM, Mohit Jaggi <mo...@uber.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Folks,
>>>>>>>>> Our Mesos team is wondering why Aurora chose KILL over SHUTDOWN
>>>>>>>>> for killing tasks. As Aurora has an executor per task, won't SHUTDOWN work
>>>>>>>>> better? It will avoid zombie executors.
>>>>>>>>>
>>>>>>>>> Mohit.
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

Re: shutdown vs kill API is Mesos

Posted by David McLaughlin <dm...@apache.org>.
Sorry, the other approach outlined by Bill would in theory work too, but it
sounds like in practice it also needs more changes on the Mesos side.

On Thu, Jan 11, 2018 at 1:55 PM, David McLaughlin <dm...@apache.org>
wrote:

> Right. In order to keep the current abstraction in Aurora (both APIs), we
> obviously have to bind to the lower common denominator API methods. So the
> only way to integrate with shutdown will be to fix the performance issues
> so we can switch to the new API.
>
> The performance issue we ran into at Twitter was that with status updates
> that were similar to our production volume, they started to get dropped and
> tasks end up being LOST and unnecessarily killed. So it's a definite
> blocker for us to adopt in its current state. We have someone who has
> fixing this on the Mesos side in their backlog, but it's currently not the
> highest priority for us.
>
> On Thu, Jan 11, 2018 at 1:45 PM, Renan DelValle <re...@gmail.com>
> wrote:
>
>> The HTTP API is what is used under the hood for V0 and V1 (instead of
>> libmesos), I believe that's what David was referencing when he mentioned
>> the HTTP performance issues. Here's a better explanation from the original
>> patch submitted by Zameer:
>> https://github.com/apache/aurora/commit/705dbc7cd7c3ff477bcf766cdafe49a68ab47dee#diff-75bd5a98db87502a2332e9110d2eafc6
>>
>> I'm not sure about the Shutdown call, as you mentioned, the versioned
>> driver seems to have the method but the driver interface does not. This
>> might get tricky from here on in since Mesos has V1 only compatible calls.
>>
>> On Thu, Jan 11, 2018 at 1:24 PM, Mohit Jaggi <mo...@uber.com>
>> wrote:
>>
>>> Thanks Renan. I saw that code. "Driver" interface does not have
>>> SHUTDOWN...so it is not "compatible". I was trying to change to
>>> VersionedSchedulerDriverService all over the code (that wreaks havoc
>>> across the tests!) but Mesos's Java wrapper doesn't seem to have that
>>> call either. Perhaps, that is why David referred to the HTTP API.
>>>
>>> On Thu, Jan 11, 2018 at 1:14 PM, Renan DelValle <
>>> renanidelvalle@gmail.com> wrote:
>>>
>>>> https://github.com/apache/aurora/blob/aae2b0dc73b7534c66982ed07b1f029150e245de/src/main/java/org/apache/aurora/scheduler/mesos/SchedulerDriverModule.java
>>>>
>>>> https://github.com/apache/aurora/blob/aae2b0dc73b7534c66982ed07b1f029150e245de/src/main/java/org/apache/aurora/scheduler/mesos/VersionedSchedulerDriverService.java#L50
>>>>
>>>> On Tue, Jan 9, 2018 at 1:21 PM, Mohit Jaggi <mo...@uber.com>
>>>> wrote:
>>>>
>>>>> David,
>>>>> Where can I find this code?
>>>>>
>>>>> Mohit.
>>>>>
>>>>> On Sat, Dec 9, 2017 at 4:27 PM, David McLaughlin <
>>>>> dmclaughlin@apache.org> wrote:
>>>>>
>>>>>> The new API is present in Aurora in a compatibility layer, but the
>>>>>> HTTP performance issues still exist so we can't make it the default.
>>>>>>
>>>>>> On Sat, Dec 9, 2017 at 4:24 PM, Bill Farner <wf...@apache.org>
>>>>>> wrote:
>>>>>>
>>>>>>> Aurora pre-dates SHUTDOWN by several years, so the option was not
>>>>>>> present.  Additionally, the SHUTDOWN call is not available in the API used
>>>>>>> by Aurora.  Last i knew, Aurora could not use the "new" API because of
>>>>>>> performance issues in the implementation, but i do not know where that
>>>>>>> stands today.
>>>>>>>
>>>>>>> https://mesos.apache.org/documentation/latest/scheduler-http-api/#shutdown
>>>>>>>
>>>>>>>> NOTE: This is a new call that was not present in the old API
>>>>>>>
>>>>>>>
>>>>>>> On Sat, Dec 9, 2017 at 4:11 PM, Mohit Jaggi <mo...@uber.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Folks,
>>>>>>>> Our Mesos team is wondering why Aurora chose KILL over SHUTDOWN for
>>>>>>>> killing tasks. As Aurora has an executor per task, won't SHUTDOWN work
>>>>>>>> better? It will avoid zombie executors.
>>>>>>>>
>>>>>>>> Mohit.
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

Re: shutdown vs kill API is Mesos

Posted by David McLaughlin <dm...@apache.org>.
Right. In order to keep the current abstraction in Aurora (both APIs), we
obviously have to bind to the lowest common denominator API methods. So the
only way to integrate with shutdown will be to fix the performance issues
so we can switch to the new API.
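
To make the constraint concrete, here is a minimal sketch. Every type name in it (CommonDriver, LegacyDriver, HttpDriver) is made up for illustration and is not Aurora's or Mesos's actual class; the only thing taken from this thread is that the shared interface has KILL and the newer driver additionally has SHUTDOWN.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the abstraction shared by both APIs.
interface CommonDriver {
  void killTask(String taskId);
}

// Hypothetical stand-in for the old (libmesos-style) driver.
final class LegacyDriver implements CommonDriver {
  final List<String> calls = new ArrayList<>();

  @Override
  public void killTask(String taskId) {
    calls.add("KILL " + taskId);
  }
}

// Hypothetical stand-in for the new-API driver.
final class HttpDriver implements CommonDriver {
  final List<String> calls = new ArrayList<>();

  @Override
  public void killTask(String taskId) {
    calls.add("KILL " + taskId);
  }

  // Only the new API offers this, so it cannot live on CommonDriver
  // without breaking LegacyDriver.
  void shutdownExecutor(String executorId, String agentId) {
    calls.add("SHUTDOWN " + executorId + "@" + agentId);
  }
}

final class CommonDenominatorDemo {
  private CommonDenominatorDemo() {}

  // Code written against the shared interface can only ever issue KILL,
  // no matter which driver is plugged in.
  static void stopTask(CommonDriver driver, String taskId) {
    driver.killTask(taskId);
  }

  public static void main(String[] args) {
    HttpDriver driver = new HttpDriver();
    stopTask(driver, "task-1");
    System.out.println(driver.calls);
  }
}
```

Anything that holds only a CommonDriver reference cannot reach shutdownExecutor without a downcast, which is why the abstraction pins us to KILL.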

The performance issue we ran into at Twitter was that, with status updates
at a volume similar to production, updates started to get dropped and
tasks ended up being LOST and unnecessarily killed. So it's a definite
blocker for us to adopt in its current state. We have someone who has
fixing this on the Mesos side in their backlog, but it's currently not the
highest priority for us.

On Thu, Jan 11, 2018 at 1:45 PM, Renan DelValle <re...@gmail.com>
wrote:

> The HTTP API is what is used under the hood for V0 and V1 (instead of
> libmesos). I believe that's what David was referencing when he mentioned
> the HTTP performance issues. Here's a better explanation from the original
> patch submitted by Zameer:
> https://github.com/apache/aurora/commit/705dbc7cd7c3ff477bcf766cdafe49a68ab47dee#diff-75bd5a98db87502a2332e9110d2eafc6
>
> I'm not sure about the Shutdown call. As you mentioned, the versioned
> driver seems to have the method but the driver interface does not. This
> might get tricky from here on, since Mesos has V1-only calls.
>
> On Thu, Jan 11, 2018 at 1:24 PM, Mohit Jaggi <mo...@uber.com> wrote:
>
>> Thanks Renan. I saw that code. "Driver" interface does not have
>> SHUTDOWN...so it is not "compatible". I was trying to change to
>> VersionedSchedulerDriverService all over the code (that wreaks havoc
>> across the tests!) but Mesos's Java wrapper doesn't seem to have that
>> call either. Perhaps, that is why David referred to the HTTP API.
>>
>> On Thu, Jan 11, 2018 at 1:14 PM, Renan DelValle <renanidelvalle@gmail.com
>> > wrote:
>>
>>> https://github.com/apache/aurora/blob/aae2b0dc73b7534c66982ed07b1f029150e245de/src/main/java/org/apache/aurora/scheduler/mesos/SchedulerDriverModule.java
>>>
>>> https://github.com/apache/aurora/blob/aae2b0dc73b7534c66982ed07b1f029150e245de/src/main/java/org/apache/aurora/scheduler/mesos/VersionedSchedulerDriverService.java#L50
>>>
>>> On Tue, Jan 9, 2018 at 1:21 PM, Mohit Jaggi <mo...@uber.com>
>>> wrote:
>>>
>>>> David,
>>>> Where can I find this code?
>>>>
>>>> Mohit.
>>>>
>>>> On Sat, Dec 9, 2017 at 4:27 PM, David McLaughlin <
>>>> dmclaughlin@apache.org> wrote:
>>>>
>>>>> The new API is present in Aurora in a compatibility layer, but the
>>>>> HTTP performance issues still exist so we can't make it the default.
>>>>>
>>>>> On Sat, Dec 9, 2017 at 4:24 PM, Bill Farner <wf...@apache.org>
>>>>> wrote:
>>>>>
>>>>>> Aurora pre-dates SHUTDOWN by several years, so the option was not
>>>>>> present.  Additionally, the SHUTDOWN call is not available in the API used
>>>>>> by Aurora.  Last i knew, Aurora could not use the "new" API because of
>>>>>> performance issues in the implementation, but i do not know where that
>>>>>> stands today.
>>>>>>
>>>>>> https://mesos.apache.org/documentation/latest/scheduler-http-api/#shutdown
>>>>>>
>>>>>>> NOTE: This is a new call that was not present in the old API
>>>>>>
>>>>>>
>>>>>> On Sat, Dec 9, 2017 at 4:11 PM, Mohit Jaggi <mo...@uber.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Folks,
>>>>>>> Our Mesos team is wondering why Aurora chose KILL over SHUTDOWN for
>>>>>>> killing tasks. As Aurora has an executor per task, won't SHUTDOWN work
>>>>>>> better? It will avoid zombie executors.
>>>>>>>
>>>>>>> Mohit.
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

Re: shutdown vs kill API is Mesos

Posted by Renan DelValle <re...@gmail.com>.
The HTTP API is what is used under the hood for V0 and V1 (instead of
libmesos). I believe that's what David was referencing when he mentioned
the HTTP performance issues. Here's a better explanation from the original
patch submitted by Zameer:
https://github.com/apache/aurora/commit/705dbc7cd7c3ff477bcf766cdafe49a68ab47dee#diff-75bd5a98db87502a2332e9110d2eafc6

I'm not sure about the Shutdown call. As you mentioned, the versioned
driver seems to have the method but the driver interface does not. This
might get tricky from here on, since Mesos has V1-only calls.

On Thu, Jan 11, 2018 at 1:24 PM, Mohit Jaggi <mo...@uber.com> wrote:

> Thanks Renan. I saw that code. "Driver" interface does not have
> SHUTDOWN...so it is not "compatible". I was trying to change to
> VersionedSchedulerDriverService all over the code (that wreaks havoc
> across the tests!) but Mesos's Java wrapper doesn't seem to have that
> call either. Perhaps, that is why David referred to the HTTP API.
>
> On Thu, Jan 11, 2018 at 1:14 PM, Renan DelValle <re...@gmail.com>
> wrote:
>
>> https://github.com/apache/aurora/blob/aae2b0dc73b7534c66982ed07b1f029150e245de/src/main/java/org/apache/aurora/scheduler/mesos/SchedulerDriverModule.java
>>
>> https://github.com/apache/aurora/blob/aae2b0dc73b7534c66982ed07b1f029150e245de/src/main/java/org/apache/aurora/scheduler/mesos/VersionedSchedulerDriverService.java#L50
>>
>> On Tue, Jan 9, 2018 at 1:21 PM, Mohit Jaggi <mo...@uber.com> wrote:
>>
>>> David,
>>> Where can I find this code?
>>>
>>> Mohit.
>>>
>>> On Sat, Dec 9, 2017 at 4:27 PM, David McLaughlin <dmclaughlin@apache.org
>>> > wrote:
>>>
>>>> The new API is present in Aurora in a compatibility layer, but the HTTP
>>>> performance issues still exist so we can't make it the default.
>>>>
>>>> On Sat, Dec 9, 2017 at 4:24 PM, Bill Farner <wf...@apache.org> wrote:
>>>>
>>>>> Aurora pre-dates SHUTDOWN by several years, so the option was not
>>>>> present.  Additionally, the SHUTDOWN call is not available in the API used
>>>>> by Aurora.  Last i knew, Aurora could not use the "new" API because of
>>>>> performance issues in the implementation, but i do not know where that
>>>>> stands today.
>>>>>
>>>>> https://mesos.apache.org/documentation/latest/scheduler-http-api/#shutdown
>>>>>
>>>>>> NOTE: This is a new call that was not present in the old API
>>>>>
>>>>>
>>>>> On Sat, Dec 9, 2017 at 4:11 PM, Mohit Jaggi <mo...@uber.com>
>>>>> wrote:
>>>>>
>>>>>> Folks,
>>>>>> Our Mesos team is wondering why Aurora chose KILL over SHUTDOWN for
>>>>>> killing tasks. As Aurora has an executor per task, won't SHUTDOWN work
>>>>>> better? It will avoid zombie executors.
>>>>>>
>>>>>> Mohit.
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>

Re: shutdown vs kill API is Mesos

Posted by Mohit Jaggi <mo...@uber.com>.
Thanks Renan. I saw that code. The "Driver" interface does not have
SHUTDOWN, so it is not "compatible". I was trying to change to
VersionedSchedulerDriverService all over the code (that wreaks havoc across
the tests!) but Mesos's Java wrapper
<https://github.com/apache/mesos/tree/72752fc6deb8ebcbfbd5448dc599ef3774339d31/src/java/src/org/apache/mesos/v1/scheduler>
doesn't seem to have that call either. Perhaps that is why David referred
to the HTTP API.

On Thu, Jan 11, 2018 at 1:14 PM, Renan DelValle <re...@gmail.com>
wrote:

> https://github.com/apache/aurora/blob/aae2b0dc73b7534c66982ed07b1f029150e245de/src/main/java/org/apache/aurora/scheduler/mesos/SchedulerDriverModule.java
>
> https://github.com/apache/aurora/blob/aae2b0dc73b7534c66982ed07b1f029150e245de/src/main/java/org/apache/aurora/scheduler/mesos/VersionedSchedulerDriverService.java#L50
>
> On Tue, Jan 9, 2018 at 1:21 PM, Mohit Jaggi <mo...@uber.com> wrote:
>
>> David,
>> Where can I find this code?
>>
>> Mohit.
>>
>> On Sat, Dec 9, 2017 at 4:27 PM, David McLaughlin <dm...@apache.org>
>> wrote:
>>
>>> The new API is present in Aurora in a compatibility layer, but the HTTP
>>> performance issues still exist so we can't make it the default.
>>>
>>> On Sat, Dec 9, 2017 at 4:24 PM, Bill Farner <wf...@apache.org> wrote:
>>>
>>>> Aurora pre-dates SHUTDOWN by several years, so the option was not
>>>> present.  Additionally, the SHUTDOWN call is not available in the API used
>>>> by Aurora.  Last i knew, Aurora could not use the "new" API because of
>>>> performance issues in the implementation, but i do not know where that
>>>> stands today.
>>>>
>>>> https://mesos.apache.org/documentation/latest/scheduler-http-api/#shutdown
>>>>
>>>>> NOTE: This is a new call that was not present in the old API
>>>>
>>>>
>>>> On Sat, Dec 9, 2017 at 4:11 PM, Mohit Jaggi <mo...@uber.com>
>>>> wrote:
>>>>
>>>>> Folks,
>>>>> Our Mesos team is wondering why Aurora chose KILL over SHUTDOWN for
>>>>> killing tasks. As Aurora has an executor per task, won't SHUTDOWN work
>>>>> better? It will avoid zombie executors.
>>>>>
>>>>> Mohit.
>>>>>
>>>>
>>>>
>>>
>>
>

Re: shutdown vs kill API is Mesos

Posted by Renan DelValle <re...@gmail.com>.
https://github.com/apache/aurora/blob/aae2b0dc73b7534c66982ed07b1f029150e245de/src/main/java/org/apache/aurora/scheduler/mesos/SchedulerDriverModule.java

https://github.com/apache/aurora/blob/aae2b0dc73b7534c66982ed07b1f029150e245de/src/main/java/org/apache/aurora/scheduler/mesos/VersionedSchedulerDriverService.java#L50

On Tue, Jan 9, 2018 at 1:21 PM, Mohit Jaggi <mo...@uber.com> wrote:

> David,
> Where can I find this code?
>
> Mohit.
>
> On Sat, Dec 9, 2017 at 4:27 PM, David McLaughlin <dm...@apache.org>
> wrote:
>
>> The new API is present in Aurora in a compatibility layer, but the HTTP
>> performance issues still exist so we can't make it the default.
>>
>> On Sat, Dec 9, 2017 at 4:24 PM, Bill Farner <wf...@apache.org> wrote:
>>
>>> Aurora pre-dates SHUTDOWN by several years, so the option was not
>>> present.  Additionally, the SHUTDOWN call is not available in the API used
>>> by Aurora.  Last i knew, Aurora could not use the "new" API because of
>>> performance issues in the implementation, but i do not know where that
>>> stands today.
>>>
>>> https://mesos.apache.org/documentation/latest/scheduler-http-api/#shutdown
>>>
>>>> NOTE: This is a new call that was not present in the old API
>>>
>>>
>>> On Sat, Dec 9, 2017 at 4:11 PM, Mohit Jaggi <mo...@uber.com>
>>> wrote:
>>>
>>>> Folks,
>>>> Our Mesos team is wondering why Aurora chose KILL over SHUTDOWN for
>>>> killing tasks. As Aurora has an executor per task, won't SHUTDOWN work
>>>> better? It will avoid zombie executors.
>>>>
>>>> Mohit.
>>>>
>>>
>>>
>>
>

Re: shutdown vs kill API is Mesos

Posted by Mohit Jaggi <mo...@uber.com>.
David,
Where can I find this code?

Mohit.

On Sat, Dec 9, 2017 at 4:27 PM, David McLaughlin <dm...@apache.org>
wrote:

> The new API is present in Aurora in a compatibility layer, but the HTTP
> performance issues still exist so we can't make it the default.
>
> On Sat, Dec 9, 2017 at 4:24 PM, Bill Farner <wf...@apache.org> wrote:
>
>> Aurora pre-dates SHUTDOWN by several years, so the option was not
>> present.  Additionally, the SHUTDOWN call is not available in the API used
>> by Aurora.  Last i knew, Aurora could not use the "new" API because of
>> performance issues in the implementation, but i do not know where that
>> stands today.
>>
>> https://mesos.apache.org/documentation/latest/scheduler-http-api/#shutdown
>>
>>> NOTE: This is a new call that was not present in the old API
>>
>>
>> On Sat, Dec 9, 2017 at 4:11 PM, Mohit Jaggi <mo...@uber.com> wrote:
>>
>>> Folks,
>>> Our Mesos team is wondering why Aurora chose KILL over SHUTDOWN for
>>> killing tasks. As Aurora has an executor per task, won't SHUTDOWN work
>>> better? It will avoid zombie executors.
>>>
>>> Mohit.
>>>
>>
>>
>

Re: shutdown vs kill API is Mesos

Posted by Mohit Jaggi <mo...@uber.com>.
Filed https://issues.apache.org/jira/browse/AURORA-1960

On Sat, Dec 9, 2017 at 4:45 PM, Bill Farner <wf...@apache.org> wrote:

> The new API is present in Aurora in a compatibility layer
>
>
> Aha!  I had not explored that code
> <https://github.com/apache/aurora/blob/47c689956f77ed635d26f7ec659689002bd047af/src/main/java/org/apache/aurora/scheduler/mesos/VersionedSchedulerDriverService.java#L180-L185>
> yet.  It does seem that SHUTDOWN provides the behavior that we aim for
> when killing tasks.  The global executor shutdown timeout (
> --executor_shutdown_grace_period) potentially interferes with our
> graceful_shutdown_wait_secs job-level configuration.  However, an
> operator could use the former as an upper limit to the latter.
>
> From what i see, i'd support a patch to switch to SHUTDOWN when using
> DriverKind.V0_DRIVER or DriverKind.V1_DRIVER.
>
> On Sat, Dec 9, 2017 at 4:27 PM, David McLaughlin <dm...@apache.org>
> wrote:
>
>> The new API is present in Aurora in a compatibility layer, but the HTTP
>> performance issues still exist so we can't make it the default.
>>
>> On Sat, Dec 9, 2017 at 4:24 PM, Bill Farner <wf...@apache.org> wrote:
>>
>>> Aurora pre-dates SHUTDOWN by several years, so the option was not
>>> present.  Additionally, the SHUTDOWN call is not available in the API used
>>> by Aurora.  Last i knew, Aurora could not use the "new" API because of
>>> performance issues in the implementation, but i do not know where that
>>> stands today.
>>>
>>> https://mesos.apache.org/documentation/latest/scheduler-http-api/#shutdown
>>>
>>>> NOTE: This is a new call that was not present in the old API
>>>
>>>
>>> On Sat, Dec 9, 2017 at 4:11 PM, Mohit Jaggi <mo...@uber.com>
>>> wrote:
>>>
>>>> Folks,
>>>> Our Mesos team is wondering why Aurora chose KILL over SHUTDOWN for
>>>> killing tasks. As Aurora has an executor per task, won't SHUTDOWN work
>>>> better? It will avoid zombie executors.
>>>>
>>>> Mohit.
>>>>
>>>
>>>
>>
>

Re: shutdown vs kill API is Mesos

Posted by Bill Farner <wf...@apache.org>.
>
> The new API is present in Aurora in a compatibility layer


Aha!  I had not explored that code
<https://github.com/apache/aurora/blob/47c689956f77ed635d26f7ec659689002bd047af/src/main/java/org/apache/aurora/scheduler/mesos/VersionedSchedulerDriverService.java#L180-L185>
yet.  It does seem that SHUTDOWN provides the behavior that we aim for when
killing tasks.  The global executor shutdown timeout (
--executor_shutdown_grace_period) potentially interferes with our
graceful_shutdown_wait_secs job-level configuration.  However, an operator
could use the former as an upper limit to the latter.
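
The "upper limit" relationship can be sketched as a simple cap. This assumes the agent-wide grace period always wins over a longer job-level value; ShutdownGrace and effectiveWait are hypothetical names used only for illustration, not anything in Aurora or Mesos.

```java
import java.time.Duration;

// Hypothetical helper, not part of Aurora or Mesos.
final class ShutdownGrace {
  private ShutdownGrace() {}

  /**
   * Returns the wait a task can actually expect: the job-level
   * graceful_shutdown_wait_secs value, capped by the global
   * --executor_shutdown_grace_period.
   */
  static Duration effectiveWait(Duration jobLevelWait, Duration globalGracePeriod) {
    return jobLevelWait.compareTo(globalGracePeriod) <= 0
        ? jobLevelWait
        : globalGracePeriod;
  }

  public static void main(String[] args) {
    // The job asks for 30s but the global limit is 5s, so the limit applies.
    System.out.println(effectiveWait(Duration.ofSeconds(30), Duration.ofSeconds(5)));
    // The job asks for 2s, which is under the limit, so the job value applies.
    System.out.println(effectiveWait(Duration.ofSeconds(2), Duration.ofSeconds(5)));
  }
}
```

In other words, an operator would set the global flag to the largest job-level wait they are willing to honor.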

From what I see, I'd support a patch to switch to SHUTDOWN when using
DriverKind.V0_DRIVER or DriverKind.V1_DRIVER.
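
Roughly, such a patch would pick the call based on the driver kind. The sketch below is an illustration only: the enum is redeclared locally, SCHEDULER_DRIVER is an assumed name for the legacy driver kind, and KillDispatch, Call and callFor are hypothetical.

```java
// All names below are local stand-ins for illustration, not Aurora's real classes.
final class KillDispatch {
  private KillDispatch() {}

  // SCHEDULER_DRIVER is an assumed name for the legacy driver kind.
  enum DriverKind { SCHEDULER_DRIVER, V0_DRIVER, V1_DRIVER }

  enum Call { KILL, SHUTDOWN }

  /** Chooses SHUTDOWN only for the driver kinds that can issue it. */
  static Call callFor(DriverKind kind) {
    switch (kind) {
      case V0_DRIVER:
      case V1_DRIVER:
        return Call.SHUTDOWN;
      default:
        // The legacy driver has no SHUTDOWN call, so keep using KILL.
        return Call.KILL;
    }
  }

  public static void main(String[] args) {
    for (DriverKind kind : DriverKind.values()) {
      System.out.println(kind + " -> " + callFor(kind));
    }
  }
}
```

Keeping KILL as the default branch means existing deployments on the legacy driver see no behavior change.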

On Sat, Dec 9, 2017 at 4:27 PM, David McLaughlin <dm...@apache.org>
wrote:

> The new API is present in Aurora in a compatibility layer, but the HTTP
> performance issues still exist so we can't make it the default.
>
> On Sat, Dec 9, 2017 at 4:24 PM, Bill Farner <wf...@apache.org> wrote:
>
>> Aurora pre-dates SHUTDOWN by several years, so the option was not
>> present.  Additionally, the SHUTDOWN call is not available in the API used
>> by Aurora.  Last i knew, Aurora could not use the "new" API because of
>> performance issues in the implementation, but i do not know where that
>> stands today.
>>
>> https://mesos.apache.org/documentation/latest/scheduler-http-api/#shutdown
>>
>>> NOTE: This is a new call that was not present in the old API
>>
>>
>> On Sat, Dec 9, 2017 at 4:11 PM, Mohit Jaggi <mo...@uber.com> wrote:
>>
>>> Folks,
>>> Our Mesos team is wondering why Aurora chose KILL over SHUTDOWN for
>>> killing tasks. As Aurora has an executor per task, won't SHUTDOWN work
>>> better? It will avoid zombie executors.
>>>
>>> Mohit.
>>>
>>
>>
>

Re: shutdown vs kill API is Mesos

Posted by David McLaughlin <dm...@apache.org>.
The new API is present in Aurora in a compatibility layer, but the HTTP
performance issues still exist so we can't make it the default.

On Sat, Dec 9, 2017 at 4:24 PM, Bill Farner <wf...@apache.org> wrote:

> Aurora pre-dates SHUTDOWN by several years, so the option was not
> present.  Additionally, the SHUTDOWN call is not available in the API used
> by Aurora.  Last i knew, Aurora could not use the "new" API because of
> performance issues in the implementation, but i do not know where that
> stands today.
>
> https://mesos.apache.org/documentation/latest/scheduler-http-api/#shutdown
>
>> NOTE: This is a new call that was not present in the old API
>
>
> On Sat, Dec 9, 2017 at 4:11 PM, Mohit Jaggi <mo...@uber.com> wrote:
>
>> Folks,
>> Our Mesos team is wondering why Aurora chose KILL over SHUTDOWN for
>> killing tasks. As Aurora has an executor per task, won't SHUTDOWN work
>> better? It will avoid zombie executors.
>>
>> Mohit.
>>
>
>

Re: shutdown vs kill API is Mesos

Posted by Bill Farner <wf...@apache.org>.
Aurora pre-dates SHUTDOWN by several years, so the option was not present.
Additionally, the SHUTDOWN call is not available in the API used by
Aurora.  Last I knew, Aurora could not use the "new" API because of
performance issues in the implementation, but I do not know where that
stands today.

https://mesos.apache.org/documentation/latest/scheduler-http-api/#shutdown

> NOTE: This is a new call that was not present in the old API


On Sat, Dec 9, 2017 at 4:11 PM, Mohit Jaggi <mo...@uber.com> wrote:

> Folks,
> Our Mesos team is wondering why Aurora chose KILL over SHUTDOWN for
> killing tasks. As Aurora has an executor per task, won't SHUTDOWN work
> better? It will avoid zombie executors.
>
> Mohit.
>