Posted to dev@aurora.apache.org by Kevin Burg <kb...@foursquare.com> on 2014/07/12 02:24:20 UTC

Task Constraints

Hi,

I'm having trouble getting the task constraint resolver to work with
attributes other than 'host' and 'rack'. Are arbitrary attribute keys on
the Mesos slaves currently supported?

Here is the setup.

The slaves are configured to run with
`--attributes=host:<host>;rack:<rack>;staging:true`

(I've also tried this with staging:1, and staging:foo)

The constraint generated from the .aurora config looks like the following:
Constraint(name:staging, constraint:<TaskConstraint
value:ValueConstraint(negated:false, values:[true])>)

The schedule request then gets vetoed with the following veto object:
Veto{reason=Constraint not satisfied: staging, score=1000,
valueMismatch=true}]

The constraints generated for 'host' and 'rack' look identical except for
the different name of course. I've even tried bouncing every mesos and
aurora process on the machine to see if maybe stale attributes were being
assigned to the slaves. All the offers being made to the master look
correct though, which leads me to believe that the constraint solver just
doesn't work for arbitrary attributes.
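
For readers following along, here is a minimal Python sketch of the matching
I would expect for the setup above. It is not Aurora's implementation: the
names parse_attributes, ValueConstraint and satisfies are hypothetical, and
it assumes every attribute value is treated as plain text.

```python
from dataclasses import dataclass


def parse_attributes(flag_value):
    """Parse a 'key:value;key:value' string into {key: set_of_values}."""
    attrs = {}
    for pair in flag_value.split(";"):
        if not pair:
            continue
        key, _, value = pair.partition(":")
        attrs.setdefault(key.strip(), set()).add(value.strip())
    return attrs


@dataclass(frozen=True)
class ValueConstraint:
    """Hypothetical model of a value constraint on one attribute name."""
    name: str
    values: frozenset
    negated: bool = False


def satisfies(constraint, attrs):
    """True when the slave's attributes satisfy the constraint."""
    present = attrs.get(constraint.name, set())
    matched = bool(present & constraint.values)
    # A negated constraint is satisfied only when nothing matched.
    return matched != constraint.negated
```

Under these assumptions, a slave advertising staging:true satisfies the
constraint, while a slave whose recorded attributes lack 'staging' does not,
which is the kind of mismatch the veto above reports.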

We would appreciate any help you can offer.

Thanks,
Kevin

Re: Task Constraints

Posted by Bill Farner <wf...@apache.org>.
I've taken on the ticket and have a fix posted, hopefully to be committed
today.

-=Bill


On Wed, Jul 16, 2014 at 12:21 PM, Josh Adams <jo...@foursquare.com> wrote:

> +Leo Kim who is looking at the compiler error with us.

Re: Task Constraints

Posted by Josh Adams <jo...@foursquare.com>.
+Leo Kim who is looking at the compiler error with us.


On Wed, Jul 16, 2014 at 8:25 AM, Kevin Burg <kb...@foursquare.com> wrote:

> The idea with the fix is to read the slave's attributes right off the
> offer rather than going into 'AttributeStore' and keying on the slave's
> name. The slave's resources are read off the offer in this way, so I don't
> see why it can't be done with attributes as well.
>
> Someone who understands all the places where SchedulingFilter.filter is
> used might be able to fix this better than I can.


-- 
===============
josh adams
production engineer
foursquare

(gv) 415-830-4106
===============
foursquare.com/jobs

Re: Task Constraints

Posted by Kevin Burg <kb...@foursquare.com>.
The idea with the fix is to read the slave's attributes right off the offer
rather than going into 'AttributeStore' and keying on the slave's name. The
slave's resources are read off the offer in this way, so I don't see why it
can't be done with attributes as well.

Someone who understands all the places where SchedulingFilter.filter is
used might be able to fix this better than I can.
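
A rough Python sketch of the difference between the two lookups, for
illustration only. Offer, HostKeyedStore and both helper functions are
hypothetical stand-ins rather than Aurora or Mesos classes, and keying the
store on the host name is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Offer:
    """Hypothetical offer carrying the slave's current attributes."""
    slave_id: str
    hostname: str
    attributes: dict  # attribute name -> set of values


class HostKeyedStore:
    """Hypothetical store that remembers attributes per host name."""

    def __init__(self):
        self._by_host = {}

    def save(self, hostname, attributes):
        self._by_host[hostname] = attributes

    def get(self, hostname):
        return self._by_host.get(hostname, {})


def attributes_from_store(store, offer):
    # May return whatever was saved earlier, even if the slave has since
    # re-registered with different attributes.
    return store.get(offer.hostname)


def attributes_from_offer(offer):
    # Always reflects what the slave is advertising in this offer.
    return offer.attributes
```

If the store was populated before 'staging' was added, the first lookup
misses it while the second sees it, because the offer carries the current
values.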


On Wed, Jul 16, 2014 at 6:40 AM, Josh Adams <jo...@foursquare.com> wrote:

> Hi there,
>
> Given that we would need to disrupt running jobs to add constraints in the
> future, we are blocked on https://issues.apache.org/jira/browse/AURORA-582
> before we can push any of our services onto Aurora in production.
>
> Kevin Burg attempted to resolve the related bug
> https://issues.apache.org/jira/browse/AURORA-328 by making some changes
> here:
> https://github.com/foursquare/incubator-aurora/commit/b1962fad3fe9ef76954fa107abed25d78b809331
> but we seem to be getting a type mismatch when compiling the code.
>
> Any help and/or info on the bugfix progress would be much appreciated.
> Aside from AURORA-582 we are ready to roll (pun intended!)
>
> Best,
> Josh

Re: Task Constraints

Posted by Josh Adams <jo...@foursquare.com>.
Hi there,

Given that we would need to disrupt running jobs to add constraints in the
future, we are blocked on https://issues.apache.org/jira/browse/AURORA-582
before we can push any of our services onto Aurora in production.

Kevin Burg attempted to resolve the related bug
https://issues.apache.org/jira/browse/AURORA-328 by making some changes
here:
https://github.com/foursquare/incubator-aurora/commit/b1962fad3fe9ef76954fa107abed25d78b809331
but we seem to be getting a type mismatch when compiling the code.

Any help and/or info on the bugfix progress would be much appreciated.
Aside from AURORA-582 we are ready to roll (pun intended!)

Best,
Josh


On Mon, Jul 14, 2014 at 11:42 AM, Josh Adams <jo...@foursquare.com> wrote:

> Ah, makes sense. We'll try that. Thanks for clarifying this Kevin.
>
> Josh



-- 
===============
josh adams
production engineer
foursquare

(gv) 415-830-4106
===============
foursquare.com/jobs

Re: Task Constraints

Posted by Kevin Burg <kb...@foursquare.com>.
Removing the meta directory does not fix the issue. Upon further
inspection, the scheduler seems to be using very old slave ids. These slave
ids aren't even in "mesos/slave/workdir/slaves" anymore. I should add that
the "/offers" endpoint on the scheduler shows all the up-to-date
information, including correct slave_ids and attributes.

The slaves are neither failing nor logging anything during any of these
attribute changes.


On Mon, Jul 14, 2014 at 12:12 PM, Bill Farner <wf...@apache.org> wrote:

> However, the slave should be failing and logging this (rather than
> silently working with old attributes).  If you find otherwise, you should
> file a bug against mesos.

Re: Task Constraints

Posted by Bill Farner <wf...@apache.org>.
However, the slave should be failing and logging this (rather than silently
working with old attributes).  If you find otherwise, you should file a bug
against mesos.

On Monday, July 14, 2014, Josh Adams <jo...@foursquare.com> wrote:

> Ah, makes sense. We'll try that. Thanks for clarifying this Kevin.
>
> Josh


-- 
-=Bill

Re: Task Constraints

Posted by Josh Adams <jo...@foursquare.com>.
Ah, makes sense. We'll try that. Thanks for clarifying this, Kevin.

Josh


On Mon, Jul 14, 2014 at 11:30 AM, Kevin Sweeney <ke...@apache.org> wrote:

> Slaves persist their metadata (including attributes) across restarts due
> to slave recovery (that's what allows you to upgrade mesos in-place without
> killing the tasks they're managing). Unfortunately, to change attributes you
> need to remove persisted slave metadata (the "meta" directory). This will
> kill all of a slave's underlying tasks but the newly registered slave
> should have the correct attributes.


-- 
===============
josh adams
production engineer
foursquare

(gv) 415-830-4106
===============
foursquare.com/jobs

Re: Task Constraints

Posted by Kevin Sweeney <ke...@apache.org>.
Slaves persist their metadata (including attributes) across restarts due
to slave recovery (that's what allows you to upgrade mesos in-place without
killing the tasks they're managing). Unfortunately, to change attributes you
need to remove the persisted slave metadata (the "meta" directory). This will
kill all of a slave's underlying tasks, but the newly registered slave
should have the correct attributes.
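
As a rough sketch of that cleanup step in Python: the helper below removes
the "meta" directory under a slave's work directory. `clear_slave_metadata`
is a hypothetical name, and the assumption that "meta" sits directly under
the work directory should be checked against your own setup. The slave
should be stopped first, and as noted above this kills the tasks it was
running.

```python
import shutil
from pathlib import Path


def clear_slave_metadata(work_dir):
    """Remove <work_dir>/meta so the slave registers again from scratch.

    Assumptions (not confirmed in this thread): the slave process is already
    stopped, and "meta" lives directly under the slave's work directory.
    Returns True if a directory was removed, False if there was nothing to do.
    """
    meta = Path(work_dir) / "meta"
    if not meta.is_dir():
        return False
    shutil.rmtree(meta)
    return True
```

After the slave is started again with the new `--attributes` value, the
"/slaves" endpoint on the scheduler is the place to confirm that the new
attributes were picked up.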


On Mon, Jul 14, 2014 at 11:26 AM, Kevin Burg <kb...@foursquare.com> wrote:

> I've confirmed by looking at that endpoint that new attributes are not
> being picked up and modified attributes are retaining their old values.
> This is after restarting both the slaves and the scheduler process.

Re: Task Constraints

Posted by Kevin Burg <kb...@foursquare.com>.
I've confirmed by looking at that endpoint that new attributes are not
being picked up and modified attributes are retaining their old values.
This is after restarting both the slaves and the scheduler process.


On Mon, Jul 14, 2014 at 11:09 AM, Josh Adams <jo...@foursquare.com> wrote:

> Thanks Brian. Kevin should have some followup questions shortly.
>
> Josh

Re: Task Constraints

Posted by Josh Adams <jo...@foursquare.com>.
Thanks Brian. Kevin should have some followup questions shortly.

Josh


On Mon, Jul 14, 2014 at 10:37 AM, Brian Wickman <wi...@apache.org> wrote:

> host/rack should not be treated specially.
>
> If you go to the "/slaves" endpoint on the scheduler UI, what does it
> report as attributes being exported by your slaves?  You might want to
> validate there that the "staging" attribute got picked up properly.  If
> it's not getting picked up (e.g. the attributes are getting cached
> incorrectly by the scheduler?) then you should file an issue.


-- 
===============
josh adams
production engineer
foursquare

(gv) 415-830-4106
===============
foursquare.com/jobs

Re: Task Constraints

Posted by Brian Wickman <wi...@apache.org>.
host/rack should not be treated specially.

If you go to the "/slaves" endpoint on the scheduler UI, what does it
report as attributes being exported by your slaves?  You might want to
validate there that the "staging" attribute got picked up properly.  If
it's not getting picked up (e.g. the attributes are getting cached
incorrectly by the scheduler?) then you should file an issue.
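
To make the suspected failure mode concrete, here is a minimal Python sketch
of a value constraint being checked against a slave's attributes. It is an
illustration only, not Aurora's actual scheduling code: `parse_attributes`
and `value_constraint_satisfied` are hypothetical helpers, and the host and
rack values are made up. It shows how the same constraint passes against the
attributes given on the `--attributes` flag but fails against a stale cached
copy that lacks `staging`.

```python
def parse_attributes(flag_value):
    """Parse a 'key:value;key:value' string as passed to --attributes."""
    attributes = {}
    for pair in flag_value.split(";"):
        if not pair:
            continue
        key, _, value = pair.partition(":")
        attributes.setdefault(key.strip(), set()).add(value.strip())
    return attributes


def value_constraint_satisfied(attributes, name, values, negated=False):
    """Return True if the attributes satisfy a (possibly negated) value constraint."""
    present = attributes.get(name, set())
    matches = bool(present & set(values))
    return matches != negated


# Attributes as configured on the slave (host and rack values are invented).
offered = parse_attributes("host:h1;rack:r1;staging:true")
# Hypothetical stale copy recorded before 'staging' was added.
stale = parse_attributes("host:h1;rack:r1")

print(value_constraint_satisfied(offered, "staging", ["true"]))  # True
print(value_constraint_satisfied(stale, "staging", ["true"]))    # False
```

If the "/slaves" endpoint shows something like the stale set, a veto for
`staging` is what this model would predict, even though the offers
themselves look correct.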


On Fri, Jul 11, 2014 at 5:24 PM, Kevin Burg <kb...@foursquare.com> wrote:

> Hi,
>
> I'm having trouble getting the task constraint resolver to work with
> attributes other than 'host' and 'rack'. Are arbitrary attribute keys in
> the mesos slaves supported currently?
>
> Here is the setup.
>
> The slaves are configured to run with
> `--attributes=host:<host>;rack:<rack>;staging:true`
>
> (I've also tried this with staging:1, and staging:foo)
>
> The constraint generated from the .aurora config looks like the following
> Constraint(name:staging, constraint:<TaskConstraint
> value:ValueConstraint(negated:false, values:[true])>)
>
> The schedule request then gets vetoed with the following veto object:
> Veto{reason=Constraint not satisfied: staging, score=1000,
> valueMismatch=true}]
>
> The constraints generated for 'host' and 'rack' look identical except for
> the different name of course. I've even tried bouncing every mesos and
> aurora process on the machine to see if maybe stale attributes were being
> assigned to the slaves. All the offers being made to the master look
> correct though, which leads me to believe that the constraint solver just
> doesn't work for arbitrary attributes.
>
> We would appreciate any help you can offer.
>
> Thanks,
> Kevin
>