Posted to user@mesos.apache.org by Jeff Schroeder <je...@computer.org> on 2015/05/08 21:45:21 UTC

Failed to perform slave recovery after adding or changing attributes

It took me a while to track down why none of my Mesos slaves
(0.22, e890e24) would start after I tried adding some attributes to
their config in order to run Aurora.

I was starting the slave first with:

MESOS_attributes='host:nj1-test10'

And also tried starting it with:

MESOS_attributes='host:nj1-test10;rack:nj1-209.43'

The slave was failing with errors that looked like this:
https://gist.github.com/SEJeff/76b7ec29533097dd21c4

Is it expected that the slave simply won't start when you add, remove,
or change attributes? This makes configuring slaves a good bit more
difficult if we need to programmatically roll out changes to all of our
Mesos infrastructure via config management (I use Salt).

I'm mainly asking whether there is a way to make this work a bit
better. If not, I'd like to discuss a feature enhancement to improve
it.

-- 
Jeff Schroeder

Don't drink and derive, alcohol and analysis don't mix.
http://www.digitalprognosis.com

Re: Failed to perform slave recovery after adding or changing attributes

Posted by Vinod Kone <vi...@apache.org>.
You are correct. Any update to any field in SlaveInfo (including
resources and attributes) is considered incompatible with regard to slave
recovery. The ticket to add smarts to this algorithm is here: MESOS-1739
<https://issues.apache.org/jira/browse/MESOS-1739>. Unfortunately, no one I
know is currently working on it.
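
To illustrate the strict behaviour described above, here is a minimal
Python sketch. It is not the Mesos implementation: the real check compares
the whole checkpointed SlaveInfo, while this simplified model compares only
the attributes string and treats every value as plain text. The helper
names parse_attributes and is_compatible are hypothetical.

```python
def parse_attributes(text):
    """Parse a 'key:value;key:value' string into a dict.

    Simplification: every value is kept as plain text.
    """
    attributes = {}
    for pair in text.split(";"):
        pair = pair.strip()
        if not pair:
            continue
        # Split on the first ':' only, so values may contain ':' themselves.
        key, _, value = pair.partition(":")
        attributes[key.strip()] = value.strip()
    return attributes


def is_compatible(checkpointed, current):
    """Strict check (hypothetical model): any difference is incompatible."""
    return parse_attributes(checkpointed) == parse_attributes(current)


# The two values tried in the original message.
old = "host:nj1-test10"
new = "host:nj1-test10;rack:nj1-209.43"

print(is_compatible(old, old))  # True
print(is_compatible(old, new))  # False
```

Under this model, adding the rack attribute alone is enough to make the
new configuration count as incompatible with the checkpointed one, which
matches the failure reported in the original message.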
