You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@mesos.apache.org by Andrei Sekretenko <as...@mesosphere.io> on 2019/05/02 15:58:15 UTC

Re: Review Request 70530: Refactored FrameworkInfo updates for the UPDATE_FRAMEWORK call + race fix.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70530/
-----------------------------------------------------------

(Updated May 2, 2019, 3:58 p.m.)


Review request for mesos and Benjamin Mahler.


Summary (updated)
-----------------

Refactored FrameworkInfo updates for the UPDATE_FRAMEWORK call + race fix.


Bugs: MESOS-7258
    https://issues.apache.org/jira/browse/MESOS-7258


Repository: mesos


Description (updated)
-------

The main concern of this patch is the behaviour of re-subscription and the new UPDATE_FRAMEWORK call when the framework attempts to change the "immutable" fields of  FrameworkInfo, namely `user`, `checkpoint`, `principal` and `id`.

The purposes of this patch are:
- to make the `Framework::update()` method simpler by removing the logic of ignoring the immutable fields
- to make `Master::validateFrameworkSubscription()` return validation errors on attempt to change `user` and `checkpoint` fields (needed for the new UPDATE_FRAMEWORK call)
- at the same time, to preserve the legacy behaviour of silently ignoring `user` and `checkpoint` on re-subscription
- to fix the already existing race between two re-subscriptions against an empty master

Changes made by this patch:
- `Framework::update()` now fails the program when the new values for the immutable fields differ from the old ones
- a function `validation::framework::validateUpdate()` is introduced for validating immutable fields against their old values
- a method `Master::sanitizeFrameworkSubscription` is introduced to preserve the legacy behaviour of re-subscription
- `validateFrameworkSubscription()` is called on subscription once again after authorization.

NOTE: currently there is a race between two different concurrent SUBSCRIBE calls, which might arise due to network partiotioning of a (buggy) framework.
The origin of the race is the non-atomic nature of the resubscription: Validate - Authorize (by another actor)- UpdateFramework(deferred).

The obvious case is an attempt to re-subscribe with two different principals against an empty (for example, failed over) master. 
If the scheduler A succeeds in setting `principal` between authorization and update by the scheduler B, the scheduler B will have its `principal` silently ignored (instead of getting an error).
Changing the `Framework::update()` from silently ignoring immutable fields to CHECKing them exacerbates the race: now it is a way to crash the master. 

To remove this race I'm simply adding the second validation run after authorization. 
A proper fix would probably require splitting the validation into two parts: checks required for the authorizer(s) to work correctly and checks required for everything else.


Diffs (updated)
-----

  src/master/framework.cpp 05f5514c589b2dba08afe77281e5fbc4e29f232b 
  src/master/master.hpp 7d9732f1e432f6f0290d234242864cbdbf381fa8 
  src/master/master.cpp a8ee6297e1587c160a47b806914d3c3aa4f35cd7 
  src/master/validation.hpp 95638a17052ece6c957aa76e4cead8d7bfe82024 
  src/master/validation.cpp 9fb0850987ce385d345302cac9721adead7181b8 


Diff: https://reviews.apache.org/r/70530/diff/3/

Changes: https://reviews.apache.org/r/70530/diff/2-3/


Testing
-------


Thanks,

Andrei Sekretenko


Re: Review Request 70530: Refactored Framework updates for the UPDATE_FRAMEWORK call; fixed race.

Posted by Benjamin Mahler <bm...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70530/#review215203
-----------------------------------------------------------



Thanks for the description! Overall this is looking pretty good, some suggestions:

* In general we try to aim for small independent patches, as it makes reviewing a lot easier. Can you pull out the validateUpdate call into it's own patch in front of this one? Also, a patch after it to unit test it? That will help us land the work incrementally and overall more quickly. I think we would stil use this even with the next suggestion:
* The sanitize function seems rather unfortunate: (1) It depends on the master state in a non-trivial way (i.e. it's not provided the new vs old, it has to go and find the old one itself) and (2) it doesn't seem intuitive for a reader to know what sanitization is in this context (I usually see "sanitization" used to refer to cleaning out sensitive data, one example being ip addresses and job names from logs). I think a reader would more find it more intuitive if updating returns errors, and then those errors are ignored for SUBSCRIBE and not ignored for UPDATE_FRAMEWORK. This would probably mean that validation would be done within the update call, and the error surfaced up to the caller.

A minor follow up:

* Framework::update rather inefficiently handles the roles, and there may be a large number of roles and a somewhat frequent updating of the framework info in the future. Feel free to write a patch at the end of this chain that avoids all the unnecessary copying and set contruction, as well as moves from the passed in FrameworkInfo if possible.


src/master/master.cpp
Line 2578 (original), 2621 (patched)
<https://reviews.apache.org/r/70530/#comment301775>

    newline bewteen the if and the assignment
    
    space between ) and {



src/master/master.cpp
Line 2578 (original), 2621 (patched)
<https://reviews.apache.org/r/70530/#comment301776>

    newline bewteen the if and the assignment



src/master/master.cpp
Line 2851 (original), 2908 (patched)
<https://reviews.apache.org/r/70530/#comment301774>

    just one newline here



src/master/master.cpp
Line 2976 (original), 3033 (patched)
<https://reviews.apache.org/r/70530/#comment301773>

    newline between the brace and the comment



src/master/validation.cpp
Lines 574 (patched)
<https://reviews.apache.org/r/70530/#comment301772>

    the braces here don't seem warranted



src/master/validation.cpp
Lines 577 (patched)
<https://reviews.apache.org/r/70530/#comment301770>

    inconsistent indentation?



src/master/validation.cpp
Lines 586-591 (patched)
<https://reviews.apache.org/r/70530/#comment301771>

    How about we just return this in the error message for all cases? e.g.:
    
    Caller: "Invalid framework update for <FID>: " + ...
    This function: "Updating 'FrameworkInfo.user' is unsupported; attempted to update from 'root' to 'nobody'"
    
    That way the caller can just log the error and we don't need to have special warning logging inside this function?


- Benjamin Mahler


On May 10, 2019, 2:50 p.m., Andrei Sekretenko wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/70530/
> -----------------------------------------------------------
> 
> (Updated May 10, 2019, 2:50 p.m.)
> 
> 
> Review request for mesos and Benjamin Mahler.
> 
> 
> Bugs: MESOS-7258
>     https://issues.apache.org/jira/browse/MESOS-7258
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> The main concern of this patch is the behaviour of the framework
> re-subscription and of the upcoming UPDATE_FRAMEWORK call in cases when
> the framework attempts to change the "immutable" fields of
> FrameworkInfo, namely: `user`, `checkpoint`, `principal` and `id`.
> 
> The changes introduced by this patch:
> - The method `Framework::update()` is simplified: the logic of ignoring
>   the immutable fields is removed, this method fails the program if the
>   new values differ from the old ones. This is needed to simplify
>   keeping UPDATE_FRAMEWORK consistent with re-subscription.
> - The method `Master::validateFrameworkSubscription()` now returns
>   validation errors on attempts to change `user` or `checkpoint` fields.
>   This is needed to enable validating them in the UPDATE_FRAMEWORK call.
> - A method `Master::sanitizeFrameworkSubscription()` is introduced to
>   preserve the legacy behaviour of re-subscription (which ignores the
>   updates of `user` and `checkpoint`).
> - A second call of `validateFrameworkSubscription()` is performed after
>   the authorization. This is a crude fix of the already existing race
>   between two re-subscriptions against an empty master (see MESOS-9763).
>   It is necessary because otherwise the change in `Framework::update()`
>   would exacerbate this race (namely, it would be possible to crash the
>   master).
> 
> 
> Diffs
> -----
> 
>   src/master/framework.cpp 05f5514c589b2dba08afe77281e5fbc4e29f232b 
>   src/master/master.hpp 7d9732f1e432f6f0290d234242864cbdbf381fa8 
>   src/master/master.cpp a8ee6297e1587c160a47b806914d3c3aa4f35cd7 
>   src/master/validation.hpp 95638a17052ece6c957aa76e4cead8d7bfe82024 
>   src/master/validation.cpp 9fb0850987ce385d345302cac9721adead7181b8 
> 
> 
> Diff: https://reviews.apache.org/r/70530/diff/4/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Andrei Sekretenko
> 
>


Re: Review Request 70530: Refactored Framework updates for the UPDATE_FRAMEWORK call; fixed race.

Posted by Mesos Reviewbot <re...@mesos.apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70530/#review215329
-----------------------------------------------------------



Patch looks great!

Reviews applied: [70583, 70531, 70530]

Passed command: export OS='ubuntu:14.04' BUILDTOOL='autotools' COMPILER='gcc' CONFIGURATION='--verbose --disable-libtool-wrappers --disable-parallel-test-execution' ENVIRONMENT='GLOG_v=1 MESOS_VERBOSE=1'; ./support/docker-build.sh

- Mesos Reviewbot


On May 10, 2019, 2:50 p.m., Andrei Sekretenko wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/70530/
> -----------------------------------------------------------
> 
> (Updated May 10, 2019, 2:50 p.m.)
> 
> 
> Review request for mesos and Benjamin Mahler.
> 
> 
> Bugs: MESOS-7258
>     https://issues.apache.org/jira/browse/MESOS-7258
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> The main concern of this patch is the behaviour of the framework
> re-subscription and of the upcoming UPDATE_FRAMEWORK call in cases when
> the framework attempts to change the "immutable" fields of
> FrameworkInfo, namely: `user`, `checkpoint`, `principal` and `id`.
> 
> The changes introduced by this patch:
> - The method `Framework::update()` is simplified: the logic of ignoring
>   the immutable fields is removed, this method fails the program if the
>   new values differ from the old ones. This is needed to simplify
>   keeping UPDATE_FRAMEWORK consistent with re-subscription.
> - The method `Master::validateFrameworkSubscription()` now returns
>   validation errors on attempts to change `user` or `checkpoint` fields.
>   This is needed to enable validating them in the UPDATE_FRAMEWORK call.
> - A method `Master::sanitizeFrameworkSubscription()` is introduced to
>   preserve the legacy behaviour of re-subscription (which ignores the
>   updates of `user` and `checkpoint`).
> - A second call of `validateFrameworkSubscription()` is performed after
>   the authorization. This is a crude fix of the already existing race
>   between two re-subscriptions against an empty master (see MESOS-9763).
>   It is necessary because otherwise the change in `Framework::update()`
>   would exacerbate this race (namely, it would be possible to crash the
>   master).
> 
> 
> Diffs
> -----
> 
>   src/master/framework.cpp 05f5514c589b2dba08afe77281e5fbc4e29f232b 
>   src/master/master.hpp 7d9732f1e432f6f0290d234242864cbdbf381fa8 
>   src/master/master.cpp a8ee6297e1587c160a47b806914d3c3aa4f35cd7 
>   src/master/validation.hpp 95638a17052ece6c957aa76e4cead8d7bfe82024 
>   src/master/validation.cpp 9fb0850987ce385d345302cac9721adead7181b8 
> 
> 
> Diff: https://reviews.apache.org/r/70530/diff/4/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Andrei Sekretenko
> 
>


Re: Review Request 70530: Refactored Framework updates for the UPDATE_FRAMEWORK call; fixed race.

Posted by Andrei Sekretenko <as...@mesosphere.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70530/
-----------------------------------------------------------

(Updated May 10, 2019, 2:50 p.m.)


Review request for mesos and Benjamin Mahler.


Changes
-------

Fixed formatting of the description.


Summary (updated)
-----------------

Refactored Framework updates for the UPDATE_FRAMEWORK call; fixed race.


Bugs: MESOS-7258
    https://issues.apache.org/jira/browse/MESOS-7258


Repository: mesos


Description (updated)
-------

The main concern of this patch is the behaviour of the framework
re-subscription and of the upcoming UPDATE_FRAMEWORK call in cases when
the framework attempts to change the "immutable" fields of
FrameworkInfo, namely: `user`, `checkpoint`, `principal` and `id`.

The changes introduced by this patch:
- The method `Framework::update()` is simplified: the logic of ignoring
  the immutable fields is removed, this method fails the program if the
  new values differ from the old ones. This is needed to simplify
  keeping UPDATE_FRAMEWORK consistent with re-subscription.
- The method `Master::validateFrameworkSubscription()` now returns
  validation errors on attempts to change `user` or `checkpoint` fields.
  This is needed to enable validating them in the UPDATE_FRAMEWORK call.
- A method `Master::sanitizeFrameworkSubscription()` is introduced to
  preserve the legacy behaviour of re-subscription (which ignores the
  updates of `user` and `checkpoint`).
- A second call of `validateFrameworkSubscription()` is performed after
  the authorization. This is a crude fix of the already existing race
  between two re-subscriptions against an empty master (see MESOS-9763).
  It is necessary because otherwise the change in `Framework::update()`
  would exacerbate this race (namely, it would be possible to crash the
  master).


Diffs (updated)
-----

  src/master/framework.cpp 05f5514c589b2dba08afe77281e5fbc4e29f232b 
  src/master/master.hpp 7d9732f1e432f6f0290d234242864cbdbf381fa8 
  src/master/master.cpp a8ee6297e1587c160a47b806914d3c3aa4f35cd7 
  src/master/validation.hpp 95638a17052ece6c957aa76e4cead8d7bfe82024 
  src/master/validation.cpp 9fb0850987ce385d345302cac9721adead7181b8 


Diff: https://reviews.apache.org/r/70530/diff/4/

Changes: https://reviews.apache.org/r/70530/diff/3-4/


Testing
-------


Thanks,

Andrei Sekretenko


Re: Review Request 70530: Refactored FrameworkInfo updates for the UPDATE_FRAMEWORK call + race fix.

Posted by Andrei Sekretenko <as...@mesosphere.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70530/
-----------------------------------------------------------

(Updated May 7, 2019, 11:49 a.m.)


Review request for mesos and Benjamin Mahler.


Changes
-------

Cleaned up the description.


Bugs: MESOS-7258
    https://issues.apache.org/jira/browse/MESOS-7258


Repository: mesos


Description (updated)
-------

The main concern of this patch is the behaviour of the framework re-subscription 
and of the upcoming UPDATE_FRAMEWORK call in cases when the framework attempts 
to change the "immutable" fields of FrameworkInfo, namely: `user`, `checkpoint`,
`principal` and `id`.                                                                       
                                                                                
The changes introduced by this patch:                                           
- The method `Framework::update()` is simplified: the logic of ignoring the     
  immutable fields is removed, this method fails the program if the new values  
  differ from the old ones. This is needed to simplify keeping UPDATE_FRAMEWORK
  consistent with re-subscription.
- The method `Master::validateFrameworkSubscription()` now returns validation   
  errors on attempts to change `user` or `checkpoint` fields. This is needed to 
  enable validating them in the UPDATE_FRAMEWORK call.                          
- A method `Master::sanitizeFrameworkSubscription()` is introduced              
  to preserve the legacy behaviour of re-subscription (which ignores the
  updates of `user` and `checkpoint`).                                             
- A second call of `validateFrameworkSubscription()` is performed after the     
  authorization. This is a crude fix of the already existing race between       
  two re-subscriptions against an empty master (see MESOS-9763). It is necessary 
  because otherwise the change in `Framework::update()` would exacerbate this 
  race (namely, it would be possible to crash the master).


Diffs
-----

  src/master/framework.cpp 05f5514c589b2dba08afe77281e5fbc4e29f232b 
  src/master/master.hpp 7d9732f1e432f6f0290d234242864cbdbf381fa8 
  src/master/master.cpp a8ee6297e1587c160a47b806914d3c3aa4f35cd7 
  src/master/validation.hpp 95638a17052ece6c957aa76e4cead8d7bfe82024 
  src/master/validation.cpp 9fb0850987ce385d345302cac9721adead7181b8 


Diff: https://reviews.apache.org/r/70530/diff/3/


Testing
-------


Thanks,

Andrei Sekretenko