You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@mesos.apache.org by Benjamin Bannier <bb...@apache.org> on 2017/10/09 16:35:53 UTC

Review Request 62847: Did not send total agent resources if no resource providers are present.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/62847/
-----------------------------------------------------------

Review request for mesos, Jie Yu and Jan Schlicht.


Bugs: MESOS-8058
    https://issues.apache.org/jira/browse/MESOS-8058


Repository: mesos


Description
-------

An agent sending an 'UpdateSlaveMessage' including a total can race
with a master sending a 'CheckpointResourcesMessage'. Since
the total in 'UpdateSlaveMessage' is intended to inform the master
about changes to resource provider resources (the resources of the
agent itself are static), and 'CheckpointResourcesMessage' is used by
master not yet capable of offer operation feedback, we here explicitly
disallow combining an agent with resource providers and a master not
supporting offer operation feedback.


Diffs
-----

  src/slave/slave.cpp 2e05637ed10a39eb3f4ce953506b1cb5d50caf3c 
  src/tests/oversubscription_tests.cpp 09b4a423a8a9fc543f5456010e7c54919500f78b 


Diff: https://reviews.apache.org/r/62847/diff/1/


Testing
-------

`make check`


Thanks,

Benjamin Bannier


Re: Review Request 62847: Did not send total agent resources if no resource providers are present.

Posted by Jie Yu <yu...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/62847/#review187470
-----------------------------------------------------------




src/slave/slave.cpp
Lines 1268 (patched)
<https://reviews.apache.org/r/62847/#comment264478>

    Consider the following cases:
    
    1. Old master, new agent
    
    For this case, old master will send `CheckpointResourcesMessage` when the new agent re-registers, and will ignore the 'total' part in the `UpdateSlaveMessage`. So even without this patch, the master and agent's view on total resources will be in sync.
    
    2. New master, old agent
    
    For this case, the old agent will not generate `UpdateSlaveMessage` for updating total during slave re-regisration. Therefore, even without this patch, the master and agent's view will be in sync if the new master always send `CheckpointResourcesMessage` on slave re-registration.
    
    However, if we want to move to a world that the new master does not reply `CheckpointResourcesMessage` on slave re-registration, we need to use slave capabilities. Basically, do not reply `CheckpointResourcesMessage` if it's a new agent. For agent, we have to because it won't send out a total to the master.
    
    3. New master, new agent
    
    For this case, the new agent will always send a `UpdateSlaveMessage` after (re-)registered. And the new master won't reply with `CheckpointResourcesMessage` on slave re-registration because it's a new agent based on the slave capabilities.
    
    I see several benefits of this apporach comparing to the one proposed in this patch:
    
    (1) We potentially allow new agent to change its total resources, not limited to LRP related resources.
    (2) Agent should ideally be source of truth in Mesos world. In the previous case, we think the master is source of truth. 
    (3) Think about it more, we should treat this as the same case as when the old operation fails on agent default resources. It does not make sense to "retry" that operation. This will be more consistent comparing to other operations we plan to add, which we don't retry.
    
    Let me know what do you think!


- Jie Yu


On Oct. 9, 2017, 4:35 p.m., Benjamin Bannier wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/62847/
> -----------------------------------------------------------
> 
> (Updated Oct. 9, 2017, 4:35 p.m.)
> 
> 
> Review request for mesos, Jie Yu and Jan Schlicht.
> 
> 
> Bugs: MESOS-8058
>     https://issues.apache.org/jira/browse/MESOS-8058
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> An agent sending an 'UpdateSlaveMessage' including a total can race
> with a master sending a 'CheckpointResourcesMessage'. Since
> the total in 'UpdateSlaveMessage' is intended to inform the master
> about changes to resource provider resources (the resources of the
> agent itself are static), and 'CheckpointResourcesMessage' is used by
> master not yet capable of offer operation feedback, we here explicitly
> disallow combining an agent with resource providers and a master not
> supporting offer operation feedback.
> 
> 
> Diffs
> -----
> 
>   src/slave/slave.cpp 2e05637ed10a39eb3f4ce953506b1cb5d50caf3c 
>   src/tests/oversubscription_tests.cpp 09b4a423a8a9fc543f5456010e7c54919500f78b 
> 
> 
> Diff: https://reviews.apache.org/r/62847/diff/1/
> 
> 
> Testing
> -------
> 
> `make check`
> 
> 
> Thanks,
> 
> Benjamin Bannier
> 
>


Re: Review Request 62847: Did not send total agent resources if no resource providers are present.

Posted by Mesos Reviewbot <re...@mesos.apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/62847/#review187432
-----------------------------------------------------------



Patch looks great!

Reviews applied: [62843, 62834, 62847]

Passed command: export OS='ubuntu:14.04' BUILDTOOL='autotools' COMPILER='gcc' CONFIGURATION='--verbose --disable-libtool-wrappers' ENVIRONMENT='GLOG_v=1 MESOS_VERBOSE=1'; ./support/docker-build.sh

- Mesos Reviewbot


On Oct. 9, 2017, 4:35 p.m., Benjamin Bannier wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/62847/
> -----------------------------------------------------------
> 
> (Updated Oct. 9, 2017, 4:35 p.m.)
> 
> 
> Review request for mesos, Jie Yu and Jan Schlicht.
> 
> 
> Bugs: MESOS-8058
>     https://issues.apache.org/jira/browse/MESOS-8058
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> An agent sending an 'UpdateSlaveMessage' including a total can race
> with a master sending a 'CheckpointResourcesMessage'. Since
> the total in 'UpdateSlaveMessage' is intended to inform the master
> about changes to resource provider resources (the resources of the
> agent itself are static), and 'CheckpointResourcesMessage' is used by
> master not yet capable of offer operation feedback, we here explicitly
> disallow combining an agent with resource providers and a master not
> supporting offer operation feedback.
> 
> 
> Diffs
> -----
> 
>   src/slave/slave.cpp 2e05637ed10a39eb3f4ce953506b1cb5d50caf3c 
>   src/tests/oversubscription_tests.cpp 09b4a423a8a9fc543f5456010e7c54919500f78b 
> 
> 
> Diff: https://reviews.apache.org/r/62847/diff/1/
> 
> 
> Testing
> -------
> 
> `make check`
> 
> 
> Thanks,
> 
> Benjamin Bannier
> 
>


Re: Review Request 62847: Did not send total agent resources if no resource providers are present.

Posted by Mesos Reviewbot Windows <re...@mesos.apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/62847/#review187443
-----------------------------------------------------------



PASS: Mesos patch 62847 was successfully built and tested.

Reviews applied: `['62843', '62834', '62847']`

All the build artifacts available at: http://dcos-win.westus.cloudapp.azure.com/mesos-build/review/62847

- Mesos Reviewbot Windows


On Oct. 9, 2017, 4:35 p.m., Benjamin Bannier wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/62847/
> -----------------------------------------------------------
> 
> (Updated Oct. 9, 2017, 4:35 p.m.)
> 
> 
> Review request for mesos, Jie Yu and Jan Schlicht.
> 
> 
> Bugs: MESOS-8058
>     https://issues.apache.org/jira/browse/MESOS-8058
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> An agent sending an 'UpdateSlaveMessage' including a total can race
> with a master sending a 'CheckpointResourcesMessage'. Since
> the total in 'UpdateSlaveMessage' is intended to inform the master
> about changes to resource provider resources (the resources of the
> agent itself are static), and 'CheckpointResourcesMessage' is used by
> master not yet capable of offer operation feedback, we here explicitly
> disallow combining an agent with resource providers and a master not
> supporting offer operation feedback.
> 
> 
> Diffs
> -----
> 
>   src/slave/slave.cpp 2e05637ed10a39eb3f4ce953506b1cb5d50caf3c 
>   src/tests/oversubscription_tests.cpp 09b4a423a8a9fc543f5456010e7c54919500f78b 
> 
> 
> Diff: https://reviews.apache.org/r/62847/diff/1/
> 
> 
> Testing
> -------
> 
> `make check`
> 
> 
> Thanks,
> 
> Benjamin Bannier
> 
>


Re: Review Request 62847: Did not send total agent resources if no resource providers are present.

Posted by Jie Yu <yu...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/62847/#review187488
-----------------------------------------------------------




src/slave/slave.cpp
Lines 1275 (patched)
<https://reviews.apache.org/r/62847/#comment264514>

    Noticed some other issues:
    
    1) What about 'reregistered' path?
    2) What if LRP does exist, but its total has been changed to zero?
    3) What if UpdateSlaveMessage is sent due to LRP registration? Will you have the same race condition? In other words, while the master is sending `CheckpointResourcesMessage`, an LRP registered with the agent and result in a `UpdateSlaveMessage` being sent out with the current total of the agent default resources.


- Jie Yu


On Oct. 9, 2017, 4:35 p.m., Benjamin Bannier wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/62847/
> -----------------------------------------------------------
> 
> (Updated Oct. 9, 2017, 4:35 p.m.)
> 
> 
> Review request for mesos, Jie Yu and Jan Schlicht.
> 
> 
> Bugs: MESOS-8058
>     https://issues.apache.org/jira/browse/MESOS-8058
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> An agent sending an 'UpdateSlaveMessage' including a total can race
> with a master sending a 'CheckpointResourcesMessage'. Since
> the total in 'UpdateSlaveMessage' is intended to inform the master
> about changes to resource provider resources (the resources of the
> agent itself are static), and 'CheckpointResourcesMessage' is used by
> master not yet capable of offer operation feedback, we here explicitly
> disallow combining an agent with resource providers and a master not
> supporting offer operation feedback.
> 
> 
> Diffs
> -----
> 
>   src/slave/slave.cpp 2e05637ed10a39eb3f4ce953506b1cb5d50caf3c 
>   src/tests/oversubscription_tests.cpp 09b4a423a8a9fc543f5456010e7c54919500f78b 
> 
> 
> Diff: https://reviews.apache.org/r/62847/diff/1/
> 
> 
> Testing
> -------
> 
> `make check`
> 
> 
> Thanks,
> 
> Benjamin Bannier
> 
>