You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@mesos.apache.org by Vinod Kone <vi...@gmail.com> on 2014/03/21 19:42:10 UTC

Review Request 19542: Fixed a bug to properly re-register an authenticating framework.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/19542/
-----------------------------------------------------------

Review request for mesos and Ben Mahler.


Bugs: MESOS-1135
    https://issues.apache.org/jira/browse/MESOS-1135


Repository: mesos-git


Description
-------

See bug for details.


Diffs
-----

  src/master/master.cpp 3b28f72be2016997e30649d2b12ed0e8f1a57dd5 
  src/tests/fault_tolerance_tests.cpp 3f4796a0f5ed236d20906ee90e2cf8d30b5e228b 
  src/tests/master_tests.cpp 42c5a77425209d96f8e7286102a672be50463a40 

Diff: https://reviews.apache.org/r/19542/diff/


Testing
-------

Updated 2 tests to verify the behavior.

make check

./bin/mesos-tests.sh --gtest_filter="*FaultToleranceTest.FrameworkReregister*:*MasterInfoOnReElection*" --gtest_repeat=1000 --gtest_break_on_failure


Thanks,

Vinod Kone


Re: Review Request 19542: Fixed a bug to properly re-register an authenticating framework.

Posted by Vinod Kone <vi...@gmail.com>.

> On March 21, 2014, 8:46 p.m., Adam B wrote:
> > src/master/master.cpp, line 974
> > <https://reviews.apache.org/r/19542/diff/1/?file=531713#file531713line974>
> >
> >     Since allocator->frameworkActivated() calls allocate(), shouldn't we make the frameworkActivated() call after we "Remove any offers [previously] sent to this framework"? Otherwise, the allocate() call (in frameworkActivated) might allocate/offer new resources and then the following loop could call resourcesRecovered on the newly received offer/resources.

I don't think this is a problem because the offers sent by the allocator are only processed by the master (Master::offer) after it finishes executing this function. But I do think calling frameworkActivated() after removing current offers make sense here and in failoverFramework(). I will do that in a subsequent review. Added a TODO. Is that ok?


> On March 21, 2014, 8:46 p.m., Adam B wrote:
> > src/master/master.cpp, lines 2744-2747
> > <https://reviews.apache.org/r/19542/diff/1/?file=531713#file531713line2744>
> >
> >     Should we be setting framework->active = false; here? I know we're just moving it into frameworks.completed, but none of those should think that they're "active".

Based on how we resurrect a completed framework (Master::readdCompletedFramework) this seems fine. But I would like to do this also in a different review just to avoid doing too many changes in this review. Added a TODO.


- Vinod


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/19542/#review38179
-----------------------------------------------------------


On March 21, 2014, 6:42 p.m., Vinod Kone wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/19542/
> -----------------------------------------------------------
> 
> (Updated March 21, 2014, 6:42 p.m.)
> 
> 
> Review request for mesos and Ben Mahler.
> 
> 
> Bugs: MESOS-1135
>     https://issues.apache.org/jira/browse/MESOS-1135
> 
> 
> Repository: mesos-git
> 
> 
> Description
> -------
> 
> See bug for details.
> 
> 
> Diffs
> -----
> 
>   src/master/master.cpp 3b28f72be2016997e30649d2b12ed0e8f1a57dd5 
>   src/tests/fault_tolerance_tests.cpp 3f4796a0f5ed236d20906ee90e2cf8d30b5e228b 
>   src/tests/master_tests.cpp 42c5a77425209d96f8e7286102a672be50463a40 
> 
> Diff: https://reviews.apache.org/r/19542/diff/
> 
> 
> Testing
> -------
> 
> Updated 2 tests to verify the behavior.
> 
> make check
> 
> ./bin/mesos-tests.sh --gtest_filter="*FaultToleranceTest.FrameworkReregister*:*MasterInfoOnReElection*" --gtest_repeat=1000 --gtest_break_on_failure
> 
> 
> Thanks,
> 
> Vinod Kone
> 
>


Re: Review Request 19542: Fixed a bug to properly re-register an authenticating framework.

Posted by Adam B <ad...@mesosphere.io>.

> On March 21, 2014, 1:46 p.m., Adam B wrote:
> > src/master/master.cpp, line 974
> > <https://reviews.apache.org/r/19542/diff/1/?file=531713#file531713line974>
> >
> >     Since allocator->frameworkActivated() calls allocate(), shouldn't we make the frameworkActivated() call after we "Remove any offers [previously] sent to this framework"? Otherwise, the allocate() call (in frameworkActivated) might allocate/offer new resources and then the following loop could call resourcesRecovered on the newly received offer/resources.
> 
> Vinod Kone wrote:
>     I don't think this is a problem because the offers sent by the allocator are only processed by the master (Master::offer) after it finishes executing this function. But I do think calling frameworkActivated() after removing current offers make sense here and in failoverFramework(). I will do that in a subsequent review. Added a TODO. Is that ok?

Ok. TODOs acknowledged.


> On March 21, 2014, 1:46 p.m., Adam B wrote:
> > src/master/master.cpp, lines 2744-2747
> > <https://reviews.apache.org/r/19542/diff/1/?file=531713#file531713line2744>
> >
> >     Should we be setting framework->active = false; here? I know we're just moving it into frameworks.completed, but none of those should think that they're "active".
> 
> Vinod Kone wrote:
>     Based on how we resurrect a completed framework (Master::readdCompletedFramework) this seems fine. But I would like to do this also in a different review just to avoid doing too many changes in this review. Added a TODO.

No big deal. I don't think the active flag on a completed framework would cause a difference in behavior (yet).
FYI, readdCompletedFramework doesn't really "resurrect" a completed framework which I interpret as adding it back to frameworks.activated. Instead, it is used on master failover so that a (re)registering slave will provide the new master with the state of its completed frameworks. The completed framework remains completed, but is added (back) to frameworks.completed on the new master. Yet another reason (doxygen) method comments would be helpful. I'll create a doxygen JIRA so we can get moving on that discussion.


- Adam


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/19542/#review38179
-----------------------------------------------------------


On March 21, 2014, 2:19 p.m., Vinod Kone wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/19542/
> -----------------------------------------------------------
> 
> (Updated March 21, 2014, 2:19 p.m.)
> 
> 
> Review request for mesos and Ben Mahler.
> 
> 
> Bugs: MESOS-1135
>     https://issues.apache.org/jira/browse/MESOS-1135
> 
> 
> Repository: mesos-git
> 
> 
> Description
> -------
> 
> See bug for details.
> 
> 
> Diffs
> -----
> 
>   src/master/master.cpp 3b28f72be2016997e30649d2b12ed0e8f1a57dd5 
>   src/tests/fault_tolerance_tests.cpp 3f4796a0f5ed236d20906ee90e2cf8d30b5e228b 
>   src/tests/master_tests.cpp 42c5a77425209d96f8e7286102a672be50463a40 
> 
> Diff: https://reviews.apache.org/r/19542/diff/
> 
> 
> Testing
> -------
> 
> Updated 2 tests to verify the behavior.
> 
> make check
> 
> ./bin/mesos-tests.sh --gtest_filter="*FaultToleranceTest.FrameworkReregister*:*MasterInfoOnReElection*" --gtest_repeat=1000 --gtest_break_on_failure
> 
> 
> Thanks,
> 
> Vinod Kone
> 
>


Re: Review Request 19542: Fixed a bug to properly re-register an authenticating framework.

Posted by Adam B <ad...@mesosphere.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/19542/#review38179
-----------------------------------------------------------


Nice catch. Just a couple of comments based on looking deeper into frameworks.activated (which actually includes deactivated-but-not-completed frameworks too), framework->active, and allocator->frameworkActivated.


src/master/master.cpp
<https://reviews.apache.org/r/19542/#comment70172>

    Since allocator->frameworkActivated() calls allocate(), shouldn't we make the frameworkActivated() call after we "Remove any offers [previously] sent to this framework"? Otherwise, the allocate() call (in frameworkActivated) might allocate/offer new resources and then the following loop could call resourcesRecovered on the newly received offer/resources.



src/master/master.cpp
<https://reviews.apache.org/r/19542/#comment70179>

    Should we be setting framework->active = false; here? I know we're just moving it into frameworks.completed, but none of those should think that they're "active".


- Adam B


On March 21, 2014, 11:42 a.m., Vinod Kone wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/19542/
> -----------------------------------------------------------
> 
> (Updated March 21, 2014, 11:42 a.m.)
> 
> 
> Review request for mesos and Ben Mahler.
> 
> 
> Bugs: MESOS-1135
>     https://issues.apache.org/jira/browse/MESOS-1135
> 
> 
> Repository: mesos-git
> 
> 
> Description
> -------
> 
> See bug for details.
> 
> 
> Diffs
> -----
> 
>   src/master/master.cpp 3b28f72be2016997e30649d2b12ed0e8f1a57dd5 
>   src/tests/fault_tolerance_tests.cpp 3f4796a0f5ed236d20906ee90e2cf8d30b5e228b 
>   src/tests/master_tests.cpp 42c5a77425209d96f8e7286102a672be50463a40 
> 
> Diff: https://reviews.apache.org/r/19542/diff/
> 
> 
> Testing
> -------
> 
> Updated 2 tests to verify the behavior.
> 
> make check
> 
> ./bin/mesos-tests.sh --gtest_filter="*FaultToleranceTest.FrameworkReregister*:*MasterInfoOnReElection*" --gtest_repeat=1000 --gtest_break_on_failure
> 
> 
> Thanks,
> 
> Vinod Kone
> 
>


Re: Review Request 19542: Fixed a bug to properly re-register an authenticating framework.

Posted by Ben Mahler <be...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/19542/#review38178
-----------------------------------------------------------

Ship it!


Would love to see an activate(Framework*) call that is the opposite of deactivate(Framework*) and we just always call activate when a framework (re-)registers, these could be no-ops when the framework is already in the corresponding state. Maybe let's leave a TODO for that?

- Ben Mahler


On March 21, 2014, 6:42 p.m., Vinod Kone wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/19542/
> -----------------------------------------------------------
> 
> (Updated March 21, 2014, 6:42 p.m.)
> 
> 
> Review request for mesos and Ben Mahler.
> 
> 
> Bugs: MESOS-1135
>     https://issues.apache.org/jira/browse/MESOS-1135
> 
> 
> Repository: mesos-git
> 
> 
> Description
> -------
> 
> See bug for details.
> 
> 
> Diffs
> -----
> 
>   src/master/master.cpp 3b28f72be2016997e30649d2b12ed0e8f1a57dd5 
>   src/tests/fault_tolerance_tests.cpp 3f4796a0f5ed236d20906ee90e2cf8d30b5e228b 
>   src/tests/master_tests.cpp 42c5a77425209d96f8e7286102a672be50463a40 
> 
> Diff: https://reviews.apache.org/r/19542/diff/
> 
> 
> Testing
> -------
> 
> Updated 2 tests to verify the behavior.
> 
> make check
> 
> ./bin/mesos-tests.sh --gtest_filter="*FaultToleranceTest.FrameworkReregister*:*MasterInfoOnReElection*" --gtest_repeat=1000 --gtest_break_on_failure
> 
> 
> Thanks,
> 
> Vinod Kone
> 
>


Re: Review Request 19542: Fixed a bug to properly re-register an authenticating framework.

Posted by Vinod Kone <vi...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/19542/
-----------------------------------------------------------

(Updated March 21, 2014, 9:19 p.m.)


Review request for mesos and Ben Mahler.


Changes
-------

sorry forgot to rebase. NNFR.


Bugs: MESOS-1135
    https://issues.apache.org/jira/browse/MESOS-1135


Repository: mesos-git


Description
-------

See bug for details.


Diffs (updated)
-----

  src/master/master.cpp 3b28f72be2016997e30649d2b12ed0e8f1a57dd5 
  src/tests/fault_tolerance_tests.cpp 3f4796a0f5ed236d20906ee90e2cf8d30b5e228b 
  src/tests/master_tests.cpp 42c5a77425209d96f8e7286102a672be50463a40 

Diff: https://reviews.apache.org/r/19542/diff/


Testing
-------

Updated 2 tests to verify the behavior.

make check

./bin/mesos-tests.sh --gtest_filter="*FaultToleranceTest.FrameworkReregister*:*MasterInfoOnReElection*" --gtest_repeat=1000 --gtest_break_on_failure


Thanks,

Vinod Kone


Re: Review Request 19542: Fixed a bug to properly re-register an authenticating framework.

Posted by Vinod Kone <vi...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/19542/
-----------------------------------------------------------

(Updated March 21, 2014, 9:18 p.m.)


Review request for mesos and Ben Mahler.


Changes
-------

Added TODOs per BenM's and Adam's feedback. NNFR.


Bugs: MESOS-1135
    https://issues.apache.org/jira/browse/MESOS-1135


Repository: mesos-git


Description
-------

See bug for details.


Diffs (updated)
-----

  src/master/master.cpp 3b28f72be2016997e30649d2b12ed0e8f1a57dd5 
  src/tests/fault_tolerance_tests.cpp 3f4796a0f5ed236d20906ee90e2cf8d30b5e228b 
  src/tests/master_tests.cpp 42c5a77425209d96f8e7286102a672be50463a40 
  support/verify-reviews.py 2ee777f8a1d21afdbf11c31f04a3d9d4d869a330 

Diff: https://reviews.apache.org/r/19542/diff/


Testing
-------

Updated 2 tests to verify the behavior.

make check

./bin/mesos-tests.sh --gtest_filter="*FaultToleranceTest.FrameworkReregister*:*MasterInfoOnReElection*" --gtest_repeat=1000 --gtest_break_on_failure


Thanks,

Vinod Kone