You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@mesos.apache.org by Vinod Kone <vi...@gmail.com> on 2013/03/15 05:54:03 UTC

Review Request: Fixed master to properly remove a non-checkpointing framework when a checkpointing slave exits.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/9945/
-----------------------------------------------------------

Review request for mesos, Benjamin Hindman and Ben Mahler.


Description
-------

Sending this out before I write any tests, to get feedback on the semantics.


Diffs
-----

  src/master/master.hpp c9b4b3f3ff562b75f74a605f3cd7a6957f00d9e7 
  src/master/master.cpp ecefc8b9fc14b4f0950b053d47bb2d6e464f8724 

Diff: https://reviews.apache.org/r/9945/diff/


Testing
-------

None yet. Will update this diff when I write one.


Thanks,

Vinod Kone


Re: Review Request: Fixed master to properly remove a non-checkpointing framework when a checkpointing slave exits.

Posted by Vinod Kone <vi...@gmail.com>.

> On March 15, 2013, 9:27 a.m., Benjamin Hindman wrote:
> > src/master/master.cpp, line 1839
> > <https://reviews.apache.org/r/9945/diff/1/?file=270895#file270895line1839>
> >
> >     But even the local framework object would be insufficient if that master failed over to another one right?

Fwiw, this was just copy pasted from removeFramework(Framework*), which I think you wrote? Any how, decided to just refer to that function instead of repeating the comment here, so that it easy to keep it in sync.

Having said that, this is not a new issue and iiuc, cannot be solved without a registrar?


> On March 15, 2013, 9:27 a.m., Benjamin Hindman wrote:
> > src/master/master.cpp, lines 541-544
> > <https://reviews.apache.org/r/9945/diff/1/?file=270895#file270895line541>
> >
> >     foreachvalue (Task* task, utils::copy(slave->tasks)) {
> >       if (!ids.contains(task->framework_id())) {
> >         ...;
> >       } 
> >     }

thank you.


> On March 15, 2013, 9:27 a.m., Benjamin Hindman wrote:
> > src/master/master.cpp, lines 1878-1880
> > <https://reviews.apache.org/r/9945/diff/1/?file=270895#file270895line1878>
> >
> >     if (slave->executors.contains(framework->id)) {
> >       foreachkey (const ExecutorID& executor, slave->executors[framework->id]) {
> >         ...;
> >       }
> >     }

duh. thank you.


- Vinod


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/9945/#review17965
-----------------------------------------------------------


On March 15, 2013, 4:54 a.m., Vinod Kone wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/9945/
> -----------------------------------------------------------
> 
> (Updated March 15, 2013, 4:54 a.m.)
> 
> 
> Review request for mesos, Benjamin Hindman and Ben Mahler.
> 
> 
> Description
> -------
> 
> Sending this out before I write any tests, to get feedback on the semantics.
> 
> 
> Diffs
> -----
> 
>   src/master/master.hpp c9b4b3f3ff562b75f74a605f3cd7a6957f00d9e7 
>   src/master/master.cpp ecefc8b9fc14b4f0950b053d47bb2d6e464f8724 
> 
> Diff: https://reviews.apache.org/r/9945/diff/
> 
> 
> Testing
> -------
> 
> None yet. Will update this diff when I write one.
> 
> 
> Thanks,
> 
> Vinod Kone
> 
>


Re: Review Request: Fixed master to properly remove a non-checkpointing framework when a checkpointing slave exits.

Posted by Benjamin Hindman <be...@berkeley.edu>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/9945/#review17965
-----------------------------------------------------------



src/master/master.cpp
<https://reviews.apache.org/r/9945/#comment37968>

    foreachvalue (Task* task, utils::copy(slave->tasks)) {
      if (!ids.contains(task->framework_id())) {
        ...;
      } 
    }



src/master/master.cpp
<https://reviews.apache.org/r/9945/#comment37967>

    But even the local framework object would be insufficient if that master failed over to another one right?



src/master/master.cpp
<https://reviews.apache.org/r/9945/#comment37965>

    if (slave->executors.contains(framework->id)) {
      foreachkey (const ExecutorID& executor, slave->executors[framework->id]) {
        ...;
      }
    }


- Benjamin Hindman


On March 15, 2013, 4:54 a.m., Vinod Kone wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/9945/
> -----------------------------------------------------------
> 
> (Updated March 15, 2013, 4:54 a.m.)
> 
> 
> Review request for mesos, Benjamin Hindman and Ben Mahler.
> 
> 
> Description
> -------
> 
> Sending this out before I write any tests, to get feedback on the semantics.
> 
> 
> Diffs
> -----
> 
>   src/master/master.hpp c9b4b3f3ff562b75f74a605f3cd7a6957f00d9e7 
>   src/master/master.cpp ecefc8b9fc14b4f0950b053d47bb2d6e464f8724 
> 
> Diff: https://reviews.apache.org/r/9945/diff/
> 
> 
> Testing
> -------
> 
> None yet. Will update this diff when I write one.
> 
> 
> Thanks,
> 
> Vinod Kone
> 
>


Re: Review Request: Fixed master to properly remove a non-checkpointing framework when a checkpointing slave exits.

Posted by Benjamin Mahler <be...@gmail.com>.
Nice!


On Mon, Apr 8, 2013 at 10:16 AM, Vinod Kone <vi...@gmail.com> wrote:

>    This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/9945/
>
> On April 8th, 2013, 4:51 p.m., *Ben Mahler* wrote:
>
> Sounds like the slave could also use some states, e.g.
> INITIALIZING
> RUNNING
> CHECKPOINTING
> TERMINATING
>
>  Yup. Done in another review: https://reviews.apache.org/r/10268/
>
>
> - Vinod
>
> On March 15th, 2013, 8:37 p.m., Vinod Kone wrote:
>   Review request for mesos, Benjamin Hindman and Ben Mahler.
> By Vinod Kone.
>
> *Updated March 15, 2013, 8:37 p.m.*
> Description
>
> Sending this out before I write any tests, to get feedback on the semantics.
>
>   Testing
>
> make check.
>
>   Diffs
>
>    - src/master/master.hpp (c9b4b3f3ff562b75f74a605f3cd7a6957f00d9e7)
>    - src/master/master.cpp (ecefc8b9fc14b4f0950b053d47bb2d6e464f8724)
>    - src/tests/slave_recovery_tests.cpp
>    (a1fa485d917e882ad1d950532758c5978e8fbb05)
>
> View Diff <https://reviews.apache.org/r/9945/diff/>
>

Re: Review Request: Fixed master to properly remove a non-checkpointing framework when a checkpointing slave exits.

Posted by Vinod Kone <vi...@gmail.com>.

> On April 8, 2013, 4:51 p.m., Ben Mahler wrote:
> > Sounds like the slave could also use some states, e.g.
> > INITIALIZING
> > RUNNING
> > CHECKPOINTING
> > TERMINATING

Yup. Done in another review: https://reviews.apache.org/r/10268/


- Vinod


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/9945/#review18777
-----------------------------------------------------------


On March 15, 2013, 8:37 p.m., Vinod Kone wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/9945/
> -----------------------------------------------------------
> 
> (Updated March 15, 2013, 8:37 p.m.)
> 
> 
> Review request for mesos, Benjamin Hindman and Ben Mahler.
> 
> 
> Description
> -------
> 
> Sending this out before I write any tests, to get feedback on the semantics.
> 
> 
> Diffs
> -----
> 
>   src/master/master.hpp c9b4b3f3ff562b75f74a605f3cd7a6957f00d9e7 
>   src/master/master.cpp ecefc8b9fc14b4f0950b053d47bb2d6e464f8724 
>   src/tests/slave_recovery_tests.cpp a1fa485d917e882ad1d950532758c5978e8fbb05 
> 
> Diff: https://reviews.apache.org/r/9945/diff/
> 
> 
> Testing
> -------
> 
> make check.
> 
> 
> Thanks,
> 
> Vinod Kone
> 
>


Re: Review Request: Fixed master to properly remove a non-checkpointing framework when a checkpointing slave exits.

Posted by Ben Mahler <be...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/9945/#review18777
-----------------------------------------------------------


Sounds like the slave could also use some states, e.g.
INITIALIZING
RUNNING
CHECKPOINTING
TERMINATING

- Ben Mahler


On March 15, 2013, 8:37 p.m., Vinod Kone wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/9945/
> -----------------------------------------------------------
> 
> (Updated March 15, 2013, 8:37 p.m.)
> 
> 
> Review request for mesos, Benjamin Hindman and Ben Mahler.
> 
> 
> Description
> -------
> 
> Sending this out before I write any tests, to get feedback on the semantics.
> 
> 
> Diffs
> -----
> 
>   src/master/master.hpp c9b4b3f3ff562b75f74a605f3cd7a6957f00d9e7 
>   src/master/master.cpp ecefc8b9fc14b4f0950b053d47bb2d6e464f8724 
>   src/tests/slave_recovery_tests.cpp a1fa485d917e882ad1d950532758c5978e8fbb05 
> 
> Diff: https://reviews.apache.org/r/9945/diff/
> 
> 
> Testing
> -------
> 
> make check.
> 
> 
> Thanks,
> 
> Vinod Kone
> 
>


Re: Review Request: Fixed master to properly remove a non-checkpointing framework when a checkpointing slave exits.

Posted by Vinod Kone <vi...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/9945/
-----------------------------------------------------------

(Updated March 15, 2013, 8:37 p.m.)


Review request for mesos, Benjamin Hindman and Ben Mahler.


Changes
-------

fixed minor typo in comment. no need for review. updated for posterity.


Description
-------

Sending this out before I write any tests, to get feedback on the semantics.


Diffs (updated)
-----

  src/master/master.hpp c9b4b3f3ff562b75f74a605f3cd7a6957f00d9e7 
  src/master/master.cpp ecefc8b9fc14b4f0950b053d47bb2d6e464f8724 
  src/tests/slave_recovery_tests.cpp a1fa485d917e882ad1d950532758c5978e8fbb05 

Diff: https://reviews.apache.org/r/9945/diff/


Testing (updated)
-------

make check.


Thanks,

Vinod Kone


Re: Review Request: Fixed master to properly remove a non-checkpointing framework when a checkpointing slave exits.

Posted by Benjamin Hindman <be...@berkeley.edu>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/9945/#review17995
-----------------------------------------------------------

Ship it!


Ship It!

- Benjamin Hindman


On March 15, 2013, 7:48 p.m., Vinod Kone wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/9945/
> -----------------------------------------------------------
> 
> (Updated March 15, 2013, 7:48 p.m.)
> 
> 
> Review request for mesos, Benjamin Hindman and Ben Mahler.
> 
> 
> Description
> -------
> 
> Sending this out before I write any tests, to get feedback on the semantics.
> 
> 
> Diffs
> -----
> 
>   src/master/master.hpp c9b4b3f3ff562b75f74a605f3cd7a6957f00d9e7 
>   src/master/master.cpp ecefc8b9fc14b4f0950b053d47bb2d6e464f8724 
>   src/tests/slave_recovery_tests.cpp a1fa485d917e882ad1d950532758c5978e8fbb05 
> 
> Diff: https://reviews.apache.org/r/9945/diff/
> 
> 
> Testing
> -------
> 
> None yet. Will update this diff when I write one.
> 
> 
> Thanks,
> 
> Vinod Kone
> 
>


Re: Review Request: Fixed master to properly remove a non-checkpointing framework when a checkpointing slave exits.

Posted by Vinod Kone <vi...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/9945/
-----------------------------------------------------------

(Updated March 15, 2013, 7:48 p.m.)


Review request for mesos, Benjamin Hindman and Ben Mahler.


Changes
-------

Added a test.


Description
-------

Sending this out before I write any tests, to get feedback on the semantics.


Diffs (updated)
-----

  src/master/master.hpp c9b4b3f3ff562b75f74a605f3cd7a6957f00d9e7 
  src/master/master.cpp ecefc8b9fc14b4f0950b053d47bb2d6e464f8724 
  src/tests/slave_recovery_tests.cpp a1fa485d917e882ad1d950532758c5978e8fbb05 

Diff: https://reviews.apache.org/r/9945/diff/


Testing
-------

None yet. Will update this diff when I write one.


Thanks,

Vinod Kone


Re: Review Request: Fixed master to properly remove a non-checkpointing framework when a checkpointing slave exits.

Posted by Benjamin Hindman <be...@berkeley.edu>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/9945/#review17986
-----------------------------------------------------------

Ship it!


Ship It!

- Benjamin Hindman


On March 15, 2013, 5:20 p.m., Vinod Kone wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/9945/
> -----------------------------------------------------------
> 
> (Updated March 15, 2013, 5:20 p.m.)
> 
> 
> Review request for mesos, Benjamin Hindman and Ben Mahler.
> 
> 
> Description
> -------
> 
> Sending this out before I write any tests, to get feedback on the semantics.
> 
> 
> Diffs
> -----
> 
>   src/master/master.hpp c9b4b3f3ff562b75f74a605f3cd7a6957f00d9e7 
>   src/master/master.cpp ecefc8b9fc14b4f0950b053d47bb2d6e464f8724 
> 
> Diff: https://reviews.apache.org/r/9945/diff/
> 
> 
> Testing
> -------
> 
> None yet. Will update this diff when I write one.
> 
> 
> Thanks,
> 
> Vinod Kone
> 
>


Re: Review Request: Fixed master to properly remove a non-checkpointing framework when a checkpointing slave exits.

Posted by Vinod Kone <vi...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/9945/
-----------------------------------------------------------

(Updated March 15, 2013, 5:20 p.m.)


Review request for mesos, Benjamin Hindman and Ben Mahler.


Description
-------

Sending this out before I write any tests, to get feedback on the semantics.


Diffs (updated)
-----

  src/master/master.hpp c9b4b3f3ff562b75f74a605f3cd7a6957f00d9e7 
  src/master/master.cpp ecefc8b9fc14b4f0950b053d47bb2d6e464f8724 

Diff: https://reviews.apache.org/r/9945/diff/


Testing
-------

None yet. Will update this diff when I write one.


Thanks,

Vinod Kone


Re: Review Request: Fixed master to properly remove a non-checkpointing framework when a checkpointing slave exits.

Posted by Vinod Kone <vi...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/9945/
-----------------------------------------------------------

(Updated March 15, 2013, 5:19 p.m.)


Review request for mesos, Benjamin Hindman and Ben Mahler.


Changes
-------

benh's comments.

also, clearly explained the slave disconnected semantics with a comment in exited().


Description
-------

Sending this out before I write any tests, to get feedback on the semantics.


Diffs (updated)
-----

  src/master/master.hpp c9b4b3f3ff562b75f74a605f3cd7a6957f00d9e7 
  src/master/master.cpp ecefc8b9fc14b4f0950b053d47bb2d6e464f8724 

Diff: https://reviews.apache.org/r/9945/diff/


Testing
-------

None yet. Will update this diff when I write one.


Thanks,

Vinod Kone