Posted to dev@mesos.apache.org by Dominic Hamon <dh...@twopensource.com> on 2014/06/10 22:40:25 UTC

Review Request 22441: Paused the clock earlier to avoid early task failure message.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/22441/
-----------------------------------------------------------

Review request for mesos, Ian Downes and Vinod Kone.


Bugs: MESOS-1437
    https://issues.apache.org/jira/browse/MESOS-1437


Repository: mesos-git


Description
-------

Occasionally the task would fail due to an unknown container before we started the recovery. Moving the clock pause earlier ensures we are ready for the failure before we see it.
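
The ordering problem described above can be illustrated with a minimal, self-contained sketch. It assumes a hand-rolled manual clock; `FakeClock`, `runScenario`, and the tick values are hypothetical names for illustration only and are not the libprocess Clock API or the code in slave_recovery_tests.cpp.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical manual test clock. It only mimics the idea of pausing time
// in a test; it is not the libprocess Clock API.
class FakeClock {
 public:
  void pause() { paused_ = true; }

  // Models real time passing on its own. Ignored while the clock is paused.
  void elapse(int ticks) {
    if (!paused_) {
      now_ += ticks;
      fire();
    }
  }

  // Moves time forward explicitly; the test must have paused the clock first.
  void advance(int ticks) {
    assert(paused_);
    now_ += ticks;
    fire();
  }

  // Runs `callback` once `delay` ticks have passed.
  void schedule(int delay, std::function<void()> callback) {
    timers_.push_back(Timer{now_ + delay, std::move(callback)});
  }

 private:
  struct Timer {
    int deadline;
    std::function<void()> callback;
  };

  // Runs and removes every timer whose deadline has been reached.
  void fire() {
    std::vector<Timer> due, pending;
    for (Timer& timer : timers_) {
      if (timer.deadline <= now_) {
        due.push_back(std::move(timer));
      } else {
        pending.push_back(std::move(timer));
      }
    }
    timers_ = std::move(pending);
    for (Timer& timer : due) {
      timer.callback();
    }
  }

  bool paused_ = false;
  int now_ = 0;
  std::vector<Timer> timers_;
};

// Returns the order in which events happened. `pauseEarly` selects whether
// the clock is paused before or after the failure timer is scheduled.
std::vector<std::string> runScenario(bool pauseEarly) {
  FakeClock clock;
  std::vector<std::string> log;

  if (pauseEarly) {
    clock.pause();
  }

  // Assumption: restarting the slave schedules a delayed task failure.
  clock.schedule(1, [&log] { log.push_back("task failed"); });

  // Time that may pass before the test has set up its expectations.
  clock.elapse(5);

  if (!pauseEarly) {
    clock.pause();
  }

  log.push_back("expectations ready");
  clock.advance(1);
  return log;
}
```

In this sketch, runScenario(false) records "task failed" before "expectations ready", which is the flaky ordering, while runScenario(true) always records "expectations ready" first.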


Diffs
-----

  src/tests/slave_recovery_tests.cpp 44ffac40b9edc9940f17b5fbe1848d56cf53b69b 

Diff: https://reviews.apache.org/r/22441/diff/


Testing
-------

make check x 1500


Thanks,

Dominic Hamon


Re: Review Request 22441: Paused the clock earlier to avoid early task failure message.

Posted by Mesos ReviewBot <de...@mesos.apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/22441/#review45498
-----------------------------------------------------------


Patch looks great!

Reviews applied: [22441]

All tests passed.

- Mesos ReviewBot


Re: Review Request 22441: Paused the clock earlier to avoid early task failure message.

Posted by Vinod Kone <vi...@gmail.com>.

> On June 12, 2014, 6:05 p.m., Ian Downes wrote:
> > src/tests/slave_recovery_tests.cpp, line 3053
> > <https://reviews.apache.org/r/22441/diff/2/?file=607620#file607620line3053>
> >
> >     I think we also need to expect the wait() and do nothing for that as well. See the ticket for details.

+1


- Vinod


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/22441/#review45515
-----------------------------------------------------------


Re: Review Request 22441: Paused the clock earlier to avoid early task failure message.

Posted by Ian Downes <ia...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/22441/#review45515
-----------------------------------------------------------



src/tests/slave_recovery_tests.cpp
<https://reviews.apache.org/r/22441/#comment80373>

    I think we also need to expect the wait() and do nothing for that as well. See the ticket for details.
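
    A minimal sketch of what "expect the wait() and do nothing" could look like, using a hand-written test double rather than gmock so that it stays self-contained. Containerizer, RealContainerizer, InterceptingContainerizer, and ignoreWait are hypothetical, simplified names; the real Mesos containerizer interface and the behaviour of its wait() differ, and the "reports a termination" behaviour below is an assumption made for illustration.

```cpp
#include <string>
#include <vector>

// Hypothetical, heavily simplified interface; not the Mesos Containerizer.
class Containerizer {
 public:
  virtual ~Containerizer() {}

  // Returns true if a termination is reported for `containerId`.
  virtual bool wait(const std::string& containerId) = 0;
};

// Assumption for this sketch: the unintercepted wait() reports a
// termination for any container, which would fail the task early.
class RealContainerizer : public Containerizer {
 public:
  bool wait(const std::string& /*containerId*/) override { return true; }
};

// Test double that records every wait() call and can swallow it.
class InterceptingContainerizer : public Containerizer {
 public:
  explicit InterceptingContainerizer(Containerizer* real) : real_(real) {}

  // After this call, wait() is recorded but no longer forwarded.
  void ignoreWait() { ignoreWait_ = true; }

  bool wait(const std::string& containerId) override {
    waited_.push_back(containerId);
    if (ignoreWait_) {
      return false;  // Do nothing: no termination is reported.
    }
    return real_->wait(containerId);
  }

  // Container ids passed to wait(), in call order.
  const std::vector<std::string>& waited() const { return waited_; }

 private:
  Containerizer* real_;
  bool ignoreWait_ = false;
  std::vector<std::string> waited_;
};
```

    With gmock the same idea would normally be written as an expectation on the mocked method; the double above only shows the control flow of intercepting the call and suppressing its effect.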


- Ian Downes


Re: Review Request 22441: Intercept wait and allow for multiple status update manager flushes.

Posted by Vinod Kone <vi...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/22441/#review45559
-----------------------------------------------------------

Ship it!



src/tests/slave_recovery_tests.cpp
<https://reviews.apache.org/r/22441/#comment80446>

    Add a comment here on why you are doing this.


- Vinod Kone


Re: Review Request 22441: Intercept wait and allow for multiple status update manager flushes.

Posted by Dominic Hamon <dh...@twopensource.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/22441/
-----------------------------------------------------------

(Updated June 12, 2014, 6:45 p.m.)


Review request for mesos, Ian Downes and Vinod Kone.


Changes
-------

Added the comment requested in review.


Bugs: MESOS-1437
    https://issues.apache.org/jira/browse/MESOS-1437


Repository: mesos-git


Description
-------

Occasionally the task would fail due to an unknown container before we started the recovery. Moving the clock pause earlier ensures we are ready for the failure before we see it.


Diffs (updated)
-----

  src/tests/slave_recovery_tests.cpp 44ffac40b9edc9940f17b5fbe1848d56cf53b69b 

Diff: https://reviews.apache.org/r/22441/diff/


Testing
-------

make check x 1500


Thanks,

Dominic Hamon


Re: Review Request 22441: Intercept wait and allow for multiple status update manager flushes.

Posted by Dominic Hamon <dh...@twopensource.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/22441/
-----------------------------------------------------------

(Updated June 12, 2014, 4:37 p.m.)


Review request for mesos, Ian Downes and Vinod Kone.


Summary (updated)
-----------------

Intercept wait and allow for multiple status update manager flushes.


Bugs: MESOS-1437
    https://issues.apache.org/jira/browse/MESOS-1437


Repository: mesos-git


Description
-------

Occasionally the task would fail due to an unknown container before we started the recovery. Moving the clock pause earlier ensures we are ready for the failure before we see it.


Diffs (updated)
-----

  src/tests/slave_recovery_tests.cpp 44ffac40b9edc9940f17b5fbe1848d56cf53b69b 

Diff: https://reviews.apache.org/r/22441/diff/


Testing
-------

make check x 1500


Thanks,

Dominic Hamon


Re: Review Request 22441: Paused the clock earlier to avoid early task failure message.

Posted by Dominic Hamon <dh...@twopensource.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/22441/
-----------------------------------------------------------

(Updated June 11, 2014, 2:01 p.m.)


Review request for mesos, Ian Downes and Vinod Kone.


Changes
-------

removed unnecessary settle


Bugs: MESOS-1437
    https://issues.apache.org/jira/browse/MESOS-1437


Repository: mesos-git


Description
-------

Occasionally the task would fail due to an unknown container before we started the recovery. Moving the clock pause earlier ensures we are ready for the failure before we see it.


Diffs (updated)
-----

  src/tests/slave_recovery_tests.cpp 44ffac40b9edc9940f17b5fbe1848d56cf53b69b 

Diff: https://reviews.apache.org/r/22441/diff/


Testing
-------

make check x 1500


Thanks,

Dominic Hamon


Re: Review Request 22441: Paused the clock earlier to avoid early task failure message.

Posted by Vinod Kone <vi...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/22441/#review45407
-----------------------------------------------------------



src/tests/slave_recovery_tests.cpp
<https://reviews.apache.org/r/22441/#comment80251>

    This is the default. Why did you need to set it here?



src/tests/slave_recovery_tests.cpp
<https://reviews.apache.org/r/22441/#comment80257>

    Can you add a comment on why you are doing a settle here? I'm still a bit unclear after reading your description.


- Vinod Kone


Re: Review Request 22441: Paused the clock earlier to avoid early task failure message.

Posted by Dominic Hamon <dh...@twopensource.com>.

> On June 11, 2014, 10:44 a.m., Ian Downes wrote:
> > Ship It!

Is it possible that https://reviews.apache.org/r/22253/ will also fix this flakiness?


- Dominic


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/22441/#review45398
-----------------------------------------------------------


Re: Review Request 22441: Paused the clock earlier to avoid early task failure message.

Posted by Ian Downes <ia...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/22441/#review45398
-----------------------------------------------------------

Ship it!

- Ian Downes


Re: Review Request 22441: Paused the clock earlier to avoid early task failure message.

Posted by Mesos ReviewBot <de...@mesos.apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/22441/#review45367
-----------------------------------------------------------


Patch looks great!

Reviews applied: [22441]

All tests passed.

- Mesos ReviewBot

