Posted to reviews@mesos.apache.org by Greg Mann <gr...@mesosphere.io> on 2017/05/22 21:17:55 UTC

Review Request 59463: Added test for agent ping timeout during agent recovery.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/59463/
-----------------------------------------------------------

Review request for mesos, Anand Mazumdar, Benjamin Mahler, and Vinod Kone.


Bugs: MESOS-7540
    https://issues.apache.org/jira/browse/MESOS-7540


Repository: mesos


Description
-------

This patch adds a new test, `SlaveRecoveryTest.PingTimeoutDuringRecovery`,
which verifies that the agent will reply to pings from the master while it
is performing recovery.


Diffs
-----

  src/tests/slave_recovery_tests.cpp 52e78b6b6280a159233b402ce2849448204d4f11 


Diff: https://reviews.apache.org/r/59463/diff/1/


Testing
-------

`GTEST_FILTER="*PingTimeoutDuringRecovery*" bin/mesos-tests.sh --gtest_repeat=-1 --gtest_break_on_failure`


Thanks,

Greg Mann


Re: Review Request 59463: Added test for agent ping timeout during agent recovery.

Posted by Greg Mann <gr...@mesosphere.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/59463/
-----------------------------------------------------------

(Updated June 1, 2017, 5:48 p.m.)


Review request for mesos, Anand Mazumdar, Benjamin Mahler, and Vinod Kone.


Bugs: MESOS-7540
    https://issues.apache.org/jira/browse/MESOS-7540


Repository: mesos


Description
-------

This patch adds a new test,
`SlaveRecoveryTest.PingTimeoutDuringRecovery`, which verifies
that the agent will reply to pings from the master while it
is performing recovery.


Diffs (updated)
-----

  src/tests/slave_recovery_tests.cpp df0c5c88786190be06df7ef3602834aa8985cefe 


Diff: https://reviews.apache.org/r/59463/diff/7/

Changes: https://reviews.apache.org/r/59463/diff/6-7/


Testing
-------

`GTEST_FILTER="*PingTimeoutDuringRecovery*" bin/mesos-tests.sh --gtest_repeat=-1 --gtest_break_on_failure`


Thanks,

Greg Mann


Re: Review Request 59463: Added test for agent ping timeout during agent recovery.

Posted by Vinod Kone <vi...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/59463/#review176402
-----------------------------------------------------------


Fix it, then Ship it!





src/tests/slave_recovery_tests.cpp
Lines 991 (patched)
<https://reviews.apache.org/r/59463/#comment249767>

    You need to wait until the ack is checkpointed.


- Vinod Kone




Re: Review Request 59463: Added test for agent ping timeout during agent recovery.

Posted by Greg Mann <gr...@mesosphere.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/59463/
-----------------------------------------------------------

(Updated May 30, 2017, 11:29 p.m.)


Review request for mesos, Anand Mazumdar, Benjamin Mahler, and Vinod Kone.


Changes
-------

Rebase.


Bugs: MESOS-7540
    https://issues.apache.org/jira/browse/MESOS-7540


Repository: mesos


Description
-------

This patch adds a new test,
`SlaveRecoveryTest.PingTimeoutDuringRecovery`, which verifies
that the agent will reply to pings from the master while it
is performing recovery.


Diffs (updated)
-----

  src/tests/slave_recovery_tests.cpp df0c5c88786190be06df7ef3602834aa8985cefe 


Diff: https://reviews.apache.org/r/59463/diff/6/

Changes: https://reviews.apache.org/r/59463/diff/5-6/


Testing
-------

`GTEST_FILTER="*PingTimeoutDuringRecovery*" bin/mesos-tests.sh --gtest_repeat=-1 --gtest_break_on_failure`


Thanks,

Greg Mann


Re: Review Request 59463: Added test for agent ping timeout during agent recovery.

Posted by Greg Mann <gr...@mesosphere.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/59463/
-----------------------------------------------------------

(Updated May 27, 2017, 1:32 a.m.)


Review request for mesos, Anand Mazumdar, Benjamin Mahler, and Vinod Kone.


Bugs: MESOS-7540
    https://issues.apache.org/jira/browse/MESOS-7540


Repository: mesos


Description
-------

This patch adds a new test,
`SlaveRecoveryTest.PingTimeoutDuringRecovery`, which verifies
that the agent will reply to pings from the master while it
is performing recovery.


Diffs (updated)
-----

  src/tests/slave_recovery_tests.cpp df0c5c88786190be06df7ef3602834aa8985cefe 


Diff: https://reviews.apache.org/r/59463/diff/5/

Changes: https://reviews.apache.org/r/59463/diff/4-5/


Testing
-------

`GTEST_FILTER="*PingTimeoutDuringRecovery*" bin/mesos-tests.sh --gtest_repeat=-1 --gtest_break_on_failure`


Thanks,

Greg Mann


Re: Review Request 59463: Added test for agent ping timeout during agent recovery.

Posted by Greg Mann <gr...@mesosphere.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/59463/
-----------------------------------------------------------

(Updated May 26, 2017, 5:17 p.m.)


Review request for mesos, Anand Mazumdar, Benjamin Mahler, and Vinod Kone.


Bugs: MESOS-7540
    https://issues.apache.org/jira/browse/MESOS-7540


Repository: mesos


Description (updated)
-------

This patch adds a new test,
`SlaveRecoveryTest.PingTimeoutDuringRecovery`, which verifies
that the agent will reply to pings from the master while it
is performing recovery.


Diffs
-----

  src/tests/slave_recovery_tests.cpp 0aa87f534fbc655e3f1aa2ab7f56a1b6be7a8755 


Diff: https://reviews.apache.org/r/59463/diff/4/


Testing
-------

`GTEST_FILTER="*PingTimeoutDuringRecovery*" bin/mesos-tests.sh --gtest_repeat=-1 --gtest_break_on_failure`


Thanks,

Greg Mann


Re: Review Request 59463: Added test for agent ping timeout during agent recovery.

Posted by Greg Mann <gr...@mesosphere.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/59463/
-----------------------------------------------------------

(Updated May 24, 2017, 5:54 p.m.)


Review request for mesos, Anand Mazumdar, Benjamin Mahler, and Vinod Kone.


Bugs: MESOS-7540
    https://issues.apache.org/jira/browse/MESOS-7540


Repository: mesos


Description
-------

This patch adds a new test, `SlaveRecoveryTest.PingTimeoutDuringRecovery`,
which verifies that the agent will reply to pings from the master while it
is performing recovery.


Diffs (updated)
-----

  src/tests/slave_recovery_tests.cpp 0aa87f534fbc655e3f1aa2ab7f56a1b6be7a8755 


Diff: https://reviews.apache.org/r/59463/diff/4/

Changes: https://reviews.apache.org/r/59463/diff/3-4/


Testing
-------

`GTEST_FILTER="*PingTimeoutDuringRecovery*" bin/mesos-tests.sh --gtest_repeat=-1 --gtest_break_on_failure`


Thanks,

Greg Mann


Re: Review Request 59463: Added test for agent ping timeout during agent recovery.

Posted by Greg Mann <gr...@mesosphere.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/59463/
-----------------------------------------------------------

(Updated May 24, 2017, 4:30 a.m.)


Review request for mesos, Anand Mazumdar, Benjamin Mahler, and Vinod Kone.


Bugs: MESOS-7540
    https://issues.apache.org/jira/browse/MESOS-7540


Repository: mesos


Description
-------

This patch adds a new test, `SlaveRecoveryTest.PingTimeoutDuringRecovery`,
which verifies that the agent will reply to pings from the master while it
is performing recovery.


Diffs (updated)
-----

  src/tests/slave_recovery_tests.cpp 0aa87f534fbc655e3f1aa2ab7f56a1b6be7a8755 


Diff: https://reviews.apache.org/r/59463/diff/3/

Changes: https://reviews.apache.org/r/59463/diff/2-3/


Testing
-------

`GTEST_FILTER="*PingTimeoutDuringRecovery*" bin/mesos-tests.sh --gtest_repeat=-1 --gtest_break_on_failure`


Thanks,

Greg Mann


Re: Review Request 59463: Added test for agent ping timeout during agent recovery.

Posted by Greg Mann <gr...@mesosphere.io>.

> On May 24, 2017, 1:27 a.m., Benjamin Mahler wrote:
> > src/tests/slave_recovery_tests.cpp
> > Lines 955 (patched)
> > <https://reviews.apache.org/r/59463/diff/2/?file=1730917#file1730917line955>
> >
> >     Rather than pausing, resuming and pausing again, have you tried leaving the clock paused for the whole test?

I did try this, but was unable to achieve a successful agent re-registration when the clock was paused.


- Greg


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/59463/#review175888
-----------------------------------------------------------




Re: Review Request 59463: Added test for agent ping timeout during agent recovery.

Posted by Benjamin Mahler <bm...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/59463/#review175888
-----------------------------------------------------------


Fix it, then Ship it!





src/tests/slave_recovery_tests.cpp
Lines 807 (patched)
<https://reviews.apache.org/r/59463/#comment249233>

    "re-registration"



src/tests/slave_recovery_tests.cpp
Lines 810-812 (patched)
<https://reviews.apache.org/r/59463/#comment249241>

    elapsed, even if the executors are all re-registered



src/tests/slave_recovery_tests.cpp
Lines 811 (patched)
<https://reviews.apache.org/r/59463/#comment249234>

    "re-registration"



src/tests/slave_recovery_tests.cpp
Lines 812 (patched)
<https://reviews.apache.org/r/59463/#comment249237>

    (see MESOS-7539).



src/tests/slave_recovery_tests.cpp
Lines 894 (patched)
<https://reviews.apache.org/r/59463/#comment249238>

    issue (see MESOS-7551).
    
    Maybe add a TODO here?
    
    TODO(gregggomannn): Remove this once MESOS-7551 is resolved.



src/tests/slave_recovery_tests.cpp
Lines 914 (patched)
<https://reviews.apache.org/r/59463/#comment249236>

    What was this settle for?



src/tests/slave_recovery_tests.cpp
Lines 918 (patched)
<https://reviews.apache.org/r/59463/#comment249240>

    Can you use 'unsigned int' (i.e. only the "equivalent types" from here: http://en.cppreference.com/w/cpp/language/types) or 'size_t' here since it's a count?



src/tests/slave_recovery_tests.cpp
Lines 955 (patched)
<https://reviews.apache.org/r/59463/#comment249235>

    Rather than pausing, resuming and pausing again, have you tried leaving the clock paused for the whole test?


- Benjamin Mahler




Re: Review Request 59463: Added test for agent ping timeout during agent recovery.

Posted by Greg Mann <gr...@mesosphere.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/59463/
-----------------------------------------------------------

(Updated May 23, 2017, 11:59 p.m.)


Review request for mesos, Anand Mazumdar, Benjamin Mahler, and Vinod Kone.


Bugs: MESOS-7540
    https://issues.apache.org/jira/browse/MESOS-7540


Repository: mesos


Description
-------

This patch adds a new test, `SlaveRecoveryTest.PingTimeoutDuringRecovery`,
which verifies that the agent will reply to pings from the master while it
is performing recovery.


Diffs (updated)
-----

  src/tests/slave_recovery_tests.cpp 52e78b6b6280a159233b402ce2849448204d4f11 


Diff: https://reviews.apache.org/r/59463/diff/2/

Changes: https://reviews.apache.org/r/59463/diff/1-2/


Testing
-------

`GTEST_FILTER="*PingTimeoutDuringRecovery*" bin/mesos-tests.sh --gtest_repeat=-1 --gtest_break_on_failure`


Thanks,

Greg Mann


Re: Review Request 59463: Added test for agent ping timeout during agent recovery.

Posted by Benjamin Mahler <bm...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/59463/#review175720
-----------------------------------------------------------




src/tests/slave_recovery_tests.cpp
Lines 891-898 (patched)
<https://reviews.apache.org/r/59463/#comment249072>

    This needs to advance the clock and wait for each ping; otherwise only one ping is fired. You can find some examples:
    
    $ grep -R PingSlaveMessage src/tests
    $ grep -R PongSlaveMessage src/tests
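
The point about advancing once per ping can be illustrated with a toy model. The sketch below is not libprocess or Mesos code: `FakeClock`, `Pinger`, and both helper functions are hypothetical names invented for illustration, and it assumes the pinger schedules its next ping only after the previous timer has fired, measured from the clock's new time. Under that assumption, one large advance fires a single ping, while advancing one interval at a time fires one ping per step.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <utility>
#include <vector>

// Toy stand-in for a paused test clock (hypothetical, NOT the libprocess API).
class FakeClock {
public:
  using Callback = std::function<void()>;

  // Schedule `callback` to run `delay` ticks after the current time.
  void schedule(long delay, Callback callback) {
    assert(delay > 0);
    timers.emplace(now + delay, std::move(callback));
  }

  // Jump the clock forward, then fire the timers that are due.
  // Timers scheduled by a callback are measured from the new time,
  // so they wait for a later call to advance().
  void advance(long ticks) {
    now += ticks;
    std::vector<Callback> due;
    for (auto it = timers.begin(); it != timers.end() && it->first <= now;) {
      due.push_back(std::move(it->second));
      it = timers.erase(it);
    }
    for (auto& callback : due) {
      callback();
    }
  }

private:
  long now = 0;
  std::multimap<long, Callback> timers;
};

// Hypothetical pinger: each ping schedules the next one `interval` ticks later.
class Pinger {
public:
  Pinger(FakeClock& clock, long interval) : clock(clock), interval(interval) {}

  void start() { clock.schedule(interval, [this] { ping(); }); }

  std::size_t sent = 0;

private:
  void ping() {
    ++sent;
    start();
  }

  FakeClock& clock;
  long interval;
};

// Advance the clock once by `count` intervals and report how many pings fired.
std::size_t pingsWithSingleAdvance(std::size_t count, long interval) {
  FakeClock clock;
  Pinger pinger(clock, interval);
  pinger.start();
  clock.advance(static_cast<long>(count) * interval);
  return pinger.sent;
}

// Advance the clock one interval at a time and report how many pings fired.
std::size_t pingsWithSteppedAdvance(std::size_t count, long interval) {
  FakeClock clock;
  Pinger pinger(clock, interval);
  pinger.start();
  for (std::size_t i = 0; i < count; ++i) {
    clock.advance(interval);
    // A real test would wait here until the ping is observed.
  }
  return pinger.sent;
}
```

In this model, `pingsWithSingleAdvance(3, 15)` yields 1 and `pingsWithSteppedAdvance(3, 15)` yields 3. In a real test, each step would also wait for the ping (or the matching pong) to be observed before advancing again.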


- Benjamin Mahler




Re: Review Request 59463: Added test for agent ping timeout during agent recovery.

Posted by Vinod Kone <vi...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/59463/#review175714
-----------------------------------------------------------




src/tests/slave_recovery_tests.cpp
Lines 870 (patched)
<https://reviews.apache.org/r/59463/#comment249057>

    You want to wait until the update is acked before bringing down the agent; otherwise, a status update retry might mess up your expectations.


- Vinod Kone

