Posted to reviews@mesos.apache.org by Greg Mann <gr...@mesosphere.io> on 2017/05/22 21:17:55 UTC
Review Request 59463: Added test for agent ping timeout during agent
recovery.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/59463/
-----------------------------------------------------------
Review request for mesos, Anand Mazumdar, Benjamin Mahler, and Vinod Kone.
Bugs: MESOS-7540
https://issues.apache.org/jira/browse/MESOS-7540
Repository: mesos
Description
-------
This patch adds a new test, `SlaveRecoveryTest.PingTimeoutDuringRecovery`,
which verifies that the agent will reply to pings from the master while it
is performing recovery.
Diffs
-----
src/tests/slave_recovery_tests.cpp 52e78b6b6280a159233b402ce2849448204d4f11
Diff: https://reviews.apache.org/r/59463/diff/1/
Testing
-------
`GTEST_FILTER="*PingTimeoutDuringRecovery*" bin/mesos-tests.sh --gtest_repeat=-1 --gtest_break_on_failure`
Thanks,
Greg Mann
Re: Review Request 59463: Added test for agent ping timeout during
agent recovery.
Posted by Greg Mann <gr...@mesosphere.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/59463/
-----------------------------------------------------------
(Updated June 1, 2017, 5:48 p.m.)
Review request for mesos, Anand Mazumdar, Benjamin Mahler, and Vinod Kone.
Bugs: MESOS-7540
https://issues.apache.org/jira/browse/MESOS-7540
Repository: mesos
Description
-------
This patch adds a new test,
`SlaveRecoveryTest.PingTimeoutDuringRecovery`, which verifies
that the agent will reply to pings from the master while it
is performing recovery.
Diffs (updated)
-----
src/tests/slave_recovery_tests.cpp df0c5c88786190be06df7ef3602834aa8985cefe
Diff: https://reviews.apache.org/r/59463/diff/7/
Changes: https://reviews.apache.org/r/59463/diff/6-7/
Testing
-------
`GTEST_FILTER="*PingTimeoutDuringRecovery*" bin/mesos-tests.sh --gtest_repeat=-1 --gtest_break_on_failure`
Thanks,
Greg Mann
Re: Review Request 59463: Added test for agent ping timeout during
agent recovery.
Posted by Vinod Kone <vi...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/59463/#review176402
-----------------------------------------------------------
Fix it, then Ship it!
src/tests/slave_recovery_tests.cpp
Lines 991 (patched)
<https://reviews.apache.org/r/59463/#comment249767>
You need to wait until the ack is checkpointed.
- Vinod Kone
On May 30, 2017, 11:29 p.m., Greg Mann wrote:
Re: Review Request 59463: Added test for agent ping timeout during
agent recovery.
Posted by Greg Mann <gr...@mesosphere.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/59463/
-----------------------------------------------------------
(Updated May 30, 2017, 11:29 p.m.)
Review request for mesos, Anand Mazumdar, Benjamin Mahler, and Vinod Kone.
Changes
-------
Rebase.
Bugs: MESOS-7540
https://issues.apache.org/jira/browse/MESOS-7540
Repository: mesos
Description
-------
This patch adds a new test,
`SlaveRecoveryTest.PingTimeoutDuringRecovery`, which verifies
that the agent will reply to pings from the master while it
is performing recovery.
Diffs (updated)
-----
src/tests/slave_recovery_tests.cpp df0c5c88786190be06df7ef3602834aa8985cefe
Diff: https://reviews.apache.org/r/59463/diff/6/
Changes: https://reviews.apache.org/r/59463/diff/5-6/
Testing
-------
`GTEST_FILTER="*PingTimeoutDuringRecovery*" bin/mesos-tests.sh --gtest_repeat=-1 --gtest_break_on_failure`
Thanks,
Greg Mann
Re: Review Request 59463: Added test for agent ping timeout during
agent recovery.
Posted by Greg Mann <gr...@mesosphere.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/59463/
-----------------------------------------------------------
(Updated May 27, 2017, 1:32 a.m.)
Review request for mesos, Anand Mazumdar, Benjamin Mahler, and Vinod Kone.
Bugs: MESOS-7540
https://issues.apache.org/jira/browse/MESOS-7540
Repository: mesos
Description
-------
This patch adds a new test,
`SlaveRecoveryTest.PingTimeoutDuringRecovery`, which verifies
that the agent will reply to pings from the master while it
is performing recovery.
Diffs (updated)
-----
src/tests/slave_recovery_tests.cpp df0c5c88786190be06df7ef3602834aa8985cefe
Diff: https://reviews.apache.org/r/59463/diff/5/
Changes: https://reviews.apache.org/r/59463/diff/4-5/
Testing
-------
`GTEST_FILTER="*PingTimeoutDuringRecovery*" bin/mesos-tests.sh --gtest_repeat=-1 --gtest_break_on_failure`
Thanks,
Greg Mann
Re: Review Request 59463: Added test for agent ping timeout during
agent recovery.
Posted by Greg Mann <gr...@mesosphere.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/59463/
-----------------------------------------------------------
(Updated May 26, 2017, 5:17 p.m.)
Review request for mesos, Anand Mazumdar, Benjamin Mahler, and Vinod Kone.
Bugs: MESOS-7540
https://issues.apache.org/jira/browse/MESOS-7540
Repository: mesos
Description (updated)
-------
This patch adds a new test,
`SlaveRecoveryTest.PingTimeoutDuringRecovery`, which verifies
that the agent will reply to pings from the master while it
is performing recovery.
Diffs
-----
src/tests/slave_recovery_tests.cpp 0aa87f534fbc655e3f1aa2ab7f56a1b6be7a8755
Diff: https://reviews.apache.org/r/59463/diff/4/
Testing
-------
`GTEST_FILTER="*PingTimeoutDuringRecovery*" bin/mesos-tests.sh --gtest_repeat=-1 --gtest_break_on_failure`
Thanks,
Greg Mann
Re: Review Request 59463: Added test for agent ping timeout during
agent recovery.
Posted by Greg Mann <gr...@mesosphere.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/59463/
-----------------------------------------------------------
(Updated May 24, 2017, 5:54 p.m.)
Review request for mesos, Anand Mazumdar, Benjamin Mahler, and Vinod Kone.
Bugs: MESOS-7540
https://issues.apache.org/jira/browse/MESOS-7540
Repository: mesos
Description
-------
This patch adds a new test, `SlaveRecoveryTest.PingTimeoutDuringRecovery`,
which verifies that the agent will reply to pings from the master while it
is performing recovery.
Diffs (updated)
-----
src/tests/slave_recovery_tests.cpp 0aa87f534fbc655e3f1aa2ab7f56a1b6be7a8755
Diff: https://reviews.apache.org/r/59463/diff/4/
Changes: https://reviews.apache.org/r/59463/diff/3-4/
Testing
-------
`GTEST_FILTER="*PingTimeoutDuringRecovery*" bin/mesos-tests.sh --gtest_repeat=-1 --gtest_break_on_failure`
Thanks,
Greg Mann
Re: Review Request 59463: Added test for agent ping timeout during
agent recovery.
Posted by Greg Mann <gr...@mesosphere.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/59463/
-----------------------------------------------------------
(Updated May 24, 2017, 4:30 a.m.)
Review request for mesos, Anand Mazumdar, Benjamin Mahler, and Vinod Kone.
Bugs: MESOS-7540
https://issues.apache.org/jira/browse/MESOS-7540
Repository: mesos
Description
-------
This patch adds a new test, `SlaveRecoveryTest.PingTimeoutDuringRecovery`,
which verifies that the agent will reply to pings from the master while it
is performing recovery.
Diffs (updated)
-----
src/tests/slave_recovery_tests.cpp 0aa87f534fbc655e3f1aa2ab7f56a1b6be7a8755
Diff: https://reviews.apache.org/r/59463/diff/3/
Changes: https://reviews.apache.org/r/59463/diff/2-3/
Testing
-------
`GTEST_FILTER="*PingTimeoutDuringRecovery*" bin/mesos-tests.sh --gtest_repeat=-1 --gtest_break_on_failure`
Thanks,
Greg Mann
Re: Review Request 59463: Added test for agent ping timeout during
agent recovery.
Posted by Greg Mann <gr...@mesosphere.io>.
> On May 24, 2017, 1:27 a.m., Benjamin Mahler wrote:
> > src/tests/slave_recovery_tests.cpp
> > Lines 955 (patched)
> > <https://reviews.apache.org/r/59463/diff/2/?file=1730917#file1730917line955>
> >
> > Rather than pausing, resuming and pausing again, have you tried leaving the clock paused for the whole test?
I did try this, but was unable to achieve a successful agent re-registration when the clock was paused.
- Greg
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/59463/#review175888
-----------------------------------------------------------
On May 24, 2017, 5:54 p.m., Greg Mann wrote:
Re: Review Request 59463: Added test for agent ping timeout during
agent recovery.
Posted by Benjamin Mahler <bm...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/59463/#review175888
-----------------------------------------------------------
Fix it, then Ship it!
src/tests/slave_recovery_tests.cpp
Lines 807 (patched)
<https://reviews.apache.org/r/59463/#comment249233>
"re-registration"
src/tests/slave_recovery_tests.cpp
Lines 810-812 (patched)
<https://reviews.apache.org/r/59463/#comment249241>
elapsed, even if the executors are all re-registered
src/tests/slave_recovery_tests.cpp
Lines 811 (patched)
<https://reviews.apache.org/r/59463/#comment249234>
"re-registration"
src/tests/slave_recovery_tests.cpp
Lines 812 (patched)
<https://reviews.apache.org/r/59463/#comment249237>
(see MESOS-7539).
src/tests/slave_recovery_tests.cpp
Lines 894 (patched)
<https://reviews.apache.org/r/59463/#comment249238>
issue (see MESOS-7551).
Maybe add a TODO here?
TODO(gregggomannn): Remove this once MESOS-7551 is resolved.
src/tests/slave_recovery_tests.cpp
Lines 914 (patched)
<https://reviews.apache.org/r/59463/#comment249236>
What was this settle for?
src/tests/slave_recovery_tests.cpp
Lines 918 (patched)
<https://reviews.apache.org/r/59463/#comment249240>
Can you use 'unsigned int' (i.e. only the "equivalent types" from here: http://en.cppreference.com/w/cpp/language/types) or 'size_t' here since it's a count?
src/tests/slave_recovery_tests.cpp
Lines 955 (patched)
<https://reviews.apache.org/r/59463/#comment249235>
Rather than pausing, resuming and pausing again, have you tried leaving the clock paused for the whole test?
- Benjamin Mahler
On May 23, 2017, 11:59 p.m., Greg Mann wrote:
Re: Review Request 59463: Added test for agent ping timeout during
agent recovery.
Posted by Greg Mann <gr...@mesosphere.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/59463/
-----------------------------------------------------------
(Updated May 23, 2017, 11:59 p.m.)
Review request for mesos, Anand Mazumdar, Benjamin Mahler, and Vinod Kone.
Bugs: MESOS-7540
https://issues.apache.org/jira/browse/MESOS-7540
Repository: mesos
Description
-------
This patch adds a new test, `SlaveRecoveryTest.PingTimeoutDuringRecovery`,
which verifies that the agent will reply to pings from the master while it
is performing recovery.
Diffs (updated)
-----
src/tests/slave_recovery_tests.cpp 52e78b6b6280a159233b402ce2849448204d4f11
Diff: https://reviews.apache.org/r/59463/diff/2/
Changes: https://reviews.apache.org/r/59463/diff/1-2/
Testing
-------
`GTEST_FILTER="*PingTimeoutDuringRecovery*" bin/mesos-tests.sh --gtest_repeat=-1 --gtest_break_on_failure`
Thanks,
Greg Mann
Re: Review Request 59463: Added test for agent ping timeout during
agent recovery.
Posted by Benjamin Mahler <bm...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/59463/#review175720
-----------------------------------------------------------
src/tests/slave_recovery_tests.cpp
Lines 891-898 (patched)
<https://reviews.apache.org/r/59463/#comment249072>
The test needs to advance the clock and wait for each ping in turn; otherwise only one ping is fired. You can find some examples:
$ grep -R PingSlaveMessage src/tests
$ grep -R PongSlaveMessage src/tests
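A minimal, self-contained sketch of why one large advance fires only a single ping. It assumes the pinger re-arms its timer relative to the current time after each ping. `FakeClock`, `Pinger`, and both helper functions are hypothetical names made up for this illustration; they are not the libprocess or Mesos APIs.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <utility>

// Hypothetical stand-in for a paused test clock (not the libprocess Clock).
class FakeClock {
public:
  using Callback = std::function<void()>;

  // Schedule `callback` to run once, `delay` ticks after the current time.
  void delay(long delay, Callback callback) {
    assert(delay > 0);
    timers_.emplace(now_ + delay, std::move(callback));
  }

  // Jump forward by `ticks`, then fire the timers that are due at the new
  // time. Timers armed by those callbacks wait for a later advance.
  void advance(long ticks) {
    now_ += ticks;
    std::multimap<long, Callback> due;
    auto end = timers_.upper_bound(now_);
    for (auto it = timers_.begin(); it != end; ++it) {
      due.emplace(it->first, std::move(it->second));
    }
    timers_.erase(timers_.begin(), end);
    for (auto& timer : due) {
      timer.second();
    }
  }

private:
  long now_ = 0;
  std::multimap<long, Callback> timers_;
};

// Hypothetical pinger: every ping schedules the next one `interval` ticks
// after the time at which it fired.
class Pinger {
public:
  Pinger(FakeClock& clock, long interval)
    : clock_(clock), interval_(interval) { arm(); }

  std::size_t sent() const { return sent_; }

private:
  void arm() {
    clock_.delay(interval_, [this]() { ++sent_; arm(); });
  }

  FakeClock& clock_;
  long interval_;
  std::size_t sent_ = 0;
};

// Advance once by `count` intervals and report how many pings fired.
std::size_t pingsAfterSingleAdvance(std::size_t count, long interval) {
  FakeClock clock;
  Pinger pinger(clock, interval);
  clock.advance(static_cast<long>(count) * interval);
  return pinger.sent();
}

// Advance one interval at a time. A real test would also wait for each
// ping to be observed before advancing again.
std::size_t pingsAfterSteppedAdvance(std::size_t count, long interval) {
  FakeClock clock;
  Pinger pinger(clock, interval);
  for (std::size_t i = 0; i < count; ++i) {
    clock.advance(interval);
  }
  return pinger.sent();
}
```

Under this model, `pingsAfterSingleAdvance(5, 15)` yields one ping because the re-armed timer lands after the new time, while `pingsAfterSteppedAdvance(5, 15)` yields five.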
- Benjamin Mahler
On May 22, 2017, 9:17 p.m., Greg Mann wrote:
Re: Review Request 59463: Added test for agent ping timeout during
agent recovery.
Posted by Vinod Kone <vi...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/59463/#review175714
-----------------------------------------------------------
src/tests/slave_recovery_tests.cpp
Lines 870 (patched)
<https://reviews.apache.org/r/59463/#comment249057>
You want to wait until the update is acked before bringing down the agent. Otherwise a status update retry might mess up your expectations.
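A rough sketch of the race, assuming the agent persists each status update and each acknowledgement, and on recovery resends any update that has no persisted acknowledgement. `Record`, `Checkpoint`, and `retriesAfterRestart` are hypothetical names for this illustration only, not Mesos code.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical model of one entry in an agent's checkpointed update log.
struct Record {
  enum class Type { UPDATE, ACK };
  Type type;
  std::string uuid;
};

// Hypothetical checkpoint: the only state that survives an agent restart.
class Checkpoint {
public:
  void update(const std::string& uuid) {
    assert(!uuid.empty());
    records_.push_back({Record::Type::UPDATE, uuid});
  }

  void ack(const std::string& uuid) {
    assert(!uuid.empty());
    records_.push_back({Record::Type::ACK, uuid});
  }

  // Updates a recovering agent would resend: persisted but never acked.
  std::vector<std::string> pendingAfterRecovery() const {
    std::set<std::string> acked;
    for (const Record& record : records_) {
      if (record.type == Record::Type::ACK) {
        acked.insert(record.uuid);
      }
    }

    std::vector<std::string> pending;
    for (const Record& record : records_) {
      if (record.type == Record::Type::UPDATE &&
          acked.count(record.uuid) == 0) {
        pending.push_back(record.uuid);
      }
    }
    return pending;
  }

private:
  std::vector<Record> records_;
};

// Number of status updates retried after a restart, depending on whether
// the test waited for the acknowledgement before stopping the agent.
std::size_t retriesAfterRestart(bool waitForAck) {
  Checkpoint checkpoint;
  checkpoint.update("task-running");
  if (waitForAck) {
    checkpoint.ack("task-running");
  }
  // The agent is stopped here; recovery sees only the checkpoint.
  return checkpoint.pendingAfterRecovery().size();
}
```

In this model, stopping the agent without waiting leaves one update to be retried after recovery, which is the extra message that could break a test's expectations; waiting for the acknowledgement leaves none.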
- Vinod Kone
On May 22, 2017, 9:17 p.m., Greg Mann wrote: