Posted to reviews@mesos.apache.org by Anand Mazumdar <ma...@gmail.com> on 2015/09/22 22:43:01 UTC
Review Request 38645: Fixed Flaky Executor HTTP tests
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/38645/
-----------------------------------------------------------
Review request for mesos, Isabel Jimenez and Vinod Kone.
Repository: mesos
Description
-------
This showed up on ASF CI. From the logs:
`I0922 17:31:49.819221 28463 slave.cpp:4104] Finished recovery`
Then ....
`../../src/tests/executor_http_api_tests.cpp:290: Failure`
`Failed to wait 15secs for __recover`
Instead of calling `FUTURE_DISPATCH` after `StartSlave()`, we should call it before starting the slave. In some cases the slave has already recovered by the time we invoke `FUTURE_DISPATCH`, so the expected dispatch is never observed, which leads to the flakiness.
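The ordering problem described above can be sketched in isolation. This is a minimal model, not the libprocess or Mesos test API: `DispatchRecorder`, `expect()`, `dispatch()`, and the two helper functions are hypothetical names introduced only for illustration. The assumption is that an expectation only observes dispatches that happen after it is registered.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for "expect a future dispatch of a named method".
// Only dispatches that occur after expect() was called are recorded.
class DispatchRecorder {
public:
  // Registers an expectation and returns a handle to query it later.
  std::size_t expect(const std::string& method) {
    expectations.push_back(Expectation{method, false});
    return expectations.size() - 1;
  }

  // Simulates the dispatch: marks every already-registered expectation
  // for `method` as seen. Expectations registered later are unaffected.
  void dispatch(const std::string& method) {
    for (Expectation& e : expectations) {
      if (e.method == method) {
        e.seen = true;
      }
    }
  }

  bool seen(std::size_t handle) const { return expectations.at(handle).seen; }

private:
  struct Expectation {
    std::string method;
    bool seen;
  };
  std::vector<Expectation> expectations;
};

// Racy ordering: recovery has already been dispatched before the
// expectation is registered, so waiting on it would time out.
bool expectAfterStart() {
  DispatchRecorder recorder;
  recorder.dispatch("__recover");                    // slave already recovered
  std::size_t handle = recorder.expect("__recover"); // registered too late
  return recorder.seen(handle);
}

// Fixed ordering: the expectation exists before the slave starts.
bool expectBeforeStart() {
  DispatchRecorder recorder;
  std::size_t handle = recorder.expect("__recover");
  recorder.dispatch("__recover");
  return recorder.seen(handle);
}
```

In this model `expectAfterStart()` reports that the dispatch was missed, while `expectBeforeStart()` reports that it was seen, which mirrors why moving the expectation ahead of the slave start removes the race.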
Diffs
-----
src/tests/executor_http_api_tests.cpp 9dbc5191b5950df2faa693720f3740e97c7df758
Diff: https://reviews.apache.org/r/38645/diff/
Testing
-------
I was not able to reproduce it before or after this change, but from the logs it is quite obvious what the issue was. Ran in a loop 100 times.
ASF CI error log: https://builds.apache.org/job/Mesos/COMPILER=gcc,CONFIGURATION=--verbose,OS=ubuntu%3A14.04,label_exp=docker%7C%7CHadoop/839/changes
Thanks,
Anand Mazumdar
Re: Review Request 38645: Fixed Flaky Executor HTTP tests
Posted by Anand Mazumdar <ma...@gmail.com>.
> On Sept. 22, 2015, 9:09 p.m., Neil Conway wrote:
> > For the sake of repro'ing, maybe you could add a sleep before waiting on the future? Obviously not something we want in the actual patch though.
Thanks Neil, that worked. Updated the `Testing Done` section with the details now. I should have spent more time reproducing it rather than just inferring the cause from the error logs.
- Anand
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/38645/#review100066
-----------------------------------------------------------
Re: Review Request 38645: Fixed Flaky Executor HTTP tests
Posted by Neil Conway <ne...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/38645/#review100066
-----------------------------------------------------------
For the sake of repro'ing, maybe you could add a sleep before waiting on the future? Obviously not something we want in the actual patch though.
- Neil Conway
Re: Review Request 38645: Fixed Flaky Executor HTTP tests
Posted by Mesos ReviewBot <re...@mesos.apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/38645/#review100103
-----------------------------------------------------------
Patch looks great!
Reviews applied: [38645]
All tests passed.
- Mesos ReviewBot
Re: Review Request 38645: Fixed Flaky Executor HTTP tests
Posted by Neil Conway <ne...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/38645/#review100072
-----------------------------------------------------------
Ship it!
- Neil Conway
Re: Review Request 38645: Fixed Flaky Executor HTTP tests
Posted by Vinod Kone <vi...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/38645/#review100477
-----------------------------------------------------------
src/tests/executor_http_api_tests.cpp (line 95)
<https://reviews.apache.org/r/38645/#comment157683>
You should do a `Clock::settle()` here (and, for that to work, pause the clock beforehand) because AFAICT `AWAIT_READY(__recover)` doesn't guarantee that `Slave::__recover()` has been executed. It only tells us that the event is about to be processed. See process::resume().
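The gap described in this comment can be illustrated with a small single-threaded model. This is only a sketch under stated assumptions, not the libprocess implementation: `EventLoop`, `step()`, `settle()`, and the helper functions are hypothetical names, and the model simply assumes that the interception flag is set when an event is picked up, before its handler runs.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <utility>

// Hypothetical event loop. `intercepted` plays the role of the future
// that becomes ready when the dispatch is picked up for processing.
class EventLoop {
public:
  void enqueue(std::function<void()> handler) {
    queue.push_back(std::move(handler));
  }

  // Processes one event. The flag is set before the handler runs;
  // `onIntercept` stands in for a test thread waking up at that moment.
  void step(const std::function<void()>& onIntercept) {
    if (queue.empty()) {
      return;
    }
    std::function<void()> handler = std::move(queue.front());
    queue.pop_front();
    intercepted = true;
    if (onIntercept) {
      onIntercept();
    }
    handler();
  }

  // Drains every pending event, so all handlers have run on return.
  void settle() {
    while (!queue.empty()) {
      step(nullptr);
    }
  }

  bool intercepted = false;

private:
  std::deque<std::function<void()>> queue;
};

// Returns the recovery state as seen at the moment the interception fires.
bool recoveredWhenInterceptFires() {
  EventLoop loop;
  bool recovered = false;
  bool observed = true;
  loop.enqueue([&recovered]() { recovered = true; });
  loop.step([&]() { observed = recovered; }); // handler has not run yet
  return observed;
}

// Returns the recovery state after draining the queue.
bool recoveredAfterSettle() {
  EventLoop loop;
  bool recovered = false;
  loop.enqueue([&recovered]() { recovered = true; });
  loop.settle();
  return recovered;
}
```

In this model, observing the interception alone still sees the pre-recovery state, whereas draining the queue guarantees the handler has run, which is the role the comment assigns to settling the clock.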
- Vinod Kone
Re: Review Request 38645: Fixed Flaky Executor HTTP tests
Posted by Anand Mazumdar <ma...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/38645/
-----------------------------------------------------------
(Updated Sept. 24, 2015, 11:48 p.m.)
Review request for mesos, Isabel Jimenez and Vinod Kone.
Changes
-------
rebased
Repository: mesos
Description
-------
This showed up on ASF CI. From the logs:
`I0922 17:31:49.819221 28463 slave.cpp:4104] Finished recovery`
Then ....
`../../src/tests/executor_http_api_tests.cpp:290: Failure`
`Failed to wait 15secs for __recover`
Instead of calling `FUTURE_DISPATCH` after `StartSlave()`, we should call it before starting the slave. In some cases the slave has already recovered by the time we invoke `FUTURE_DISPATCH`, so the expected dispatch is never observed, which leads to the flakiness.
Diffs (updated)
-----
src/tests/executor_http_api_tests.cpp 9dbc5191b5950df2faa693720f3740e97c7df758
Diff: https://reviews.apache.org/r/38645/diff/
Testing
-------
With a sleep introduced before `AWAIT_READY`, the test failed without this change. With this change, the test passed even with the sleep in place.
Ran in a loop 100 times.
ASF CI error log: https://builds.apache.org/job/Mesos/COMPILER=gcc,CONFIGURATION=--verbose,OS=ubuntu%3A14.04,label_exp=docker%7C%7CHadoop/839/changes
Thanks,
Anand Mazumdar
Re: Review Request 38645: Fixed Flaky Executor HTTP tests
Posted by Vinod Kone <vi...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/38645/#review100499
-----------------------------------------------------------
Ship it!
- Vinod Kone
Re: Review Request 38645: Fixed Flaky Executor HTTP tests
Posted by Anand Mazumdar <ma...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/38645/
-----------------------------------------------------------
(Updated Sept. 24, 2015, 11:24 p.m.)
Review request for mesos, Isabel Jimenez and Vinod Kone.
Changes
-------
Review comments to bring pause/settle calls one after the other.
Repository: mesos
Description
-------
This showed up on ASF CI. From the logs:
`I0922 17:31:49.819221 28463 slave.cpp:4104] Finished recovery`
Then ....
`../../src/tests/executor_http_api_tests.cpp:290: Failure`
`Failed to wait 15secs for __recover`
Instead of calling `FUTURE_DISPATCH` after `StartSlave()`, we should call it before starting the slave. In some cases the slave has already recovered by the time we invoke `FUTURE_DISPATCH`, so the expected dispatch is never observed, which leads to the flakiness.
Diffs (updated)
-----
src/tests/executor_http_api_tests.cpp 9dbc5191b5950df2faa693720f3740e97c7df758
Diff: https://reviews.apache.org/r/38645/diff/
Testing
-------
With a sleep introduced before `AWAIT_READY`, the test failed without this change. With this change, the test passed even with the sleep in place.
Ran in a loop 100 times.
ASF CI error log: https://builds.apache.org/job/Mesos/COMPILER=gcc,CONFIGURATION=--verbose,OS=ubuntu%3A14.04,label_exp=docker%7C%7CHadoop/839/changes
Thanks,
Anand Mazumdar
Re: Review Request 38645: Fixed Flaky Executor HTTP tests
Posted by Anand Mazumdar <ma...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/38645/
-----------------------------------------------------------
(Updated Sept. 24, 2015, 11:09 p.m.)
Review request for mesos, Isabel Jimenez and Vinod Kone.
Changes
-------
Address comments from Vinod
Repository: mesos
Description
-------
This showed up on ASF CI. From the logs:
`I0922 17:31:49.819221 28463 slave.cpp:4104] Finished recovery`
Then ....
`../../src/tests/executor_http_api_tests.cpp:290: Failure`
`Failed to wait 15secs for __recover`
Instead of calling `FUTURE_DISPATCH` after `StartSlave()`, we should call it before starting the slave. In some cases the slave has already recovered by the time we invoke `FUTURE_DISPATCH`, so the expected dispatch is never observed, which leads to the flakiness.
Diffs (updated)
-----
src/tests/executor_http_api_tests.cpp 9dbc5191b5950df2faa693720f3740e97c7df758
Diff: https://reviews.apache.org/r/38645/diff/
Testing
-------
With a sleep introduced before `AWAIT_READY`, the test failed without this change. With this change, the test passed even with the sleep in place.
Ran in a loop 100 times.
ASF CI error log: https://builds.apache.org/job/Mesos/COMPILER=gcc,CONFIGURATION=--verbose,OS=ubuntu%3A14.04,label_exp=docker%7C%7CHadoop/839/changes
Thanks,
Anand Mazumdar
Re: Review Request 38645: Fixed Flaky Executor HTTP tests
Posted by Anand Mazumdar <ma...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/38645/
-----------------------------------------------------------
(Updated Sept. 22, 2015, 9:37 p.m.)
Review request for mesos, Isabel Jimenez and Vinod Kone.
Changes
-------
update testing done
Repository: mesos
Description
-------
This showed up on ASF CI. From the logs:
`I0922 17:31:49.819221 28463 slave.cpp:4104] Finished recovery`
Then ....
`../../src/tests/executor_http_api_tests.cpp:290: Failure`
`Failed to wait 15secs for __recover`
Instead of calling `FUTURE_DISPATCH` after `StartSlave()`, we should call it before starting the slave. In some cases the slave has already recovered by the time we invoke `FUTURE_DISPATCH`, so the expected dispatch is never observed, which leads to the flakiness.
Diffs
-----
src/tests/executor_http_api_tests.cpp 9dbc5191b5950df2faa693720f3740e97c7df758
Diff: https://reviews.apache.org/r/38645/diff/
Testing (updated)
-------
With a sleep introduced before `AWAIT_READY`, the test failed without this change. With this change, the test passed even with the sleep in place.
Ran in a loop 100 times.
ASF CI error log: https://builds.apache.org/job/Mesos/COMPILER=gcc,CONFIGURATION=--verbose,OS=ubuntu%3A14.04,label_exp=docker%7C%7CHadoop/839/changes
Thanks,
Anand Mazumdar
Re: Review Request 38645: Fixed Flaky Executor HTTP tests
Posted by Anand Mazumdar <ma...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/38645/
-----------------------------------------------------------
(Updated Sept. 22, 2015, 8:46 p.m.)
Review request for mesos, Isabel Jimenez and Vinod Kone.
Repository: mesos
Description (updated)
-------
This showed up on ASF CI. From the logs:
`I0922 17:31:49.819221 28463 slave.cpp:4104] Finished recovery`
Then ....
`../../src/tests/executor_http_api_tests.cpp:290: Failure`
`Failed to wait 15secs for __recover`
Instead of calling `FUTURE_DISPATCH` after `StartSlave()`, we should call it before starting the slave. In some cases the slave has already recovered by the time we invoke `FUTURE_DISPATCH`, so the expected dispatch is never observed, which leads to the flakiness.
Diffs
-----
src/tests/executor_http_api_tests.cpp 9dbc5191b5950df2faa693720f3740e97c7df758
Diff: https://reviews.apache.org/r/38645/diff/
Testing (updated)
-------
I was not able to reproduce it before or after this change, but from the logs it is quite obvious what the issue was. Ran in a loop 100 times.
ASF CI error log: https://builds.apache.org/job/Mesos/COMPILER=gcc,CONFIGURATION=--verbose,OS=ubuntu%3A14.04,label_exp=docker%7C%7CHadoop/839/changes
Thanks,
Anand Mazumdar