Posted to reviews@mesos.apache.org by Gaston Kleiman <ga...@mesosphere.io> on 2018/02/07 20:05:25 UTC

Review Request 65552: Added a regression test for MESOS-8468.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65552/
-----------------------------------------------------------

Review request for mesos, Anand Mazumdar, Greg Mann, Qian Zhang, and Vinod Kone.


Bugs: MESOS-8468
    https://issues.apache.org/jira/browse/MESOS-8468


Repository: mesos


Description
-------

Added a regression test for MESOS-8468.


Diffs
-----

  src/tests/default_executor_tests.cpp cc97e0d1fea7f4d0bc544d850593d8d91921b552 


Diff: https://reviews.apache.org/r/65552/diff/1/


Testing
-------

`GLOG_v=1 sudo bin/mesos-tests.sh --gtest_filter='*ROOT_LaunchGroupFailure*' --verbose --gtest_repeat=650 --gtest_break_on_failure` on GNU/Linux


Thanks,

Gaston Kleiman


Re: Review Request 65552: Added a regression test for MESOS-8468.

Posted by Mesos Reviewbot Windows <re...@mesos.apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65552/#review197034
-----------------------------------------------------------



FAIL: Some of the unit tests failed. Please check the relevant logs.

Reviews applied: `['65548', '65549', '65550', '65551', '65552']`

Failed command: `Start-MesosCITesting`

All the build artifacts available at: http://dcos-win.westus.cloudapp.azure.com/mesos-build/review/65552

Relevant logs:

- [mesos-tests-stdout.log](http://dcos-win.westus.cloudapp.azure.com/mesos-build/review/65552/logs/mesos-tests-stdout.log):

```
[       OK ] Endpoint/SlaveEndpointTest.NoAuthorizer/2 (102 ms)
[----------] 9 tests from Endpoint/SlaveEndpointTest (1004 ms total)

[----------] 2 tests from ContainerizerType/DefaultContainerDNSFlagTest
[ RUN      ] ContainerizerType/DefaultContainerDNSFlagTest.ValidateFlag/0
[       OK ] ContainerizerType/DefaultContainerDNSFlagTest.ValidateFlag/0 (32 ms)
[ RUN      ] ContainerizerType/DefaultContainerDNSFlagTest.ValidateFlag/1
[       OK ] ContainerizerType/DefaultContainerDNSFlagTest.ValidateFlag/1 (37 ms)
[----------] 2 tests from ContainerizerType/DefaultContainerDNSFlagTest (70 ms total)

[----------] 1 test from IsolationFlag/CpuIsolatorTest
[ RUN      ] IsolationFlag/CpuIsolatorTest.ROOT_UserCpuUsage/0
[       OK ] IsolationFlag/CpuIsolatorTest.ROOT_UserCpuUsage/0 (2284 ms)
[----------] 1 test from IsolationFlag/CpuIsolatorTest (2307 ms total)

[----------] 1 test from IsolationFlag/MemoryIsolatorTest
[ RUN      ] IsolationFlag/MemoryIsolatorTest.ROOT_MemUsage/0
[       OK ] IsolationFlag/MemoryIsolatorTest.ROOT_MemUsage/0 (2250 ms)
[----------] 1 test from IsolationFlag/MemoryIsolatorTest (2274 ms total)

[----------] Global test environment tear-down
[==========] 852 tests from 85 test cases ran. (309053 ms total)
[  PASSED  ] 851 tests.
[  FAILED  ] 1 test, listed below:
[  FAILED  ] MesosContainerizer/DefaultExecutorTest.ROOT_LaunchGroupFailure/0, where GetParam() = "mesos"

 1 FAILED TEST
  YOU HAVE 213 DISABLED TESTS

```

- [mesos-tests-stderr.log](http://dcos-win.westus.cloudapp.azure.com/mesos-build/review/65552/logs/mesos-tests-stderr.log):

```
I0207 20:57:04.589033  5876 executor.cpp:171] Received SUBSCRIBED event
I0207 20:57:04.593070  5876 executor.cpp:175] Subscribed executor on build-srv-04.zq4gs31qjdiunm1ryi1452nvnh.dx.internal.cloudapp.net
I0207 20:57:04.594066  5876 executor.cpp:171] Received LAUNCH event
I0207 20:57:04.598068  5876 executor.cpp:638] Starting task c6317f1b-f8c3-4e2e-b15e-923ea21bc48f
I0207 20:57:04.670063  5876 executor.cpp:478] Running 'D:\DCOS\mesos\src\mesos-containerizer.exe launch <POSSIBLY-SENSITIVE-DATA>'
I0207 20:57:05.175050  5876 executor.cpp:651] Forked command at 2288
I0207 20:57:05.204072  6708 exec.cpp:445] Executor asked to shutdown
I0207 20:57:05.205040  5876 executor.cpp:171] Received SHUTDOWN event
I0207 20:57:05.205040  5876 executor.cpp:748] Shutting down
I0207 20:57:05.205040  5876 executor.cpp:863] Sending SIGTERM to process tree at pid 247  2068 master.cpp:3239] Deactivating framework b9f2a006-df38-46e0-b8be-85efde5447c3-0000 (default) at scheduler-bf610ad9-c067-4614-b82b-f632e1568bc6@10.3.1.5:59751
I0207 20:57:05.202073  3272 hierarchical.cpp:405] Deactivated framework b9f2a006-df38-46e0-b8be-85efde5447c3-0000
I0207 20:57:05.202073  2068 master.cpp:10204] Updating the state of task c6317f1b-f8c3-4e2e-b15e-923ea21bc48f of framework b9f2a006-df38-46e0-b8be-85efde5447c3-0000 (latest state: TASK_KILLED, status update state: TASK_KILLED)
I0207 20:57:05.202073  7332 slave.cpp:3479] Shutting down framework b9f2a006-df38-46e0-b8be-85efde5447c3-0000
I0207 20:57:05.202073  7332 slave.cpp:6178] Shutting down executor 'c6317f1b-f8c3-4e2e-b15e-923ea21bc48f' of framework b9f2a006-df38-46e0-b8be-85efde5447c3-0000 at executor(1)@10.3.1.5:59772
I0207 20:57:05.203073  7332 slave.cpp:931] Agent terminating
W0207 20:57:05.204072  7332 slave.cpp:3475] Ignoring shutdown framework b9f2a006-df38-46e0-b8be-85efde5447c3-0000 because it is terminating
I0207 20:57:05.205040  2068 master.cpp:10303] Removing task c6317f1b-f8c3-4e2e-b15e-923ea21bc48f with resources cpus(allocated: *):4; mem(allocated: *):2048; disk(allocated: *):1024; ports(allocated: *):[31000-32000] of framework b9f2a006-df38-46e0-b8be-85efde5447c3-0000 on agent b9f2a006-df38-46e0-b8be-85efde5447c3-S0 at slave(330)@10.3.1.5:59751 (build-srv-04.zq4gs31qjdiunm1ryi1452nvnh.dx.internal.cloudapp.net)
I0207 20:57:05.206046  8624 containerizer.cpp:2338] Destroying container 99c11c94-aea8-4a79-959b-c6b208d02c59 in RUNNING state
I0207 20:57:05.206046  8624 containerizer.cpp:2952] Transitioning the state of container 99c11c94-aea8-4a79-959b-c6b208d02c59 from RUNNING to DESTROYING
I0207 20:57:05.207041  8624 launcher.cpp:156] Asked to destroy container 99c11c94-aea8-4a79-959b-c6b208d02c59
I0207 20:57:05.208039  2068 master.cpp:1307] Agent b9f2a006-df38-46e0-b8be-85efde5447c3-S0 at slave(330)@10.3.1.5:59751 (build-srv-04.zq4gs31qjdiunm1ryi1452nvnh.dx.internal.cloudapp.net) disconnected
I0207 20:57:05.208039  2068 master.cpp:3276] Disconnecting agent b9f2a006-df38-46e0-b8be-85efde5447c3-S0 at slave(330)@10.3.1.5:59751 (build-srv-04.zq4gs31qjdiunm1ryi1452nvnh.dx.internal.cloudapp.net)
I0207 20:57:05.208039  3272 hierarchical.cpp:344] Removed framework b9f2a006-df38-46e0-b8be-85efde5447c3-0000
I0207 20:57:05.208039  2068 master.cpp:3295] Deactivating agent b9f2a006-df38-46e0-b8be-85efde5447c3-S0 at slave(330)@10.3.1.5:59751 (build-srv-04.zq4gs31qjdiunm1ryi1452nvnh.dx.internal.cloudapp.net)
I0207 20:57:05.209039  4508 hierarchical.cpp:766] Agent b9f2a006-df38-46e0-b8be-85efde5447c3-S0 deactivated
I0207 20:57:05.237257 11016 containerizer.cpp:2791] Container 99c11c94-aea8-4a79-959b-c6b208d02c59 has exited
I0207 20:57:05.267297  6468 master.cpp:1149] Master terminating
I0207 20:57:05.269306  3272 hierarchical.cpp:609] Removed agent b9f2a006-df38-46e0-b8be-85efde5447c3-S0
I0207 20:57:05.762310 10416 process.cpp:929] Stopped the socket accept loop
```

- Mesos Reviewbot Windows



Re: Review Request 65552: Added a regression test for MESOS-8468.

Posted by Qian Zhang <zh...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65552/#review197509
-----------------------------------------------------------


Ship it!

- Qian Zhang


On Feb. 13, 2018, 7:24 a.m., Gaston Kleiman wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65552/
> -----------------------------------------------------------
> 
> (Updated Feb. 13, 2018, 7:24 a.m.)
> 
> 
> Review request for mesos, Anand Mazumdar, Greg Mann, Qian Zhang, and Vinod Kone.
> 
> 
> Bugs: MESOS-8468
>     https://issues.apache.org/jira/browse/MESOS-8468
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Added a regression test for MESOS-8468.
> 
> 
> Diffs
> -----
> 
>   src/tests/default_executor_tests.cpp cc97e0d1fea7f4d0bc544d850593d8d91921b552 
> 
> 
> Diff: https://reviews.apache.org/r/65552/diff/4/
> 
> 
> Testing
> -------
> 
> `GLOG_v=1 sudo bin/mesos-tests.sh --gtest_filter='*ROOT_LaunchGroupFailure*' --verbose --gtest_repeat=650 --gtest_break_on_failure` on GNU/Linux
> 
> 
> Thanks,
> 
> Gaston Kleiman
> 
>


Re: Review Request 65552: Added a regression test for MESOS-8468.

Posted by Gaston Kleiman <ga...@mesosphere.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65552/
-----------------------------------------------------------

(Updated Feb. 12, 2018, 3:24 p.m.)


Review request for mesos, Anand Mazumdar, Greg Mann, Qian Zhang, and Vinod Kone.


Changes
-------

Swapped tasks in task groups in order to prevent a potential race.


Bugs: MESOS-8468
    https://issues.apache.org/jira/browse/MESOS-8468


Repository: mesos


Description
-------

Added a regression test for MESOS-8468.


Diffs (updated)
-----

  src/tests/default_executor_tests.cpp cc97e0d1fea7f4d0bc544d850593d8d91921b552 


Diff: https://reviews.apache.org/r/65552/diff/3/

Changes: https://reviews.apache.org/r/65552/diff/2-3/


Testing
-------

`GLOG_v=1 sudo bin/mesos-tests.sh --gtest_filter='*ROOT_LaunchGroupFailure*' --verbose --gtest_repeat=650 --gtest_break_on_failure` on GNU/Linux


Thanks,

Gaston Kleiman


Re: Review Request 65552: Added a regression test for MESOS-8468.

Posted by Gaston Kleiman <ga...@mesosphere.io>.

> On Feb. 12, 2018, 12:35 p.m., Joseph Wu wrote:
> > src/tests/default_executor_tests.cpp
> > Lines 3450-3461 (patched)
> > <https://reviews.apache.org/r/65552/diff/2/?file=1954220#file1954220line3450>
> >
> >     Is it possible for the following race to occur?
> >     
> >     * Executor launches task group 1 (expected to fail/kill)
> >     * Executor performs the launch/kill.
> >     * Executor commits suicide because it is no longer running any tasks.
> >     * The agent sends the second task group to the now-dead executor.

Yeah, that sounds possible. I changed the test so that it now does the following:

1. Executor launches `taskGroup1` with a task that sleeps for a very long time and isn't expected to stop until killed.
2. Executor launches `taskGroup2` with a sleep task and one that should fail to launch.
3. Executor should kill the sleep task in `taskGroup2`.
4. Executor reports all tasks in `taskGroup2` as killed/failed.
5. Scheduler asks to kill the sole task in `taskGroup1`.
6. Executor should kill the task in `taskGroup1` and terminate.
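
To make the ordering argument concrete, here is a rough toy model of why the swap helps. It is not the code in `default_executor_tests.cpp` and uses no Mesos APIs: `ToyExecutor`, `launchGroup`, `kill`, and both helper functions are hypothetical names invented for this sketch, and the rule "the executor terminates once it has no active tasks" is a simplification of the behaviour described in the question above. The first helper plays out the worst-case interleaving of the original order; the second plays out the new order.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Toy model only: none of these names exist in Mesos.
struct ToyExecutor {
  std::set<std::string> activeTasks;
  bool terminated = false;

  // Tries to launch a whole task group. If one task fails to launch, the
  // rest of the group is killed, so nothing from the group stays active.
  // Returns false when the executor is already gone and cannot receive it.
  bool launchGroup(const std::vector<std::string>& tasks, bool oneTaskFails) {
    if (terminated) {
      return false;
    }
    if (!oneTaskFails) {
      activeTasks.insert(tasks.begin(), tasks.end());
    }
    terminateIfIdle();
    return true;
  }

  // Kills a task that is currently active.
  void kill(const std::string& task) {
    assert(activeTasks.count(task) == 1);
    activeTasks.erase(task);
    terminateIfIdle();
  }

  // Assumed rule: an executor with no active tasks shuts itself down.
  void terminateIfIdle() {
    if (activeTasks.empty()) {
      terminated = true;
    }
  }
};

// Original order: the failing group arrives first, the executor becomes
// idle and terminates, and the second group has nowhere to go.
bool secondGroupDeliveredInOriginalOrder() {
  ToyExecutor executor;
  executor.launchGroup({"sleep", "fails-to-launch"}, true);
  return executor.launchGroup({"long-sleep"}, false);
}

// New order: the long-running task keeps the executor alive while the
// failing group is launched and cleaned up; the scheduler's kill ends it.
bool secondGroupDeliveredInFixedOrder() {
  ToyExecutor executor;
  executor.launchGroup({"long-sleep"}, false);
  bool delivered = executor.launchGroup({"sleep", "fails-to-launch"}, true);
  executor.kill("long-sleep");
  return delivered && executor.terminated;
}
```

In this model the first helper returns `false` and the second returns `true`, which matches the reasoning above: with a long-running task already active, the executor cannot go idle before the failing task group reaches it.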


- Gaston


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65552/#review197312
-----------------------------------------------------------



Re: Review Request 65552: Added a regression test for MESOS-8468.

Posted by Joseph Wu <jo...@mesosphere.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65552/#review197312
-----------------------------------------------------------




src/tests/default_executor_tests.cpp
Lines 3276 (patched)
<https://reviews.apache.org/r/65552/#comment277452>

    s/shoud/should/



src/tests/default_executor_tests.cpp
Lines 3450-3461 (patched)
<https://reviews.apache.org/r/65552/#comment277454>

    Is it possible for the following race to occur?
    
    * Executor launches task group 1 (expected to fail/kill)
    * Executor performs the launch/kill.
    * Executor commits suicide because it is no longer running any tasks.
    * The agent sends the second task group to the now-dead executor.


- Joseph Wu


On Feb. 7, 2018, 12:05 p.m., Gaston Kleiman wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65552/
> -----------------------------------------------------------
> 
> (Updated Feb. 7, 2018, 12:05 p.m.)
> 
> 
> Review request for mesos, Anand Mazumdar, Greg Mann, Qian Zhang, and Vinod Kone.
> 
> 
> Bugs: MESOS-8468
>     https://issues.apache.org/jira/browse/MESOS-8468
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Added a regression test for MESOS-8468.
> 
> 
> Diffs
> -----
> 
>   src/tests/default_executor_tests.cpp cc97e0d1fea7f4d0bc544d850593d8d91921b552 
> 
> 
> Diff: https://reviews.apache.org/r/65552/diff/2/
> 
> 
> Testing
> -------
> 
> `GLOG_v=1 sudo bin/mesos-tests.sh --gtest_filter='*ROOT_LaunchGroupFailure*' --verbose --gtest_repeat=650 --gtest_break_on_failure` on GNU/Linux
> 
> 
> Thanks,
> 
> Gaston Kleiman
> 
>