Posted to reviews@mesos.apache.org by Chun-Hung Hsiao <ch...@mesosphere.io> on 2018/03/01 05:18:00 UTC

Review Request 65856: Added `--fetcher_stall_timeout` to abort stalled artifact fetching.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65856/
-----------------------------------------------------------

Review request for mesos and Gilbert Song.


Bugs: MESOS-8620
    https://issues.apache.org/jira/browse/MESOS-8620


Repository: mesos


Description
-------

This flag specifies how long `mesos-fetcher` waits before aborting a
download whose speed stays below 1 byte/sec. This keeps containers from
getting stuck at FETCHING. The default value is 1 minute.


Diffs
-----

  include/mesos/fetcher/fetcher.proto 6a5d807221681853444ffd3ab42a23827604e550 
  src/launcher/fetcher.cpp 2f42fa221a42fdc131a8a97ded4e9433ce51ea77 
  src/slave/constants.hpp 030fb05186f7f360010bb7e5b4948faac69771cc 
  src/slave/containerizer/fetcher.cpp a49411b7bac2d5a50a75d0b802842c2f61fe58c6 
  src/slave/flags.hpp 0c67bf214ceb93ae7ff088bec2648fa26ddac59e 
  src/slave/flags.cpp 943aaaf58b5f36555f0902019b8c5c6522ab7afc 


Diff: https://reviews.apache.org/r/65856/diff/1/


Testing
-------

sudo make check

Manually tested with Nginx servers that sleep for 59 seconds and for 1 minute before serving artifacts.


Thanks,

Chun-Hung Hsiao


Re: Review Request 65856: Added `--fetcher_stall_timeout` to abort stalled artifact fetching.

Posted by James Peach <jp...@apache.org>.

> On March 21, 2018, 11:06 p.m., Gilbert Song wrote:
> > src/slave/flags.cpp
> > Lines 251-257 (patched)
> > <https://reviews.apache.org/r/65856/diff/1/?file=1968115#file1968115line251>
> >
> >     Should we update `configuration/agent.md`?

In those docs we should make it clear which kinds of fetches this applies to, e.g., it won't apply to HDFS.


- James


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65856/#review199719
-----------------------------------------------------------




Re: Review Request 65856: Added `--fetcher_stall_timeout` to abort stalled artifact fetching.

Posted by Gilbert Song <so...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65856/#review199719
-----------------------------------------------------------


Fix it, then Ship it!





src/slave/flags.cpp
Lines 251-257 (patched)
<https://reviews.apache.org/r/65856/#comment280135>

    Should we update `configuration/agent.md`?


- Gilbert Song




Re: Review Request 65856: Added `--fetcher_stall_timeout` to abort stalled artifact fetching.

Posted by Mesos Reviewbot <re...@mesos.apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65856/#review198441
-----------------------------------------------------------



Patch looks great!

Reviews applied: [65855, 65856]

Passed command: export OS='ubuntu:14.04' BUILDTOOL='autotools' COMPILER='gcc' CONFIGURATION='--verbose --disable-libtool-wrappers' ENVIRONMENT='GLOG_v=1 MESOS_VERBOSE=1'; ./support/docker-build.sh

- Mesos Reviewbot




Re: Review Request 65856: Added `--fetcher_stall_timeout` to abort stalled artifact fetching.

Posted by Gilbert Song <so...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65856/#review200171
-----------------------------------------------------------


Ship it!




Ship It!

- Gilbert Song




Re: Review Request 65856: Added `--fetcher_stall_timeout` to abort stalled artifact fetching.

Posted by Chun-Hung Hsiao <ch...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65856/
-----------------------------------------------------------

(Updated March 28, 2018, 11:51 p.m.)


Review request for mesos and Gilbert Song.


Changes
-------

Addressed Gilbert's and James's comments.


Bugs: MESOS-8620
    https://issues.apache.org/jira/browse/MESOS-8620


Repository: mesos


Description (updated)
-------

This flag specifies how long `mesos-fetcher` waits before aborting a
download whose speed stays below 1 byte/sec. This keeps containers from
getting stuck at FETCHING.


Diffs (updated)
-----

  docs/configuration/agent.md 13e4c551b8b0ba47190b4016220e48c3a4c391fb 
  include/mesos/fetcher/fetcher.proto 6a5d807221681853444ffd3ab42a23827604e550 
  src/launcher/fetcher.cpp 2f42fa221a42fdc131a8a97ded4e9433ce51ea77 
  src/slave/constants.hpp f1fc2bfcb9e093ab39a550d8fc7daa8fadee6f64 
  src/slave/containerizer/fetcher.cpp f9ab55404801e27900dc82316c1ca595fd65b942 
  src/slave/flags.hpp 949a4783caf8aac9a246a98525a5287b0f8256d8 
  src/slave/flags.cpp 962b07c1d701f4ab819b14730fbc116b981433bb 


Diff: https://reviews.apache.org/r/65856/diff/2/

Changes: https://reviews.apache.org/r/65856/diff/1-2/


Testing
-------

sudo make check

Manually tested with Nginx servers with the following cases:
1. Sleeps for 59 seconds before serving artifacts (successful)
2. Sleeps for 1 minute before serving artifacts (failed)
3. Sleeps for 55 seconds and then serves a 640 B artifact at 12 bytes/second (successful)
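
For readers following the three cases above, here is a rough model of the stall rule in Python. It is only a sketch under stated assumptions, not the code in `src/launcher/fetcher.cpp`: it treats a download as stalled once 60 consecutive one-second intervals each deliver less than 1 byte. The helper names `stalls` and `timeline`, and the 640 B single-burst payload used for cases 1 and 2, are made up for illustration.

```python
def stalls(bytes_per_second, stall_timeout_secs=60, min_bytes_per_sec=1):
    """Return True if the transfer would be aborted under the simplified rule.

    Assumption: a stall is `stall_timeout_secs` consecutive one-second
    intervals that each deliver fewer than `min_bytes_per_sec` bytes.
    """
    slow_run = 0  # consecutive seconds below the speed floor
    for received in bytes_per_second:
        if received < min_bytes_per_sec:
            slow_run += 1
            if slow_run >= stall_timeout_secs:
                return True
        else:
            slow_run = 0  # any progress resets the stall window
    return False


def timeline(sleep_secs, size_bytes, rate):
    """Per-second byte counts: `sleep_secs` of silence, then the payload.

    Hypothetical helper: the server sends nothing while it sleeps, then
    delivers `size_bytes` at `rate` bytes per second.
    """
    if rate <= 0:
        raise ValueError("rate must be positive")
    seconds = [0] * sleep_secs
    remaining = size_bytes
    while remaining > 0:
        chunk = min(rate, remaining)
        seconds.append(chunk)
        remaining -= chunk
    return seconds


if __name__ == "__main__":
    cases = {
        "1. sleep 59s": timeline(59, 640, 640),
        "2. sleep 60s": timeline(60, 640, 640),
        "3. sleep 55s, then 12 B/s": timeline(55, 640, 12),
    }
    for name, seconds in cases.items():
        print(name, "-> aborted" if stalls(seconds) else "-> completed")
```

Under this model, 59 seconds of silence stays just under the limit, 60 seconds hits it, and the slow 12 bytes/second transfer never counts as a stall because every second still delivers at least 1 byte.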


Thanks,

Chun-Hung Hsiao


Re: Review Request 65856: Added `--fetcher_stall_timeout` to abort stalled artifact fetching.

Posted by Mesos Reviewbot Windows <re...@mesos.apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65856/#review198434
-----------------------------------------------------------



FAIL: Some of the unit tests failed. Please check the relevant logs.

Reviews applied: `['65855', '65856']`

Failed command: `Start-MesosCITesting`

All the build artifacts available at: http://dcos-win.westus.cloudapp.azure.com/mesos-build/review/65856

Relevant logs:

- [mesos-tests-stdout.log](http://dcos-win.westus.cloudapp.azure.com/mesos-build/review/65856/logs/mesos-tests-stdout.log):

```
[       OK ] Endpoint/SlaveEndpointTest.NoAuthorizer/2 (206 ms)
[----------] 9 tests from Endpoint/SlaveEndpointTest (1251 ms total)

[----------] 2 tests from ContainerizerType/DefaultContainerDNSFlagTest
[ RUN      ] ContainerizerType/DefaultContainerDNSFlagTest.ValidateFlag/0
[       OK ] ContainerizerType/DefaultContainerDNSFlagTest.ValidateFlag/0 (39 ms)
[ RUN      ] ContainerizerType/DefaultContainerDNSFlagTest.ValidateFlag/1
[       OK ] ContainerizerType/DefaultContainerDNSFlagTest.ValidateFlag/1 (45 ms)
[----------] 2 tests from ContainerizerType/DefaultContainerDNSFlagTest (86 ms total)

[----------] 1 test from IsolationFlag/CpuIsolatorTest
[ RUN      ] IsolationFlag/CpuIsolatorTest.ROOT_UserCpuUsage/0
[       OK ] IsolationFlag/CpuIsolatorTest.ROOT_UserCpuUsage/0 (2506 ms)
[----------] 1 test from IsolationFlag/CpuIsolatorTest (2531 ms total)

[----------] 1 test from IsolationFlag/MemoryIsolatorTest
[ RUN      ] IsolationFlag/MemoryIsolatorTest.ROOT_MemUsage/0
[       OK ] IsolationFlag/MemoryIsolatorTest.ROOT_MemUsage/0 (2468 ms)
[----------] 1 test from IsolationFlag/MemoryIsolatorTest (2492 ms total)

[----------] Global test environment tear-down
[==========] 915 tests from 90 test cases ran. (477671 ms total)
[  PASSED  ] 914 tests.
[  FAILED  ] 1 test, listed below:
[  FAILED  ] CommandExecutorCheckTest.CommandCheckTimeout

 1 FAILED TEST
  YOU HAVE 211 DISABLED TESTS

```

- [mesos-tests-stderr.log](http://dcos-win.westus.cloudapp.azure.com/mesos-build/review/65856/logs/mesos-tests-stderr.log):

```
I0301 06:19:01.867746  1316 slave.cpp:3879] Shutting down framework ed234448-e169-4c04-93bc-5ddd2de8997a-0000
I0301 06:19:01.867746  6516 master.cpp:10258] Updating the state of task 9d020280-c221-4c61-b7ad-1afee8382366 of framework ed234448-e169-4c04-93bc-5ddd2de8997a-0000 (latest state: TASK_KILLED, status update state: TASK_KILLED)
I0301 06:19:01.867746  1316 slave.cpp:6586] Shutting down executor '9d020280-c221-4c61-b7ad-1afee8382366' of framework ed234448-e169-4c04-93bc-5ddd2de8997a-0000 at executor(1)@10.3.1.5:55976
I0301 06:19:01.869748  1316 slave.cpp:922] Agent terminating
W0301 06:19:01.869748  1316 slave.cpp:3875] Ignoring shutdown framework ed234448-e169-4c04-93bc-5ddd2de8997a-0000 because it is terminating
I0301 06:19:01.869748  6516 master.cpp:10357] Removing task 9d020280-c221-4c61-b7ad-1afee8382366 with resources cpus(allocated: *):4; mem(allocated: *):2048; disk(allocated: *):1024; ports(allocated: *):[31000-32000] of framework ed234448-e169-4c04-93bc-5ddd2de8997a-0000 on agent ed234448-e169-4c04-93bc-5ddd2de8997a-S0 at slave(398)@10.3.1.5:55955 (build-srv-04.zq4gs31qjdiunm1ryi1452nvnh.dx.internal.cloudapp.net)
I0301 06:19:01.872721  6516 master.cpp:1306] Agent ed234448-e169-4c04-93bc-5ddd2de8997a-S0 at slave(398)@10.3.1.5:55955 (build-srv-04.zq4gs31qjdiunm1ryi1452nvnh.dx.internal.cloudapp.net) disconnected
I0301 06:19:01.872721  6516 master.cpp:3276] Disconnecting agent ed234448-e169-4c04-93bc-5ddd2de8997a-S0 at slave(398)@10.3.1.5:55955 (build-srv-04.zq4gs31qjdiunm1ryi1452nvnh.dx.internal.cloudapp.net)
I0301 06:19:01.169744  6488 exec.cpp:162] Version: 1.6.0
I0301 06:19:01.198719  2512 exec.cpp:236] Executor registered on agent ed234448-e169-4c04-93bc-5ddd2de8997a-S0
I0301 06:19:01.202718  1240 executor.cpp:176] Received SUBSCRIBED event
I0301 06:19:01.207742  1240 executor.cpp:180] Subscribed executor on build-srv-04.zq4gs31qjdiunm1ryi1452nvnh.dx.internal.cloudapp.net
I0301 06:19:01.208744  1240 executor.cpp:176] Received LAUNCH event
I0301 06:19:01.212718  1240 executor.cpp:648] Starting task 9d020280-c221-4c61-b7ad-1afee8382366
I0301 06:19:01.296744  1240 executor.cpp:483] Running 'D:\DCOS\mesos\src\mesos-containerizer.exe launch <POSSIBLY-SENSITIVE-DATA>'
I0301 06:19:01.835724  1240 executor.cpp:661] Forked command at 3796
I0301 06:19:01.873726  7912 exec.cpp:445] Executor asked to shutdown
I0301 06:19:01.874722  1240 executor.cpp:176] Received SHUTDOWN event
I0301 06:19:01.874722  1240 executor.cpp:758] Shutting down
I0301 06:19:01.874722  1240 executor.cpp:868] Sending SIGTERM to process tree at pid 3
I0301 06:19:01.873726  1132 hierarchical.cpp:344] Removed framework ed234448-e169-4c04-93bc-5ddd2de8997a-0000
I0301 06:19:01.872721  6752 containerizer.cpp:2338] Destroying container 5d36381f-2b59-464b-ab48-1139dc873e9d in RUNNING state
I0301 06:19:01.873726  6752 containerizer.cpp:2952] Transitioning the state of container 5d36381f-2b59-464b-ab48-1139dc873e9d from RUNNING to DESTROYING
I0301 06:19:01.873726  6516 master.cpp:3295] Deactivating agent ed234448-e169-4c04-93bc-5ddd2de8997a-S0 at slave(398)@10.3.1.5:55955 (build-srv-04.zq4gs31qjdiunm1ryi1452nvnh.dx.internal.cloudapp.net)
I0301 06:19:01.873726  9144 hierarchical.cpp:766] Agent ed234448-e169-4c04-93bc-5ddd2de8997a-S0 deactivated
I0301 06:19:01.874722  6752 launcher.cpp:156] Asked to destroy container 5d36381f-2b59-464b-ab48-1139dc873e9d
I0301 06:19:01.939859  9144 containerizer.cpp:2791] Container 5d36381f-2b59-464b-ab48-1139dc873e9d has exited
I0301 06:19:01.971858  6444 master.cpp:1149] Master terminating
I0301 06:19:01.974859  1316 hierarchical.cpp:609] Removed agent ed234448-e169-4c04-93bc-5ddd2de8997a-S0
I0301 06:19:02.439860  3124 process.cpp:929] Stopped the socket accept loop
```

- Mesos Reviewbot Windows

