Posted to reviews@mesos.apache.org by Chun-Hung Hsiao <ch...@mesosphere.io> on 2018/03/01 05:18:00 UTC
Review Request 65856: Added `--fetcher_stall_timeout` to abort stalled
artifact fetching.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65856/
-----------------------------------------------------------
Review request for mesos and Gilbert Song.
Bugs: MESOS-8620
https://issues.apache.org/jira/browse/MESOS-8620
Repository: mesos
Description
-------
This flag specifies a timeout for `mesos-fetcher` to wait before
aborting if the download speed stays below 1 byte/sec. This prevents
containers from getting stuck in FETCHING. The default value is 1 minute.
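The abort rule described above ("download speed stays below 1 byte/sec for the whole timeout") can be modeled with a small counter. This is a hypothetical sketch: `StallDetector` and its one-sample-per-second model are illustrative assumptions, not code from the patch. libcurl expresses the same "below N bytes/sec for T seconds" condition through its `CURLOPT_LOW_SPEED_LIMIT` and `CURLOPT_LOW_SPEED_TIME` options, but it may sample throughput differently from this simplified model.

```cpp
#include <cstddef>

// Hypothetical stall detector: a transfer is aborted once its per-second
// throughput has stayed below `minBytesPerSec` for `stallTimeoutSecs`
// consecutive seconds. Any second at or above the limit resets the count.
class StallDetector
{
public:
  StallDetector(std::size_t minBytesPerSec, int stallTimeoutSecs)
    : minBytesPerSec_(minBytesPerSec),
      stallTimeoutSecs_(stallTimeoutSecs) {}

  // Records the bytes received during one elapsed second and returns
  // true if the transfer should now be aborted as stalled.
  bool tick(std::size_t bytesThisSecond)
  {
    if (bytesThisSecond < minBytesPerSec_) {
      ++slowSecs_;
    } else {
      slowSecs_ = 0;
    }
    return slowSecs_ >= stallTimeoutSecs_;
  }

private:
  std::size_t minBytesPerSec_;
  int stallTimeoutSecs_;
  int slowSecs_ = 0;
};
```

Because any second at or above the limit resets the count, a slow but steady transfer is never aborted in this model; only one that delivers nothing for the entire window is.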
Diffs
-----
include/mesos/fetcher/fetcher.proto 6a5d807221681853444ffd3ab42a23827604e550
src/launcher/fetcher.cpp 2f42fa221a42fdc131a8a97ded4e9433ce51ea77
src/slave/constants.hpp 030fb05186f7f360010bb7e5b4948faac69771cc
src/slave/containerizer/fetcher.cpp a49411b7bac2d5a50a75d0b802842c2f61fe58c6
src/slave/flags.hpp 0c67bf214ceb93ae7ff088bec2648fa26ddac59e
src/slave/flags.cpp 943aaaf58b5f36555f0902019b8c5c6522ab7afc
Diff: https://reviews.apache.org/r/65856/diff/1/
Testing
-------
sudo make check
Manually tested with Nginx servers that sleep for 59 seconds and for 1 minute before serving artifacts.
Thanks,
Chun-Hung Hsiao
Re: Review Request 65856: Added `--fetcher_stall_timeout` to abort
stalled artifact fetching.
Posted by James Peach <jp...@apache.org>.
> On March 21, 2018, 11:06 p.m., Gilbert Song wrote:
> > src/slave/flags.cpp
> > Lines 251-257 (patched)
> > <https://reviews.apache.org/r/65856/diff/1/?file=1968115#file1968115line251>
> >
> > Should we update `configuration/agent.md`?
In those docs we should make it clear which kinds of fetches this applies to; e.g., it won't apply to HDFS.
- James
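James's point is that the timeout only covers transfers made by the fetcher's own downloader. As a hypothetical sketch, assuming (this is not confirmed in the thread) that only `http`, `https`, `ftp`, and `ftps` URIs go through that downloader while other schemes such as `hdfs` are handed to an external client, the scope check could look like this. `stallTimeoutApplies` is an illustrative name, not a Mesos function.

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <vector>

// Hypothetical: returns true if `uri` would be fetched by an in-process
// downloader that honors the stall timeout. The scheme list is an
// assumption for illustration; other schemes (e.g. `hdfs://`) and local
// paths are assumed not to be covered by the timeout.
bool stallTimeoutApplies(const std::string& uri)
{
  static const std::vector<std::string> kNetSchemes = {
    "http://", "https://", "ftp://", "ftps://"};

  // Lower-case a copy so that the scheme comparison is case-insensitive.
  std::string lower = uri;
  std::transform(
      lower.begin(), lower.end(), lower.begin(),
      [](unsigned char c) { return static_cast<char>(std::tolower(c)); });

  // True if the URI starts with any of the assumed network schemes.
  return std::any_of(
      kNetSchemes.begin(), kNetSchemes.end(),
      [&lower](const std::string& scheme) {
        return lower.compare(0, scheme.size(), scheme) == 0;
      });
}
```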
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65856/#review199719
-----------------------------------------------------------
On March 1, 2018, 5:17 a.m., Chun-Hung Hsiao wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65856/
> -----------------------------------------------------------
>
> (Updated March 1, 2018, 5:17 a.m.)
>
>
> Review request for mesos and Gilbert Song.
>
>
> Bugs: MESOS-8620
> https://issues.apache.org/jira/browse/MESOS-8620
>
>
> Repository: mesos
>
>
> Description
> -------
>
> This flag specifies a timeout for `mesos-fetcher` to wait before
> aborting if the download speed stays below 1 byte/sec. This prevents
> containers from getting stuck in FETCHING. The default value is 1 minute.
>
>
> Diffs
> -----
>
> include/mesos/fetcher/fetcher.proto 6a5d807221681853444ffd3ab42a23827604e550
> src/launcher/fetcher.cpp 2f42fa221a42fdc131a8a97ded4e9433ce51ea77
> src/slave/constants.hpp 030fb05186f7f360010bb7e5b4948faac69771cc
> src/slave/containerizer/fetcher.cpp a49411b7bac2d5a50a75d0b802842c2f61fe58c6
> src/slave/flags.hpp 0c67bf214ceb93ae7ff088bec2648fa26ddac59e
> src/slave/flags.cpp 943aaaf58b5f36555f0902019b8c5c6522ab7afc
>
>
> Diff: https://reviews.apache.org/r/65856/diff/1/
>
>
> Testing
> -------
>
> sudo make check
>
> Manually tested with Nginx servers in the following cases:
> 1. Sleeps for 59 seconds before serving artifacts (successful)
> 2. Sleeps for 1 minute before serving artifacts (failed)
> 3. Sleeps for 55 seconds and then serves a 640 B artifact at 12 bytes/second (successful)
>
>
> Thanks,
>
> Chun-Hung Hsiao
>
>
Re: Review Request 65856: Added `--fetcher_stall_timeout` to abort
stalled artifact fetching.
Posted by Gilbert Song <so...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65856/#review199719
-----------------------------------------------------------
Fix it, then Ship it!
src/slave/flags.cpp
Lines 251-257 (patched)
<https://reviews.apache.org/r/65856/#comment280135>
Should we update `configuration/agent.md`?
- Gilbert Song
Re: Review Request 65856: Added `--fetcher_stall_timeout` to abort
stalled artifact fetching.
Posted by Mesos Reviewbot <re...@mesos.apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65856/#review198441
-----------------------------------------------------------
Patch looks great!
Reviews applied: [65855, 65856]
Passed command: export OS='ubuntu:14.04' BUILDTOOL='autotools' COMPILER='gcc' CONFIGURATION='--verbose --disable-libtool-wrappers' ENVIRONMENT='GLOG_v=1 MESOS_VERBOSE=1'; ./support/docker-build.sh
- Mesos Reviewbot
Re: Review Request 65856: Added `--fetcher_stall_timeout` to abort
stalled artifact fetching.
Posted by Gilbert Song <so...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65856/#review200171
-----------------------------------------------------------
Ship it!
Ship It!
- Gilbert Song
On March 28, 2018, 4:51 p.m., Chun-Hung Hsiao wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65856/
> -----------------------------------------------------------
>
> (Updated March 28, 2018, 4:51 p.m.)
>
>
> Review request for mesos and Gilbert Song.
>
>
> Bugs: MESOS-8620
> https://issues.apache.org/jira/browse/MESOS-8620
>
>
> Repository: mesos
>
>
> Description
> -------
>
> This flag specifies a timeout for `mesos-fetcher` to wait before
> aborting if the download speed stays below 1 byte/sec. This prevents
> containers from getting stuck in FETCHING.
>
>
> Diffs
> -----
>
> docs/configuration/agent.md 13e4c551b8b0ba47190b4016220e48c3a4c391fb
> include/mesos/fetcher/fetcher.proto 6a5d807221681853444ffd3ab42a23827604e550
> src/launcher/fetcher.cpp 2f42fa221a42fdc131a8a97ded4e9433ce51ea77
> src/slave/constants.hpp f1fc2bfcb9e093ab39a550d8fc7daa8fadee6f64
> src/slave/containerizer/fetcher.cpp f9ab55404801e27900dc82316c1ca595fd65b942
> src/slave/flags.hpp 949a4783caf8aac9a246a98525a5287b0f8256d8
> src/slave/flags.cpp 962b07c1d701f4ab819b14730fbc116b981433bb
>
>
> Diff: https://reviews.apache.org/r/65856/diff/2/
>
>
> Testing
> -------
>
> sudo make check
>
> Manually tested with Nginx servers in the following cases:
> 1. Sleeps for 59 seconds before serving artifacts (successful)
> 2. Sleeps for 1 minute before serving artifacts (failed)
> 3. Sleeps for 55 seconds and then serves a 640 B artifact at 12 bytes/second (successful)
>
>
> Thanks,
>
> Chun-Hung Hsiao
>
>
Re: Review Request 65856: Added `--fetcher_stall_timeout` to abort
stalled artifact fetching.
Posted by Chun-Hung Hsiao <ch...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65856/
-----------------------------------------------------------
(Updated March 28, 2018, 11:51 p.m.)
Review request for mesos and Gilbert Song.
Changes
-------
Addressed Gilbert's and James's comments.
Bugs: MESOS-8620
https://issues.apache.org/jira/browse/MESOS-8620
Repository: mesos
Description (updated)
-------
This flag specifies a timeout for `mesos-fetcher` to wait before
aborting if the download speed stays below 1 byte/sec. This prevents
containers from getting stuck in FETCHING.
Diffs (updated)
-----
docs/configuration/agent.md 13e4c551b8b0ba47190b4016220e48c3a4c391fb
include/mesos/fetcher/fetcher.proto 6a5d807221681853444ffd3ab42a23827604e550
src/launcher/fetcher.cpp 2f42fa221a42fdc131a8a97ded4e9433ce51ea77
src/slave/constants.hpp f1fc2bfcb9e093ab39a550d8fc7daa8fadee6f64
src/slave/containerizer/fetcher.cpp f9ab55404801e27900dc82316c1ca595fd65b942
src/slave/flags.hpp 949a4783caf8aac9a246a98525a5287b0f8256d8
src/slave/flags.cpp 962b07c1d701f4ab819b14730fbc116b981433bb
Diff: https://reviews.apache.org/r/65856/diff/2/
Changes: https://reviews.apache.org/r/65856/diff/1-2/
Testing
-------
sudo make check
Manually tested with Nginx servers in the following cases:
1. Sleeps for 59 seconds before serving artifacts (successful)
2. Sleeps for 1 minute before serving artifacts (failed)
3. Sleeps for 55 seconds and then serves a 640 B artifact at 12 bytes/second (successful)
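The three manual cases can be replayed against the rule from the description, assuming the 1-minute default stated in the original request. This is a hypothetical simulation: `downloadCompletes` is an illustrative helper, one-second sampling is an assumption, and the artifact size used for cases 1 and 2 (not given in the thread) is an arbitrary placeholder.

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical simulation of a server that sleeps `sleepSecs` seconds and
// then streams `totalBytes` at `bytesPerSec`. Returns true if the download
// finishes before per-second throughput has stayed below 1 byte/sec for
// `stallTimeoutSecs` consecutive seconds.
bool downloadCompletes(
    int sleepSecs,
    std::size_t totalBytes,
    std::size_t bytesPerSec,
    int stallTimeoutSecs)
{
  int slowSecs = 0;

  // While the server sleeps, every second delivers 0 bytes.
  for (int s = 0; s < sleepSecs; ++s) {
    if (++slowSecs >= stallTimeoutSecs) {
      return false;  // Aborted as stalled.
    }
  }

  // Stream the artifact one second at a time; any second that delivers
  // at least 1 byte resets the stall counter.
  std::size_t received = 0;
  while (received < totalBytes) {
    std::size_t chunk = std::min(bytesPerSec, totalBytes - received);
    received += chunk;
    slowSecs = (chunk < 1) ? slowSecs + 1 : 0;
    if (slowSecs >= stallTimeoutSecs) {
      return false;
    }
  }
  return true;
}
```

In case 3 the transfer itself takes ⌈640 / 12⌉ = 54 seconds after the 55-second sleep, roughly 109 seconds end to end, and still succeeds: the timeout bounds how long a fetch may stall, not its total duration.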
Thanks,
Chun-Hung Hsiao
Re: Review Request 65856: Added `--fetcher_stall_timeout` to abort
stalled artifact fetching.
Posted by Mesos Reviewbot Windows <re...@mesos.apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65856/#review198434
-----------------------------------------------------------
FAIL: Some of the unit tests failed. Please check the relevant logs.
Reviews applied: `['65855', '65856']`
Failed command: `Start-MesosCITesting`
All the build artifacts available at: http://dcos-win.westus.cloudapp.azure.com/mesos-build/review/65856
Relevant logs:
- [mesos-tests-stdout.log](http://dcos-win.westus.cloudapp.azure.com/mesos-build/review/65856/logs/mesos-tests-stdout.log):
```
[ OK ] Endpoint/SlaveEndpointTest.NoAuthorizer/2 (206 ms)
[----------] 9 tests from Endpoint/SlaveEndpointTest (1251 ms total)
[----------] 2 tests from ContainerizerType/DefaultContainerDNSFlagTest
[ RUN ] ContainerizerType/DefaultContainerDNSFlagTest.ValidateFlag/0
[ OK ] ContainerizerType/DefaultContainerDNSFlagTest.ValidateFlag/0 (39 ms)
[ RUN ] ContainerizerType/DefaultContainerDNSFlagTest.ValidateFlag/1
[ OK ] ContainerizerType/DefaultContainerDNSFlagTest.ValidateFlag/1 (45 ms)
[----------] 2 tests from ContainerizerType/DefaultContainerDNSFlagTest (86 ms total)
[----------] 1 test from IsolationFlag/CpuIsolatorTest
[ RUN ] IsolationFlag/CpuIsolatorTest.ROOT_UserCpuUsage/0
[ OK ] IsolationFlag/CpuIsolatorTest.ROOT_UserCpuUsage/0 (2506 ms)
[----------] 1 test from IsolationFlag/CpuIsolatorTest (2531 ms total)
[----------] 1 test from IsolationFlag/MemoryIsolatorTest
[ RUN ] IsolationFlag/MemoryIsolatorTest.ROOT_MemUsage/0
[ OK ] IsolationFlag/MemoryIsolatorTest.ROOT_MemUsage/0 (2468 ms)
[----------] 1 test from IsolationFlag/MemoryIsolatorTest (2492 ms total)
[----------] Global test environment tear-down
[==========] 915 tests from 90 test cases ran. (477671 ms total)
[ PASSED ] 914 tests.
[ FAILED ] 1 test, listed below:
[ FAILED ] CommandExecutorCheckTest.CommandCheckTimeout
1 FAILED TEST
YOU HAVE 211 DISABLED TESTS
```
- [mesos-tests-stderr.log](http://dcos-win.westus.cloudapp.azure.com/mesos-build/review/65856/logs/mesos-tests-stderr.log):
```
I0301 06:19:01.867746 1316 slave.cpp:3879] Shutting down framework ed234448-e169-4c04-93bc-5ddd2de8997a-0000
I0301 06:19:01.867746 6516 master.cpp:10258] Updating the state of task 9d020280-c221-4c61-b7ad-1afee8382366 of framework ed234448-e169-4c04-93bc-5ddd2de8997a-0000 (latest state: TASK_KILLED, status update state: TASK_KILLED)
I0301 06:19:01.867746 1316 slave.cpp:6586] Shutting down executor '9d020280-c221-4c61-b7ad-1afee8382366' of framework ed234448-e169-4c04-93bc-5ddd2de8997a-0000 at executor(1)@10.3.1.5:55976
I0301 06:19:01.869748 1316 slave.cpp:922] Agent terminating
W0301 06:19:01.869748 1316 slave.cpp:3875] Ignoring shutdown framework ed234448-e169-4c04-93bc-5ddd2de8997a-0000 because it is terminating
I0301 06:19:01.869748 6516 master.cpp:10357] Removing task 9d020280-c221-4c61-b7ad-1afee8382366 with resources cpus(allocated: *):4; mem(allocated: *):2048; disk(allocated: *):1024; ports(allocated: *):[31000-32000] of framework ed234448-e169-4c04-93bc-5ddd2de8997a-0000 on agent ed234448-e169-4c04-93bc-5ddd2de8997a-S0 at slave(398)@10.3.1.5:55955 (build-srv-04.zq4gs31qjdiunm1ryi1452nvnh.dx.internal.cloudapp.net)
I0301 06:19:01.872721 6516 master.cpp:1306] Agent ed234448-e169-4c04-93bc-5ddd2de8997a-S0 at slave(398)@10.3.1.5:55955 (build-srv-04.zq4gs31qjdiunm1ryi1452nvnh.dx.internal.cloudapp.net) disconnected
I0301 06:19:01.872721 6516 master.cpp:3276] Disconnecting agent ed234448-e169-4c04-93bc-5ddd2de8997a-S0 at slave(398)@10.3.1.5:55955 (build-srv-04.zq4gs31qjdiunm1ryi1452nvnh.dx.internal.cloudapp.net)
I0301 06:19:01.169744 6488 exec.cpp:162] Version: 1.6.0
I0301 06:19:01.198719 2512 exec.cpp:236] Executor registered on agent ed234448-e169-4c04-93bc-5ddd2de8997a-S0
I0301 06:19:01.202718 1240 executor.cpp:176] Received SUBSCRIBED event
I0301 06:19:01.207742 1240 executor.cpp:180] Subscribed executor on build-srv-04.zq4gs31qjdiunm1ryi1452nvnh.dx.internal.cloudapp.net
I0301 06:19:01.208744 1240 executor.cpp:176] Received LAUNCH event
I0301 06:19:01.212718 1240 executor.cpp:648] Starting task 9d020280-c221-4c61-b7ad-1afee8382366
I0301 06:19:01.296744 1240 executor.cpp:483] Running 'D:\DCOS\mesos\src\mesos-containerizer.exe launch <POSSIBLY-SENSITIVE-DATA>'
I0301 06:19:01.835724 1240 executor.cpp:661] Forked command at 3796
I0301 06:19:01.873726 7912 exec.cpp:445] Executor asked to shutdown
I0301 06:19:01.874722 1240 executor.cpp:176] Received SHUTDOWN event
I0301 06:19:01.874722 1240 executor.cpp:758] Shutting down
I0301 06:19:01.874722 1240 executor.cpp:868] Sending SIGTERM to process tree at pid 3
I0301 06:19:01.873726 1132 hierarchical.cpp:344] Removed framework ed234448-e169-4c04-93bc-5ddd2de8997a-0000
I0301 06:19:01.872721 6752 containerizer.cpp:2338] Destroying container 5d36381f-2b59-464b-ab48-1139dc873e9d in RUNNING state
I0301 06:19:01.873726 6752 containerizer.cpp:2952] Transitioning the state of container 5d36381f-2b59-464b-ab48-1139dc873e9d from RUNNING to DESTROYING
I0301 06:19:01.873726 6516 master.cpp:3295] Deactivating agent ed234448-e169-4c04-93bc-5ddd2de8997a-S0 at slave(398)@10.3.1.5:55955 (build-srv-04.zq4gs31qjdiunm1ryi1452nvnh.dx.internal.cloudapp.net)
I0301 06:19:01.873726 9144 hierarchical.cpp:766] Agent ed234448-e169-4c04-93bc-5ddd2de8997a-S0 deactivated
I0301 06:19:01.874722 6752 launcher.cpp:156] Asked to destroy container 5d36381f-2b59-464b-ab48-1139dc873e9d
I0301 06:19:01.939859 9144 containerizer.cpp:2791] Container 5d36381f-2b59-464b-ab48-1139dc873e9d has exited
I0301 06:19:01.971858 6444 master.cpp:1149] Master terminating
I0301 06:19:01.974859 1316 hierarchical.cpp:609] Removed agent ed234448-e169-4c04-93bc-5ddd2de8997a-S0
I0301 06:19:02.439860 3124 process.cpp:929] Stopped the socket accept loop
```
- Mesos Reviewbot Windows