Posted to reviews@mesos.apache.org by Andrei Budnik <ab...@mesosphere.com> on 2018/02/20 15:18:59 UTC
Review Request 65713: Handled hanging docker `stop`,
`inspect` commands in docker executor.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65713/
-----------------------------------------------------------
Review request for mesos, Alexander Rukletsov, Gilbert Song, and Greg Mann.
Bugs: MESOS-8574
https://issues.apache.org/jira/browse/MESOS-8574
Repository: mesos
Description
-------
Previously, if the `docker inspect` command hung, the docker container
ended up in an unkillable state. This patch adds a timeout for the
`inspect` command issued after receiving `killTask`, analogous to the
`reaped` handler. In addition, we've added a timeout for the `docker stop`
command. If the docker `stop` or `inspect` command times out, we discard
the related future, so the docker library kills the previously spawned
docker CLI subprocess. As a result, a scheduler can retry the `killTask`
operation to work around docker bugs that cause the docker CLI to hang.
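The timeout-then-discard idea can be illustrated with a minimal, standalone
sketch. This is not the code from the patch and it does not use libprocess:
`CliProcess`, `Outcome`, and `runWithTimeout` are hypothetical names, and a
plain thread stands in for the spawned docker CLI subprocess.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <future>
#include <mutex>
#include <thread>

using std::chrono::milliseconds;

// Hypothetical stand-in for a spawned docker CLI subprocess.
struct CliProcess
{
  std::mutex mutex;
  std::condition_variable cv;
  bool killed = false;
};

enum class Outcome { Completed, TimedOut };

// Runs a simulated command that takes `commandDuration` to finish.
// If it does not finish within `timeout`, the command is killed
// (the analogue of discarding the future) and `TimedOut` is returned,
// so that the caller is free to retry later.
Outcome runWithTimeout(milliseconds commandDuration, milliseconds timeout)
{
  assert(timeout.count() >= 0);

  CliProcess process;
  std::promise<void> exited;
  std::future<void> future = exited.get_future();

  std::thread cli([&]() {
    std::unique_lock<std::mutex> lock(process.mutex);

    // Sleeps for `commandDuration` unless killed earlier; the return
    // value is the predicate's value when the wait ends.
    bool killed = process.cv.wait_for(
        lock, commandDuration, [&]() { return process.killed; });

    if (!killed) {
      exited.set_value(); // The command finished on its own.
    }
  });

  Outcome outcome = Outcome::Completed;

  if (future.wait_for(timeout) == std::future_status::timeout) {
    // Timed out: kill the simulated subprocess instead of waiting forever.
    {
      std::lock_guard<std::mutex> lock(process.mutex);
      process.killed = true;
    }
    process.cv.notify_all();
    outcome = Outcome::TimedOut;
  }

  cli.join();
  return outcome;
}
```

Under these assumptions, `runWithTimeout(milliseconds(2000), milliseconds(50))`
returns `Outcome::TimedOut` shortly after the timeout instead of blocking for
the full two seconds, which is what leaves room for a later retry.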
Diffs
-----
src/docker/executor.cpp 80e2d81169f0d4303ca1ddbcef9fa87fe52601fc
Diff: https://reviews.apache.org/r/65713/diff/1/
Testing
-------
Thanks,
Andrei Budnik
Re: Review Request 65713: Handled hanging docker `stop`,
`inspect` commands in docker executor.
Posted by Mesos Reviewbot Windows <re...@mesos.apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65713/#review197779
-----------------------------------------------------------
PASS: Mesos patch 65713 was successfully built and tested.
Reviews applied: `['65713']`
All the build artifacts available at: http://dcos-win.westus.cloudapp.azure.com/mesos-build/review/65713
- Mesos Reviewbot Windows
On Feb. 20, 2018, 3:18 p.m., Andrei Budnik wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65713/
> -----------------------------------------------------------
>
> (Updated Feb. 20, 2018, 3:18 p.m.)
>
>
> Review request for mesos, Alexander Rukletsov, Gilbert Song, and Greg Mann.
>
>
> Bugs: MESOS-8574
> https://issues.apache.org/jira/browse/MESOS-8574
>
>
> Repository: mesos
>
>
> Description
> -------
>
> Previously, if the `docker inspect` command hung, the docker container
> ended up in an unkillable state. This patch adds a timeout for the
> `inspect` command issued after receiving `killTask`, analogous to the
> `reaped` handler. In addition, we've added a timeout for the `docker stop`
> command. If the docker `stop` or `inspect` command times out, we discard
> the related future, so the docker library kills the previously spawned
> docker CLI subprocess. As a result, a scheduler can retry the `killTask`
> operation to work around docker bugs that cause the docker CLI to hang.
>
>
> Diffs
> -----
>
> src/docker/executor.cpp 80e2d81169f0d4303ca1ddbcef9fa87fe52601fc
>
>
> Diff: https://reviews.apache.org/r/65713/diff/1/
>
>
> Testing
> -------
>
> internal CI
>
> Manual testing:
> 1. Build docker from sources: http://oyvindsk.com/writing/docker-build-from-source
> 2. Modify `ContainerInspect` function from `docker/inspect.go`:
> ```
>   func (daemon *Daemon) ContainerInspect(name string, size bool, version string) (interface{}, error) {
> +     time.Sleep(10 * time.Second)
> ```
> 3. Modify `ContainerStop` function from `docker/stop.go`:
> ```
>   func (daemon *Daemon) ContainerStop(name string, seconds *int) error {
> +     rand.Seed(time.Now().UTC().UnixNano())
> +     if rand.Intn(2) == 0 {
> +         time.Sleep(20 * time.Second)
> +     }
> ```
> 4. Rebuild docker: `sudo make build && sudo make binary`
> 5. Stop system docker daemon: `sudo service docker stop`
> 6. Start modified docker daemon: `sudo ./bundles/binary-daemon/dockerd-dev`
> 7. Modify `src/cli/execute.cpp`:
> a) Add `delay(Seconds(15), self(), &Self::retryKill, task->task_id(), offer.agent_id());` after https://github.com/apache/mesos/blob/072ea2787ffca6f2a6dcb2d636f68c51823d6665/src/cli/execute.cpp#L606
> b) Add a new method `retryKill` to `CommandScheduler`:
> ```
> void retryKill(const TaskID& taskId, const AgentID& agentId)
> {
>   killTask(taskId, agentId);
>   delay(Seconds(6), self(), &Self::retryKill, taskId, agentId);
> }
> ```
> 8. Rebuild mesos
> 9. Run mesos master: `./bin/mesos-master.sh --work_dir='var/master-1'`
> 10. Run mesos agent: `GLOG_v=1 ./bin/mesos-agent.sh --resources="cpus:10000;mem:1000000" --work_dir='/home/abudnik/mesos/build/var/agent-1' --containerizers="docker,mesos" --master="127.0.1.1:5050"`
> 11. Submit a task for the docker executor: `./src/mesos-execute --master="127.0.1.1:5050" --name="a" --containerizer=docker --docker_image="ubuntu:xenial" --command="sleep 9999"`
>
>
> Thanks,
>
> Andrei Budnik
>
>