Posted to issues@mesos.apache.org by "Sergey Galkin (JIRA)" <ji...@apache.org> on 2016/03/22 14:36:25 UTC

[jira] [Commented] (MESOS-4999) Mesos (or Marathon) lost tasks

    [ https://issues.apache.org/jira/browse/MESOS-4999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15206358#comment-15206358 ] 

Sergey Galkin commented on MESOS-4999:
--------------------------------------

I can't kill a RUNNING task through the Mesos API:
{code}
$ cat /tmp/mesos
{ "framework_id"    : {"value": "5445dbdc-c58a-4f78-aef2-9ab129a640fa-0000"},
  "type"            : "KILL",
  "kill"            : {"task_id":  {"value" : "66e562a95c285ce39f37693061a46c2e.d3637b20-f020-11e5-9d8f-3cfdfe9c6364"}}
}
$ curl -v -H Content-Type:application/json -XPOST -d@/tmp/mesos http://172.20.8.34:8080/env-testing/mesos/api/v1/scheduler
Note: Unnecessary use of -X or --request, POST is already inferred.
*   Trying 172.20.8.34...
* Connected to 172.20.8.34 (172.20.8.34) port 8080 (#0)
> POST /env-testing/mesos/api/v1/scheduler HTTP/1.1
> Host: 172.20.8.34:8080
> User-Agent: curl/7.47.1
> Accept: */*
> Content-Type:application/json
> Content-Length: 226
> 
* upload completely sent off: 226 out of 226 bytes
< HTTP/1.1 202 Accepted
< Server: nginx/1.4.6 (Ubuntu)
< Date: Tue, 22 Mar 2016 13:35:10 GMT
< Content-Length: 0
< Connection: keep-alive
< 
* Connection #0 to host 172.20.8.34 left intact
{code}
The request returns 202 Accepted, but it does nothing.
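
For anyone who wants to reproduce the request without curl, here is a minimal Python sketch of the same KILL call. It only mirrors the JSON body and the POST shown in the transcript above; the helper names `build_kill_call` and `post_call` are hypothetical, and the endpoint URL, framework ID and task ID have to be supplied by the caller. A 202 status only says the request was accepted for processing, not that the task was actually killed.

```python
import json
import urllib.request


def build_kill_call(framework_id, task_id):
    """Build the JSON body of a scheduler KILL call.

    Hypothetical helper: it reproduces the structure of /tmp/mesos
    from the transcript and nothing more.
    """
    return {
        "framework_id": {"value": framework_id},
        "type": "KILL",
        "kill": {"task_id": {"value": task_id}},
    }


def post_call(url, call, timeout=10.0):
    """POST the call as JSON and return the HTTP status code.

    Hypothetical helper. urlopen raises urllib.error.HTTPError for
    4xx/5xx answers, so a returned value is a success status such
    as 202. That status does not confirm the task was killed.
    """
    body = json.dumps(call).encode("utf-8")
    request = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

Calling `post_call(url, build_kill_call(framework_id, task_id))` with the values from the transcript should send the same request as the curl command.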

> Mesos (or Marathon) lost tasks
> ------------------------------
>
>                 Key: MESOS-4999
>                 URL: https://issues.apache.org/jira/browse/MESOS-4999
>             Project: Mesos
>          Issue Type: Bug
>    Affects Versions: 0.27.2
>         Environment: mesos - 0.27.0
> marathon - 0.15.2
> 189 mesos slaves with Ubuntu 14.04.2 on HP ProLiant DL380 Gen9,
> CPU - 2 x Intel(R) Xeon(R) CPU E5-2680 v3 @2.50GHz (48 cores (with hyperthreading))
> RAM - 264G,
> Storage - 3.0T on RAID on HP Smart Array P840 Controller,
> HDD - 12 x HP EH0600JDYTL
> Network - 2 x Intel Corporation Ethernet 10G 2P X710,
>            Reporter: Sergey Galkin
>         Attachments: mesos-nodes.png
>
>
> After many create/delete cycles of applications with docker instances through the Marathon API, I have a lot of lost tasks after the final *deletion of all applications in Marathon*.
> They fall into three types:
> 1. Tasks hang in STAGED status. I don't see these tasks in 'docker ps' on the slave, and _service docker restart_ on the mesos slave did not fix them.
> 2. Tasks stay RUNNING because docker hangs and can't delete these instances. There are a lot of
> {code}
> Killing docker task
> Shutting down
> Killing docker task
> Shutting down
> {code}
> messages in stdout. _docker stop ID_ hangs, and these tasks can be fixed by _service docker restart_ on the mesos slave.
> 3. Tasks remain RUNNING after _service docker restart_ on the mesos slave.
> Screenshot attached.


