Posted to yarn-issues@hadoop.apache.org by "Chandni Singh (JIRA)" <ji...@apache.org> on 2018/08/01 22:33:01 UTC

[jira] [Updated] (YARN-8611) With restart policy set to ON_FAILURE, the service state sometimes doesn't reach STABLE state

     [ https://issues.apache.org/jira/browse/YARN-8611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chandni Singh updated YARN-8611:
--------------------------------
    Summary: With restart policy set to ON_FAILURE, the service state sometimes doesn't reach STABLE state  (was: With restart policy set to ON_FAILURE, the service state doesn't reach STABLE state)

> With restart policy set to ON_FAILURE, the service state sometimes doesn't reach STABLE state
> ---------------------------------------------------------------------------------------------
>
>                 Key: YARN-8611
>                 URL: https://issues.apache.org/jira/browse/YARN-8611
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Chandni Singh
>            Priority: Major
>
> - Launched a Docker-based sleeper service with {{restart_policy = ON_FAILURE}}.
> - Some containers fail, but eventually both component instances reach the {{READY}} state.
> - However, the service state remains {{STARTED}}.
> Below is the service status JSON:
> {code:java}
> {
>     "components": [
>         {
>             "artifact": {
>                 "id": "hadoop/centos:6",
>                 "type": "DOCKER"
>             },
>             "configuration": {
>                 "env": {
>                     "YARN_CONTAINER_RUNTIME_DOCKER_DELAYED_REMOVAL": "true",
>                     "YARN_CONTAINER_RUNTIME_DOCKER_RUN_OVERRIDE_DISABLE": "true"
>                 },
>                 "files": [],
>                 "properties": {
>                     "docker.network": "host"
>                 }
>             },
>             "containers": [
>                 {
>                     "bare_host": "{host1}",
>                     "component_instance_name": "ping-1",
>                     "hostname": "ping-1.s.hbase.ycluster",
>                     "id": "container_e02_1533070786532_0005_01_000003",
>                     "ip": "172.26.111.21",
>                     "launch_time": 1533159861113,
>                     "state": "READY"
>                 },
>                 {
>                     "bare_host": "{host2}",
>                     "component_instance_name": "ping-0",
>                     "hostname": "ping-0.s.hbase.ycluster",
>                     "id": "container_e02_1533070786532_0005_01_000007",
>                     "ip": "172.26.111.21",
>                     "launch_time": 1533160113627,
>                     "state": "READY"
>                 }
>             ],
>             "dependencies": [],
>             "launch_command": "sleep 90000",
>             "name": "ping",
>             "number_of_containers": 2,
>             "quicklinks": [],
>             "resource": {
>                 "additional": {},
>                 "cpus": 1,
>                 "memory": "256"
>             },
>             "restart_policy": "ON_FAILURE",
>             "run_privileged_container": false,
>             "state": "STABLE"
>         }
>     ],
>     "configuration": {
>         "env": {},
>         "files": [],
>         "properties": {}
>     },
>     "id": "application_1533070786532_0005",
>     "kerberos_principal": {
>         "keytab": "...",
>         "principal_name": "..."
>     },
>     "lifetime": -1,
>     "name": "s",
>     "quicklinks": {},
>     "state": "STARTED",
>     "version": "1"
> }{code}
> The service state should become {{STABLE}} once all the component instances are {{READY}}.
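
The inconsistency described above can be detected mechanically from a status document. The following is a minimal sketch, not YARN's own state-transition logic: the helper name `is_stuck` is hypothetical, the sample is a trimmed copy of the status JSON from the report, and the rule it checks (all containers {{READY}}, all components {{STABLE}}, container count equal to {{number_of_containers}}, yet the service is not {{STABLE}}) is an assumption drawn from the expectation stated in this issue.

```python
import json

# Trimmed copy of the status from the report; only the fields the check reads.
SAMPLE = json.loads("""
{
  "name": "s",
  "state": "STARTED",
  "components": [
    {
      "name": "ping",
      "number_of_containers": 2,
      "state": "STABLE",
      "containers": [
        {"component_instance_name": "ping-1", "state": "READY"},
        {"component_instance_name": "ping-0", "state": "READY"}
      ]
    }
  ]
}
""")


def is_stuck(status: dict) -> bool:
    """Hypothetical helper: True when the components look healthy
    but the service-level state has not reached STABLE."""
    components = status.get("components", [])
    if not components:
        return False  # nothing to judge the service state against

    for comp in components:
        containers = comp.get("containers", [])
        # Every requested instance must be present ...
        if len(containers) != comp.get("number_of_containers"):
            return False
        # ... every instance must be READY ...
        if any(c.get("state") != "READY" for c in containers):
            return False
        # ... and the component itself must report STABLE.
        if comp.get("state") != "STABLE":
            return False

    # All components are healthy, so a non-STABLE service is the mismatch.
    return status.get("state") != "STABLE"


print(is_stuck(SAMPLE))  # True for the status shown in this report
```

Run against the full status output of the service, a `True` result would indicate the state described in this issue; a service whose top-level state is {{STABLE}} yields `False`.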


