Posted to yarn-issues@hadoop.apache.org by "Eric Yang (JIRA)" <ji...@apache.org> on 2018/08/01 22:33:01 UTC

[jira] [Commented] (YARN-8160) Yarn Service Upgrade: Support upgrade of service that use docker containers

    [ https://issues.apache.org/jira/browse/YARN-8160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16566098#comment-16566098 ] 

Eric Yang commented on YARN-8160:
---------------------------------

The current per-instance upgrade command is almost working, but I hit some bugs while testing the API.  First, I launched a service that looks like this:

{code}
{
  "name": "sleeper-service",
  "kerberos_principal" : {
    "principal_name" : "hbase/_HOST@EXAMPLE.COM",
    "keytab" : "file:///etc/security/keytabs/hbase.service.keytab"
  },
  "version": "1",
  "components" :
  [
    {
      "name": "ping",
      "number_of_containers": 2,
      "artifact": {
        "id": "hadoop/centos:6",
        "type": "DOCKER"
      },
      "launch_command": "sleep,9000",
      "resource": {
        "cpus": 1,
        "memory": "256"
      },
      "configuration": {
        "env": {
          "YARN_CONTAINER_RUNTIME_DOCKER_DELAYED_REMOVAL":"true",
          "YARN_CONTAINER_RUNTIME_DOCKER_RUN_OVERRIDE_DISABLE":"true"
        },
        "properties": {
          "docker.network": "host"
        }
      }
    }
  ]
}
{code}

After the application is launched, the yarnfile is updated with a new docker image version, and the launch command is changed from sleep,90000 to sleep,90:

{code}
{
  "name": "sleeper-service",
  "kerberos_principal" : {
    "principal_name" : "hbase/_HOST@EXAMPLE.COM",
    "keytab" : "file:///etc/security/keytabs/hbase.service.keytab"
  },
  "version": "2",
  "components" :
  [
    {
      "name": "ping",
      "number_of_containers": 2,
      "artifact": {
        "id": "hadoop/centos:latest",
        "type": "DOCKER"
      },
      "launch_command": "sleep,90",
      "resource": {
        "cpus": 1,
        "memory": "256"
      },
      "configuration": {
        "env": {
          "YARN_CONTAINER_RUNTIME_DOCKER_DELAYED_REMOVAL":"true",
          "YARN_CONTAINER_RUNTIME_DOCKER_RUN_OVERRIDE_DISABLE":"true"
        },
        "properties": {
          "docker.network": "host"
        }
      }
    }
  ]
}
{code}

Then I proceeded with {{yarn app -upgrade sleeper -initiate yarnfile.v2}}, followed by {{yarn app -upgrade sleeper -instances ping-0,ping-1}}.
The container log shows:

{code}
Docker run command: /usr/bin/docker run --name=container_e02_1533070786532_0006_01_000002 --user=1013:1001 --security-opt=no-new-privileges --net=host -v /usr/local/hadoop-3.2.0-SNAPSHOT/logs/userlogs/application_1533070786532_0006/container_e02_1533070786532_0006_01_000002:/usr/local/hadoop-3.2.0-SNAPSHOT/logs/userlogs/application_1533070786532_0006/container_e02_1533070786532_0006_01_000002:rw -v /tmp/hadoop-yarn/nm-local-dir/usercache/hbase/appcache/application_1533070786532_0006:/tmp/hadoop-yarn/nm-local-dir/usercache/hbase/appcache/application_1533070786532_0006:rw -v /tmp/hadoop-yarn/nm-local-dir/filecache:/tmp/hadoop-yarn/nm-local-dir/filecache:ro -v /tmp/hadoop-yarn/nm-local-dir/usercache/hbase/filecache:/tmp/hadoop-yarn/nm-local-dir/usercache/hbase/filecache:ro --cap-drop=ALL --cap-add=SYS_CHROOT --cap-add=MKNOD --cap-add=SETFCAP --cap-add=SETPCAP --cap-add=FSETID --cap-add=CHOWN --cap-add=AUDIT_WRITE --cap-add=SETGID --cap-add=NET_RAW --cap-add=FOWNER --cap-add=SETUID --cap-add=DAC_OVERRIDE --cap-add=KILL --cap-add=NET_BIND_SERVICE --hostname=ping-0.s1.hbase.ycluster --group-add 1001 --group-add 982 --env-file /tmp/hadoop-yarn/nm-local-dir/nmPrivate/application_1533070786532_0006/container_e02_1533070786532_0006_01_000002/docker.container_e02_1533070786532_0006_01_0000026435836068142984694.env hadoop/centos:6 sleep 90000 
Launching docker container...
Docker run command: /usr/bin/docker run --name=container_e02_1533070786532_0006_01_000002 --user=1013:1001 --security-opt=no-new-privileges --net=host -v /usr/local/hadoop-3.2.0-SNAPSHOT/logs/userlogs/application_1533070786532_0006/container_e02_1533070786532_0006_01_000002:/usr/local/hadoop-3.2.0-SNAPSHOT/logs/userlogs/application_1533070786532_0006/container_e02_1533070786532_0006_01_000002:rw -v /tmp/hadoop-yarn/nm-local-dir/usercache/hbase/appcache/application_1533070786532_0006:/tmp/hadoop-yarn/nm-local-dir/usercache/hbase/appcache/application_1533070786532_0006:rw -v /tmp/hadoop-yarn/nm-local-dir/filecache:/tmp/hadoop-yarn/nm-local-dir/filecache:ro -v /tmp/hadoop-yarn/nm-local-dir/usercache/hbase/filecache:/tmp/hadoop-yarn/nm-local-dir/usercache/hbase/filecache:ro --cap-drop=ALL --cap-add=SYS_CHROOT --cap-add=MKNOD --cap-add=SETFCAP --cap-add=SETPCAP --cap-add=FSETID --cap-add=CHOWN --cap-add=AUDIT_WRITE --cap-add=SETGID --cap-add=NET_RAW --cap-add=FOWNER --cap-add=SETUID --cap-add=DAC_OVERRIDE --cap-add=KILL --cap-add=NET_BIND_SERVICE --hostname=ping-0.s1.hbase.ycluster --group-add 1001 --group-add 982 --env-file /tmp/hadoop-yarn/nm-local-dir/nmPrivate/application_1533070786532_0006/container_e02_1533070786532_0006_01_000002/docker.container_e02_1533070786532_0006_01_000002254751351532328192.env hadoop/centos:latest sleep 90000 
{code}

The container is relaunched using the centos:latest image instead of centos:6.  This is verified with docker inspect, and with docker exec to confirm that the container image has changed.  However, the launch command did not pick up the change: both runs still execute sleep 90000 instead of sleep 90.
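A quick way to pin down what actually changed across the reinit is to diff the two logged docker run commands. Below is a minimal sketch of that comparison; the log lines are abbreviated here, and the parsing assumes all options use the --opt=value form (the real command also has separate-argument options like "--group-add 1001", which this naive split would misread):

{code}
import shlex

# Abbreviated versions of the two "docker run" lines from the NM log.
old_cmd = ("/usr/bin/docker run --name=container_e02_1533070786532_0006_01_000002 "
           "--net=host hadoop/centos:6 sleep 90000")
new_cmd = ("/usr/bin/docker run --name=container_e02_1533070786532_0006_01_000002 "
           "--net=host hadoop/centos:latest sleep 90000")

def image_and_command(docker_run):
    tokens = shlex.split(docker_run)
    # Drop "/usr/bin/docker run"; the remaining positional tokens are
    # the image followed by the in-container command.
    positional = [t for t in tokens[2:] if not t.startswith("-")]
    return positional[0], " ".join(positional[1:])

old_image, old_launch = image_and_command(old_cmd)
new_image, new_launch = image_and_command(new_cmd)

# The image was swapped (centos:6 -> centos:latest), but the launch
# command stayed "sleep 90000" instead of becoming "sleep 90".
{code}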

> Yarn Service Upgrade: Support upgrade of service that use docker containers 
> ----------------------------------------------------------------------------
>
>                 Key: YARN-8160
>                 URL: https://issues.apache.org/jira/browse/YARN-8160
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Chandni Singh
>            Assignee: Chandni Singh
>            Priority: Major
>              Labels: Docker
>
> Ability to upgrade dockerized YARN native services.
> Ref: YARN-5637
> *Background*
> Container upgrade is supported by the NM via the {{reInitializeContainer}} API. {{reInitializeContainer}} does *NOT* change the ContainerId of the upgraded container.
> The NM performs the following steps during {{reInitializeContainer}}:
> - kills the existing process
> - cleans up the container
> - launches another container with the new {{ContainerLaunchContext}}
> NOTE: {{ContainerLaunchContext}} holds all the information needed to upgrade the container.
> With {{reInitializeContainer}}, the following does *NOT* change:
> - the container ID. The NM does not create it; it is provided to the NM, and the RM does not create another container allocation here.
> - {{localizedResources}}. These stay the same if the upgrade does *NOT* require additional resources, IIUC.
>  
> The following changes with {{reInitializeContainer}}:
> - the working directory of the upgraded container. It is *NOT* a relaunch.
> *Changes required in the case of docker container*
> - {{reInitializeContainer}} does not appear to work with Docker containers. Investigate and fix this.
> - [Future change] Add an additional API to the NM to pull images, and modify {{reInitializeContainer}} to trigger the Docker container launch without pulling the image first, based on a flag.
>     -- When the service upgrade is initiated, we can give the user an option to just pull the images on the NMs.
>     -- When a component instance is upgraded, it calls {{reInitializeContainer}} with the pull-image flag set to false, since the NM will have already pulled the images.
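The proposed two-phase flow in the description above can be sketched as a toy model. This is hypothetical illustration only, with made-up names, NOT actual Hadoop/NM code: pre-pull images when the service upgrade is initiated, then reinitialize each instance with the pull disabled:

{code}
# Toy model of the proposed flow: phase 1 pre-pulls the new image on
# each NM, phase 2 reinitializes the container without pulling again.
class FakeNodeManager:
    def __init__(self):
        self.pulled_images = set()
        self.events = []

    def pull_image(self, image):
        # Phase 1: pull the new image ahead of time on this NM.
        self.pulled_images.add(image)
        self.events.append(("pull", image))

    def re_initialize_container(self, container_id, launch_context, pull_image=True):
        # Steps per the description: kill, clean up, relaunch with the
        # new ContainerLaunchContext; the container ID stays the same.
        image = launch_context["image"]
        if pull_image and image not in self.pulled_images:
            self.pull_image(image)
        self.events += [("kill", container_id),
                        ("cleanup", container_id),
                        ("launch", container_id, image, launch_context["command"])]
        return container_id  # unchanged: the RM makes no new allocation

nm = FakeNodeManager()
nm.pull_image("hadoop/centos:latest")                      # phase 1
cid = nm.re_initialize_container(                          # phase 2
    "c1", {"image": "hadoop/centos:latest", "command": "sleep 90"},
    pull_image=False)
{code}

In this model the image is pulled exactly once, during initiation, and the per-instance reinit only kills, cleans up, and relaunches.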



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
