Posted to issues@spark.apache.org by "Eric Yang (JIRA)" <ji...@apache.org> on 2018/09/28 16:33:01 UTC

[jira] [Commented] (SPARK-23717) Leverage docker support in Hadoop 3

    [ https://issues.apache.org/jira/browse/SPARK-23717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16632100#comment-16632100 ] 

Eric Yang commented on SPARK-23717:
-----------------------------------

It is possible to run standalone Spark on YARN without any code modification to Spark. Here is an example yarnfile that I used to run the Mesosphere-generated Docker image, and it ran fine:

{code}
{
  "name": "spark",
  "kerberos_principal" : {
    "principal_name" : "spark/_HOST@EXAMPLE.COM",
    "keytab" : "file:///etc/security/keytabs/spark.service.keytab"
  },
  "version": "0.1",
  "components" :
  [
    {
      "name": "driver",
      "number_of_containers": 1,
      "artifact": {
        "id": "mesosphere/spark:latest",
        "type": "DOCKER"
      },
      "launch_command": "bash,-c,sleep 30 && ./sbin/start-master.sh",
      "resource": {
        "cpus": 1,
        "memory": "256"
      },
      "run_privileged_container": true,
      "configuration": {
        "env": {
          "YARN_CONTAINER_RUNTIME_DOCKER_RUN_OVERRIDE_DISABLE":"true",
          "SPARK_NO_DAEMONIZE":"true",
          "JAVA_HOME":"/usr/lib/jvm/jre1.8.0_131"
        },
        "properties": {
          "docker.network": "host"
        }
      }
    },
    {
      "name": "executor",
      "number_of_containers": 2,
      "artifact": {
        "id": "mesosphere/spark:latest",
        "type": "DOCKER"
      },
      "launch_command": "bash,-c,sleep 30 && ./sbin/start-slave.sh spark://driver-0.spark.spark.ycluster:7077",
      "resource": {
        "cpus": 1,
        "memory": "256"
      },
      "run_privileged_container": true,
      "dependencies": [ "driver" ],
      "configuration": {
        "env": {
          "YARN_CONTAINER_RUNTIME_DOCKER_RUN_OVERRIDE_DISABLE":"true",
          "SPARK_NO_DAEMONIZE":"true",
          "JAVA_HOME":"/usr/lib/jvm/jre1.8.0_131"
        },
        "properties": {
          "docker.network": "host"
        }
      }
    }
  ]
}
{code}
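
For context, a service spec like the one above is submitted with the YARN services CLI in Hadoop 3.1+. A minimal sketch, assuming the YARN service framework and RegistryDNS are enabled on the cluster (the file name spark.json is just a placeholder):

{code}
# Save the yarnfile above as spark.json, then launch the service
yarn app -launch spark spark.json

# Check the service status and component instances
yarn app -status spark

# Tear the service down when finished
yarn app -destroy spark
{code}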

The reason for the 30-second sleep is to ensure that RegistryDNS has refreshed and can answer DNS queries for the driver hostname. The sleep could probably be much shorter, e.g. 3 seconds; I did not spend much time fine-tuning the DNS wait time. A further enhancement to pass in a keytab and krb5.conf would enable access to secure HDFS; that is left as an exercise for the readers of this JIRA.
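
Instead of a fixed sleep, the executor launch_command could poll RegistryDNS until the driver name resolves. A minimal sketch, reusing the hostname from the spec above and assuming getent is available in the image (not something I tested):

{code}
"launch_command": "bash,-c,until getent hosts driver-0.spark.spark.ycluster; do sleep 1; done && ./sbin/start-slave.sh spark://driver-0.spark.spark.ycluster:7077",
{code}

For the secure HDFS case, one possible direction (an assumption on my part, not verified here) is to bind-mount krb5.conf and the keytab into the containers through the Docker runtime's mount environment variable, with the source paths whitelisted in container-executor.cfg (docker.allowed.ro-mounts). For example, in each component's env block:

{code}
"YARN_CONTAINER_RUNTIME_DOCKER_MOUNTS": "/etc/krb5.conf:/etc/krb5.conf:ro,/etc/security/keytabs/spark.service.keytab:/etc/security/keytabs/spark.service.keytab:ro"
{code}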

> Leverage docker support in Hadoop 3
> -----------------------------------
>
>                 Key: SPARK-23717
>                 URL: https://issues.apache.org/jira/browse/SPARK-23717
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core, YARN
>    Affects Versions: 2.4.0
>            Reporter: Mridul Muralidharan
>            Priority: Major
>
> The introduction of Docker support in Apache Hadoop 3 can be leveraged by Apache Spark to resolve multiple long-standing shortcomings, particularly around package isolation.
> It also allows for network isolation where applicable, enabling more sophisticated cluster configuration/customization.
> This JIRA will track the various tasks for enhancing Spark to leverage container support.


