Posted to yarn-issues@hadoop.apache.org by "Vinod Kumar Vavilapalli (JIRA)" <ji...@apache.org> on 2019/05/14 04:05:00 UTC
[jira] [Commented] (YARN-9549) Not able to run pyspark in docker driver container on Yarn3
[ https://issues.apache.org/jira/browse/YARN-9549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16839061#comment-16839061 ]
Vinod Kumar Vavilapalli commented on YARN-9549:
-----------------------------------------------
This is certainly a configuration issue. Please post your yarn-site.xml.
You are better off hitting the user lists first for issues like these.
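
For reference, enabling the Docker runtime is a NodeManager-side setting in yarn-site.xml. A minimal sketch, using property names from the Hadoop 3 DockerContainers guide (the values shown are illustrative assumptions for this cluster, not the reporter's actual configuration):

```xml
<!-- Sketch of the NodeManager settings the DockerContainers guide describes;
     values are illustrative assumptions, not the reporter's configuration. -->
<property>
  <name>yarn.nodemanager.container-executor.class</name>
  <value>org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor</value>
</property>
<property>
  <name>yarn.nodemanager.runtime.linux.allowed-runtimes</name>
  <value>default,docker</value>
</property>
<property>
  <name>yarn.nodemanager.runtime.linux.docker.allowed-container-networks</name>
  <value>host,bridge</value>
</property>
```

Note also that the Spark documentation spells the driver-side properties spark.yarn.appMasterEnv.* with a lower-case "a"; Spark configuration keys are case-sensitive, so the "spark.yarn.AppMasterEnv" spelling in the command below could be silently ignored, which would be consistent with the driver launching on the host while the executors run in Docker.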
> Not able to run pyspark in docker driver container on Yarn3
> -----------------------------------------------------------
>
> Key: YARN-9549
> URL: https://issues.apache.org/jira/browse/YARN-9549
> Project: Hadoop YARN
> Issue Type: Bug
> Components: yarn
> Affects Versions: 3.1.2
> Environment: Hadoop 3.1.1.3.1.0.0-78
> spark version 2.3.2.3.1.0.0-78
> Using Scala version 2.11.8, Java HotSpot(TM) 64-Bit Server VM, 1.8.0_211
> Server: Docker Engine - Community Version: 18.09.6
> Reporter: Jack Zhu
> Priority: Critical
> Attachments: Dockerfile, test.py
>
>
> I followed [https://hadoop.apache.org/docs/r3.2.0/hadoop-yarn/hadoop-yarn-site/DockerContainers.html] to build a Spark Docker image for running pyspark. There isn't a good document describing how to spark-submit a pyspark job to a Hadoop 3 cluster, so I used the command below to launch my simple Python job:
> PYSPARK_DRIVER_PYTHON=/usr/local/bin/python3.5 spark-submit \
>   --master yarn --deploy-mode cluster \
>   --num-executors 3 --executor-memory 1g \
>   --conf spark.executorEnv.YARN_CONTAINER_RUNTIME_TYPE=docker \
>   --conf spark.executorEnv.YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=local/spark:v1.0.8 \
>   --conf spark.yarn.AppMasterEnv.YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=local/spark:v1.0.8 \
>   --conf spark.yarn.AppMasterEnv.YARN_CONTAINER_RUNTIME_TYPE=docker \
>   ./test.py
>
> test.py simply collects the hostname from each executor and checks whether the Python job is running inside a container.
> I found that the driver always runs directly on the host, never in the container. As a result we have to keep the Python version in the Docker image consistent with the one on the NodeManager host, which defeats the purpose of using Docker to package all the dependencies.
>
> The spark job can be run successfully, below is the std output:
> Log Type: stdout
> Log Upload Time: Tue May 14 02:07:06 +0000 2019
> Log Length: 141
> host.test.com
> False ============>going to print all the container names. [True, True, True, True, True, True, True, True, True]
> please see the attached Dockerfile and test.py
>
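The container check described above could be sketched roughly as follows (the attached test.py is not shown here, so this is a plausible reconstruction; the /.dockerenv marker and the PID-1 cgroup check are common Docker conventions, not a YARN or Spark API):

```python
import os
import socket


def is_in_container() -> bool:
    """Heuristic check for running inside a Docker container.

    Looks for the /.dockerenv marker file and for 'docker' in the
    cgroup of PID 1; both are common conventions, not guarantees.
    """
    if os.path.exists("/.dockerenv"):
        return True
    try:
        with open("/proc/1/cgroup") as f:
            return "docker" in f.read()
    except OSError:
        # e.g. a non-Linux host without /proc
        return False


# Print the hostname and container flag, as the report's stdout shows.
print(socket.gethostname(), is_in_container())
```

In the log output above this check prints False for the driver (running directly on host.test.com) while all nine executor-side checks report True, matching the reporter's description.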
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: yarn-issues-help@hadoop.apache.org