Posted to dev@toree.apache.org by "heyang wang (JIRA)" <ji...@apache.org> on 2018/06/14 12:54:00 UTC

[jira] [Updated] (TOREE-476) PySpark Magic failed on yarn cluster mode

     [ https://issues.apache.org/jira/browse/TOREE-476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

heyang wang updated TOREE-476:
------------------------------
    Description: 
I am trying to use PySpark through the %%PySpark magic in a Scala notebook, following [https://github.com/apache/incubator-toree/blob/master/etc/examples/notebooks/magic-tutorial.ipynb]. In Spark local mode the example works fine. However, when run in YARN mode, the Spark executors complain that they cannot find pyspark.

 

After reading the source code, I understand that the %%PySpark magic uses the same Spark context created by the spark-submit command that Toree runs. By default, a Spark context created this way contains no settings related to Python or PySpark, which causes the Spark executors to complain when running in YARN mode. I had to add _--conf 'spark.executorEnv.PYTHONPATH'='/usr/local/spark-2.3.0-bin-hadoop2.7/python:/usr/local/spark-2.3.0-bin-hadoop2.7/python/lib/py4j-0.10.6-src.zip'_ manually to __TOREE_SPARK_OPTS__ in kernel.json to make the magic work in YARN mode.
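
For illustration, the workaround can be sketched in Python as below. This is only a sketch under assumptions: the helper names (pyspark_executor_conf, add_spark_opt), the sample kernel spec, and the "--master yarn" placeholder are hypothetical and not part of Toree; only the Spark install path, the py4j zip name, and the __TOREE_SPARK_OPTS__ key come from the setup described above. The quoting around the key and value is dropped here because the paths contain no spaces.

```python
import copy
import json


def pyspark_executor_conf(spark_home, py4j_zip):
    """Build the --conf flag that puts PySpark and py4j on the executors' PYTHONPATH."""
    python_dir = f"{spark_home}/python"
    pythonpath = f"{python_dir}:{python_dir}/lib/{py4j_zip}"
    return f"--conf spark.executorEnv.PYTHONPATH={pythonpath}"


def add_spark_opt(kernel_spec, opt):
    """Return a copy of a kernel.json dict with `opt` appended to __TOREE_SPARK_OPTS__.

    The option is appended only if it is not already present, so the
    function can be applied repeatedly without duplicating the flag.
    """
    updated = copy.deepcopy(kernel_spec)
    env = updated.setdefault("env", {})
    existing = env.get("__TOREE_SPARK_OPTS__", "").strip()
    if opt not in existing:
        env["__TOREE_SPARK_OPTS__"] = f"{existing} {opt}".strip()
    return updated


# Hypothetical stand-in for the contents of an installed Toree kernel.json.
kernel_spec = {"env": {"__TOREE_SPARK_OPTS__": "--master yarn"}}

conf = pyspark_executor_conf(
    "/usr/local/spark-2.3.0-bin-hadoop2.7",  # Spark install path from this report
    "py4j-0.10.6-src.zip",                   # py4j archive shipped with that Spark build
)
patched = add_spark_opt(kernel_spec, conf)
print(json.dumps(patched, indent=2))
```

The Spark path and the py4j version would need to match whatever is actually installed on the YARN nodes; the values above are the ones from my environment.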

 

I think it would be good to add this setting by default, or to document it somewhere, since running the %%PySpark magic in YARN mode is far more powerful than in local mode.

  was:
I am trying to use PySpark through the %%PySpark magic in a Scala notebook, following [this example|https://github.com/apache/incubator-toree/blob/master/etc/examples/notebooks/magic-tutorial.ipynb]. In Spark local mode the example works fine. However, when run in YARN mode, the Spark executors complain that they cannot find pyspark.

 

After reading the source code, I understand that the %%PySpark magic uses the same Spark context created by the spark-submit command that Toree runs. By default, a Spark context created this way contains no settings related to Python or PySpark, which causes the Spark executors to complain when running in YARN mode. I had to add _--conf 'spark.executorEnv.PYTHONPATH'='/usr/local/spark-2.3.0-bin-hadoop2.7/python:/usr/local/spark-2.3.0-bin-hadoop2.7/python/lib/py4j-0.10.6-src.zip'_ manually to __TOREE_SPARK_OPTS__ in kernel.json to make the magic work in YARN mode.

 

I think it would be good to add this setting by default, or to document it somewhere, since running the %%PySpark magic in YARN mode is far more powerful than in local mode.


> PySpark Magic failed on yarn cluster mode
> -----------------------------------------
>
>                 Key: TOREE-476
>                 URL: https://issues.apache.org/jira/browse/TOREE-476
>             Project: TOREE
>          Issue Type: Bug
>    Affects Versions: 0.2.0
>            Reporter: heyang wang
>            Priority: Major
>
> I am trying to use PySpark through the %%PySpark magic in a Scala notebook, following [https://github.com/apache/incubator-toree/blob/master/etc/examples/notebooks/magic-tutorial.ipynb]. In Spark local mode the example works fine. However, when run in YARN mode, the Spark executors complain that they cannot find pyspark.
>  
> After reading the source code, I understand that the %%PySpark magic uses the same Spark context created by the spark-submit command that Toree runs. By default, a Spark context created this way contains no settings related to Python or PySpark, which causes the Spark executors to complain when running in YARN mode. I had to add _--conf 'spark.executorEnv.PYTHONPATH'='/usr/local/spark-2.3.0-bin-hadoop2.7/python:/usr/local/spark-2.3.0-bin-hadoop2.7/python/lib/py4j-0.10.6-src.zip'_ manually to __TOREE_SPARK_OPTS__ in kernel.json to make the magic work in YARN mode.
>  
> I think it would be good to add this setting by default, or to document it somewhere, since running the %%PySpark magic in YARN mode is far more powerful than in local mode.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)