Posted to notifications@kyuubi.apache.org by GitBox <gi...@apache.org> on 2022/12/03 15:13:58 UTC

[GitHub] [incubator-kyuubi] turboFei opened a new issue, #3896: [Improvement] Support to execute python language with yarn cluster mode

turboFei opened a new issue, #3896:
URL: https://github.com/apache/incubator-kyuubi/issues/3896

   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
   
   
   ### Search before asking
   
   - [X] I have searched in the [issues](https://github.com/apache/incubator-kyuubi/issues?q=is%3Aissue) and found no similar issues.
   
   
   ### What would you like to be improved?
   
   Support executing Python code when the Spark engine runs in YARN cluster mode.
   
   
   <img width="528" alt="image" src="https://user-images.githubusercontent.com/6757692/205447872-692e8b24-3db6-470e-bdb0-7414f08fd7ea.png">
   
   ### How should we improve?
   
   It seems we need to upload the PySpark library when initializing the Spark engine.
   
   ### Are you willing to submit PR?
   
   - [ ] Yes. I can submit a PR independently to improve.
   - [ ] Yes. I would be willing to submit a PR with guidance from the Kyuubi community to improve.
   - [ ] No. I cannot submit a PR at this time.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@kyuubi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@kyuubi.apache.org
For additional commands, e-mail: notifications-help@kyuubi.apache.org


[GitHub] [incubator-kyuubi] turboFei commented on issue #3896: [Improvement] Support to execute python language with yarn cluster mode

Posted by GitBox <gi...@apache.org>.
turboFei commented on issue #3896:
URL: https://github.com/apache/incubator-kyuubi/issues/3896#issuecomment-1336753299

   > whether this addArchive can be used for uploading Python interpreter and dependencies
   
   sure




[GitHub] [incubator-kyuubi] cfmcgrady commented on issue #3896: [Improvement] Support to execute python language with yarn cluster mode

Posted by GitBox <gi...@apache.org>.
cfmcgrady commented on issue #3896:
URL: https://github.com/apache/incubator-kyuubi/issues/3896#issuecomment-1336388212

   cc @bowenliang123 




[GitHub] [incubator-kyuubi] turboFei commented on issue #3896: [Improvement] Support to execute python language with yarn cluster mode

Posted by GitBox <gi...@apache.org>.
turboFei commented on issue #3896:
URL: https://github.com/apache/incubator-kyuubi/issues/3896#issuecomment-1336430083

   Temporary workaround:
   ```
   --hivevar  spark.yarn.dist.archives=viewfs://apollo-rno/apps/b_stf/spark-3.1.1.0.12.0-bin-ebay.zip#spark_home \
   --hivevar spark.yarn.appMasterEnv.SPARK_HOME=spark_home
   ```
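
A minimal sketch of how the two settings above relate: the name after `#` in the archive URI is the directory the archive is unpacked into, and the same name is then passed as `SPARK_HOME` to the application master. The helper name `workaround_hivevars` and the example path are hypothetical and not part of Kyuubi.

```python
from typing import List


def workaround_hivevars(archive_uri: str, alias: str = "spark_home") -> List[str]:
    """Build the two --hivevar arguments of the workaround (hypothetical helper)."""
    if "#" in archive_uri:
        # The alias is appended here, so the caller must not supply one.
        raise ValueError("pass the archive URI without a '#alias' fragment")
    return [
        "--hivevar", f"spark.yarn.dist.archives={archive_uri}#{alias}",
        # SPARK_HOME must match the alias used for the unpacked archive.
        "--hivevar", f"spark.yarn.appMasterEnv.SPARK_HOME={alias}",
    ]


if __name__ == "__main__":
    # Placeholder path; substitute the location of your own Spark archive.
    print(" ".join(workaround_hivevars("hdfs:/path/to/spark.zip")))
```

Deriving both values from one alias avoids the easy mistake of renaming the unpacked directory in one setting but not the other.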




[GitHub] [incubator-kyuubi] turboFei commented on issue #3896: [Improvement] Support to execute python language with yarn cluster mode

Posted by GitBox <gi...@apache.org>.
turboFei commented on issue #3896:
URL: https://github.com/apache/incubator-kyuubi/issues/3896#issuecomment-1336432384

   Yes, it is a static conf.




[GitHub] [incubator-kyuubi] bowenliang123 commented on issue #3896: [Improvement] Support to execute python language with yarn cluster mode

Posted by GitBox <gi...@apache.org>.
bowenliang123 commented on issue #3896:
URL: https://github.com/apache/incubator-kyuubi/issues/3896#issuecomment-1336440082

   > We can add a parameter for the Spark Python language mode that specifies the Spark archive, so that the archive is added for Python mode only when needed.
   > 
   > If SPARK_HOME is not defined, invoke SparkContext::addArchive to add the archive file and use it as SPARK_HOME.
   > 
   > ```
   >   @Experimental
   >   def addArchive(path: String): Unit = {
   >     addFile(path, false, false, isArchive = true)
   >   }
   > ```
   
   Sounds good.
   But PySpark also requires a Python environment on the cluster nodes. Can this `addArchive` also be used to upload the Python interpreter and its dependencies?




[GitHub] [incubator-kyuubi] turboFei closed issue #3896: [Improvement] Support to execute python language with yarn cluster mode

Posted by GitBox <gi...@apache.org>.
turboFei closed issue #3896: [Improvement] Support to execute python language with yarn cluster mode
URL: https://github.com/apache/incubator-kyuubi/issues/3896




[GitHub] [incubator-kyuubi] bowenliang123 commented on issue #3896: [Improvement] Support to execute python language with yarn cluster mode

Posted by GitBox <gi...@apache.org>.
bowenliang123 commented on issue #3896:
URL: https://github.com/apache/incubator-kyuubi/issues/3896#issuecomment-1336421144

   This is a known gap left over from when the initial PySpark support was introduced.
   The challenge is preparing the PySpark environment for the Python session worker, which requires one of the following:
   1. make cluster mode upload the pyspark module or the python folder
   2. or install pyspark into Python on every node in cluster mode
   3. correctly identify SPARK_HOME with the pyspark module and tell the engine to use it
   
   A possible temporary workaround, in my mind, is to fall back to the pyspark installed on the cluster nodes and defer the error message:
   1. allow an empty $SPARK_HOME env before executing `execute_python.py`
   2. rely on the Python env on the cluster nodes for pyspark support
   3. check `if "pyspark" not in sys.modules:` when the $SPARK_HOME env is empty, and throw the error message
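
The fallback steps above can be sketched as follows. This is an illustration under assumptions, not the code in `execute_python.py`: the function name `check_pyspark_available` and its return values are hypothetical, and it uses `importlib.util.find_spec` instead of the `sys.modules` test, since `sys.modules` only lists modules that have already been imported.

```python
import importlib.util
import os


def check_pyspark_available(env=None, find_spec=importlib.util.find_spec):
    """Report where PySpark is expected to come from, or raise if nowhere.

    Hypothetical helper: `env` defaults to os.environ, and `find_spec` is
    injectable so the lookup can be replaced in tests.
    """
    env = os.environ if env is None else env
    if env.get("SPARK_HOME"):
        # SPARK_HOME is set, so the bundled python folder can be used.
        return "SPARK_HOME"
    # Empty SPARK_HOME: fall back to a pyspark installed on this node.
    if find_spec("pyspark") is None:
        # Deferred error: raised only when neither source is available.
        raise RuntimeError(
            "SPARK_HOME is not set and the pyspark module is not installed "
            "in this node's Python environment"
        )
    return "installed"
```

Deferring the error this way lets sessions start on clusters where pyspark is preinstalled, while still giving a clear message on nodes where it is missing.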




[GitHub] [incubator-kyuubi] turboFei commented on issue #3896: [Improvement] Support to execute python language with yarn cluster mode

Posted by GitBox <gi...@apache.org>.
turboFei commented on issue #3896:
URL: https://github.com/apache/incubator-kyuubi/issues/3896#issuecomment-1336180310

   cc @cfmcgrady 




[GitHub] [incubator-kyuubi] turboFei commented on issue #3896: [Improvement] Support to execute python language with yarn cluster mode

Posted by GitBox <gi...@apache.org>.
turboFei commented on issue #3896:
URL: https://github.com/apache/incubator-kyuubi/issues/3896#issuecomment-1336436076

   We can add a parameter for the Spark Python language mode that specifies the Spark archive,
   so that the archive is added for Python mode only when needed.
   
   If SPARK_HOME is not defined, invoke SparkContext::addArchive to add the archive file and use it as SPARK_HOME.
   ```
     @Experimental
     def addArchive(path: String): Unit = {
       addFile(path, false, false, isArchive = true)
     }
   ```
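
A rough sketch of the proposed decision, written in Python for illustration. The function `resolve_spark_home`, its parameters, and the default alias are hypothetical; `add_archive` stands in for a bound `SparkContext.addArchive`, and treating the alias as a relative `SPARK_HOME` is an assumption carried over from the workaround in this thread.

```python
def resolve_spark_home(env, add_archive, archive_uri, alias="spark_home"):
    """Return the SPARK_HOME to use, shipping the archive only when needed.

    Hypothetical helper: `env` is a mapping of environment variables and
    `add_archive` is any callable that accepts a single path string.
    """
    existing = env.get("SPARK_HOME")
    if existing:
        # SPARK_HOME is already defined, so nothing has to be uploaded.
        return existing
    # The '#alias' suffix names the directory the archive is unpacked into.
    add_archive(f"{archive_uri}#{alias}")
    # Assumption: the unpacked directory is reachable under this relative name.
    return alias


if __name__ == "__main__":
    shipped = []
    home = resolve_spark_home({}, shipped.append, "hdfs:/path/to/spark.zip")
    print(home, shipped)
```

Passing the callable in, rather than a SparkContext, keeps the decision logic testable without a running cluster.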
   




[GitHub] [incubator-kyuubi] bowenliang123 commented on issue #3896: [Improvement] Support to execute python language with yarn cluster mode

Posted by GitBox <gi...@apache.org>.
bowenliang123 commented on issue #3896:
URL: https://github.com/apache/incubator-kyuubi/issues/3896#issuecomment-1336431611

   > temp workaround
   > 
   > ```
   > --hivevar spark.yarn.dist.archives=hdfs:/path/to/spark.zip#spark_home \
   > --hivevar spark.yarn.appMasterEnv.SPARK_HOME=spark_home
   > ```
   And this must be set before switching to the Python language.

