
Posted to commits@dolphinscheduler.apache.org by GitBox <gi...@apache.org> on 2021/01/04 04:31:32 UTC

[GitHub] [incubator-dolphinscheduler] zhaorongsheng opened a new issue #4365: [Question] some questions about spark datasource

zhaorongsheng opened a new issue #4365:
URL: https://github.com/apache/incubator-dolphinscheduler/issues/4365


   **Describe the question**
   DS supports running SQL against a Spark datasource. Is there any documentation on the detailed mechanism DS uses to schedule SQL that runs in Spark?
   As far as we know, sparkSql only supports yarn-client mode, and in this mode the client server needs to be well configured. How does DS handle this problem?
   
   Looking forward to your reply~ thanks~


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [incubator-dolphinscheduler] zhuangchong commented on issue #4365: [Question] some questions about spark datasource

Posted by GitBox <gi...@apache.org>.

zhuangchong commented on issue #4365:
URL: https://github.com/apache/incubator-dolphinscheduler/issues/4365#issuecomment-753752705


   DolphinScheduler's Spark datasource refers to the Spark Thrift Server, which is provided by Spark. Users connect to the Thrift Server over JDBC/ODBC to access Spark SQL data, similar to HiveServer2.
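
   As an illustration of the JDBC route described above, here is a minimal sketch that builds a connection URL and a `beeline` command line. It assumes the Thrift Server listens on the HiveServer2 default port 10000 and is reached with the `jdbc:hive2://` scheme; the host name, user, and SQL file name are hypothetical placeholders, and this is not taken from the DolphinScheduler code base.

```python
import shlex
from typing import List


def thrift_jdbc_url(host: str, port: int = 10000, database: str = "default") -> str:
    """Build a JDBC URL for a Spark Thrift Server.

    The Thrift Server speaks the HiveServer2 protocol, so the URL uses
    the jdbc:hive2 scheme. Port 10000 is assumed here as the default.
    """
    return f"jdbc:hive2://{host}:{port}/{database}"


def beeline_command(url: str, user: str, sql_file: str) -> List[str]:
    """Return a beeline invocation that runs the statements in sql_file."""
    return ["beeline", "-u", url, "-n", user, "-f", sql_file]


if __name__ == "__main__":
    # Hypothetical host, user, and file name, for illustration only.
    url = thrift_jdbc_url("thrift.example.internal")
    print(shlex.join(beeline_command(url, "etl_user", "query.sql")))
```

   Because the SQL runs inside the long-lived Thrift Server process, the client side needs only a JDBC driver and network access to that server.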



[GitHub] [incubator-dolphinscheduler] zhuangchong commented on issue #4365: [Question] some questions about spark datasource

Posted by GitBox <gi...@apache.org>.

zhuangchong commented on issue #4365:
URL: https://github.com/apache/incubator-dolphinscheduler/issues/4365#issuecomment-754337233


   Yes, and Spark Thrift Server does not handle permissions as well as Hive does. I developed a toolkit that is invoked through a shell call. It creates a context, reads the SQL statements from a file, and executes them in code.
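
   The toolkit itself is not shown in this thread. As a rough sketch of the general idea (read statements from a file, then execute each one through a Spark context), something like the following could work. All function names here are hypothetical, the statement splitting is deliberately naive, and the PySpark part is only exercised when a file path is passed on the command line.

```python
import sys
from typing import Callable, List


def split_statements(sql_text: str) -> List[str]:
    """Split SQL text into statements.

    Naive on purpose: drops whole-line '--' comments, then splits on ';'.
    It does not handle semicolons inside string literals.
    """
    kept = [ln for ln in sql_text.splitlines() if not ln.strip().startswith("--")]
    pieces = "\n".join(kept).split(";")
    return [p.strip() for p in pieces if p.strip()]


def run_sql_file(path: str, execute: Callable[[str], object]) -> int:
    """Run every statement in the file through `execute`; return the count."""
    with open(path, encoding="utf-8") as fh:
        statements = split_statements(fh.read())
    for stmt in statements:
        execute(stmt)
    return len(statements)


if __name__ == "__main__" and len(sys.argv) == 2:
    # Assumes PySpark is installed; imported here so the helpers above
    # can be used and tested without it.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder.appName("sql-file-runner")
        .enableHiveSupport()
        .getOrCreate()
    )
    try:
        run_sql_file(sys.argv[1], lambda stmt: spark.sql(stmt).show())
    finally:
        spark.stop()
```

   Passing the executor in as a callable keeps the file handling separate from Spark, so the same helper could be pointed at a different session or a dry-run logger.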
   



[GitHub] [incubator-dolphinscheduler] zhaorongsheng edited a comment on issue #4365: [Question] some questions about spark datasource

Posted by GitBox <gi...@apache.org>.

zhaorongsheng edited a comment on issue #4365:
URL: https://github.com/apache/incubator-dolphinscheduler/issues/4365#issuecomment-757894563


   @zhuangchong Has your toolkit been merged into master?
   As we know, the Spark driver needs more resources. If we use `sparkSql -f xx`, the Spark driver runs on the `WorkerServer`, so the `WorkerServer` needs more resources. How do you handle this?
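
   To make the resource concern concrete: in client mode the driver is a JVM on the host that launches the command, and its heap size is set with `--driver-memory`. The sketch below builds such a command line using Spark's `spark-sql` CLI and its standard `--master`, `--deploy-mode`, and `--driver-memory` options. The helper name, the default memory value, and the file name are hypothetical, and this is not a description of how DS launches tasks.

```python
import shlex
from typing import List


def spark_sql_command(
    sql_file: str, driver_memory: str = "2g", master: str = "yarn"
) -> List[str]:
    """Build a spark-sql invocation that runs a SQL file in client mode.

    The driver runs on the launching host (here, the worker), so
    driver_memory is taken from that host's memory.
    """
    return [
        "spark-sql",
        "--master", master,
        "--deploy-mode", "client",
        "--driver-memory", driver_memory,
        "-f", sql_file,
    ]


if __name__ == "__main__":
    # Hypothetical file name and memory size, for illustration only.
    print(shlex.join(spark_sql_command("sql.txt", driver_memory="4g")))
```

   With this approach, the memory available on a worker bounds how many such drivers it can host at once, which is the sizing question raised above.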




[GitHub] [incubator-dolphinscheduler] zhaorongsheng commented on issue #4365: [Question] some questions about spark datasource

Posted by GitBox <gi...@apache.org>.

zhaorongsheng commented on issue #4365:
URL: https://github.com/apache/incubator-dolphinscheduler/issues/4365#issuecomment-753941415


   @zhuangchong I think the Spark Thrift Server is not stable. Is there any other way to schedule SQL with Spark, e.g. `sparkSql -f sql.txt`?


