Posted to commits@dolphinscheduler.apache.org by GitBox <gi...@apache.org> on 2022/08/04 12:24:59 UTC

[GitHub] [dolphinscheduler] gcnyin opened a new issue, #11304: [Bug] [Spark-SQL] Cannot run Spark-SQL task in k8s mode

gcnyin opened a new issue, #11304:
URL: https://github.com/apache/dolphinscheduler/issues/11304

   ### Search before asking
   
   - [X] I had searched in the [issues](https://github.com/apache/dolphinscheduler/issues?q=is%3Aissue) and found no similar issues.
   
   
   ### What happened
   
   I've already created DolphinScheduler in k8s using Helm, and have added Spark under `/opt/soft/spark1` and `/opt/soft/spark2`.
   
   Spark version: 3.2.2.
   
   K8s version: 1.23.
   
   I submitted a spark-sql task and started it, but it threw an error. The same task runs correctly on my local machine.
   
   It looks like DolphinScheduler doesn't read the `SPARK1_HOME` environment variable correctly.
   
   ```
   [LOG-PATH]: /opt/dolphinscheduler/logs/20220802/6402332966880_2-1-11.log, [HOST]:  Host{address='dolphinscheduler-worker-2.dolphinscheduler-worker-headless:1234', ip='dolphinscheduler-worker-2.dolphinscheduler-worker-headless', port=1234}
   [INFO] 2022-08-02 22:48:15.754 +0800 [taskAppId=TASK-20220802-6402332966880_2-1-11] TaskLogLogger-class org.apache.dolphinscheduler.plugin.task.spark.SparkTask:[69] - spark task params {"localParams":[],"rawScript":"create table ods_space\n(\n    project_uuid binary,\n    space_uuid binary,\n    type string,\n    is_deleted boolean\n) using jdbc options (\n    dbtable = \"xxx\",\n    driver = \"com.mysql.cj.jdbc.Driver\",\n    url = \"jdbc:mysql://xxx:3306\",\n    user = \"xxx\",\n    password = \"xxx\"\n);\n\ncreate table ods_space\n(\n    project_uuid binary,\n    date date,\n    count int\n) using jdbc options (\n    dbtable = \"xxx\",\n    driver = \"com.mysql.cj.jdbc.Driver\",\n    url = \"jdbc:mysql://xxx:3306\",\n    user = \"xxx\",\n    password = \"xxx\"\n);\n\ninsert into ods_space\nfrom (select project_uuid,\n            count(1)\n    from ods_space\n    where type = 'room'\n    group by project_uuid);\n","resourceList":[],"programType":"SQL","mainClass":"","deployMode"
 :"local","appName":"sta_analysis_job","sparkVersion":"SPARK2","driverCores":1,"driverMemory":"512M","numExecutors":2,"executorMemory":"2G","executorCores":2}
   [INFO] 2022-08-02 22:48:15.770 +0800 [taskAppId=TASK-20220802-6402332966880_2-1-11] TaskLogLogger-class org.apache.dolphinscheduler.plugin.task.spark.SparkTask:[240] - raw script : create table ods_space
   (
       project_uuid binary,
       space_uuid binary,
       type string
   ) using jdbc options (
       dbtable = "xxx",
       driver = "com.mysql.cj.jdbc.Driver",
       url = "jdbc:mysql://xxx:3306",
       user = "xxx",
       password = "xxx"
   );
   
   create table ods_space
   (
       project_uuid binary,
       date date,
       count int
   ) using jdbc options (
       dbtable = "xxx",
       driver = "com.mysql.cj.jdbc.Driver",
       url = "jdbc:mysql://xxx:3306",
       user = "xxx",
       password = "xxx"
   );
   
   insert into ods_space
   from (select project_uuid,
               count(1)
       from ods_space
       where type = 'room'
       group by project_uuid);
   
   [INFO] 2022-08-02 22:48:15.771 +0800 [taskAppId=TASK-20220802-6402332966880_2-1-11] TaskLogLogger-class org.apache.dolphinscheduler.plugin.task.spark.SparkTask:[241] - task execute path : /tmp/dolphinscheduler/exec/process/6397603455328/6402332966880_2/1/11
   [INFO] 2022-08-02 22:48:15.776 +0800 [taskAppId=TASK-20220802-6402332966880_2-1-11] TaskLogLogger-class org.apache.dolphinscheduler.plugin.task.spark.SparkTask:[130] - spark task command: ${SPARK_HOME2}/bin/spark-sql --master local --driver-cores 1 --driver-memory 512M --num-executors 2 --executor-cores 2 --executor-memory 2G --name sta_analysis_job -f /tmp/dolphinscheduler/exec/process/6397603455328/6402332966880_2/1/11/1_11_node.sql
   [INFO] 2022-08-02 22:48:15.777 +0800 [taskAppId=TASK-20220802-6402332966880_2-1-11] TaskLogLogger-class org.apache.dolphinscheduler.plugin.task.spark.SparkTask:[85] - tenantCode user:tenant-01, task dir:1_11
   [INFO] 2022-08-02 22:48:15.777 +0800 [taskAppId=TASK-20220802-6402332966880_2-1-11] TaskLogLogger-class org.apache.dolphinscheduler.plugin.task.spark.SparkTask:[90] - create command file:/tmp/dolphinscheduler/exec/process/6397603455328/6402332966880_2/1/11/1_11.command
   [INFO] 2022-08-02 22:48:15.777 +0800 [taskAppId=TASK-20220802-6402332966880_2-1-11] TaskLogLogger-class org.apache.dolphinscheduler.plugin.task.spark.SparkTask:[116] - command : #!/bin/sh
   BASEDIR=$(cd `dirname $0`; pwd)
   cd $BASEDIR
   source /opt/dolphinscheduler/conf/dolphinscheduler_env.sh
   ${SPARK_HOME2}/bin/spark-sql --master local --driver-cores 1 --driver-memory 512M --num-executors 2 --executor-cores 2 --executor-memory 2G --name sta_analysis_job -f /tmp/dolphinscheduler/exec/process/6397603455328/6402332966880_2/1/11/1_11_node.sql
   [INFO] 2022-08-02 22:48:15.802 +0800 [taskAppId=TASK-20220802-6402332966880_2-1-11] TaskLogLogger-class org.apache.dolphinscheduler.plugin.task.spark.SparkTask:[290] - task run command: sudo -u tenant-01 sh /tmp/dolphinscheduler/exec/process/6397603455328/6402332966880_2/1/11/1_11.command
   [INFO] 2022-08-02 22:48:15.805 +0800 [taskAppId=TASK-20220802-6402332966880_2-1-11] TaskLogLogger-class org.apache.dolphinscheduler.plugin.task.spark.SparkTask:[181] - process start, process id is: 201
   [INFO] 2022-08-02 22:48:15.816 +0800 [taskAppId=TASK-20220802-6402332966880_2-1-11] TaskLogLogger-class org.apache.dolphinscheduler.plugin.task.spark.SparkTask:[205] - process has exited, execute path:/tmp/dolphinscheduler/exec/process/6397603455328/6402332966880_2/1/11, processId:201 ,exitStatusCode:127 ,processWaitForStatus:true ,processExitValue:127
   [INFO] 2022-08-02 22:48:16.805 +0800 [taskAppId=TASK-20220802-6402332966880_2-1-11] TaskLogLogger-class org.apache.dolphinscheduler.plugin.task.spark.SparkTask:[63] -  -> /tmp/dolphinscheduler/exec/process/6397603455328/6402332966880_2/1/11/1_11.command: 4: /tmp/dolphinscheduler/exec/process/6397603455328/6402332966880_2/1/11/1_11.command: source: not found
   	/tmp/dolphinscheduler/exec/process/6397603455328/6402332966880_2/1/11/1_11.command: 5: /tmp/dolphinscheduler/exec/process/6397603455328/6402332966880_2/1/11/1_11.command: /bin/spark-sql: not found
   [INFO] 2022-08-02 22:48:16.808 +0800 [taskAppId=TASK-20220802-6402332966880_2-1-11] TaskLogLogger-class org.apache.dolphinscheduler.plugin.task.spark.SparkTask:[57] - FINALIZE_SESSION
   ```
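   The exit code 127 and the two `not found` lines in the log point at one root cause: the command file is run with plain `sh`, under which the bash-only `source` builtin fails, so `dolphinscheduler_env.sh` is never loaded and `SPARK_HOME2` stays empty. A minimal sketch (illustrative shell, not DolphinScheduler code) of how the empty variable degrades the command:

   ```shell
   # With SPARK_HOME2 unset, parameter expansion yields an empty string, so the
   # generated command collapses to "/bin/spark-sql" -- exactly the path in the
   # "/bin/spark-sql: not found" error above.
   unset SPARK_HOME2
   cmd="${SPARK_HOME2}/bin/spark-sql"
   echo "$cmd"   # -> /bin/spark-sql
   ```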
   
   I used a soft link to point `/bin/spark-sql` at `${SPARK_HOME2}/bin/spark-sql`, which fixed that error, but then it threw another similar one:
   
   ```
    ${SPARK_HOME2}/bin/spark-submit not found
   ```
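   The `source: not found` part of the failure has a portable fix: POSIX sh spells the command `.` instead of `source`. A small sketch (the file path and variable value here are made up for illustration):

   ```shell
   # Write a tiny env file and load it with '.', the POSIX-portable equivalent
   # of bash's 'source'; this works even under shells like dash, where 'source'
   # is reported as "not found".
   cat > /tmp/env_demo.sh <<'EOF'
   export SPARK_HOME2=/opt/soft/spark2
   EOF
   . /tmp/env_demo.sh
   echo "SPARK_HOME2=$SPARK_HOME2"   # -> SPARK_HOME2=/opt/soft/spark2
   ```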
   
   ### What you expected to happen
   
   Running the spark-sql task correctly.
   
   ### How to reproduce
   
   Create any spark-sql task and run it.
   
   ### Anything else
   
   none
   
   ### Version
   
   3.0.0-beta-2
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@dolphinscheduler.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [dolphinscheduler] gcnyin closed issue #11304: [Bug] [Spark-SQL] Cannot run Spark-SQL task in k8s mode

Posted by GitBox <gi...@apache.org>.
gcnyin closed issue #11304: [Bug] [Spark-SQL] Cannot run Spark-SQL task in k8s mode
URL: https://github.com/apache/dolphinscheduler/issues/11304




[GitHub] [dolphinscheduler] gcnyin commented on issue #11304: [Bug] [Spark-SQL] Cannot run Spark-SQL task in k8s mode

Posted by GitBox <gi...@apache.org>.
gcnyin commented on issue #11304:
URL: https://github.com/apache/dolphinscheduler/issues/11304#issuecomment-1207226363

   <img width="1202" alt="Screen Shot 2022-08-06 at 22 40 11" src="https://user-images.githubusercontent.com/53973962/183253616-6ccc0180-2359-48a3-9d5e-df09287745ca.png">
   
   I think I may have found the reason.
   
   The highlighted script is the key point.
   
   I manually ran it in the DolphinScheduler worker shell and found that the `$SPARK_HOME1` environment variable is empty.
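
   Given that, a pragmatic workaround (an assumption, not an official fix) is to make sure the worker's `dolphinscheduler_env.sh` actually exports the Spark homes, e.g. with defaults matching the install paths mentioned above:

   ```shell
   # Hypothetical dolphinscheduler_env.sh fragment: default the Spark homes to
   # the /opt/soft install paths if they are not already set, and export them
   # so child processes (spark-sql, spark-submit) can see them.
   export SPARK_HOME1="${SPARK_HOME1:-/opt/soft/spark1}"
   export SPARK_HOME2="${SPARK_HOME2:-/opt/soft/spark2}"
   echo "SPARK_HOME1=$SPARK_HOME1"
   ```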




[GitHub] [dolphinscheduler] github-actions[bot] commented on issue #11304: [Bug] [Spark-SQL] Cannot run Spark-SQL task in k8s mode

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on issue #11304:
URL: https://github.com/apache/dolphinscheduler/issues/11304#issuecomment-1205187097

   Thank you for your feedback; we have received your issue. Please wait patiently for a reply.
   * To help us understand your request as soon as possible, please provide detailed information, versions, or pictures.
   * If you haven't received a reply for a long time, you can [join our slack](https://s.apache.org/dolphinscheduler-slack) and send your question to the `#troubleshooting` channel

