Posted to issues@spark.apache.org by "Mike Chan (JIRA)" <ji...@apache.org> on 2019/03/24 15:38:00 UTC

[jira] [Updated] (SPARK-27264) spark sql released all executor but the job is not done

     [ https://issues.apache.org/jira/browse/SPARK-27264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mike Chan updated SPARK-27264:
------------------------------
    Environment: Azure HDInsight Spark 2.4 on Azure Storage. SQL: read and join some data, then write the result to a Hive metastore; query executed on JupyterHub, while the pre-migration cluster used plain Jupyter (non-hub)  (was: Azure HDinsight spark 2.4 on Azure storage SQL: Read and Join some data and finally write result to a Hive metastore)

> spark sql released all executor but the job is not done
> -------------------------------------------------------
>
>                 Key: SPARK-27264
>                 URL: https://issues.apache.org/jira/browse/SPARK-27264
>             Project: Spark
>          Issue Type: Question
>          Components: SQL
>    Affects Versions: 2.4.0
>         Environment: Azure HDInsight Spark 2.4 on Azure Storage. SQL: read and join some data, then write the result to a Hive metastore; query executed on JupyterHub, while the pre-migration cluster used plain Jupyter (non-hub)
>            Reporter: Mike Chan
>            Priority: Major
>
> I have a Spark SQL query that used to execute in under 10 minutes but now runs for 3 hours after a cluster migration, and I need to deep-dive into what it is actually doing. I'm new to Spark, so please bear with me if I'm asking something basic.
> I increased spark.executor.memory but had no luck. Env: Azure HDInsight Spark 2.4 on Azure Storage. SQL: read and join some data, then write the result to a Hive metastore.
> The Spark SQL job ends with the following code: .write.mode("overwrite").saveAsTable("default.mikemiketable")
> Application behavior: within the first 15 minutes it loads and completes most tasks (199/200), then leaves only 1 executor process alive, which continually shuffle-reads and shuffle-writes data. Because only 1 executor is left, we have to wait 3 hours until the application finishes. [!https://i.stack.imgur.com/6hqvh.png!|https://i.stack.imgur.com/6hqvh.png]
> Only 1 executor left alive: [!https://i.stack.imgur.com/55162.png!|https://i.stack.imgur.com/55162.png]
> Not sure what the executor is doing: [!https://i.stack.imgur.com/TwhuX.png!|https://i.stack.imgur.com/TwhuX.png]
> From time to time, we can tell the shuffle read is increasing: [!https://i.stack.imgur.com/WhF9A.png!|https://i.stack.imgur.com/WhF9A.png]
> Therefore I increased spark.executor.memory to 20g, but nothing changed. From Ambari and YARN I can tell the cluster has plenty of resources left. [!https://i.stack.imgur.com/pngQA.png!|https://i.stack.imgur.com/pngQA.png]
> Release of almost all executors: [!https://i.stack.imgur.com/pA134.png!|https://i.stack.imgur.com/pA134.png]
> Any guidance is greatly appreciated.
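Editor's note: the pattern described above (199 of 200 tasks finishing quickly, then one straggler shuffling for hours while idle executors are released) is the classic signature of a skewed shuffle key. Spark hash-partitions rows by the join/aggregation key, so every row carrying one "hot" key lands in the same shuffle partition and is processed by a single task. The following is a minimal pure-Python sketch of that mechanism (illustrative only, not Spark code; the 90%-hot-key distribution and the key names are assumptions, and 200 mirrors the default spark.sql.shuffle.partitions):

```python
from collections import Counter

# Hypothetical skewed join key: 90% of the rows share one hot key,
# the remaining 10% are spread across 1000 distinct keys.
keys = ["hot_key"] * 9000 + [f"key_{i}" for i in range(1000)]

NUM_PARTITIONS = 200  # default value of spark.sql.shuffle.partitions

# Spark assigns each row to a shuffle partition by hashing its key,
# so all rows with the same key end up in the same partition/task.
buckets = Counter(hash(k) % NUM_PARTITIONS for k in keys)

print(max(buckets.values()))  # one partition carries the bulk of the rows
print(len(buckets))           # the other keys spread thinly across partitions
```

If this matches the job, raising spark.executor.memory will not help, because the bottleneck is one task, not memory pressure, which is consistent with the behavior reported. On Spark 2.4, common mitigations are: find the hot key (e.g. `df.groupBy("join_key").count().orderBy("count", ascending=False).show()` in a notebook, where `join_key` stands in for the actual column), broadcast the smaller side of the join, or "salt" the skewed key with a random suffix to spread it across partitions.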



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
