Posted to issues@spark.apache.org by "Jialin LIu (JIRA)" <ji...@apache.org> on 2018/11/28 02:30:00 UTC

[jira] [Created] (SPARK-26197) Spark master fails to detect driver process pause

Jialin LIu created SPARK-26197:
----------------------------------

             Summary: Spark master fails to detect driver process pause
                 Key: SPARK-26197
                 URL: https://issues.apache.org/jira/browse/SPARK-26197
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
    Affects Versions: 2.3.2
            Reporter: Jialin LIu


I was using Spark 2.3.2 on a standalone cluster and submitted a job in cluster mode. After submitting the job, I deliberately paused the driver process (with the shell command "kill -stop (driver process id)") to see whether the master could detect the problem. The result shows that the driver never stops. All the executors try to talk back to the driver and give up after 10 minutes. The master detects the executor failures and assigns new executor processes to redo the job. Each new executor tries to create an RPC connection with the driver and fails after 2 minutes. The master endlessly spawns new executors without ever detecting the driver failure.
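
The pause itself can be illustrated outside Spark with a minimal Python sketch. It does not use any Spark API and is not how the master or worker is implemented; the child script, the heartbeat file, and the function names are all hypothetical stand-ins chosen for this illustration. It shows only that a process stopped with SIGSTOP still looks alive to a parent that watches for process exit, while a heartbeat written by that process goes stale. It assumes a POSIX system.

```python
import os
import signal
import subprocess
import sys
import tempfile
import time

# Hypothetical stand-in for the driver: rewrites a heartbeat file every 0.1 s.
CHILD_SOURCE = """
import sys, time
path = sys.argv[1]
while True:
    with open(path, "w") as f:
        f.write(str(time.time()))
    time.sleep(0.1)
"""


def heartbeat_age(path):
    """Seconds since the heartbeat file was last modified."""
    return time.time() - os.path.getmtime(path)


def pause_and_observe(pause_seconds=1.0):
    """Start the stand-in process, pause it with SIGSTOP, report what a monitor sees."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    child = subprocess.Popen([sys.executable, "-c", CHILD_SOURCE, path])
    try:
        time.sleep(0.5)                      # give the child time to start up
        os.kill(child.pid, signal.SIGSTOP)   # same effect as "kill -STOP <pid>"
        time.sleep(pause_seconds)
        return {
            # An exit-based check: a stopped process has not exited,
            # so poll() still returns None.
            "still_alive": child.poll() is None,
            # A heartbeat-based check: the file is no longer being refreshed.
            "heartbeat_age_paused": heartbeat_age(path),
        }
    finally:
        child.kill()                         # SIGKILL also ends a stopped process
        child.wait()
        os.remove(path)


if __name__ == "__main__":
    print(pause_and_observe())
```

In this sketch the exit-based check keeps reporting the process as alive for as long as it stays paused, and only the heartbeat age reveals that it has stopped making progress. Whether the master relies on a comparable exit-based check for the driver is an assumption here, not something confirmed by this report.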



