Posted to reviews@spark.apache.org by RussellSpitzer <gi...@git.apache.org> on 2018/01/17 20:20:51 UTC

[GitHub] spark pull request #20298: [SPARK-22976][Core]: Cluster mode driver dir remo...

GitHub user RussellSpitzer opened a pull request:

    https://github.com/apache/spark/pull/20298

    [SPARK-22976][Core]: Cluster mode driver dir removed while running

    ## What changes were proposed in this pull request?
    
    The cleanup logic on the worker previously determined the liveness of a
    particular application based on whether or not it had running executors.
    This would fail in the case that a directory was made for a driver
    running in cluster mode if that driver had no running executors on the
    same machine. To preserve driver directories we consider both executors
    and running drivers when checking directory liveness.
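
The change described above can be illustrated with a minimal sketch. This is not the patch itself and not Spark's worker code: `WorkerState`, `stale_dirs_old`, and `stale_dirs_new` are hypothetical names introduced only for illustration, and the sketch assumes every directory passed in is already older than the configured retention time.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Set


@dataclass
class WorkerState:
    """Hypothetical stand-in for what a worker tracks; not Spark's real class."""
    # IDs of applications that have at least one executor running on this worker.
    executor_app_ids: Set[str] = field(default_factory=set)
    # IDs of cluster-mode drivers running on this worker.
    driver_ids: Set[str] = field(default_factory=set)


def stale_dirs_old(state: WorkerState, dir_names: Iterable[str]) -> List[str]:
    """Behaviour before the patch, as described: only executors keep a dir alive."""
    return [d for d in dir_names if d not in state.executor_app_ids]


def stale_dirs_new(state: WorkerState, dir_names: Iterable[str]) -> List[str]:
    """Behaviour after the patch, as described: executors and drivers both count."""
    live = state.executor_app_ids | state.driver_ids
    return [d for d in dir_names if d not in live]


# A worker that hosts only the driver; the executors run on the other node.
state = WorkerState(driver_ids={"driver-20180105234824-0000"})
dirs = ["driver-20180105234824-0000"]

print(stale_dirs_old(state, dirs))  # ['driver-20180105234824-0000']
print(stale_dirs_new(state, dirs))  # []
```

With only executors considered, the running driver's directory is reported as stale; once running drivers are included in the live set, it is kept.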
    
    ## How was this patch tested?
    
    Manually started up a two-node cluster with a single core on each node. Turned on worker directory cleanup and set the interval to 1 second and liveness to 1 second. Without the patch, the driver directory is removed immediately after the app is launched. With the patch, it is not.
    
    
    ### Without Patch
    ```
    INFO  2018-01-05 23:48:24,693 Logging.scala:54 - Asked to launch driver driver-20180105234824-0000
    INFO  2018-01-05 23:48:25,293 Logging.scala:54 - Changing view acls to: cassandra
    INFO  2018-01-05 23:48:25,293 Logging.scala:54 - Changing modify acls to: cassandra
    INFO  2018-01-05 23:48:25,294 Logging.scala:54 - Changing view acls groups to:
    INFO  2018-01-05 23:48:25,294 Logging.scala:54 - Changing modify acls groups to:
    INFO  2018-01-05 23:48:25,294 Logging.scala:54 - SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(cassandra); groups with view permissions: Set(); users  with modify permissions: Set(cassandra); groups with modify permissions: Set()
    INFO  2018-01-05 23:48:25,330 Logging.scala:54 - Copying user jar file:/home/automaton/writeRead-0.1.jar to /var/lib/spark/worker/driver-20180105234824-0000/writeRead-0.1.jar
    INFO  2018-01-05 23:48:25,332 Logging.scala:54 - Copying /home/automaton/writeRead-0.1.jar to /var/lib/spark/worker/driver-20180105234824-0000/writeRead-0.1.jar
    INFO  2018-01-05 23:48:25,361 Logging.scala:54 - Launch Command: "/usr/lib/jvm/jdk1.8.0_40//bin/java" ....
    ****
    INFO  2018-01-05 23:48:56,577 Logging.scala:54 - Removing directory: /var/lib/spark/worker/driver-20180105234824-0000  ### << Cleaned up
    ****
    -- 
    One minute passes while app runs (app has 1 minute sleep built in)
    --
    
    WARN  2018-01-05 23:49:58,080 ShuffleSecretManager.java:73 - Attempted to unregister application app-20180105234831-0000 when it is not registered
    INFO  2018-01-05 23:49:58,081 ExternalShuffleBlockResolver.java:163 - Application app-20180105234831-0000 removed, cleanupLocalDirs = false
    INFO  2018-01-05 23:49:58,081 ExternalShuffleBlockResolver.java:163 - Application app-20180105234831-0000 removed, cleanupLocalDirs = false
    INFO  2018-01-05 23:49:58,082 ExternalShuffleBlockResolver.java:163 - Application app-20180105234831-0000 removed, cleanupLocalDirs = true
    INFO  2018-01-05 23:50:00,999 Logging.scala:54 - Driver driver-20180105234824-0000 exited successfully
    ```
    
    ### With Patch
    ```
    INFO  2018-01-08 23:19:54,603 Logging.scala:54 - Asked to launch driver driver-20180108231954-0002
    INFO  2018-01-08 23:19:54,975 Logging.scala:54 - Changing view acls to: automaton
    INFO  2018-01-08 23:19:54,976 Logging.scala:54 - Changing modify acls to: automaton
    INFO  2018-01-08 23:19:54,976 Logging.scala:54 - Changing view acls groups to:
    INFO  2018-01-08 23:19:54,976 Logging.scala:54 - Changing modify acls groups to:
    INFO  2018-01-08 23:19:54,976 Logging.scala:54 - SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(automaton); groups with view permissions: Set(); users  with modify permissions: Set(automaton); groups with modify permissions: Set()
    INFO  2018-01-08 23:19:55,029 Logging.scala:54 - Copying user jar file:/home/automaton/writeRead-0.1.jar to /var/lib/spark/worker/driver-20180108231954-0002/writeRead-0.1.jar
    INFO  2018-01-08 23:19:55,031 Logging.scala:54 - Copying /home/automaton/writeRead-0.1.jar to /var/lib/spark/worker/driver-20180108231954-0002/writeRead-0.1.jar
    INFO  2018-01-08 23:19:55,038 Logging.scala:54 - Launch Command: ......
    INFO  2018-01-08 23:21:28,674 ShuffleSecretManager.java:69 - Unregistered shuffle secret for application app-20180108232000-0000
    INFO  2018-01-08 23:21:28,675 ExternalShuffleBlockResolver.java:163 - Application app-20180108232000-0000 removed, cleanupLocalDirs = false
    INFO  2018-01-08 23:21:28,675 ExternalShuffleBlockResolver.java:163 - Application app-20180108232000-0000 removed, cleanupLocalDirs = false
    INFO  2018-01-08 23:21:28,681 ExternalShuffleBlockResolver.java:163 - Application app-20180108232000-0000 removed, cleanupLocalDirs = true
    INFO  2018-01-08 23:21:31,703 Logging.scala:54 - Driver driver-20180108231954-0002 exited successfully
    *****
    INFO  2018-01-08 23:21:32,346 Logging.scala:54 - Removing directory: /var/lib/spark/worker/driver-20180108231954-0002 ### < Happening AFTER the Run completes rather than during it
    *****
    ```
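
The comparison between the two logs above can also be checked mechanically. The following is a rough sketch that assumes the timestamp layout and message wording shown in these excerpts; `first_timestamp` and `removed_while_running` are hypothetical helper names, not part of Spark.

```python
import re
from datetime import datetime
from typing import Optional

# Timestamp layout assumed from the excerpts, e.g. "2018-01-05 23:48:56,577".
TS = re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}")


def first_timestamp(log: str, *parts: str) -> Optional[datetime]:
    """Return the timestamp of the first line containing every given substring."""
    for line in log.splitlines():
        if all(part in line for part in parts):
            match = TS.search(line)
            if match:
                return datetime.strptime(match.group(0), "%Y-%m-%d %H:%M:%S,%f")
    return None


def removed_while_running(log: str, driver_id: str) -> bool:
    """True if the driver directory was removed before the driver exited."""
    removed = first_timestamp(log, "Removing directory:", driver_id)
    exited = first_timestamp(log, "Driver " + driver_id, "exited")
    if removed is None:
        return False  # the directory was never removed in this log
    return exited is None or removed < exited


# Two lines taken from the "Without Patch" excerpt.
without_patch = "\n".join([
    "INFO  2018-01-05 23:48:56,577 Logging.scala:54 - Removing directory: "
    "/var/lib/spark/worker/driver-20180105234824-0000",
    "INFO  2018-01-05 23:50:00,999 Logging.scala:54 - "
    "Driver driver-20180105234824-0000 exited successfully",
])
print(removed_while_running(without_patch, "driver-20180105234824-0000"))
```

For the unpatched log the removal precedes the exit, so the check reports the directory as removed while the driver was running; for the patched log the order is reversed.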
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/RussellSpitzer/spark SPARK-22976-master

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/20298.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #20298
    
----
commit 38916f769252938fbce891cf1d21972e50a01181
Author: Russell Spitzer <ru...@...>
Date:   2018-01-17T20:13:57Z

    [SPARK-22976][Core]: Cluster mode driver dir removed while running
    
    The cleanup logic on the worker previously determined the liveness of a
    particular application based on whether or not it had running executors.
    This would fail in the case that a directory was made for a driver
    running in cluster mode if that driver had no running executors on the
    same machine. To preserve driver directories we consider both executors
    and running drivers when checking directory liveness.

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20298: [SPARK-22976][Core]: Cluster mode driver dir removed whi...

Posted by jerryshao <gi...@git.apache.org>.
Github user jerryshao commented on the issue:

    https://github.com/apache/spark/pull/20298
  
    LGTM for the fix.
    
    @zsxwing would you please take another look?


---



[GitHub] spark issue #20298: [SPARK-22976][Core]: Cluster mode driver dir removed whi...

Posted by jerryshao <gi...@git.apache.org>.
Github user jerryshao commented on the issue:

    https://github.com/apache/spark/pull/20298
  
    Merging to master/2.3, thanks for the fix!


---



[GitHub] spark issue #20298: [SPARK-22976][Core]: Cluster mode driver dir removed whi...

Posted by jerryshao <gi...@git.apache.org>.
Github user jerryshao commented on the issue:

    https://github.com/apache/spark/pull/20298
  
    Jenkins, retest this please.


---



[GitHub] spark issue #20298: [SPARK-22976][Core]: Cluster mode driver dir removed whi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20298
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86365/
    Test PASSed.


---



[GitHub] spark issue #20298: [SPARK-22976][Core]: Cluster mode driver dir removed whi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20298
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86307/
    Test FAILed.


---



[GitHub] spark issue #20298: [SPARK-22976][Core]: Cluster mode driver dir removed whi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20298
  
    **[Test build #86365 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86365/testReport)** for PR 20298 at commit [`38916f7`](https://github.com/apache/spark/commit/38916f769252938fbce891cf1d21972e50a01181).


---



[GitHub] spark issue #20298: [SPARK-22976][Core]: Cluster mode driver dir removed whi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20298
  
    Merged build finished. Test FAILed.


---



[GitHub] spark issue #20298: [SPARK-22976][Core]: Cluster mode driver dir removed whi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20298
  
    **[Test build #86307 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86307/testReport)** for PR 20298 at commit [`38916f7`](https://github.com/apache/spark/commit/38916f769252938fbce891cf1d21972e50a01181).
     * This patch **fails PySpark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #20298: [SPARK-22976][Core]: Cluster mode driver dir removed whi...

Posted by jerryshao <gi...@git.apache.org>.
Github user jerryshao commented on the issue:

    https://github.com/apache/spark/pull/20298
  
    Jenkins, retest this please.


---



[GitHub] spark issue #20298: [SPARK-22976][Core]: Cluster mode driver dir removed whi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20298
  
    **[Test build #86315 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86315/testReport)** for PR 20298 at commit [`38916f7`](https://github.com/apache/spark/commit/38916f769252938fbce891cf1d21972e50a01181).


---



[GitHub] spark issue #20298: [SPARK-22976][Core]: Cluster mode driver dir removed whi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20298
  
    **[Test build #86307 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86307/testReport)** for PR 20298 at commit [`38916f7`](https://github.com/apache/spark/commit/38916f769252938fbce891cf1d21972e50a01181).


---



[GitHub] spark issue #20298: [SPARK-22976][Core]: Cluster mode driver dir removed whi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20298
  
    Can one of the admins verify this patch?


---



[GitHub] spark issue #20298: [SPARK-22976][Core]: Cluster mode driver dir removed whi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20298
  
    Merged build finished. Test FAILed.


---



[GitHub] spark issue #20298: [SPARK-22976][Core]: Cluster mode driver dir removed whi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20298
  
    Can one of the admins verify this patch?


---



[GitHub] spark issue #20298: [SPARK-22976][Core]: Cluster mode driver dir removed whi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20298
  
    **[Test build #86315 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86315/testReport)** for PR 20298 at commit [`38916f7`](https://github.com/apache/spark/commit/38916f769252938fbce891cf1d21972e50a01181).
     * This patch **fails due to an unknown error code, -9**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #20298: [SPARK-22976][Core]: Cluster mode driver dir removed whi...

Posted by jerryshao <gi...@git.apache.org>.
Github user jerryshao commented on the issue:

    https://github.com/apache/spark/pull/20298
  
    ok to test.


---



[GitHub] spark issue #20298: [SPARK-22976][Core]: Cluster mode driver dir removed whi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20298
  
    **[Test build #86365 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86365/testReport)** for PR 20298 at commit [`38916f7`](https://github.com/apache/spark/commit/38916f769252938fbce891cf1d21972e50a01181).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #20298: [SPARK-22976][Core]: Cluster mode driver dir removed whi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20298
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86315/
    Test FAILed.


---



[GitHub] spark issue #20298: [SPARK-22976][Core]: Cluster mode driver dir removed whi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20298
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #20298: [SPARK-22976][Core]: Cluster mode driver dir removed whi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20298
  
    Can one of the admins verify this patch?


---



[GitHub] spark pull request #20298: [SPARK-22976][Core]: Cluster mode driver dir remo...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/20298


---
