You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by zsxwing <gi...@git.apache.org> on 2018/10/18 20:45:11 UTC

[GitHub] spark pull request #22770: [SPARK-25771][PYSPARK]Fix improper synchronizatio...

GitHub user zsxwing opened a pull request:

    https://github.com/apache/spark/pull/22770

    [SPARK-25771][PYSPARK]Fix improper synchronization in PythonWorkerFactory

    ## What changes were proposed in this pull request?
    
    Fix the following issues in PythonWorkerFactory
    - MonitorThread.run uses a wrong lock.
    - `createSimpleWorker` misses `synchronized` when updating `simpleWorkers`.
    
    Other changes are just to improve the code style to make the thread-safe contract clear.
    
    ## How was this patch tested?
    
    (Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests)
    (If this patch involves UI changes, please attach a screenshot; otherwise, remove this)
    
    Please review http://spark.apache.org/contributing.html before opening a pull request.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/zsxwing/spark pwf

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/22770.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #22770
    
----
commit 0369de2323640377c0f990bb47ebe112654ca498
Author: Shixiong Zhu <zs...@...>
Date:   2018-10-18T20:39:04Z

    fix improper synchronization

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22770: [SPARK-25771][PYSPARK]Fix improper synchronization in Py...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/22770
  
    Please fill `How was this patch tested?` as well in the PR description.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22770: [SPARK-25771][PYSPARK]Fix improper synchronizatio...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22770#discussion_r226515494
  
    --- Diff: core/src/main/scala/org/apache/spark/api/python/PythonWorkerFactory.scala ---
    @@ -31,15 +32,15 @@ import org.apache.spark.security.SocketAuthHelper
     import org.apache.spark.util.{RedirectThread, Utils}
     
     private[spark] class PythonWorkerFactory(pythonExec: String, envVars: Map[String, String])
    -  extends Logging {
    +  extends Logging { self =>
     
       import PythonWorkerFactory._
     
       // Because forking processes from Java is expensive, we prefer to launch a single Python daemon,
       // pyspark/daemon.py (by default) and tell it to fork new workers for our tasks. This daemon
       // currently only works on UNIX-based systems now because it uses signals for child management,
       // so we can also fall back to launching workers, pyspark/worker.py (by default) directly.
    -  val useDaemon = {
    +  private val useDaemon = {
    --- End diff --
    
    `daemonModule` and `workerModule` too.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22770: [SPARK-25771][PYSPARK]Fix improper synchronizatio...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/22770


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22770: [SPARK-25771][PYSPARK]Fix improper synchronization in Py...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22770
  
    **[Test build #97559 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97559/testReport)** for PR 22770 at commit [`0369de2`](https://github.com/apache/spark/commit/0369de2323640377c0f990bb47ebe112654ca498).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22770: [SPARK-25771][PYSPARK]Fix improper synchronizatio...

Posted by zsxwing <gi...@git.apache.org>.
Github user zsxwing commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22770#discussion_r226459609
  
    --- Diff: core/src/main/scala/org/apache/spark/api/python/PythonWorkerFactory.scala ---
    @@ -163,7 +172,9 @@ private[spark] class PythonWorkerFactory(pythonExec: String, envVars: Map[String
           try {
             val socket = serverSocket.accept()
             authHelper.authClient(socket)
    -        simpleWorkers.put(socket, worker)
    +        self.synchronized {
    --- End diff --
    
    this is fix 2.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22770: [SPARK-25771][PYSPARK]Fix improper synchronization in Py...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22770
  
    **[Test build #97559 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97559/testReport)** for PR 22770 at commit [`0369de2`](https://github.com/apache/spark/commit/0369de2323640377c0f990bb47ebe112654ca498).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22770: [SPARK-25771][PYSPARK]Fix improper synchronizatio...

Posted by zsxwing <gi...@git.apache.org>.
Github user zsxwing commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22770#discussion_r227056168
  
    --- Diff: core/src/main/scala/org/apache/spark/api/python/PythonWorkerFactory.scala ---
    @@ -31,15 +32,15 @@ import org.apache.spark.security.SocketAuthHelper
     import org.apache.spark.util.{RedirectThread, Utils}
     
     private[spark] class PythonWorkerFactory(pythonExec: String, envVars: Map[String, String])
    -  extends Logging {
    +  extends Logging { self =>
     
       import PythonWorkerFactory._
     
       // Because forking processes from Java is expensive, we prefer to launch a single Python daemon,
       // pyspark/daemon.py (by default) and tell it to fork new workers for our tasks. This daemon
       // currently only works on UNIX-based systems now because it uses signals for child management,
       // so we can also fall back to launching workers, pyspark/worker.py (by default) directly.
    -  val useDaemon = {
    +  private val useDaemon = {
    --- End diff --
    
    Fixing these since I'm touching a lot of fields in this file.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22770: [SPARK-25771][PYSPARK]Fix improper synchronization in Py...

Posted by zsxwing <gi...@git.apache.org>.
Github user zsxwing commented on the issue:

    https://github.com/apache/spark/pull/22770
  
    Thanks for reviewing this. Merging to master.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22770: [SPARK-25771][PYSPARK]Fix improper synchronization in Py...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/22770
  
    Please feel `How was this patch tested?` as well in the PR description.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22770: [SPARK-25771][PYSPARK]Fix improper synchronization in Py...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22770
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97559/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22770: [SPARK-25771][PYSPARK]Fix improper synchronizatio...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22770#discussion_r226515461
  
    --- Diff: core/src/main/scala/org/apache/spark/api/python/PythonWorkerFactory.scala ---
    @@ -31,15 +32,15 @@ import org.apache.spark.security.SocketAuthHelper
     import org.apache.spark.util.{RedirectThread, Utils}
     
     private[spark] class PythonWorkerFactory(pythonExec: String, envVars: Map[String, String])
    -  extends Logging {
    +  extends Logging { self =>
     
       import PythonWorkerFactory._
     
       // Because forking processes from Java is expensive, we prefer to launch a single Python daemon,
       // pyspark/daemon.py (by default) and tell it to fork new workers for our tasks. This daemon
       // currently only works on UNIX-based systems now because it uses signals for child management,
       // so we can also fall back to launching workers, pyspark/worker.py (by default) directly.
    -  val useDaemon = {
    +  private val useDaemon = {
    --- End diff --
    
    Why are we fixing this? Looks not directly related to the fix and this is already private class.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22770: [SPARK-25771][PYSPARK]Fix improper synchronization in Py...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22770
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22770: [SPARK-25771][PYSPARK]Fix improper synchronizatio...

Posted by zsxwing <gi...@git.apache.org>.
Github user zsxwing commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22770#discussion_r226459546
  
    --- Diff: core/src/main/scala/org/apache/spark/api/python/PythonWorkerFactory.scala ---
    @@ -278,7 +289,7 @@ private[spark] class PythonWorkerFactory(pythonExec: String, envVars: Map[String
     
         override def run() {
           while (true) {
    -        synchronized {
    +        self.synchronized {
    --- End diff --
    
    this is fix 1


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22770: [SPARK-25771][PYSPARK]Fix improper synchronization in Py...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22770
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22770: [SPARK-25771][PYSPARK]Fix improper synchronization in Py...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22770
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4102/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org