Posted to reviews@spark.apache.org by davies <gi...@git.apache.org> on 2015/02/14 00:40:53 UTC

[GitHub] spark pull request: [SPARK-5363] [PySpark] check ending mark in no...

GitHub user davies opened a pull request:

    https://github.com/apache/spark/pull/4601

    [SPARK-5363] [PySpark] check ending mark in non-block way

There is a chance of deadlock where the Python process is waiting for the ending mark from the JVM, but that mark has been eaten by a corrupted stream.
    
This PR checks the ending mark from Python in a non-blocking way, so the JVM will not be blocked by the Python process.
    
There is a small chance that the ending mark has been sent by the Python process but is not yet available; in that case, the Python worker will not be re-used.
    
    cc @JoshRosen @pwendell 
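
The non-blocking check described above can be sketched in Python as a minimal stand-in for the JVM-side logic. This is an illustration, not Spark's implementation: the `try_read_end_mark` helper, the socketpair, and the timeout are assumptions, and the sentinel value here is hypothetical (Spark defines its own constant in `SpecialLengths`).

```python
import select
import socket
import struct

# Hypothetical sentinel; Spark's actual value lives in SpecialLengths.
END_OF_STREAM = -4

def try_read_end_mark(sock, timeout=0.0):
    """Read the 4-byte big-endian end mark only if data is already readable.

    Returns the integer mark, or None if nothing arrived within `timeout`,
    in which case the caller should simply not re-use the worker rather
    than block waiting for it.
    """
    ready, _, _ = select.select([sock], [], [], timeout)
    if not ready:
        return None
    # Assumes the peer writes the whole 4-byte mark at once; a partial
    # write could still block here, which is what the real patch avoids
    # by checking stream.available() >= 4 first.
    data = b""
    while len(data) < 4:
        chunk = sock.recv(4 - len(data))
        if not chunk:  # peer closed before sending a full mark
            return None
        data += chunk
    return struct.unpack(">i", data)[0]

# Usage: a socketpair stands in for the JVM <-> Python worker connection.
jvm_side, worker_side = socket.socketpair()
worker_side.sendall(struct.pack(">i", END_OF_STREAM))
mark = try_read_end_mark(jvm_side, timeout=0.5)
```

If the mark is not readable yet, the helper returns `None` immediately instead of blocking, mirroring the trade-off in the PR: the worker is occasionally not re-used even though the mark was in flight.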

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/davies/spark freeze

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/4601.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #4601
    
----
commit 05e1085aeed4378343ae88538cfb1c5bd2fa587f
Author: Davies Liu <da...@databricks.com>
Date:   2015-02-13T23:27:49Z

    check ending mark in non-block way

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-5363] [PySpark] check ending mark in no...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4601#discussion_r24786498
  
    --- Diff: core/src/main/scala/org/apache/spark/api/python/PythonRDD.scala ---
    @@ -144,11 +144,24 @@ private[spark] class PythonRDD(
                     stream.readFully(update)
                     accumulator += Collections.singletonList(update)
                   }
    +
                   // Check whether the worker is ready to be re-used.
    -              if (stream.readInt() == SpecialLengths.END_OF_STREAM) {
    -                if (reuse_worker) {
    -                  env.releasePythonWorker(pythonExec, envVars.toMap, worker)
    -                  released = true
    +              if (reuse_worker) {
    +                // It has a high possibility that the ending mark is already available,
    +                // And current task should not be blocked by checking it
    +
    +                if (stream.available() >= 4) {
    +                  val ending = stream.readInt()
    +                  if (ending == SpecialLengths.END_OF_STREAM) {
    +                    env.releasePythonWorker(pythonExec, envVars.toMap, worker)
    +                    released = true
    +                    logInfo(s"Communication with worker ended cleanly, re-use it: $worker")
    --- End diff --
    
    I'd like to use INFO here; then we can expect one log line per task, saying whether the worker was re-used or not.




[GitHub] spark pull request: [SPARK-5363] [PySpark] check ending mark in no...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4601#issuecomment-74602935
  
      [Test build #27600 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27600/consoleFull) for   PR 4601 at commit [`890329c`](https://github.com/apache/spark/commit/890329c5cb1346e46f5b47c086def5bbb294329c).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-5363] [PySpark] check ending mark in no...

Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4601#discussion_r24824713
  
    --- Diff: core/src/main/scala/org/apache/spark/api/python/PythonRDD.scala ---
    @@ -144,11 +144,24 @@ private[spark] class PythonRDD(
                     stream.readFully(update)
                     accumulator += Collections.singletonList(update)
                   }
    +
                   // Check whether the worker is ready to be re-used.
    -              if (stream.readInt() == SpecialLengths.END_OF_STREAM) {
    -                if (reuse_worker) {
    -                  env.releasePythonWorker(pythonExec, envVars.toMap, worker)
    -                  released = true
    +              if (reuse_worker) {
    +                // It has a high possibility that the ending mark is already available,
    +                // And current task should not be blocked by checking it
    +
    +                if (stream.available() >= 4) {
    --- End diff --
    
    Yeah, let's revert and continue to investigate.




[GitHub] spark pull request: [SPARK-5363] [PySpark] check ending mark in no...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4601#discussion_r24801227
  
    --- Diff: core/src/main/scala/org/apache/spark/api/python/PythonRDD.scala ---
    @@ -144,11 +144,24 @@ private[spark] class PythonRDD(
                     stream.readFully(update)
                     accumulator += Collections.singletonList(update)
                   }
    +
                   // Check whether the worker is ready to be re-used.
    -              if (stream.readInt() == SpecialLengths.END_OF_STREAM) {
    -                if (reuse_worker) {
    -                  env.releasePythonWorker(pythonExec, envVars.toMap, worker)
    -                  released = true
    +              if (reuse_worker) {
    +                // It has a high possibility that the ending mark is already available,
    +                // And current task should not be blocked by checking it
    +
    +                if (stream.available() >= 4) {
    --- End diff --
    
    @JoshRosen  This does not work very well in practice; it's common to see workers that cannot be re-used. I will try to find a better solution, or should we revert this? (It seems that it did not solve the freeze problem.)




[GitHub] spark pull request: [SPARK-5363] [PySpark] check ending mark in no...

Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on the pull request:

    https://github.com/apache/spark/pull/4601#issuecomment-74615791
  
    LGTM.  I'm going to merge this into `master` (1.4.0), `branch-1.3` (1.3.0), and `branch-1.2` (1.2.2).




[GitHub] spark pull request: [SPARK-5363] [PySpark] check ending mark in no...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4601#discussion_r24785836
  
    --- Diff: core/src/main/scala/org/apache/spark/api/python/PythonRDD.scala ---
    @@ -144,11 +144,24 @@ private[spark] class PythonRDD(
                     stream.readFully(update)
                     accumulator += Collections.singletonList(update)
                   }
    +
                   // Check whether the worker is ready to be re-used.
    -              if (stream.readInt() == SpecialLengths.END_OF_STREAM) {
    -                if (reuse_worker) {
    -                  env.releasePythonWorker(pythonExec, envVars.toMap, worker)
    -                  released = true
    +              if (reuse_worker) {
    +                // It has a high possibility that the ending mark is already available,
    +                // And current task should not be blocked by checking it
    +
    +                if (stream.available() >= 4) {
    +                  val ending = stream.readInt()
    +                  if (ending == SpecialLengths.END_OF_STREAM) {
    +                    env.releasePythonWorker(pythonExec, envVars.toMap, worker)
    +                    released = true
    +                    logInfo(s"Socket is ended cleanly, reuse it: $worker")
    +                  } else {
    +                    logInfo(s"Socket is not ended cleanly (ending with $ending), " +
    --- End diff --
    
    It's a normal case; it can happen when the user calls take(), so I think it should be INFO.
    
    For the first one, INFO or DEBUG both work for me.




[GitHub] spark pull request: [SPARK-5363] [PySpark] check ending mark in no...

Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on the pull request:

    https://github.com/apache/spark/pull/4601#issuecomment-74616323
  
    I merged this into `master` (1.4.0), `branch-1.3` (1.3.0), and `branch-1.2` (1.2.2), but did so _right_ before I noticed that there's [a comment](https://issues.apache.org/jira/browse/SPARK-5363?focusedCommentId=14323623&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14323623) on JIRA suggesting that this didn't fix the freeze.  I guess I was a bit too trigger-happy here since I wanted to try to squeeze a fix in for 1.3.0.




[GitHub] spark pull request: [SPARK-5363] [PySpark] check ending mark in no...

Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4601#discussion_r24784064
  
    --- Diff: core/src/main/scala/org/apache/spark/api/python/PythonRDD.scala ---
    @@ -144,9 +144,12 @@ private[spark] class PythonRDD(
                     stream.readFully(update)
                     accumulator += Collections.singletonList(update)
                   }
    +
                   // Check whether the worker is ready to be re-used.
    -              if (stream.readInt() == SpecialLengths.END_OF_STREAM) {
    -                if (reuse_worker) {
    +              if (reuse_worker) {
    +                // Tt has a high possibility that the ending mark is already available,
    --- End diff --
    
    Typo: "Tt"




[GitHub] spark pull request: [SPARK-5363] [PySpark] check ending mark in no...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4601#discussion_r24785741
  
    --- Diff: core/src/main/scala/org/apache/spark/api/python/PythonRDD.scala ---
    @@ -144,11 +144,24 @@ private[spark] class PythonRDD(
                     stream.readFully(update)
                     accumulator += Collections.singletonList(update)
                   }
    +
                   // Check whether the worker is ready to be re-used.
    -              if (stream.readInt() == SpecialLengths.END_OF_STREAM) {
    -                if (reuse_worker) {
    -                  env.releasePythonWorker(pythonExec, envVars.toMap, worker)
    -                  released = true
    +              if (reuse_worker) {
    +                // It has a high possibility that the ending mark is already available,
    +                // And current task should not be blocked by checking it
    +
    +                if (stream.available() >= 4) {
    +                  val ending = stream.readInt()
    +                  if (ending == SpecialLengths.END_OF_STREAM) {
    +                    env.releasePythonWorker(pythonExec, envVars.toMap, worker)
    +                    released = true
    +                    logInfo(s"Socket is ended cleanly, reuse it: $worker")
    --- End diff --
    
    The worker does not exit; it will be re-used.




[GitHub] spark pull request: [SPARK-5363] [PySpark] check ending mark in no...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4601#issuecomment-74595638
  
      [Test build #27600 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27600/consoleFull) for   PR 4601 at commit [`890329c`](https://github.com/apache/spark/commit/890329c5cb1346e46f5b47c086def5bbb294329c).
     * This patch merges cleanly.




[GitHub] spark pull request: [SPARK-5363] [PySpark] check ending mark in no...

Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4601#discussion_r24785408
  
    --- Diff: core/src/main/scala/org/apache/spark/api/python/PythonRDD.scala ---
    @@ -144,11 +144,24 @@ private[spark] class PythonRDD(
                     stream.readFully(update)
                     accumulator += Collections.singletonList(update)
                   }
    +
                   // Check whether the worker is ready to be re-used.
    -              if (stream.readInt() == SpecialLengths.END_OF_STREAM) {
    -                if (reuse_worker) {
    -                  env.releasePythonWorker(pythonExec, envVars.toMap, worker)
    -                  released = true
    +              if (reuse_worker) {
    +                // It has a high possibility that the ending mark is already available,
    +                // And current task should not be blocked by checking it
    +
    +                if (stream.available() >= 4) {
    +                  val ending = stream.readInt()
    +                  if (ending == SpecialLengths.END_OF_STREAM) {
    +                    env.releasePythonWorker(pythonExec, envVars.toMap, worker)
    +                    released = true
    +                    logInfo(s"Socket is ended cleanly, reuse it: $worker")
    +                  } else {
    +                    logInfo(s"Socket is not ended cleanly (ending with $ending), " +
    --- End diff --
    
    The "ended cleanly" case can probably stay at `logInfo` (or maybe `logDebug`), but I think we should make this case and the other error-case into warnings so that they aren't swallowed at lower log levels.




[GitHub] spark pull request: [SPARK-5363] [PySpark] check ending mark in no...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4601#issuecomment-74347029
  
      [Test build #27463 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27463/consoleFull) for   PR 4601 at commit [`05e1085`](https://github.com/apache/spark/commit/05e1085aeed4378343ae88538cfb1c5bd2fa587f).
     * This patch merges cleanly.




[GitHub] spark pull request: [SPARK-5363] [PySpark] check ending mark in no...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4601#issuecomment-74604676
  
      [Test build #27598 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27598/consoleFull) for   PR 4601 at commit [`656d544`](https://github.com/apache/spark/commit/656d544b899e498abbdcea36672dfdc63ce4e239).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-5363] [PySpark] check ending mark in no...

Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on the pull request:

    https://github.com/apache/spark/pull/4601#issuecomment-74690195
  
    Reverted in `master` (1.4.0), `branch-1.3` (1.3.0), and `branch-1.2` (1.2.2).




[GitHub] spark pull request: [SPARK-5363] [PySpark] check ending mark in no...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4601#discussion_r24835697
  
    --- Diff: core/src/main/scala/org/apache/spark/api/python/PythonRDD.scala ---
    @@ -144,11 +144,24 @@ private[spark] class PythonRDD(
                     stream.readFully(update)
                     accumulator += Collections.singletonList(update)
                   }
    +
                   // Check whether the worker is ready to be re-used.
    -              if (stream.readInt() == SpecialLengths.END_OF_STREAM) {
    -                if (reuse_worker) {
    -                  env.releasePythonWorker(pythonExec, envVars.toMap, worker)
    -                  released = true
    +              if (reuse_worker) {
    +                // It has a high possibility that the ending mark is already available,
    +                // And current task should not be blocked by checking it
    +
    +                if (stream.available() >= 4) {
    --- End diff --
    
    Great, thanks!




[GitHub] spark pull request: [SPARK-5363] [PySpark] check ending mark in no...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4601#issuecomment-74602902
  
      [Test build #27602 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27602/consoleFull) for   PR 4601 at commit [`e15a8c3`](https://github.com/apache/spark/commit/e15a8c352be330a422620ea4c051c8db48064861).
     * This patch **fails Scala style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-5363] [PySpark] check ending mark in no...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/4601#issuecomment-74353231
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27463/
    Test PASSed.




[GitHub] spark pull request: [SPARK-5363] [PySpark] check ending mark in no...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4601#issuecomment-74602823
  
      [Test build #27602 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27602/consoleFull) for   PR 4601 at commit [`e15a8c3`](https://github.com/apache/spark/commit/e15a8c352be330a422620ea4c051c8db48064861).
     * This patch merges cleanly.




[GitHub] spark pull request: [SPARK-5363] [PySpark] check ending mark in no...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/4601#issuecomment-74602903
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27602/
    Test FAILed.




[GitHub] spark pull request: [SPARK-5363] [PySpark] check ending mark in no...

Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on the pull request:

    https://github.com/apache/spark/pull/4601#issuecomment-74594328
  
    Can we add logging for the uncommon cases here? I'd add a log message for the case where the next integer is not available, and a second one for when it's not END_OF_STREAM (that message should contain the actual integer received).




[GitHub] spark pull request: [SPARK-5363] [PySpark] check ending mark in no...

Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4601#discussion_r24785809
  
    --- Diff: core/src/main/scala/org/apache/spark/api/python/PythonRDD.scala ---
    @@ -144,11 +144,24 @@ private[spark] class PythonRDD(
                     stream.readFully(update)
                     accumulator += Collections.singletonList(update)
                   }
    +
                   // Check whether the worker is ready to be re-used.
    -              if (stream.readInt() == SpecialLengths.END_OF_STREAM) {
    -                if (reuse_worker) {
    -                  env.releasePythonWorker(pythonExec, envVars.toMap, worker)
    -                  released = true
    +              if (reuse_worker) {
    +                // It has a high possibility that the ending mark is already available,
    +                // And current task should not be blocked by checking it
    +
    +                if (stream.available() >= 4) {
    +                  val ending = stream.readInt()
    +                  if (ending == SpecialLengths.END_OF_STREAM) {
    +                    env.releasePythonWorker(pythonExec, envVars.toMap, worker)
    +                    released = true
    +                    logInfo(s"Socket is ended cleanly, reuse it: $worker")
    --- End diff --
    
    Ah, right. It's not correct to say "Task finished", either, so I suppose this is fine as-is.




[GitHub] spark pull request: [SPARK-5363] [PySpark] check ending mark in no...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/4601#issuecomment-74602939
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27600/
    Test FAILed.




[GitHub] spark pull request: [SPARK-5363] [PySpark] check ending mark in no...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4601#discussion_r24785867
  
    --- Diff: core/src/main/scala/org/apache/spark/api/python/PythonRDD.scala ---
    @@ -144,11 +144,24 @@ private[spark] class PythonRDD(
                     stream.readFully(update)
                     accumulator += Collections.singletonList(update)
                   }
    +
                   // Check whether the worker is ready to be re-used.
    -              if (stream.readInt() == SpecialLengths.END_OF_STREAM) {
    -                if (reuse_worker) {
    -                  env.releasePythonWorker(pythonExec, envVars.toMap, worker)
    -                  released = true
    +              if (reuse_worker) {
    +                // It has a high possibility that the ending mark is already available,
    +                // And current task should not be blocked by checking it
    +
    +                if (stream.available() >= 4) {
    +                  val ending = stream.readInt()
    +                  if (ending == SpecialLengths.END_OF_STREAM) {
    +                    env.releasePythonWorker(pythonExec, envVars.toMap, worker)
    +                    released = true
    +                    logInfo(s"Socket is ended cleanly, reuse it: $worker")
    --- End diff --
    
    Or `Communication ends cleanly`?




[GitHub] spark pull request: [SPARK-5363] [PySpark] check ending mark in no...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/4601#issuecomment-74604681
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27598/
    Test PASSed.




[GitHub] spark pull request: [SPARK-5363] [PySpark] check ending mark in no...

Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4601#discussion_r24785519
  
    --- Diff: core/src/main/scala/org/apache/spark/api/python/PythonRDD.scala ---
    @@ -144,11 +144,24 @@ private[spark] class PythonRDD(
                     stream.readFully(update)
                     accumulator += Collections.singletonList(update)
                   }
    +
                   // Check whether the worker is ready to be re-used.
    -              if (stream.readInt() == SpecialLengths.END_OF_STREAM) {
    -                if (reuse_worker) {
    -                  env.releasePythonWorker(pythonExec, envVars.toMap, worker)
    -                  released = true
    +              if (reuse_worker) {
    +                // It has a high possibility that the ending mark is already available,
    +                // And current task should not be blocked by checking it
    +
    +                if (stream.available() >= 4) {
    +                  val ending = stream.readInt()
    +                  if (ending == SpecialLengths.END_OF_STREAM) {
    +                    env.releasePythonWorker(pythonExec, envVars.toMap, worker)
    +                    released = true
    +                    logInfo(s"Socket is ended cleanly, reuse it: $worker")
    --- End diff --
    
    Instead of "Socket is"..., maybe it would be clearer to say something like "Worker exited cleanly, so re-use it ($worker)".




[GitHub] spark pull request: [SPARK-5363] [PySpark] check ending mark in no...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/4601




[GitHub] spark pull request: [SPARK-5363] [PySpark] check ending mark in no...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on the pull request:

    https://github.com/apache/spark/pull/4601#issuecomment-74602782
  
    @JoshRosen updated




[GitHub] spark pull request: [SPARK-5363] [PySpark] check ending mark in no...

Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on the pull request:

    https://github.com/apache/spark/pull/4601#issuecomment-74599062
  
    Minor log-level nitpicking aside, this looks good to me.




[GitHub] spark pull request: [SPARK-5363] [PySpark] check ending mark in no...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4601#issuecomment-74594748
  
      [Test build #27598 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27598/consoleFull) for   PR 4601 at commit [`656d544`](https://github.com/apache/spark/commit/656d544b899e498abbdcea36672dfdc63ce4e239).
     * This patch merges cleanly.




[GitHub] spark pull request: [SPARK-5363] [PySpark] check ending mark in no...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4601#issuecomment-74353223
  
      [Test build #27463 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27463/consoleFull) for   PR 4601 at commit [`05e1085`](https://github.com/apache/spark/commit/05e1085aeed4378343ae88538cfb1c5bd2fa587f).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `  case class Params(`





[GitHub] spark pull request: [SPARK-5363] [PySpark] check ending mark in no...

Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4601#discussion_r24785984
  
    --- Diff: core/src/main/scala/org/apache/spark/api/python/PythonRDD.scala ---
    @@ -144,11 +144,24 @@ private[spark] class PythonRDD(
                     stream.readFully(update)
                     accumulator += Collections.singletonList(update)
                   }
    +
                   // Check whether the worker is ready to be re-used.
    -              if (stream.readInt() == SpecialLengths.END_OF_STREAM) {
    -                if (reuse_worker) {
    -                  env.releasePythonWorker(pythonExec, envVars.toMap, worker)
    -                  released = true
    +              if (reuse_worker) {
    +                // It has a high possibility that the ending mark is already available,
    +                // And current task should not be blocked by checking it
    +
    +                if (stream.available() >= 4) {
    +                  val ending = stream.readInt()
    +                  if (ending == SpecialLengths.END_OF_STREAM) {
    +                    env.releasePythonWorker(pythonExec, envVars.toMap, worker)
    +                    released = true
    +                    logInfo(s"Socket is ended cleanly, reuse it: $worker")
    --- End diff --
    
    Good idea; how about "Communication with worker ended cleanly"?  

