Posted to reviews@spark.apache.org by zsxwing <gi...@git.apache.org> on 2016/01/06 19:28:55 UTC

[GitHub] spark pull request: [SPARK-12617][PySpark]Move Py4jCallbackConnect...

GitHub user zsxwing opened a pull request:

    https://github.com/apache/spark/pull/10621

    [SPARK-12617][PySpark]Move Py4jCallbackConnectionCleaner to Streaming

    Move Py4jCallbackConnectionCleaner to Streaming because the callback server starts only in StreamingContext.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/zsxwing/spark SPARK-12617-2

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/10621.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #10621
    
----
commit 329a78bdd4b4c41f466b217af98fbabdc1cc87d1
Author: Shixiong Zhu <sh...@databricks.com>
Date:   2016-01-06T18:27:28Z

    Move Py4jCallbackConnectionCleaner to Streaming

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12617][PySpark]Move Py4jCallbackConnect...

Posted by zsxwing <gi...@git.apache.org>.
Github user zsxwing commented on the pull request:

    https://github.com/apache/spark/pull/10621#issuecomment-169441845
  
    Merging to master and 1.6




[GitHub] spark pull request: [SPARK-12617][PySpark]Move Py4jCallbackConnect...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10621#issuecomment-169423645
  
    **[Test build #48867 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48867/consoleFull)** for PR 10621 at commit [`329a78b`](https://github.com/apache/spark/commit/329a78bdd4b4c41f466b217af98fbabdc1cc87d1).




[GitHub] spark pull request: [SPARK-12617][PySpark]Move Py4jCallbackConnect...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10621#issuecomment-169439946
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/48867/
    Test PASSed.




[GitHub] spark pull request: [SPARK-12617][PySpark]Move Py4jCallbackConnect...

Posted by zsxwing <gi...@git.apache.org>.
Github user zsxwing commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10621#discussion_r48990790
  
    --- Diff: python/pyspark/streaming/context.py ---
    @@ -32,6 +33,63 @@
     __all__ = ["StreamingContext"]
     
     
    +class Py4jCallbackConnectionCleaner(object):
    +
    +    """
    +    A cleaner to clean up callback connections that are not closed by Py4j. See SPARK-12617.
    +    It will scan all callback connections every 30 seconds and close the dead connections.
    +    """
    +
    +    def __init__(self, gateway):
    +        self._gateway = gateway
    +        self._stopped = False
    +        self._timer = None
    +        self._lock = RLock()
    +
    +    def start(self):
    +        if self._stopped:
    +            return
    +
    +        def clean_closed_connections():
    +            from py4j.java_gateway import quiet_close, quiet_shutdown
    +
    +            callback_server = self._gateway._callback_server
    +            if callback_server:
    --- End diff --
    
    Add a defensive check
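
For readers following the truncated diff above, here is a minimal sketch of the pattern it uses: a timer that re-arms itself and a guard that does nothing when no callback server exists. It does not reproduce the merged code. `FakeConnection`, `FakeCallbackServer`, `FakeGateway`, `ConnectionCleanerSketch`, and the `alive`/`connections` attributes are hypothetical stand-ins rather than py4j APIs; only `_callback_server`, `_stopped`, `_timer`, `_lock`, and the 30-second interval come from the diff.

```python
import threading


class FakeConnection(object):
    """Hypothetical stand-in for a callback connection (not a py4j class)."""

    def __init__(self, alive):
        self.alive = alive
        self.closed = False

    def close(self):
        self.closed = True


class FakeCallbackServer(object):
    """Hypothetical stand-in that only tracks a list of connections."""

    def __init__(self, connections):
        self.connections = list(connections)


class FakeGateway(object):
    """Hypothetical gateway; _callback_server stays None until a server starts."""

    def __init__(self, callback_server=None):
        self._callback_server = callback_server


class ConnectionCleanerSketch(object):
    """Periodically closes dead connections, assuming the stand-ins above."""

    def __init__(self, gateway, interval=30.0):
        self._gateway = gateway
        self._interval = interval
        self._stopped = False
        self._timer = None
        self._lock = threading.RLock()

    def clean_once(self):
        """Close and drop dead connections; return how many were removed."""
        callback_server = self._gateway._callback_server
        # Defensive check: without a callback server there is nothing to scan.
        if not callback_server:
            return 0
        with self._lock:
            dead = [c for c in callback_server.connections if not c.alive]
            for conn in dead:
                conn.close()
                callback_server.connections.remove(conn)
            return len(dead)

    def start(self):
        with self._lock:
            if self._stopped:
                return
            self._timer = threading.Timer(self._interval, self._run)
            self._timer.daemon = True
            self._timer.start()

    def _run(self):
        self.clean_once()
        self.start()  # re-arm the timer for the next scan

    def stop(self):
        with self._lock:
            self._stopped = True
            if self._timer is not None:
                self._timer.cancel()
                self._timer = None
```

With the guard in place, calling `clean_once()` on a gateway whose `_callback_server` is still `None` is a no-op, which is the case the defensive check is meant to cover.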




[GitHub] spark issue #10621: [SPARK-12617][PySpark]Move Py4jCallbackConnectionCleaner...

Posted by cpalomaressbd <gi...@git.apache.org>.
Github user cpalomaressbd commented on the issue:

    https://github.com/apache/spark/pull/10621
  
    Hi,
    
    One question, and apologies in advance if it is a naive one. We are working with Hortonworks. At the beginning of the project we used HDP 2.4.0, which ships Spark 1.6.0 and has a bug in PySpark, the one you describe as resolved in:
    
    **zsxwing commented on 22 Feb 2016**
    
    Our first attempt at a fix was to upgrade to HDP 2.4.3, which ships Spark 1.6.2 and should in theory include the patch for this problem. To our surprise, we still hit the same bug and the patch is not there.
    
    Maybe the problem is with Hortonworks and I should ask in another forum, but I also checked the official Spark website:
    
    https://spark.apache.org/downloads.html
    
    If I download the official Spark 1.6.2 release (released June 25, 2016) and look at the PySpark files, the bug is still present in that installation. I am sorry, but I do not understand why, and we are desperate.
    
    Should I apply the patch to the source and recompile the code? How can I do that step by step?
    
    Thanks in advance.
    





[GitHub] spark pull request: [SPARK-12617][PySpark]Move Py4jCallbackConnect...

Posted by zsxwing <gi...@git.apache.org>.
Github user zsxwing commented on the pull request:

    https://github.com/apache/spark/pull/10621#issuecomment-187328406
  
    @sarathj if you want to upgrade py4j to 0.9.1, you can just cherry-pick this patch: https://github.com/zsxwing/spark/commit/a3e3e1755897d21247399a3bc40336bde8e1d8b3
    
    If you don't want to upgrade py4j, cherry-picking the following two patches should be enough:
    
    https://github.com/apache/spark/commit/f31d0fd9ea12bfe94434671fbcfe3d0e06a4a97d
    https://github.com/apache/spark/commit/d821fae0ecca6393d3632977797d72ba594d26a9




[GitHub] spark pull request: [SPARK-12617][PySpark]Move Py4jCallbackConnect...

Posted by sarathj <gi...@git.apache.org>.
Github user sarathj commented on the pull request:

    https://github.com/apache/spark/pull/10621#issuecomment-187058482
  
    @zsxwing 
    With 1.6 I also ran into the same exception: `java.io.IOException: py4j.Py4JException: Cannot obtain a new communication`.
    
    I would like to apply the patch using py4j 0.9.1. Could you please let me know how you applied the fix? Also, I could not find a direct link to `py4j-0.9.1-src.zip`. Do I need to extract it from https://pypi.python.org/pypi/py4j and zip it again?
    





[GitHub] spark pull request: [SPARK-12617][PySpark]Move Py4jCallbackConnect...

Posted by zsxwing <gi...@git.apache.org>.
Github user zsxwing commented on the pull request:

    https://github.com/apache/spark/pull/10621#issuecomment-169418588
  
    retest this please




[GitHub] spark pull request: [SPARK-12617][PySpark]Move Py4jCallbackConnect...

Posted by zsxwing <gi...@git.apache.org>.
Github user zsxwing commented on the pull request:

    https://github.com/apache/spark/pull/10621#issuecomment-169412217
  
    CC @davies 




[GitHub] spark pull request: [SPARK-12617][PySpark]Move Py4jCallbackConnect...

Posted by zsxwing <gi...@git.apache.org>.
Github user zsxwing commented on the pull request:

    https://github.com/apache/spark/pull/10621#issuecomment-185083006
  
    > @zsxwing 
    > Could you confirm that the latest version of py4j (0.9.1) got packaged with Spark 1.5.2?
    > The Spark that got installed using AWS and the 1.5.2 tag (https://github.com/apache/spark/tree/v1.5.2/python/lib) both contain 0.8.2.1.
    > 
    > Let me know if I have missed anything.
    
    @sarathjiguru this bug exists in 1.5.2. You need to apply the patches yourself for now.




[GitHub] spark pull request: [SPARK-12617][PySpark]Move Py4jCallbackConnect...

Posted by zsxwing <gi...@git.apache.org>.
Github user zsxwing commented on the pull request:

    https://github.com/apache/spark/pull/10621#issuecomment-169441936
  
    and 1.5




[GitHub] spark pull request: [SPARK-12617][PySpark]Move Py4jCallbackConnect...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/10621




[GitHub] spark pull request: [SPARK-12617][PySpark]Move Py4jCallbackConnect...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on the pull request:

    https://github.com/apache/spark/pull/10621#issuecomment-169415528
  
    LGTM




[GitHub] spark pull request: [SPARK-12617][PySpark]Move Py4jCallbackConnect...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10621#issuecomment-169417430
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/48864/
    Test FAILed.




[GitHub] spark pull request: [SPARK-12617][PySpark]Move Py4jCallbackConnect...

Posted by sarathjiguru <gi...@git.apache.org>.
Github user sarathjiguru commented on the pull request:

    https://github.com/apache/spark/pull/10621#issuecomment-185044972
  
    @zsxwing 
    Could you confirm that the latest version of py4j (0.9.1) got packaged with Spark 1.5.2?
    The Spark that got installed using AWS and the 1.5.2 tag (https://github.com/apache/spark/tree/v1.5.2/python/lib) both contain 0.8.2.1.
    
    Let me know if I have missed anything.
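
One way to check which py4j version an installation bundles is to read it from the zip names under `python/lib`. The sketch below assumes the `py4j-<version>-src.zip` naming seen in this thread; `bundled_py4j_versions` and `version_tuple` are hypothetical helper names, not part of Spark or py4j.

```python
import re

# Assumed naming convention, taken from the `py4j-0.9.1-src.zip` name in this thread.
_PY4J_ZIP = re.compile(r"^py4j-(\d+(?:\.\d+)*)-src\.zip$")


def bundled_py4j_versions(filenames):
    """Return the py4j version strings found among the given file names."""
    versions = []
    for name in filenames:
        match = _PY4J_ZIP.match(name)
        if match:
            versions.append(match.group(1))
    return versions


def version_tuple(version):
    """Turn '0.8.2.1' into (0, 8, 2, 1) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


# Illustrative listing; on a real install, pass os.listdir() of the python/lib directory.
found = bundled_py4j_versions(["PY4J_LICENSE.txt", "py4j-0.8.2.1-src.zip"])
needs_patch = all(version_tuple(v) < version_tuple("0.9.1") for v in found)
```

Comparing tuples rather than raw strings keeps the check correct for versions with multi-digit components.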




[GitHub] spark pull request: [SPARK-12617][PySpark]Move Py4jCallbackConnect...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10621#issuecomment-169417428
  
    Merged build finished. Test FAILed.




[GitHub] spark pull request: [SPARK-12617][PySpark]Move Py4jCallbackConnect...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10621#issuecomment-169439709
  
    **[Test build #48867 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48867/consoleFull)** for PR 10621 at commit [`329a78b`](https://github.com/apache/spark/commit/329a78bdd4b4c41f466b217af98fbabdc1cc87d1).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `class Py4jCallbackConnectionCleaner(object):`




[GitHub] spark pull request: [SPARK-12617][PySpark]Move Py4jCallbackConnect...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10621#issuecomment-169439943
  
    Merged build finished. Test PASSed.

