Posted to reviews@spark.apache.org by holdenk <gi...@git.apache.org> on 2016/05/11 21:59:07 UTC

[GitHub] spark pull request: [SPARK-15061][WIP][PySpark] Upgrade to Py4J 0....

GitHub user holdenk opened a pull request:

    https://github.com/apache/spark/pull/13064

    [SPARK-15061][WIP][PySpark] Upgrade to Py4J 0.10.1

    ## What changes were proposed in this pull request?
    
    This upgrades to Py4J 0.10.1, which reduces syscall overhead in the Java gateway (see https://github.com/bartdag/py4j/issues/201 ). Related: https://issues.apache.org/jira/browse/SPARK-6728 .
    
    
    ## How was this patch tested?
    
    Existing doctests & unit tests pass


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/holdenk/spark SPARK-15061-upgrade-to-py4j-0.10.1

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/13064.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #13064
    
----
commit 808b162ed7dc0747f71d4bcdf02f98fdfd0c2677
Author: Holden Karau <ho...@us.ibm.com>
Date:   2016-05-11T21:23:23Z

    Upgrade to py4j 0.10.1

commit 62adff809d96f296d45d3f47976f0f3c9082f10a
Author: Holden Karau <ho...@us.ibm.com>
Date:   2016-05-11T21:55:17Z

    Upgrade to py4j 0.10.1

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15061][PySpark] Upgrade to Py4J 0.10.1

Posted by holdenk <gi...@git.apache.org>.
Github user holdenk commented on the pull request:

    https://github.com/apache/spark/pull/13064#issuecomment-218836044
  
    So the release notes since the last version we used are:
    
    0.10.1: 
    > Major performance fix: the Python side is now using default buffering when reading responses from the Java side. This is particularly important if you transfer large parameters (large strings or byte arrays). A simple benchmark found that repeatedly sending 10 MB strings went from 99 seconds to 1 second. Thanks to @kaytwo for finding this bug and suggesting a fix.
    > Both the Java and the Python libraries are now available as OSGi bundles. Thanks to kichwacoders for funding the work.
    > The 0.10.0 jar uploaded to PyPI wrongly required Java 8. The Java compatibility has been restored to 1.6. Thanks to @agronholm for finding this bug.
    > Added the __version__ attribute in the py4j package to conform to PEP 396. Thanks to @lessthanoptimal for reporting this bug.
    > tickets closed for 0.10.1 release
    > 
    
    0.10.0:
    
    > Added a new threading model that is more efficient with indirect recursion between Java and Python and that enables users to control which thread will execute calls. Thanks to kichwacoders for funding the implementation and providing the initial idea.
    > Added TLS support to encrypt the communication between both sides. Thanks to @njwhite.
    > Added initial byte stream support so Python can consume Java byte streams more efficiently. Support is still preliminary and subject to change in the future, but it provides a good base to build on. See this Python unit test and Java example class for a small example. Thanks to @njwhite.
    > Java side: converted build script from ant to gradle. Introduced Java coding conventions and static code analysis. See Java Coding Conventions for more details.
    > Java side: it is now possible to build an OSGi bundle and an Eclipse update site from Py4J source. See using Py4J with Eclipse.
    > tickets closed for 0.10.0 release




[GitHub] spark pull request: [SPARK-15061][PySpark] Upgrade to Py4J 0.10.1

Posted by nchammas <gi...@git.apache.org>.
Github user nchammas commented on the pull request:

    https://github.com/apache/spark/pull/13064#issuecomment-218846114
  
    Yeah, I was just curious. I'm guessing any improvements will be focused more on the RDD side of the API than on DataFrames.




[GitHub] spark pull request: [SPARK-15061][PySpark] Upgrade to Py4J 0.10.1

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the pull request:

    https://github.com/apache/spark/pull/13064#issuecomment-218976774
  
    Merged to master/2.0




[GitHub] spark pull request: [SPARK-15061][WIP][PySpark] Upgrade to Py4J 0....

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/13064#issuecomment-218626743
  
    **[Test build #58412 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58412/consoleFull)** for PR 13064 at commit [`62adff8`](https://github.com/apache/spark/commit/62adff809d96f296d45d3f47976f0f3c9082f10a).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-15061][PySpark] Upgrade to Py4J 0.10.1

Posted by holdenk <gi...@git.apache.org>.
Github user holdenk commented on the pull request:

    https://github.com/apache/spark/pull/13064#issuecomment-218845352
  
    @nchammas We could do this - but we haven't bothered in any of the previous upgrades.




[GitHub] spark pull request: [SPARK-15061][PySpark] Upgrade to Py4J 0.10.1

Posted by nchammas <gi...@git.apache.org>.
Github user nchammas commented on the pull request:

    https://github.com/apache/spark/pull/13064#issuecomment-218844473
  
    Exciting! Is there a micro-benchmark we can run that would show any performance improvement from this? (Or perhaps a full spark-perf run would show the difference.)
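
One rough way to approach the micro-benchmark question, offered as a sketch and not as Py4J's or Spark's actual code: the 0.10.1 release notes attribute the speedup to the Python side using default buffering when reading responses. The snippet below times `readline()` on a local socket pair, once with an unbuffered stream and once with a default-buffered one. The helper name `time_readline`, the payload size, and the use of `socket.socketpair()` are all assumptions made for illustration; the script does not start a JVM or call Py4J.

```python
import socket
import threading
import time


def time_readline(buffering, payload):
    """Send payload + newline over a local socket pair and time readline() on the other end.

    buffering=0 asks for an unbuffered stream; buffering=-1 selects the default buffer size.
    """
    writer, reader = socket.socketpair()

    def send():
        # Write from a separate thread so a large payload cannot block the reader.
        writer.sendall(payload + b"\n")
        writer.close()

    sender = threading.Thread(target=send)
    sender.start()
    stream = reader.makefile("rb", buffering)
    start = time.perf_counter()
    line = stream.readline()
    elapsed = time.perf_counter() - start
    sender.join()
    stream.close()
    reader.close()
    return line, elapsed


if __name__ == "__main__":
    # Hypothetical size, kept small so the unbuffered case finishes quickly.
    payload = b"x" * 200_000
    for label, buffering in (("unbuffered", 0), ("default buffering", -1)):
        line, elapsed = time_readline(buffering, payload)
        print(f"{label}: read {len(line)} bytes in {elapsed:.4f}s")
```

The unbuffered read would be expected to be slower, since an unbuffered `readline()` pulls data from the socket in very small pieces, but absolute numbers will vary by machine and will not match the benchmark quoted in the release notes.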




[GitHub] spark pull request: [SPARK-15061][WIP][PySpark] Upgrade to Py4J 0....

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/13064#issuecomment-218626888
  
    Merged build finished. Test PASSed.




[GitHub] spark pull request: [SPARK-15061][WIP][PySpark] Upgrade to Py4J 0....

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/13064#issuecomment-218603843
  
    **[Test build #58412 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58412/consoleFull)** for PR 13064 at commit [`62adff8`](https://github.com/apache/spark/commit/62adff809d96f296d45d3f47976f0f3c9082f10a).




[GitHub] spark pull request: [SPARK-15061][WIP][PySpark] Upgrade to Py4J 0....

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/13064#issuecomment-218626892
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58412/
    Test PASSed.




[GitHub] spark pull request: [SPARK-15061][PySpark] Upgrade to Py4J 0.10.1

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/13064




[GitHub] spark pull request: [SPARK-15061][PySpark] Upgrade to Py4J 0.10.1

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the pull request:

    https://github.com/apache/spark/pull/13064#issuecomment-218689011
  
    That LGTM, though I don't have much insight into what changed. The change itself is fine, and I suspect we do want to have the latest fixes and improvements for this component.

