Posted to issues@spark.apache.org by "holdenk (JIRA)" <ji...@apache.org> on 2016/10/16 11:57:20 UTC

[jira] [Created] (SPARK-17960) Upgrade to Py4J 0.10.4

holdenk created SPARK-17960:
-------------------------------

             Summary: Upgrade to Py4J 0.10.4
                 Key: SPARK-17960
                 URL: https://issues.apache.org/jira/browse/SPARK-17960
             Project: Spark
          Issue Type: Improvement
          Components: PySpark
            Reporter: holdenk
            Priority: Trivial


In general we should try to keep up to date with Py4J's new releases. The changes in this one are small (https://github.com/bartdag/py4j/milestone/21?closed=1) and shouldn't impact Spark in any significant way, so I'm going to tag this as a starter issue for someone looking to get a deeper understanding of how PySpark works.

Upgrading Py4J can be a bit trickier than updating other packages. In general, the steps are:
1) Upgrade the Py4J version on the Java side
2) Update the py4j src zip file we bundle with Spark
3) Make sure everything still works (especially the streaming tests, because we do weird things to make streaming work and it's the most likely place to break during a Py4J upgrade).
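Steps 1 and 2 mostly come down to keeping every copy of the Py4J version number in agreement. Below is a minimal Python sketch of a consistency check, assuming the Java side declares Py4J as a Maven dependency (an artifactId of py4j followed by a version element) and that other files refer to the bundled zip as py4j-<version>-src.zip. The helper names and the sample "launcher" line are hypothetical illustrations, not part of Spark's build.

```python
import re

# Assumed layout (hypothetical, verify against the real build files):
# the POM declares <artifactId>py4j</artifactId> followed by <version>X</version>,
# and launcher scripts reference the bundled zip as py4j-X-src.zip.
POM_PATTERN = re.compile(r"<artifactId>py4j</artifactId>\s*<version>([^<]+)</version>")
ZIP_PATTERN = re.compile(r"py4j-([0-9][0-9A-Za-z.]*)-src\.zip")


def java_side_version(pom_text):
    """Return the Py4J version declared in POM text, or None if absent."""
    match = POM_PATTERN.search(pom_text)
    return match.group(1).strip() if match else None


def bundled_zip_versions(text):
    """Return the set of versions named in py4j-<version>-src.zip references."""
    return set(ZIP_PATTERN.findall(text))


def bump_zip_references(text, old, new):
    """Rewrite every py4j-<old>-src.zip reference to py4j-<new>-src.zip."""
    return text.replace(f"py4j-{old}-src.zip", f"py4j-{new}-src.zip")


def check_consistent(pom_text, other_texts):
    """Map file name -> sorted stale versions for files that disagree with the POM."""
    expected = java_side_version(pom_text)
    if expected is None:
        raise ValueError("no py4j dependency found in POM text")
    stale = {}
    for name, text in other_texts.items():
        mismatched = bundled_zip_versions(text) - {expected}
        if mismatched:
            stale[name] = sorted(mismatched)
    return stale


if __name__ == "__main__":
    # Illustrative inputs: the Java side is already bumped, a launcher line is not.
    pom = """
    <dependency>
      <groupId>net.sf.py4j</groupId>
      <artifactId>py4j</artifactId>
      <version>0.10.4</version>
    </dependency>
    """
    launcher = 'PYTHONPATH="$SPARK_HOME/python/lib/py4j-0.10.3-src.zip:$PYTHONPATH"'
    print(check_consistent(pom, {"launcher": launcher}))  # {'launcher': ['0.10.3']}
    launcher = bump_zip_references(launcher, "0.10.3", "0.10.4")
    print(check_consistent(pom, {"launcher": launcher}))  # {}
```

This only catches stale version strings; it is no substitute for step 3, actually running the PySpark tests (the streaming tests in particular).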

You can see how these bits have been done in past releases by looking in the git log for the last time we changed the Py4J version numbers. Sometimes, even for "compatible" releases like this one, we may need to make some small code changes inside PySpark because we hook into Py4J's internals, but I don't think that should be the case here.
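One way to find that earlier commit is git's "pickaxe" search (git log -S<string>), which lists commits that changed the number of occurrences of a string. A minimal Python sketch follows, assuming the version being replaced appears in text files as py4j-<version>-src.zip, so the commit that introduced that string should be the previous upgrade. The function names are hypothetical.

```python
import subprocess


def py4j_history_command(old_version, repo_dir="."):
    """Build a git command listing commits that added or removed the
    py4j-<old_version>-src.zip string (git's -S "pickaxe" search)."""
    needle = f"py4j-{old_version}-src.zip"
    return ["git", "-C", repo_dir, "log", "--oneline", f"-S{needle}"]


def find_py4j_upgrade_commits(old_version, repo_dir="."):
    """Run the pickaxe search and return one 'sha subject' line per commit,
    newest first (git log's default order)."""
    result = subprocess.run(
        py4j_history_command(old_version, repo_dir),
        capture_output=True,
        text=True,
        check=True,  # raise if repo_dir is not a git checkout
    )
    return [line for line in result.stdout.splitlines() if line]
```

Running git show on the commit(s) this returns should reveal which files a previous Py4J bump touched, which is a reasonable checklist for the new one.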



