You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by tdas <gi...@git.apache.org> on 2015/02/26 05:22:12 UTC

[GitHub] spark pull request: [SPARK-6027][SPARK-5546] Fixed --jar not worki...

GitHub user tdas opened a pull request:

    https://github.com/apache/spark/pull/4779

    [SPARK-6027][SPARK-5546] Fixed --jar not working for KafkaUtils and improved error message

    The problem with SPARK-6027 in short is that JARs like the kafka-assembly.jar does not work in python as the added JAR is not visible in the classloader used by Py4J. Py4J uses Class.forName(), which does not uses the systemclassloader, but the JARs are only visible in the Thread's contextclassloader. So this back uses the context class loader to create the KafkaUtils dstream object. 
    
    Also improves the error message. This PR assumes that #4754 is going to make --packages work for pyspark (which is what is referred to in the error message as 1. ).
    
    @davies 

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/tdas/spark kafka-python-fix

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/4779.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #4779
    
----
commit 7b88be8e2898068e9dd2fba14a96e377e9ed219c
Author: Tathagata Das <ta...@gmail.com>
Date:   2015-02-26T03:59:29Z

    Fixed --jar not working for KafkaUtils and improved error message

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6027][SPARK-5546] Fixed --jar not worki...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4779#discussion_r25406083
  
    --- Diff: python/pyspark/streaming/kafka.py ---
    @@ -63,20 +63,32 @@ def createStream(ssc, zkQuorum, groupId, topics, kafkaParams={},
             jparam = MapConverter().convert(kafkaParams, ssc.sparkContext._gateway._gateway_client)
    --- End diff --
    
    remove java_import()


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6027][SPARK-5546] Fixed --jar not worki...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on the pull request:

    https://github.com/apache/spark/pull/4779#issuecomment-76126254
  
    @jerryshao the previous way required --driver-class-path as well as --jars (or --driver-class-path and --executor-class-path). The existing message was incorrect, @davies and I missed it in the original PR because we probably never tested in distributed mode. Instead, --jars is supposed to be a single solution for adding any jar to both driver and executor. It works for Scala/Java but does not work correctly for Python, hence this workaround. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6027][SPARK-5546] Fixed --jar and --pac...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on the pull request:

    https://github.com/apache/spark/pull/4779#issuecomment-76280876
  
    LGTM as discussed with @tdas offline we're gonna change it to `private`. This is going into master and 1.3.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6027][SPARK-5546] Fixed --jar not worki...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4779#discussion_r25447545
  
    --- Diff: external/kafka/src/main/scala/org/apache/spark/streaming/kafka/KafkaUtils.scala ---
    @@ -532,3 +532,31 @@ object KafkaUtils {
         )
       }
     }
    +
    +/**
    + * This is a helper class that wraps the KafkaUtils.createStream() into more
    + * Python-friendly class and function so that it can be easily
    + * instantiated and called from Python's KafkaUtils (see SPARK-6027).
    + *
    + * The zero-arg constructor helps instantiate this class from the Class object
    + * classOf[KafkaUtilsPythonHelper].newInstance(), and the createStream()
    + * takes care of known parameters instead of passing them from Python
    + */
    +private[kafka]
    --- End diff --
    
    does it still work if this is strictly `private`? Just wondering


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6027][SPARK-5546] Fixed --jar not worki...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on the pull request:

    https://github.com/apache/spark/pull/4779#issuecomment-76126720
  
    LGTM, the helper class actually simplify things.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6027][SPARK-5546] Fixed --jar not worki...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4779#issuecomment-76120577
  
      [Test build #27980 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27980/consoleFull) for   PR 4779 at commit [`7b88be8`](https://github.com/apache/spark/commit/7b88be8e2898068e9dd2fba14a96e377e9ed219c).
     * This patch **fails Python style tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `class KafkaUtilsPythonHelper `



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6027][SPARK-5546] Fixed --jar not worki...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4779#issuecomment-76126281
  
      [Test build #27984 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27984/consoleFull) for   PR 4779 at commit [`c1fdf35`](https://github.com/apache/spark/commit/c1fdf35a45584ebf9f330339405524f6753eb808).
     * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6027][SPARK-5546] Fixed --jar and --pac...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/4779


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6027][SPARK-5546] Fixed --jar not worki...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/4779#issuecomment-76132711
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27984/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6027][SPARK-5546] Fixed --jar not worki...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4779#issuecomment-76132705
  
      [Test build #27984 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27984/consoleFull) for   PR 4779 at commit [`c1fdf35`](https://github.com/apache/spark/commit/c1fdf35a45584ebf9f330339405524f6753eb808).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `class KafkaUtilsPythonHelper `



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6027][SPARK-5546] Fixed --jar not worki...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4779#issuecomment-76133227
  
      [Test build #27987 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27987/consoleFull) for   PR 4779 at commit [`fb16b04`](https://github.com/apache/spark/commit/fb16b0410e4b078c00382089ad7e6b0227d8d13c).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `class KafkaUtilsPythonHelper `



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6027][SPARK-5546] Fixed --jar not worki...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4779#discussion_r25406085
  
    --- Diff: python/pyspark/streaming/kafka.py ---
    @@ -63,20 +63,32 @@ def createStream(ssc, zkQuorum, groupId, topics, kafkaParams={},
             jparam = MapConverter().convert(kafkaParams, ssc.sparkContext._gateway._gateway_client)
             jlevel = ssc._sc._getJavaStorageLevel(storageLevel)
     
    -        def getClassByName(name):
    -            return ssc._jvm.org.apache.spark.util.Utils.classForName(name)
    -
             try:
    -            array = getClassByName("[B")
    -            decoder = getClassByName("kafka.serializer.DefaultDecoder")
    -            jstream = ssc._jvm.KafkaUtils.createStream(ssc._jssc, array, array, decoder, decoder,
    -                                                       jparam, jtopics, jlevel)
    -        except Py4JError, e:
    +            helperClass = ssc._jvm.java.lang.Thread.currentThread().getContextClassLoader().loadClass("org.apache.spark.streaming.kafka.KafkaUtilsPythonHelper")
    --- End diff --
    
    too long


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6027][SPARK-5546] Fixed --jar not worki...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4779#issuecomment-76127023
  
      [Test build #27987 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27987/consoleFull) for   PR 4779 at commit [`fb16b04`](https://github.com/apache/spark/commit/fb16b0410e4b078c00382089ad7e6b0227d8d13c).
     * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6027][SPARK-5546] Fixed --jar not worki...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/4779#issuecomment-76133237
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27987/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6027][SPARK-5546] Fixed --jar not worki...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4779#issuecomment-76120523
  
      [Test build #27980 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27980/consoleFull) for   PR 4779 at commit [`7b88be8`](https://github.com/apache/spark/commit/7b88be8e2898068e9dd2fba14a96e377e9ed219c).
     * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6027][SPARK-5546] Fixed --jar not worki...

Posted by jerryshao <gi...@git.apache.org>.
Github user jerryshao commented on the pull request:

    https://github.com/apache/spark/pull/4779#issuecomment-76122914
  
    Hi @tdas , a stupid question, why previous way of `--driver-class-path` is not good, since if we use `--jars`, we have to create every helper function for Python, I'm a little confused.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6027][SPARK-5546] Fixed --jar and --pac...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4779#discussion_r25465873
  
    --- Diff: external/kafka/src/main/scala/org/apache/spark/streaming/kafka/KafkaUtils.scala ---
    @@ -532,3 +532,31 @@ object KafkaUtils {
         )
       }
     }
    +
    +/**
    + * This is a helper class that wraps the KafkaUtils.createStream() into more
    + * Python-friendly class and function so that it can be easily
    + * instantiated and called from Python's KafkaUtils (see SPARK-6027).
    + *
    + * The zero-arg constructor helps instantiate this class from the Class object
    + * classOf[KafkaUtilsPythonHelper].newInstance(), and the createStream()
    + * takes care of known parameters instead of passing them from Python
    + */
    +private[kafka]
    --- End diff --
    
    I am not sure if I will be able to call from python if this is strictly private. I havent tested it though. I dont think that;s a big deal, as long as its hidden from public view


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6027][SPARK-5546] Fixed --jar not worki...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/4779#issuecomment-76120578
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27980/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org